Question on queues

Hi there, hope this isn’t a dumb question but I’m new to KumoMTA and I wanted to get my head around the way queues work. I am sending out about 100,000 emails for a campaign, and our composer system is releasing them through KumoMTA in chunks of 1,500 every hour. I’ll amp this up for faster processing once I get this stable.

But I’m finding something weird. Our system is running on Rocky Linux 8, as a KVM guest on Proxmox. I’ve configured it with 4 CPUs and 16GB memory. The CPU never seems to get above 10% utilization and memory never more than about 20%. I’m finding that after about 4,000 emails are sent through it (that is roughly 2.5 cycles), the queues just become almost unresponsive. My guess is that they are filling up and there is some problem with taking on more inbound emails until the queues are cleared. Our logs show that although each hour clicks by and it attempts to add more to the MTA, after about 4,500 are sent (which go out really fast) it grinds to a real snail’s pace. It isn’t that it is not sending them, but it is just jammed up and releases at about 100 per hour rather than the normal speed I was expecting.

What’s the best way to see what is in the queue and any suggestions for optimization settings here? Clearly I don’t have this dialed in right and I’d love to get this back online. One thing I did find is that if I reboot KumoMTA, on restart it clears the queues really quickly. I thought maybe a memory issue, but increasing memory didn’t have any impact.

Myles

Hey there @welcoming-rabbit, thanks for posting. Please read the “Troubleshooting” and “How to Ask for Help” buttons below. If you would like a 1:1 support session from the KumoMTA team, details are at the “Book a Support Session” button below.

You may want to check your sysctl settings. Rocky ships with a default of net.core.somaxconn = 1024 and you should crank that up to 4096.
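
A quick way to compare the running value against that suggestion is to read it from procfs. This is only a minimal sketch, assuming a Linux host; the 4096 target is simply the figure suggested above, and the helper names are made up for illustration:

```python
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # standard Linux procfs location
RECOMMENDED = 4096  # figure suggested in this thread, not an official minimum


def parse_sysctl_int(text: str) -> int:
    """Parse the single integer that procfs returns for a numeric sysctl."""
    return int(text.strip())


def needs_raise(current: int, target: int = RECOMMENDED) -> bool:
    """Return True when the running value is below the target."""
    return current < target


if __name__ == "__main__":
    if SOMAXCONN_PATH.exists():
        current = parse_sysctl_int(SOMAXCONN_PATH.read_text())
        if needs_raise(current):
            print(f"net.core.somaxconn is {current}; consider raising it to {RECOMMENDED}")
        else:
            print(f"net.core.somaxconn is {current}; no change needed")
    else:
        print("net.core.somaxconn not found (is this a Linux host?)")
```

If the value is low, `sysctl -w net.core.somaxconn=4096` changes it at runtime, and a file under /etc/sysctl.d/ makes it persist across reboots.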

You can get good queue info with the metrics API

https://docs.kumomta.com/reference/http/metrics.json/
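
As a rough sketch of how that endpoint could be used to spot the biggest queues: the script below fetches the JSON and lists the largest numeric values whose path mentions "queue". The listener address (127.0.0.1:8000) is an assumption, so adjust it to whatever your init.lua configures, and the code deliberately avoids assuming exact metric names by walking whatever JSON objects come back.

```python
import json
from urllib.request import urlopen

# Assumption: the KumoMTA HTTP listener is reachable here; adjust to your setup.
METRICS_URL = "http://127.0.0.1:8000/metrics.json"


def flatten(node, prefix=""):
    """Yield (dotted_path, number) pairs for every numeric leaf in nested JSON objects."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        yield prefix, node


def largest(metrics, needle="queue", top=10):
    """Return the `top` biggest numeric entries whose path contains `needle`."""
    hits = [(path, value) for path, value in flatten(metrics) if needle in path]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:top]


def main():
    """Fetch the metrics document and print the largest queue-related values."""
    with urlopen(METRICS_URL, timeout=10) as resp:
        metrics = json.load(resp)
    for path, value in largest(metrics):
        print(f"{value:>12}  {path}")
```

Calling `main()` on the KumoMTA host prints the top entries. Running it once while mail is flowing and again once things slow down should show which destinations are accumulating messages.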

Is ‘kcli queue-summary’ still working, or is it unresponsive? You might need to add this setting to shaping.toml: enable_mta_sts = false. I added it to the default section of my shaping.toml. I had an unresponsive KumoMTA due to that feature. Alternatively, Wez made a new dev release to fix this locking issue.

Thank you so much for the tips. I’m implementing those now. I do have something additional to add here that I’ve observed. Our industry is higher education, and one common thing that seems to be added to Office 365/Exchange servers is support of a service called “Proofpoint”. These guys block by IP and it seems at the network level. When I reboot Kumo it has all of these connections sitting in queue that clear out, and they all seem to have something in common - they are network connections that are being held in limbo by this Proofpoint service.

I suspect what is happening is that we attempt a send on Kumo, it gets to enough of these Proofpoint servers, and each is being held back for some reason which then holds our queue in limbo after it reaches a certain threshold. Is there a way to change the behavior so that if an external service does this to a send, we can drop the connection after 10 seconds or something to clear them from queue?

To add further to this, on reboot the queues appear to be full of “TransientFailure” type connections - I suspect if I can have these removed from the queues in a short period, this problem will go away. I have adjusted our settings & sysctl settings and should know in a few hours if this has helped.

Having the full text of some of those trans fails would be really helpful. They tell an important story. Can you also please include all the info requested from Cakey bot above? My guess is that you are sending at too high a volume on those IPs and do not have your shaping tuned appropriately.

These are the errors I’m getting back in the logs:

Recipient:
Subject:
Response Code: 400
Response: KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: “mxa-007b0c01.gslb.pphosted.com.”, addr: 205.220.177.71 } port 25 and read initial banner: Command rejected Response { code: 554, enhanced_code: None, content: “Blocked - see https://ipcheck.proofpoint.com/?ip=96.44.142.30”, command: None }, connect to ResolvedAddress { name: “mxb-007b0c01.gslb.pphosted.com.”, addr: 205.220.165.71 } port 25 and read initial banner: Command rejected Response { code: 554, enhanced_code: None, content: “Blocked - see https://ipcheck.proofpoint.com/?ip=96.44.142.30”, command: None }
Type: TransientFailure
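
For anyone triaging a record like this one, the remote server's own reply is buried inside the Response text. Below is a minimal sketch that pulls out the remote SMTP codes, the MX hosts tried, and the IP that Proofpoint reports as blocked. The patterns are written against the text format shown above only, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Remote SMTP code inside each "Response { code: NNN, ..." fragment.
REMOTE_CODE = re.compile(r"Response \{ code: (\d{3})")
# Host name inside each "ResolvedAddress { name: ..." fragment; accepts straight
# or curly quotes, since forum software often rewrites them.
MX_NAME = re.compile(r'ResolvedAddress \{ name: ["“]([^"”]+)["”]')
# Sending IP that Proofpoint includes in its lookup URL.
BLOCKED_IP = re.compile(r"ipcheck\.proofpoint\.com/\?ip=(\d{1,3}(?:\.\d{1,3}){3})")


def summarize(content: str) -> dict:
    """Extract remote codes, MX hosts and blocked IPs from a response string."""
    return {
        "remote_codes": Counter(REMOTE_CODE.findall(content)),
        "mx_hosts": MX_NAME.findall(content),
        "blocked_ips": sorted(set(BLOCKED_IP.findall(content))),
    }


# Shortened copy of the response text from the log entry above.
SAMPLE = (
    "KumoMTA internal: failed to connect to any candidate hosts: connect to "
    "ResolvedAddress { name: “mxa-007b0c01.gslb.pphosted.com.”, addr: 205.220.177.71 } "
    "port 25 and read initial banner: Command rejected Response { code: 554, "
    "enhanced_code: None, content: “Blocked - see "
    "https://ipcheck.proofpoint.com/?ip=96.44.142.30”, command: None }"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The point of separating these is that the outer record is logged as a 400 TransientFailure, while the remote side is answering 554 at the banner, so counting the inner codes per destination shows how much of the backlog is actually a block.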

I am parsing the JSON response to the core elements here in my tracking

https://gist.github.com/k0d3g3ar/1e9f3e60aa7e7ab83ab10483cd630092

That gist seems to only be the default shaping file. Without your init.lua it is impossible to tell if KumoMTA is actually using it.
KumoMTA is not Postfix, and can easily send on the order of 10 to 100x more volume in the same time period as Postfix, so it is very easy to burn your IPs in a matter of minutes if the shaping is not managed. It is critical to watch these transfails and bounces to ensure you are sending at reasonable volumes for the ISPs. This is why Deliverability Professionals get paid.

The message above is telling you that your IP is blocked from sending, and you will continue to get 554 block messages until you remediate. This is not a KumoMTA problem directly, but we (or some of our Deliverability friends) could help you for a Professional Services fee.

That address is definitely live and accepting mail, but you specifically have been blocked from sending to it

I highly recommend stopping your send, examining your shaping and init files to make sure they are actually being used, and going to beg for forgiveness from the nice folks at Proofpoint.

If you are interested, this is the part where the team at KumoMTA actually charges for Professional Services to help guide you to a successful implementation and IP warmup process. We can also help with deliverability issues like this, or recommend one of our many friends who do that for a living.

I opened a support ticket with Proofpoint to see if they will remove the block. If I can see the MTA operating stably for the next 8 hours, I will consider paying a services fee to optimize it. I’m still watching it to ensure that I have the systems configured optimally for reliable operations. So far, so good.

I checked the init.lua to ensure that the shaping config was being called, and it appears to be. Nothing commented out, etc.

Please tell me: are you using telnet to send the emails to SES?