We have a customer who sends a weekly newsletter to ~300,000 subscribers. The emails are usually around 100KB in size, and the list is clean (0.1% bounce rate). We have assigned a dedicated IP to this customer (they send both transactional email and the newsletter from this IP). 89.5% of the emails are sent to Gmail recipients. Every week, before they start sending the newsletter, the TTFA for all customers is under 1 second, and once they start sending, it goes up to ~10 seconds. Server load stays quite low the whole time. I would’ve expected that because they have a dedicated source pool (dedicated IP), their impact on other source pools and tenants would be minimal, but that doesn’t seem to be the case. How can I minimize their impact on TTFA for other customers without spinning up a separate server for these types of email campaigns?
Can this be related to the JSON webhook calls? Are the webhook calls synchronous or async? I don’t have exact numbers on the webhook response time while this customer is sending the weekly newsletter, but I don’t expect it to have increased much: the load on the server handling the webhook calls is low, and the code handling the webhook events does not block the request. It just does some minimal validation, returns a 200 status code, and then processes the event in a separate goroutine, so I don’t expect it to be causing this issue even if the calls are blocking.
Could it be because of the SQLite queries? I have memoization in place for everything, and the tables in the SQLite db are quite small as well.
The only blocking request that could happen in smtp_server_message_received and http_message_generated is related to adding link tracking, but it’s not enabled for this customer (they handle tracking on their side), so that can’t be it either.
Not sure if this is related or makes any difference, but they inject the messages using the HTTP API, while most other customers inject using SMTP.
if the ready queue is full, newly injected messages are briefly delayed. take a look at your metrics; if the ready queue count plateaus during their send then that may be what you’re seeing. the solution would be to increase the max_ready setting
Thanks @free-spirited-yorksh, made the change - they will send again tomorrow and I’ll send an update here about the result.
Another question, is connection_limit calculated for all sources or is it per source? I mean, if I set it to 5 for gmail, does that mean that Kumo will make a maximum of 5 connections to Gmail in total or is that 5 connections per source IP?
it’s per path, so it applies to a specific combination of site-name and source
I read your original question again; are you saying that the whole server has increased latency when this customer sends? I’d suggest looking at the metrics during the send to see if anything hints at a bottleneck. kcli top is the quick and easy way to look at them
Yes, I see increased latency for other tenants as well. I’ll also check kcli top tomorrow.
I think the documentation should be updated to clarify this. Right now it says connection_limit “Specifies the maximum number of concurrent connections that will be made from the current MTA machine to the destination site.” which sounds like it doesn’t take the path into account.
all the settings there are specific to the path itself, as is stated at the top of that page
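To make the per-path answer above concrete, here is a rough arithmetic sketch in Python. It is not KumoMTA code: the function name and the example IPs are made up for illustration, and it assumes every source is eligible to send to the site and that every path uses the same connection_limit.

```python
def max_total_connections(connection_limit: int, source_ips: list[str]) -> int:
    """Upper bound on concurrent connections to ONE destination site.

    Assumption: the limit is applied per path, where a path is one
    (site-name, source) combination, as described in the thread.
    Hypothetical helper for illustration only.
    """
    # One path per distinct source for the given site.
    distinct_sources = set(source_ips)
    return connection_limit * len(distinct_sources)


# Example: connection_limit = 5 for Gmail, with three source IPs
# (documentation-range addresses, purely illustrative).
sources = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(max_total_connections(5, sources))  # 15, not 5
```

So with a limit of 5 and three sources, the server could open up to 15 connections to Gmail in total under these assumptions, while any single source IP stays capped at 5.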