Have you fixed DNS yet?
Yep, it’s using the node-local DNS resolver now, without Unbound:
```name_servers = {
'169.254.20.10:53',
}
})```
This means that each IP address can have up to 2 simultaneous SMTP connections to that destination. Yes?
yes, but the connection rate still limits the throughput as mentioned above
Can that really be the bottleneck at our volume? Because those defaults should be overridden by:
max_deliveries_per_connection = 30
provider_connection_limit = 10
max_message_rate = "200/s"
max_connection_rate = "200/min"
max_ready = 2048
And that config was already in place when it happened in those logs.
If I calculate correctly, it’s 300*200/min. That is far, far above our volume.
It doesn’t happen when traffic is low, no issues last night for example.
TBH, I’m not really following this thread very closely. I mentioned the above as something you need to consider for your overall configuration, as seemingly strange delays can just be that the system is enforcing the constraints you specified.
Yea no worries at all! Appreciate all the help I’m getting while searching
If I understand that correctly it means that if the configuration forces a delay, it won’t necessarily cause a Delayed event (and thus not always write to spool)?
Also, the scheduled queue is not necessarily related to writing to spool, right?
if messages are throttled in the ready queue due to shaping constraints, and the ready queue is not full, then they can sit in the ready queue until the throttle opens up and allows the queue maintainer to make progress, and won’t generate a Delayed event. The Delayed event is logged when moving messages back to the Scheduled queue, either because the ready queue is full or because a transient response was returned from the destination.
Spool is written to only once during reception, but if you’ve implemented Lua events that modify the message or its metadata, you will trigger writes to the spool when the message is next moved to the Scheduled queue.
we try to be smart about moving messages back to the Scheduled queue if the throttle is long enough to warrant it, so that things aren’t silently lingering
I’d suggest looking at delayed_due_to_ready_queue_full, delayed_due_to_message_rate_throttle and delayed_due_to_throttle_insert_ready metrics. Ideally those are all zero, but they might be non-zero in your case, and suggest what to look at next
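A minimal sketch of checking those three counters, assuming the metrics have been fetched as Prometheus-style text. The sample text, the `queue` label, and the helper name are made up for illustration; the names and labels your instance actually exports may differ, and the parser assumes no trailing timestamps on sample lines.

```python
# Metric names taken from the message above; everything else is illustrative.
WATCHED = (
    "delayed_due_to_ready_queue_full",
    "delayed_due_to_message_rate_throttle",
    "delayed_due_to_throttle_insert_ready",
)


def nonzero_delay_metrics(metrics_text: str) -> dict[str, float]:
    """Return the watched metric series whose value is non-zero."""
    found = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop the label set, if any
        if name in WATCHED and float(value) != 0.0:
            found[series] = float(value)
    return found


# Hypothetical sample, not real output.
SAMPLE = """\
# TYPE delayed_due_to_ready_queue_full counter
delayed_due_to_ready_queue_full{queue="example.com"} 0
delayed_due_to_message_rate_throttle{queue="example.com"} 12
delayed_due_to_throttle_insert_ready{queue="example.com"} 0
"""

print(nonzero_delay_metrics(SAMPLE))
```

Anything this returns is a hint about which constraint is pushing messages back to the Scheduled queue.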
Amazing, appreciate this! Thanks
Wanted to post an update here: after lots of tinkering, I think I found our sweet spot! Mostly I increased max_connection_rate, and that definitely did the trick without increasing transient failures.
Really want to share my appreciation for all the help I was getting here, thanks a lot.
Back again with the same issue haha, but wanted to quickly check.
We again increased the connection rate, but are seeing that mails to MS365 are stuck in the ready queue because of:
connection_limited: acquiring connection lease shaping-provider-office365-t-ip-1-limit @ 2025-12-01 11:14:26.186226304 UTC
However, I did increase the limit a lot:
```match=[{MXSuffix=".mail.protection.outlook.com"}]
max_deliveries_per_connection = 30
provider_connection_limit = 25
max_message_rate = "500/s"
max_connection_rate = "2000/min"
max_ready = 3072```
No bounces/deferrals, just waiting in the ready queue. If I calculate this correctly, it should handle 60,000 mails per minute per IP (2000 connections/min * 30 deliveries per connection), though we never hit that (we're at a couple K per hour at most).
TSA is empty, so all good there:
```# Generated by tsa-daemon
# Number of entries: 0```
Increased it a bit more and all solved again, but I think I don’t quite understand the calculations well enough yet.
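For the calculation itself, here is a rough sketch of the arithmetic using the values from the config quoted above. The rate-string parser is a simplified, hypothetical helper (it only knows `/s`, `/min` and `/h`), and the code only multiplies the configured numbers; it says nothing about how the limits are enforced internally. Note that these are rate ceilings only and do not account for the concurrent connection limit.

```python
def per_minute(rate: str) -> float:
    """Convert a rate string such as "500/s" or "2000/min" to events per minute."""
    count, unit = rate.split("/")
    seconds_per_unit = {"s": 1, "min": 60, "h": 3600}[unit]
    return float(count) * 60 / seconds_per_unit


def rate_ceilings_per_minute(
    max_connection_rate: str,
    max_deliveries_per_connection: int,
    max_message_rate: str,
) -> dict[str, float]:
    """Messages per minute allowed by each rate limit, taken on its own."""
    return {
        # new connections per minute * messages each connection may carry
        "by_connection_rate": per_minute(max_connection_rate)
        * max_deliveries_per_connection,
        # direct cap on messages per minute
        "by_message_rate": per_minute(max_message_rate),
    }


limits = rate_ceilings_per_minute("2000/min", 30, "500/s")
print(limits)
```

With these inputs, the connection-rate product gives the 60,000 per minute figure, while `"500/s"` on its own works out to 30,000 per minute, so if both limits apply the lower one is the ceiling. Either way, both are far above a couple thousand messages per hour.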
The issue lies with provider_connection_limit, not rate. If you have a large number of Office 365 emails to send and exceed 25, this problem will occur.
Is that 25 connections at a time?
I was also wondering that, because that is how I understood it as well, that the provider_connection_limit sets the concurrent limit per source IP to a provider.
Yes, it’s at the provider level.
However, in my production environment, although the daily sending volume is quite high, this issue has never occurred. I just checked: my configuration file uses the default value of 25.
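One way to reason about a concurrent limit like this is Little's law: average open connections = connections opened per second * seconds each one stays open. The sketch below uses made-up traffic numbers (3000 messages per hour steady, a burst of 100 messages per second, 2 seconds per connection, one message per connection); none of them come from this thread, so plug in your own measurements.

```python
def average_open_connections(
    messages_per_second: float,
    deliveries_per_connection: float,
    seconds_per_connection: float,
) -> float:
    """Little's law estimate of how many connections are open at once."""
    connections_per_second = messages_per_second / deliveries_per_connection
    return connections_per_second * seconds_per_connection


# Hypothetical steady load: 3000 messages/hour, 1 message per connection,
# each connection open for about 2 seconds.
steady = average_open_connections(3000 / 3600, 1, 2.0)

# Hypothetical burst: 100 messages/second arriving for the same destination.
burst = average_open_connections(100, 1, 2.0)

print(f"steady: {steady:.2f} open connections")
print(f"burst:  {burst:.2f} open connections")
```

Under these assumptions the steady load needs fewer than two connections at a time, while the burst would want far more than 25. That would fit a setup where the average volume is low but messages still wait on the connection lease when traffic arrives in bursts or connections stay open for a long time.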