Hi, I’m not sure if we ran into a bug or if we’re overlooking something.
We’re running KumoMTA as a smarthost service in front of PowerMTA. On Wednesday we hit PowerMTA’s inbound connection limit under high load, which triggered bulk delaying of several ready queues.
Yesterday we received a complaint from a tenant who suspected that some of their messages had failed to deliver. After digging through the logs, we found that some of the tenant’s messages had a Reception event but no Delivery event:
{
  "type": "Reception",
  "id": "f908c846d5c011f0a2e29600041f590a",
  "event_time": "2025-12-10T12:08:43.707233876Z",
  "created_time": "2025-12-10T12:08:43.702483800Z",
  "num_attempts": 0,
  "nodeid": "48ab41d6-f2d1-4667-acdd-42ac38c6c870"
}
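In case it helps anyone reproduce the check, the lookup we did can be sketched roughly like this in Python. This is a simplified illustration, not our exact tooling: it assumes the log segments have already been decompressed into one JSON record per line, and the set of record types treated as "finished" (`TERMINAL_TYPES`) is our own assumption. The second message id in the sample is made up.

```python
import json

# Assumption: these record types mean the message has left the queue for good.
TERMINAL_TYPES = {"Delivery", "Bounce", "Expiration", "AdminBounce"}


def find_undelivered(lines):
    """Return {id: created_time} for messages with a Reception record
    but no terminal record in the given JSON-lines input."""
    received = {}
    finished = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["type"] == "Reception":
            received[record["id"]] = record["created_time"]
        elif record["type"] in TERMINAL_TYPES:
            finished.add(record["id"])
    return {mid: ts for mid, ts in received.items() if mid not in finished}


# Sample input: one stuck message (from this post) and one hypothetical delivered one.
sample = [
    '{"type": "Reception", "id": "f908c846d5c011f0a2e29600041f590a", '
    '"created_time": "2025-12-10T12:08:43.702483800Z"}',
    '{"type": "Reception", "id": "hypothetical-delivered-id", '
    '"created_time": "2025-12-10T12:08:44.000000000Z"}',
    '{"type": "Delivery", "id": "hypothetical-delivered-id", '
    '"created_time": "2025-12-10T12:08:44.000000000Z"}',
]

stuck = find_undelivered(sample)
print(stuck)
```

Only the first id is reported, since it has a Reception record and nothing after it.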
We couldn’t find anything else anomalous, so we restarted the KumoMTA nodes just in case, and the pending messages were dispatched immediately:
{
  "type": "Delivery",
  "id": "f908c846d5c011f0a2e29600041f590a",
  "event_time": "2025-12-11T08:28:24.170357304Z",
  "created_time": "2025-12-10T12:08:43.702483800Z",
  "num_attempts": 1220,
  "nodeid": "48ab41d6-f2d1-4667-acdd-42ac38c6c870"
}
The example Delivery record shows 1220 attempts (we retry aggressively for smarthosting), so the system kept trying to deliver the message but failed each time. We don’t log Delayed events, so we don’t know the internal reason for the delay, and the system logged no other events for these messages.
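As a rough sanity check on that attempt count, here is a small Python sketch that derives the time in queue and the average spacing between attempts from the two records above. It assumes the attempts were spread evenly between `created_time` and the Delivery `event_time`, which we can't confirm without Delayed events; `parse_ts` is just a helper written for this example.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a log timestamp such as 2025-12-10T12:08:43.702483800Z.

    The logs carry nanoseconds; datetime only holds microseconds,
    so the fraction is truncated to six digits.
    """
    base, frac = ts.rstrip("Z").split(".")
    micros = int(frac[:6].ljust(6, "0"))
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(
        microsecond=micros, tzinfo=timezone.utc
    )


created = parse_ts("2025-12-10T12:08:43.702483800Z")    # created_time
delivered = parse_ts("2025-12-11T08:28:24.170357304Z")  # Delivery event_time
num_attempts = 1220

elapsed = (delivered - created).total_seconds()
avg_interval = elapsed / num_attempts  # assumes evenly spaced attempts

print(f"{elapsed:.0f} s in queue, {avg_interval:.1f} s per attempt on average")
```

The message sat in the queue for a little over 20 hours, which averages out to roughly 60 seconds per attempt, so the retries appear to have continued for the whole period rather than stalling outright.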
What could have happened here? Why did these messages get stuck in the spool and fail to dispatch? Why did restarting KumoMTA dispatch them immediately? Isn’t KumoMTA supposed to auto-recover from the bulk delayed state?
We use TSA and Redis throttling. Could the traffic shaper have throttled down far enough to effectively block deliveries? Or is traffic shaping not relevant in this context? We logged ~23k bulk delaying events over a 2-minute period.
We’re running KumoMTA version 2025.10.06.
Thanks!
