High rate of "failed to connect to any candidate hosts" transient errors

Over the past 2-3 days I’ve noticed a higher-than-usual rate of transient errors for Gmail, Outlook, GMX and other domains:

KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: \"mx00.emig.gmx.net.\", addr: 212.227.15.9 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.emig.gmx.net.\", addr: 212.227.17.5 } port 25 and read initial banner: reading banner: Not connected

Not sure if this is related to my server / network or KumoMTA. How can I track / debug what’s going on?


Another example:

KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.42.251.62 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.57.152.5 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.57.156.30 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.56.9.31 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.57.154.33 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx01.mail.icloud.com.\", addr: 17.57.155.25 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.56.9.31 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.57.154.33 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.57.156.30 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.57.155.25 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.42.251.62 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"mx02.mail.icloud.com.\", addr: 17.57.152.5 } port 25 and read initial banner: reading banner: Not connecte

Outlook:

KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: \"baumeisterneuberger-at02e.mail.protection.outlook.com.\", addr: 52.101.68.21 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"baumeisterneuberger-at02e.mail.protection.outlook.com.\", addr: 52.101.68.5 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"baumeisterneuberger-at02e.mail.protection.outlook.com.\", addr: 52.101.73.11 } port 25 and read initial banner: reading banner: Not connected, connect to ResolvedAddress { name: \"baumeisterneuberger-at02e.mail.protection.outlook.com.\", addr: 52.101.68.0 } port 25 and read initial banner: reading banner: Not connected

Please share the version of KumoMTA that you’re currently running, as well as the configuration for that egress path.

kumod 2024.06.18-308d4301

[source]
[source.send]
source_address = '5.78.73.137'
ehlo_domain = 'send.ahasend.com'

[source.srv1]
ehlo_domain = 'srv1.ahasend.com'
socks5_proxy_server = '10.1.0.4:5000'
socks5_proxy_source_address = '5.78.103.76'

[source.srv3-s1]
source_address = '5.78.73.137'
ehlo_domain = 'srv3-s1.ahasend.com'
socks5_proxy_server = '185.214.99.2:5000'
socks5_proxy_source_address = '185.214.99.2'

...

[pool]
[pool.good]

[pool.good.srv3-s2]
weight = 60

[pool.good.srv3-s3]
weight = 60

[pool.good.srv3-s59]
weight = 10

[pool.neutral]
[pool.neutral.srv1]
weight = 100

[pool.neutral.srv3-s1]
weight = 20

[pool.neutral.srv3-s4]
weight = 1

[pool.neutral.srv3-s5]
weight = 1
...

I haven’t done an exhaustive check, but it seems to me that the transient failures only happen with the srv3-* sources.
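One way to make that check exhaustive is to tally the failures per egress source from the delivery logs. The following is a minimal sketch, assuming the logs have already been decompressed into one JSON record per line. The field names `type`, `egress_source` and `response.content` are assumptions about the record layout and should be checked against your own records; `tally_by_source` is a hypothetical helper, not part of KumoMTA.

```python
import json
from collections import Counter

# Substring taken from the error text quoted earlier in this thread.
NEEDLE = "failed to connect to any candidate hosts"


def tally_by_source(lines):
    """Count matching transient failures per egress source.

    `lines` is any iterable of JSON-encoded log records (one per line).
    Field names are assumptions about the log layout.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        if record.get("type") != "TransientFailure":
            continue
        response = record.get("response") or {}
        if NEEDLE in response.get("content", ""):
            counts[record.get("egress_source") or "(unknown)"] += 1
    return counts
```

Calling `tally_by_source(open("decompressed.jsonl"))` (the file name is a placeholder) and printing `counts.most_common()` would show whether the failures really cluster on the srv3-* sources.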

Are there any error messages showing up in the journal?

Is the proxy process running OK on that particular host? Is there anything in its journal there?

Nothing in the journal for KumoMTA.

The proxy server is running on srv3, and there is nothing in its journal either.

The proxy server executable on srv3 was from an older version of KumoMTA. I updated it to the same version as the KumoMTA server and restarted the proxy service.

The NotConnected error result can be returned in one of two circumstances during the initial connection attempt:

  • The client object has explicitly closed the connection and a read is attempted afterwards (cannot be happening in this initial connection case)
  • There is a successful 0-byte read from the socket, indicating EOF

That leaves the second case: the proxy closed the connection before any banner arrived, which suggests to me some kind of issue in the proxy.
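The second circumstance is easy to reproduce outside of KumoMTA. The sketch below is a generic Python illustration of how a successful 0-byte read signals EOF. It is not KumoMTA's actual code, and `read_banner` and `demo` are hypothetical names used only for this example.

```python
import socket


def read_banner(sock, bufsize=1024):
    """Read the first chunk of data from a freshly connected socket."""
    data = sock.recv(bufsize)
    if data == b"":
        # A successful 0-byte read means the peer closed the connection (EOF).
        raise ConnectionError("Not connected")
    return data


def demo():
    """Return (banner, eof_error) for a talkative peer and a silent one."""
    # Case 1: the peer sends a banner before we read.
    a, b = socket.socketpair()
    with a, b:
        b.sendall(b"220 mx.example.com ESMTP\r\n")
        banner = read_banner(a)

    # Case 2: the peer closes without sending anything.
    c, d = socket.socketpair()
    with c:
        d.close()
        try:
            read_banner(c)
            eof_error = None
        except ConnectionError as exc:
            eof_error = str(exc)
    return banner, eof_error
```

In the setup discussed here the "peer" from KumoMTA's point of view is the proxy, so an EOF at banner time would mean the proxy hung up on the inbound side, for whatever reason.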

If you see this again now that you’ve updated the proxy, it would be interesting to strace the proxy. My hunch is that there was a resource utilization issue (maybe too many open files, maybe not enough ephemeral ports) that prevented it from establishing the outbound side of the proxy connection.

Either way, it sounds like we could do with some more diagnostic info in the proxy.
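Both hunches can be checked on a Linux host without strace. This is a rough sketch, assuming a Linux `/proc` filesystem and enough privileges to read the proxy process's entries (same user or root). All function names are hypothetical helpers for this example.

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the row is missing.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units
            soft = line[len(prefix):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def parse_port_range(range_text):
    """Return the number of ports in an 'ip_local_port_range' string."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_soft_nofile(f.read())
    return open_fds, soft


def ephemeral_port_count():
    """Return the size of the local (ephemeral) port range."""
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        return parse_port_range(f.read())
```

If `fd_usage(proxy_pid)` shows the open count close to the soft limit, or the number of outbound connections from the proxy approaches `ephemeral_port_count()`, that would be consistent with the proxy failing to open its outbound side.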

I also made some changes to sysctl.conf

Seems like the issue is fixed now, but I’ll keep an eye on the logs and let you know if I notice it happening again in the next couple of days

Also curious: what do you have set for your connect_timeout? The proxy server itself has a default of 60 seconds, which you can change via --timeout-seconds. You will want the proxy timeout to be the same as (or maybe 1 second longer than) what you use in KumoMTA.
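As a small illustration of that rule of thumb, the sketch below derives the proxy flag from the KumoMTA-side timeout. `proxy_timeout_args` is a hypothetical helper, and the 1 second margin is only the suggestion from this thread, not a documented requirement.

```python
def proxy_timeout_args(connect_timeout_seconds, margin_seconds=1):
    """Build the proxy's timeout flag from the KumoMTA connect timeout.

    The proxy timeout is set slightly longer than the KumoMTA side so that
    KumoMTA gives up first and reports its own timeout error.
    """
    if connect_timeout_seconds <= 0:
        raise ValueError("connect_timeout_seconds must be positive")
    if margin_seconds < 0:
        raise ValueError("margin_seconds must not be negative")
    total = connect_timeout_seconds + margin_seconds
    return ["--timeout-seconds", str(total)]
```

For a 60 second connect_timeout this yields the arguments `--timeout-seconds 61`, to be appended to however the proxy service is launched on your host.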