Random TLS sending issue on servers behind a load balancer

I’ve sometimes seen TLS sending errors on MX setups that use the Sophos and Hornetsecurity anti-spam solutions. They seem to load balance their SMTP connections not by publishing multiple MX IP addresses, but by using a single IP with multiple backends.

For example, the domain ebnerstolz.de points to 4 MX servers which all resolve to the same IP address, and based on their error messages I see different backend hostnames:

encryption rule based rejected TLS required. mx-gate10-hz1
encryption rule based rejected TLS required. mx-gate151-hz1
...
encryption rule based rejected TLS required. mx-gate42-hz1
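To check how many backends sit behind that one MX IP, the gateway name can be pulled out of each rejection text and counted. This is a minimal sketch; `backend_counts` is a hypothetical helper (not part of KumoMTA), and it assumes the backend name always looks like `mx-gate<N>-<suffix>`, as in the lines above.

```python
import re
from collections import Counter

# Assumption: the backend gateway name (e.g. "mx-gate10-hz1") appears in the
# rejection text, as in the responses quoted above.
GATEWAY_RE = re.compile(r"\bmx-gate\d+-[a-z0-9]+\b")


def backend_counts(responses):
    """Count how often each backend gateway name appears in SMTP response texts."""
    counts = Counter()
    for text in responses:
        match = GATEWAY_RE.search(text)
        if match:
            counts[match.group(0)] += 1
    return counts


responses = [
    "encryption rule based rejected TLS required. mx-gate10-hz1",
    "encryption rule based rejected TLS required. mx-gate151-hz1",
    "encryption rule based rejected TLS required. mx-gate42-hz1",
]
counts = backend_counts(responses)
print(f"{len(counts)} distinct backends behind one MX IP: {sorted(counts)}")
```

If the same few gateway names keep showing up in the failures while others never do, that points at specific backends rather than the whole pool.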

Another example, with Sophos, is verbraucherzentrale.nrw.
Both MX entries point to the same IP addresses, and we see messages like this one:

These errors do not always appear; sometimes delivery works without any problem. We don’t have any TSA rule for ebnerstolz.de or verbraucherzentrale.nrw that might cause us to skip TLS. We use the default TLS settings with Rust TLS (rustls).

I previously opened a ticket for verbraucherzentrale.nrw (TLS error on Sophos appliance TLS1.3) and we found a workaround by rewriting the response, but since I’m now seeing this message with Hornetsecurity too, it might be worth investigating further.

I’m currently running kumod 2024.12.13-1037b1a1.

I believe we have seen something similar in testing, and it was difficult to diagnose. Essentially, a single IP on a firewall or gateway NATs to a pool of receivers, but one or more receivers in the pool behave differently, so what you see at that IP is inconsistent.
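A toy model may help show why the failures look random from the sending side. It assumes plain round-robin balancing and one misbehaving backend (the names `gate-1` … `gate-4` and `simulate` are hypothetical); real balancers may hash on source IP or use least-connections instead, which changes the pattern but not the idea.

```python
from itertools import cycle


def simulate(backends, broken, attempts):
    """Return (backend, tls_ok) for each connection made to the shared virtual IP."""
    picker = cycle(backends)  # assumption: plain round-robin balancing
    results = []
    for _ in range(attempts):
        backend = next(picker)
        # A backend in `broken` fails the TLS handshake; the others succeed.
        results.append((backend, backend not in broken))
    return results


pool = ["gate-1", "gate-2", "gate-3", "gate-4"]
for backend, ok in simulate(pool, broken={"gate-3"}, attempts=8):
    print(f"VIP -> {backend}: TLS handshake {'ok' if ok else 'FAILED'}")
```

The sender only ever sees one IP, so one bad backend out of four shows up as roughly one failed handshake in four, with no visible reason.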

There is a community rule that catches TLS issues that might be repurposed for this.

It’s hard to check on my phone, but if you look at the shared version of shaping.toml you will see it.

If you make a copy of that and put it in your own shaping file, you could tune it to match a regex like “mismatch of TLS”.
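Before putting a tuned regex into a copied rule, it can help to test candidate patterns against real response texts. This is a sketch, assuming these simple literal patterns behave the same in Python’s `re` as in the regex engine that evaluates the shaping rules. “mismatch of TLS” is only the placeholder wording from the suggestion, and `hits` is a hypothetical helper.

```python
import re

# Candidate patterns for a copied shaping rule. Assumption: these literal
# patterns behave the same in Python's re as in the shaping rule engine.
CANDIDATES = {
    "tls_required": r"encryption rule based rejected TLS required",
    "mismatch_placeholder": r"mismatch of TLS",  # placeholder wording; adjust to real logs
}

# Response texts copied from this thread.
SAMPLES = [
    "encryption rule based rejected TLS required. mx-gate10-hz1",
    "encryption rule based rejected TLS required. mx-gate151-hz1",
    "encryption rule based rejected TLS required. mx-gate42-hz1",
]


def hits(pattern, samples):
    """Return the samples that the pattern would match."""
    rx = re.compile(pattern)
    return [text for text in samples if rx.search(text)]


for name, pattern in CANDIDATES.items():
    print(f"{name}: {len(hits(pattern, SAMPLES))}/{len(SAMPLES)} samples matched")
```

A pattern that matches none of your logged responses (like the placeholder here) would never trigger the rule.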

The default in the version you are running is to remember failed handshakes and not attempt TLS in the future, via the remember_broken_tls = "3 days" default in the shaping config.

It sounds like there is something transiently wonky with the handshake at that site, which results in not trying to use TLS in subsequent attempts.
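The remember-then-skip behaviour can be pictured as a per-site cache with an expiry. This is a toy model only, not KumoMTA’s implementation: `BrokenTlsMemory` is a hypothetical class, the site string is copied from the logs in this thread, and a TTL of 0 stands in for the `"0 seconds"` setting suggested here.

```python
import time


class BrokenTlsMemory:
    """Toy model: after a failed handshake, skip TLS for a site until the TTL expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._broken_until = {}  # site -> time at which the memory expires

    def record_failure(self, site):
        # A TTL of 0 means "do not remember", like remember_broken_tls = "0 seconds".
        if self.ttl > 0:
            self._broken_until[site] = self.clock() + self.ttl

    def should_try_tls(self, site):
        expiry = self._broken_until.get(site)
        if expiry is None:
            return True
        if self.clock() >= expiry:
            del self._broken_until[site]  # memory expired: try TLS again
            return True
        return False


# Usage with a fake clock so the example is deterministic.
now = [0.0]
def fake_clock():
    return now[0]

THREE_DAYS = 3 * 24 * 3600
site = "(mx01|mx02|mx03|mx04).hornetsecurity.com"

remembering = BrokenTlsMemory(THREE_DAYS, clock=fake_clock)
remembering.record_failure(site)
print(remembering.should_try_tls(site))  # False: next attempt goes out without TLS

forgetting = BrokenTlsMemory(0, clock=fake_clock)
forgetting.record_failure(site)
print(forgetting.should_try_tls(site))  # True: every attempt retries the handshake

now[0] += THREE_DAYS
print(remembering.should_try_tls(site))  # True again once the memory expires
```

For a receiver that requires TLS, the “remembering” case is the harmful one: one unlucky handshake turns every later attempt into a plaintext attempt that gets rejected.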

It would be good to see the TransientFailure log records from it so that we can see what happened.

Sure, I will provide it.

Actually, the server responded with a 5xx error because they require TLS:

20241219-092935.700013944:{"type":"Bounce","id":"dd639e93bdeb11ef9305bc2411f042d4","sender":"xxx","recipient":"yyy@ebnerstolz.de","queue":"default-tenant@ebnerstolz.de","site":"185.98.184.141->(mx01|mx02|mx03|mx04).hornetsecurity.com@smtp_client","size":97152,"response":{"code":554,"enhanced_code":{"class":5,"subject":7,"detail":10},"content":"encryption rule based rejected TLS required. mx-gate17-hz1","command":".\r\n"},"peer_address":{"name":"mx01.hornetsecurity.com.","addr":"94.100.132.8"},"timestamp":1734600626,"created":1734600620,"num_attempts":0,"bounce_classification":"AuthenticationFailed","egress_pool":"pool-1","egress_source":"185.98.184.141","source_address":{"address":"185.98.184.141:18851"},"feedback_report":null,"meta":{},"headers":{"Message-Id":"<xxx>","X-Envid":"abc"},"delivery_protocol":"ESMTP","reception_protocol":"ESMTP","nodeid":"476588fd-acb2-436c-b60e-4c84e2172869","session_id":"e2da06ba-0251-4bbf-8369-67bed4608b66"}

So remember_broken_tls will not solve my issue; it may even make it worse, because falling back to plaintext is exactly what this receiver rejects.

Here is also my TSA shaping file:
shaping.toml.zip (1.21 MB)

You could try setting remember_broken_tls = "0 seconds" for that domain and see what shows up in the TransientFailure records.

If we try without TLS, they bounce the message with a 5xx code, as in the example above.

If this is transient, then perhaps rewriting the 5xx response to a 4xx will cause the message to be retried until they accept it?

  "encryption rule based rejected TLS required": 454
}
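The effect of that mapping can be modelled roughly as follows. This is a sketch: `REWRITES` and `rewrite_status` are hypothetical names, matching is a plain substring test (the real feature may match differently), and the appended note copies the wording seen in the TransientFailure record posted later in this thread.

```python
# Hypothetical mapping: response text -> replacement SMTP code, mirroring the fragment above.
REWRITES = {
    "encryption rule based rejected TLS required": 454,
}


def rewrite_status(code, content, rewrites=REWRITES):
    """Return (code, content) after applying the first matching rewrite.

    Matching is a plain substring test here; the real feature may differ.
    """
    for needle, new_code in rewrites.items():
        if needle in content and code != new_code:
            note = f" (kumomta: status was rewritten from {code} -> {new_code})"
            return new_code, content + note
    return code, content


code, text = rewrite_status(554, "encryption rule based rejected TLS required. mx-gate107-hz1")
print(code, text)
# A 4xx code makes the failure transient, so the message stays queued for retry.
print("retry" if 400 <= code < 500 else "bounce")
```

Only the basic code changes in this model; the enhanced status code is left alone.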

Right: setting the remember time to 0 seconds should let us see the transient failures produced when establishing TLS fails, because we need to understand the root cause of the TLS handshake failing.

Alright, I’ll adjust my shaping and let you know as soon as I have the needed information.

Thank you.

{"type":"TransientFailure","id":"221cb6b9beb311ef8e0a9abde88bd18c","sender":"abc","recipient":"xxx@bollmann-kollegen.de","queue":"default-tenant@bollmann-kollegen.de","site":"185.98.184.65->(mx01|mx02|mx03|mx04).hornetsecurity.com@smtp_client","size":20822,"response":{"code":454,"enhanced_code":{"class":5,"subject":7,"detail":10},"content":"encryption rule based rejected TLS required. mx-gate107-hz1 (kumomta: status was rewritten from 554 -> 454)","command":".\r\n"},"peer_address":{"name":"mx01.hornetsecurity.com.","addr":"94.100.132.8"},"timestamp":1734686207,"created":1734686205,"num_attempts":0,"bounce_classification":"Uncategorized","egress_pool":"pool-1","egress_source":"185.98.184.65","source_address":{"address":"185.98.184.65:47749"},"feedback_report":null,"meta":{},"headers":{"Message-Id":"<abc>","X-Envid":"7343-7362-16318"},"delivery_protocol":"ESMTP","reception_protocol":"ESMTP","nodeid":"476588fd-acb2-436c-b60e-4c84e2172869","session_id":"6950d022-463d-4850-99d4-3e341dd30194"}

After a few tries, the email got delivered.
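To see whether the same backends keep rejecting across retries, the JSON log records can be tallied per backend. This is a minimal sketch, assuming one record per line in the format posted above, optionally preceded by a prefix like `20241219-092935.700013944:`. `summarize` is a hypothetical helper, and the sample records are trimmed-down copies of the two posted in this thread.

```python
import json
import re
from collections import Counter

GATEWAY_RE = re.compile(r"\bmx-gate\d+-[a-z0-9]+\b")


def summarize(lines):
    """Count log records per (type, SMTP code, backend gateway)."""
    tally = Counter()
    for line in lines:
        start = line.find("{")  # skip an optional timestamp prefix before the JSON
        if start < 0:
            continue
        record = json.loads(line[start:])
        response = record.get("response", {})
        match = GATEWAY_RE.search(response.get("content", ""))
        backend = match.group(0) if match else "unknown"
        tally[(record.get("type"), response.get("code"), backend)] += 1
    return tally


# Trimmed-down versions of the two records posted in this thread.
SAMPLE = [
    '20241219-092935.700013944:{"type":"Bounce","response":{"code":554,'
    '"content":"encryption rule based rejected TLS required. mx-gate17-hz1"}}',
    '{"type":"TransientFailure","response":{"code":454,'
    '"content":"encryption rule based rejected TLS required. mx-gate107-hz1 '
    '(kumomta: status was rewritten from 554 -> 454)"}}',
]

for key, count in summarize(SAMPLE).items():
    print(count, *key)
```

Run over a full day of logs, this would show whether the TLS failures cluster on a few `mx-gate` backends.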