MX getting resolved to wrong IP?

fhf · November 4, 2024, 3:10pm

One of our customers sends support form submissions to their support email address, and we had no issues delivering them until this morning. I just noticed that ~400 messages have transiently failed (some with 5 attempts now), where the transient failure's peer_address is

"peer_address": {
    "name": "",
    "addr": ""
}

and response.content is:

KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: \"bitpin.ir.\", addr: 185.143.233.120 } port 25 and read initial banner: deadline has elapsed, connect to ResolvedAddress { name: \"bitpin.ir.\", addr: 185.143.234.120 } port 25 and read initial banner: deadline has elapsed

The email is being sent to support@bitpin.ir. dig says the MX record is mail.bitpin.io, which resolves to 135.181.110.161. The IP address 185.143.233.120 shown in response.content above is actually the IP of the A record for bitpin.ir. For some reason Kumo is using that instead of the MX record?

I’m on kumod 2024.09.02-c5476b89.
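
With ~400 failed messages it can help to summarize which hosts were actually attempted. The following is a minimal sketch, not part of KumoMTA: the function name `attempted_hosts` and the regular expression are assumptions based only on the response.content string quoted above, and the format may differ in other versions.

```python
import re
from collections import Counter

# Matches entries such as:
#   ResolvedAddress { name: "bitpin.ir.", addr: 185.143.233.120 }
# The optional backslashes cover quotes that are still JSON-escaped.
RESOLVED = re.compile(
    r'ResolvedAddress \{ name: \\?"(?P<name>[^"\\]*)\\?", '
    r'addr: (?P<addr>[0-9A-Fa-f:.]+) \}'
)


def attempted_hosts(content: str) -> list[tuple[str, str]]:
    """Return (name, addr) pairs for every connection attempt in the text."""
    return [(m["name"], m["addr"]) for m in RESOLVED.finditer(content)]


# The response.content value from the failure record above.
sample = (
    "KumoMTA internal: failed to connect to any candidate hosts: "
    'connect to ResolvedAddress { name: "bitpin.ir.", addr: 185.143.233.120 } '
    "port 25 and read initial banner: deadline has elapsed, "
    'connect to ResolvedAddress { name: "bitpin.ir.", addr: 185.143.234.120 } '
    "port 25 and read initial banner: deadline has elapsed"
)

pairs = attempted_hosts(sample)
# Count attempts per resolved name to see which host was being contacted.
print(Counter(name for name, _ in pairs))
```

Here every attempt carries the name bitpin.ir. rather than the MX host, which is what points at the A record being used.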

fhf · November 4, 2024, 3:12pm

Full transient failure log: transfail.json · GitHub

The log for the last delivery to the same address:

https://gist.github.com/farhadhf/1da65f626e3292b933f2967bea0ec4b6

delivery.json

{
    "type": "Delivery",
    "id": "e9d728259a8311efb9ac960002cafe7c",
    "sender": "e9d728259a8311efb9ac960002cafe7c@psrp.mailer.bitpin.ir",
    "recipient": "support@bitpin.ir",
    "queue": "d9c14c6a-392c-4e47-bb06-3b30b56b2be0@bitpin.ir",
    "site": "srv3-s8-\u003email.bitpin.io@smtp_client",
    "size": 2690,
    "response": {
        "code": 250,

This file has been truncated.

fhf · November 4, 2024, 3:17pm

Seems like kumo.dns.lookup_mx is finding the MX record and returning the correct host (tested with a script that I’m running with kumod --policy /tmp/dns.lua --user kumod):

{
  ["site_name"] = mail.bitpin.io,
  ["is_mx"] = true,
  ["is_secure"] = false,
  ["domain_name"] = bitpin.ir.,
  ["by_pref"] = {
    [10] = {
      [1] = mail.bitpin.io.,
    },
  },
  ["is_domain_literal"] = false,
  ["hosts"] = { [1] = mail.bitpin.io.,},
}

wez · November 4, 2024, 3:20pm

FWIW, you can do:

$ /opt/kumomta/sbin/resolve-site-name bitpin.ir
mail.bitpin.io

to figure out the site name being used

wez · November 4, 2024, 3:20pm

is it possible that the site changed their DNS since you received the initial message?

fhf · November 4, 2024, 3:21pm

Checking with them now to see if they changed something

wez · November 4, 2024, 3:22pm

Part of the SMTP spec (RFC 5321, section 5.1) says that when an MX lookup returns no MX records, the sender should treat the domain itself as an implicit MX and use its A record instead. It sounds like that might be what happened here?
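
The fallback rule can be sketched as follows. This is an illustration of the implicit MX rule from RFC 5321 only, not KumoMTA's implementation; `MxRecord` and `delivery_candidates` are hypothetical names. If a resolver problem made the MX answer look empty, this rule would produce exactly the A-record connection attempts seen in the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MxRecord:
    preference: int
    exchange: str


def delivery_candidates(domain: str, mx_records: list[MxRecord]) -> list[str]:
    """Return hostnames to try, in order, for mail addressed to `domain`.

    `mx_records` is whatever the MX lookup returned; an empty list means
    the resolver reported no MX records for the domain.
    """
    if not mx_records:
        # Implicit MX: the domain itself is treated as the only mail host,
        # so the connection goes to the domain's own address records.
        return [domain]
    # Lower preference values are tried first. Ties keep input order here;
    # a real sender would typically randomize among equal preferences.
    return [r.exchange for r in sorted(mx_records, key=lambda r: r.preference)]


# Normal case: the MX host is used.
print(delivery_candidates("bitpin.ir.", [MxRecord(10, "mail.bitpin.io.")]))
# Empty MX answer: the domain's own address record is used instead.
print(delivery_candidates("bitpin.ir.", []))
```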

wez · November 4, 2024, 3:22pm

does the timing of that correlate with a lot of DNS traffic to your local DNS servers?

fhf · November 4, 2024, 3:23pm

No more than usual

fhf · November 4, 2024, 3:24pm

I have unbound as the local DNS server on Ubuntu 22.04

fhf · November 4, 2024, 3:24pm

also using kumo.dns.configure_unbound_resolver

fhf · November 4, 2024, 3:24pm

Just restarted Kumo and all these messages were delivered

fhf · November 4, 2024, 3:25pm

example: gist:7ee16603b31ba9964a862c468f8d7c2d · GitHub

wez · November 4, 2024, 3:28pm

can you switch to the hickory resolver? the non-unbound one. We had another customer experience some transient DNS failures with the unbound resolver recently which makes me suspicious.

I’ve had a lot of good experience with the embedded unbound resolver in the past, but it seems like there might be something funky here.

fhf · November 4, 2024, 3:28pm

This one? configure_resolver - KumoMTA Docs

wez · November 4, 2024, 3:28pm

yep

wez · November 4, 2024, 3:29pm

unless you need to specify alternate upstream name servers, you can omit either configure_*_resolver call

fhf · November 4, 2024, 3:30pm

Done - I was using unbound because the documentation says

If you have enabled DANE for output SMTP then you must enable the unbound resolver in order to be able to process DNSSEC correctly.

and I have DANE enabled IIRC.

wez · November 4, 2024, 3:30pm

ah, then you do need to use unbound

wez · November 4, 2024, 3:31pm

pick your poison for now, I suppose