MX getting resolved to wrong IP?

One of our customers sends support form submissions to their support email address, and we had no issues delivering them until this morning. I just noticed that ~400 messages have transiently failed (some are on their 5th attempt now), and the peer_address in the transient failure record is empty:

"peer_address": {
    "name": "",
    "addr": ""
}

response.content is:

KumoMTA internal: failed to connect to any candidate hosts: connect to ResolvedAddress { name: \"bitpin.ir.\", addr: 185.143.233.120 } port 25 and read initial banner: deadline has elapsed, connect to ResolvedAddress { name: \"bitpin.ir.\", addr: 185.143.234.120 } port 25 and read initial banner: deadline has elapsed

The email is being sent to support@bitpin.ir. dig says the MX record is mailer.bitpin.io, which resolves to 135.181.110.161. The IP address 185.143.233.120 shown in response.content above is actually the IP of the A record for bitpin.ir. For some reason Kumo is using that instead of the MX record?
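
For anyone triaging a similar log, here is a minimal Python sketch (not part of KumoMTA; the function names are made up for illustration) that pulls the attempted hosts out of a response.content string like the one above and checks whether every attempted name is the recipient domain itself rather than an MX host. It assumes the error text keeps the `ResolvedAddress { name: ..., addr: ... }` shape shown in this thread.

```python
import re

# Matches entries such as:
#   ResolvedAddress { name: \"bitpin.ir.\", addr: 185.143.233.120 }
# The quotes may be backslash-escaped when the text is copied from raw JSON.
_RESOLVED = re.compile(
    r'ResolvedAddress \{ name: \\?"([^"\\]*)\\?", addr: ([0-9A-Fa-f:.]+) \}'
)


def attempted_addresses(content):
    """Return (name, addr) pairs for every host the error text says was tried."""
    return _RESOLVED.findall(content)


def used_domain_instead_of_mx(content, recipient_domain):
    """True if every attempted host name is the recipient domain itself."""
    domain = recipient_domain.rstrip(".").lower()
    names = [name.rstrip(".").lower() for name, _ in attempted_addresses(content)]
    return bool(names) and all(name == domain for name in names)
```

If the second function returns True for a domain that does publish an MX record, the connection attempts went to the domain's own address records, which matches the symptom described here.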

I’m on kumod 2024.09.02-c5476b89.

Full transient failure log: transfail.json · GitHub

The log for the last delivery to the same address:

Seems like kumo.dns.lookup_mx is finding the MX record and returning the correct host (testing with a test script that I’m running with kumod --policy /tmp/dns.lua --user kumod):

{
  ["site_name"] = mail.bitpin.io,
  ["is_mx"] = true,
  ["is_secure"] = false,
  ["domain_name"] = bitpin.ir.,
  ["by_pref"] = {
    [10] = {
      [1] = mail.bitpin.io.,
    },
  },
  ["is_domain_literal"] = false,
  ["hosts"] = { [1] = mail.bitpin.io.,},
}

FWIW, you can do:

$ /opt/kumomta/sbin/resolve-site-name bitpin.ir
mail.bitpin.io

to figure out the site name being used

Is it possible that the site changed their DNS since you received the initial message?

Checking with them now to see if they changed something

Part of the SMTP spec (RFC 5321, section 5.1) says that when an MX lookup comes back with no MX records, the domain's A record should be used instead, as an implicit MX. It sounds like that might be what happened here?
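
To make the fallback rule concrete, here is a small Python sketch of how RFC 5321 section 5.1 picks delivery hosts. This illustrates the rule in the spec, not KumoMTA's actual implementation; the function name and the input convention (`[]` for "no MX records", `None` for "lookup failed") are assumptions made for the example.

```python
def delivery_candidates(domain, mx_records):
    """Pick the hosts to try for a recipient domain.

    domain:     recipient domain, e.g. "bitpin.ir"
    mx_records: list of (preference, host) tuples from a successful lookup,
                [] if the lookup succeeded but returned no MX records,
                None if the lookup itself failed (timeout, SERVFAIL, ...).
    """
    if mx_records is None:
        # A failed lookup is an error to retry later, not a reason to
        # fall back to the address record.
        raise LookupError("MX lookup failed; retry later")
    if not mx_records:
        # Implicit MX: treat the domain itself as the only mail host.
        return [domain]
    # Lowest preference value is tried first. The RFC also asks for
    # randomization among equal preferences, which is omitted here.
    return [host for _, host in sorted(mx_records)]
```

With an empty answer, `delivery_candidates("bitpin.ir", [])` returns `["bitpin.ir"]`, so a resolver that wrongly reports "no MX records" would send connections to the domain's own address records.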

does the timing of that correlate with a lot of DNS traffic to your local DNS servers?

No more than usual

I have unbound as the local DNS server on Ubuntu 22.04

also using kumo.dns.configure_unbound_resolver

Just restarted Kumo and all these messages were delivered

example: gist:7ee16603b31ba9964a862c468f8d7c2d · GitHub

can you switch to the hickory resolver? the non-unbound one. We had another customer experience some transient DNS failures with the unbound resolver recently which makes me suspicious.

I’ve had a lot of good experience with the embedded unbound resolver in the past, but it seems like there might be something funky here.

This one? configure_resolver - KumoMTA Docs

yep

unless you need to specify alternate upstream name servers, you can omit either configure_*_resolver call

Done - I was using unbound because the documentation says

If you have enabled DANE for output SMTP then you must enable the unbound resolver in order to be able to process DNSSEC correctly.

and I have DANE enabled IIRC.

ah, then you do need to use unbound

pick your poison for now, I suppose