Issue with some(?) UTF-8 characters in content.from.name

One of our customers reported that they are receiving errors when injecting using the HTTP API.

invalid header: 0: at line 1:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
                                                                    ^_______________________________
expected ':', found <

1: at line 1, in group:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

2: at line 1, in Alt:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

3: at line 1, in address:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

4: at line 1, in Many1:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

5: at line 1, in obs_address_list:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

6: at line 1, in Alt:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

7: at line 1, in address_list:
=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?= <no-␍
^___________________________________________________________________________________________________

stack traceback:
    [C]: in method 'from_header'
    /opt/kumomta/etc/policy/dkim_sign.lua:244: in upvalue 'do_dkim_sign'
    /opt/kumomta/etc/policy/dkim_sign.lua:423: in upvalue 'dkim_signer'
    [string "/opt/kumomta/etc/policy/init.lua"]:454: in function <[string "/opt/kumomta/etc/policy/init.lua"]:370>

I tried reproducing this, and the error seems to occur when the content.from.name parameter is a non-ASCII string. It is triggered by some strings, but not all.
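For anyone reading the error output: the display name in the failing header is an RFC 2047 encoded-word. A minimal Python sketch (not part of KumoMTA; the helper name decode_display_name is made up for illustration) decodes it back to the name that was injected, using only the standard library:

```python
from email.header import decode_header


def decode_display_name(encoded: str) -> str:
    """Decode a header fragment that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain ASCII text with no encoded-word comes back as str.
            parts.append(chunk)
    return "".join(parts)


# Encoded display name copied from the error message above.
encoded_name = "=?UTF-8?q?=D8=B1=D9=87=D9=86=D9=85=D8=A7_=DA=A9=D8=A7=D9=84=D8=AC?="
print(decode_display_name(encoded_name))
```

The encoded-word itself decodes cleanly to the injected name, so the parse failure reported at `<` is about what follows the name, not the name's encoding.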

The error messages are too long to paste here, so I created a gist with sample payloads: message.md · GitHub

./kumod --version
kumod 2024.11.08-d383b033

Interesting. Is it possible to get the source as it is injected to the API? I’m curious to see exactly how the input is formatted.

I put 3 examples in the gist: 2 that fail (with the resulting error messages) and 1 that works. Do you mean something else?

Actually, it seems to be related not only to from.name, but also to from.email in some cases, or to a combination of the two?

This fails with the error above:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "no-reply@email.ahasend.com",
      "name": "رهنما کالج"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

but this one goes through without any issues (the only change is that the one above is sent from no-reply@email.ahasend.com, while this one is sent from noreply@email.ahasend.com without the dash)

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "noreply@email.ahasend.com",
      "name": "رهنما کالج"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

Then, we have this, with the dash in from.email, but no non-ASCII characters in from.name, which goes through without any errors:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "no-reply@email.ahasend.com",
      "name": "AhaSend"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

Another one, with the dash in from.email and UTF-8 characters in from.name, goes through without any errors:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "no-reply@email.ahasend.com",
      "name": "فرهاد هدایتی فرد"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

First example, from.name is UTF-8 (رهنما کالج), from.email contains a -, fails.
Second example, from.name is UTF-8 (رهنما کالج), from.email does not contain a -, works fine.
Third example, from.name is ASCII, from.email contains a -, works fine.
Fourth example, from.name is a different UTF-8 string (فرهاد هدایتی فرد), from.email contains a -, works fine.
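To test every combination rather than a handful, the payloads can be generated. This is a rough Python sketch: build_payload is a hypothetical helper, the field layout is copied from the payloads in this thread, and nothing is sent to the server. Each printed line can be posted to the injection endpoint with whatever HTTP client you already use.

```python
import json
from itertools import product


def build_payload(from_email: str, from_name: str) -> dict:
    """Build an injection payload shaped like the samples in this thread."""
    return {
        "envelope_sender": "no-reply@email.ahasend.com",
        "content": {
            "text_body": "test",
            "html_body": "test",
            "subject": "sample subject",
            "from": {"email": from_email, "name": from_name},
        },
        "recipients": [
            {"email": "hf.farhad@gmail.com", "name": "hf.farhad@gmail.com"}
        ],
    }


# From addresses with and without the dash, and the display names tried so far.
EMAILS = ["no-reply@email.ahasend.com", "noreply@email.ahasend.com"]
NAMES = ["رهنما کالج", "فرهاد هدایتی فرد", "آهاسند", "AhaSend"]

payloads = [build_payload(email, name) for email, name in product(EMAILS, NAMES)]

for payload in payloads:
    # json.dumps escapes non-ASCII as \uXXXX by default; the JSON is equivalent.
    print(json.dumps(payload))
```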

Another example that fails with UTF-8 from.name and - in from.email:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "no-reply@email.ahasend.com",
      "name": "آهاسند"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

injection response:

{"success_count":0,"fail_count":1,"failed_recipients":["hf.farhad@gmail.com"],"errors":["hf.farhad@gmail.com: invalid header: 0: at line 1:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n                                                 ^______________________________\nexpected ':', found <\n\n1: at line 1, in group:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n2: at line 1, in Alt:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n3: at line 1, in address:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n4: at line 1, in Many1:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n5: at line 1, in obs_address_list:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n6: at line 1, in Alt:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n7: at line 1, in address_list:\n=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-␍\n^_______________________________________________________________________________\n\n\nstack traceback:\n\t[C]: in method 'from_header'\n\t/opt/kumomta/etc/policy/dkim_sign.lua:244: in upvalue 'do_dkim_sign'\n\t/opt/kumomta/etc/policy/dkim_sign.lua:423: in upvalue 'dkim_signer'\n\t[string \"/opt/kumomta/etc/policy/init.lua\"]:454: in function <[string \"/opt/kumomta/etc/policy/init.lua\"]:370>\n"]}

but if I change the from.email to noreply@email.ahasend.com (removing the dash) in the same payload it goes through without errors.

Sorry, I think I missed the gist samples earlier. So there may be a combination of issues here. I have not had time to test these yet but will take a look.

Making some progress, but it is weird.
In some cases, certain characters in the From NAME appear to cause a space to be inserted in the From ADDRESS, between the no- and reply:
“From: “آها سند” <no- reply@email.ahasend.com>”
^^ That space was not there when it was injected.
And if I use a different NAME, the space does not appear.
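One possible explanation, which is an assumption and not confirmed anywhere in this thread: the ␍ right after `<no-` in the error output suggests the header was folded (line-wrapped) at the dash. Under RFC 5322, unfolding removes the CRLF but keeps the whitespace that follows it, which would leave exactly this kind of stray space. A small Python sketch of that effect, where the folded string is a hypothetical reconstruction and not captured output:

```python
import re


def unfold(header_value: str) -> str:
    """RFC 5322 unfolding: drop each CRLF that is directly followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", header_value)


def angle_addr(header_value: str) -> str:
    """Return the text between the first '<' and the next '>' (empty string if absent)."""
    match = re.search(r"<([^>]*)>", header_value)
    return match.group(1) if match else ""


# Hypothetical folded From value, wrapped right after "no-".
folded = (
    "=?UTF-8?q?=D8=A2=D9=87=D8=A7=D8=B3=D9=86=D8=AF?= <no-\r\n"
    " reply@email.ahasend.com>"
)

unfolded = unfold(folded)
print(repr(angle_addr(unfolded)))
# A space inside the address makes it invalid for an address parser.
print("whitespace in address:", " " in angle_addr(unfolded))
```

If this is what is happening, it would also fit the observation that the address without the dash is unaffected, but that still needs to be confirmed against the actual generated message.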

There was some work done in that release that may be related, but it seems unlikely:
A MIME message rebuild could improperly re-encode unicode Subject lines into a series of quoted-printable encoded-words, causing spaces between those words to be effectively discarded when the subject is decoded. The header re-encoding will now prefer to re-assemble unstructured fields as a single encoded-word to avoid this.

It might be related. That one also happened only with non-ASCII characters, where spaces between words were removed.

But then again, maybe not? That one only happened with subjects and not the From header, if I remember correctly.

Adding a content-type header seems to help with formatting:

            "Content-Type": "text/plain; charset=ISO-8859-6",
            "Content-transfer-encoding": "8bit"
        }

I also found that removing the right-most character in the NAME field fixed the added space in the ADDRESS fields. I have no idea why.