Issue with UTF-8 in from.name in HTTP API injection

It seems like some UTF-8 strings are not handled correctly in from.name when injected using the HTTP API. For example with this payload:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "sample subject",
    "from": {
      "email": "noreply@email.ahasend.com",
      "name": "وجیهه تست برای نام"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

The subject from name is shown as

وجیهه تست بر =?UTF-8?q?=D8=A7=DB=8C =D9=86=D8=A7=D9=85?=

in Gmail.

It seems like attachments are turned off here? I wanted to attach a screenshot.

Sorry, by subject I meant from.name :man_facepalming:

I fixed permissions, try screenshot again?

Thanks!

I guess “don’t do that” is not an option?
What version are you running? Some UTF handling changes were made recently.

That’s what I’ve asked them to do for now, just wanted to let you guys know about the issue. It’s on kumod 2025.01.29-833f82a8

OK great - thank you

We are looking into it, but do not have any kind of ETA. In the meantime, please use SMTP injection.

We encode the From header like this for that payload:

From: =?UTF-8?q?=D9=88=D8=AC=DB=8C=D9=87=D9=87_=D8=AA=D8=B3=D8=AA_=D8=A8=D8=B1?=
        =?UTF-8?q?=D8=A7=DB=8C_=D9=86=D8=A7=D9=85?= <noreply@email.ahasend.com>

and that is conforming to the specs.
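
For anyone who wants to check this independently, here is a minimal sketch. It assumes Python's standard email.header module is a fair stand-in for a spec-following decoder, and the helper name decode_display_name is made up for illustration. It decodes only the display-name part of the header above, including the fold between the two encoded-words.

```python
from email.header import decode_header


def decode_display_name(value: str) -> str:
    """Decode RFC 2047 encoded-words (hypothetical helper, illustration only)."""
    pieces = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for encoded parts and may yield str otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        pieces.append(chunk)
    return "".join(pieces)


# Display-name portion of the From header shown above, fold included.
folded = (
    "=?UTF-8?q?=D9=88=D8=AC=DB=8C=D9=87=D9=87_=D8=AA=D8=B3=D8=AA_=D8=A8=D8=B1?=\n"
    "        =?UTF-8?q?=D8=A7=DB=8C_=D9=86=D8=A7=D9=85?="
)

print(decode_display_name(folded))
```

If the two encoded-words are well formed, the whitespace between them is dropped and the result should be the original from.name from the payload.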

In addition, I tried composing in gmail to a crazy version of my own address, and the To header that it produces has the same kind of structure wrt. word wrapping:

To: =?UTF-8?Q?a_very_long_long_long_long_long_long_long_long_long_long_?=
    =?UTF-8?Q?a_very_long_long_long_long_long_long_long_long_long_long_a_v?=
    =?UTF-8?Q?ery_long_long_long_long_long_long_long_long_long_long_a_very?=
    =?UTF-8?Q?_long_long_long_long_long_long_long_long_long_long_a_very_lo?=
    =?UTF-8?Q?ng_long_long_long_long_long_long_long_long_long_a_very_long_?=
    =?UTF-8?Q?ng_=DB=8C=D9=87=D9=87?= <wez@wezfurlong.org>

I don’t want to say that gmail has a bug here, just that, if we have a bug in our encoding of that field, it’s not clear what it is.

I will note that when I composed a mail from gmail using exactly your input, gmail chose to base64 it:

To: =?UTF-8?B?2YjYrNuM2YfZhyDYqtiz2Kog2KjYsdin24wg2YbYp9mF?= <wez@wezfurlong.org>

You can pre-rfc2047-header-encode the from.name if you want, and we’ll pass that through.
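
As a rough sketch of what pre-encoding could look like on the client side, assuming a Python client and a name short enough to fit in a single encoded-word: the helper rfc2047_b_encode is hypothetical, not part of any KumoMTA API, and whether this sidesteps the problem in this thread is untested.

```python
import base64
import json


def rfc2047_b_encode(name: str) -> str:
    """Return name as one UTF-8 base64 encoded-word (RFC 2047).

    Assumption: the result fits in a single encoded-word (75 characters
    at most); longer names would need splitting on character boundaries.
    """
    encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")
    word = "=?UTF-8?B?" + encoded + "?="
    if len(word) > 75:
        raise ValueError("name too long for a single encoded-word")
    return word


name = "وجیهه تست برای نام"

# Only the "from" object of the injection payload is shown here.
payload_from = {
    "email": "noreply@email.ahasend.com",
    "name": rfc2047_b_encode(name),
}
print(json.dumps(payload_from, indent=2))
```

Base64 is used here only because it is what gmail chose for the same input; a Q-encoded word would be equally valid.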

I’m hesitant to want to try to change anything here right now because it’s really not clear what the bug is.

I just had some time to do some more testing, and found another problematic from.name value: انتشارات جمال.

Also, this issue is not limited to gmail, it’s happening on outlook as well.

Sending the same message through SMTP results in this From header (taken from test-SMTP.eml attached below):

From: =?UTF-8?q?=D8=A7=D9=86=D8=AA=D8=B4=D8=A7=D8=B1=D8=A7=D8=AA_=D8=AC=D9=85?=
    =?UTF-8?q?=D8=A7=D9=84?= <no-reply@email.ahasend.com>

With the API, the generated From header is (taken from test-API.eml):

From: =?UTF-8?q?=D8=A7=D9=86=D8=AA=D8=B4=D8=A7=D8=B1=D8=A7=D8=AA_=D8=AC=D9=85_?=
    =?UTF-8?q?=3D=3FUTF-8=3Fq=3F=3DD8=3DA7=3DD9=3D84=3F=3D?= <noreply@email.ahasend.com>

The API version has a small difference at the end of the first line (85?= in SMTP vs 85_?= in API)

The SMTP message was constructed and sent using PHPMailer.
test-API.eml (5.4 KB)
test-SMTP.eml (5.85 KB)

Same issue on outlook.

=?UTF-8?q?=3D=3FUTF-8=3Fq=3F=3DD this part from test-API looks like it is embedding some qp encoded value of its own in there; looks like something has been double encoded. What was the input payload there? Is there some additional logic in your policy that might be modifying the content after it has been generated?
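
To make the double encoding visible, here is a small sketch that decodes the second encoded-word from test-API twice. It is plain Python with a hand-rolled Q decoder; the helper name decode_q_word is made up for illustration and it only handles a single Q-encoded word.

```python
import re

# Matches one Q-encoded RFC 2047 encoded-word: =?charset?q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[qQ]\?([^?]*)\?=")


def decode_q_word(word: str) -> str:
    """Decode a single Q-encoded word (hypothetical helper, illustration only)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not a Q-encoded word: {word!r}")
    charset, payload = match.groups()
    payload = payload.replace("_", " ")  # "_" stands for a space in Q encoding
    # Turn each =XX escape into the byte it names, then decode those bytes.
    raw = re.sub(r"=([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), payload)
    return raw.encode("latin-1").decode(charset)


suspect = "=?UTF-8?q?=3D=3FUTF-8=3Fq=3F=3DD8=3DA7=3DD9=3D84=3F=3D?="
once = decode_q_word(suspect)   # still looks like an encoded-word
twice = decode_q_word(once)     # the actual text

print(once)
print(twice)
```

The first pass yields another encoded-word rather than text, and the second pass yields the last two letters of the name, which is what you would expect if an already-encoded word had been encoded again.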

Just double-checked: other than adding or removing some headers and meta using set_meta, append_header, and remove_all_named_headers, we’re making no changes to the message in http_message_generated.

This is the payload:

{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "test with API",
    "from": {
      "email": "noreply@email.ahasend.com",
      "name": "انتشارات جمال"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}

My http_message_generated code: http_message_generated.lua · GitHub

The check_fix_conformance call is what is changing that header value:

local kumo = require 'kumo'

local request = kumo.serde.json_parse [[
{
  "envelope_sender": "no-reply@email.ahasend.com",
  "content": {
    "text_body": "test",
    "html_body": "test",
    "subject": "test with API",
    "from": {
      "email": "noreply@email.ahasend.com",
      "name": "انتشارات جمال"
    }
  },
  "recipients": [
    {
      "email": "hf.farhad@gmail.com",
      "name": "hf.farhad@gmail.com"
    }
  ]
}
]]

for _, msg in ipairs(kumo.api.inject.build_v1(request)) do
  print(msg:get_data())

  msg:check_fix_conformance(
    -- check for and reject messages with these issues:
    'NON_CANONICAL_LINE_ENDINGS',
    -- fix messages with these issues:
    'NEEDS_TRANSFER_ENCODING|MISSING_DATE_HEADER|MISSING_MESSAGE_ID_HEADER|LINE_TOO_LONG'
  )

  print('after fix')
  print(msg:get_data())
end

$ kumod --script --policy ./builder.lua
Content-Type: multipart/alternative;
        boundary="0opD4Ub9RUih8hYuaRHXTQ"
To: "hf.farhad@gmail.com" <hf.farhad@gmail.com>
From: =?UTF-8?q?=D8=A7=D9=86=D8=AA=D8=B4=D8=A7=D8=B1=D8=A7=D8=AA_=D8=AC=D9=85?=
        =?UTF-8?q?=D8=A7=D9=84?= <noreply@email.ahasend.com>
Subject: test with API
Mime-Version: 1.0
Date: Sat, 15 Mar 2025 16:22:53 +0000

--0opD4Ub9RUih8hYuaRHXTQ
Content-Type: text/plain;
        charset="us-ascii"

test
--0opD4Ub9RUih8hYuaRHXTQ
Content-Type: text/html;
        charset="us-ascii"

test
--0opD4Ub9RUih8hYuaRHXTQ--

after fix
Content-Type: multipart/alternative;
        boundary="0opD4Ub9RUih8hYuaRHXTQ"
To: "hf.farhad@gmail.com" <hf.farhad@gmail.com>
From: =?UTF-8?q?=D8=A7=D9=86=D8=AA=D8=B4=D8=A7=D8=B1=D8=A7=D8=AA_=D8=AC=D9=85_?=
        =?UTF-8?q?=3D=3FUTF-8=3Fq=3F=3DD8=3DA7=3DD9=3D84=3F=3D?= <noreply@email.ahasend.com>
Subject: test with API
Mime-Version: 1.0
Date: Sat, 15 Mar 2025 16:22:53 +0000
Message-ID: <bf2ab3ed01b911f09266cc28aa0a5c5a@email.ahasend.com>

--0opD4Ub9RUih8hYuaRHXTQ
Content-Type: text/plain;
        charset="us-ascii"

test
--0opD4Ub9RUih8hYuaRHXTQ
Content-Type: text/html;
        charset="us-ascii"

test
--0opD4Ub9RUih8hYuaRHXTQ--

So that gives me something to dig into!