It looks like calling msg:check_fix_conformance breaks attachments when the body contains UTF-8:

    kumo.on('smtp_server_message_received', function(msg, conn_meta)
      if msg:sender().email == 'test@ahasend.com' then
        local file = io.open("/tmp/msg-before-fix.eml", "w")
        file:write(msg:get_data())
        file:close()
      end
      local failed = msg:check_fix_conformance(
        -- check for and reject messages with these issues:
        'NON_CANONICAL_LINE_ENDINGS',
        -- fix messages with these issues:
        'NEEDS_TRANSFER_ENCODING|MISSING_DATE_HEADER|MISSING_MESSAGE_ID_HEADER'
      )
      if failed then
        kumo.reject(552, string.format('5.6.0 %s', failed))
      end
      if msg:sender().email == 'test@ahasend.com' then
        local file = io.open("/tmp/msg-after-fix.eml", "w")
        file:write(msg:get_data())
        file:close()
      end
      -- the rest of init...

Please see the msg-before-fix.eml and msg-after-fix.eml files attached below. The attachment content (an ics file in this case) changes after calling check_fix_conformance, which results in a broken event being shown in Gmail.
Sending the same email with the body set to test instead of تست does not trigger the problem, and the attachment is left unchanged.
msg-after-fix.eml (19.7 KB) msg-before-fix.eml (18 KB)
Two parts of the message matter here:
- the html content, supplied explicitly as binary UTF-8 text. This requires transfer encoding in order to be relayed successfully via SMTP.
- the ics attachment, supplied implicitly as US-ASCII (because there is no charset= field, and because the content type is text/*), but whose content inside the base64 transfer encoding is actually binary UTF-8.
The first part triggers the NEEDS_TRANSFER_ENCODING check in check_fix_conformance, and it is successfully rewritten with transfer encoding.
The second part is also rewritten by the rebuild that was triggered by the first part. When we extract its content, since it is text, we try to decode the part:
- US-ASCII is treated as equivalent to windows-1252 in the Rust charset crate. This is a pragmatic choice on the part of the crate authors; US-ASCII is a subset of windows-1252, and windows-1252 is close enough to iso-8859-1 for most purposes.
- The actually-UTF-8 bytes in that part are not strictly 7-bit ASCII, but do happen to be technically valid bytes for windows-1252.
- The byte stream is therefore successfully decoded from windows-1252 and into UTF-8, but the result is bogus because the data isn’t actually windows-1252.
- The resulting data is then put into the rebuilt message and labelled as UTF-8.
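The decode steps above can be reproduced in a few lines of Lua. This is only an illustrative sketch, not KumoMTA's code: `cp1252_to_utf8` is a hypothetical helper, it assumes Lua 5.3 or newer for `utf8.char`, and the sample bytes are the UTF-8 encoding of تست from the report rather than the actual ics content.

```lua
-- Illustrative only: mimics "decode as windows-1252, re-encode as UTF-8".
-- windows-1252 matches ISO-8859-1 except for bytes 0x80-0x9F.
local CP1252_HIGH = {
  [0x80] = 0x20AC, [0x82] = 0x201A, [0x83] = 0x0192, [0x84] = 0x201E,
  [0x85] = 0x2026, [0x86] = 0x2020, [0x87] = 0x2021, [0x88] = 0x02C6,
  [0x89] = 0x2030, [0x8A] = 0x0160, [0x8B] = 0x2039, [0x8C] = 0x0152,
  [0x8E] = 0x017D, [0x91] = 0x2018, [0x92] = 0x2019, [0x93] = 0x201C,
  [0x94] = 0x201D, [0x95] = 0x2022, [0x96] = 0x2013, [0x97] = 0x2014,
  [0x98] = 0x02DC, [0x99] = 0x2122, [0x9A] = 0x0161, [0x9B] = 0x203A,
  [0x9C] = 0x0153, [0x9E] = 0x017E, [0x9F] = 0x0178,
}

-- Hypothetical helper: treat every input byte as one windows-1252 character.
local function cp1252_to_utf8(bytes)
  local out = {}
  for i = 1, #bytes do
    local b = bytes:byte(i)
    -- Unmapped bytes fall back to the code point with the same value.
    out[#out + 1] = utf8.char(CP1252_HIGH[b] or b)
  end
  return table.concat(out)
end

local original = '\xD8\xAA\xD8\xB3\xD8\xAA' -- UTF-8 bytes of the 3 letters
local rebuilt = cp1252_to_utf8(original)

-- Every byte "decoded" without error, yet the text is no longer the same.
print(#original, #rebuilt) -- 6 and 12 bytes
print(rebuilt == original) -- false
```

No step fails, which is why nothing is reported: each input byte is a legal windows-1252 character, so the six original bytes quietly become twelve bytes of different text.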
Fundamentally, the encoding on the input message is ambiguous. Do you control that input? I would recommend that both parts explicitly set the correct charset and transfer encoding.
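As a rough illustration of that recommendation, the sketch below shows the headers a sending application could emit for a text part. `text_part_headers` is a hypothetical helper written for this example; it is not a KumoMTA API, and it assumes the ics part is sent as text/calendar and that any non-ASCII body is valid UTF-8. Whatever generates the message would do the equivalent in its own language.

```lua
-- Hypothetical sender-side helper, not part of KumoMTA.
-- Returns explicit Content-Type / Content-Transfer-Encoding header lines
-- for a text/<subtype> part, based on the bytes that will be sent.
local function text_part_headers(subtype, body)
  -- Any byte >= 0x80 means the content is not plain US-ASCII.
  local has_8bit = body:find('[\128-\255]') ~= nil
  local charset = has_8bit and 'utf-8' or 'us-ascii'
  -- Assumption: non-ASCII bodies are valid UTF-8 and get base64 encoded;
  -- 7bit additionally assumes short lines with CRLF endings.
  local cte = has_8bit and 'base64' or '7bit'
  return table.concat({
    string.format('Content-Type: text/%s; charset="%s"', subtype, charset),
    string.format('Content-Transfer-Encoding: %s', cte),
  }, '\r\n')
end

print(text_part_headers('calendar', 'SUMMARY:\xD8\xAA\xD8\xB3\xD8\xAA'))
-- Content-Type: text/calendar; charset="utf-8"
-- Content-Transfer-Encoding: base64
```

With the charset stated explicitly, a rebuild no longer has to fall back to the implicit US-ASCII default for the part.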
I tried the charset detection described in the check_fix_conformance page of the KumoMTA docs on this message, but this particular conformance issue isn’t detectable in its current form.
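For what it's worth, this kind of ambiguity can be flagged heuristically on the sending side. The sketch below is an assumption-laden illustration and not how check_fix_conformance works: `looks_mislabelled` is a hypothetical function that takes the declared charset (or nil when there is none) and the decoded part bytes, and it relies on `utf8.len` from Lua 5.3 or newer.

```lua
-- Hypothetical heuristic, not a KumoMTA feature.
-- Flags a text part whose charset is missing or US-ASCII but whose bytes
-- contain 8-bit data that happens to be well-formed UTF-8.
local function looks_mislabelled(declared_charset, bytes)
  local cs = (declared_charset or 'us-ascii'):lower()
  if cs ~= 'us-ascii' then
    return false -- an explicit non-ASCII charset is taken at face value
  end
  if not bytes:find('[\128-\255]') then
    return false -- genuinely 7-bit, so the label is accurate
  end
  -- utf8.len returns nil (plus a position) for malformed UTF-8.
  return utf8.len(bytes) ~= nil
end

print(looks_mislabelled(nil, 'SUMMARY:\xD8\xAA\xD8\xB3\xD8\xAA')) -- true
print(looks_mislabelled(nil, 'SUMMARY:test'))                     -- false
print(looks_mislabelled(nil, 'caf\xE9'))                          -- false
```

Treat the result as a hint rather than proof: a short windows-1252 string can occasionally also be well-formed UTF-8.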