Having Issues With Non-UTF8

Howdy fellas!

One of my clients sent an email that contains a plaintext part that looks like:

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ChefYesChef ☰ ⊗ Home  About Us  Contact Us ChefYesChefThank you for subscribing to the ChefYesChef Newsletter. The details submitted were: First name: asdadlast name: sadsadaEmail address: asdsdDA@maileroo-tester.com Please click the link below to confirm your subscriptionClick here: https://chefyeschef.co.uk/subscribe.php?wpmailer_id=subscribe&u=19&f=1&t=0&id=uaMrMQ1EUmjDhL9nYx21vk84rTTAPggi Thank you for your interestIan McAndrewhttp://chefyeschef.co.uk/+44 (0)79 734 88670Ian McAndrewAward winning, Michelin Star Chef Ian McAndrew shares his creative recipes with you in his three wonderful books. Ian's Books are available on CKBKThis e-mail was sent to you by ChefYesChef. To ensure delivery to your inbox (not bulk or junk folders), please add our e-mail address to your address book.© ChefYesChef 2024

--b1=_RWMziSbQqiS5z4qbWKaBc1BYUhZgOrNcn1BUesF2HU

While the part looks innocent, it contains some non-ASCII characters, including U+00A0 (NBSP). Tracing the SMTP server shows how KumoMTA parses it:

[127.0.0.1:35600->127.0.0.1:9999]   21s  -> ChefYesChef ☰ ⊗ Home  About Us  Contact Us ChefYesChefThank you for subscribing to the ChefYesChef Newsletter.�The details submitted were:�First name: asdsadlast name: asdEmail address: asdasd@maileroo-tester.com�Please click the link below to confirm your subscriptionClick here: https://chefyeschef.co.uk/subscribe.php?wpmailer_id=subscribe&u=20&f=1&t=0&id=B5f2YSVqCYfdJRW5iLgTKxSiEmyElkVj�Thank you for your interestIan McAndrewhttp://chefyeschef.co.uk/+44 (0)79 734 88670Ian McAndrewAward winning, Michelin Star Chef Ian McAndrew shares his creative recipes with you in his three wonderful books. Ian\'s Books are available on CKBKThis e-mail was sent to you by ChefYesChef. To ensure delivery to your inbox (not bulk or junk folders), please add our e-mail address to your address book.� ChefYesChef 2024

I am not sure why the NBSP characters are being converted to replacement characters. Also, DKIM signing fails on this email:

[127.0.0.1:35600->127.0.0.1:9999]   21s === smtp_server_message_received: Error: DKIM signer: message is not ASCII or UTF-8: invalid utf-8 sequence of 1 bytes from index 980

Funny how this email doesn’t work with MongoDB either:

Uncaught:
BSONError: Invalid UTF-8 string in BSON document
Caused by:
TypeError: The encoded data was not valid for encoding utf-8


Okay, I’m confused. NBSP (U+00A0) is a valid character and UTF-8 can encode it, yet my email router flags these bytes as invalid:

Feb 14 12:13:22 email-router email-router[258526]: 2025/02/14 12:13:22 Email is not UTF-8
Feb 14 12:13:22 email-router email-router[258526]: Checking UTF-8 validity:
Feb 14 12:13:22 email-router email-router[258526]: Invalid UTF-8 byte at index 733: 0xA0
Feb 14 12:13:22 email-router email-router[258526]: Invalid UTF-8 byte at index 761: 0xA0
Feb 14 12:13:22 email-router email-router[258526]: Invalid UTF-8 byte at index 833: 0xA0
Feb 14 12:13:22 email-router email-router[258526]: Invalid UTF-8 byte at index 1012: 0xA0
Feb 14 12:13:22 email-router email-router[258526]: Invalid UTF-8 byte at index 1411: 0xA9
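For anyone hitting the same thing: in UTF-8, NBSP is the two-byte sequence 0xC2 0xA0, so a lone 0xA0 byte (or a lone 0xA9) is not valid UTF-8 even though the character itself is fine. Here is a rough Python sketch of the kind of check that produces a log like the one above. The function name `find_invalid_utf8` and the sample bytes are made up for illustration (they are not the real message and not the router's actual code), so the indices differ from the log.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, int]]:
    """Return (index, byte) pairs for bytes the UTF-8 decoder rejects."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onward.
            data[pos:].decode("utf-8")
            break  # the rest of the input is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so shift it back.
            bad_index = pos + exc.start
            bad.append((bad_index, data[bad_index]))
            pos = bad_index + 1  # resume right after the offending byte
    return bad


# Hypothetical fragment: valid UTF-8 for the menu glyph, raw Latin-1 bytes
# for NBSP (0xA0) and the copyright sign (0xA9).
sample = b"Newsletter.\xa0The details \xe2\x98\xb0 \xa9 2024"

for index, byte in find_invalid_utf8(sample):
    print(f"Invalid UTF-8 byte at index {index}: 0x{byte:02X}")

# The same NBSP, properly encoded, passes the check.
print("a\u00a0b".encode("utf-8"))
print(find_invalid_utf8("a\u00a0b".encode("utf-8")))
```

The loop re-decodes the tail after each bad byte, which is fine for a single email but would be slow on very large inputs.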

Let me try a workaround

And it worked!

Lol

I just auto-detect the encoding and convert it to UTF-8 before processing

No more issues, lol

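Looks like the email was in ISO-8859-1 (Latin-1), even though its header says charset=UTF-8. In Latin-1, 0xA0 is NBSP and 0xA9 is ©, which matches the bytes flagged above.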
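One caveat, judging from the trace: the ☰ and ⊗ glyphs came through intact, so the body seems to be a mix of valid UTF-8 and raw Latin-1 bytes. Decoding the whole thing as Latin-1 would turn those glyphs into mojibake. A possible alternative to full auto-detection, sketched below in Python, is to decode as UTF-8 and fall back to Latin-1 only for the bytes that fail. The handler name `latin1_fallback` and the function `to_utf8` are my own invention, the sample bytes are illustrative, and treating the stray bytes as Latin-1 (rather than, say, Windows-1252) is an assumption.

```python
import codecs


def _latin1_fallback(exc):
    """Decode only the rejected bytes as Latin-1, then continue as UTF-8."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same value,
    # so this never fails; whether it is the *right* guess is an assumption.
    return bad.decode("latin-1"), exc.end


# Register the handler under a hypothetical name for use with bytes.decode().
codecs.register_error("latin1_fallback", _latin1_fallback)


def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as clean UTF-8, repairing stray Latin-1 bytes."""
    text = raw.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")


# Hypothetical fragment mixing valid UTF-8 with raw 0xA0 and 0xA9 bytes.
raw = b"Newsletter.\xa0The details \xe2\x98\xb0 \xa9 2024"
fixed = to_utf8(raw)

print(fixed.decode("utf-8"))
# For comparison, decoding everything as Latin-1 mangles the multi-byte glyph.
print(raw.decode("latin-1") == fixed.decode("utf-8"))
```

Running the repair before DKIM signing and before the MongoDB insert should give both a body that really is UTF-8, as the header claims.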