anymail / django-anymail

Django email backends and webhooks for Amazon SES, Brevo (Sendinblue), MailerSend, Mailgun, Mailjet, Postmark, Postal, Resend, SendGrid, SparkPost, Unisender Go and more

Home Page: https://anymail.dev

from_email surrounded by quotation marks when it contains special characters with Mailgun

Flexonze opened this issue · comments

Hi! We recently changed our ESP from SparkPost to Mailgun. Using this library has saved us a lot of time. We thank you for that!

Everything works well except for one annoying detail: The from email gets surrounded by quotes when containing a special character like 'é' or 'è'. This was not an issue when using Sparkpost.

Here's an example of what I mean:

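from django.core.mail import EmailMultiAlternatives
from django.template.loader import get_template


def send_welcome_email(from_email, recipients):
    # get_template() returns a Template object; render it to get the HTML string
    html_content = get_template("welcome.html").render()
    message = EmailMultiAlternatives(
        subject="Welcome to MyApp",
        body="Welcome!",
        from_email=from_email,
        to=recipients,
    )
    message.attach_alternative(html_content, "text/html")
    message.send()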

Now if we use this method like this:

send_welcome_email("Felix (via MyApp) <noreply@example.com>", ["john@example.com"]) 

I get the expected result (in Gmail, when I click "Show original", I see the following):

From: "Felix (via MyApp)" <noreply@example.com>

But, if I use the method like this:

send_welcome_email("Félix (via MyApp) <noreply@example.com>", ["john@example.com"])  # Notice the 'é' instead of the 'e'

I get this result:

From: "\"Félix (via MyApp)\"" <noreply@example.com>

When we were using SparkPost, this was not a problem (we got the expected result with the 'é'). It looks quite ugly and unprofessional in the recipients' mailboxes, and since our application is aimed at French users and many French names have special characters like 'é' or 'è' in them, this is a big problem for us.

Hope someone can help :)
Thank you!


  • Anymail version: 8.4.0
  • ESP: Mailgun (previously Sparkpost)
  • Django 3.2, Python 3.9.6

Thanks for the report, and for including the exact content that's failing.

This appears to be a Mailgun bug. Mailgun seems to be mishandling email display names that contain both non-ASCII characters and parentheses, by adding an extraneous set of quotes. (This occurs for both From and To addresses.)

A workaround is to leave out the parentheses. When I tested from_email="Félix via MyApp <noreply@example.com>", that went through without any extra quotes. (You'd probably also need to avoid commas in the display name, though I haven't tested that.) [Edit: I've tested it, see later comment.]

I've tried a couple of different ways to communicate the From field to Mailgun's API, but couldn't find a solution that avoided their broken quoting logic. I'll try to follow up with Mailgun support next week when I have some time, and add some more details here—or let me know if you get in touch with them.

[Briefly, display names need quotes when they contain commas or parentheses, and they need RFC 2047 encoding when they contain non-ASCII characters, but Mailgun is mixing the two inappropriately. Anymail passes Mailgun a properly RFC 2047-encoded display name: =?utf-8?b?RsOpbGl4ICh2aWEgTXlBcHAp?=. Mailgun is decoding that, apparently deciding to add quotes because it contains parens, and then re-encoding the entire thing—including the added quotes—as =?utf-8?q?"F=C3=A9lix_(via_MyApp)"?=.]

Thanks for the quick response and explanation! It is really appreciated :)

You are right, I get the expected result when I leave out the parentheses. This will be our quick fix for now, even though it's not ideal.

Let me know if you get a follow up from the Mailgun support. Have a nice day!

I've done some more testing, and confirmed that this is a bug in the Mailgun message sending API, not in django-anymail or any client library. If the from or to parameter includes a display-name that contains both non-ASCII characters and any of the characters ( ) , or ", Mailgun will send a message that wraps the name in unnecessary "quotes" (and that has a technically invalid From or To header).

I've reported this to Mailgun support (Mailgun ticket #2105686).

More details…

Quick background to establish terminology (apologies if this is stuff you already know):

  • In email address headers (From, To, etc.), an address can be either a bare email addr-spec like user@example.com, or a name-addr combo that includes a display-name and an addr-spec in angle brackets, like User Name <user@example.com>. (RFC 5322 section 3.4)
  • Commas, parentheses, and some other characters have special meaning in address headers, so if you want to include them in a display-name you must use a quoted-string like "Example, Inc. (Dev Team)" <dev@example.com>. To include a literal quote character, you backslash escape it: "I'm \"quoting\" this". (RFC 5322 section 3.2.4)
  • Email headers can only contain ASCII characters. To include non-ASCII characters in them, you must use an RFC 2047 encoded-word like =?utf-8?q?F=C3=A9lix?=: that's Félix using the utf-8 charset and q quoted-printable content transfer encoding. (RFC 2047)
  • An encoded-word can use either b (base64) encoding or q (quoted-printable) encoding. Quoted-printable is a lot like URL query parameter % encoding, but uses =DD for hex-encoded bytes and _ for space. With some important limitations, punctuation can be written either as hex or as literal ASCII characters: F=C3=A9lix=21 and F=C3=A9lix! are both valid quoted-printable encodings of the utf-8 Félix!. (RFC 2047 sections 4 and 5)
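
The quoting and encoding rules above can be tried out with Python's standard library. This is a minimal sketch for illustration only: the helper name decode_words is made up here, and nothing in it is specific to Anymail or Mailgun.

```python
from email.header import decode_header, make_header
from email.utils import formataddr


def decode_words(value):
    """Decode any RFC 2047 encoded-words in a header value to plain text."""
    return str(make_header(decode_header(value)))


# ASCII display-name with specials: formataddr wraps it in a quoted-string
# and backslash-escapes any literal quote characters.
quoted = formataddr(("Example, Inc. (Dev Team)", "dev@example.com"))
escaped = formataddr(('I\'m "quoting" this', "dev@example.com"))

# The q and b encodings are two spellings of the same bytes.
from_q = decode_words("=?utf-8?q?F=C3=A9lix?=")
from_b = decode_words("=?utf-8?b?RsOpbGl4?=")

# In q encoding, punctuation may be hex-encoded or (where allowed) literal.
bang_hex = decode_words("=?utf-8?q?F=C3=A9lix=21?=")
bang_raw = decode_words("=?utf-8?q?F=C3=A9lix!?=")

print(quoted)
print(escaped)
print(from_q, from_b, bang_hex, bang_raw)
```

All four decode_words calls should give back the same readable name (with or without the trailing exclamation mark), which is the point of the last two bullets.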

OK, now to what's going wrong here.

Mailgun's message sending API tries to take care of all of these rules for you. You can give it a from or to parameter with a not-quite-valid display-name having a comma or parentheses, and it will wrap it in quotes for you to make a valid quoted-string. Give it Unicode characters, and it generates the RFC 2047 encoded-word for you. Great!

Except… when you give Mailgun a name that contains both non-ASCII characters and commas/parentheses/quotes, Mailgun tries (incorrectly) to apply both encodings:

  • Félix (via MyApp) (original name)
  • "Félix (via MyApp)" (has parentheses, so quoted-string)
  • =?utf-8?q?"F=C3=A9lix_(via_MyApp)"?= (has non-ASCII characters, so RFC 2047 encoded-word—but of the quoted-string, not of the original name)

… and that's what ends up in the message's From header. But the quoted-string step is both unnecessary and wrong here. A correct encoding would be:

  • Félix (via MyApp) (original name)
  • =?utf-8?q?F=C3=A9lix_=28via_MyApp=29?= (has non-ASCII characters, so RFC 2047 encoded-word)
  • and we're done! (No parentheses in the encoded-word, so no need for further quoting.)
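
For comparison, the correct encoding can be reproduced with Python's email.charset module. This is only a sketch of the encoding step, forcing q encoding so the result is readable; it is not Anymail's code path and says nothing about what Mailgun runs internally.

```python
from email.charset import QP, Charset

# Force q encoding; left alone, Python picks whichever of q or b is shorter.
utf8_q = Charset("utf-8")
utf8_q.header_encoding = QP

encoded = utf8_q.header_encode("Félix (via MyApp)")
print(encoded)
```

The parentheses come out hex-encoded as =28 and =29, so the encoded-word contains no characters that would call for a quoted-string.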

Worse, what Mailgun sends is not actually a valid RFC 2047 encoded-word for an address header. Remember a display-name can't contain quotes or parentheses outside a quoted-string, and =?utf-8?q?"F=C3=A9lix_(via_MyApp)"?= is not a quoted-string (it starts with =?, not "). But it contains parentheses! And quotes! This is why quoted printable encoded-words in address headers have "some important limitations" for raw ASCII punctuation (see RFC 2047 section 5 (3)). Fortunately, most email apps seem to let this technicality slide.
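
The restriction in RFC 2047 section 5 (3) is easy to check mechanically. The sketch below uses a hypothetical helper, is_phrase_safe_q_word, that only looks at the character set of a single q encoded-word; as a simplification it does not verify that every = is followed by two hex digits, and it ignores b encoding entirely.

```python
import re

# RFC 2047 section 5 (3): inside a display-name ("phrase"), q-encoded text
# may only use ASCII letters, digits, and the characters ! * + - / = _
PHRASE_SAFE_Q = re.compile(r"=\?[^?\s]+\?[qQ]\?[A-Za-z0-9!*+\-/=_]*\?=")


def is_phrase_safe_q_word(word):
    """Return True if word is a q encoded-word usable in a display-name."""
    return PHRASE_SAFE_Q.fullmatch(word) is not None


good = is_phrase_safe_q_word("=?utf-8?q?F=C3=A9lix_=28via_MyApp=29?=")
bad = is_phrase_safe_q_word('=?utf-8?q?"F=C3=A9lix_(via_MyApp)"?=')
print(good, bad)
```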

Still worse, Mailgun does this even when an API client passes it a from parameter that's already properly encoded (like Anymail does). Anymail says from should be =?utf-8?q?F=C3=A9lix_=28via_MyApp=29?= <noreply@example.com>; Mailgun's API decodes that display-name and then incorrectly re-encodes it using the logic above. [Depending on the exact display-name, Anymail might instead use the equivalent base64 encoding =?utf-8?b?RsOpbGl4ICh2aWEgTXlBcHAp?=. Under the hood, Anymail relies on Python's standard library email.utils.formataddr(), because all this encoding stuff is tricky!]
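
To see what a properly encoded from value looks like before it reaches Mailgun, email.utils.formataddr() can be called directly. This is a standalone sketch of that one call plus a round trip back to the readable name; the rest of Anymail's payload building is not shown.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

address = formataddr(("Félix (via MyApp)", "noreply@example.com"))
print(address)

# Round trip: split the address, then decode the encoded-word display-name.
name, addr_spec = parseaddr(address)
decoded_name = str(make_header(decode_header(name)))
print(decoded_name, addr_spec)
```

For this particular name the standard library chooses the base64 form, and decoding it gives back the original display-name with no added quotes.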

Why don't you see this problem with other ESPs? Either their API takes separate name and email (addr-spec) parameters and builds the From header for you (like Sparkpost), or their API expects the client to provide a properly-quoted-and-encoded address field (Anymail does) and just uses that unmodified. (Also, to be fair, plenty of other ESPs have other charset related bugs in their APIs—and Anymail has had its share of past issues, too.)

I've tried several different potential workarounds, and couldn't find one that worked [1], so the fix for this will have to come from Mailgun. If you really want parentheses in your display-name, you might experiment with some other Unicode bracket characters: it's only the ASCII parens (0x28 and 0x29) that cause problems. E.g., fullwidth parentheses Félix （via MyApp） might be OK. (But of course test how that shows up in email apps you care about.)
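
If the fullwidth-parentheses idea is worth a try, the substitution could look something like the sketch below. The helper soften_parens is hypothetical and not part of Anymail. It assumes only names that already contain non-ASCII characters need changing, and it leaves commas and quotes alone, so those would still need separate handling.

```python
# Hypothetical helper, not part of Anymail: swap ASCII parentheses for
# fullwidth ones (U+FF08, U+FF09), but only in names that need RFC 2047
# encoding anyway.
FULLWIDTH_PARENS = str.maketrans({"(": "\uff08", ")": "\uff09"})


def soften_parens(display_name):
    if display_name.isascii():
        return display_name  # pure ASCII names were not affected by the bug
    return display_name.translate(FULLWIDTH_PARENS)


print(soften_parens("Felix (via MyApp)"))
print(soften_parens("Félix (via MyApp)"))
```

Whether the result renders acceptably is untested here, so check it in the mail apps that matter to you.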

One final note: If you're trying to debug email headers, know that Gmail's "show original" view doesn't actually show the raw headers. Gmail decodes RFC 2047 encoded-words and makes some other readability improvements in "show original". To see the actual original raw message, use Gmail's "download message" and open the file in a text editor.
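
Once the message is downloaded, the raw and decoded forms of a header can be compared with a few lines of Python. This is a sketch: raw_message is a made-up stand-in for the contents of the downloaded file, using the From header Mailgun was observed to send in this issue.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Stand-in for the text of a downloaded message file (hypothetical sample).
raw_message = (
    'From: =?utf-8?q?"F=C3=A9lix_(via_MyApp)"?= <noreply@example.com>\n'
    "To: john@example.com\n"
    "Subject: Welcome to MyApp\n"
    "\n"
    "Welcome!\n"
)

message = message_from_string(raw_message)
raw_from = message["From"]  # as transmitted, still RFC 2047 encoded
decoded_from = str(make_header(decode_header(raw_from)))  # human-readable form

print("raw:    ", raw_from)
print("decoded:", decoded_from)
```

The decoded line shows the extra quotes as part of the name itself, which is why they are visible to recipients.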


[1] Actually, I did find one workaround if the display-name starts with an ASCII-only word. You can encode User Félix (via MyApp) properly as User =?utf-8?q?F=C3=A9lix_=28via_MyApp=29?= (RFC 5322 atom followed by whitespace followed by RFC 2047 encoded-word), and this seems to keep Mailgun's re-encoding logic from kicking in. But that seems too limiting and weird to try to put into Anymail's code.

Mailgun support has replicated the issue and forwarded it to their engineers.

Hi @medmunds, I'm one of the developers at Mailgun responsible for this logic. I completely agree with your summary. This is really how it should work. Unfortunately, what we see in the wild is that some mailbox providers aren't RFC compliant and won't display the encoded-word version of the display name properly without the extra quotes. In the past we actually got more support tickets about display names broken this way than we do today regarding the extra quotes. Also, the reason you see this logic applied regardless of which version is submitted is that the API normalizes those fields before passing them to the services responsible for sending the message. Have you tried using our /messages.mime endpoint? That would give you much more control over how the message is rendered.

Hi @b0d0nne11. Really appreciate you providing this info here—thanks!

Is there a reason Mailgun doesn't QP encode ( and ) as =28 and =29 in address headers? (Are there some mailbox clients that can decode other QP bytes but choke on =28...=29?)

It seems like Mailgun is trying to use a version of QP encoding that's not allowed in address headers. That is, in a Subject header it's OK to use raw parens inside QP, but they're prohibited in From/To/Cc headers. (See item "(3)" in RFC 2047 section 5.)

So, something like this would definitely not be a valid From header, because it has raw ( and ) that aren't allowed in address headers:

From: =?utf-8?q?F=C3=A9lix_(via_MyApp)?= <noreply@example.com>

… and if that's what Mailgun used to send, it makes sense there were problems. I'd imagine even totally RFC compliant mailbox providers would have trouble with that. So working around that invalid QP by adding some (also invalid) " is one approach. But maybe a cleaner fix would be keeping the raw parens out of the QP section by using =28 and =29 instead?

Related question: do you happen to remember some of the mailbox clients that had trouble with this? I'd like to add them to my testing.

Also, thanks for the pointer to /messages.mime. One hitch is that Anymail also supports Mailgun's templates and variables, and I don't think those are available through the mime sending endpoint. But I'll try to look at maybe making this an option for Anymail users who don't need those features.

Happy to help :)

With regard to the special characters in the QP encoding: thanks for pushing on that. That's one area where the code has changed fairly recently. We're using Go's QEncoding from the mime package. That encoder doesn't differentiate between these different sets of allowed characters. I believe the previous logic used base64 encoding for display names and we may have to go back to that. I'll open a ticket to look into it. It looks to me like a regression.

I don't remember for sure which providers, and I don't trust my memory enough to name any here. I went looking through old tickets and PRs after my last comment and it's been quite a while since we settled on this format. After we fix the issue above we can do some testing and make sure this is still required. Maybe things have changed in the last few years 🤞. I can't make any promises, however.

Lastly, I think /messages.mime should still support variables. Obviously that doesn't help with templates which don't make a lot of sense in the context of that call.

No worries… I had done a bunch of testing (and relearning the RFCs) to try to verify this wasn't something Anymail (or Django or Python) was doing wrong, so just passing those results along here.

Speaking of which, I saw one other weird case that might (or might not) be helpful to you. When I sent with the from param containing =?utf-8?q?F=C3=A9lix_(via_MyApp)?=, Mailgun didn't try to normalize and re-encode that, but instead just quoted the whole thing as-is:

From: "=?utf-8?q?F=C3=A9lix_(via_MyApp)?=" <noreply@example.com>

Now, that's definitely not what you'd want to end up in the header. (RFC 2047 has a "MUST NOT" about encoded-words inside quoted-strings.) But to be fair it's kind of what I asked for with that invalid-for-headers QP display-name, so I wouldn't call this case a Mailgun bug. It just seemed odd that Mailgun handled it differently from other encoded-words. Maybe whatever the API uses for decoding RFC 2047 isn't symmetric with the encoder?

I think a strictly compliant mail app would display that name as a literal =?utf-8?..., which, I dunno, might actually be someone's (billionaire's child's) name. Surprisingly, though, Gmail seems to ignore the quotes, decodes the encoded-word, and displays it as Félix (via MyApp)—so I guess this is a common enough error for them to special case it. (I would not suggest this as a workaround to the original issue, though, because who knows what you'd end up seeing in other mail apps.)

I've documented the current behavior in Anymail's section on Mailgun limitations and quirks.

Thanks for your help, guys! Really appreciated.