Error decoding attachment file names in utf-8
4k7 opened this issue
Hi,
Any attachment file name is decoded incorrectly if it contains at least one character that takes more than one byte in UTF-8.
The fixUnescapedQuotes function in file header.go is not working properly.
This code
	param = param[startingQuote+1 : closingQuote]
	escaped := false
	for strIdx := range param {
		switch param[strIdx] {
		case '"':
			// We are inside of a quoted string, so lets escape this guy if it isn't already escaped.
			if !escaped {
				sb.WriteByte('\\')
				escaped = false
			}
			sb.WriteByte(param[strIdx])
		case '\\':
			// Something is getting escaped, a quote is the only char that needs
			// this, so lets assume the following char is a double-quote.
			escaped = true
			sb.WriteByte('\\')
		default:
			escaped = false
			sb.WriteByte(param[strIdx])
		}
	}
should be replaced with this one, which ranges over runes instead of byte indexes and writes them with WriteRune instead of WriteByte:
	param = param[startingQuote+1 : closingQuote]
	escaped := false
	for _, c := range param {
		switch c {
		case '"':
			// We are inside of a quoted string, so lets escape this guy if it isn't already escaped.
			if !escaped {
				sb.WriteByte('\\')
				escaped = false
			}
			sb.WriteRune(c)
		case '\\':
			// Something is getting escaped, a quote is the only char that needs
			// this, so lets assume the following char is a double-quote.
			escaped = true
			sb.WriteByte('\\')
		default:
			escaped = false
			sb.WriteRune(c)
		}
	}
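To illustrate why the two loops differ, here is a minimal sketch that strips away the quote handling and only copies the string. The helper names copyByIndex and copyByRune are made up for this example and are not part of header.go. Ranging over a Go string with a single variable yields the byte offset of each rune, so indexing the string at that offset returns only the rune's first byte.

```go
package main

import (
	"fmt"
	"strings"
)

// copyByIndex mimics the original loop (hypothetical helper): the range
// variable is the byte offset of each rune, so param[i] is only the first
// byte of that rune and the continuation bytes are dropped.
func copyByIndex(param string) string {
	var sb strings.Builder
	for i := range param {
		sb.WriteByte(param[i])
	}
	return sb.String()
}

// copyByRune mimics the proposed loop (hypothetical helper): the second
// range value is the decoded rune, and WriteRune re-encodes it in full.
func copyByRune(param string) string {
	var sb strings.Builder
	for _, c := range param {
		sb.WriteRune(c)
	}
	return sb.String()
}

func main() {
	name := "тест.rtf"
	fmt.Printf("by index: %q (%d bytes)\n", copyByIndex(name), len(copyByIndex(name)))
	fmt.Printf("by rune:  %q (%d bytes)\n", copyByRune(name), len(copyByRune(name)))
}
```

With a pure ASCII name both helpers return the input unchanged, which is why the problem only shows up for multi-byte characters.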
And one more value for header_test.go:
	{
		input: "application/rtf; charset=iso-8859-1; name=\"тест.rtf;\".rtf",
		want:  "application/rtf; charset=iso-8859-1; name=\"\\\"тест.rtf;\\\".rtf\"",
	},
Thanks.
Please submit a pull request against the develop branch.
@int01 the document is us-ascii at this point.
This should be invalid: Content-Type: application/rtf; charset=iso-8859-1; name="тест.rtf;".rtf
RFC 2047 makes provisions for this via quoted-printable or base64 encoding. For example, the base64 variant would be encoded like this: Content-Type: application/rtf; charset=iso-8859-1; name="=?UTF-8?B?0YLQtdGB0YI=?=.rtf;".rtf
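As a rough check of the encoded-word in that example, the sketch below decodes it with mime.WordDecoder from Go's standard library. The wrapper decodeWord is a hypothetical helper for this illustration only and says nothing about how the library under discussion handles such values.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeWord (hypothetical helper) decodes a single RFC 2047 encoded-word
// using the standard library's WordDecoder.
func decodeWord(word string) (string, error) {
	dec := new(mime.WordDecoder)
	return dec.Decode(word)
}

func main() {
	s, err := decodeWord("=?UTF-8?B?0YLQtdGB0YI=?=")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s) // the base64 payload decodes to the Cyrillic word "тест"
}
```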
@jhillyerd I move to close this issue. @int01 Please reopen if you feel inclined.
It appears that I was mistaken about the encoding used for non-ASCII media-parameter attribute values:
"application/rtf; charset=iso-8859-1; name*=utf-8''%22%D1%82%D0%B5%D1%81%D1%82.rtf%3B%22.rtf"
is how it would actually be encoded.
Here is a playground for demonstration
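The extended-parameter form quoted above (name*=utf-8''...) can also be checked with mime.ParseMediaType from Go's standard library, which percent-decodes such values. This is only a sketch; parseName is a hypothetical helper and not a function of the library under discussion.

```go
package main

import (
	"fmt"
	"mime"
)

// parseName (hypothetical helper) parses a Content-Type value and returns
// its "name" parameter after the standard library has decoded it.
func parseName(header string) (string, error) {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", err
	}
	return params["name"], nil
}

func main() {
	// The header value is pure us-ascii; the non-ASCII bytes are percent-encoded.
	header := "application/rtf; charset=iso-8859-1; name*=utf-8''%22%D1%82%D0%B5%D1%81%D1%82.rtf%3B%22.rtf"
	name, err := parseName(header)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", name)
}
```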
However, the fact still stands that the document is encoded in us-ascii at this point in time, and WriteByte is the appropriate method to use in this function.
Your code (with WriteByte) breaks any UTF-8 encoded string. It doesn't matter whether the string is encoded in an RFC-compliant way or not.
So you still need this fix.
Thanks.
I think this is now fixed, but the fix was made at the reading stage rather than the writing stage.