Error decoding attachment file names in utf-8

4k7 opened this issue.

Hi,

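Any attachment file name that contains at least one multi-byte character (that is, anything outside ASCII in UTF-8) is decoded incorrectly.

The culprit is the fixUnescapedQuotes function in header.go. Ranging over a string with for strIdx := range param yields the byte offset at which each rune starts, but param[strIdx] then reads only the single byte at that offset, so the remaining bytes of every multi-byte character are dropped.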

This code

		param = param[startingQuote+1 : closingQuote]
		escaped := false
		for strIdx := range param {
			switch param[strIdx] {
			case '"':
				// We are inside of a quoted string, so lets escape this guy if it isn't already escaped.
				if !escaped {
					sb.WriteByte('\\')
					escaped = false
				}
				sb.WriteByte(param[strIdx])
			case '\\':
				// Something is getting escaped, a quote is the only char that needs
				// this, so lets assume the following char is a double-quote.
				escaped = true
				sb.WriteByte('\\')
			default:
				escaped = false
				sb.WriteByte(param[strIdx])
			}
		}

should be replaced with this version, which ranges over runes and writes them with WriteRune instead of WriteByte:

		param = param[startingQuote+1 : closingQuote]
		escaped := false
		for _, c := range param {
			switch c {
			case '"':
				// We are inside of a quoted string, so lets escape this guy if it isn't already escaped.
				if !escaped {
					sb.WriteByte('\\')
					escaped = false
				}
				sb.WriteRune(c)
			case '\\':
				// Something is getting escaped, a quote is the only char that needs
				// this, so lets assume the following char is a double-quote.
				escaped = true
				sb.WriteByte('\\')
			default:
				escaped = false
				sb.WriteRune(c)
			}
		}
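To try the proposed loop outside the library, here is a minimal stand-alone sketch. The function name escapeInnerQuotes is hypothetical (it is not part of enmime), and it takes the text already extracted from between the quotes rather than a whole header. One deliberate difference from the snippet: it resets escaped after every quote, because the escaped = false inside if !escaped has no effect. Like the original, it assumes a backslash always escapes the character that follows.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeInnerQuotes (hypothetical name) backslash-escapes every double quote
// that is not already preceded by a backslash. Ranging over the string yields
// whole runes, and WriteRune re-encodes each one as UTF-8, so multi-byte
// characters such as Cyrillic letters are copied intact.
func escapeInnerQuotes(param string) string {
	var sb strings.Builder
	escaped := false
	for _, c := range param {
		switch c {
		case '"':
			if !escaped {
				sb.WriteByte('\\') // add the missing escape
			}
			sb.WriteRune(c)
			escaped = false // this quote consumed any pending backslash
		case '\\':
			// Assume the backslash escapes the next character.
			escaped = true
			sb.WriteByte('\\')
		default:
			escaped = false
			sb.WriteRune(c)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(escapeInnerQuotes(`"тест.rtf;".rtf`)) // \"тест.rtf;\".rtf
	fmt.Println(escapeInnerQuotes(`a\"b`))            // a\"b (already escaped)
}
```

One side effect of ranging by rune: any byte sequence that is not valid UTF-8 (for example a raw ISO-8859-1 byte) comes back as U+FFFD, whereas a byte-indexed loop would copy it unchanged.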

I also suggest adding one more case to header_test.go:

		{
			input: "application/rtf; charset=iso-8859-1; name=\"тест.rtf;\".rtf",
			want:  "application/rtf; charset=iso-8859-1; name=\"\\\"тест.rtf;\\\".rtf\"",
		},

Thanks.

James Hillyerd · Answer 1 · Tue Apr 28 2020 08:51:54 GMT+0800 (China Standard Time)

Please submit a pull request against the develop branch.

Neil · Answer 2 · Fri May 29 2020 05:12:51 GMT+0800 (China Standard Time)

@int01 The document is US-ASCII at this point.
This should be invalid:

	Content-Type: application/rtf; charset=iso-8859-1; name="тест.rtf;".rtf

RFC 2047 makes provision for this via quoted-printable ("Q") or base64 ("B") encoding. The base64 variant would be encoded like this:

	Content-Type: application/rtf; charset=iso-8859-1; name="=?UTF-8?B?0YLQtdGB0YI=?=.rtf;".rtf

Neil · Answer 3 · Tue Jun 02 2020 00:11:46 GMT+0800 (China Standard Time)

@jhillyerd I move to close this issue. @int01, please reopen it if you feel inclined.

Neil · Answer 4 · Tue Jun 02 2020 04:05:07 GMT+0800 (China Standard Time)

It appears that I was mistaken about the encoding used for non-ASCII media-type parameter values. With the RFC 2231 extended parameter syntax, it would actually be encoded like this:

	application/rtf; charset=iso-8859-1; name*=utf-8''%22%D1%82%D0%B5%D1%81%D1%82.rtf%3B%22.rtf

Here is a playground for demonstration.
However, the fact remains that the document is US-ASCII at this point, so WriteByte is the appropriate method to use in this function.
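The RFC 2231 form can be checked with the standard library too: mime.ParseMediaType understands the name*=charset'language'value syntax for the utf-8 and us-ascii charsets and percent-decodes the value. This is an independent minimal sketch, not the playground mentioned above; attachmentName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"mime"
)

// attachmentName (hypothetical helper) returns the decoded "name" parameter
// of a Content-Type value. ParseMediaType decodes RFC 2231 extended values
// such as name*=utf-8''... into a plain "name" entry in the params map.
func attachmentName(contentType string) (string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", err
	}
	return params["name"], nil
}

func main() {
	ct := "application/rtf; charset=iso-8859-1; " +
		"name*=utf-8''%22%D1%82%D0%B5%D1%81%D1%82.rtf%3B%22.rtf"
	name, err := attachmentName(ct)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // "тест.rtf;".rtf
}
```

The header itself stays pure ASCII, yet the decoded value is the same "тест.rtf;".rtf string used in the proposed header_test.go case.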

1 · Answer 5 · Sun Aug 16 2020 04:20:42 GMT+0800 (China Standard Time)

Your code (with WriteByte) breaks any UTF-8 string, whether or not it is encoded in compliance with the RFCs.

So you still need this fix.
Thanks.

James Hillyerd · Answer 6 · Wed Aug 26 2020 23:52:26 GMT+0800 (China Standard Time)

I think this is now fixed, although the fix was made at the reading stage rather than the writing stage.