Attachments produce overlong base64 lines.

jazzboME opened this issue 6 years ago · comments

I've been testing an application that generates a PDF file (via gofpdf) and then used MailYak to attach the generated file to an email.

My email server at work uses SpamAssassin to flag potential spam, and it is flagging the emails as having overlong BASE64 lines -- exceeding the 76 character limit defined for email in RFC 2045.

And I have to agree: looking at the original email (we use Gmail), we do end up with long lines in the BASE64 output. It ends up looking like this:

Content-ID: <test2.pdf>
Content-Transfer-Encoding: base64
Content-Type: application/pdf;
        filename="test2.pdf"

JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PC9MZW5ndGggNiAwIFIvRmlsdGVy
IC9GbGF0ZURlY29kZT4+CnN0cmVhbQp4nM1cS48ctxEewLe9GP4HcwjglZEd
8dkkr4F248gKVhptLAdxToodIIAcyPr/QIqPnqlqVnVzdi04EOCdmW6yyWJ9
VV892h/36qD3Kv9rf99/uHp+DPt/f4I/af/+05Xe53+f3v9ylS9P2qh9SPCf
X3+6+nnwp4/wYzj4+Wdt4POvP+3ffbP/hb1CZjFqnvhgrTXewFJtdCYE+OCc
1dqX1cGNe69tns1NUe3debJ8ySXVLhin0M9TMAf2gnOK+9mowN5vU2B/Do6f
33rNzW9hX+z9Jp4XutiamdilwhdhKs0uVSdhqTqwS9VOWKo2jv1Z6X7+50et
s5apcoYfr3TRxX378/7D/k8P+R4LvxxS8vuHn6+qnmrQjL0PDmZ8+HD1j+vd
9AxUGPQipevdi93r/C0l+H69e7u7f3YDX3WYor/efbv7Ai7CGu0EF7/eHXd3
+WYTphCudz+Qod/u3uzeobu/eHajnU3l0z8fXl7dPly9gX+w7qpHsCC/197n
z9qVTVY1/zj/Wtaer+ARBXR62r/4b5tOxssTBeZTFVjewMN/6vob0AFN0TiT
8eUBaMrmD8kH+LD6qAgHDo8K3bOMP4T5dJ6DEL2LLtprEPjf4JtNPk4TiDifjYWFwdG8LqIGEBdJ5/M0RoGw
X8HvanJRWRjwNl8x1kYPt32fv6jovJlg/A9lkFMxn6Q02Z/rTdEmndfSdMOE
[...]

The penultimate line above measures 84 characters. I've created a simple PDF that causes this if you want to try it. If I manually attach the same PDF to an email in Gmail, it works fine: Gmail makes every line exactly 60 chars.

So I'm not sure if this is a problem with how the splitter works on long lines, how the lines are written out by the base64 encoder, or how Google's SMTP server handles input.

I put some Printlns into the splitter, and it looks like the splitter is generating the following for the same block:

JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PC9MZW5ndGggNiAwIFIvRmlsdGVy
IC9GbGF0ZURlY29kZT4+CnN0cmVhbQp4nM1cS48ctxEewLe9GP4HcwjglZEd
8dkkr4F248gKVhptLAdxToodIIAcyPr/QIqPnqlqVnVzdi04EOCdmW6yyWJ9
VV892h/36qD3Kv9rf99/uHp+DPt/f4I/af/+05Xe53+f3v9ylS9P2qh9SPCf
X3+6+nnwp4/wYzj4+Wdt4POvP+3ffbP/hb1CZjFqnvhgrTXewFJtdCYE+OCc
1dqX1cGNe69tns1NUe3debJ8ySXVLhin0M9TMAf2gnOK+9mowN5vU2B/Do6f
33rNzW9hX+z9Jp4XutiamdilwhdhKs0uVSdhqTqwS9VOWKo2jv1Z6X7+50et
s5apcoYfr3TRxX378/7D/k8P+R4LvxxS8vuHn6+qnmrQjL0PDmZ8+HD1j+vd
9AxUGPQipevdi93r/C0l+H69e7u7f3YDX3WYor/efbv7Ai7CGu0EF7/eHXd3
+WYTphCudz+Qod/u3uzeobu/eHajnU3l0z8fXl7dPly9gX+w7qpHsCC/197n
z9qVTVY1/zj/Wtaer+ARBXR62r/4b5tOxssTBeZTFVjewMN/6vob0AFN0TiT
8eUBaMrmD8kH+LD6qAgH
Do8K
3bOMP4T5dJ6DEL2LLtprEPjf4JtNPk4TiDifjYWFwdG8LqIGEBdJ5/M0RoGw
X8HvanJRWRjwNl8x1kYPt32fv6jovJlg/A9lkFMxn6Q02Z/rTdEmndfSdMOE
[...]

Kevin Foss · Answer 1 · Sun Dec 09 2018 08:43:12 GMT+0800 (China Standard Time)

So after some testing, I think we just need to add a line break after outputting the remaining portion in the Write from lineSplitter. Otherwise we get situations like the one above, where it writes a long leftover and then the first 60 chars of the next part, and we end up with an overlong line.

I'll try to work on a test of this as well as a patch for a PR.

Kevin Foss · Answer 2 · Sun Dec 09 2018 21:31:25 GMT+0800 (China Standard Time)

So I don't know what approach you want to take here. If I update the splitter to add CRs, it will break a lot of other tests; I can do that and adjust all of the tests. I just don't know if you want the splitter to create choppier output like that, or if there is a more elegant solution here.

For the time being, this test shows the problem where out-of-spec lines are created, using one of the attachment samples that was already there. (It adds a dependency on bufio.)

func TestMailYakWriteAttachments_lineSplitter(t *testing.T) {
	t.Parallel()

	tests := []struct {
		// Test description.
		name string
		// Receiver fields.
		rattachments []attachment
		// Expected results.
		ctype   string
		disp    string
		data    string
		wantErr bool
	}{
		{
			"Attachment > 512, split to lines of 60 or less",
			[]attachment{
				{
					"qed.txt",
					strings.NewReader(
						`Now it is such a bizarrely improbable coincidence that anything so mind-bogglingly ` +
							`useful could have evolved purely by chance that some thinkers have chosen to see it ` +
							`as the final and clinching proof of the non-existence of God. The argument goes something ` +
							`like this: "I refuse to prove that I exist," says God, "for proof denies faith, and ` +
							`without faith I am nothing." "But," says Man, "The Babel fish is a dead giveaway, ` +
							`isn't it? It could not have evolved by chance. It proves you exist, and so therefore, ` +
							`by your own arguments, you don't. QED." "Oh dear," says God, "I hadn't thought of ` +
							`that," and promptly vanishes in a puff of logic. "Oh, that was easy," says Man, and ` +
							`for an encore goes on to prove that black is white and gets himself killed on the next ` +
							`zebra crossing.`,
					),
					false,
					"",
				},
			},
			"text/plain; charset=utf-8;\n\tfilename=\"qed.txt\"",
			"attachment;\n\tfilename=\"qed.txt\"",
			"Tm93IGl0IGlzIHN1Y2ggYSBiaXphcnJlbHkgaW1wcm9iYWJsZSBjb2luY2lk\r\n" +
                "ZW5jZSB0aGF0IGFueXRoaW5nIHNvIG1pbmQtYm9nZ2xpbmdseSB1c2VmdWwg\r\n" +
                "Y291bGQgaGF2ZSBldm9sdmVkIHB1cmVseSBieSBjaGFuY2UgdGhhdCBzb21l\r\n" +
                "IHRoaW5rZXJzIGhhdmUgY2hvc2VuIHRvIHNlZSBpdCBhcyB0aGUgZmluYWwg\r\n" +
                "YW5kIGNsaW5jaGluZyBwcm9vZiBvZiB0aGUgbm9uLWV4aXN0ZW5jZSBvZiBH\r\n" +
                "b2QuIFRoZSBhcmd1bWVudCBnb2VzIHNvbWV0aGluZyBsaWtlIHRoaXM6ICJJ\r\n" +
                "IHJlZnVzZSB0byBwcm92ZSB0aGF0IEkgZXhpc3QsIiBzYXlzIEdvZCwgImZv\r\n" +
                "ciBwcm9vZiBkZW5pZXMgZmFpdGgsIGFuZCB3aXRob3V0IGZhaXRoIEkgYW0g\r\n" +
                "bm90aGluZy4iICJCdXQsIiBzYXlzIE1hbiwgIlRoZSBCYWJlbCBmaXNoIGlz\r\n" +
                "IGEgZGVhZCBnaXZlYXdheSwgaXNuJ3QgaXQ/IEl0IGNvdWxkIG5vdCBoYXZl\r\n" +
                "IGV2b2x2ZWQgYnkgY2hhbmNlLiBJdCBwcm92ZXMgeW91IGV4aXN0LCBhbmQg\r\n" +
                "c28gdGhlcmVmb3JlLCBieSB5b3VyIG93biBhcmd1bWVudHMsIHlvdSBkb24ndC4gUUVELiIgIk9oIGRlYXIs\r\n" +
                "IiBzYXlzIEdvZCwgIkkgaGFkbid0IHRob3VnaHQgb2YgdGhhdCwiIGFuZCBw\r\n" +
                "cm9tcHRseSB2YW5pc2hlcyBpbiBhIHB1ZmYgb2YgbG9naWMuICJPaCwgdGhh\r\n" +
                "dCB3YXMgZWFzeSwiIHNheXMgTWFuLCBhbmQgZm9yIGFuIGVuY29yZSBnb2Vz\r\n" +
                "IG9uIHRvIHByb3ZlIHRoYXQgYmxhY2sgaXMgd2hpdGUgYW5kIGdldHMgaGlt\r\n" +
                "c2VsZiBraWxsZWQgb24gdGhlIG5leHQgemVicmEgY3Jvc3Npbmcu",
			false,
		},
	}
	for _, tt := range tests {
		tt := tt
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()

			m := MailYak{attachments: tt.rattachments}
			pc := testPartCreator{}

			// use actual lineSplitter rather than nopSplitter
			if err := m.writeAttachments(&pc, lineSplitterBuilder{}); (err != nil) != tt.wantErr {
				t.Errorf("%q. MailYak.writeAttachments() error = %v", tt.name, err)
			}

			// Ensure there's an attachment
			if len(pc.attachments) != 1 {
				t.Fatalf("%q. MailYak.writeAttachments() unexpected number of attachments = %v, want 1", tt.name, len(pc.attachments))
			}

			if pc.attachments[0].contentType != tt.ctype {
				t.Errorf("%q. MailYak.writeAttachments() content type = %v, want %v", tt.name, pc.attachments[0].contentType, tt.ctype)
			}

			if pc.attachments[0].disposition != tt.disp {
				t.Errorf("%q. MailYak.writeAttachments() disposition = %v, want %v", tt.name, pc.attachments[0].disposition, tt.disp)
			}

			scanner := bufio.NewScanner(bytes.NewReader(pc.attachments[0].data.Bytes()))
			for scanner.Scan() {
				if len(scanner.Text()) > maxLineLen {
					t.Errorf("%q. linelength = %d want <= %d\n", tt.name, len(scanner.Text()), maxLineLen)
				}
			}

			if pc.attachments[0].data.String() != tt.data {
				t.Errorf("%q. MailYak.writeAttachments() data = \n%v, want \n%v", tt.name, pc.attachments[0].data.String(), tt.data)
			}
		})
	}
}

This fails on the long line. The problem I'm finding in testing is that the splitter tests don't incorporate the rest of the header parts (which affect the splitter), while the attachment tests use a NOP splitter. In the former case, I never get a line over 64 characters (still technically in spec for email); in the latter, I only get raw data. This test combines the line splitter with the full attachment process to trigger the error. I realize the splitter and attacher are "separate" and are tested independently, but I don't know how else to test for this problem.

Dom · Answer 3 · Mon Dec 10 2018 19:50:13 GMT+0800 (China Standard Time)

Hey @jazzboME

First off thanks for identifying the problem and the great writeup - it's really refreshing to get a well described ticket!

My intention was for the splitter to write lines of a fixed length (much like Gmail is doing). I'm not sure if having arbitrary newlines in the encoded body is considered valid, but I would aim for a solution that outputs a fixed line length with no unexpected newlines if possible.

I'm happy to pick this up and have a go if you like. It might be a simple fix, but my memory regarding the splitter implementation is a little fuzzy, so it might not be! Let me know if you intend to open a PR, otherwise I'll make some time to take a look! :)

Thanks again!
Dom

Kevin Foss · Answer 4 · Mon Dec 10 2018 22:51:37 GMT+0800 (China Standard Time)

Thanks for getting back to me. Sorry for spamming the issue a little bit with replies as I worked through it. Unfortunately, I don't have a quick fix either. Adding the line feeds "works," but only in the sense that Gmail doesn't complain. I'm not sure it is really compliant and, to be honest, it doesn't look great.

Initially I was thinking we could generate the entire buffer for the base64 encoding and then split -- but that comes with a memory usage impact. In short, I don't have a great solution, so if you could take a look at it, that would be great.

Thanks,
-Kevin

Dom · Answer 5 · Tue Dec 11 2018 04:04:22 GMT+0800 (China Standard Time)

Hi @jazzboME

I've pushed a fix for this - thanks for your help!

Dom

Kevin Foss · Answer 6 · Tue Dec 11 2018 04:57:07 GMT+0800 (China Standard Time)

Unfortunately, I don't think this fix works.

I just sent myself a quick email by attaching the mailyak logo (106,478 bytes), and the attachment gets to me as 106,004 bytes and is an invalid PNG. From a quick diff of the base64 produced, it looks like what used to be the remainder section after the breaks were iterated is getting left out of the output, but I'm still trying to track down what exactly is happening.

Dom · Answer 7 · Tue Dec 11 2018 04:58:45 GMT+0800 (China Standard Time)

OK, this is what I get for being lazy and pushing to master - I'll revert and turn this into a branch - thanks for your help!

Great catch!

Dom · Answer 8 · Tue Dec 11 2018 05:01:05 GMT+0800 (China Standard Time)

OK - reverted and branched to bugfix/splitter-chunked-writes

Let me know if you find a reproducible test case - I'll take a look too.

Thanks for your help, it's truly appreciated!
Dom

Kevin Foss · Answer 9 · Tue Dec 11 2018 05:41:25 GMT+0800 (China Standard Time)

splitter_test.go

I'm using this for my splitter_test. It incorporates a 976-byte PNG, which loses 4 bytes with the version in the chunked branch.

Kevin Foss · Answer 10 · Tue Dec 11 2018 09:12:48 GMT+0800 (China Standard Time)

Please see #33. This is the first time I've submitted a pull request for anything, so hopefully it is reasonable, but I think it solves the issue.

Previously, lineSize was being reset before i was updated. So if, for example, lineSize was 56, we would correctly output 56 bytes, but lineSize was then reset to 60, so the next chunk was taken at a 60-byte offset rather than 56, and we'd lose those 4 bytes.

Kevin Foss · Answer 11 · Wed Dec 12 2018 05:31:30 GMT+0800 (China Standard Time)

Just tried this again after pulling down the updated bugfix branch and it looks to be working fine now. Valid attachments encoded in nice neat 60 col rows.

So from my tests, it looks like we'd be fine merging into test.

Dom · Answer 12 · Wed Dec 12 2018 23:58:48 GMT+0800 (China Standard Time)

Awesome! Thanks @jazzboME