mattosaurus / PgpCore

.NET Core class library for using PGP

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Verified file has extra characters.

LucTGitHub opened this issue · comments

Hello,
I can sign an XML file, and then verify it. The veried file does not match the original. There are a couple extra characters at beginning.

Example

You can use XML in this zip file for testing. Upzip the file and sign the XML. When you verify, the verfied file has the extra characters. 
input.zip

Looks like this is the UTF8 encoding BOM for the file.

I can see this being added for your file but not in other XML docs I'm testing with. Possibly it's getting added twice.

image

Your file is encoded as UTF-8-BOM, and includes those 3 characters as the first 3 bytes. When it's opened in Notepad++ or read in using something like File.ReadAllLines then these bytes are ignored and not included in the viewed string.

If you read it in byte by byte as a stream then they are included.

using (Stream inputFileStream = new FileInfo(@"\input\input.xml").OpenRead())
{
    // Read the stream character by character and create a string
    StringBuilder sb = new StringBuilder();
    while (inputFileStream.Position < inputFileStream.Length)
    {
        sb.Append((char)inputFileStream.ReadByte());
    }

    // String includes UTF8 BOM characters
    string input = sb.ToString();
}

Because the various methods (Sign and Verify in this case) all work on the raw stream they read in the initial 3 BOM bytes and treat them as any other content character would be treated. I think this is expected behaviour, however I agree it's not ideal.

I'm not really sure how best to deal with this, though my suggestion would be that the files should be generated with the UTF-8-BOM or it should be stripped out before getting to PGPCore. I don't really like the idea of selectively ignoring these characters once they've been passed to PGPCore as they could have been done so as part of a legitimate message but if you've got any suggestions I'm happy to discuss.

Thanks for looking into this!
With this same input.xml file, I don't see this issue when I "Encrypt and Sign" and then "Decrypt and Verify".
Do you know why having encryption and decryption seems to help in this situation?

I think it's because we use StreamWriter in the Verify method, and output the message bytes to it one by one as we read them for signature verification whereas for the Decrypt method we pipe the stream directly to the output stream.

int ch;
while ((ch = pgpLiteralStream.ReadByte()) >= 0)
{
pgpOnePassSignature.Update((byte)ch);
await contentStreamWriter.WriteAsync((char)ch);
}

There might be a better way to do this, I'll take a look into it later this week.

I also found that Verify seems to always output UTF-8-BOM files.
In this test, I create a text file with Notepad and put just a sentense in there. The file is UTF-8 encoded.
Then I sign the file and verify it. The verified file is encoded in UTF-8-BOM.
I wonder why Verify takes a signed UTF-8 file and outputs a UTF-8-BOM file?

Just adding another test case here
Sign and Verify also has issue with a docx.
input.docx

No run-time errors, but the verified file cannot be opened with Microsoft Words. This is not an issue with Encrypt-Sign and Decrypt-Verify.
Thanks!

This was an issue with converting the decrypted bytes to chars when writing them to the output stream so the BOM got written as literal characters. I've switched to using a BinaryWriter from StreamWriter and writing bytes out directly now. This should be fixed in v6.3.

I'm still having the same issue with version v6.3.
But if I clone the project and reference it in Visual Studio, then it works.
Thanks!

Hmm, I think I must have published an older build of the package.

I've pushed v6.3.1 and that seems to work as expected now.

Yes, that worked. Thank you so much for your help!!!