drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Jpeg EXIF comment missing last character

rustygreen opened this issue

I have a JPG image with an EXIF UserComment, and the last character of the comment shows up as a Unicode block character rather than the actual value "1".

Here is the image that causes the issue:
0054012NNN01

Difficult to describe, but there are two things going on here:

  • ExifDescriptorBase.GetUserCommentDescription checks the first 10 bytes for the encoding name and throws away any extra nulls and spaces within those 10 bytes, which is not correct. ExifTool always assumes this type of comment header is 8 bytes long and uses conditionals on nulls and spaces to choose the encoder. Everything in the byte array after those 8 bytes, through to the end, is part of the comment text.
  • In this case the comment bytes are stored big-endian, but no BOM is present (which is usual for many byte arrays) to help decide on an encoder.
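
To make the 8-byte rule concrete, here is a minimal sketch in Python (not the library's C# code). The function name `split_user_comment` and the sample text "AB1" are made up for illustration, and returning an empty name for inputs shorter than 8 bytes is an assumption rather than anything the library does.

```python
HEADER_LENGTH = 8  # the Exif character-code prefix is a fixed 8 bytes


def split_user_comment(raw: bytes) -> tuple[str, bytes]:
    """Split a raw UserComment value into (encoding_name, comment_bytes).

    Hypothetical helper: only the first 8 bytes are treated as the header.
    Nulls and spaces are stripped from the header alone, never from the body.
    """
    if len(raw) < HEADER_LENGTH:
        # Assumption: too short to carry a header, so treat it all as text.
        return "", raw
    header, body = raw[:HEADER_LENGTH], raw[HEADER_LENGTH:]
    name = header.rstrip(b"\x00 ").decode("ascii", errors="replace")
    return name, body


# Made-up sample: "UNICODE\0" header followed by big-endian UTF-16 text.
raw = b"UNICODE\x00" + "AB1".encode("utf-16-be")
name, body = split_user_comment(raw)
print(name, body.decode("utf-16-be"))
```

The body keeps its leading 0x00 byte, so decoding it as big-endian returns the full text, including the final "1".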

What happens here is that skipping \0 bytes within the first 10 bytes removes the first 0x00 byte, which is actually part of the text (the high byte of the first big-endian character). Unicode (LE) decoding then appears to work, since the shifted bytes look like LE data, but the data ends one byte short of a complete final LE character.
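
The effect can be reproduced with a rough Python emulation of the behaviour described here. This is not the library's actual code: `decode_with_10_byte_skip` is a hypothetical stand-in, and the sample text "AB1" is made up.

```python
def decode_with_10_byte_skip(raw: bytes) -> str:
    """Emulate skipping nulls/spaces anywhere in the first 10 bytes.

    Hypothetical stand-in for the behaviour described in this thread.
    """
    start = len(b"UNICODE")
    # Skip padding, but allow the skip to run past byte 8 up to byte 10.
    while start < 10 and start < len(raw) and raw[start] in (0x00, 0x20):
        start += 1
    # Decode as little-endian UTF-16, replacing anything undecodable.
    return raw[start:].decode("utf-16-le", errors="replace")


# Made-up sample: 8-byte header, then big-endian UTF-16 text without a BOM.
raw = b"UNICODE\x00" + "AB1".encode("utf-16-be")

broken = decode_with_10_byte_skip(raw)
correct = raw[8:].decode("utf-16-be")

print(repr(broken))   # 'AB\ufffd' : the last character is lost
print(repr(correct))  # 'AB1'
```

The skip consumes the 0x00 high byte of "A", so every following pair of bytes lines up as little-endian, and the final 0x31 is left without a partner byte.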

I get the feeling this 10 byte thing was done to fix other comments stored in BE. The only real way to fix this is: 1) look at only the first 8 bytes; followed by 2) guess the byte order.
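
For step 2, one possible heuristic (an assumption on my part, not what ExifTool or this library actually does) is to honour a BOM if there is one and otherwise compare where the zero bytes fall. For mostly-ASCII text, big-endian UTF-16 puts zeros at even offsets and little-endian puts them at odd offsets. The function name and the default are hypothetical.

```python
def guess_utf16_encoding(body: bytes, default: str = "utf-16-le") -> str:
    """Guess a Python codec name for UTF-16 comment bytes.

    Heuristic sketch only: it works for mostly-ASCII text and may guess
    wrong for text dominated by non-Latin characters.
    """
    # A BOM settles it; the "utf-16" codec reads and strips the BOM itself.
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    even_zeros = body[0::2].count(0)  # high bytes if big-endian
    odd_zeros = body[1::2].count(0)   # high bytes if little-endian
    if even_zeros > odd_zeros:
        return "utf-16-be"
    if odd_zeros > even_zeros:
        return "utf-16-le"
    return default  # no evidence either way


body = "AB1".encode("utf-16-be")  # made-up sample text
encoding = guess_utf16_encoding(body)
print(encoding, body.decode(encoding))
```

A guess like this can still be wrong, which is part of why exposing the raw bytes is attractive.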

Another option is to correct the 10 bytes vs. 8 bytes issue, but then store the header and comment bytes separately in the directory and NOT try to guess the encoding or byte order -- at least not try very hard.

Users would then have access to the actual comment bytes that matter and could try other encodings or byte orders as they saw fit.
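
As a rough illustration of that second option, the directory could hold something like the following. This is a Python sketch rather than a proposed C# API: the `RawUserComment` type and its members are hypothetical, and the sample text is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawUserComment:
    """Hypothetical container: header and comment bytes kept separately."""

    header: bytes  # the fixed 8-byte character-code prefix, untouched
    data: bytes    # everything after the header, untouched

    @classmethod
    def from_bytes(cls, raw: bytes) -> "RawUserComment":
        return cls(header=raw[:8], data=raw[8:])

    def decode(self, encoding: str, errors: str = "replace") -> str:
        # The caller picks the encoding and byte order; nothing is guessed.
        return self.data.decode(encoding, errors)


comment = RawUserComment.from_bytes(b"UNICODE\x00" + "AB1".encode("utf-16-be"))
print(comment.header)               # b'UNICODE\x00'
print(comment.decode("utf-16-be"))  # AB1
```

A caller who knows the file came from a big-endian writer can ask for `utf-16-be` directly, and one who does not can try both.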