drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Encoding Error extracting XMP data from PNG file

Webreaper opened this issue

Hi Drew,

I've got a user who's having an issue when Damselfly attempts to extract metadata from a PNG file.

I've attached the image here:

[Attached image: CaptionTest]

When I run the image through exiftool you can see the caption "This is the caption" in the 'Description' field.

[Screenshot: exiftool output]

However, when I load the metadata using metadata-extractor, the XmpDirectory doesn't have any tags, and just has an error saying "Error processing XMP data: Unsupported Encoding". Any ideas what's wrong here?

I'm on .NET 8, running on Linux, and using MetadataExtractor v2.8.1. Let me know if you need any other info.

The debugger shows the embedded XMP is invalid. The inner exception is:

System.Xml.XmlException: 'Data at the root level is invalid. Line 177, position 20.'

Looking at the decoded bytes we see:

[Screenshot: decoded XMP bytes]

I assume a tool re-wrote the XMP, reducing the length of the segment, without zeroing out the overflow or shortening the segment.

ExifTool must have some logic for this case. We can look to do the same. I haven't thought about this very much, but it seems that scanning for the <?xpacket end="r"?> marker could help here. I don't know the significance of the r here though.

Are you willing and able to donate this image to the regression test suite so that we can track this issue there?

Any scanning here should ideally be performed:

  • in reverse, to reduce overhead
  • on the bytes directly, not on the decoded string

Once found, the byte array length can be adjusted when handing off to XmpCore:

var xmpMeta = XmpMetaFactory.ParseFromBuffer(xmpBytes, offset, length, parseOptions);

Just going to add a bit of information here, as this is my image, which I provided to Webreaper for debugging. If it helps you at all, the XMP was written with Lightroom Classic version 13.1. You may send this image to whomever you need to in order to work through the issue.

@grainsoflight we maintain a repository of test images, and I think your case is interesting enough that I'd like to add it there: https://github.com/drewnoakes/metadata-extractor-images

It's a public repository, so please ensure you're happy with it being preserved in that way (though attaching the image to the post here means it's essentially already public).

That's fine with me.

This problem exists in the Java library as well, though with a slightly different error message.

JAVA   [ERROR: XMP] Error processing XMP data: XML parsing failure
DOTNET [ERROR: XMP] Error processing XMP data: Unsupported Encoding

> in reverse, to reduce overhead

Reverse order won't work. It's likely that a stale end marker still exists at the very end of the segment (see the screenshot above for an example), so a reverse scan would find that one first.
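The forward scan discussed above could be sketched roughly as follows. This is only an illustration, not necessarily how the library's eventual fix works. The class name XmpPacketTrimmer and the method FindPacketLength are hypothetical, and the sketch assumes the XMP bytes are in an ASCII-compatible encoding such as UTF-8 and that the marker text does not appear inside the XML content itself. It works on the bytes directly and returns a length covering everything up to and including the first <?xpacket end=...?> instruction, so any stale bytes after it are ignored.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class XmpPacketTrimmer
{
    // ASCII bytes that open the trailing packet wrapper, e.g. <?xpacket end="w"?>
    private static readonly byte[] EndMarker = Encoding.ASCII.GetBytes("<?xpacket end=");

    // ASCII bytes that close an XML processing instruction.
    private static readonly byte[] PiClose = Encoding.ASCII.GetBytes("?>");

    // Returns the number of bytes, starting at 'offset', that make up the XMP packet
    // up to and including the FIRST end-of-packet instruction. If no complete marker
    // is found, the original length is returned unchanged.
    public static int FindPacketLength(byte[] buffer, int offset, int length)
    {
        int end = offset + length;

        int marker = IndexOf(buffer, offset, end, EndMarker);
        if (marker < 0)
            return length;

        int close = IndexOf(buffer, marker + EndMarker.Length, end, PiClose);
        if (close < 0)
            return length;

        return close + PiClose.Length - offset;
    }

    // Naive forward search for 'needle' within buffer[start..end).
    private static int IndexOf(byte[] buffer, int start, int end, byte[] needle)
    {
        for (int i = start; i <= end - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && buffer[i + j] == needle[j])
                j++;

            if (j == needle.Length)
                return i;
        }

        return -1;
    }
}
```

The result would then replace the length argument in the hand-off to XmpCore shown earlier, for example XmpMetaFactory.ParseFromBuffer(xmpBytes, offset, XmpPacketTrimmer.FindPacketLength(xmpBytes, offset, length), parseOptions). Falling back to the original length when no marker is found keeps the behaviour unchanged for segments that omit the packet wrapper.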

How can I pick this up to test? Or will you be making a new release?

I hope to get a build out soon. If you want to test before that, you can build your own version.

No prob. I can wait. Just wasn't sure if you had a dev pipeline / repo, similar to how Matt does it with SkiaSharp.

I'd love to set up automatic releases from CI. We have it in NetMQ too and it's very handy. One of these days :)

Yeah, took me ages to set it up with Damselfly - GitHub Actions are a bit of a PITA. But totally worth it.