drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Error processing XMP data: Unsupported Encoding

MadsSwensson opened this issue

When extracting metadata, I get the following error: "Error processing XMP data: Unsupported Encoding"

Attached is a sample image: AmsterdamSnowsuitDoubleZipAW19_DarkGreen_Set_01

Here is the full stack trace:

XmpCore.XmpException: Unsupported Encoding ---> XmpCore.XmpException: XML parsing failure ---> System.Xml.XmlException: Data at the root level is invalid. Line 225, position 20.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Load(XmlReader reader)
at XmpCore.Impl.XmpMetaParser.ParseStream(Stream stream, ParseOptions options)
--- End of inner exception stack trace ---
at XmpCore.Impl.XmpMetaParser.ParseStream(Stream stream, ParseOptions options)
at XmpCore.Impl.XmpMetaParser.ParseXmlFromByteBuffer(ByteBuffer buffer, ParseOptions options)
--- End of inner exception stack trace ---
at XmpCore.Impl.XmpMetaParser.ParseXmlFromByteBuffer(ByteBuffer buffer, ParseOptions options)
at XmpCore.Impl.XmpMetaParser.Parse(ByteBuffer byteBuffer, ParseOptions options)
at XmpCore.XmpMetaFactory.ParseFromBuffer(Byte[] buffer, Int32 offset, Int32 length, ParseOptions options)
at MetadataExtractor.Formats.Xmp.XmpReader.Extract(Byte[] xmpBytes, Int32 offset, Int32 length)

Can you share your code please?

Never mind. I was able to reproduce the issue.

The XMP is:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c145 79.163499, 2018/08/13-16:40:22        ">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
...snip...
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>>

Note the double closing bracket at the end. Perhaps some software that edited this image failed to do so properly.

If I modify the program's data while it's running to exclude that last character, it produces 172 XMP tags and no error.

So the short answer here is that the image is invalid. I think this is the kind of error the library might reasonably recover from; however, I'm not sure right now what the right approach is.

@MadsSwensson do you see this on other images too? I'm wondering if you see other malformed endings to these files. I'm assuming that software re-writes the XMP in place without trimming data from the end, resulting in any of (for example):

<?xpacket end="r"?>
<?xpacket end="r"?>>
<?xpacket end="r"?>?>
<?xpacket end="r"?>"?>
...

@drewnoakes
I have four images that fail with the same error. It sounds about right that the software the editor uses has invalidated the images, since this is the first time in ~2 years that it's failing.
She will be back from vacation on Monday, and I will investigate further then.
Thanks for your time.

Reopening this as in general I like to recover from errors where it is possible, reliable and safe to do so.

In this case it may be possible to check for the actual location of the trailing <?xpacket end="r"?> tag. We'd have to be careful to make sure this wouldn't break other scenarios, however.
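
For illustration only, here is a rough sketch of what locating the trailer in an already-buffered byte array might look like. It is not code from the library: the names XmpPacketTrimmer and GetTrimmedLength are hypothetical, and it assumes the packet uses a single-byte-compatible encoding such as UTF-8 (a UTF-16 or UTF-32 packet would need a different search). It scans forward for the first <?xpacket end= marker, on the assumption that leftover bytes after an in-place rewrite could themselves contain a stale trailer.

```csharp
using System;
using System.Text;

public static class XmpPacketTrimmer // hypothetical helper, not part of MetadataExtractor or XmpCore
{
    // ASCII bytes that begin the packet trailer, e.g. <?xpacket end="r"?> or <?xpacket end="w"?>.
    private static readonly byte[] TrailerStart = Encoding.ASCII.GetBytes("<?xpacket end=");

    /// <summary>
    /// Returns the number of bytes, counted from <paramref name="offset"/>, up to and
    /// including the "?>" that closes the first xpacket trailer. If no complete trailer
    /// is found, returns <paramref name="length"/> unchanged so the data is left as-is.
    /// </summary>
    public static int GetTrimmedLength(byte[] buffer, int offset, int length)
    {
        int end = offset + length;

        for (int i = offset; i <= end - TrailerStart.Length; i++)
        {
            if (!StartsWith(buffer, i, TrailerStart))
                continue;

            // Trailer start found: look for the "?>" that closes the processing instruction.
            for (int j = i + TrailerStart.Length; j + 1 < end; j++)
            {
                if (buffer[j] == (byte)'?' && buffer[j + 1] == (byte)'>')
                    return j + 2 - offset;
            }

            break; // the trailer is never closed, so do not trim anything
        }

        return length;
    }

    private static bool StartsWith(byte[] buffer, int index, byte[] pattern)
    {
        for (int k = 0; k < pattern.Length; k++)
        {
            if (buffer[index + k] != pattern[k])
                return false;
        }
        return true;
    }
}
```

The returned value could then be passed as the length argument of XmpMetaFactory.ParseFromBuffer, the call shown in the stack trace above, so that a stray trailing > or "?> never reaches the XML parser. Whether that is safe for every file is exactly the open question here.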

There could be a fairly large efficiency challenge with this kind of problem: any forward inspection you try to do before handing it off to XmpCore implies loading the entire thing into memory first. Many streaming opportunities will get lost.

@kwhopper agree that's a good thing to think about if/when implementing this. It should still be possible to detect the closing tag in a streaming scenario without buffering the full content.
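
As a sketch of the detection part only, the following shows one way the end of the trailer could be found while reading a stream in small chunks, keeping just a few bytes of matcher state instead of the whole content. The names XmpTrailerScanner and FindPacketLength are hypothetical, and a single-byte-compatible encoding such as UTF-8 is assumed. The simple reset on mismatch relies on '<' appearing only at the start of the marker.

```csharp
using System;
using System.IO;
using System.Text;

public static class XmpTrailerScanner // hypothetical helper, not part of MetadataExtractor or XmpCore
{
    private static readonly byte[] TrailerStart = Encoding.ASCII.GetBytes("<?xpacket end=");

    /// <summary>
    /// Reads the stream chunk by chunk and returns the number of bytes, counted from the
    /// stream's starting position, up to and including the "?>" that closes the first
    /// xpacket trailer. Returns -1 if the stream ends before a complete trailer is seen.
    /// </summary>
    public static long FindPacketLength(Stream stream, int chunkSize = 4096)
    {
        var chunk = new byte[chunkSize];
        long position = 0;            // index of the byte currently being examined
        int matched = 0;              // how many bytes of TrailerStart have matched so far
        bool inTrailer = false;       // true once the whole marker has been seen
        bool sawQuestionMark = false; // true if the previous byte inside the trailer was '?'

        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            for (int i = 0; i < read; i++, position++)
            {
                byte b = chunk[i];

                if (!inTrailer)
                {
                    if (b == TrailerStart[matched])
                        matched++;
                    else
                        matched = b == TrailerStart[0] ? 1 : 0; // '<' can only restart a match

                    if (matched == TrailerStart.Length)
                        inTrailer = true;
                }
                else
                {
                    if (sawQuestionMark && b == (byte)'>')
                        return position + 1;

                    sawQuestionMark = b == (byte)'?';
                }
            }
        }

        return -1;
    }
}
```

This only detects where the packet ends. Limiting what the XML parser actually sees on a forward-only stream would need additional work and is not shown here.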

I guess the library only uses XmpMetaFactory when byte arrays are present. One usage is directly against one, while the second is against a MemoryStream-wrapped byte buffer. So, the full content is always buffered anyway. Unless/until we try to use XmpCore against an actual stream, this should be possible.

"It should still be possible to detect the closing tag in a streaming scenario without buffering the full content."

... unless you use the stream versions of XmpMetaFactory. XmpCore will use an XmlReader underneath, and we'll tend to lose control of the parsing, which is actually what's going on with these errors anyway. In that case, some work would need to be done in XmpCore for this to work, assuming it's possible.

Any news on this issue?