drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files


Keyword extraction for PNG files?

Webreaper opened this issue

I'm trying to pull the keywords from the attached image. If I run Exiftool, the keywords show up correctly (e.g., 'tree' and 'autumn'). But reading it using the .Net Metadata-Extractor doesn't seem to load the keywords; they're not in the IptcDirectory, nor anywhere that I can find. I've trawled through the entire structure and tags in the debugger and can't see any keyword tags anywhere. What am I missing? Are PNG keywords supported?

[Attached image: tree]

Looks like a bug in how we process IPTC data in PNG files. I won't have time to look at it for a while. Would you like to investigate and submit a pull request if you find a fix?

Might be able to, although wouldn't have the first clue where to start....

Hey. I’m only a user, so cannot help much! These keywords (to an untrained eye):

  • are not read by Adobe or Apple Preview
  • all three appear under one heading in exiftool, while new ones are added one-per-heading (see pt. 1 below)
  • when added to in exiftool, only the added ones are visible in Adobe/Apple Preview (see pt. 2 below)
  • even the added ones are not visible to metadata-extractor (may be that our implementation is to blame, I wouldn’t know!)
  • the xmp.dc:subject equivalents added by exiftool's MWG option are visible but do not contain the original ones; the fact that exiftool won’t copy the original ones to dc:subject may provide a pointer (?) – pt. 3 below
  1. Original image (exiftool -v2):
PNG zTXt (113 bytes):
  + [Photoshop directory, 40 bytes]
  | IPTCData (SubDirectory) -->
  | - Tag 0x0404 (28 bytes)
  | + [IPTC directory, 28 bytes]
  | | -- IPTCApplication record --
  | | Keywords = Autumn, tree, le
  | | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
  2. Added two keywords using exiftool -use MWG -MWG:Keywords+="":
PNG zTXt (132 bytes):
  + [Photoshop directory, 64 bytes]
  | IPTCData (SubDirectory) -->
  | - Tag 0x0404 (52 bytes)
  | + [IPTC directory, 52 bytes]
  | | -- IPTCApplication record --
  | | Keywords = Autumn, tree, le
  | | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
  | | ApplicationRecordVersion = 4
  | | - Tag 0x0000, IPTCApplication record (2 bytes, int16u)
  | | Keywords = atari
  | | - Tag 0x0019, IPTCApplication record (5 bytes, string[0,64])
  | | Keywords = commodore
  | | - Tag 0x0019, IPTCApplication record (9 bytes, string[0,64])
  3. xmp.dc:subjects array after adding new ones in pt. 2:
PNG iTXt (905 bytes):
  + [XMP directory, 883 bytes]
  | XMPToolkit = Image::ExifTool 12.08
  | Subject = atari
  | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 10'
  | Subject = commodore
  | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 11'

Thanks @paperboyo that's really helpful. This does seem like quite an exotic way of storing keywords in a PNG file, but clearly the data is in there and readable, so I'd like for MetadataExtractor to be able to pull it out.

Might be able to, although wouldn't have the first clue where to start....

I'm happy to help where I can. At a high level:

  • The PNG file is arranged into chunks. Each chunk has a type.
  • Your file has a zTXt chunk which currently produces this output:
    [PNG-zTXt - 0x000d] Textual Data = Raw profile type iptc: 
    IPTC profile
          40
    3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c0200
    00020004
    
  • The PNG handling code is supposed to handle that output in some way, but is failing to do so.
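To make the zTXt output above easier to reason about, here is a minimal sketch of splitting that text into its three parts: a profile name line, a declared byte count, and hex digits wrapped across lines. `RawProfileText` and its members are hypothetical names for illustration and are not part of MetadataExtractor; the exact whitespace in `Sample` is an assumption based on the output shown, and `Convert.FromHexString` assumes .NET 5 or later.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of MetadataExtractor.
public static class RawProfileText
{
    // Decompressed zTXt text as shown above; exact whitespace is an assumption.
    public const string Sample =
        "\nIPTC profile\n      40\n" +
        "3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c0200\n" +
        "00020004\n";

    // Splits the text into profile name, declared byte count and decoded bytes.
    public static (string Name, int DeclaredLength, byte[] Data) Parse(string text)
    {
        // Trim each line and drop blank ones so leading padding does not matter.
        string[] lines = text
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();

        if (lines.Length < 3)
            throw new FormatException("Expected a name line, a length line and hex data.");

        string name = lines[0];
        int declaredLength = int.Parse(lines[1], CultureInfo.InvariantCulture);

        // The hex digits are wrapped across lines; join them before decoding.
        byte[] data = Convert.FromHexString(string.Concat(lines.Skip(2)));
        return (name, declaredLength, data);
    }
}
```

Calling `RawProfileText.Parse(RawProfileText.Sample)` gives the name "IPTC profile", a declared length of 40, and 40 decoded bytes whose first four bytes are the ASCII characters "8BIM". That signature matches the Photoshop directory exiftool reports for this chunk.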

Currently the tree of directories for your file looks like this:

- PNG-IHDR
- PNG-iCCP
    - ICC Profile
- Exif IFD0
    - Exif SubIFD
- PNG-pHYs
- XMP
- PNG-zTXt
- File Type
- File

I would expect to see in there something like:

- PNG-zTXt
    - IPTC

In other words, the PNG-zTXt directory produces a child IPTC directory.

There is no error message so either we don't attempt this at all, or it's attempting but deciding it cannot proceed for some reason.
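One thing worth checking while debugging: the exiftool listing earlier in the thread shows the IPTC record nested inside a Photoshop directory under tag 0x0404, so the 40 decoded bytes are not bare IPTC data. Whether that wrapper is what stops the existing code is not established here. The sketch below only illustrates the nesting by walking the bytes by hand. `PhotoshopIptcSketch` and its methods are hypothetical names, not MetadataExtractor APIs; it ignores IPTC extended-length datasets and assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of MetadataExtractor.
public static class PhotoshopIptcSketch
{
    // The 40 bytes from the hex dump in the zTXt output, line break removed.
    public const string Hex =
        "3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c020000020004";

    // Returns the payload of the first 8BIM resource with the given ID,
    // or an empty array if no such resource is found.
    public static byte[] FindResource(byte[] data, int resourceId)
    {
        int pos = 0;
        while (pos + 12 <= data.Length)
        {
            if (Encoding.ASCII.GetString(data, pos, 4) != "8BIM")
                break;
            int id = (data[pos + 4] << 8) | data[pos + 5];
            pos += 6;

            // Pascal-style name: length byte plus name, padded to an even total.
            int nameBytes = 1 + data[pos];
            if (nameBytes % 2 != 0) nameBytes++;
            pos += nameBytes;
            if (pos + 4 > data.Length) break;

            // Big-endian 32-bit payload size.
            int size = (data[pos] << 24) | (data[pos + 1] << 16)
                     | (data[pos + 2] << 8) | data[pos + 3];
            pos += 4;
            if (size < 0 || size > data.Length - pos) break;

            if (id == resourceId)
                return data.AsSpan(pos, size).ToArray();

            // Resource payloads are padded to an even length as well.
            pos += size + (size % 2);
        }
        return Array.Empty<byte>();
    }

    // Reads standard IPTC datasets: 0x1C marker, record, dataset, 16-bit length, value.
    public static List<(int Record, int DataSet, byte[] Value)> ReadDataSets(byte[] iptc)
    {
        var result = new List<(int Record, int DataSet, byte[] Value)>();
        int pos = 0;
        while (pos + 5 <= iptc.Length && iptc[pos] == 0x1C)
        {
            int record = iptc[pos + 1];
            int dataSet = iptc[pos + 2];
            int length = (iptc[pos + 3] << 8) | iptc[pos + 4];
            pos += 5;
            if (length > iptc.Length - pos) break;
            result.Add((record, dataSet, iptc.AsSpan(pos, length).ToArray()));
            pos += length;
        }
        return result;
    }
}
```

For this file, `FindResource(Convert.FromHexString(PhotoshopIptcSketch.Hex), 0x0404)` returns 28 bytes, and `ReadDataSets` on those bytes yields two datasets: record 2, dataset 0x19 with the 16-byte value "Autumn, tree, le", and record 2, dataset 0x00 with the two bytes 00 04. This agrees with the exiftool output (Keywords and ApplicationRecordVersion = 4) and shows that the keywords are stored as one comma-separated string rather than one dataset per keyword.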

You can stick a breakpoint here:

else if (keyword == "Raw profile type iptc")
{
    if (TryProcessRawProfile(out int byteCount))
    {
        yield return new IptcReader().Extract(new SequentialByteArrayReader(textBytes), byteCount);
    }
    else
    {
        yield return ReadTextDirectory(keyword, textBytes, chunkType);
    }
}

If that doesn't get hit for some reason, try here:

else if (chunkType == PngChunkType.zTXt)
{
    var reader = new SequentialByteArrayReader(bytes);
    var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_latin1Encoding);
    var compressionMethod = reader.GetSByte();
    var bytesLeft = bytes.Length - keyword.Length - 1 - 1 - 1 - 1;
    byte[]? textBytes = null;
    if (compressionMethod == 0)
    {
        if (!TryDeflate(bytes, bytesLeft, out textBytes, out string? errorMessage))
        {
            var directory = new PngDirectory(PngChunkType.zTXt);
            directory.AddError($"Exception decompressing {nameof(PngChunkType.zTXt)} chunk with keyword \"{keyword}\": {errorMessage}");
            yield return directory;
        }
    }
    else
    {
        var directory = new PngDirectory(PngChunkType.zTXt);
        directory.AddError("Invalid compression method value");
        yield return directory;
    }
    if (textBytes != null)
    {
        foreach (var directory in ProcessTextChunk(keyword, textBytes))
        {
            yield return directory;
        }
    }
}
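For orientation while stepping through that branch, a zTXt chunk's data is laid out as a Latin-1 keyword of 1 to 79 bytes, a null separator, a one-byte compression method (0 means zlib/deflate), and then the compressed text. The following is a standalone sketch of that layout using only the .NET base library. `ZtxtSketch` is a hypothetical name and this is not the library's implementation; it assumes .NET 6 or later for `ZLibStream`.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of MetadataExtractor.
public static class ZtxtSketch
{
    // Builds zTXt chunk data: Latin-1 keyword, NUL, compression method 0, zlib stream.
    public static byte[] Build(string keyword, string text)
    {
        using var output = new MemoryStream();
        byte[] keywordBytes = Encoding.Latin1.GetBytes(keyword);
        output.Write(keywordBytes, 0, keywordBytes.Length);
        output.WriteByte(0); // keyword terminator
        output.WriteByte(0); // compression method 0 = zlib/deflate
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            byte[] textBytes = Encoding.Latin1.GetBytes(text);
            zlib.Write(textBytes, 0, textBytes.Length);
        }
        return output.ToArray();
    }

    // Parses zTXt chunk data back into its keyword and decompressed text.
    public static (string Keyword, string Text) Parse(byte[] chunk)
    {
        int terminator = Array.IndexOf(chunk, (byte)0);
        if (terminator < 1 || terminator > 79)
            throw new InvalidDataException("Keyword must be 1 to 79 bytes long.");
        if (terminator + 1 >= chunk.Length)
            throw new InvalidDataException("Missing compression method.");
        if (chunk[terminator + 1] != 0)
            throw new InvalidDataException("Invalid compression method value");

        string keyword = Encoding.Latin1.GetString(chunk, 0, terminator);

        // Compressed data starts after the keyword, its terminator and the method byte.
        int dataOffset = terminator + 2;
        using var compressed = new MemoryStream(chunk, dataOffset, chunk.Length - dataOffset);
        using var zlib = new ZLibStream(compressed, CompressionMode.Decompress);
        using var plain = new MemoryStream();
        zlib.CopyTo(plain);
        return (keyword, Encoding.Latin1.GetString(plain.ToArray()));
    }
}
```

Round-tripping a chunk through `Build` and `Parse` with the keyword "Raw profile type iptc" returns the same keyword and text, which is a quick way to confirm the offsets when comparing against what the debugger shows for `keyword`, `compressionMethod` and `textBytes`.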

Thanks!!