Keyword extraction for PNG files?
Webreaper opened this issue
I'm trying to pull the keywords from the attached image. If I run Exiftool, the keywords show up correctly (e.g., 'tree' and 'autumn'). But reading it with the .NET Metadata-Extractor doesn't seem to load the keywords; they're not in the IptcDirectory, nor anywhere else that I can find. I've trawled through the entire structure and tags in the debugger and can't see any keyword tags anywhere. What am I missing? Are PNG keywords supported?
Looks like a bug in how we process IPTC data in PNG files. I won't have time to look at it for a while. Would you like to investigate and submit a pull request if you find a fix?
Might be able to, although wouldn't have the first clue where to start....
Hey. I’m only a user, so cannot help much! These keywords (to an untrained eye):
- are not read by Adobe or Apple Preview
- all three appear under one heading in exiftool, while new ones are added one-per-heading (see pt. 1 below)
- when added to in exiftool, only the added ones are visible in Adobe/Apple Preview (see pt. 2 below)
- even the added ones are not visible to metadata-extractor (may be that our implementation is to blame, I wouldn’t know!)
- the xmp.dc:subject equivalents added by exiftool's MWG option are visible but do not contain the original ones; the fact that exiftool won’t copy the original ones to dc:subjects may provide a pointer (?) (see pt. 3 below)
- Original image (exiftool -v2):
PNG zTXt (113 bytes):
+ [Photoshop directory, 40 bytes]
| IPTCData (SubDirectory) -->
| - Tag 0x0404 (28 bytes)
| + [IPTC directory, 28 bytes]
| | -- IPTCApplication record --
| | Keywords = Autumn, tree, le
| | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
- Added two keywords using exiftool -use MWG -MWG:Keywords+="":
PNG zTXt (132 bytes):
+ [Photoshop directory, 64 bytes]
| IPTCData (SubDirectory) -->
| - Tag 0x0404 (52 bytes)
| + [IPTC directory, 52 bytes]
| | -- IPTCApplication record --
| | Keywords = Autumn, tree, le
| | - Tag 0x0019, IPTCApplication record (16 bytes, string[0,64])
| | ApplicationRecordVersion = 4
| | - Tag 0x0000, IPTCApplication record (2 bytes, int16u)
| | Keywords = atari
| | - Tag 0x0019, IPTCApplication record (5 bytes, string[0,64])
| | Keywords = commodore
| | - Tag 0x0019, IPTCApplication record (9 bytes, string[0,64])
- xmp.dc:subjects array after adding new ones in pt. 2:
PNG iTXt (905 bytes):
+ [XMP directory, 883 bytes]
| XMPToolkit = Image::ExifTool 12.08
| Subject = atari
| - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 10'
| Subject = commodore
| - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li 11'
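The "one heading" versus "one-per-heading" difference between pts. 1 and 2 can be illustrated with a minimal sketch. It is not exiftool's or metadata-extractor's code: the sample bytes are rebuilt by hand from the tag sizes in the pt. 2 listing (not dumped from the file), the helper names are made up for illustration, and UTF-8 is assumed for the text.

```python
import struct

IPTC_MARKER = 0x1C
KEYWORDS = (2, 0x19)  # record 2, dataset 0x19, as shown in the -v2 listing


def make_dataset(record, number, value):
    """Hypothetical helper: marker, record, dataset number, 16-bit length, value."""
    return struct.pack(">BBBH", IPTC_MARKER, record, number, len(value)) + value


def iter_datasets(data):
    """Yield (record, number, value) for each standard-length dataset."""
    pos = 0
    while pos + 5 <= len(data):
        marker, record, number, length = struct.unpack_from(">BBBH", data, pos)
        if marker != IPTC_MARKER:
            break
        if length & 0x8000:
            raise ValueError("extended dataset lengths are not handled in this sketch")
        pos += 5
        yield record, number, data[pos:pos + length]
        pos += length


# Rebuilt from the pt. 2 listing: sizes 16, 2, 5 and 9 bytes.
SAMPLE = (
    make_dataset(2, 0x19, b"Autumn, tree, le")
    + make_dataset(2, 0x00, struct.pack(">H", 4))  # ApplicationRecordVersion = 4
    + make_dataset(2, 0x19, b"atari")
    + make_dataset(2, 0x19, b"commodore")
)

keywords = [
    value.decode("utf-8")
    for record, number, value in iter_datasets(SAMPLE)
    if (record, number) == KEYWORDS
]
print(len(SAMPLE))  # 52, matching "[IPTC directory, 52 bytes]"
print(keywords)     # ['Autumn, tree, le', 'atari', 'commodore']
```

Read this way, the original keywords are a single 16-byte Keywords dataset that happens to contain commas, while each keyword added by exiftool gets a dataset of its own.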
Thanks @paperboyo that's really helpful. This does seem like quite an exotic way of storing keywords in a PNG file, but clearly the data is in there and readable, so I'd like for MetadataExtractor to be able to pull it out.
Might be able to, although wouldn't have the first clue where to start....
I'm happy to help where I can. At a high level:
- The PNG file is arranged into chunks. Each chunk has a type.
- Your file has a zTXt chunk which currently produces this output:
[PNG-zTXt - 0x000d] Textual Data = Raw profile type iptc: IPTC profile 40 3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c0200 00020004
- The PNG handling code is supposed to handle that output in some way, but is failing to do so.
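To show what that Textual Data value holds, here is a rough sketch that turns it back into bytes and unwraps the Photoshop resource block around the IPTC data. It is not the library's code. The function names are made up, and the layout it assumes (a free-form description, a decimal byte count, then hex digits) is inferred from the output above rather than taken from a spec.

```python
import struct

# The text after "Raw profile type iptc:", whitespace-flattened as printed above.
TEXT = (
    "IPTC profile 40 "
    "3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c0200 "
    "00020004"
)


def decode_raw_profile(text):
    """Assumed layout: description words, decimal byte count, hex digits."""
    tokens = text.split()
    idx = next(i for i, token in enumerate(tokens) if token.isdigit())
    expected = int(tokens[idx])
    data = bytes.fromhex("".join(tokens[idx + 1:]))
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, decoded {len(data)}")
    return data


def iter_photoshop_resources(data):
    """Yield (resource_id, payload) for each '8BIM' image resource block."""
    pos = 0
    while pos + 12 <= len(data):
        if data[pos:pos + 4] != b"8BIM":
            break
        (resource_id,) = struct.unpack_from(">H", data, pos + 4)
        pos += 6
        name_len = data[pos]
        pos += 1 + name_len
        if (1 + name_len) % 2:  # Pascal-style name is padded to an even size
            pos += 1
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield resource_id, data[pos:pos + size]
        pos += size + (size % 2)  # payload is padded to an even size as well


profile = decode_raw_profile(TEXT)
for resource_id, payload in iter_photoshop_resources(profile):
    print(f"resource 0x{resource_id:04X}: {len(payload)} bytes, starts with {payload[:3].hex()}")
# resource 0x0404: 28 bytes, starts with 1c0219
```

The 40 bytes are one 8BIM resource with ID 0x0404 wrapping 28 bytes of IPTC data, which lines up with the "Photoshop directory, 40 bytes" and "IPTC directory, 28 bytes" that exiftool reports. So the IPTC block is not stored bare; it sits inside a Photoshop resource.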
Currently the tree of directories for your file looks like this:
- PNG-IHDR
- PNG-iCCP
- ICC Profile
- Exif IFD0
- Exif SubIFD
- PNG-pHYs
- XMP
- PNG-zTXt
- File Type
- File
I would expect to see in there something like:
- PNG-zTXt
  - IPTC
In other words, the PNG-zTXt directory produces a child IPTC directory.
There is no error message, so either we don't attempt this at all, or we attempt it but decide we cannot proceed for some reason.
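Before stepping through the reader, it may help to confirm what the zTXt chunk holds independently of the library. The sketch below walks PNG chunks and inflates a zTXt chunk using only the Python standard library. The function names are made up, and the sample is a tiny in-memory stand-in (signature, one zTXt chunk, IEND) rather than the attached image. The line breaks inside RAW_PROFILE are an assumption; the hex digits are copied from the output above.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png):
    """Yield (type, data) per chunk: 4-byte length, 4-byte type, data, 4-byte CRC."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack_from(">I4s", png, pos)
        yield chunk_type.decode("ascii"), png[pos + 8:pos + 8 + length]
        pos += 12 + length


def decode_ztxt(data):
    """zTXt: keyword, NUL separator, compression method byte, zlib stream."""
    keyword, rest = data.split(b"\x00", 1)
    if rest[0] != 0:
        raise ValueError("unknown zTXt compression method")
    text = zlib.decompress(rest[1:])
    return keyword.decode("latin-1"), text.decode("latin-1")


def make_chunk(chunk_type, data):
    """Hypothetical helper used only to build the stand-in sample below."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


RAW_PROFILE = (
    "\nIPTC profile\n      40\n"
    "3842494d040400000000001c1c02190010417574756d6e2c20747265652c206c651c0200\n"
    "00020004\n"
)
ztxt_data = b"Raw profile type iptc\x00\x00" + zlib.compress(RAW_PROFILE.encode("latin-1"))
sample = PNG_SIGNATURE + make_chunk(b"zTXt", ztxt_data) + make_chunk(b"IEND", b"")

for chunk_type, data in iter_chunks(sample):
    if chunk_type == "zTXt":
        keyword, text = decode_ztxt(data)
        print(keyword, "->", text.split()[:3])
```

Running the same two functions over the bytes of the real file would show whether the keyword is exactly "Raw profile type iptc", which is presumably what the reader has to recognise before it can build an IPTC directory.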
You can stick a breakpoint here:
metadata-extractor-dotnet/MetadataExtractor/Formats/Png/PngMetadataReader.cs
Lines 409 to 419 in 427ab46
If that doesn't get hit for some reason, try here:
metadata-extractor-dotnet/MetadataExtractor/Formats/Png/PngMetadataReader.cs
Lines 224 to 255 in 427ab46
Thanks!!