Webreaper / Damselfly

Damselfly is a server-based Photograph Management app. The goal of Damselfly is to index an extremely large collection of images, and allow easy search and retrieval of those images, using metadata such as the IPTC keyword tags, as well as the folder and file names. Damselfly includes support for object/face detection.


Seems not support non-Latin keyword.

02015678 opened this issue:

Thanks for this great open source project. I've been looking for an alternative to Picasa for years, and finally found Damselfly.
Some feedback below.

  1. A newly added ASCII keyword is not displayed in real time. I understand Damselfly writes the EXIF/IPTC data in the background, but I think the web front end should display a newly added keyword immediately.
  2. Damselfly does not seem to support non-Latin keywords. I checked the log: writing an ASCII keyword works fine, but when I write a non-Latin Unicode keyword the status changes to "reconnecting...", and once the connection is re-established the keyword does not appear in the log at all. It looks as if Damselfly simply crashes on non-Latin Unicode keywords.

Thanks for the feedback. Do you have an example of the keyword you were adding, so I can do some testing? :)

In terms of the "not adding keywords in real-time", that's due to the way the app is designed. You can read more about the process here: https://github.com/Webreaper/Damselfly/blob/master/docs/Technical.md#how-does-damselfly-manage-exif-data - in reality I find it only takes 30s or so before the updated tags are displayed once the EXIF data has been written.

Thank you for your reply.

  1. An example: try "damselfly" (the insect) in other languages, e.g. 豆娘 in Chinese.
  2. Either way is okay, though I personally think Picasa handles this better. It has its own database, so face-tag and keyword-tag operations appear in "real time" in the front-end user interface, while the back-end write can take effect later in the background.

Okay, I had a look at this, using a test image. So the first problem is that due to all sorts of inconsistencies with types of quotes and backslashes, which are then incompatible with EXIF keywords, I sanitise the text to remove unicode sequences.
However, even if I remove that, it seems that unicode characters aren't supported trivially in EXIF keywords.
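As a rough illustration of the trade-off described above, a sanitiser could normalise the troublesome quote and backslash characters without discarding non-Latin text. This is a hypothetical Python sketch, not Damselfly's actual code; the function name `sanitise_keyword` and the exact set of characters it rewrites are assumptions.

```python
import unicodedata

# Hypothetical mapping of typographic quotes to plain ASCII quotes.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # left/right single quotes
    "\u201c": '"', "\u201d": '"',   # left/right double quotes
})


def sanitise_keyword(text: str) -> str:
    """Normalise quotes, drop backslashes and control characters, keep other Unicode."""
    text = unicodedata.normalize("NFC", text).translate(_QUOTE_MAP)
    cleaned = "".join(
        ch for ch in text
        # "Cc" is the Unicode category for control characters.
        if ch != "\\" and unicodedata.category(ch) != "Cc"
    )
    return cleaned.strip()


print(sanitise_keyword("豆娘"))
print(sanitise_keyword("\u201cdamsel\\fly\u201d"))
```

Whether the result is then written correctly still depends on how the keyword is passed to exiftool and which character set is declared.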

If I take a random image and add the Chinese characters for 'Damselfly' to it using this command:

exiftool -keywords-="豆娘" -keywords+="豆娘" -overwrite_original -m "/Users/markotway/Pictures/RHS Chelsea 2021 - Monday (wide) 20-Sep-2021/P9200198.JPG"

and then run exiftool on the image to see the output, I get this (truncated):

Keywords                        : ??

It's possible the keyword may be encoded correctly, but my terminal isn't displaying it right. What do you see in that case?
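One way a "??" like this can arise, shown here purely as an illustration in Python: when text is converted to a character set that has no CJK characters and substitution is enabled, each unsupported character becomes a question mark. This does not establish what exiftool or the terminal actually did in the run above; it only shows the general mechanism.

```python
keyword = "豆娘"

# Encoding to a charset without CJK characters, with replacement enabled,
# turns each unsupported character into "?".
lossy = keyword.encode("latin-1", errors="replace").decode("latin-1")

# A UTF-8 round trip preserves the keyword.
lossless = keyword.encode("utf-8").decode("utf-8")

print(lossy)
print(lossless)
```

If the data was replaced at write time, the original characters are gone from the file; if only the display is wrong, the bytes in the file are still intact.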

For now, I don't think it's going to support it properly. Looking at the FAQ, it seems like it might be possible, but I'd need to manage the character set explicitly for particular users.

At the moment, I probably don't have time to work through all the use-cases and figure out a good solution to this (I have more important priorities, such as getting a working Face-recognition solution after MSFT blocked Azure Face).

So I'm sorry, but I can't help with this right now. But I'll leave the issue in place in case I can come up with a better solution in future.

  1. Ensure your terminal (and remote shell software, if any) uses UTF-8. This should be the default on Mac and Linux, but not on Windows. Check this out.
  2. Ensure the text is encoded in UTF-8. For example, "豆娘" copied from this web page should be UTF-8.
  3. Use the command below to add a UTF-8 IPTC keyword:
    exiftool -charset IPTC=UTF8 -iptc:keywords="豆娘" -overwrite_original -m DAMSELFLY.JPG
  4. Use the command below to read back the UTF-8 IPTC keyword:
    exiftool -charset IPTC=UTF8 DAMSELFLY.JPG
  5. I've tried this: a keyword added with the syntax above is read correctly by Damselfly. So the work can be narrowed down to replacing
    exiftool -keywords="豆娘" -overwrite_original -m DAMSELFLY.JPG
    with
    exiftool -charset IPTC=UTF8 -iptc:keywords="豆娘" -overwrite_original -m DAMSELFLY.JPG
  6. IPTC reserves a dedicated property to declare the text encoding explicitly, and UTF-8 is now the encoding recommended by IPTC. For XMP, UTF-8 is mandatory; there is no other choice. EXIF, like the ZIP archive format, has historical baggage: it does not take encoding into consideration, which leads to language/locale/encoding headaches. Check this out.
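For an application that shells out to exiftool, the change in step 5 could look roughly like the Python sketch below. The helper names `build_iptc_write_args` and `write_keyword` are made up for illustration, and this is not how Damselfly itself invokes exiftool; only the exiftool options are taken from the commands above.

```python
import subprocess
from typing import List


def build_iptc_write_args(keyword: str, image_path: str) -> List[str]:
    """Build the argument list for the UTF-8 IPTC write shown in step 3."""
    return [
        "exiftool",
        "-charset", "IPTC=UTF8",        # write IPTC strings as UTF-8
        f"-iptc:keywords={keyword}",    # no shell quoting needed in a list
        "-overwrite_original",
        "-m",
        image_path,
    ]


def write_keyword(keyword: str, image_path: str) -> None:
    """Run exiftool; assumes it is installed and on PATH."""
    subprocess.run(build_iptc_write_args(keyword, image_path), check=True)


print(build_iptc_write_args("豆娘", "DAMSELFLY.JPG"))
```

Passing the arguments as a list rather than a single shell string sidesteps the quoting and backslash problems mentioned earlier in the thread, since no shell parses the keyword.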

For your reference.

Thanks - so to confirm, if you manually add the unicode keyword, Damselfly renders it correctly in the GUI? If so that narrows the problem and I may be able to look at allowing it to pass through to exiftool. :)

Great to hear that the problem might be getting simpler. I searched the ExifTool documentation further for guidelines on where to store keywords (link).
That page says that EXIF is officially limited to ASCII, IPTC can specify its charset explicitly, and XMP assumes UTF-8 and has not only many official fields but also a number of fields defined by third-party vendors.

As examples I picked Chinese (a typical East Asian language: left-to-right, double-width characters) and Arabic (a typical Middle Eastern language: right-to-left, with complex ligature rules).
Keyword A, Damselfly: Chinese "豆娘"
Keyword B, Insect: Arabic "حشرة"

We can use the command below to write the tags:
exiftool -charset IPTC=UTF8 -iptc:CodedCharacterSet=UTF8 -iptc:keywords+="حشرة" -iptc:keywords+="豆娘" -XMP-dc:Subject+="حشرة" -XMP-dc:Subject+="豆娘" -overwrite_original -m EXIFTOOL.jpg

Note that -charset IPTC=UTF8 only tells ExifTool to write the tags as UTF-8; it does not tell other tools how to parse the picture later. For that, -iptc:CodedCharacterSet=UTF8 is also required, so that an explicit charset declaration is written into the IPTC data. If an imported photo already has iptc:CodedCharacterSet set to a value other than UTF8, the charset should be converted using ExifTool's -L option.
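The same command can be generated for any number of keywords. The Python sketch below is only an illustration: the function name `build_keyword_args` is hypothetical, and the options are copied from the command above rather than from any Damselfly source.

```python
from typing import Iterable, List


def build_keyword_args(keywords: Iterable[str], image_path: str) -> List[str]:
    """Build exiftool arguments that add each keyword to IPTC and XMP."""
    keywords = list(keywords)
    args = [
        "exiftool",
        "-charset", "IPTC=UTF8",           # encode IPTC strings as UTF-8
        "-iptc:CodedCharacterSet=UTF8",    # declare that encoding in the file
    ]
    # Same ordering as the command above: IPTC keywords first, then XMP subjects.
    args += [f"-iptc:keywords+={kw}" for kw in keywords]
    args += [f"-XMP-dc:Subject+={kw}" for kw in keywords]
    args += ["-overwrite_original", "-m", image_path]
    return args


for arg in build_keyword_args(["حشرة", "豆娘"], "EXIFTOOL.jpg"):
    print(arg)
```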

I used Picasa and Windows to write these two tags into an image, then inspected the results with ExifTool GUI.

  1. Picasa writes IPTC:Keywords and XMP-dc:Subject.
  2. Windows writes EXIF:XPKeywords (using UTF8), XMP-dc:Subject and XMP-microsoft:LastKeywordXMP.

For comparison, I uploaded three photos to the NAS, each with the same two tags added by a different tool, and then opened Damselfly to check. The photos edited by Picasa and ExifTool are recognised by Damselfly, while the photo edited by Windows (alone) is NOT. My guess is that Damselfly currently reads only IPTC keywords.
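If that guess is right, a reader that also checks the other keyword fields would pick up the Windows-edited photo as well. The Python sketch below merges keywords from a dictionary shaped like the JSON that `exiftool -j -G1` prints. The tag keys (in particular the `IFD0` group for XPKeywords), the semicolon separator, the function name `collect_keywords` and the sample data are all assumptions made for illustration, not output captured from the photos in this thread.

```python
from typing import Any, Dict, List

# Assumed tag keys, in order of preference.
KEYWORD_TAGS = ("IPTC:Keywords", "XMP-dc:Subject", "IFD0:XPKeywords")


def _as_list(value: Any) -> List[str]:
    """exiftool JSON gives a bare value for one entry and a list for several."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def collect_keywords(tags: Dict[str, Any]) -> List[str]:
    """Merge keywords from all known fields, dropping duplicates, keeping order."""
    merged: List[str] = []
    for tag in KEYWORD_TAGS:
        for value in _as_list(tags.get(tag)):
            # Assumption: XPKeywords is one semicolon-separated string.
            parts = value.split(";") if tag.endswith("XPKeywords") else [value]
            for part in parts:
                part = part.strip()
                if part and part not in merged:
                    merged.append(part)
    return merged


# Hand-written sample resembling a photo tagged only by Windows.
sample = {
    "XMP-dc:Subject": ["حشرة", "豆娘"],
    "IFD0:XPKeywords": "حشرة;豆娘",
}
print(collect_keywords(sample))
```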

Therefore, for the best compatibility with other software, at least IPTC:Keywords and XMP-dc:Subject should be written.
EXIF can also be written using UTF-8. (For plain ASCII text, UTF-8 is identical to ASCII. For other languages UTF-8 is not officially supported, but it might be the best practice.) XMP-microsoft and other third-party XMP fields should be left untouched.
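The relationship between ASCII and UTF-8 mentioned here can be checked with a few lines of Python. This is only a demonstration of the encodings themselves and says nothing about how a particular tool reads EXIF fields.

```python
ascii_keyword = "damselfly"
cjk_keyword = "豆娘"

# For pure ASCII text the two encodings produce identical bytes.
same_bytes = ascii_keyword.encode("ascii") == ascii_keyword.encode("utf-8")

# Each of these two CJK characters takes three bytes in UTF-8.
utf8_length = len(cjk_keyword.encode("utf-8"))

# ASCII cannot represent the CJK keyword at all.
try:
    cjk_keyword.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_bytes, utf8_length, ascii_ok)
```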

Images uploaded directly might be altered by GitHub, and their metadata might be lost, so a sample is attached below as a ZIP archive.
EXIFTOOL.zip