exiftool / exiftool

ExifTool meta information reader/writer

Home Page: https://exiftool.org/

Out of memory issue when using json and error output on large set of files

taatuut opened this issue · comments

(I must start by stating that exiftool is a great tool, thanks for that in the first place)

I'm using exiftool 12.37 in Git Bash for Windows on Windows 10 to scan a large network share with millions of files and extract metadata with the following command:

exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json -efile7! /Logs/Geojson/results_error.txt -r "/path/to/folder/with/subfolders" > /Logs/Geojson/results.json

In the actual command, more -ext options are added to filter metadata for a few dozen file types.

On the network share there are a lot of files with long path names (>260 characters), which are reported as Error: File not found - /path/to/folder/with/subfolders/somefile.ext

When running the command it does write the output to both the error and json file. However, exiftool memory consumption grows over time until it reports Out of memory!.

At that point I have a results file of ~2.08GB and an error file of ~15MB (with ~58,000 lines; these include the full path to each file but not the error message).

I have not found a way to rewrite the command so that memory consumption stops growing, hence I am reporting this as a bug. Happy to hear if the command can be set up in a different way. I found that I cannot use -json with -o because -json output goes to the console, which is why I use > results.json to save it to a file.

The -o option is for output image files. Use -w or -W for formatted text output (eg. json).
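For illustration, here is a minimal sketch of what the -w route could look like with the flags from the original command. It is not taken from the thread: write_json_per_file and both of its arguments are hypothetical names, and the format string assumes the documented -w codes %f (source file name without extension) and %e (source extension). Note that -w writes one output file per source file rather than a single results.json.

```shell
# Sketch only: write one JSON file per source file with -w instead of
# redirecting all console output into a single file.
# write_json_per_file is a hypothetical helper name, not part of exiftool.
write_json_per_file() {
    src_dir=$1   # folder to scan recursively
    out_dir=$2   # folder that receives one .json file per source file

    # -w accepts an extension or a format string. Here %f is the source
    # file name without its extension and %e is the source extension, so
    # "clip.avi" would become "$out_dir/clip_avi.json".
    exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json \
        -w "$out_dir/%f_%e.json" -r "$src_dir"
}
```

Usage would be something like write_json_per_file "/path/to/folder/with/subfolders" /Logs/Geojson/per_file. Files with the same name in different subfolders would map to the same output name with this format string, so a real run would probably need %d in the format as well.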

The memory and filename problems are unique to Windows, and are not ExifTool bugs per se.

There may be a work-around for the memory problem: Try the alternate Windows package -- it seems to have better memory handling characteristics: https://oliverbetz.de/pages/Artikel/ExifTool-for-Windows

Ah cool. Thanks for clearing up my misunderstanding and for pointing to the alternative for Windows. I'll give it a try shortly.

You are right that the filename problems are related to Windows (although the feature to support long path names is enabled). I was wondering if it would be possible to also capture the error message in the error log, in addition to the file name.

The -efile option captures only the filename if that is what you are talking about. To capture the message you need to pipe stderr to a file.
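A minimal sketch of that redirection, reusing the flags from the original command: stdout still goes to the JSON file, -efile still collects the file names, and 2> sends the error messages to a third file. The helper name scan_with_error_log and its four arguments are hypothetical, chosen only for this example.

```shell
# Sketch only: keep JSON on stdout, file names in the -efile list, and
# the error messages themselves in a separate stderr log.
# scan_with_error_log is a hypothetical helper name, not part of exiftool.
scan_with_error_log() {
    src_dir=$1      # folder to scan recursively
    json_out=$2     # receives the JSON that exiftool writes to stdout
    efile_out=$3    # receives file names only (written by -efile)
    stderr_out=$4   # receives the error messages (stderr)

    # "> file" redirects stdout and "2> file" redirects stderr, so the
    # "Error: File not found - ..." lines end up in $stderr_out.
    exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json \
        -efile7! "$efile_out" -r "$src_dir" \
        > "$json_out" 2> "$stderr_out"
}
```

Called as scan_with_error_log "/path/to/folder/with/subfolders" /Logs/Geojson/results.json /Logs/Geojson/results_error.txt /Logs/Geojson/results_stderr.txt, where the last file name is made up for the example.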

Thanks again for the additional info, and also for pointing to the package from Oliver Betz. I ran the same command with his exe, and this time it worked without issue. Memory peaked at ~3.5GB, where previously it went over 12GB and exhausted the available memory. The resulting json file is about 2.7GB. So indeed a different, and in this case better, way of handling memory. Really cool tool, it replaces and improves some of the code I wrote.