Out of memory issue when using JSON and error output on a large set of files
taatuut opened this issue
(I must start by stating that exiftool is a great tool, thanks for that in the first place)
I'm using exiftool 12.37 in Git Bash for Windows on Windows 10 to scan a large network share with millions of files, extracting metadata with the following command:
exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json -efile7! /Logs/Geojson/results_error.txt -r "/path/to/folder/with/subfolders" > /Logs/Geojson/results.json
In the actual command more -ext options are added to filter metadata for a few tens of file types.
On the network share there are a lot of files with long path names (>260 characters) that are reported as Error: File not found - /path/to/folder/with/subfolders/somefile.ext
When running the command it does write output to both the error file and the JSON file. However, exiftool's memory consumption grows over time until it reports Out of memory!
At that point I have a results file of ~2.08GB and an error file of ~15MB (~58,000 lines; these include the full path to each file but not the error message).
I have not found a way to rewrite the command so that memory consumption stops growing, hence I am reporting this as a bug. Happy to hear if the command can be set up in a different way. I found that I cannot use -json with -o because -json output goes to the console, which is why I use > results.json to save it to a file.
The -o option is for output image files. Use -w or -W for formatted text output (e.g. JSON).
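For example, a sketch reusing the options and paths from the original command: -w writes the console output for each source file to its own file, so the full result set is never accumulated in one place:

exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json -efile7! /Logs/Geojson/results_error.txt -w! %d%f.json -r "/path/to/folder/with/subfolders"

Here %d is the source directory and %f the base filename, so one .json file is written next to each scanned file; the ! allows overwriting existing output files.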
The memory and filename problems are unique to Windows, and are not ExifTool bugs per se.
There may be a work-around for the memory problem: Try the alternate Windows package -- it seems to have better memory handling characteristics: https://oliverbetz.de/pages/Artikel/ExifTool-for-Windows
Ah cool. Thanks for clearing up my misunderstanding and for pointing to the alternative for Windows; I'll give it a try shortly.
You are right regarding the filename problems being related to Windows (although the feature to support long path names is enabled). I was wondering if it would be possible to also capture the error message in the error log, in addition to the file name.
The -efile option captures only the filename if that is what you are talking about. To capture the message you need to pipe stderr to a file.
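For example, a sketch based on the original command, assuming exiftool writes its error messages to stderr (as it normally does) while the -json output goes to stdout:

exiftool -fast2 -q -q -n -g -ext avi -ext xlsx -json -efile7! /Logs/Geojson/results_error.txt -r "/path/to/folder/with/subfolders" > /Logs/Geojson/results.json 2> /Logs/Geojson/results_stderr.txt

The 2> redirect captures the full Error: File not found - ... messages in results_stderr.txt, while -efile7! still records only the file names.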
Thanks again for the additional info, and also for pointing to the package from Oliver Betz. I ran the same command with his exe, and this time it worked without issue. Memory peaks at ~3.5GB where previously it grew past 12GB, exhausting available memory. The resulting JSON file is about 2.7GB. So indeed a different, and in this case better, way of handling memory. Really cool tool; it replaces and improves some of the code I wrote.