szTheory / exifcleaner

Cross-platform desktop GUI app to clean image metadata

Home Page: https://exifcleaner.com

Remove extended filesystem attributes on Mac/Linux (xattr/mdls) such as Source/Quelle

tayfuuun opened this issue

Hello,

a screenshot from the PDF file before I use your tool:
image

a screenshot after I use your tool:
image

Everything is removed; only the Source is left. Can you check whether it's possible to remove the Source from the PDF file?

Thanks, good catch! I'll check it out. In the meantime, if you could find the exiftool command line options that remove it, it would be even easier for me to modify ExifCleaner to use those options automatically. The easiest way to do this would be to run the exiftool command and verify that it removed the Source/Quelle field in your sample PDF. That will help me get this feature ready faster. If not, it's OK, I can figure it out. It just might take me a bit longer.

@szTheory sorry no time for this one.
Good luck and thank you.

@szTheory can you please check the source for the other file formats too? JPG, PNG.

@szTheory any update?

Sorry no, can you provide some generic example documents and images with a Source/Quelle that is not being erased? I also recently released a new version of ExifCleaner. It's a long shot but maybe you can download that and see if it helps, since I did fix a few bugs.

@szTheory files and tests with the latest version 3.4.0

[Edit: files removed]

Results

Even with the newest version, the Source is not removed from PDF and PNG files.

Interesting. Even when I run exiftool directly on those files with the -v verbose flag, it doesn't pick up the Source/Quelle metadata. But when I tested on a Mac, the field shows up in the file info window for both the PDF and the PNG. I'll have to look into this more.

If you run mdls myfile.png, it shows what looks like Mac-specific metadata, such as kMDItemProfileName and kMDItemWhereFroms, that exiftool is not picking up. I'll have to see whether support is already built into exiftool and just needs different command line flags, or whether ExifCleaner has to bolt on extra functionality. In the meantime, you can remove the metadata with xattr -c myfilehere.pdf (the -c flag means clear) and confirm afterwards by running mdls on the file again. See this link for more info.
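A minimal Python sketch of the manual workaround above, assuming macOS's xattr command line tool is on the PATH. The names list_xattrs, clear_xattrs, and the injectable run parameter are illustrative only and not part of ExifCleaner. It lists the attribute names on a file, clears them with xattr -c, and reports whatever is still present afterwards:

```python
import subprocess
from typing import Callable, List

# A runner takes an argv list and returns the command's stdout.
# Making it injectable is an assumption of this sketch, so the logic
# can be exercised without a Mac.
Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def list_xattrs(path: str, run: Runner = run_command) -> List[str]:
    """Return the attribute names that `xattr <file>` prints, one per line."""
    return [line.strip() for line in run(["xattr", path]).splitlines() if line.strip()]


def clear_xattrs(path: str, run: Runner = run_command) -> List[str]:
    """Clear all extended attributes with `xattr -c`, then list what is left."""
    run(["xattr", "-c", path])
    return list_xattrs(path, run)
```

On a Mac, calling list_xattrs("myfilehere.pdf") before and clear_xattrs("myfilehere.pdf") after gives a before/after comparison. A non-empty result from clear_xattrs would mean some attributes survived the clear.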

@szTheory xattr -c myfilehere.pdf works! Nice. If you implement this in your tool, you would make me very happy.

Current plan for Mac

  • Spin up an mdls process to read the extended filesystem attributes for the "# exif before" column, then an xattr process with the -c flag to clear them, then another mdls process to populate the "# exif after" column.
  • If possible, keep these processes alive in a process pool so they can handle multiple files and minimize process overhead, as is done with exiftool.
  • Or pass multiple files at once to a single process per CPU core.
  • Investigate whether there are any extended filesystem attributes that xattr -c still leaves behind, and how to deal with them.
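The "multiple files per process, one process per CPU core" idea from the list above could be sketched in Python roughly as follows. This is an illustration under assumptions, not ExifCleaner's implementation: batch_per_core and clear_commands are hypothetical helper names, and the sketch only builds the command lines (relying on xattr -c accepting several file arguments) without running them.

```python
import os
from typing import List, Optional, Sequence


def batch_per_core(paths: Sequence[str], cores: Optional[int] = None) -> List[List[str]]:
    """Split paths round-robin into at most `cores` non-empty batches."""
    n = max(1, cores if cores is not None else (os.cpu_count() or 1))
    batches = [list(paths[i::n]) for i in range(n)]
    return [batch for batch in batches if batch]


def clear_commands(paths: Sequence[str], cores: Optional[int] = None) -> List[List[str]]:
    """Build one `xattr -c file...` argv per batch, so each process handles many files."""
    return [["xattr", "-c", *batch] for batch in batch_per_core(paths, cores)]
```

Each returned argv list could then be handed to its own worker process, which keeps the number of spawned processes bounded by the core count rather than by the number of dropped files.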

Current plan for Linux

  • research the extended filesystem attributes more. There is probably variation between the Linux filesystems.
  • If possible find a single tool that deals with all the Linux file systems uniformly
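As one possible starting point for the Linux research, Python's standard library exposes the Linux extended attribute calls directly (os.listxattr and os.removexattr, available on Linux only). The sketch below is an assumption-laden illustration: strip_user_xattrs is a hypothetical name, and it only touches the user. namespace, since other namespaces such as security. or trusted. generally need elevated privileges. Whether that covers everything a given filesystem stores is exactly the open question above.

```python
import os
from typing import List


def strip_user_xattrs(path: str) -> List[str]:
    """Remove every attribute in the `user.` namespace; return the names removed.

    Linux only: os.listxattr and os.removexattr do not exist on other platforms.
    """
    removed = []
    for name in os.listxattr(path):
        if name.startswith("user."):
            os.removexattr(path, name)
            removed.append(name)
    return removed
```

Calling it a second time on the same file should return an empty list, which gives a cheap check that nothing in the user. namespace was left behind.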

Current plan for Windows

  • Find an existing command line tool, perhaps in C/C++ or PowerShell, that cleans Windows extended filesystem attributes.

Current plan for all OS targets

  • Extract the extended filesystem attribute cleanup into a single npm package, or a C/C++ tool with a Node C API extension.

Without help, this is likely to take more than a year of low-time-commitment work. If someone provides a drop-in solution, like the one that exists with exiftool, it will go faster.

@szTheory one small update:
the command xattr -c myfilehere.pdf works fine for images (.png, .jpg), but when I use it on PDF files on macOS, not all of the information is deleted.

image

Thanks, yeah, I noticed that too. I'm not sure what to do about it. Maybe it's a bug in xattr. I couldn't find any guide that mentions this failing. Every guide just recommends xattr, which clearly is not doing enough, even after I played around with all of its command line options.

I don't know enough about these filesystems to find a comprehensive solution, so hopefully someone can recommend a starting point. Ideally there would be one tool that gets rid of all the extended filesystem attributes. Better yet, one that works for all filesystem types, across all the major operating systems. Then that tool could be vetted and integrated with ExifCleaner for a single drag and drop that gets rid of everything, instead of being a patchwork process that depends on your environment.