zhangqm666 / compress-office

Compress images in docx/pptx/xlsx files using ImageOptim. (MacOS)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Compress Office

English | 中文

A simple tool that use ImageOptim to compress images in docx/pptx/xlsx documents, and can reduce file size.

screenshot

Installing

Clone this repo first:

git clone https://github.com/cometeme/compress-office.git

Then we need:

  1. ImageOptim: https://github.com/ImageOptim/ImageOptim
  2. Python 3.8 (and you need to install rich package via pip)
  3. trash: https://github.com/ali-rantakari/trash
  4. fd (Optional, can optimize searching speed): https://github.com/sharkdp/fd

If you have Homebrew,run these commands to install dependencies (or install manually):

brew install imageoptim python@3.8 trash
pip3.8 install rich

Then start ImageOptim, open "Preferences" menu and modify the settings. You can choose to perform lossless or lossy compression on the pictures in the document. The former can maintain the image quality, while the latter can better reduce size. You can also choose whether to remove EXIF information. EXIF contains informations about the time, gps and camera settings. Removes EXIF in the picture can better protect your privacy.

Usage

Run compress-office.py with files or folders you want to compress. You can pass in multiple paths. If you pass in a directory, the program will traverse in the directory and find files that can be compressed.

python3.8 compress-office.py [path1] [path2] ...

For example:

python3.8 compress-office.py ~/Documents ./test.docx

When program run for the first time, it will create a file called process_history.csv, which records the path of the compressed file and its md5. When running again, if program finds that the file has not changed (file path is in the history and its md5 is the same as recorded), then the program will skip the file without re-compressing, because doing so is not meaningful. If you really need to recompress a file, just delete its record from csv file.

After the compression is complete, the original document will be moved to recycle bin. Please check your documents before empty the recycle bin.

Use fd to speed up searching (optional)

The program uses python's built-in glob for file traversal by default, but it is very slow when there are many files, so the program supports the use of fd to speed up the search.

Enter brew install fd in the console to install fd, then edit compress-office.py and change use_fd = False to use_fd = True to use fd to speed up the search.

Other questions

Q1: ImageOptim can only run on MacOS, can I use this tool on other platforms?

A1: Yes, but you need to find a substitute for ImageOptim first, such as Trimage. Then modify the function.py in this project, and change the cache directory cache_folder and the compression command compress_command to make this program work.

Q2: How does this tool compress the pictures in the document? Will it corrupt my documents?

A2: The docx/pptx/xlsx document is essentially a zip compressed package, in which the resources are packaged together. This program decompresses the documents passed in by user into a cache directory one by one, use ImageOptim to compress all the pictures in the cache directory, recompresses them, and puts new file back.

Therefore, in theory, using this program will not corrupt files when compressing theme, but in order to prevent unpredictable bugs, it is recommended that back up the files before compress theme. At the same time, this program will move the original document to the recycle bin after compression, and if you found a problem, you can restore the original file from the recycle bin.

Q3: Why there are many pictures in the recycle bin after compression?

This is because ImageOptim puts the original pictures in the recycle bin when compressing pictures, and there is no setting to cancel this, so this problem cannot be solved temporarily.

About

Compress images in docx/pptx/xlsx files using ImageOptim. (MacOS)

License:MIT License


Languages

Language:Python 100.0%