Use 7zip for compression: ~50% smaller than current one with zip files
crazy25000 opened this issue
Out of curiosity, I tested re-compressing the files, since this repo is hitting its storage limit. I used the GBPUSD folder as a test: https://github.com/philipperemy/FX-1-Minute-Data/tree/master/2000-Jun2019/gbpusd
- Extracted contents of each zip file
- Compressed all the CSV and TXT files using 7zip CLI:
7z a -r -mx=9 gbpusd *.csv *.txt
- New zip file: 27 MB
- Original folder size: 52 MB
There are other compression algorithms/tools with higher compression ratios, but I found them to be unstable or too slow.
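The extract-and-recompress steps above could also be scripted. The following is only a rough sketch, not what was actually run for this test: it swaps the 7zip CLI for Python's standard `zipfile` module with Deflate at its maximum level, so the resulting size will likely differ from the 27 MB figure. The function name `recompress_zips` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path


def recompress_zips(src_dir, out_zip, suffixes=(".csv", ".txt")):
    """Copy the CSV/TXT members of every ZIP in src_dir into one new ZIP.

    Assumption: Deflate level 9 via the standard library stands in for
    the `7z a -mx=9` call quoted above; it is not the same compressor.
    """
    src_dir = Path(src_dir)
    out_zip = Path(out_zip)
    written = []
    with zipfile.ZipFile(
        out_zip, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as dst:
        for zip_path in sorted(src_dir.glob("*.zip")):
            # Skip the output archive if it lives in the source folder.
            if zip_path.resolve() == out_zip.resolve():
                continue
            with zipfile.ZipFile(zip_path) as src:
                for info in src.infolist():
                    name = info.filename
                    if info.is_dir() or not name.lower().endswith(suffixes):
                        continue
                    if name in written:
                        # Keep the first copy of a duplicated member name.
                        continue
                    # Decompress from the old archive, recompress into the new one.
                    dst.writestr(name, src.read(info))
                    written.append(name)
    return written
```

Calling `recompress_zips("2000-Jun2019/gbpusd", "gbpusd.zip")` would produce a single ZIP next to the working directory. For reference, going from 52 MB to 27 MB is a reduction of roughly 48%.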
@crazy25000 right. But I thought git already stores blobs compressed, if I recall correctly.
Hmm... I cloned the repo locally, and these are the sizes I see for the git objects and the folder containing most of the data. If the difference between .git and 2000-Jun2019 represents the compression, ~400 MB, re-compressing all of the files would be worth it. If not, then I tried to help 😛
2.1G ./.git
2.1G ./.git/objects
2.1G ./.git/objects/pack
2.5G ./2000-Jun2019
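The size comparison above can be reproduced with a short script. This is a minimal sketch under one stated assumption: it sums apparent file sizes, whereas `du` style listings report disk usage, so the numbers may differ slightly. The helper name `dir_size_bytes` is hypothetical.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Example usage from the repository root (paths taken from the listing above):
# diff = dir_size_bytes("2000-Jun2019") - dir_size_bytes(".git")
# print(f"{diff / 1024**2:.0f} MiB")
```

With the rounded figures in the listing, 2.5G minus 2.1G gives about 0.4G, which is where the ~400 MB estimate comes from.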
Yeah it would be pretty worth it, indeed. But then we have to re-write the git history to overwrite all the files.
I haven't done that before, but I assume it would be easy to do through the CLI. If you decide to try it or need help with the compression at least, I can help with that. Unzipping and re-compressing didn't take too long, about 2 minutes for gbpusd.
@crazy25000 yeah if you can do that that would be helpful. You can clone this repo, and use the Big F*** Gun: https://rtyley.github.io/bfg-repo-cleaner/ to remove the files from history and add yours as ZIP.
You can also use git-filter-branch. I usually use filter-branch, but it's less user-friendly.
@philipperemy would you create a clean branch that I can push to? I tried using that tool to create a clean one, but I wasn't able to get it into a working state 😅
I've already re-compressed the files and am ready to re-upload.
@crazy25000 thanks!! How should I do that?
Oh, also, people in finance don't really know what 7z is lol. ZIP is probably better for them.
I'm going to re-upload and open a PR afterwards. I messed up locally :p......it will still be in ZIP format, but smaller size.
alright cool!
Merged into 3fcbe41