philipperemy / FX-1-Minute-Data

HISTDATA - Dataset composed of all FX trading pairs / Crude Oil / Stock Indexes. Simple API to retrieve 1 Minute data (and tick data) Historical FX Prices (up to date).

Use 7zip for compression: ~50% smaller than current one with zip files

crazy25000 opened this issue · comments

Out of curiosity, I tested re-compressing the files, since this repo is hitting its storage limit. I used the GBPUSD folder as a test: https://github.com/philipperemy/FX-1-Minute-Data/tree/master/2000-Jun2019/gbpusd

  1. Extracted contents of each zip file
  2. Compressed all the CSV and TXT files using 7zip CLI: 7z a -r -mx=9 gbpusd *.csv *.txt
  3. New archive: 27 MB
  4. Original folder size: 52 MB
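
For anyone who wants to repeat the measurement without the 7z CLI, here is a minimal Python sketch of the same steps: read the CSV and TXT members out of every zip in a folder and write them into one new archive. It is an approximation, not the command used above. The function name `recompress_folder` is made up for illustration, and it writes a ZIP with the standard library's LZMA method rather than a real .7z file, so the resulting size will not match 7z exactly.

```python
import zipfile
from pathlib import Path


def recompress_folder(src_dir, out_path, method=zipfile.ZIP_LZMA):
    """Merge the CSV/TXT members of every .zip in src_dir into one archive.

    Returns (total size of the source zips, size of the new archive) in bytes.
    Hypothetical helper: an approximation of the 7z steps, not the same format.
    """
    src = Path(src_dir)
    out = Path(out_path)
    original_bytes = 0
    with zipfile.ZipFile(out, "w", compression=method) as dst:
        for zpath in sorted(src.glob("*.zip")):
            if zpath.resolve() == out.resolve():
                continue  # never read the archive we are writing
            original_bytes += zpath.stat().st_size
            with zipfile.ZipFile(zpath) as srczip:
                for info in srczip.infolist():
                    if info.filename.lower().endswith((".csv", ".txt")):
                        # Decompress the member, then store it with the new method.
                        dst.writestr(info.filename, srczip.read(info))
    return original_bytes, out.stat().st_size


if __name__ == "__main__":
    # Assumed local paths; adjust to wherever the repo is cloned.
    before, after = recompress_folder("2000-Jun2019/gbpusd", "gbpusd_lzma.zip")
    print(f"before: {before / 1e6:.1f} MB, after: {after / 1e6:.1f} MB")
```

Not every unzip tool can read LZMA-compressed ZIP members. Passing `method=zipfile.ZIP_DEFLATED` keeps the output readable everywhere, at the cost of a lower compression ratio.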

There are other compression algorithms/tools with higher compression ratios, but I found them to be unstable or too slow.

@crazy25000 right. But I thought that git already stores blobs compressed, if I recall correctly.

Hmm....I cloned the repo locally and these are the sizes I see for the git objects and the folder containing most of the data. If the difference between .git and 2000-Jun2019 represents the compression, ~400 MB, re-compressing all of the files would be worth it. If not, then I tried to help 😛

2.1G	./.git
2.1G	./.git/objects
2.1G	./.git/objects/pack
2.5G	./2000-Jun2019
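
A rough way to reproduce this comparison from Python, assuming a local clone in the current directory. `dir_size_bytes` is a hypothetical helper that sums apparent file sizes, whereas du reports disk usage, so the figures can differ slightly from the listing above.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Paths taken from the listing above; run from the root of the clone.
    git_size = dir_size_bytes(".git")
    data_size = dir_size_bytes("2000-Jun2019")
    print(f".git:         {git_size / 1e9:.2f} GB")
    print(f"2000-Jun2019: {data_size / 1e9:.2f} GB")
    print(f"difference:   {(data_size - git_size) / 1e6:.0f} MB")
```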

Yeah it would be pretty worth it, indeed. But then we have to re-write the git history to overwrite all the files.

I haven't done that before and I assume it would be easy to do it through CLI. If you decide to try it or need help with the compression at least, I can help with that. Unzipping and re-compressing didn't take too long, about 2 minutes for gbpusd

@crazy25000 yeah if you can do that that would be helpful. You can clone this repo, and use the Big F*** Gun: https://rtyley.github.io/bfg-repo-cleaner/ to remove the files from history and add yours as ZIP.

You can also use git-filter-branch. I usually use the filter branch but it's less user friendly.
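
For reference, a hedged sketch of what the filter-branch route could look like when driven from Python. The helper names are hypothetical and the folder path is just the one tested earlier in this thread. `build_filter_branch_cmd` only builds the command; `purge_path` actually rewrites every commit, so it should be tried on a fresh clone first.

```python
import shlex
import subprocess


def build_filter_branch_cmd(path):
    """Return a git filter-branch command that drops `path` from all history."""
    # filter-branch evaluates the index filter in a shell, so quote the path.
    index_filter = "git rm -r --cached --ignore-unmatch " + shlex.quote(path)
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty",
        "--", "--all",
    ]


def purge_path(repo_dir, path):
    """Run the rewrite inside repo_dir. Destructive: rewrites every branch."""
    subprocess.run(build_filter_branch_cmd(path), cwd=repo_dir, check=True)


cmd = build_filter_branch_cmd("2000-Jun2019/gbpusd")
print(shlex.join(cmd))
```

After a rewrite like this, the re-compressed files would be committed again and the branch force-pushed, which is why everyone with an existing clone has to re-clone.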

@philipperemy would you create a clean branch that I can push to? I tried using that tool to create a clean one, but wasn't able to get it into a working state 😅

I've already re-compressed the files and am ready to re-upload.

@crazy25000 thanks!! How should I do that?

Oh, also, people in finance don't really know what 7z is lol. ZIP is probably better for them.

I'm going to re-upload and open a PR afterwards. I messed up locally :p......it will still be in ZIP format, but smaller.

alright cool!

Merged into 3fcbe41