pmgbergen / DarSIA

Darcy scale image analysis toolbox

Git history needs cleanup

jwboth opened this issue · comments

The git history still contains large images, although they have since been removed from the repository. This results in large data transfers when pushing even small changes. Currently the size of the commit history is around 100 MB. To fix this, we have to identify the removed files and remove them from the commit history as well.

We will have to rewrite the commit history on main, which scares me a bit. But there is no way around it.

We could try using this tool: https://github.com/newren/git-filter-repo/

Oh.. I did not know.. What is your plan? I can look at it this weekend if you want.

I tried to have a look already but I do not feel confident enough so far. If you want to give it a go, feel free to do so. The strategy will be to identify the commits that are large, and simply remove them. This will make it impossible to get back to any stage of the code that relies on the changes made in these commits. But I do not care. These files have to go anyhow. 100 MB is too much.
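For the identification step, a minimal Python sketch could list the largest blobs reachable from any ref. It relies on the standard git commands `git rev-list --objects --all` and `git cat-file --batch-check`; the function names `parse_blob_sizes` and `largest_blobs` and the 5 MiB default threshold are assumptions made for illustration, not part of DarSIA or of any tool mentioned in this thread.

```python
import subprocess


def parse_blob_sizes(batch_check_output, min_bytes=0):
    """Parse '<type> <sha> <size> <path>' lines; return (size, path) for blobs, largest first."""
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        # Skip commits, trees, and malformed lines; keep only blobs.
        if len(parts) < 4 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size >= min_bytes:
            blobs.append((size, parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs(repo_path=".", min_bytes=5 * 1024 * 1024):
    """List blobs of at least min_bytes anywhere in the history of repo_path (hypothetical helper)."""
    # Every object reachable from any ref, one '<sha> <path>' per line.
    objects = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "-C", repo_path, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_blob_sizes(sizes, min_bytes)
```

Calling `largest_blobs(".")` inside a clone returns the candidate files together with their sizes. The list also includes large files that are still present in the current checkout, so it would have to be compared against the working tree before deciding what to drop.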

I can have a look tomorrow!

I spent some time on this now, and I think there are some "easy" things that one could do. One is to delete the entire history, but I think that is probably way too much. The other is to use this tool: https://rtyley.github.io/bfg-repo-cleaner/#usage.
It seems very easy to use, and one can specify that it should remove all files that are larger than a certain size from the history, which I guess is what we want. We could also use the tool that you suggested above, but it seems quite a lot more powerful (and I was a bit confused as to how I should apply it).
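A hedged sketch of what such a run could look like, following the steps on the BFG usage page (mirror clone, `--strip-blobs-bigger-than`, then `git reflog expire` and `git gc`). The helper names `bfg_cleanup_commands` and `run_cleanup`, the jar location `bfg.jar`, and the 5M limit are assumptions for illustration. The push step is left out on purpose, so nothing is rewritten on the remote.

```python
import subprocess


def bfg_cleanup_commands(repo_url, mirror_dir, size_limit="5M", bfg_jar="bfg.jar"):
    """Build the command sequence for a local BFG run on a fresh mirror clone (hypothetical helper)."""
    return [
        # Work on a bare mirror clone so the original checkout stays untouched.
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        # Remove every blob above the size limit from the history.
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", size_limit, mirror_dir],
        # Expire reflog entries and garbage-collect so the repository actually shrinks.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
    ]


def run_cleanup(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With `dry_run=True` the commands are only printed, which makes it possible to review them first. According to its documentation, BFG leaves the files of the latest commit untouched by default, so large files that are still checked in would have to be deleted in a regular commit beforehand. git-filter-repo offers a `--strip-blobs-bigger-than` option as well.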

I would be against deleting the entire history.

The second option sounds good. I assume we may have to take special care of the Jupyter notebooks, though. Starting by removing only files that are larger than 5 MB from the history should be rather safe. I expect this will only affect image data.

One could maybe make a test run on a dummy repo?

I tried it now, and by removing files larger than 5 MB the size of the repo is reduced to 30 MB. I do not dare to push this though (and I did not mirror anything for fear of making horrible mistakes...). Perhaps we can talk briefly tonight about it, and do it "together". The software was at least very easy to use, and it produced a text file with the names of the files that it dropped.

It occurred to me that while we do this, we should probably introduce the develop branch and its associated workflow at the same time.

I like the idea very much!