newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

git lfs migrate import doubles the size of remote repository after it was cleaned by git-filter-repo

sadaisystems opened this issue

Description of the issue

At the moment I'm trying to clean up a repository that we use at work for a project. It has grown large (1GB). A huge portion of this (I now know it's approximately 600MB) is large deleted files still stored in history, and the other 400MB or so is some huge .csv and .html files related to Python reports.

So, my first goal was to clean the repo's history by removing large deleted files from it.

I managed to do this with git-filter-repo. I tried it in a couple of ways, but my final run followed this guide from GitLab.

Basically, I cloned the repo with git clone using --bare and --mirror, and then removed all deleted files and directories from the repo's history.
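For readers who want to see what "deleted files stored in history" looks like, here is a minimal sketch using plain git only. It is not the procedure from the GitLab guide and does not call git-filter-repo; it builds a throwaway repository in a temporary directory (the file name big.bin and the identity settings are made up for illustration) and lists every blob reachable from any ref, together with its size.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; all names here are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a "large" file, then delete it in a later commit.
head -c 100000 /dev/zero > big.bin
git add big.bin
git commit -q -m "add big.bin"
git rm -q big.bin
git commit -q -m "remove big.bin"

# List every blob reachable from any ref, largest first.
# Columns: size in bytes, object id, path.
# (awk keeps only the first word of the path, so paths with spaces
# would be truncated in this simplified listing.)
history_blobs=$(
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { print $2, $3, $4 }' |
    sort -rn
)
echo "$history_blobs"

# Paths present in the current tree. big.bin is absent here even
# though its blob is still listed above.
head_paths=$(git ls-tree -r --name-only HEAD)
echo "paths in HEAD: ${head_paths:-<none>}"
```

Blobs that show up in the first listing but whose paths are missing from HEAD are the kind of content a history rewrite has to remove.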

In total, I managed to remove 600MB worth of blobs, so now I have 422MB. After that I created a fresh remote and ran git push --force for all the heads and tags to this new remote.

The second goal was to migrate some report-related CSV and HTML files to Git LFS.

I tried to use migrate info first:

git lfs migrate info --everything --include="analys/**/*.html,*.csv"

The output showed that these files take up approx. 400MB of space in total. Then I tried to migrate all of them in every branch with:

git lfs migrate import --everything --include="analys/**/*.html,*.csv" --verbose

Then I pushed all the heads and tags once again. As a result, all the files I wanted are now marked as LFS, and my repo now has 400+MB of LFS files. However, the total storage has effectively doubled: the files are now in LFS, but it looks like they are also still in the repo's history as regular files. The Repository portion of storage is still 422MB and the LFS portion is now 480MB, so about 900MB in total.
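One way to test the suspicion that some commits still hold the full file content is to inspect a blob at a given revision: a Git LFS pointer is a small text file whose first line is "version https://git-lfs.github.com/spec/v1". The following is a minimal sketch under that assumption. The helper name is_lfs_pointer and both sample contents are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: reads blob content on stdin and succeeds if it
# starts with the Git LFS pointer header line.
is_lfs_pointer() {
  IFS= read -r first_line || true
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Made-up sample content for illustration only.
pointer_text='version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345'
regular_text='col_a,col_b
1,2'

if printf '%s\n' "$pointer_text" | is_lfs_pointer; then
  pointer_result="pointer"
else
  pointer_result="regular blob"
fi

if printf '%s\n' "$regular_text" | is_lfs_pointer; then
  regular_result="pointer"
else
  regular_result="regular blob"
fi

echo "sample 1: $pointer_result"
echo "sample 2: $regular_result"
```

In a real repository the input would come from something like git cat-file -p <rev>:<path> piped into the helper, where <rev> and <path> are placeholders for an old commit and one of the migrated files.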

What's the mistake? I just cannot understand it. git lfs migrate is supposed to rewrite the repo's history by default, am I correct?

System environment

  • System: WSL v2 2.0.1 Ubuntu
  • Repo is hosted on my work gitlab domain

Output of git lfs env

Endpoint=***
  SSH=***
LocalWorkingDir=
LocalGitDir=**.git
LocalGitStorageDir=**.git
LocalMediaDir=**/git/lfs/objects
LocalReferenceDirs=
TempDir=****
ConcurrentTransfers=3
TusTransfers=false
BasicTransfersOnly=false
SkipDownloadErrors=false
FetchRecentAlways=false
FetchRecentRefsDays=7
FetchRecentCommitsDays=0
FetchRecentRefsIncludeRemotes=true
PruneOffsetDays=3
PruneVerifyRemoteAlways=false
PruneRemoteName=origin
LfsStorageDir=***.git/lfs
AccessDownload=none
AccessUpload=none
DownloadTransfers=basic,lfs-standalone-file
UploadTransfers=basic,lfs-standalone-file
GIT_ASKPASS=*****
GIT_EXEC_PATH=/usr/lib/git-core
VSCODE_GIT_ASKPASS_EXTRA_ARGS=
VSCODE_GIT_ASKPASS_MAIN=*****
VSCODE_GIT_ASKPASS_NODE=****
VSCODE_GIT_IPC_HANDLE=*****
git config filter.lfs.process = "git-lfs filter-process"
git config filter.lfs.smudge = "git-lfs smudge -- %f"
git config filter.lfs.clean = "git-lfs clean -- %f"

Maybe someone can help me: what am I doing wrong? Should I use git-filter-repo before or after the migration to LFS? Or both?

Or did I just mess up the LFS portion of the pipeline?