Syncing .git folder
Bessonov opened this issue · comments
Hey @stephenh, your project looks very promising for my development docker image, where I want to sync files between a local and a remote environment.
I've read the Git Usage Note, but I think it isn't entirely true. AFAIK git doesn't keep repo state outside of the .git folder. Syncing seems to work great on a small test git repo. Everything is in sync: files, stash, .git/config, current branch etc. But on a slightly larger repository it gets out of sync. I'm not familiar with inotify and mirror, but I think there is a bug that shows up when multiple files are exchanged the way git does it. I don't think this is a git-only issue. Maybe some events are missed? What do you think?
After some experiments I can reproduce it on a small repo. It misses some sort of delete-folder event:
Start the server in the server folder:
docker run --rm --init -it -u $(id -u):$(id -g) -v $(pwd):/data -p 49172:49172 quay.io/stephenh/mirror:1.3.3 server
Start the client in the client folder:
docker run --rm --init -it --network host -u $(id -u):$(id -g) -v $(pwd):/data quay.io/stephenh/mirror:1.3.3 client --debug-all --include '**' --local-root /data --remote-root /data --host localhost
Go to the client folder and run:
mkdir missing
cd missing/
git init
touch file
git add file
git commit -m 'init'
git checkout -b test
mkdir folder
touch removes folder/stays
git add removes folder/stays
git commit -m 'add file'
Everything is fine on remote and local system:
$ tree
.
├── file
├── folder
│ └── stays
└── removes
Now run git checkout master on local or remote. The removes file gets removed, but folder/stays stays:
$ tree
.
└── file
0 directories, 1 file
and
$ tree
.
├── file
└── folder
    └── stays
1 directory, 2 files
Client log:
2020-01-04 22:56:40 INFO Queueing: path: "missing/.git/HEAD" modTime: 1578178600387 local: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/.git/logs/HEAD" modTime: 1578178600387 local: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/.git/index" modTime: 1578178600359 local: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/.git" modTime: 1578178600387 local: true directory: true executable: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/.git/index.lock" delete: true local: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/folder" delete: true local: true directory: true executable: true
2020-01-04 22:56:40 INFO Queueing: path: "missing" modTime: 1578178600359 local: true directory: true executable: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/folder/stays" delete: true local: true
2020-01-04 22:56:40 INFO Queueing: path: "missing/removes" delete: true local: true
2020-01-04 22:56:40 INFO missing/.git/HEAD isLocalNewer
2020-01-04 22:56:40 INFO l: modTime: 1578178600387 local: true
2020-01-04 22:56:40 INFO r: modTime: 1578178583227 local: true
2020-01-04 22:56:40 INFO Sending missing/.../HEAD
2020-01-04 22:56:42 INFO missing/removes isLocalNewer
2020-01-04 22:56:42 INFO l: modTime: 1578178584227 delete: true local: true
2020-01-04 22:56:42 INFO r: modTime: 1578178583227 local: true
2020-01-04 22:56:42 INFO missing/.git/index isLocalNewer
2020-01-04 22:56:42 INFO l: modTime: 1578178600359 local: true
2020-01-04 22:56:42 INFO r: modTime: 1578178583227 local: true
2020-01-04 22:56:42 INFO missing/.git/logs/HEAD isLocalNewer
2020-01-04 22:56:42 INFO l: modTime: 1578178600387 local: true
2020-01-04 22:56:42 INFO r: modTime: 1578178583227 local: true
2020-01-04 22:56:42 INFO Sending (delete) missing/removes
2020-01-04 22:56:42 INFO Sending missing/.../index
2020-01-04 22:56:42 INFO Sending missing/.../HEAD
Server log:
2020-01-04 22:56:40 INFO Remote update missing/.../HEAD
2020-01-04 22:56:42 INFO Remote delete missing/removes
2020-01-04 22:56:42 INFO Remote update missing/.../index
2020-01-04 22:56:42 INFO Remote update missing/.../HEAD
As you can see, the deletion of stays and folder is queued, but not sent. But sometimes I even see:
2020-01-04 23:00:23 INFO Sending (delete) missing/removes
2020-01-04 23:00:23 INFO Sending (delete) missing/folder
And sometimes it even works.
I discovered that it works once if you are on the test branch, then start the server and then the client. If you then switch to master, everything is fine. But if you switch to test again and then back to master, it doesn't work.
Log of the first switch:
Client:
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git/HEAD" modTime: 1578179605199 local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git/logs/HEAD" modTime: 1578179605199 local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git/HEAD.lock" delete: true local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git/index" modTime: 1578179605167 local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git" modTime: 1578179605199 local: true directory: true executable: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/.git/index.lock" delete: true local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/removes" delete: true local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/folder" delete: true local: true directory: true executable: true
2020-01-04 23:13:25 INFO Queueing: path: "missing/folder/stays" delete: true local: true
2020-01-04 23:13:25 INFO Queueing: path: "missing" modTime: 1578179605167 local: true directory: true executable: true
2020-01-04 23:13:25 INFO missing/.git/HEAD isLocalNewer
2020-01-04 23:13:25 INFO l: modTime: 1578179605199 local: true
2020-01-04 23:13:25 INFO r: modTime: 1578178871683 data: "initialSyncMarker" local: true
2020-01-04 23:13:25 INFO Sending missing/.../HEAD
2020-01-04 23:13:27 INFO missing/folder isLocalNewer
2020-01-04 23:13:27 INFO l: modTime: 1578178872655 delete: true local: true directory: true executable: true
2020-01-04 23:13:27 INFO r: modTime: 1578178809171 data: "initialSyncMarker" local: true directory: true executable: true
2020-01-04 23:13:27 INFO missing/removes isLocalNewer
2020-01-04 23:13:27 INFO l: modTime: 1578178872655 delete: true local: true
2020-01-04 23:13:27 INFO r: modTime: 1578178871655 data: "initialSyncMarker" local: true
2020-01-04 23:13:27 INFO missing/.git/index isLocalNewer
2020-01-04 23:13:27 INFO l: modTime: 1578179605167 local: true
2020-01-04 23:13:27 INFO r: modTime: 1578178871655 data: "initialSyncMarker" local: true
2020-01-04 23:13:27 INFO missing/.git/logs/HEAD isLocalNewer
2020-01-04 23:13:27 INFO l: modTime: 1578179605199 local: true
2020-01-04 23:13:27 INFO r: modTime: 1578178871683 data: "initialSyncMarker" local: true
2020-01-04 23:13:27 INFO Sending (delete) missing/folder
2020-01-04 23:13:27 INFO Sending (delete) missing/removes
2020-01-04 23:13:27 INFO Sending missing/.../index
2020-01-04 23:13:27 INFO Sending missing/.../HEAD
Server:
2020-01-04 23:13:25 INFO Remote update missing/.../HEAD
2020-01-04 23:13:27 INFO Remote delete missing/folder
2020-01-04 23:13:27 INFO Remote delete missing/removes
2020-01-04 23:13:27 INFO Remote update missing/.../index
2020-01-04 23:13:27 INFO Remote update missing/.../HEAD
Hm, there is some trouble with directories.
Go to client and just create a folder, delete it, and then create it again with the same name:
mkdir newfolder
$ tree ~/client/ ~/remote/
/home/user/client/
└── newfolder
/home/user/remote/
└── newfolder
2 directories, 0 files
rmdir newfolder/
$ tree ~/client/ ~/remote/
/home/user/client/
/home/user/remote/
0 directories, 0 files
mkdir newfolder
$ tree ~/client/ ~/remote/
/home/user/client/
└── newfolder
/home/user/remote/
1 directory, 0 files
touch newfolder/test
$ tree ~/client/ ~/remote/
/home/user/client/
└── newfolder
    └── test
/home/user/remote/
└── newfolder
    └── test
2 directories, 2 files
rm -rf newfolder/
$ tree ~/client/ ~/remote/
/home/user/client/
/home/user/remote/
└── newfolder
    └── test
1 directory, 1 file
Good find on the directories-not-recreated bug. I probably hadn't noticed before b/c, similar to git, I doubt I'd noticed directories not showing up before they had files in them anyway.
I pushed out a fix in 1.3.4.
Let me know if you have other easy/great repros like your mkdir/rmdir/mkdir example.
For your larger point around syncing the .git directory, yes, you are right that, contrary to the FAQ entry, I think it could work / probably does work in some (most?) scenarios.
My concern with recommending it (and using it myself) is I just don't trust different versions of git, and even more so different versions across different platforms (i.e. git on mac and git on linux), to guarantee to use exactly the same backwards/forwards compatible format for every file in the .git directory.
That said, knowing a little bit about git's approach to text files and overall .git organization, I'm not surprised it works. I just didn't want to take responsibility for a) verifying which combinations of git versions x OS platforms did/did not work and b) having people get annoyed when it magically didn't work and somehow corrupted their .git directory on either the client side or remote side.
All that said, I'm happy to update the readme/disclaimer/etc. to note that syncing the .git dir can actually work if you're able to prove out the workflow (and now I'm pretty tempted to try it myself, especially since I use the exact same linux/git/etc. on both sides of my client/remote :-) ).
Let me know how it goes with the 1.3.4 release.
@stephenh great, thank you for your fast response and fix! It would be nice if you could release a new docker image.
The issue with recreating a directory is indeed fixed. But the branch switch example still works only partially. If I just switch between the branches test and master, it works every odd time and doesn't work every even time :)
Same behavior with just
mkdir test && touch test/test
and
rm -rf test/
Well, jeez, I've been using mirror almost daily for years and never noticed this. Thanks for another great repro. I fiddled with how deletions are handled (they're a little different b/c they don't have mod times).
Pushed out 1.3.6. I'm not necessarily confident that your .git workflow will work, but I think we should be closer. See how it goes?
I'm working on the docker build...a contributor set up the quay build so I'm kind of learning/re-learning how that works. I think I'd accidentally revoked its github access, but now it's back, and failing on some misc things...
Unfortunately, the directory isn't deleted now:
tree ~/client/ ~/remote/ && rmdir test/ && sleep 2 && tree ~/client/ ~/remote/
/home/user/client/
├── missing
│ └── file
└── test
/home/user/remote/
├── missing
│ └── file
└── test
4 directories, 2 files
/home/user/client/
├── missing
│ └── file
└── test
/home/user/remote/
├── missing
│ └── file
└── test
4 directories, 2 files
It seems fine for me?
$ mkdir remote/test && sleep 2 && tree && rmdir remote/test && sleep 2 && tree
.
├── client
│   └── test
└── remote
    └── test
4 directories, 0 files
.
├── client
└── remote
2 directories, 0 files
Using mkdir -p and nested dirs also seems to work:
$ mkdir -p remote/test/test && sleep 2 && tree && rm -fr remote/test && sleep 2 && tree
.
├── client
│   └── test
│       └── test
└── remote
    └── test
        └── test
6 directories, 0 files
.
├── client
└── remote
2 directories, 0 files
(Nice idea on the mkdir/sleep/tree-based repros.)
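The mkdir/sleep/tree repros in this thread could be automated roughly as follows. This is only a minimal sketch and not part of mirror: trees_match, check_step, CLIENT_DIR, REMOTE_DIR and SYNC_WAIT are hypothetical names, and the 2-second default wait simply copies the sleep 2 used in the commands above. It relies on diff -r reporting entries that exist on only one side, including empty directories.

```shell
#!/bin/sh
# Succeeds (exit 0) when two directory trees have the same entries and the
# same file contents; diff -r also flags directories present on one side only.
trees_match() {
  diff -r "$1" "$2" >/dev/null 2>&1
}

# Hypothetical repro driver: run one command inside the client directory,
# wait for the sync to settle, then compare both trees.
# CLIENT_DIR, REMOTE_DIR and SYNC_WAIT are assumed names, not mirror settings.
check_step() {
  ( cd "$CLIENT_DIR" && sh -c "$1" ) || return 2
  sleep "${SYNC_WAIT:-2}"
  if trees_match "$CLIENT_DIR" "$REMOTE_DIR"; then
    echo "in sync after: $1"
  else
    echo "OUT OF SYNC after: $1"
    return 1
  fi
}
```

With CLIENT_DIR and REMOTE_DIR set to the two synced folders, calling check_step 'mkdir test && touch test/test' and then check_step 'rm -rf test/' would print which step left the two sides different.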
Strange, I can reproduce it locally. I'll try with original image.
BTW, appropriate image tag is missing.
BTW, appropriate image tag is missing.
For some reason quay wasn't picking up new tags, but I just added a new 1.3.8 version tag, and now it's building that ... so not sure what I missed before. Thanks for mentioning it; once that build is finished AFAICT both new-release-tag and latest label docker builds will be working.
Strange, I can reproduce it locally
Locally as in without docker, or with docker? I haven't been using docker so far to reproduce/fix the issues you've found. I see you gave the full server/client docker commands you had above, so I'll try that later today if I get a chance.
once that build is finished AFAICT both new-release-tag and latest label docker builds will be working.
Thanks!
Locally as in without docker, or with docker?
With docker, with client and server on the same host. Maybe I didn't update something, but with latest I was not able to reproduce the issue 👍
But twice I ended up in a desynchronized state for which I'm not able to find the steps to reproduce, even with some stress tests (while true: steps).
Because the original issue is resolved, I'm closing this ticket.
Thank you very much for the great support!
Np, thanks for following up!
@Bessonov fwiw you've tempted me into trying this workflow out.
I always figured "there be dragons" with the sort of high-rate-of-churn stuff git would do in the .git/ directory, even from a simple git pull or git fetch command (i.e. mirror's approach is admittedly timestamp / semi-heuristically based and not "we're keeping a change data capture log of both sides' file system events and merging them with some sort of proven-to-be-always-right operational transform / CRDT data structure").
That said, it does seem to be working. :-) I'll at least be more likely to flush out bugs by using it personally.
Also, I did observe a bug that happened often enough it should be reproducible where deleting a directory with N levels of files would not fully delete the directory, because deleting files on the remote side causes the parent directory modtimes to get bumped, so the remote side thinks its "bumped-b/c-I-just-did-a-delete" directories are newer than the local side's timestamps. It's not a huge deal because it shouldn't affect files, just directories, but it's still annoying. I'll file a bug and poke at it at some point.
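The filesystem behavior behind this can be seen without mirror at all: removing a file updates the modification time of its parent directory. The sketch below only demonstrates that effect. The function name parent_mtime_after_delete is made up for illustration, and it assumes a shell whose test builtin supports -nt (bash and dash do). It says nothing about how mirror itself compares timestamps.

```shell
#!/bin/sh
# Prints "bumped" if deleting a child file leaves the parent directory with a
# newer mtime than a marker file created just before the delete.
parent_mtime_after_delete() (
  work=$(mktemp -d) || exit 1
  mkdir "$work/dir"
  touch "$work/dir/child"
  touch "$work/marker"      # timestamp reference taken before the delete
  sleep 1                   # ensure the delete gets a later timestamp
  rm "$work/dir/child"      # this is what bumps the mtime of $work/dir
  if [ "$work/dir" -nt "$work/marker" ]; then
    result=bumped
  else
    result=unchanged
  fi
  rm -rf "$work"
  echo "$result"
)

parent_mtime_after_delete
```

A sync tool that compares directory mod times would therefore see the directory as freshly modified on the side that has just applied a file delete.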
@stephenh thanks for letting me know!
Also, I did observe a bug that happened often enough it should be reproducible where deleting a directory with N levels of files would not fully delete the directory, because deleting files on the remote side causes the parent directory modtimes to get bumped, so the remote side thinks it "bumped-b/c-I-just-did-a-delete" directories are newer than the local side's timestamps.
Well, it sounds like the bug I described above but was not able to reproduce.