otiai10 / copy

Go copy directory recursively

Home Page: https://pkg.go.dev/github.com/otiai10/copy

"no such file or directory" when copying large amounts of data

m90 opened this issue

I'm using this package in a tool for backing up Docker volumes: https://github.com/offen/docker-volume-backup

Users that do not want to stop their containers while taking a backup can opt in to copying their data to a temporary location before creating the tar archive so that creating the archive does not fail in case data is being written to a file while it's being backed up. To perform this copy, package copy is used (thanks for making it public, much appreciated).

This seemed to work well in tests as well as the real world, however recently an issue was raised where copy would fail with the following error when backing up the data volume for a Prometheus container:

open /backup/prometheus_data/01FSM8TPFEXQ0QC28H11PMQZ0R: no such file or directory

The dataset that is being copied seems to be a. very large and b. pretty volatile, which has me thinking this file might actually have been deleted/moved before copy got around to it. This is the downstream issue: offen/docker-volume-backup#49

Is this issue somehow known? Is there a way to fix it by configuring copy differently?

This is the part where I use copy in code and also where the above error is being returned:

if err := copy.Copy(s.c.BackupSources, backupSources, copy.Options{
	PreserveTimes: true,
	PreserveOwner: true,
}); err != nil {
	return fmt.Errorf("takeBackup: error creating snapshot: %w", err)
}

Thank you, @m90
Answering your question quickly, no, it's not known.

Let me clarify: you think there are two possible causes of this problem:

a. Copy failed because the src dir is too large.
b. Copy failed because the src dir does not exist.

Can you tell me why you suspect case a.?

My line of thinking (without knowing too much about what copy is actually doing) is that a. could increase the probability of b. happening: the larger the dataset, the longer the copy takes, and the likelier it becomes that another process deletes files and/or directories from the initial set while copy is still working.

Or is there a flaw in that?

For example here:

copy/copy.go

Lines 142 to 166 in 9aae5f7

contents, err := ioutil.ReadDir(srcdir)
if err != nil {
	return
}
for _, content := range contents {
	cs, cd := filepath.Join(srcdir, content.Name()), filepath.Join(destdir, content.Name())
	if err = copyNextOrSkip(cs, cd, content, opt); err != nil {
		// If any error, exit immediately
		return
	}
}
if opt.PreserveTimes {
	if err := preserveTimes(info, destdir); err != nil {
		return err
	}
}
if opt.PreserveOwner {
	if err := preserveOwner(srcdir, destdir, info); err != nil {
		return err
	}
}

we could run into a situation where copyNextOrSkip takes a long time and in the meantime someone else deletes the next entry in the contents slice, making the next iteration fail. The likelihood of this happening should increase with the overall number of files to be copied.
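
To make the race concrete, here is a self-contained sketch (not code from this package; the temp directory and file name are made up). A directory listing is a point-in-time snapshot, so any entry can vanish before the loop reaches it:

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	dir, err := os.MkdirTemp("", "race-demo")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	name := filepath.Join(dir, "volatile-file")
	if err := os.WriteFile(name, []byte("data"), 0o644); err != nil {
		log.Fatal(err)
	}

	// The listing is a snapshot; it does not track later changes.
	entries, err := os.ReadDir(dir)
	if err != nil {
		log.Fatal(err)
	}

	// Simulate a concurrent writer removing the file between
	// ReadDir and the per-entry copy step.
	os.Remove(name)

	for _, e := range entries {
		if _, err := os.Open(filepath.Join(dir, e.Name())); err != nil {
			fmt.Println(err) // open .../volatile-file: no such file or directory
		}
	}
}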

Fair enough. Worth thinking about.
Thank you very much.

The core issue is neither size nor time, imo.

That is "should we lock what we wanna copy till it's done?".

Let me think about it to make the best interface for us.

That is "should we lock what we wanna copy till it's done?".

This sums it up perfectly :)

If it's possible to add such an option that would definitely be of much help.

That is "should we lock what we wanna copy till it's done?".

I don't think there is any point in trying to do that from this Go module. You will get the same race condition when trying to lock the file, because a file can be deleted after the directory is read. So the only way to do this is to lock the file system before reading the directory, and that can only be done either by "locking" the entire filesystem (e.g. a filesystem snapshot or an LVM snapshot), or by pausing the Docker container in the docker-volume-backup use case.

I think it would be enough to simply ignore os.IsNotExist(err).
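
For illustration, a sketch of what that could look like in the loop quoted above; copyNextOrSkip and the surrounding variables are from the package, but the skip logic itself is only a proposal, not the current behavior:

for _, content := range contents {
	cs, cd := filepath.Join(srcdir, content.Name()), filepath.Join(destdir, content.Name())
	if err = copyNextOrSkip(cs, cd, content, opt); err != nil {
		if os.IsNotExist(err) {
			// The entry disappeared between ReadDir and the copy;
			// treat it as nothing to copy instead of aborting.
			err = nil
			continue
		}
		// Any other error still aborts immediately.
		return
	}
}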

Users that do not want to stop their containers while taking a backup can opt in to copying their data ...

If you don't stop or pause the container before copying, you will always risk files being deleted while you copy.

Thank you, and I agree with your idea @ncopa: locking is not something this package should provide.
Let me check your PR. I appreciate the way you separated the commits and pushed the test code first.