jeanbmar / s3-sync-client

AWS CLI s3 sync command for Node.js

Paths Duplicated in S3 -> S3 Sync

brianlenz opened this issue

It looks like paths get duplicated unexpectedly when doing an S3 -> S3 sync. In this use case, the sync only includes files in sub-folders of the bucket. For some reason, the path that the files are located in is duplicated in the destination key.

It's easiest to represent through code:

import { S3Client } from '@aws-sdk/client-s3';
import { S3SyncClient } from 's3-sync-client';

const s3Client = new S3Client({ /* ... */ });
const syncClient = new S3SyncClient({ client: s3Client });
const sourcePath = `s3://bucket-a/x/y/z`;
const destPath = `s3://bucket-b/x/y/z`;
await syncClient.sync(sourcePath, destPath, {
  sizeOnly: true,
  del: true,
  relocations: [
    (key) => {
      // the key here has unexpected duplication of paths in it:
      // x/y/z//x/y/z/filename.txt
      // to fix, relocate it by stripping the duplication from the start.
      // new path: x/y/z/filename.txt
      return key.replace('x/y/z//', '');
    },
  ],
});

In this example, when transferring from x/y/z to x/y/z, the files actually end up under x/y/z//x/y/z in the destination (note the double // in the middle of the path, too).
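If the relocation workaround has to stay in place for a while, a slightly safer variant only strips the duplicated part when it appears at the start of the key. `String.prototype.replace` with a string pattern replaces the first match anywhere in the key, not only at the start. The following is a minimal sketch, assuming the duplication always has the form `<prefix>//` at the very start of the key, as in the example above. `makeDedupRelocation` is a hypothetical helper, not part of s3-sync-client:

```javascript
// Hypothetical helper (not part of s3-sync-client): builds a relocation
// callback that removes a duplicated "<prefix>//" only at the start of a key.
function makeDedupRelocation(prefix) {
  // Normalize the prefix so "x/y/z" and "x/y/z/" behave the same way.
  const base = prefix.replace(/\/+$/, '');
  const duplicated = `${base}//`;
  // Keys that do not start with the duplicated prefix are returned unchanged.
  return (key) =>
    key.startsWith(duplicated) ? key.slice(duplicated.length) : key;
}

const relocate = makeDedupRelocation('x/y/z');
console.log(relocate('x/y/z//x/y/z/filename.txt')); // x/y/z/filename.txt
console.log(relocate('x/y/z/other.txt')); // x/y/z/other.txt (unchanged)
```

The returned function could be placed in the `relocations` array instead of the inline callback, so that keys without the duplication pass through untouched.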

For now, we've worked around it using relocations as shown above, but it seems like this is probably a bug that should be fixed?

Hi @brianlenz, I ran into the same issue. Although your relocation approach works, I noticed that in my case the cause was a missing trailing slash in the sourcePath. As a result, sync also selected paths like s3://bucket-a/x/y/zabc, which mixed up the relocation.

However, I agree that there seems to be an issue with the relocation when the sourcePath ends with only part of a directory name in the source bucket (for example, x/y/z when the bucket also contains x/y/zabc).
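The trailing-slash point can be illustrated without touching S3. Amazon S3 treats a listing prefix as a plain string prefix, not a directory boundary, so `x/y/z` also matches keys under `x/y/zabc/`. The sketch below uses made-up keys and hypothetical helpers (`matchPrefix`, `asDirectoryPrefix`, `relative`). It does not reproduce s3-sync-client's internals. It only shows why a source path that ends mid-name is ambiguous:

```javascript
// Made-up keys in a hypothetical source bucket.
const keys = [
  'x/y/z/filename.txt',
  'x/y/zabc/other.txt', // sibling "folder" that merely shares a string prefix
  'x/y/other.txt',
];

// S3 prefixes are plain string prefixes, so this mimics which keys
// a listing with Prefix=<prefix> would return.
const matchPrefix = (prefix) => keys.filter((key) => key.startsWith(prefix));

// Hypothetical helper: make a non-empty prefix end with exactly one "/".
const asDirectoryPrefix = (prefix) =>
  prefix === '' ? prefix : `${prefix.replace(/\/+$/, '')}/`;

console.log(matchPrefix('x/y/z'));
// [ 'x/y/z/filename.txt', 'x/y/zabc/other.txt' ]
console.log(matchPrefix(asDirectoryPrefix('x/y/z')));
// [ 'x/y/z/filename.txt' ]

// Naively stripping a prefix that ends mid-name yields inconsistent
// relative keys: one with a leading "/", one starting inside a folder name.
const relative = (prefix, key) => key.slice(prefix.length);
console.log(matchPrefix('x/y/z').map((key) => relative('x/y/z', key)));
// [ '/filename.txt', 'abc/other.txt' ]
```

In the library call, `asDirectoryPrefix` corresponds to writing the sourcePath with a trailing slash (for example `s3://bucket-a/x/y/z/`), as suggested above.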