s3fs-fuse / s3fs-fuse

FUSE-based file system backed by Amazon S3

Readdir fails for SSE-C and prefix mount

jstastny opened this issue

When using SSE-C with a prefix mount on the latest master (commit 2871975), readdir does not list the items I create inside the mounted directory.

Repro steps:

  1. Mount like: AWSSSECKEYS=vgXEublG7JJkhfRSZT1Ze3lefY2d7j3i s3fs -f -d -o bucket=my-bucket:/honza-testing-sse,complement_stat,allow_other,uid=0,gid=100,use_cache=/tmp/s3fs-cache,del_cache,ensure_diskfree=10000,curldbg,use_sse=custom /mnt/s3
  2. Create a file inside of the mount directory: touch /mnt/s3/aaa (the object correctly gets created in S3, including SSE-C)
  3. List the contents of the mount directory: ls /mnt/s3
  4. The output is empty.

I investigated the HTTP calls and believe this is caused by an incorrect path in the retry of the HEAD request. My understanding is that the retry is intentional: the first request is sent without SSE-C headers, and the retry callback adds them (based on this comment).
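The retry behavior described above can be modeled roughly as follows. This is a simplified sketch, not s3fs's actual retry callback: `head_with_ssec_retry`, `HeadFn` and `kNoKey` are hypothetical names, and it assumes the retry walks the configured SSE-C keys in order. The sentinel `kNoKey` corresponds to the `sseckeypos=18446744073709551615` printed for the first HEAD in the log (that number is the maximum `size_t` on a 64-bit build), and the retry uses key position 0.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Sentinel for "no SSE-C key selected yet". On a 64-bit build this equals
// 18446744073709551615, the sseckeypos value logged for the first HEAD.
constexpr std::size_t kNoKey = std::numeric_limits<std::size_t>::max();

// Hypothetical callback performing one HEAD request; returns the HTTP status.
using HeadFn = std::function<int(const std::string& path, std::size_t key_pos)>;

// Send the HEAD without SSE-C headers first, then retry once per configured
// key (key_pos 0, 1, ...) until a request succeeds or the keys run out.
// 'tried' optionally records the key positions used, in order.
int head_with_ssec_retry(const std::string& path, std::size_t key_count,
                         const HeadFn& head,
                         std::vector<std::size_t>* tried = nullptr) {
    std::size_t key_pos = kNoKey;
    while (true) {
        if (tried != nullptr) tried->push_back(key_pos);
        const int status = head(path, key_pos);  // same path on every attempt
        if (status == 200) return status;
        const std::size_t next = (key_pos == kNoKey) ? 0 : key_pos + 1;
        if (next >= key_count) return status;  // no keys left to try
        key_pos = next;
    }
}
```

Note that the model passes the same object path to every attempt; the bug reported here is precisely that the path changes between the first attempt and the retry.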

The problem is that the path breaks on the second attempt: the prefix (/honza-testing-sse in my case) appears twice. Here are snippets from the logs (I changed the bucket name and removed the uninteresting parts, such as SSL bundle resolution):

First HEAD request -- correct path, but missing SSE-C headers

2023-10-18T14:35:41.875Z [INF]   s3fs.cpp:readdir_multi_head(3262): [path=/][list=0]
2023-10-18T14:35:41.875Z [INF]       curl.cpp:PreHeadRequest(3208): [tpath=/aaa][bpath=aaa][save=/aaa][sseckeypos=18446744073709551615]
2023-10-18T14:35:41.875Z [INF]       curl_util.cpp:prepare_url(210): URL is https://s3.amazonaws.com/my-bucket/honza-testing-sse/aaa
2023-10-18T14:35:41.875Z [INF]       curl_util.cpp:prepare_url(243): URL changed is https://my-bucket.s3.amazonaws.com/honza-testing-sse/aaa
2023-10-18T14:35:41.875Z [INF]       curl_multi.cpp:Request(324): [count=1]
2023-10-18T14:35:41.875Z [INF]       curl.cpp:insertV4Headers(2840): computing signature [HEAD] [/honza-testing-sse/aaa] [] []
2023-10-18T14:35:41.875Z [INF]       curl_util.cpp:url_to_host(265): url is https://s3.amazonaws.com
...
2023-10-18T14:35:41.875Z [CURL DBG] > HEAD /honza-testing-sse/aaa HTTP/1.1
2023-10-18T14:35:41.875Z [CURL DBG] > Host: my-bucket.s3.amazonaws.com
2023-10-18T14:35:41.875Z [CURL DBG] > User-Agent: s3fs/1.93.2 (commit hash unknown; OpenSSL)
2023-10-18T14:35:41.875Z [CURL DBG] > Accept: */*
...
2023-10-18T14:35:41.875Z [CURL DBG] > x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-10-18T14:35:41.875Z [CURL DBG] > x-amz-date: 20231018T143541Z
2023-10-18T14:35:41.875Z [CURL DBG] >
...
2023-10-18T14:35:41.977Z [CURL DBG] < HTTP/1.1 400 Bad Request
2023-10-18T14:35:41.977Z [CURL DBG] < x-amz-request-id: 4G8GWN3G4K164XWZ
2023-10-18T14:35:41.977Z [CURL DBG] < x-amz-id-2: 7khYl3/U23hCCXP2iz+7zBlsxg7sxOJpwQYPZit0ZfoAkc3nTQt9CgFs/mfBKAckfxsIdgwVBXA=
2023-10-18T14:35:41.977Z [CURL DBG] < Content-Type: application/xml
...
2023-10-18T14:35:41.979Z [ERR] curl.cpp:RequestPerform(2526): HEAD HTTP response code 400, returning EPERM.
2023-10-18T14:35:41.979Z [WAN] curl_multi.cpp:MultiPerform(193): thread terminated with non-zero return code: -1
2023-10-18T14:35:41.979Z [WAN] curl_multi.cpp:MultiRead(228): failed a request(400: https://my-bucket.s3.amazonaws.com/honza-testing-sse/aaa)

Second HEAD request -- incorrect path (see the /honza-testing-sse/honza-testing-sse/), but SSE-C is present

2023-10-18T14:35:41.979Z [INF]       curl.cpp:PreHeadRequest(3208): [tpath=/honza-testing-sse/aaa][bpath=aaa][save=/aaa][sseckeypos=0]
2023-10-18T14:35:41.979Z [INF]       curl_util.cpp:prepare_url(210): URL is https://s3.amazonaws.com/my-bucket/honza-testing-sse/honza-testing-sse/aaa
2023-10-18T14:35:41.979Z [INF]       curl_util.cpp:prepare_url(243): URL changed is https://my-bucket.s3.amazonaws.com/honza-testing-sse/honza-testing-sse/aaa
2023-10-18T14:35:41.979Z [INF]       curl.cpp:insertV4Headers(2840): computing signature [HEAD] [/honza-testing-sse/honza-testing-sse/aaa] [] []
2023-10-18T14:35:41.979Z [INF]       curl_util.cpp:url_to_host(265): url is https://s3.amazonaws.com
2023-10-18T14:35:41.979Z [CURL DBG] * Hostname my-bucket.s3.amazonaws.com was found in DNS cache
...
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-date: 20231018T143541Z
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-algorithm: AES256
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-key: dmdYRXVibEc3SkpraGZSU1pUMVplM2xlZlkyZDdqM2k=
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-key-md5: 5hY9TS21HM2qJz6bFpDkgg==
2023-10-18T14:35:42.363Z [CURL DBG] >
2023-10-18T14:35:42.478Z [CURL DBG] * TLSv1.2 (IN), TLS header, Supplemental data (23):
2023-10-18T14:35:42.478Z [CURL DBG] * Mark bundle as not supporting multiuse
2023-10-18T14:35:42.478Z [CURL DBG] < HTTP/1.1 404 Not Found
2023-10-18T14:35:42.478Z [CURL DBG] < x-amz-request-id: GAWNE0Z9CZHHJM7K
2023-10-18T14:35:42.478Z [CURL DBG] < x-amz-id-2: 69CRu9zdCoqb958dahFGx4pNIU7mGGPsNPEvDgV13StKV1gRjEbmuK7GzzEORNcAeKvbIlbG5Nc=
2023-10-18T14:35:42.478Z [CURL DBG] < Content-Type: application/xml
2023-10-18T14:35:42.478Z [CURL DBG] < Date: Wed, 18 Oct 2023 14:35:41 GMT
2023-10-18T14:35:42.479Z [CURL DBG] < Server: AmazonS3
2023-10-18T14:35:42.479Z [CURL DBG] <
2023-10-18T14:35:42.479Z [INF]       curl.cpp:RequestPerform(2540): HTTP response code 404 was returned, returning ENOENT
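In the second request's log, the SSE-C headers themselves look correct. For SSE-C, S3 expects `x-amz-server-side-encryption-customer-key` to carry the base64-encoded 256-bit key and `x-amz-server-side-encryption-customer-key-md5` the base64-encoded MD5 digest of that key. The 32-character `AWSSSECKEYS` value from step 1 is used as the raw 32 key bytes, so a plain RFC 4648 base64 encoding of it should reproduce the logged key header. The sketch below (with a hypothetical `base64_encode` helper) checks only the key header; the MD5 header would additionally need a hash implementation, for example from OpenSSL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standard base64 (RFC 4648) encoding with '=' padding.
std::string base64_encode(const std::string& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Encode full 3-byte groups into 4 output characters.
    for (; i + 2 < in.size(); i += 3) {
        unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                     static_cast<unsigned char>(in[i + 2]);
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += kAlphabet[v & 0x3F];
    }
    // Encode the 1 or 2 trailing bytes, padding with '='.
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        unsigned v = static_cast<unsigned char>(in[i]) << 16;
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8);
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

Encoding `vgXEublG7JJkhfRSZT1Ze3lefY2d7j3i` this way yields `dmdYRXVibEc3SkpraGZSU1pUMVplM2xlZlkyZDdqM2k=`, the value in the retry's key header, which supports the conclusion that the 404 comes from the doubled path rather than from the key.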

I tested the same workflow without using the mount prefix and everything worked just fine.

Additional Information

Version of s3fs being used (s3fs --version)

V1.93 2871975

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)

2.9.9-5ubuntu3

Kernel information (uname -r)

5.15.0-1037-aws (running in a Docker container with an Ubuntu image)

GNU/Linux Distribution, if applicable (cat /etc/os-release)

How to run s3fs, if applicable

[x] command line
[] /etc/fstab

I have prepared an extremely naive fix (which likely breaks many other things).
After debugging this for some time, I believe the problem is that the retry instance of S3fsCurl takes s3fscurl->GetPath() from the previous instance, but the path in the previous instance already has the mount_prefix prepended by get_realpath().
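The failure mode described above can be reduced to a small model. This is a hypothetical sketch, not s3fs's code and not necessarily the change made in the follow-up PR: `mount_prefix` and `get_realpath` mirror the names in the report, while `HeadRequest`, `prepare_buggy`, `prepare_fixed` and `retry` are illustrative. The buggy variant remembers the already-prefixed path, so a retry that re-prepares from it prefixes it a second time (matching `tpath=/honza-testing-sse/aaa` in the retry log); the alternative remembers the original FUSE path and applies the prefix only when building the URL.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the prefix from bucket=my-bucket:/honza-testing-sse.
const std::string mount_prefix = "/honza-testing-sse";

// Hypothetical equivalent of get_realpath(): map a FUSE path to an object path.
std::string get_realpath(const std::string& fuse_path) {
    return mount_prefix + fuse_path;
}

// Minimal model of a prepared HEAD request.
struct HeadRequest {
    std::string path;      // path remembered for a later retry
    std::string url_path;  // prefixed path actually sent in the URL
};

// Buggy pattern: the remembered path already contains the prefix.
HeadRequest prepare_buggy(const std::string& tpath) {
    const std::string real = get_realpath(tpath);
    return {real, real};
}

// Alternative: remember the original FUSE path and prefix only the URL path.
HeadRequest prepare_fixed(const std::string& tpath) {
    return {tpath, get_realpath(tpath)};
}

// A retry re-runs the same prepare step with the previously remembered path.
template <typename Prepare>
HeadRequest retry(const HeadRequest& prev, Prepare prepare) {
    return prepare(prev.path);
}
```

In this model, with an empty `mount_prefix`, `get_realpath()` is the identity and re-preparing from the remembered path is harmless, which is consistent with the observation that the same workflow works without a mount prefix.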

@jstastny
I looked into this bug you detected.
Your fix was mostly correct, but it needed a few more changes.

So I created PR #2406 to fix this.
I manually checked the HEAD request retries and confirmed that they work, but I would appreciate it if you could confirm this as well.

@ggtakec -- thanks a lot for looking into this.
@chudyandrej -- could you test #2406 in Deepnote?