Readdir fails for SSE-C and prefix mount
jstastny opened this issue
When using SSE-C with a prefix mount on the latest master (commit 2871975), readdir does not list items I create inside the mounted directory.
Repro steps:
- Mount like:
AWSSSECKEYS=vgXEublG7JJkhfRSZT1Ze3lefY2d7j3i s3fs -f -d -o bucket=my-bucket:/honza-testing-sse,complement_stat,allow_other,uid=0,gid=100,use_cache=/tmp/s3fs-cache,del_cache,ensure_diskfree=10000,curldbg,use_sse=custom /mnt/s3
- Create a file inside of the mount directory:
touch /mnt/s3/aaa
(the object is correctly created in S3, including SSE-C)
- List the contents of the mount directory:
ls /mnt/s3
- This gives nothing
I investigated the HTTP calls and think this is caused by an incorrect path in the retry of the HEAD request. My understanding is that the retry itself is intentional: the first request is sent without SSE-C headers, and the retry callback adds them (based on this comment).
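The retry flow described above can be sketched roughly as follows. This is a Python model for illustration only, not s3fs code (s3fs is C++): `head_with_ssec_retry`, `fake_s3_head` and `OBJECT_KEY` are hypothetical names, and the stand-in server answers 400 whenever the key is missing or does not match, which real S3 may not do for a wrong key.

```python
from typing import Callable, Optional, Sequence, Tuple


def head_with_ssec_retry(
    send_head: Callable[[Optional[bytes]], int],
    keys: Sequence[bytes],
) -> Tuple[int, Optional[int]]:
    """Send HEAD without a key first, then retry once per configured key.

    Returns (status, key_index); key_index is None if no key succeeded
    or no key was needed.
    """
    status = send_head(None)  # first attempt: no SSE-C headers
    if status != 400:
        return status, None
    for index, key in enumerate(keys):  # retries: try each configured key
        status = send_head(key)
        if status != 400:
            return status, index
    return status, None


# Hypothetical stand-in for S3: the object was stored with exactly one key.
OBJECT_KEY = b"k" * 32


def fake_s3_head(key: Optional[bytes]) -> int:
    """Return 200 only when the matching key is supplied (simplified)."""
    return 200 if key == OBJECT_KEY else 400


result = head_with_ssec_retry(fake_s3_head, [OBJECT_KEY])
```

In the logs below, `sseckeypos=18446744073709551615` is the largest unsigned 64-bit value, which presumably stands for "no key", and `sseckeypos=0` would then be the first configured key.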
The problem is that the path gets broken in the second attempt: the prefix (/honza-testing-sse in my case) appears twice. Here are snippets from the logs (I've changed the bucket name and removed the boring parts such as SSL bundle resolution):
First HEAD request -- correct path, but missing SSE-C headers
2023-10-18T14:35:41.875Z [INF] s3fs.cpp:readdir_multi_head(3262): [path=/][list=0]
2023-10-18T14:35:41.875Z [INF] curl.cpp:PreHeadRequest(3208): [tpath=/aaa][bpath=aaa][save=/aaa][sseckeypos=18446744073709551615]
2023-10-18T14:35:41.875Z [INF] curl_util.cpp:prepare_url(210): URL is https://s3.amazonaws.com/my-bucket/honza-testing-sse/aaa
2023-10-18T14:35:41.875Z [INF] curl_util.cpp:prepare_url(243): URL changed is https://my-bucket.s3.amazonaws.com/honza-testing-sse/aaa
2023-10-18T14:35:41.875Z [INF] curl_multi.cpp:Request(324): [count=1]
2023-10-18T14:35:41.875Z [INF] curl.cpp:insertV4Headers(2840): computing signature [HEAD] [/honza-testing-sse/aaa] [] []
2023-10-18T14:35:41.875Z [INF] curl_util.cpp:url_to_host(265): url is https://s3.amazonaws.com
...
2023-10-18T14:35:41.875Z [CURL DBG] > HEAD /honza-testing-sse/aaa HTTP/1.1
2023-10-18T14:35:41.875Z [CURL DBG] > Host: my-bucket.s3.amazonaws.com
2023-10-18T14:35:41.875Z [CURL DBG] > User-Agent: s3fs/1.93 (commit hash unknown; OpenSSL)
2023-10-18T14:35:41.875Z [CURL DBG] > Accept: */*
...
2023-10-18T14:35:41.875Z [CURL DBG] > x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-10-18T14:35:41.875Z [CURL DBG] > x-amz-date: 20231018T143541Z
2023-10-18T14:35:41.875Z [CURL DBG] >
...
2023-10-18T14:35:41.977Z [CURL DBG] < HTTP/1.1 400 Bad Request
2023-10-18T14:35:41.977Z [CURL DBG] < x-amz-request-id: 4G8GWN3G4K164XWZ
2023-10-18T14:35:41.977Z [CURL DBG] < x-amz-id-2: 7khYl3/U23hCCXP2iz+7zBlsxg7sxOJpwQYPZit0ZfoAkc3nTQt9CgFs/mfBKAckfxsIdgwVBXA=
2023-10-18T14:35:41.977Z [CURL DBG] < Content-Type: application/xml
...
2023-10-18T14:35:41.979Z [ERR] curl.cpp:RequestPerform(2526): HEAD HTTP response code 400, returning EPERM.
2023-10-18T14:35:41.979Z [WAN] curl_multi.cpp:MultiPerform(193): thread terminated with non-zero return code: -1
2023-10-18T14:35:41.979Z [WAN] curl_multi.cpp:MultiRead(228): failed a request(400: https://my-bucket.s3.amazonaws.com/honza-testing-sse/aaa)
Second HEAD request -- incorrect path (see the /honza-testing-sse/honza-testing-sse/), but SSE-C is present
2023-10-18T14:35:41.979Z [INF] curl.cpp:PreHeadRequest(3208): [tpath=/honza-testing-sse/aaa][bpath=aaa][save=/aaa][sseckeypos=0]
2023-10-18T14:35:41.979Z [INF] curl_util.cpp:prepare_url(210): URL is https://s3.amazonaws.com/my-bucket/honza-testing-sse/honza-testing-sse/aaa
2023-10-18T14:35:41.979Z [INF] curl_util.cpp:prepare_url(243): URL changed is https://my-bucket.s3.amazonaws.com/honza-testing-sse/honza-testing-sse/aaa
2023-10-18T14:35:41.979Z [INF] curl.cpp:insertV4Headers(2840): computing signature [HEAD] [/honza-testing-sse/honza-testing-sse/aaa] [] []
2023-10-18T14:35:41.979Z [INF] curl_util.cpp:url_to_host(265): url is https://s3.amazonaws.com
2023-10-18T14:35:41.979Z [CURL DBG] * Hostname my-bucket.s3.amazonaws.com was found in DNS cache
...
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-date: 20231018T143541Z
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-algorithm: AES256
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-key: dmdYRXVibEc3SkpraGZSU1pUMVplM2xlZlkyZDdqM2k=
2023-10-18T14:35:42.363Z [CURL DBG] > x-amz-server-side-encryption-customer-key-md5: 5hY9TS21HM2qJz6bFpDkgg==
2023-10-18T14:35:42.363Z [CURL DBG] >
2023-10-18T14:35:42.478Z [CURL DBG] * TLSv1.2 (IN), TLS header, Supplemental data (23):
2023-10-18T14:35:42.478Z [CURL DBG] * Mark bundle as not supporting multiuse
2023-10-18T14:35:42.478Z [CURL DBG] < HTTP/1.1 404 Not Found
2023-10-18T14:35:42.478Z [CURL DBG] < x-amz-request-id: GAWNE0Z9CZHHJM7K
2023-10-18T14:35:42.478Z [CURL DBG] < x-amz-id-2: 69CRu9zdCoqb958dahFGx4pNIU7mGGPsNPEvDgV13StKV1gRjEbmuK7GzzEORNcAeKvbIlbG5Nc=
2023-10-18T14:35:42.478Z [CURL DBG] < Content-Type: application/xml
2023-10-18T14:35:42.478Z [CURL DBG] < Date: Wed, 18 Oct 2023 14:35:41 GMT
2023-10-18T14:35:42.479Z [CURL DBG] < Server: AmazonS3
2023-10-18T14:35:42.479Z [CURL DBG] <
2023-10-18T14:35:42.479Z [INF] curl.cpp:RequestPerform(2540): HTTP response code 404 was returned, returning ENOENT
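As a sanity check that the second request carries the configured key, and that only the path is wrong, the SSE-C headers can be recomputed from the key in the mount command. This is a minimal Python sketch, assuming the AWSSSECKEYS value is used as the raw 32-byte key (which the logged customer-key header suggests); `ssec_headers` is a hypothetical helper, not part of s3fs.

```python
import base64
import hashlib


def ssec_headers(raw_key: bytes) -> dict:
    """Build the three SSE-C request headers for a raw AES256 key."""
    if len(raw_key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte key")
    key_b64 = base64.b64encode(raw_key).decode("ascii")
    # The MD5 header is the base64-encoded MD5 digest of the raw key bytes.
    md5_b64 = base64.b64encode(hashlib.md5(raw_key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-md5": md5_b64,
    }


# Key taken from the mount command in this report.
headers = ssec_headers(b"vgXEublG7JJkhfRSZT1Ze3lefY2d7j3i")
for name, value in headers.items():
    print(f"{name}: {value}")
```

The customer-key value printed here should match the one in the second request above.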
I tested the same workflow without using the mount prefix and everything worked just fine.
Additional Information
Version of s3fs being used (s3fs --version)
V1.93 2871975
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)
2.9.9-5ubuntu3
Kernel information (uname -r)
5.15.0-1037-aws (running in Docker container in Ubuntu image)
GNU/Linux Distribution, if applicable (cat /etc/os-release)
How to run s3fs, if applicable
[x] command line
[] /etc/fstab
I have prepared an extremely naive fix (which likely breaks many other things).
Having debugged this for some time, I believe the problem is that the retry instance of S3fsCurl takes s3fscurl->GetPath() from the previous instance, but the path in the previous instance already has the mount_prefix prepended by get_realpath().
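The suspected mechanism can be reproduced in a few lines. This is a simplified Python model, not the real C++ code: `HeadRequest`, `retry_buggy` and `retry_fixed` are hypothetical names, `get_realpath` only mimics the prefixing behaviour described above, and `retry_fixed` shows one possible remedy, which may differ from the actual fix.

```python
MOUNT_PREFIX = "/honza-testing-sse"  # prefix from the mount command in this report


def get_realpath(path: str) -> str:
    """Stand-in for get_realpath(): prepend the mount prefix."""
    return MOUNT_PREFIX + path


class HeadRequest:
    """Hypothetical model of one HEAD attempt (not the real S3fsCurl)."""

    def __init__(self, tpath: str) -> None:
        self.tpath = tpath               # path as handed in by the caller
        self.path = get_realpath(tpath)  # stored with the prefix already applied


def retry_buggy(prev: HeadRequest) -> HeadRequest:
    # Reuses the stored, already-prefixed path, so the prefix is added again.
    return HeadRequest(prev.path)


def retry_fixed(prev: HeadRequest) -> HeadRequest:
    # Reuses the original, unprefixed path, so the prefix is added only once.
    return HeadRequest(prev.tpath)


first = HeadRequest("/aaa")
print(first.path)               # /honza-testing-sse/aaa
print(retry_buggy(first).path)  # /honza-testing-sse/honza-testing-sse/aaa
print(retry_fixed(first).path)  # /honza-testing-sse/aaa
```

The buggy variant reproduces both symptoms from the second request in the logs: tpath=/honza-testing-sse/aaa and the doubled prefix in the URL.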
@jstastny
I looked into the bug you found.
Your fix was mostly correct, but it needed a little more modification.
So I opened PR #2406 to fix this.
I have manually checked the HEAD request retries and confirmed that it works, but I would appreciate it if you could confirm this if possible.
@ggtakec -- thanks a lot for looking into this.
@chudyandrej -- could you test #2406 in Deepnote?