Large directory listing returns wrong results with --enable-storage-client-library=true
MattIrv opened this issue
Describe the bug
I'm listing a bucket that contains a large number of files, and gcsfuse returns an incomplete listing: the mount shows exactly 5,000 fewer entries than gsutil. When I try it with --enable-storage-client-library=false, I get the right results.
$ gsutil ls gs://<bucket> | wc -l
54979
Using gcsfuse:
$ ls -f mountdir | wc -l
49979
Debug logs:
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --debug_http --foreground <bucket> mountdir
Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/mirvine/mountdir
Opening GCS connection...
Creating a mount at "/home/mirvine/mountdir"
Creating a new server...
Set up root directory for bucket <bucket>
gcs: Req 0x0: <- ListObjects("")
gcs: Req 0x0: -> ListObjects("") (331.22039ms): OK
Mounting file system "<bucket>"...
fuse_debug: Beginning the mounting kickoff process
fuse_debug: Parsing fuse file descriptor
fuse_debug: Preparing for direct mounting
fuse_debug: Successfully opened the /dev/fuse in blocking mode
fuse_debug: Starting the unix mounting
fuse_debug: Directmount failed. Trying fallback.
fuse_debug: Creating a socket pair
fuse_debug: Creating files to wrap the sockets
fuse_debug: Starting fusermount/os mount
fuse_debug: Wrapping socket pair in a connection
fuse_debug: Checking that we have a unix domain socket
fuse_debug: Read a message from socket
fuse_debug: Successfully read the socket message.
fuse_debug: Converting FD into os.File
fuse_debug: Completed the mounting kickoff process
fuse_debug: Creating a connection object
fuse_debug: Op 0x00000002 connection.go:416] <- init
fuse_debug: Op 0x00000002 connection.go:498] -> OK ()
fuse_debug: Successfully created the connection
fuse_debug: Waiting for mounting process to complete
File system has been successfully mounted.
fuse_debug: Op 0x00000004 connection.go:416] <- GetInodeAttributes (inode 1, PID 3470)
debug_fs: GetInodeAttributes(1): <nil>
fuse_debug: Op 0x00000004 connection.go:498] -> OK ()
fuse_debug: Op 0x00000006 connection.go:416] <- OpenDir (inode 1, PID 3470)
debug_fs: OpenDir(1): <nil>
fuse_debug: Op 0x00000006 connection.go:498] -> OK ()
fuse_debug: Op 0x00000008 connection.go:416] <- ReadDir (inode 1, PID 3470)
gcs: Req 0x1: <- ListObjects("")
gcs: Req 0x1: -> ListObjects("") (469.947796ms): OK
gcs: Req 0x2: <- ListObjects("")
gcs: Req 0x2: -> ListObjects("") (517.532557ms): OK
gcs: Req 0x3: <- ListObjects("")
gcs: Req 0x3: -> ListObjects("") (503.189936ms): OK
gcs: Req 0x4: <- ListObjects("")
gcs: Req 0x4: -> ListObjects("") (540.966433ms): OK
gcs: Req 0x5: <- ListObjects("")
gcs: Req 0x5: -> ListObjects("") (511.014127ms): OK
gcs: Req 0x6: <- ListObjects("")
gcs: Req 0x6: -> ListObjects("") (536.509317ms): OK
gcs: Req 0x7: <- ListObjects("")
gcs: Req 0x7: -> ListObjects("") (530.863693ms): OK
gcs: Req 0x8: <- ListObjects("")
gcs: Req 0x8: -> ListObjects("") (519.813362ms): OK
gcs: Req 0x9: <- ListObjects("")
gcs: Req 0x9: -> ListObjects("") (985.784837ms): OK
gcs: Req 0xa: <- ListObjects("")
gcs: Req 0xa: -> ListObjects("") (545.591511ms): OK
debug_fs: ReadDir(1, 0): <nil>
fuse_debug: Op 0x00000008 connection.go:498] -> OK ()
fuse_debug: Op 0x0000000a connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 64): <nil>
fuse_debug: Op 0x0000000a connection.go:498] -> OK ()
fuse_debug: Op 0x0000000c connection.go:416] <- ReadDir (inode 1, PID 3470)
...
debug_fs: ReadDir(1, 49792): <nil>
fuse_debug: Op 0x0000061c connection.go:498] -> OK ()
fuse_debug: Op 0x0000061e connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49856): <nil>
fuse_debug: Op 0x0000061e connection.go:498] -> OK ()
fuse_debug: Op 0x00000620 connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49920): <nil>
fuse_debug: Op 0x00000620 connection.go:498] -> OK ()
fuse_debug: Op 0x00000622 connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49979): <nil>
fuse_debug: Op 0x00000622 connection.go:498] -> OK ()
fuse_debug: Op 0x00000624 connection.go:416] <- ReleaseDirHandle (PID 0)
debug_fs: ReleaseDirHandle(0): <nil>
fuse_debug: Op 0x00000624 connection.go:498] -> OK ()
System
- OS: Ubuntu 22.04
- Platform: GCE VM
- Version:
$ gcsfuse -v
gcsfuse version 0.42.3 (Go version go1.19.5)
Additional context
I get the correct results when running with --enable-storage-client-library=false:
$ ls -f mountdir | wc -l
54978
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=false <bucket> mountdir
Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/mirvine/mountdir
Opening GCS connection...
Creating a mount at "/home/mirvine/mountdir"
Creating a new server...
Set up root directory for bucket <bucket>
OpenBucket("<bucket>", "")
gcs: Req 0x0: <- ListObjects("")
gcs: Req 0x0: -> ListObjects("") (347.352921ms): OK
gcs: Req 0x1: <- ListObjects("")
gcs: Req 0x1: -> ListObjects("") (64.462791ms): OK
Mounting file system "<bucket>"...
fuse_debug: Beginning the mounting kickoff process
fuse_debug: Parsing fuse file descriptor
fuse_debug: Preparing for direct mounting
fuse_debug: Successfully opened the /dev/fuse in blocking mode
fuse_debug: Starting the unix mounting
fuse_debug: Directmount failed. Trying fallback.
fuse_debug: Creating a socket pair
fuse_debug: Creating files to wrap the sockets
fuse_debug: Starting fusermount/os mount
fuse_debug: Wrapping socket pair in a connection
fuse_debug: Checking that we have a unix domain socket
fuse_debug: Read a message from socket
fuse_debug: Successfully read the socket message.
fuse_debug: Converting FD into os.File
fuse_debug: Completed the mounting kickoff process
fuse_debug: Creating a connection object
fuse_debug: Op 0x00000002 connection.go:416] <- init
fuse_debug: Op 0x00000002 connection.go:498] -> OK ()
fuse_debug: Successfully created the connection
fuse_debug: Waiting for mounting process to complete
File system has been successfully mounted.
fuse_debug: Op 0x00000004 connection.go:416] <- GetInodeAttributes (inode 1, PID 3602)
debug_fs: GetInodeAttributes(1): <nil>
fuse_debug: Op 0x00000004 connection.go:498] -> OK ()
fuse_debug: Op 0x00000006 connection.go:416] <- OpenDir (inode 1, PID 3602)
debug_fs: OpenDir(1): <nil>
fuse_debug: Op 0x00000006 connection.go:498] -> OK ()
fuse_debug: Op 0x00000008 connection.go:416] <- ReadDir (inode 1, PID 3602)
gcs: Req 0x2: <- ListObjects("")
gcs: Req 0x2: -> ListObjects("") (469.85154ms): OK
gcs: Req 0x3: <- ListObjects("")
gcs: Req 0x3: -> ListObjects("") (543.97605ms): OK
gcs: Req 0x4: <- ListObjects("")
gcs: Req 0x4: -> ListObjects("") (633.368703ms): OK
gcs: Req 0x5: <- ListObjects("")
gcs: Req 0x5: -> ListObjects("") (494.986415ms): OK
gcs: Req 0x6: <- ListObjects("")
gcs: Req 0x6: -> ListObjects("") (483.669903ms): OK
gcs: Req 0x7: <- ListObjects("")
gcs: Req 0x7: -> ListObjects("") (496.437023ms): OK
gcs: Req 0x8: <- ListObjects("")
gcs: Req 0x8: -> ListObjects("") (488.065365ms): OK
gcs: Req 0x9: <- ListObjects("")
gcs: Req 0x9: -> ListObjects("") (546.585837ms): OK
gcs: Req 0xa: <- ListObjects("")
gcs: Req 0xa: -> ListObjects("") (510.525553ms): OK
gcs: Req 0xb: <- ListObjects("")
gcs: Req 0xb: -> ListObjects("") (505.246729ms): OK
gcs: Req 0xc: <- ListObjects("")
gcs: Req 0xc: -> ListObjects("") (577.179612ms): OK
debug_fs: ReadDir(1, 0): <nil>
fuse_debug: Op 0x00000008 connection.go:498] -> OK ()
fuse_debug: Op 0x0000000a connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 64): <nil>
fuse_debug: Op 0x0000000a connection.go:498] -> OK ()
fuse_debug: Op 0x0000000c connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 128): <nil>
fuse_debug: Op 0x0000000c connection.go:498] -> OK ()
fuse_debug: Op 0x0000000e connection.go:416] <- ReadDir (inode 1, PID 3602)
...
debug_fs: ReadDir(1, 54912): <nil>
fuse_debug: Op 0x000006bc connection.go:498] -> OK ()
fuse_debug: Op 0x000006be connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 54976): <nil>
fuse_debug: Op 0x000006be connection.go:498] -> OK ()
fuse_debug: Op 0x000006c0 connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 54978): <nil>
fuse_debug: Op 0x000006c0 connection.go:498] -> OK ()
fuse_debug: Op 0x000006c2 connection.go:416] <- ReleaseDirHandle (PID 0)
debug_fs: ReleaseDirHandle(0): <nil>
fuse_debug: Op 0x000006c2 connection.go:498] -> OK ()
I can't pin down the bug, but I suspect it comes from the implementation of ListObjects in https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/internal/storage/bucket_handle.go#L238, specifically how it handles the PageToken. The code seems to assume that every call to Next() makes an API call and updates the PageToken, which I don't think is true.
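To make the suspected failure mode concrete, here is a self-contained Go sketch. It does not use gcsfuse or the cloud.google.com/go/storage client; pageIterator, fetchPage, listBuggy, listFixed and countAll are hypothetical stand-ins. The iterator is modelled loosely on page-buffering iterators such as those built on google.golang.org/api/iterator, where the next-page token changes only when a new page is fetched, not on every Next() call. If a caller stops partway through a freshly fetched page and reuses that token, the unread rest of the page is skipped. This illustrates the hypothesis above; it is not the fix that was actually shipped.

```go
package main

import (
	"fmt"
	"strconv"
)

// fetchPage simulates one list API call: up to pageSize names starting at the
// offset encoded in token, plus the token of the following page ("" = last).
func fetchPage(all []string, token string, pageSize int) ([]string, string) {
	start, _ := strconv.Atoi(token) // "" does not parse and yields 0
	end, next := start+pageSize, ""
	if end < len(all) {
		next = strconv.Itoa(end)
	} else {
		end = len(all)
	}
	return all[start:end], next
}

// pageIterator buffers one page at a time: Next calls fetchPage only when the
// buffer is empty, so token advances once per page, not once per Next call.
type pageIterator struct {
	all      []string
	pageSize int
	buf      []string
	token    string
	fetched  bool
}

// Next returns the next name; ok is false once every page has been consumed.
func (it *pageIterator) Next() (string, bool) {
	for len(it.buf) == 0 {
		if it.fetched && it.token == "" {
			return "", false
		}
		it.buf, it.token = fetchPage(it.all, it.token, it.pageSize)
		it.fetched = true
	}
	item := it.buf[0]
	it.buf = it.buf[1:]
	return item, true
}

// Remaining is the number of fetched items that Next has not returned yet.
func (it *pageIterator) Remaining() int { return len(it.buf) }

// listBuggy stops after maxResults items and hands back the iterator's token.
// If the last Next call fetched a new page, that token already points past
// the page, so the page's unread items are never listed.
func listBuggy(all []string, token string, pageSize, maxResults int) ([]string, string) {
	it := &pageIterator{all: all, pageSize: pageSize, token: token}
	var items []string
	for item, ok := it.Next(); ok; item, ok = it.Next() {
		items = append(items, item)
		if len(items) == maxResults {
			break
		}
	}
	return items, it.token
}

// listFixed only stops at a page boundary (Remaining() == 0), so everything
// the iterator fetched is returned before its token is reused. It may return
// more than maxResults items.
func listFixed(all []string, token string, pageSize, maxResults int) ([]string, string) {
	it := &pageIterator{all: all, pageSize: pageSize, token: token}
	var items []string
	for item, ok := it.Next(); ok; item, ok = it.Next() {
		items = append(items, item)
		if len(items) >= maxResults && it.Remaining() == 0 {
			break
		}
	}
	return items, it.token
}

// countAll pages through n names with the given lister and counts the results.
func countAll(list func([]string, string, int, int) ([]string, string), n, pageSize, maxResults int) int {
	all := make([]string, n)
	for i := range all {
		all[i] = "file" + strconv.Itoa(i)
	}
	total, token := 0, ""
	for {
		items, next := list(all, token, pageSize, maxResults)
		total += len(items)
		if next == "" {
			return total
		}
		token = next
	}
}

func main() {
	// Pages of 3 from the simulated server, 4 items requested per listing call.
	fmt.Println("buggy:", countAll(listBuggy, 10, 3, 4)) // buggy: 8
	fmt.Println("fixed:", countAll(listFixed, 10, 3, 4)) // fixed: 10
}
```

With 3-item pages and 4 items requested per call, the buggy lister loses two names. When the page size and the requested count line up, both versions return everything, which would be consistent with the observation that the problem depends on the number and layout of the files.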
Hey @MattIrv, we tried listing 60,000 files (more than your 54,979) in a directory mounted through GCSFuse, and it worked with both --enable-storage-client-library=true and --enable-storage-client-library=false.
Could you please share the following information so we can debug further?
- Is this issue always reproducible? If not, how can we reproduce it on our end?
- What is the directory structure that you are listing?
- If the directory structure is nested, can you try with the --implicit-dirs flag and confirm whether you face the same issue?
Is this issue always reproducible? If not, how can we reproduce it on our end?
It consistently reproduces on the same buckets, but with different numbers of files it might not. I would recommend testing with a bucket that has exactly 54979 files to see whether that reproduces it. I've also seen different behavior depending on whether the files are in a folder in the bucket or in the bucket root (although I think I've been able to reproduce it in both cases), so I would recommend putting them in the bucket root.
What is the directory structure that you are listing?
Flat files. Maybe there's one directory in there.
If the directory structure is nested, can you try with the --implicit-dirs flag and confirm whether you face the same issue?
I believe I tried this and didn't see any change. I'll double-check the next time I see it happen.
Confirmed: this reproduces even with --implicit-dirs (on a bucket with 19245 files):
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true --implicit-dirs <bucket> mountdir
$ ls -f mountdir | wc -l
15001
$ gcloud storage ls gs://<bucket> | wc -l
19245
When I run with --implicit-dirs=false, I get a slightly different result. The bucket has the same structure and I think it contains one implicit directory, so that isn't surprising:
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true <bucket> mountdir
$ ls -f mountdir | wc -l
15000
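For context on why a single directory changes the count: a bucket has no real directories, and a listing with a delimiter of "/" folds every name that contains "/" after the requested prefix into one prefix entry such as dir/. With --implicit-dirs, gcsfuse shows such a prefix as a directory even when no dir/ placeholder object exists, which would account for the one-entry difference between 15001 and 15000. The Go sketch below mimics that grouping under those assumptions; listWithDelimiter is a hypothetical helper, not the GCS or gcsfuse implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listWithDelimiter mimics a delimiter-based listing under prefix: names with
// no further delimiter are returned as objects, and names with one are folded
// into a single prefix ("directory") entry each. delim must be non-empty.
func listWithDelimiter(names []string, prefix, delim string) (objects, prefixes []string) {
	seen := map[string]bool{}
	for _, name := range names {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		rest := strings.TrimPrefix(name, prefix)
		if i := strings.Index(rest, delim); i >= 0 {
			p := prefix + rest[:i+len(delim)]
			if !seen[p] {
				seen[p] = true
				prefixes = append(prefixes, p)
			}
			continue
		}
		objects = append(objects, name)
	}
	sort.Strings(prefixes)
	return objects, prefixes
}

func main() {
	names := []string{"file1.txt", "file2.txt", "dir/file1.txt"}
	objs, dirs := listWithDelimiter(names, "", "/")
	fmt.Println(objs, dirs) // [file1.txt file2.txt] [dir/]
}
```

In the repro steps, adding one object under dir/ is exactly what turns a correct listing into a short one, so pages that contain a prefix entry are a reasonable place to look when checking the pagination logic.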
Repro steps:
$ mkdir lotsoffiles && cd lotsoffiles
$ for i in {1..54979}; do touch file${i}.txt ; done
$ gcloud storage buckets create gs://mirvine-apr11
$ gcloud storage cp * gs://mirvine-apr11
Mount using gcsfuse and list:
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true --implicit-dirs mirvine-apr11 mountdir
Check the number of files; it's correct:
$ ls -f mountdir | wc -l
54979
Create a single implicit directory:
$ gcloud storage cp file1.txt gs://mirvine-apr11/dir/file1.txt
List again; now it's incorrect:
$ ls -f mountdir | wc -l
49981
Remove the file; the listing is correct again:
$ gcloud storage rm gs://mirvine-apr11/dir/file1.txt
$ ls -f mountdir | wc -l
54979
Add a single file in the root directory instead; the listing is still correct:
$ gcloud storage cp file1.txt gs://mirvine-apr11/file0.txt
$ ls -f mountdir | wc -l
54980
Thank you @MattIrv for sharing repro steps. We were able to reproduce the issue. We will fix this issue in our upcoming GCSFuse release (i.e. v0.42.4).
@MattIrv , Thanks for your patience. The fix for this issue has been released in GCSFuse v0.42.4. Closing this issue. Please re-open if you face any further issues.