GoogleCloudPlatform / gcsfuse

A user-space file system for interacting with Google Cloud Storage

Home Page: https://cloud.google.com/storage/docs/gcs-fuse

Large directory listing returns wrong results with --enable-storage-client-library=true

MattIrv opened this issue

Describe the bug
Please provide a clear description of what you were trying to achieve along with the details of the flags that you passed.

I'm listing a bucket that contains a large number of files, and the listing returns incorrect results: in the example below, 5,000 entries are missing. When I try it with --enable-storage-client-library=false, I get the right results.

$ gsutil ls gs://<bucket> | wc -l
54979

Using gcsfuse:

$ ls -f mountdir | wc -l
49979

To collect more debug logs
Steps to reproduce the behavior:

  1. Please make sure you have no other security, monitoring, or background processes that could interfere with the running FUSE process. If possible, reproduce under a fresh, clean installation.
  2. Please rerun with --debug_fuse --debug_fs --debug_gcs --debug_http --foreground as additional flags to enable debug logs.
  3. Monitor the logs and capture screenshots or copy the relevant logs to a file (you can also use --log-format and --log-file).
  4. Attach the screenshots or the log file to the bug report here.
  5. If you're using gcsfuse with any other library, tool, or process, please list the steps you took to reproduce the issue.
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --debug_http --foreground <bucket> mountdir
Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/mirvine/mountdir
Opening GCS connection...
Creating a mount at "/home/mirvine/mountdir"
Creating a new server...
Set up root directory for bucket <bucket>
gcs: Req              0x0: <- ListObjects("")
gcs: Req              0x0: -> ListObjects("") (331.22039ms): OK
Mounting file system "<bucket>"...
fuse_debug: Beginning the mounting kickoff process
fuse_debug: Parsing fuse file descriptor
fuse_debug: Preparing for direct mounting
fuse_debug: Successfully opened the /dev/fuse in blocking mode
fuse_debug: Starting the unix mounting
fuse_debug: Directmount failed. Trying fallback.
fuse_debug: Creating a socket pair
fuse_debug: Creating files to wrap the sockets
fuse_debug: Starting fusermount/os mount
fuse_debug: Wrapping socket pair in a connection
fuse_debug: Checking that we have a unix domain socket
fuse_debug: Read a message from socket
fuse_debug: Successfully read the socket message.
fuse_debug: Converting FD into os.File
fuse_debug: Completed the mounting kickoff process
fuse_debug: Creating a connection object
fuse_debug: Op 0x00000002        connection.go:416] <- init
fuse_debug: Op 0x00000002        connection.go:498] -> OK ()
fuse_debug: Successfully created the connection
fuse_debug: Waiting for mounting process to complete
File system has been successfully mounted.
fuse_debug: Op 0x00000004        connection.go:416] <- GetInodeAttributes (inode 1, PID 3470)
debug_fs: GetInodeAttributes(1): <nil>
fuse_debug: Op 0x00000004        connection.go:498] -> OK ()
fuse_debug: Op 0x00000006        connection.go:416] <- OpenDir (inode 1, PID 3470)
debug_fs: OpenDir(1): <nil>
fuse_debug: Op 0x00000006        connection.go:498] -> OK ()
fuse_debug: Op 0x00000008        connection.go:416] <- ReadDir (inode 1, PID 3470)
gcs: Req              0x1: <- ListObjects("")
gcs: Req              0x1: -> ListObjects("") (469.947796ms): OK
gcs: Req              0x2: <- ListObjects("")
gcs: Req              0x2: -> ListObjects("") (517.532557ms): OK
gcs: Req              0x3: <- ListObjects("")
gcs: Req              0x3: -> ListObjects("") (503.189936ms): OK
gcs: Req              0x4: <- ListObjects("")
gcs: Req              0x4: -> ListObjects("") (540.966433ms): OK
gcs: Req              0x5: <- ListObjects("")
gcs: Req              0x5: -> ListObjects("") (511.014127ms): OK
gcs: Req              0x6: <- ListObjects("")
gcs: Req              0x6: -> ListObjects("") (536.509317ms): OK
gcs: Req              0x7: <- ListObjects("")
gcs: Req              0x7: -> ListObjects("") (530.863693ms): OK
gcs: Req              0x8: <- ListObjects("")
gcs: Req              0x8: -> ListObjects("") (519.813362ms): OK
gcs: Req              0x9: <- ListObjects("")
gcs: Req              0x9: -> ListObjects("") (985.784837ms): OK
gcs: Req              0xa: <- ListObjects("")
gcs: Req              0xa: -> ListObjects("") (545.591511ms): OK
debug_fs: ReadDir(1, 0): <nil>
fuse_debug: Op 0x00000008        connection.go:498] -> OK ()
fuse_debug: Op 0x0000000a        connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 64): <nil>
fuse_debug: Op 0x0000000a        connection.go:498] -> OK ()
fuse_debug: Op 0x0000000c        connection.go:416] <- ReadDir (inode 1, PID 3470)

...

debug_fs: ReadDir(1, 49792): <nil>
fuse_debug: Op 0x0000061c        connection.go:498] -> OK ()
fuse_debug: Op 0x0000061e        connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49856): <nil>
fuse_debug: Op 0x0000061e        connection.go:498] -> OK ()
fuse_debug: Op 0x00000620        connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49920): <nil>
fuse_debug: Op 0x00000620        connection.go:498] -> OK ()
fuse_debug: Op 0x00000622        connection.go:416] <- ReadDir (inode 1, PID 3470)
debug_fs: ReadDir(1, 49979): <nil>
fuse_debug: Op 0x00000622        connection.go:498] -> OK ()
fuse_debug: Op 0x00000624        connection.go:416] <- ReleaseDirHandle (PID 0)
debug_fs: ReleaseDirHandle(0): <nil>
fuse_debug: Op 0x00000624        connection.go:498] -> OK ()

System (please complete the following information):

  • OS: Ubuntu 22.04
  • Platform: GCE VM
  • Version:
$ gcsfuse -v
gcsfuse version 0.42.3 (Go version go1.19.5)

Additional context
Add any other context about the problem here.

I get the correct results when running with --enable-storage-client-library=false:

$ ls -f mountdir | wc -l
54978
$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=false  <bucket>  mountdir
Start gcsfuse/0.42.3 (Go version go1.19.5) for app "" using mount point: /home/mirvine/mountdir
Opening GCS connection...
Creating a mount at "/home/mirvine/mountdir"
Creating a new server...
Set up root directory for bucket <bucket>
OpenBucket("<bucket>", "")
gcs: Req              0x0: <- ListObjects("")
gcs: Req              0x0: -> ListObjects("") (347.352921ms): OK
gcs: Req              0x1: <- ListObjects("")
gcs: Req              0x1: -> ListObjects("") (64.462791ms): OK
Mounting file system "<bucket>"...
fuse_debug: Beginning the mounting kickoff process
fuse_debug: Parsing fuse file descriptor
fuse_debug: Preparing for direct mounting
fuse_debug: Successfully opened the /dev/fuse in blocking mode
fuse_debug: Starting the unix mounting
fuse_debug: Directmount failed. Trying fallback.
fuse_debug: Creating a socket pair
fuse_debug: Creating files to wrap the sockets
fuse_debug: Starting fusermount/os mount
fuse_debug: Wrapping socket pair in a connection
fuse_debug: Checking that we have a unix domain socket
fuse_debug: Read a message from socket
fuse_debug: Successfully read the socket message.
fuse_debug: Converting FD into os.File
fuse_debug: Completed the mounting kickoff process
fuse_debug: Creating a connection object
fuse_debug: Op 0x00000002        connection.go:416] <- init
fuse_debug: Op 0x00000002        connection.go:498] -> OK ()
fuse_debug: Successfully created the connection
fuse_debug: Waiting for mounting process to complete
File system has been successfully mounted.
fuse_debug: Op 0x00000004        connection.go:416] <- GetInodeAttributes (inode 1, PID 3602)
debug_fs: GetInodeAttributes(1): <nil>
fuse_debug: Op 0x00000004        connection.go:498] -> OK ()
fuse_debug: Op 0x00000006        connection.go:416] <- OpenDir (inode 1, PID 3602)
debug_fs: OpenDir(1): <nil>
fuse_debug: Op 0x00000006        connection.go:498] -> OK ()
fuse_debug: Op 0x00000008        connection.go:416] <- ReadDir (inode 1, PID 3602)
gcs: Req              0x2: <- ListObjects("")
gcs: Req              0x2: -> ListObjects("") (469.85154ms): OK
gcs: Req              0x3: <- ListObjects("")
gcs: Req              0x3: -> ListObjects("") (543.97605ms): OK
gcs: Req              0x4: <- ListObjects("")
gcs: Req              0x4: -> ListObjects("") (633.368703ms): OK
gcs: Req              0x5: <- ListObjects("")
gcs: Req              0x5: -> ListObjects("") (494.986415ms): OK
gcs: Req              0x6: <- ListObjects("")
gcs: Req              0x6: -> ListObjects("") (483.669903ms): OK
gcs: Req              0x7: <- ListObjects("")
gcs: Req              0x7: -> ListObjects("") (496.437023ms): OK
gcs: Req              0x8: <- ListObjects("")
gcs: Req              0x8: -> ListObjects("") (488.065365ms): OK
gcs: Req              0x9: <- ListObjects("")
gcs: Req              0x9: -> ListObjects("") (546.585837ms): OK
gcs: Req              0xa: <- ListObjects("")
gcs: Req              0xa: -> ListObjects("") (510.525553ms): OK
gcs: Req              0xb: <- ListObjects("")
gcs: Req              0xb: -> ListObjects("") (505.246729ms): OK
gcs: Req              0xc: <- ListObjects("")
gcs: Req              0xc: -> ListObjects("") (577.179612ms): OK
debug_fs: ReadDir(1, 0): <nil>
fuse_debug: Op 0x00000008        connection.go:498] -> OK ()
fuse_debug: Op 0x0000000a        connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 64): <nil>
fuse_debug: Op 0x0000000a        connection.go:498] -> OK ()
fuse_debug: Op 0x0000000c        connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 128): <nil>
fuse_debug: Op 0x0000000c        connection.go:498] -> OK ()
fuse_debug: Op 0x0000000e        connection.go:416] <- ReadDir (inode 1, PID 3602)

...

debug_fs: ReadDir(1, 54912): <nil>
fuse_debug: Op 0x000006bc        connection.go:498] -> OK ()
fuse_debug: Op 0x000006be        connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 54976): <nil>
fuse_debug: Op 0x000006be        connection.go:498] -> OK ()
fuse_debug: Op 0x000006c0        connection.go:416] <- ReadDir (inode 1, PID 3602)
debug_fs: ReadDir(1, 54978): <nil>
fuse_debug: Op 0x000006c0        connection.go:498] -> OK ()
fuse_debug: Op 0x000006c2        connection.go:416] <- ReleaseDirHandle (PID 0)
debug_fs: ReleaseDirHandle(0): <nil>
fuse_debug: Op 0x000006c2        connection.go:498] -> OK ()

I can't pin down exactly what the bug is, but I suspect it comes from the implementation of ListObjects in https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/internal/storage/bucket_handle.go#L238 and how it handles the PageToken. In particular, the code seems to assume that every call to Next() makes an API call and updates the PageToken, which I don't think is true: the iterator fetches results a page at a time and serves later Next() calls from that buffered page.

Hey @MattIrv, we tried listing 60,000 files (more than your 54,979) in a directory mounted through GCSFuse, and that worked with both --enable-storage-client-library=true and --enable-storage-client-library=false.

Could you please share the following information so we can debug further?

  1. Is this issue always reproducible? If not, how can we reproduce it on our end?
  2. What directory structure are you listing?
  3. If the directory structure is nested, can you try with the --implicit-dirs flag and confirm whether you still face the same issue?

Is this issue always reproducible? If not, how can we reproduce at our end?

It consistently reproduces on the same buckets, but with a different number of files it might not. I recommend testing with a bucket that has exactly 54979 files to see whether that reproduces it. I've also seen different behavior depending on whether the files are in a folder in the bucket or directly in the bucket root (although I think I've been able to reproduce it in both cases), so I recommend putting them directly in the bucket root.

What is the directory structure that you are listing?

Flat files. Maybe there's one directory in there.

In case the directory structure is nested, can you try with --implicit-dirs flag and confirm if you are facing the same issue?

I believe I tried this and didn't see any change. I'll double-check the next time I see it happen.

Confirmed: this reproduces even with --implicit-dirs (on a bucket with 19245 files):

$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true --implicit-dirs  <bucket>  mountdir
$ ls -f mountdir | wc -l
15001
$ gcloud storage ls gs://<bucket> | wc -l
19245

When I run with --implicit-dirs=false (i.e. without the flag), I get a count that is one lower. That isn't surprising: this bucket has the same structure, and I think it contains one implicit directory, which is only listed with --implicit-dirs:

$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true <bucket>  mountdir
$ ls -f mountdir | wc -l
15000

Repro steps:

$ mkdir lotsoffiles && cd lotsoffiles
$ for i in {1..54979}; do touch file${i}.txt ; done
$ gcloud storage buckets create gs://mirvine-apr11
$ gcloud storage cp * gs://mirvine-apr11

Mount using gcsfuse and list:

$ gcsfuse -debug_fuse --debug_fs --debug_gcs --foreground --enable-storage-client-library=true --implicit-dirs mirvine-apr11  mountdir

Check the number of files; it's correct:

$ ls -f mountdir | wc -l
54979

Create a single implicit directory:

$ gcloud storage cp file1.txt gs://mirvine-apr11/dir/file1.txt

Now list again; it's incorrect:

$ ls -f mountdir | wc -l
49981

Remove the file; the listing is correct again:

$ gcloud storage rm gs://mirvine-apr11/dir/file1.txt
$ ls -f mountdir | wc -l
54979

Add a single file to the root directory; the listing is still correct:

$ gcloud storage cp file1.txt gs://mirvine-apr11/file0.txt
$ ls -f mountdir | wc -l
54980

Thank you @MattIrv for sharing the repro steps. We were able to reproduce the issue and will fix it in our upcoming GCSFuse release (v0.42.4).

@MattIrv, thanks for your patience. The fix for this issue has been released in GCSFuse v0.42.4. Closing this issue. Please reopen it if you face any further issues.