toby1991 / geesefs

Finally, a good FUSE FS implementation over S3

GeeseFS is a high-performance, POSIX-ish S3 (Yandex, Amazon) file system written in Go

Overview

GeeseFS allows you to mount an S3 bucket as a file system.

FUSE file systems based on S3 typically have performance problems, especially with small files and metadata operations.

GeeseFS attempts to solve these problems by using aggressive parallelism and asynchrony.
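To make "aggressive parallelism" concrete, here is a minimal Go sketch of a parallel ranged read. It is not GeeseFS's actual code: `fetchFunc`, `parallelRead`, and the in-memory object are hypothetical stand-ins for ranged S3 GET requests, and the chunk size and worker count are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for a ranged S3 GET request.
type fetchFunc func(offset, length int) []byte

// parallelRead fetches the byte range [0, size) in chunkSize pieces, keeping
// at most `workers` requests in flight, and reassembles the chunks in order.
// Both chunkSize and workers must be positive.
func parallelRead(fetch fetchFunc, size, chunkSize, workers int) []byte {
	out := make([]byte, size)
	sem := make(chan struct{}, workers) // bounds the number of in-flight requests
	var wg sync.WaitGroup
	for off := 0; off < size; off += chunkSize {
		n := chunkSize
		if off+n > size {
			n = size - off // the last chunk may be shorter
		}
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(off, n int) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes to its own disjoint slice of out.
			copy(out[off:off+n], fetch(off, n))
		}(off, n)
	}
	wg.Wait()
	return out
}

func main() {
	// An in-memory "object" replaces the bucket for this illustration.
	object := []byte("hello, parallel readahead over S3")
	fetch := func(offset, length int) []byte { return object[offset : offset+length] }
	fmt.Println(string(parallelRead(fetch, len(object), 8, 4)))
}
```

With a real object store, each chunk costs one round trip, so issuing several requests at once hides most of the per-request latency.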

POSIX Compatibility Matrix

Feature             GeeseFS  rclone  Goofys  S3FS  gcsfuse
Read after write    +        +       -       +     +
Partial writes      +        +       -       +     +
Truncate            +        -       -       +     +
chmod/chown         -        -       -       +     -
fsync               +        -       -       +     +
Symlinks            +        -       -       +     +
xattr               +        -       +       +     -
Directory renames   +        +       *       +     +
readdir & changes   +        +       -       +     +

* Goofys allows directory renames only for directories with no more than 1000 entries; the limit is hardcoded.

List of non-POSIX behaviors/limitations for GeeseFS:

  • does not store file mode/owner/group; use the --(dir|file)-mode or --(uid|gid) options instead
  • does not support hard links
  • does not support locking
  • ctime and atime are always the same as mtime

In addition to the items above, the following are supportable but not yet implemented:

  • creating files larger than 1TB

Stability

GeeseFS is stable enough to pass most of the applicable xfstests, including the dirstress/fsstress stress tests (generic/007, generic/011, generic/013).

Performance Features

Feature                      GeeseFS  rclone  Goofys  S3FS  gcsfuse
Parallel readahead           +        -       +       +     -
Parallel multipart uploads   +        -       +       +     -
No readahead on random read  +        -       +       -     +
Server-side copy on append   +        -       -       *     +
Server-side copy on update   +        -       -       *     -
xattrs without extra RTT     +        -       -       -     +
Fast recursive listings      +        -       *       -     +
Asynchronous write           +        +       -       -     -
Asynchronous delete          +        -       -       -     -
Asynchronous rename          +        -       -       -     -
Disk cache for reads         +        *       -       +     +
Disk cache for writes        +        *       -       +     -

* Recursive listing optimisation in Goofys is buggy and may skip files under certain conditions

* S3FS uses server-side copy, but it still downloads the whole file to update it. And it's buggy too :-)

* rclone mount has a VFS cache, but it can only cache whole files. It's also buggy: it often hangs on write.

Installation

  • Pre-built binaries:
    • Linux amd64. You may also need to install fuse-utils first.
    • Mac amd64, arm64. You also need osxfuse/macfuse for GeeseFS to work.
  • Or build from source with Go 1.13 or later:
$ go get github.com/yandex-cloud/geesefs

Usage

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKID1234567890
aws_secret_access_key = MY-SECRET-KEY
$ $GOPATH/bin/geesefs <bucket> <mountpoint>
$ $GOPATH/bin/geesefs [--endpoint https://...] <bucket:prefix> <mountpoint> # if you only want to mount objects under a prefix

You can also supply credentials via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
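If you start the mount from a Go program, the environment-variable route might look like the sketch below. `mountCommand` is a hypothetical helper, the bucket, mount point, and key values are placeholders, and only the `--endpoint` option and the two environment variables named above are used. The command is built but not run, so it does not need geesefs to be installed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// mountCommand builds (but does not start) a geesefs invocation that receives
// its credentials through the environment instead of ~/.aws/credentials.
// An empty endpoint leaves the default endpoint in place.
func mountCommand(binary, endpoint, bucket, mountpoint, keyID, secret string) *exec.Cmd {
	var args []string
	if endpoint != "" {
		args = append(args, "--endpoint", endpoint)
	}
	args = append(args, bucket, mountpoint)

	cmd := exec.Command(binary, args...)
	// Appended entries override earlier duplicates, so these win over any
	// values inherited from the parent process.
	cmd.Env = append(os.Environ(),
		"AWS_ACCESS_KEY_ID="+keyID,
		"AWS_SECRET_ACCESS_KEY="+secret,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Placeholder values; "my-bucket:logs" would mount only objects under a prefix.
	cmd := mountCommand("geesefs", "", "my-bucket", "/mnt/my-bucket",
		"AKID1234567890", "MY-SECRET-KEY")
	fmt.Println(cmd.Args)
	// To actually mount, call cmd.Run() and check the returned error.
}
```

Passing credentials this way keeps them out of the command line, where other local users could see them in the process list.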

To mount an S3 bucket on startup, make sure the credentials are configured for root, then add this line to /etc/fstab:

bucket    /mnt/mountpoint    fuse.geesefs    _netdev,allow_other,--file-mode=0666,--dir-mode=0777    0    0

See also: Instructions for Azure Blob Storage.

Benchmarks

See bench/README.md.

Configuration

There's a lot of tuning you can do. Consult geesefs -h to view the list of options.

License

Licensed under the Apache License, Version 2.0

See LICENSE and AUTHORS

Compatibility with S3

geesefs works with:

  • Yandex Object Storage (default)
  • Amazon S3
  • Ceph (and also Ceph-based DigitalOcean Spaces, DreamObjects, gridscale, etc.)
  • MinIO
  • OpenStack Swift
  • Azure Blob Storage (even though it's not S3)

It should also work with any other S3-compatible service that implements multipart uploads and multipart server-side copy (UploadPartCopy).
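To illustrate what multipart server-side copy asks of a backend, the sketch below computes the byte ranges a client would send, one per UploadPartCopy request, in the `x-amz-copy-source-range` header. It is an illustration, not GeeseFS's implementation: `copyRanges` is a hypothetical helper, and the 5 MiB minimum part size and 10,000-part cap are Amazon S3's documented limits, which other backends may not share.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minPartSize = 5 << 20 // Amazon S3 minimum for every part except the last
	maxParts    = 10000   // Amazon S3 maximum number of parts per upload
)

// copyRanges splits an object of objectSize bytes into inclusive byte ranges
// formatted for the x-amz-copy-source-range header ("bytes=first-last").
func copyRanges(objectSize, partSize int64) ([]string, error) {
	if objectSize <= 0 {
		return nil, errors.New("object must be non-empty")
	}
	if partSize < minPartSize {
		return nil, errors.New("part size is below the 5 MiB minimum")
	}
	count := (objectSize + partSize - 1) / partSize // ceiling division
	if count > maxParts {
		return nil, fmt.Errorf("%d parts exceeds the limit of %d", count, maxParts)
	}
	ranges := make([]string, 0, count)
	for start := int64(0); start < objectSize; start += partSize {
		end := start + partSize - 1
		if end >= objectSize {
			end = objectSize - 1 // the last part may be shorter
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", start, end))
	}
	return ranges, nil
}

func main() {
	// A 12 MiB object copied in 5 MiB parts yields three ranges.
	ranges, err := copyRanges(12<<20, 5<<20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, r := range ranges {
		fmt.Printf("part %d: %s\n", i+1, r)
	}
}
```

Because each range is copied inside the storage service, updating part of a large object does not require downloading and re-uploading the unchanged data.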

The following backends are inherited from Goofys code and still exist, but are broken:

  • Google Cloud Storage
  • Azure Data Lake Gen1
  • Azure Data Lake Gen2
