GCSFuse Read/Write Perf - gcloud storage api
pmorse-cr opened this issue
Describe the issue
We are building a new video transcoding pipeline within Google Cloud and are looking to minimize the number of files moving around. In video transcoding most of the time is spent in CPU/GPU, but we have found that GCSFuse works well for smaller source files, while performance drops as soon as the source video exceeds ~6 GB. We are testing copying to local SSD/PD and Filestore, but we would like to keep using GCSFuse, and would like to know what the plans are to move to the new gcloud storage API/functions vs gsutil. Thank you very much!
I apologize, it actually looks like it isn't a different API, but a difference in tooling within the SDK. Are there any plans to change the way GCSFuse interacts with the API to obtain the performance improvements delivered by the new gcloud storage tooling? Thanks.
Hi @pmorse-cr,
Thank you for showing interest in GCSFuse!
Is your workload read heavy or write heavy?
To give a little context, GCSfuse is a FUSE (Filesystem in Userspace) implementation that allows you to mount a GCS bucket as a local filesystem. This means that you can access your GCS objects as if they were regular files on your computer. GCSfuse is a good option if you need to access GCS objects with standard file system operations, such as cat, ls, cp, and mv.
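To make the file-interface point above concrete, here is a minimal, runnable sketch. The mount point and file name are hypothetical, and a temporary directory stands in for an actual GCSFuse mount so the example works without a bucket; with a real mount (e.g. after running gcsfuse on a bucket), the same standard file operations would apply to objects in the bucket.

```python
import pathlib
import tempfile

# Hypothetical mount point; with GCSFuse you would mount a bucket here
# first. A temporary directory stands in so the sketch runs anywhere.
mount_point = pathlib.Path(tempfile.mkdtemp())

# Write and read an "object" with ordinary file operations.
src = mount_point / "mezzanine.txt"
src.write_text("frame data")

# Listing and reading behave like any local filesystem.
names = [p.name for p in mount_point.iterdir()]
print(names)            # ['mezzanine.txt']
print(src.read_text())  # frame data
```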
As far as I know, both GCSFuse and gcloud storage use similar APIs to interact with GCS. However, in the case of GCSFuse, requests go through the kernel (by design), which can result in a performance difference for some operations. Still, you can follow this doc to improve performance based on your use case.
-Prince
Hi @pmorse-cr,
In addition to the above:
- Could you please also share the numbers you got between gcloud storage and GCSFuse? This will help us understand the gap.
- What is the expected rate of processing data through your pipeline to ensure smooth execution?
- You can refer to this page to gain a better understanding of GCSFuse performance for different workloads.
Also, are you talking about these improvements?
(a) CRC32C data integrity check.
(b) Graph-based task management to parallelize the work with less overhead.
I'll discuss with the team whether we can incorporate any of these in GCSFuse.
We look forward to hearing from you.
Thanks,
Prince Kumar.
Thanks very much for the reply, and sorry about the delays. We are helping a Google Cloud customer build a new transcoding pipeline for global VOD services delivery. They are currently on AWS, and one of the goals is to use as much serverless as possible while not moving files everywhere; this is why we were hoping to use GCSFuse. We have run into some performance challenges within the DRM/packaging services, where it looks like we are I/O-bound.

Regarding performance improvements of gcloud storage vs gsutil: yes, the link you sent matches what we found in our testing. In fact, because of the slowdowns, we moved the transcoding and the first portion of packaging into a k8s cluster to use local and regional SSD, with gcloud storage operations in front of and behind the jobs. We would love to work closely with you all on this. FYI, our customer has also referenced this article about FUSE and transcoding multiple times: https://netflixtechblog.com/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba
@raj-prince - thanks
Hi @pmorse-cr, you mention serverless (Cloud Run?) but then talk about using K8s (GKE). Which will be used?
If GKE, please use the integrated GCSFuse CSI driver. If serverless, there is a potential future integration we are exploring that I can make you aware of.
Re Perf: Please see performance best practices, limitations, and benchmarks.
Does the performance with gcloud meet your needs? If not, GCS is not the right solution.
If this request is to support parallel downloads within GCSFuse, it is on our backlog, but we don't have a date yet.
If you send me your email address, I can reach out to you with more details, as I would also like to better understand the use case and see how we can collaborate.
I'd love to send you my email address and collaborate more. Is there a private way to do that vs open on this issue? Thanks @marcoa6 @raj-prince
We will track this as a feature request to support parallel reads, in which large objects are read in chunks in parallel to improve performance.
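For context, the pattern being requested can be sketched as follows. This is an illustrative sketch only, not GCSFuse internals: the function names are assumptions, and an in-memory bytes buffer stands in for a GCS object so the example runs locally; with the real storage API, each worker would issue a ranged read instead.

```python
from concurrent.futures import ThreadPoolExecutor

def read_range(blob: bytes, start: int, end: int) -> bytes:
    # Stand-in for a ranged GET (Range: bytes=start-end) against GCS.
    return blob[start:end]

def parallel_read(blob: bytes, chunk_size: int, workers: int = 4) -> bytes:
    # Split the object into fixed-size chunk ranges.
    ranges = [(off, min(off + chunk_size, len(blob)))
              for off in range(0, len(blob), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves submission order, so the chunks
        # reassemble into the original byte sequence.
        chunks = pool.map(lambda r: read_range(blob, *r), ranges)
        return b"".join(chunks)

data = bytes(range(256)) * 1024  # ~256 KiB stand-in "object"
assert parallel_read(data, chunk_size=64 * 1024) == data
```

The fan-out/reassemble step is the part that would hide per-request latency for large objects; the chunk size and worker count would need tuning against real network conditions.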