mpiannucci / redis-fsspec-cache

A Redis-based filesystem cache for fsspec

redis_fsspec_cache

A prototype Redis-based filesystem cache for fsspec

Motivation

fsspec currently ships a filesystem cache backed by the local filesystem, as well as in-memory caches. Neither can be shared between processes on different machines, so as we start to deploy Python services to serverless platforms, we need a way to share a cache between multiple instances of a service. This package provides a filesystem cache that uses Redis as a backend, allowing multiple instances of a service to share a cache.

Specifically, this package aims to improve API route response times when building services with xpublish deployed to serverless environments.

Installation

pip install git+https://github.com/mpiannucci/redis-fsspec-cache.git

Usage

from redis_fsspec_cache import RedisCachingFileSystem

fs = RedisCachingFileSystem(
    redis_host="localhost",
    redis_port=6380,
    expiry_time=60,
    method="chunk",
    target_protocol="s3",
    target_options={
        'anon': True,
    },
)

Once a block or chunk has been cached, it is visible in Redis, for example with the KEYS command in redis-cli:

KEYS *
1) "noaa-hrrr-bdp-pds/hrrr.20230927/conus/hrrr.t00z.wrfsubhf00.grib2-0"
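KEYS walks the whole keyspace in a single call and can stall a busy Redis server, so for anything beyond local debugging the Redis documentation recommends SCAN instead. The following is a minimal sketch of listing cached entries that way. The helper name list_cached_keys and the prefix argument are hypothetical and not part of this package; the only assumption about the client is that it exposes scan_iter(match=..., count=...), as redis-py clients do.

```python
def list_cached_keys(client, prefix=""):
    """Return the sorted cache keys that start with `prefix`.

    `client` is assumed to be a redis-py style client, i.e. any object
    with a scan_iter(match=..., count=...) method. SCAN iterates the
    keyspace incrementally instead of blocking the server like KEYS.
    """
    keys = []
    for key in client.scan_iter(match=f"{prefix}*", count=100):
        # redis-py returns bytes unless decode_responses=True was set.
        keys.append(key.decode() if isinstance(key, bytes) else key)
    return sorted(keys)


# Usage against the server from the example above (requires redis-py):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6380)
#   for key in list_cached_keys(client, prefix="noaa-hrrr-bdp-pds/"):
#       print(key)
```

Passing a bucket name as the prefix limits the scan results to entries cached for that bucket, which is useful when the same Redis instance is shared with other data.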

Block vs Chunk Caching

The method parameter controls which unit of data the cache stores. With method="block", each file block is stored under its own key in Redis. With method="chunk", each file chunk is stored under its own key instead. The distinction matters when the target data is already organized into chunks, for example when accessing GRIB or NetCDF files from cloud storage, where data is read as specific chunks at predetermined byte ranges. In this scenario, blocks may not line up with the target chunks, so block caching can fetch and cache more data than is necessary.
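To make the overhead concrete, here is a small arithmetic sketch. It does not use this package. It only compares the size of one requested byte range with the size of the fixed-size blocks that enclose it. The 5 MiB block size and the offsets are assumed values chosen for illustration, and block_aligned_range is a hypothetical helper.

```python
def block_aligned_range(start: int, end: int, block_size: int) -> tuple[int, int]:
    """Expand the half-open range [start, end) to enclosing block boundaries."""
    first_block = start // block_size
    last_block = (end - 1) // block_size
    return first_block * block_size, (last_block + 1) * block_size


# Assumed values: a 1 MiB message that straddles a block boundary.
BLOCK_SIZE = 5 * 2**20                    # 5 MiB blocks (illustrative)
chunk_start = 4 * 2**20 + 512 * 2**10     # message starts at 4.5 MiB
chunk_end = chunk_start + 2**20           # and is 1 MiB long

aligned_start, aligned_end = block_aligned_range(chunk_start, chunk_end, BLOCK_SIZE)
chunk_bytes = chunk_end - chunk_start     # what an exact-range cache would hold
block_bytes = aligned_end - aligned_start # what a block-aligned cache would hold

print(f"exact range:   {chunk_bytes} bytes")
print(f"block aligned: {block_bytes} bytes")
```

With these numbers the request touches two blocks, so a block-aligned cache would hold 10 MiB to serve a 1 MiB read. The real ratio depends on the configured block size and on where the target chunks fall in the file.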

License: MIT License