redis_fsspec_cache
A prototype Redis-based filesystem cache for fsspec.
Motivation
fsspec currently contains a filesystem cache backed by the local filesystem, as well as in-memory caches. As we start to deploy Python services to serverless platforms, we need a way to share a cache between multiple instances of a service. This package provides a filesystem cache that uses Redis as its backend, so every instance reads from and writes to the same cache.
Specifically, this package aims to improve API route response times for services built with xpublish and deployed to serverless environments.
Installation
pip install git+https://github.com/mpiannucci/redis-fsspec-cache.git
Usage
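from redis_fsspec_cache import RedisCachingFileSystem

fs = RedisCachingFileSystem(
    # Connection details for the Redis server that holds the cache
    redis_host="localhost",
    redis_port=6380,
    expiry_time=60,
    # Cache strategy: "block" or "chunk" (see Block vs Chunk Caching)
    method="chunk",
    # Protocol and options of the underlying filesystem being cached
    target_protocol="s3",
    target_options={
        'anon': True,
    },
)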
When a block or chunk is cached, it is visible in Redis via the KEYS command:
KEYS *
1) "noaa-hrrr-bdp-pds/hrrr.20230927/conus/hrrr.t00z.wrfsubhf00.grib2-0"
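The same check can be done from Python. The following is a minimal sketch, not part of this package: `list_cached_keys` is a hypothetical helper, and it assumes a redis-py client (or any object exposing redis-py's `scan_iter(match=...)`). It uses SCAN rather than KEYS because KEYS walks the whole keyspace in a single blocking call, while SCAN iterates incrementally.

```python
from typing import List


def list_cached_keys(client, pattern: str = "*") -> List[str]:
    """Return the sorted names of all keys matching `pattern`.

    `client` is assumed to behave like a redis-py client, i.e. it
    provides `scan_iter(match=...)`, which wraps the SCAN command.
    """
    keys = []
    for key in client.scan_iter(match=pattern):
        # redis-py yields bytes unless decode_responses=True was set
        keys.append(key.decode() if isinstance(key, bytes) else key)
    return sorted(keys)


# Example usage (requires the redis-py package and a reachable server):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6380)
#   for name in list_cached_keys(client, "noaa-hrrr-bdp-pds/*"):
#       print(name)
```

The glob pattern in the usage comment simply matches the bucket prefix of the key shown above; adjust it to whatever paths your service reads.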
Block vs Chunk Caching
The method parameter controls whether the cache stores blocks or chunks. When method="block", the cache stores each file block as a separate key in Redis. When method="chunk", the cache stores each file chunk as a separate key in Redis. This distinction matters when considering the size of target chunks, for example when accessing GRIB or NetCDF files from cloud storage, where data is read as specific chunks at predetermined byte ranges. In this scenario, blocks may not map directly onto the target chunks, so block caching may fetch and cache more data than necessary.
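To make the over-fetching concrete, here is a small arithmetic sketch. It is an illustration only, not this package's implementation: it assumes blocks are fixed-size, aligned byte ranges, and the block size and chunk offsets are hypothetical numbers chosen for the example.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # half-open range: [start, end)


def blocks_covering(start: int, end: int, block_size: int) -> List[ByteRange]:
    """Return the aligned fixed-size blocks that overlap [start, end)."""
    first = start // block_size
    last = (end - 1) // block_size
    return [(i * block_size, (i + 1) * block_size) for i in range(first, last + 1)]


def total_bytes(ranges: List[ByteRange]) -> int:
    """Sum the lengths of a list of half-open byte ranges."""
    return sum(end - start for start, end in ranges)


# Hypothetical values: 5 MB blocks and a 1 MB chunk that straddles
# the boundary between the first and second block.
block_size = 5_000_000
chunk = (4_500_000, 5_500_000)

block_ranges = blocks_covering(chunk[0], chunk[1], block_size)
fetched_as_blocks = total_bytes(block_ranges)  # two whole blocks
fetched_as_chunk = total_bytes([chunk])        # exactly the requested range

print(block_ranges)
print(fetched_as_blocks, fetched_as_chunk)
```

Under these assumptions, reading one 1 MB chunk through block caching pulls in two full 5 MB blocks, while chunk caching stores only the requested byte range.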