thombashi / df-diskcache

df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Summary

df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Installation

pip install df-diskcache

Features

Supports the following methods:

  • get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.
  • set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.
  • update
  • touch: Update the last accessed time of a cache entry to extend the TTL.
  • delete
  • prune: Delete expired cache entries.
  • Dictionary-like operations:
    • __getitem__
    • __setitem__
    • __contains__
    • __delitem__
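
The TTL-related methods above (set, get, touch, prune) can be illustrated with a small in-memory stand-in. This is only a sketch of the semantics as described in the list, not df-diskcache's implementation: the class name TinyTTLCache, the `now` parameter (used to make the timing deterministic), and the choice that get does not refresh the last accessed time are all assumptions made for illustration.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TinyTTLCache:
    """Hypothetical in-memory stand-in; NOT how df-diskcache is implemented."""

    def __init__(self, default_ttl: float = 3600.0) -> None:
        self.default_ttl = default_ttl
        # key -> (value, TTL in seconds, last accessed timestamp)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def _expired(self, key: str, now: float) -> bool:
        _, ttl, accessed = self._entries[key]
        return now - accessed > ttl

    def set(self, key: str, value: Any, ttl: Optional[float] = None,
            now: Optional[float] = None) -> None:
        """Create an entry; fall back to the default TTL when none is given."""
        now = time.time() if now is None else now
        self._entries[key] = (value, self.default_ttl if ttl is None else ttl, now)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        """Return the value, or None if the key is missing or expired."""
        now = time.time() if now is None else now
        if key not in self._entries or self._expired(key, now):
            return None
        return self._entries[key][0]

    def touch(self, key: str, now: Optional[float] = None) -> None:
        """Reset the last accessed time, which extends the entry's lifetime."""
        now = time.time() if now is None else now
        if key in self._entries:
            value, ttl, _ = self._entries[key]
            self._entries[key] = (value, ttl, now)

    def prune(self, now: Optional[float] = None) -> int:
        """Delete expired entries and return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k in self._entries if self._expired(k, now)]
        for k in expired:
            del self._entries[k]
        return len(expired)


demo = TinyTTLCache(default_ttl=10)
demo.set("k", "v", now=0)      # entry expires once more than 10 s pass untouched
demo.touch("k", now=8)         # lifetime now counts from t=8
still_there = demo.get("k", now=15)   # "v": only 7 s since the touch
gone = demo.get("k", now=20)          # None: 12 s since the touch
removed = demo.prune(now=20)          # 1: the expired entry is deleted
```

The point of the sketch is that touch moves the reference time forward without rewriting the value, and that an expired entry stays on record until prune removes it.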

Usage

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache.set(url, df)
else:
    print("cache hit")

print(df)

You can also use dictionary-like operations:

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache[url]
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache[url] = df
else:
    print("cache hit")

print(df)

Set TTL for cache entries

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

DataFrameDiskCache.DEFAULT_TTL = 10  # you can override the default TTL (default: 3600 seconds)

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    df = pd.read_csv(url)
    cache.set(url, df, ttl=60)  # you can set a TTL for the key-value pair

print(df)
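
The samples above all repeat the same get, load on miss, then set pattern, so it may be convenient to wrap it in a helper. The function below is a sketch, not part of df-diskcache: the name get_or_load and its loader parameter are hypothetical, and it relies only on the get(key) and set(key, df, ttl=...) calls shown in the samples.

```python
from typing import Any, Callable, Optional


def get_or_load(cache: Any, key: str, loader: Callable[[], Any],
                ttl: Optional[int] = None) -> Any:
    """Return the cached value for key, calling loader() only on a cache miss.

    Hypothetical helper: `cache` is any object exposing get(key) and
    set(key, value, ttl=...), as in the samples above.
    """
    value = cache.get(key)
    if value is not None:
        return value  # cache hit

    value = loader()  # cache miss: compute or download the data
    if ttl is None:
        cache.set(key, value)  # let the cache apply its default TTL
    else:
        cache.set(key, value, ttl=ttl)
    return value
```

With the names from the sample above, a call might look like get_or_load(cache, url, lambda: pd.read_csv(url), ttl=60).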

Dependencies

About

License: MIT License


Languages

Python 95.5%, Makefile 4.5%