observingClouds / car_referencer

Create reference filesystem for collections of car files.

car_referencer: Creating parquet file reference system for car collections.

Similar to tape archive (tar) files, content addressable archive (car) files are a way to group objects into larger units. Besides being uploaded to an object store, car files can also be used to store collections of objects on a traditional filesystem. A reference file system makes it possible to access these collections without extracting the individual objects.

car_referencer can create the needed reference file from a single car file or from multiple car files that are part of the same merkle DAG.
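The idea behind a reference file system can be illustrated with a small, self-contained sketch. It is a deliberately simplified stand-in and not how car_referencer or preffs work internally: real car files store blocks together with their CIDs, and the real reference file is a parquet table. The archive layout and the helper names `pack` and `read_object` are assumptions made for illustration only.

```python
import os
import tempfile


def pack(objects, archive_path):
    """Concatenate objects into one archive file and record where each one lives.

    Hypothetical helper: returns a mapping key -> (path, offset, length).
    """
    references = {}
    with open(archive_path, "wb") as archive:
        for key, payload in objects.items():
            offset = archive.tell()  # byte position where this object starts
            archive.write(payload)
            references[key] = (archive_path, offset, len(payload))
    return references


def read_object(references, key):
    """Read a single object straight from the archive, without extracting it."""
    path, offset, length = references[key]
    with open(path, "rb") as archive:
        archive.seek(offset)
        return archive.read(length)


# Build a toy archive holding two objects of an imaginary example.zarr store.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "objects.bin")
objects = {
    "example.zarr/.zgroup": b'{"zarr_format": 2}',
    "example.zarr/temperature/0.0": b"\x00\x01\x02\x03",
}
references = pack(objects, archive_path)

# Only the requested byte range is read from the archive.
chunk = read_object(references, "example.zarr/temperature/0.0")
```

The reference table plays the role of the reference file: a reader that knows the key can seek directly to the right byte range inside the archive.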

Command line usage

car_referencer creates the reference file in two steps. First, it identifies all available references within the provided car files (here carfiles.*.car) and saves them as an index file (e.g. index.parquet); if the index file already exists, it is reused. Second, it creates the reference file (e.g. preffs.parquet) based on ROOT-HASH. Note that ROOT-HASH is not the root CID of the car file but the root CID of the root file object. For a zarr dataset such as example.zarr, ROOT-HASH would refer to example.zarr itself.

car_referencer -c "carfiles.*.car" -p preffs.parquet -r ROOT-HASH -i index.parquet
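When the command is part of a larger Python workflow, it can be assembled and launched with the standard library. The following is a minimal sketch that uses only the flags shown above (`-c`, `-p`, `-r`, `-i`); the helper names `build_command` and `run_car_referencer` are hypothetical and not part of car_referencer, and ROOT-HASH remains a placeholder for the actual root CID.

```python
import subprocess
from typing import List


def build_command(car_glob: str, reference_file: str,
                  root_hash: str, index_file: str) -> List[str]:
    """Assemble the argument list for the car_referencer command line tool.

    Hypothetical helper. The glob is passed as one unexpanded argument,
    mirroring the quoted "carfiles.*.car" in the shell example.
    """
    return [
        "car_referencer",
        "-c", car_glob,        # car files to scan
        "-p", reference_file,  # reference file to create
        "-r", root_hash,       # root CID of the root file object
        "-i", index_file,      # index file, reused if it already exists
    ]


def run_car_referencer(car_glob: str, reference_file: str,
                       root_hash: str, index_file: str) -> None:
    """Run the tool and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_command(car_glob, reference_file, root_hash, index_file),
        check=True,
    )


command = build_command("carfiles.*.car", "preffs.parquet",
                        "ROOT-HASH", "index.parquet")

# Example call (requires car_referencer to be installed and a real root CID):
# run_car_referencer("carfiles.*.car", "preffs.parquet",
#                    "ROOT-HASH", "index.parquet")
```

Passing the arguments as a list avoids shell quoting issues, since no shell is involved and the glob pattern reaches the tool unchanged.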

The created file preffs.parquet can then be opened by

import xarray as xr

ds = xr.open_zarr("preffs::preffs.parquet")

thanks to https://github.com/d70-t/preffs.

Installation

git clone https://github.com/observingClouds/car_referencer.git
cd car_referencer
pip install .

Development

For testing purposes, additional dependencies need to be installed, including some packages written in Go. The required environment can be set up with

git clone https://github.com/observingClouds/car_referencer.git
cd car_referencer
mamba env create
source activate test-env

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
