ipni / storetheindex

A directory of CIDs



Write saved ads and entries to CAR file, and store at configured location

gammazero opened this issue

When the indexer is configured not to delete the ads and entries it acquires during ingestion, it should package them into CAR files. The CAR files should be stored in a configured location, such as S3.

Each ad and its associated entries should be packed into one CAR file. This lets other indexers fetch only the ads and entries they need for indexing: if an ad is deleted later in the chain, an indexer can skip fetching that ad and its entries.
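A minimal sketch of packing one ad and its entries into a single CAR, assuming the CARv1 layout. That layout is a varint-length-prefixed DAG-CBOR header naming the ad CID as the root, followed by one varint-length-prefixed section per block. To stay within the Go standard library, the sketch hand-encodes the header and takes CIDs as already-encoded binary bytes. `Block` and `WriteAdCAR` are hypothetical names, and production code would more likely use an existing CAR library such as go-car.

```go
package main

import (
	"encoding/binary"
	"io"
)

// Block pairs a block's binary CID with its raw encoded bytes (hypothetical type).
type Block struct {
	CID  []byte
	Data []byte
}

// writeUvarint writes v as an unsigned varint, the length prefix used by CARv1.
func writeUvarint(w io.Writer, v uint64) error {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	_, err := w.Write(buf[:n])
	return err
}

// cborHead encodes a CBOR major type plus length; this sketch handles n < 65536 only.
func cborHead(major byte, n int) []byte {
	switch {
	case n < 24:
		return []byte{major<<5 | byte(n)}
	case n < 256:
		return []byte{major<<5 | 24, byte(n)}
	default:
		return []byte{major<<5 | 25, byte(n >> 8), byte(n)}
	}
}

// carHeader encodes the DAG-CBOR map {"roots": [root], "version": 1}.
// Keys appear in DAG-CBOR order (shorter key first).
func carHeader(root []byte) []byte {
	h := []byte{0xa2} // map with 2 entries
	h = append(h, cborHead(3, len("roots"))...)
	h = append(h, "roots"...)
	h = append(h, 0x81)       // array with 1 element
	h = append(h, 0xd8, 0x2a) // tag 42 marks a CID link
	h = append(h, cborHead(2, len(root)+1)...)
	h = append(h, 0x00) // tag-42 CIDs carry a leading 0x00 (identity multibase) byte
	h = append(h, root...)
	h = append(h, cborHead(3, len("version"))...)
	h = append(h, "version"...)
	return append(h, 0x01) // version 1
}

// WriteAdCAR writes one CARv1 stream: a header rooted at the ad, then the ad
// block, then each of the ad's entries blocks.
func WriteAdCAR(w io.Writer, ad Block, entries []Block) error {
	hdr := carHeader(ad.CID)
	if err := writeUvarint(w, uint64(len(hdr))); err != nil {
		return err
	}
	if _, err := w.Write(hdr); err != nil {
		return err
	}
	for _, b := range append([]Block{ad}, entries...) {
		// Each section is varint(len(CID)+len(data)) || CID || data.
		if err := writeUvarint(w, uint64(len(b.CID)+len(b.Data))); err != nil {
			return err
		}
		if _, err := w.Write(b.CID); err != nil {
			return err
		}
		if _, err := w.Write(b.Data); err != nil {
			return err
		}
	}
	return nil
}
```

Because the ad CID is the only root, a reader that opens the CAR can identify the ad without scanning the entries. Taking an `io.Writer` also keeps the encoder independent of where the bytes end up.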

Once a CAR file is created, it should be stored in a "mirror" location specified in the indexer config. At minimum, this must support storing to S3 and to a file system path. The CAR file should be named with the advertisement CID.

Before fetching an ad and its entries from the publisher, the indexer should also check whether the ad is already present in the mirror location.

Proposing the following configuration settings:

MirrorURL string
FetchFromMirror bool
StoreAtMirror bool
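In Go, the proposed settings might map onto a config struct like the one below. Several details are assumptions for illustration: that the config is JSON, that S3 locations use an `s3://bucket/prefix` URL, and that file-system locations use `file://` or a bare path. The helper names `ParseMirror`, `Validate`, and `backend` are also hypothetical. The issue itself only names the three settings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Mirror holds the three proposed settings; the field names come from the issue.
type Mirror struct {
	MirrorURL       string
	FetchFromMirror bool
	StoreAtMirror   bool
}

// backend reports which storage kind MirrorURL points at, assuming s3:// for
// S3 and file:// (or a bare path) for the local file system.
func (m Mirror) backend() (kind, location string, err error) {
	u, err := url.Parse(m.MirrorURL)
	if err != nil {
		return "", "", err
	}
	switch u.Scheme {
	case "s3":
		return "s3", u.Host + u.Path, nil // bucket name plus key prefix
	case "file", "":
		return "file", u.Path, nil
	default:
		return "", "", fmt.Errorf("unsupported mirror scheme %q", u.Scheme)
	}
}

// Validate rejects settings that enable the mirror without saying where it is.
func (m Mirror) Validate() error {
	if m.MirrorURL == "" {
		if m.FetchFromMirror || m.StoreAtMirror {
			return fmt.Errorf("MirrorURL is required when the mirror is enabled")
		}
		return nil
	}
	_, _, err := m.backend()
	return err
}

// ParseMirror decodes the settings from JSON and validates them.
func ParseMirror(data []byte) (Mirror, error) {
	var m Mirror
	if err := json.Unmarshal(data, &m); err != nil {
		return Mirror{}, err
	}
	return m, m.Validate()
}
```

Keeping `FetchFromMirror` and `StoreAtMirror` separate lets one indexer publish to a mirror while another only reads from it.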

Small note on the implementation: please make the output destination configurable via an io.Writer.

For the local file system, this would be a file writer; for S3, it would perform a multipart upload on the fly.