catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.

Validate `datapackage.json` checksums against Zenodo checksums

e-belfer opened this issue

To catch files that were uploaded to Zenodo incorrectly, we should add a validation to `AbstractDatasetArchiver.validate_dataset()` that compares two checksums for each file: the one reported in `datapackage.json` (generated from the local file) and the one Zenodo reports for the same file when we fetch the record (see, e.g., https://zenodo.org/api/records/11408171).

If the two checksums differ, the upload was likely incomplete or corrupted, and the check should produce a validation failure.
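
A rough sketch of what the comparison could look like, not the actual archiver code. It assumes each resource in `datapackage.json` carries `name` and `hash` fields, and that the Zenodo record JSON lists files with `key` and `checksum` fields where the checksum may carry an algorithm prefix such as `md5:`. Those field names, and the helper names `_strip_algo` and `find_checksum_mismatches`, are assumptions for illustration and would need checking against the real payloads.

```python
def _strip_algo(checksum: str) -> str:
    """Drop an optional 'md5:'-style prefix and normalize case."""
    return checksum.split(":", 1)[-1].lower()


def find_checksum_mismatches(datapackage: dict, zenodo_record: dict) -> list[str]:
    """Return one message per file whose local and Zenodo checksums disagree.

    Both arguments are already-parsed JSON documents. The field names used
    here ('resources', 'name', 'hash', 'files', 'key', 'checksum') are
    assumptions about the two formats, not confirmed by this issue.
    """
    # Index the checksums Zenodo reports by file name.
    remote = {
        f["key"]: _strip_algo(f["checksum"])
        for f in zenodo_record.get("files", [])
    }

    problems = []
    for resource in datapackage.get("resources", []):
        name = resource["name"]
        local = _strip_algo(resource["hash"])
        if name not in remote:
            # The file never made it into the Zenodo record at all.
            problems.append(f"{name}: missing from Zenodo record")
        elif remote[name] != local:
            # The file is present but its contents differ from the local copy.
            problems.append(f"{name}: local {local} != Zenodo {remote[name]}")
    return problems
```

An empty list would mean every file in the datapackage matches its Zenodo counterpart; any entries could be surfaced as a failed validation from `validate_dataset()`.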