Set structural metadata when registering a crawl
lwrubel opened this issue
Laura Wrubel commented
Structural metadata can be created at this step, which will allow the content-metadata step to be removed from the wasCrawlPreassembly workflow.
The current structural metadata for crawls looks like the snippet below from this example:
"contains": [
  {
    "type": "https://cocina.sul.stanford.edu/models/resources/file",
    "externalIdentifier": "bb929zb5539_1",
    "label": "",
    "version": 1,
    "structural": {
      "contains": [
        {
          "type": "https://cocina.sul.stanford.edu/models/file",
          "externalIdentifier": "https://cocina.sul.stanford.edu/file/d5d8285b-74f3-462e-8a97-6b268ed73363",
          "label": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
          "filename": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
          "size": 3024482,
          "version": 1,
          "hasMimeType": "application/warc",
          "hasMessageDigests": [
            {
              "type": "sha1",
              "digest": "7cdbd7bd50248bb627929c3dd103ad9e51d2d3a0"
            },
            {
              "type": "md5",
              "digest": "de89fb13e94dd21b96ddc25d6103c3df"
            }
          ],
          "access": {
            "view": "dark",
            "download": "none",
            "controlledDigitalLending": false
          },
          "administrative": {
            "publish": false,
            "sdrPreserve": true,
            "shelve": true
          }
        }
      ]
    }
  }
  ...
Checksums will need to be generated for the WARCs being registered.
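As a rough illustration only (not the registrar's actual code), generating the two digests and assembling a file entry shaped like the snippet above might look like the sketch below. The helper names `warc_digests` and `file_entry` are hypothetical, the access and administrative values are simply copied from the example, and `externalIdentifier` is left out because how it gets assigned is not covered here.

```python
import hashlib
from pathlib import Path


def warc_digests(path, chunk_size=1024 * 1024):
    """Return (sha1_hex, md5_hex) for a file, reading it in chunks.

    Chunked reads keep memory use flat even for large WARCs.
    """
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return sha1.hexdigest(), md5.hexdigest()


def file_entry(path):
    """Build a file entry mirroring the example (hypothetical helper).

    Assumption: field values other than label, filename, size, and the
    digests are copied from the example snippet, not from any spec.
    """
    path = Path(path)
    sha1_hex, md5_hex = warc_digests(path)
    return {
        "type": "https://cocina.sul.stanford.edu/models/file",
        "label": path.name,
        "filename": path.name,
        "size": path.stat().st_size,
        "version": 1,
        "hasMimeType": "application/warc",
        "hasMessageDigests": [
            {"type": "sha1", "digest": sha1_hex},
            {"type": "md5", "digest": md5_hex},
        ],
        "access": {
            "view": "dark",
            "download": "none",
            "controlledDigitalLending": False,
        },
        "administrative": {"publish": False, "sdrPreserve": True, "shelve": True},
    }
```

Both digests are computed in a single pass over the file, so each WARC only has to be read once at registration time.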