sul-dlss / was-registrar-app

Rails app to organize downloaded web archiving data and trigger preassembly/accessioning when appropriate

Set structural metadata when registering a crawl

lwrubel opened this issue

Structural metadata can be created at this step, which will allow the content-metadata step to be removed from the wasCrawlPreassembly workflow.

The current structural metadata for crawls looks like the snippet below from this example:

        "contains": [
            {
                "type": "https://cocina.sul.stanford.edu/models/resources/file",
                "externalIdentifier": "bb929zb5539_1",
                "label": "",
                "version": 1,
                "structural": {
                    "contains": [
                        {
                            "type": "https://cocina.sul.stanford.edu/models/file",
                            "externalIdentifier": "https://cocina.sul.stanford.edu/file/d5d8285b-74f3-462e-8a97-6b268ed73363",
                            "label": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
                            "filename": "ARCHIVEIT-8751-WEEKLY-JOB1584447-SEED1426017-20220402070118605-00000-h3.warc.gz",
                            "size": 3024482,
                            "version": 1,
                            "hasMimeType": "application/warc",
                            "hasMessageDigests": [
                                {
                                    "type": "sha1",
                                    "digest": "7cdbd7bd50248bb627929c3dd103ad9e51d2d3a0"
                                },
                                {
                                    "type": "md5",
                                    "digest": "de89fb13e94dd21b96ddc25d6103c3df"
                                }
                            ],
                            "access": {
                                "view": "dark",
                                "download": "none",
                                "controlledDigitalLending": false
                            },
                            "administrative": {
                                "publish": false,
                                "sdrPreserve": true,
                                "shelve": true
                            }
                        }
                    ]
                }
            }
...

Checksums will need to be generated for the WARCs being registered.
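
As a rough sketch of what the checksum step might look like, the Ruby below streams a WARC once and computes both digests with the standard library's `Digest` classes, then assembles a plain hash shaped like the file entry in the snippet above. The helper names `warc_message_digests` and `warc_file_entry` and the `chunk_size` parameter are hypothetical, not existing code in this app. `externalIdentifier` is left out on purpose, since how it gets assigned is not covered here.

```ruby
require 'digest'

# Hypothetical helper: reads the file in chunks so large WARCs are not
# loaded into memory, feeding each chunk to both digests in one pass.
def warc_message_digests(path, chunk_size: 1024 * 1024)
  md5 = Digest::MD5.new
  sha1 = Digest::SHA1.new
  File.open(path, 'rb') do |io|
    while (chunk = io.read(chunk_size))
      md5.update(chunk)
      sha1.update(chunk)
    end
  end
  [
    { type: 'sha1', digest: sha1.hexdigest },
    { type: 'md5', digest: md5.hexdigest }
  ]
end

# Hypothetical helper: builds a plain hash mirroring the file entry shown
# in the example structural metadata (access/administrative values are
# copied from that example, not derived from the file).
def warc_file_entry(path)
  filename = File.basename(path)
  {
    type: 'https://cocina.sul.stanford.edu/models/file',
    label: filename,
    filename: filename,
    size: File.size(path),
    version: 1,
    hasMimeType: 'application/warc',
    hasMessageDigests: warc_message_digests(path),
    access: { view: 'dark', download: 'none', controlledDigitalLending: false },
    administrative: { publish: false, sdrPreserve: true, shelve: true }
  }
end
```

Computing both digests in a single read keeps the I/O cost to one pass per WARC, which matters if crawls contain many large files.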