sul-dlss / was-registrar-app

Rails app to organize downloaded web archiving data and trigger preassembly/accessioning when appropriate

Error in Register DOR Items

peterchanws opened this issue

I tried to register a crawl with 3 seeds for the collection Senses Places.
Screen Shot 2020-05-12 at 2 49 52 PM
I got the same error for the crawl and all 3 seeds:
build-was-crawl-druid-tree : No such file or directory - /was_unaccessioned_data/jobs/iaw/201510

Screen Shot 2020-05-12 at 4 52 19 PM

@peterchanws Was this in stage or production?

It looks like there may have been some source_id validation problems: https://app.honeybadger.io/projects/50568/faults/63284869

I can't log in to https://app.honeybadger.io/projects/50568/faults/63284869 with my SUNet ID, pchan3@stanford.edu. Could someone grant me access?

It’s not registration that’s failing here. Registration succeeded in terms of creating the druids. The source_id conflict is the result of trying to register again with the same data. But registration doesn't need to be run again.

Instead, the problem is that the wasPreassemblyWF, which registration must automatically start, is running into an error. My guess is that there's supposed to be data placed at the file path that's being reported as not found. I'm not sure how that data gets there. Maybe it's supposed to be there before the object is registered.
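If that guess is right, a pre-flight check along these lines would surface the missing directory before the workflow step runs. This is only a sketch: `crawl_data_present?` is a hypothetical helper, not something taken from was-registrar-app or the WAS robots, and the path is simply the one from the error message.

```ruby
# Hypothetical helper (an assumption, not from the actual codebase):
# returns true only if the crawl directory exists on disk.
def crawl_data_present?(path)
  File.directory?(path)
end

# Path copied from the reported error message.
path = '/was_unaccessioned_data/jobs/iaw/201510'

unless crawl_data_present?(path)
  warn "Crawl data not found at #{path}; it may need to be in place before the workflow starts"
end
```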

On second thought, maybe I just don't understand what "registration" means to WAS. But for https://argo-stage.stanford.edu/view/jk890tf8873 (for example) it looks like the error is downstream of the registration form.

Sorry Peter, the link was for the infrastructure team. The error was:
ArgumentError: Source ID must follow the format 'namespace:value', not 'was_unaccessioned_data/jobs/iaw/201510'

I'm not sure if that has anything to do with the problem you have here, but it seems curious that the id matched.
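For reference, the 'namespace:value' rule in that error message can be approximated with a check like the one below. The pattern and the method name `valid_source_id?` are my assumptions for illustration; the real validation may be stricter or looser.

```ruby
# Assumed pattern: a non-empty namespace with no whitespace or colon,
# then a colon, then a non-empty value. Not the actual validation code.
SOURCE_ID_PATTERN = /\A[^\s:]+:.+\z/

def valid_source_id?(source_id)
  SOURCE_ID_PATTERN.match?(source_id.to_s)
end

puts valid_source_id?('was_unaccessioned_data/jobs/iaw/201510') # false: no colon
puts valid_source_id?('iaw:201510')                             # true
```

Under this assumed pattern, the rejected value fails only because it contains no colon separating a namespace from a value.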

I followed the instructions in https://consul.stanford.edu/display/WARC/Initiating+Crawl+Object+Accessioning when creating the source id.
Should I use "iaw:201510" instead of "iaw/201510"?

Here is the screenshot from Archive-It for the collection (6385) and crawl ID (181129).
Screen Shot 2020-05-14 at 8 29 12 AM

Here is the screenshot from Archive-It for the seed URL.
Screen Shot 2020-05-14 at 8 30 47 AM

@peterchanws The instructions you link to say they are for crawls that did not come from Archive-It.

@andrewjbtw Thanks. I missed that.