sul-dlss / vt-arclight

An Arclight-based discovery application for materials from the Virtual Tribunals project

Number of items inconsistent

laurensorensen opened this issue · comments

The counter under the "collection" box on the left sidebar says "9554" total items, but the number I have from our original dataset and entered into the EAD is 9920. Not sure why the counts are inconsistent.

ggeisler commented

If you do a wildcard/blank search on the site you get 9861 documents. One of those is the collection itself, so if you subtract that (because it isn't a component itself, but the parent of all components) you get 9860, which matches the "Total components" number in the more info panel.

The More info panel says there are 9554 online items. If you take the 9860 components and subtract all of the manually created levels that don't actually have documents (i.e., you can't go to an item detail page for them: Record group, Series, Subseries) you get close to 9554 (9558, which is the number of Level -> Items we show in the Level facet). So it seems like maybe there are 4 items that are valid components but for whatever reason don't have an item detail page and thus aren't considered online items. (I'm just speculating, but the numbers seem to work for that explanation.)
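The arithmetic in the two paragraphs above can be sketched as a quick check. All four input counts are the ones quoted in this comment; the variable names are illustrative, and the 302 figure for non-item levels is only what the subtraction implies, not a number reported anywhere on the site.

```python
# Counts quoted in the comment above.
total_documents = 9861    # wildcard/blank search result
collection_records = 1    # the collection itself is not a component
item_level_facet = 9558   # Level -> Item count shown in the Level facet
online_items = 9554       # "online items" in the More info panel

# Components = every document except the collection record.
total_components = total_documents - collection_records

# Implied (not reported) number of components that are not Item level,
# e.g. Record group, Series, Subseries.
non_item_levels = total_components - item_level_facet

# Item-level components that are apparently not counted as online items.
items_not_online = item_level_facet - online_items

print(total_components)   # 9860
print(non_item_levels)    # 302
print(items_not_online)   # 4
```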

Why those numbers are off from the number in the original dataset and entered into the EAD, I have no idea. But it does seem like the numbers on the site itself are more or less internally consistent. So maybe the count in the EAD needs updating?

@ggeisler thank you for this analysis!

@laurensorensen Hi Lauren, looking into this now. It'd be helpful if you could explain exactly where you got the number 9920 from -- what software or view created this count? Thanks!

Ok, I am unable to find where I got the 9920 number from. Sorry about that! I thought it was from the original inventory, but that is only 9627 rows.
There are also these 12 items that were in the inventory but not delivered from ICJ (noted in inventory that they were not delivered):
H-2785
H-4170
H-5245
H-5246
H-5260_0025A_1
H-5260_0171A_1
H-5260_0172A_1
H-5260_0173A_1
H-5260_0174A_1
H-5260_0187A_1
H-5260_2797A_1
H-5260_2799A_1

Currently going through and trying to see if there are any more that weren't delivered.
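One possible way to look for further undelivered items is a set difference between the identifiers in the inventory and the identifiers actually delivered. This is a minimal sketch only: the function name is hypothetical, the sample data is illustrative, and "H-0001" is a made-up identifier used to show a delivered item.

```python
def missing_identifiers(inventory_ids, delivered_ids):
    """Return inventory identifiers absent from the delivered set, sorted."""
    return sorted(set(inventory_ids) - set(delivered_ids))


# Illustrative data: two identifiers from the list above plus one made-up
# identifier ("H-0001") standing in for an item that was delivered.
inventory = ["H-2785", "H-4170", "H-0001"]
delivered = ["H-0001"]

print(missing_identifiers(inventory, delivered))  # ['H-2785', 'H-4170']
```

In practice the two lists would come from the identifier columns of the inventory and of whatever listing records the delivered files.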

Still a little confused and trying to sort this out.

  • items_only.csv: 9615 items
  • from my original spreadsheet from ICJ (w/o "missing" items): 9603
  • right now in NTA site app: 9554

I downloaded all the CSVs from the series GitHub page and got more results that seem inconsistent:

  • Exhibits: 3494
  • Audio: 4590
  • Docbooks: 394
  • Emptyfolders: 150
  • Finalpleas: 113
  • Indictments: 3
  • Judgments: 13
  • Minutes: 1
  • Lists: 72
  • Commissiontrans: 236
  • Court transcripts: 740
  • Rules: 13
  • Statements: 59
  • Trial briefs: 39

Total: 9917
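As a sanity check, the per-series CSV counts above can be summed in a few lines. The numbers are copied from the list in this comment; the dictionary name is just for illustration.

```python
# Row counts per series CSV, as listed in the comment above.
series_counts = {
    "Exhibits": 3494,
    "Audio": 4590,
    "Docbooks": 394,
    "Emptyfolders": 150,
    "Finalpleas": 113,
    "Indictments": 3,
    "Judgments": 13,
    "Minutes": 1,
    "Lists": 72,
    "Commissiontrans": 236,
    "Court transcripts": 740,
    "Rules": 13,
    "Statements": 59,
    "Trial briefs": 39,
}

total = sum(series_counts.values())
print(total)  # 9917
```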

Realizing that the above includes Record Group, sub-series etc. Going to try and do a count and revise without these added.

Total number of sub-series and record groups is 71. Maybe we can discuss next week? @marlo-longley @thatbudakguy

Hi again, I'm not the best at math, but I think what I have below is fairly accurate. I'm not sure how to fix or evaluate where this is at, but I feel it's important that the numbers be consistent. I was wondering if there might be logs that still exist from when the CSVs were uploaded to ASpace? Sorry I didn't ask for the logs at the time of import.

9626 total items in Argo (documents, audio, film, image)
9558 total items according to extent field in ASpace (by series) (edited, missed Trial briefs and Rules previously)
9620 total items from ICJ (original data plus H-2785)
9554 total items in Arclight now
9616 total items in items_only

(after adding H-2785)
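To make the gaps easier to see, here is a small sketch that compares each of the totals above against the current Arclight count. The figures are the ones listed in this comment; using Arclight as the baseline is an arbitrary choice for illustration.

```python
# Item totals per source, as listed in the comment above.
counts = {
    "Argo": 9626,
    "ASpace extent": 9558,
    "ICJ": 9620,
    "Arclight": 9554,
    "items_only": 9616,
}

baseline = counts["Arclight"]

# Difference between each source and the current Arclight count.
differences = {source: n - baseline for source, n in counts.items()}

for source, diff in differences.items():
    print(f"{source}: {counts[source]} ({diff:+d} vs Arclight)")
```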

I asked Geoff about the appearance of 9626 items in Argo versus 9620 in original spreadsheet.

Closing in favor of #449