internetarchive / openlibrary

One webpage for every book ever published!

Home Page: https://openlibrary.org

Run Standard Ebooks import

cdrini opened this issue · comments

Standard Ebooks recently celebrated their 1,000th ebook! We only have 491, so it's time to re-run the import!

It seems they now require authentication to access their OPDS feed, although with exceptions for open source organizations; we should contact them.

Once we get access, it's pretty easy: we just run scripts/import_standard_ebooks.py in the ol-home0 container as a non-root user.

Stakeholders

@mekarpeles @jimchamp

I was able to run the import, but importbot was misbehaving and attaching the IDs to the wrong books! I caught it after a few edits and turned off importbot. The rest of the imports are blocked on #9387.

Ran the import; we're now up to 839! No clue why we haven't hit 1000 though :/ https://openlibrary.org/search?q=id_standard_ebooks%3A*&mode=everything
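For tracking progress without opening the search page each time, the same query can probably be run against Open Library's JSON search endpoint. This is only a rough sketch: it assumes `search.json` accepts the `id_standard_ebooks:*` query the same way the search page does and that its `numFound` field reflects the same total. The helper names (`build_count_url`, `parse_count`, `fetch_count`, `remaining`) are made up for illustration and are not part of the import script.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the JSON search endpoint mirrors the /search page used above.
SEARCH_URL = "https://openlibrary.org/search.json"


def build_count_url(query: str = "id_standard_ebooks:*") -> str:
    """Build a search URL; limit=1 keeps the response small since only the total matters."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'limit': 1})}"


def parse_count(payload: dict) -> int:
    """Pull the total number of matches out of a search response."""
    return int(payload["numFound"])


def fetch_count(query: str = "id_standard_ebooks:*") -> int:
    """Perform the request (network access required) and return the match count."""
    with urlopen(build_count_url(query), timeout=30) as resp:
        return parse_count(json.load(resp))


def remaining(target: int, found: int) -> int:
    """How many records are still missing relative to the target, never negative."""
    return max(target - found, 0)


# Example (performs a network request, so it is left commented out):
# print(remaining(1000, fetch_count()))
```

With the numbers from this thread, `remaining(1000, 839)` gives 161 ebooks still unaccounted for.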

Found out why: when we ran this initially in 2022, because of #9387, they were imported to the wrong editions, and the identifiers were entirely ignored -_- sigh. Ran a bulk edit to roll back those specific edits: https://openlibrary.org/recentchanges/2024/06/13/bulk_update/131998751

Now to try to figure out how to re-import these...