simonw / scrape-instances-social

https://instances.social/instances.json

Mechanism for rebuilding database file from scratch

simonw opened this issue

Will need this to fix:

Can copy this pattern:

https://github.com/simonw/datasette.io/blob/5455068f5ffdb8cd3f09a4d84d94b7512a46b18e/.github/workflows/deploy.yml?q=%22if%3A%22+user%3Asimonw+path%3A.github%2Fworkflows%2F*.yml#L34-L37

on:
  workflow_dispatch:
    inputs:
      from_scratch:
        description: Enter 'skip' to create a new database from scratch
# ...
    - name: Download previous content.db
      if: github.event.inputs.from_scratch != 'skip'
      run: |
        curl -O https://datasette.io/content.db
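
Adapted to a scraper workflow, the pattern might look roughly like the sketch below. The workflow name, cron schedule, database filename and download URL are all placeholders I've made up for illustration, not values from this repo.

```yaml
name: Scrape (hypothetical workflow name)

on:
  schedule:
    - cron: "17 * * * *"   # placeholder schedule
  workflow_dispatch:
    inputs:
      from_scratch:
        description: Enter 'skip' to create a new database from scratch
        required: false

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Scheduled runs have no inputs, so this condition should evaluate
      # to true and the previous database is downloaded as usual.
      # Only a manual run with from_scratch set to 'skip' bypasses it.
      - name: Download previous database
        if: github.event.inputs.from_scratch != 'skip'
        run: |
          # Placeholder URL; --fail makes curl exit non-zero on an HTTP error
          curl --fail -O https://example.com/scrape.db
```

To rebuild, trigger the workflow manually from the Actions tab and type skip into the from_scratch field.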

I also want to be sure that things don't get weird if I'm trying to run a "rebuild" task but one of the scheduled tasks kicks in and runs at the same time.

https://docs.github.com/en/actions/using-jobs/using-concurrency can help there:

concurrency: scraper
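
As a sketch, the longer form of the same setting makes the behaviour explicit. The group name scraper is taken from above; the cancel-in-progress line just spells out what I understand to be the default.

```yaml
# Workflow-level setting: every run of this workflow joins the same group,
# whether it was started by the schedule or by workflow_dispatch.
concurrency:
  group: scraper
  # false (the default) lets a running job finish; a newly triggered run
  # waits as pending instead of starting alongside it.
  cancel-in-progress: false
```

One thing to keep in mind: a group holds at most one running and one pending run, so if another run is triggered while a rebuild is in progress and something is already queued, the older pending run is cancelled in favour of the newer one.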

It's not working right. The rebuilt database only has a handful of rows:

[{"table": "namespaces", "count": 1},
 {"table": "commits", "count": 2},
 {"table": "item", "count": 1},
 {"table": "item_version", "count": 2},
 {"table": "columns", "count": 3},
 {"table": "item_changed", "count": 5}]

I think that's because I need to check out the full repo history, not the default shallow checkout.

with:
  fetch-depth: 0
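
In context, that option belongs on the checkout step. A minimal sketch, assuming the workflow uses actions/checkout (the @v4 pin is my assumption):

```yaml
steps:
  - uses: actions/checkout@v4
    with:
      # The default is 1, which fetches only the latest commit.
      # 0 fetches the full history for all branches and tags.
      fetch-depth: 0
```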

This works correctly now.