simonw / til

Today I Learned

Home Page:https://til.simonwillison.net

Downloading the previous database file no longer works (>5MB)

simonw opened this issue

Vercel has a 5MB size limit on responses. The til.db database is now 5.1 MB thanks to the screenshots, which means it can no longer be downloaded. This dramatically slows down the builds as each build has to generate the HTML and screenshot for every page.
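A build script could check for this condition before relying on the download. The following is a minimal sketch, not code from the original build: the names `RESPONSE_LIMIT_BYTES`, `size_exceeds_limit` and `file_exceeds_limit` are made up for illustration, and it assumes "5MB" means 5,000,000 bytes, since the exact byte count is not stated here.

```python
import os

# Assumption: "5MB" is read as 5,000,000 bytes; the exact figure is not given in the issue.
RESPONSE_LIMIT_BYTES = 5 * 1000 * 1000


def size_exceeds_limit(size_bytes, limit_bytes=RESPONSE_LIMIT_BYTES):
    """Return True when a response of size_bytes would be over the limit."""
    return size_bytes > limit_bytes


def file_exceeds_limit(path, limit_bytes=RESPONSE_LIMIT_BYTES):
    """Check a file on disk (for example til.db) against the same limit."""
    return size_exceeds_limit(os.path.getsize(path), limit_bytes)
```

Under that assumption, a 5.1 MB file (5,100,000 bytes) is over the limit, which matches the failure described above.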

So I need somewhere else to keep that .db file in between runs.

A few options:

That last option is tempting but feels a bit rude - that's going to end up being an enormous .git history. Though I guess I could flatten the history every time I store a file there...

I'm going to try doing this with artifacts first, mainly to learn how to use them.

Not sure that will work:

Note: You can only download artifacts in a workflow that were uploaded during the same workflow run.

I could also migrate the site to Google Cloud Run.

Got this working! The latest til.db file is now stored in https://github.com/simonw/til-db. Every time I push a new copy of the file there, I use git commit --amend to rewrite history, followed by git push --force, so that repo only ever contains a single commit: the one that added the file.

- name: Checkout til-db
  uses: actions/checkout@v2
  with:
    repository: simonw/til-db
    path: til-db
    token: ${{ secrets.PERSONAL_ACCESS_TOKEN_FOR_PUSH }}

Then later:

- name: Save til.db to simonw/til-db
  run: |-
    cd til-db
    cp ../main/til.db .
    git add til.db
    git commit --amend --no-edit
    git push --force

I was using a slightly modified version of this solution with great joy for a while, but eventually ran into a couple of issues. The main one is that the sqlite file outgrew its britches, even without tracking its history.

I ended up using GitHub Releases instead, which has a 2GB single file limit.

See my workflow file here.
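For readers without access to that workflow file, here is a rough sketch of how a build might fetch a database file stored as a release asset. It is not the workflow referenced above. It assumes a public repository and the usual `releases/download/<tag>/<asset>` URL layout; the owner, repo, tag, and function names are hypothetical.

```python
import shutil
import urllib.request
from urllib.parse import quote


def release_asset_url(owner, repo, tag, asset_name):
    """Build the download URL for an asset attached to a tagged release.

    Assumes a public repository; private repos would need an authenticated API call.
    """
    return "https://github.com/{}/{}/releases/download/{}/{}".format(
        owner, repo, quote(tag), quote(asset_name)
    )


def download_release_asset(owner, repo, tag, asset_name, dest):
    """Stream the asset to dest without loading the whole file into memory."""
    url = release_asset_url(owner, repo, tag, asset_name)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(release_asset_url("example-user", "example-db", "latest-db", "til.db"))
```

Streaming with `shutil.copyfileobj` matters here because the point of moving to Releases is that the file may grow well past a few megabytes.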

The overall architecture is now three repos: one for code, one for flat files, and one for the db. I could probably get rid of the db repo and include the sqlite file as a release in either of the other repos as well.

Cheers, and thanks for all the inspiration. Happy to take any suggestions as well.

P.S. Using a separate repo for flat files (JSON, here) yields a bit of a free, hosted NoSQL backend. I think it's a fun pattern.
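To make the flat-file pattern concrete, here is a small sketch of reading one JSON document from such a repo over HTTP. It assumes a public repository served through raw.githubusercontent.com; the owner, repo, branch, path, and function names are placeholders, not the actual repos mentioned in this thread.

```python
import json
import urllib.request


def raw_file_url(owner, repo, branch, path):
    """Build the raw-content URL for a file in a public GitHub repository."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        owner, repo, branch, path.lstrip("/")
    )


def fetch_json(owner, repo, branch, path):
    """Fetch a JSON file from the repo and parse it into Python objects."""
    with urllib.request.urlopen(raw_file_url(owner, repo, branch, path)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical location, for illustration only.
    print(raw_file_url("example-user", "example-flat-files", "main", "posts/1.json"))
```

Reads are plain HTTP GETs, and writes are ordinary commits to the flat-file repo.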