Downloading the previous database file no longer works (>5MB)
simonw opened this issue
Vercel has a 5MB size limit on responses. The til.db database is now 5.1 MB thanks to the screenshots, which means it can no longer be downloaded. This dramatically slows down the builds: without the previous database to reuse, each build has to generate the HTML and screenshot for every page.
So I need somewhere else to keep that .db file in between runs.
A few options:
- The GitHub Actions cache. I'm worried about this though as there are no guarantees on how long that will last.
- GitHub Actions artifacts. These last for 90 days. https://docs.github.com/en/actions/configuring-and-managing-workflows/persisting-workflow-data-using-artifacts
- An S3 bucket somewhere
- Another GitHub repository!
That last option is tempting but feels a bit rude: that's going to end up being an enormous .git history. Though I guess I could flatten the history every time I store a file there...
I'm going to try doing this with artifacts first, mainly to learn how to use them.
Not sure that will work:
Note: You can only download artifacts in a workflow that were uploaded during the same workflow run.
I could also migrate the site to Google Cloud Run.
Got this working! The latest til.db file is now stored in https://github.com/simonw/til-db. Every time I push a new copy of the file there I use git commit --amend to rewrite history, followed by git push --force, so that repo only ever contains a single commit: the one that added the file.
til/.github/workflows/build.yml, lines 18 to 23 in 1e29c3f
Then later:
til/.github/workflows/build.yml, lines 80 to 86 in 1e29c3f
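For anyone wanting to try the same single-commit pattern, here is a minimal sketch in Python wrapped around the git CLI. It is not the contents of build.yml. The function names, the `origin` remote and the `main` branch are all assumptions made for illustration, and it assumes the working directory is already a clone of the storage repo with one existing commit.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def single_commit_commands(
    filename: str, remote: str = "origin", branch: str = "main"
) -> list[list[str]]:
    """Build the git commands that fold a new copy of `filename` into the
    existing commit and overwrite the remote branch with it.

    `remote` and `branch` are assumed defaults, not taken from the thread.
    """
    return [
        ["git", "add", filename],
        # Rewrite the existing commit instead of adding a new one.
        ["git", "commit", "--amend", "--no-edit"],
        # The amended commit has a new hash, so a plain push would be rejected.
        ["git", "push", "--force", remote, branch],
    ]


def publish(repo_dir: Path, filename: str) -> None:
    """Run the commands inside `repo_dir`, stopping at the first failure."""
    for command in single_commit_commands(filename):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `publish(Path("til-db"), "til.db")` after copying the fresh database into the clone would leave the remote with exactly one commit. Note that old blobs may still linger on the hosting side until they are garbage collected.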
I was using a slightly modified version of this solution with great joy for a while, but eventually ran into a couple of issues. The main one is that the sqlite file outgrew its britches, even without tracking its history.
I ended up using GitHub Releases instead, which has a 2GB single file limit.
See my workflow file here.
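As a rough illustration of the Releases approach (not the linked workflow itself), the build can try to fetch the database attached to the latest release and fall back to a full rebuild if that fails. The sketch below relies on GitHub's `releases/latest/download/<asset>` URL pattern for public repositories. The owner, repo and helper names are hypothetical.

```python
import shutil
import urllib.error
import urllib.request
from pathlib import Path


def latest_asset_url(owner: str, repo: str, asset: str) -> str:
    """URL of `asset` on the most recent release of a public GitHub repo."""
    return f"https://github.com/{owner}/{repo}/releases/latest/download/{asset}"


def fetch_previous_db(url: str, dest: Path) -> bool:
    """Download `url` to `dest`. Return False if the download fails, so the
    caller can rebuild the database from scratch instead."""
    try:
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.URLError:
        # Covers HTTP errors (e.g. 404 when no release exists yet) as well as
        # connection problems. Remove any partially written file.
        dest.unlink(missing_ok=True)
        return False
    return True


if __name__ == "__main__":
    # "example-user" and "example-db" are placeholders, not real repositories.
    url = latest_asset_url("example-user", "example-db", "til.db")
    if not fetch_previous_db(url, Path("til.db")):
        print("No previous database available; building from scratch")
```

Uploading the new file at the end of the build is a separate step, for example via the `gh release upload` command or the Releases API.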
The overall architecture is now three repos: one for code, one for flat files, and one for the db. I could probably get rid of the db repo and include the sqlite file as a release in either of the other repos as well.
Cheers, and thanks for all the inspiration. Happy to take any suggestions as well.
P.S. Using a separate repo for flat files (JSON, here) yields a bit of a free, hosted NoSQL backend. I think it's a fun pattern.