jakekara/sheet-csv

sheets-csv - serve google spreadsheets from your own server, which is
usually faster

by Jake Kara
jake@jakekara.com

WHY
Google spreadsheets is a good backend for news apps, when you need
to quickly give your colleagues a spreadsheet to update. This is
especially valuable for live events, such as elections.

The problem is that if your app has to get to .csv from Google's
servers via an AJAX request, that can be slow, and slow down your
app a lot.

EXAMPLE: Your app can instead get the data from your own
folder. I'll use the example of setting up the data backend for an
election results app on a super-short deadline throughout this guide.

OVERVIEW

The user requests a CSV file via a GET request.

The server checks to see if it has a copy.

If it doesn't have a copy, it gets it from Google and sends that,
which is the slowest option, and stores a copy for the next request.

If it does have a copy, it serves that, closes the connection and
the updates the cache by getting a new copy from Google, which is
faster than going to Google directly.

EXAMPLE: As your colleagues fill election results into the Google
spreadsheet, the changes will be reflected each time a new version
is pulled from Google's servers.

SETUP

Copy this repo into a folder on a server that has PHP running.

In that new folder, run the setup script setup.sh, or just make the
following directory tree:

./data
./archive
./master

BASIC USE

To get a spreadsheet, first publish it in Google Sheets, as a CSV,
and get the big string of gibberish from the URL, which is the
ID of the spreadsheet. Observe:

The url to share the spreadsheet as a CSV might look like this:

https://docs.google.com/spreadsheets/d/BIG_STRING_OF_GIBBERISH/pub?gid=0&single=true&output=csv

Take out the BIG_STRING_OF_GIBBERISH part, and we'll call that the
sheet_id from here on.

Next, browse to the URL of the folder where you copied this repo,
and add ?u=BIG_STRING_OF_GIBBERISH, like so:

http://localhost/your-election-app/sheets-csv-copy/?u=BIG_STRING_OF_GIBBERISH

Voila. You should get your .csv.

You'll notice this created two files on the server:
./data/BIG_STRING_OF_GIBBERISH.csv and
./data/archive/BIG_STRING_OF_GIBBERISH-[TIMESTAMP].csv

We'll get to that in the next section.

THE DATA FOLDER

The ./data/BIG_STRING_OF_GIBBERISH.csv file is the "latest" copy
of the file. It will be served for the next request. The
timestamped file in the ./data/archive/ folder is just an archive
(up to one per minute, but we'll be able to change how often a new
file is archived), in case you want to see the data changing over
time or roll back to a previous version of the CSV.

USAGE: OVERRIDING WITH A MASTER CSV

The ./data/master folder allows you to override the spreadsheet
completely.

EXAMPLE: You might want to do this when the election is over, so
the results are effectively "locked in" and no longer dependent on
the google sheet living on.

To use this feature, you copy your file from the archive folder,
and replace the timestamp part of the name with MASTER, so it
looks like:

./data/master/BIG_STRING_OF_GIBBERISH-MASTER.csv

As long as that file exists, it will always be served. The system
will still try to update the cache in the background.

USAGE: SAVING FEWER ARCHIVE COPIES

Saving a copy of a spreadsheet each minute could lead to major
wasting of disk space, but for our example, it's fine, at least
on the night of an election.

To change it so that it only stores an archive file each hour,
day, month, etc, simple change the $TIME_FMT variable in conf.php
to any valid time format that strftime will recognize. I have
some examples in there.

NOTE: The current implemention overwrites files with the same
timestamp, which does save disk space, but if the write cost is a
problem for you, keep that in mind. I should make the program
check if the file exist and don't bother overwriting it.

USAGE: DON'T QUERY GOOGLE SO OFTEN

NOT IMPLEMENTED

I have a $TTL variable in the conf.php file, which is not
implemented. When implemented, it would throttle the cache updating
to queries that are at leaset $TTL seconds apart. I didn't implement
it because I wasn't sure about why the precision for filemtime() and
time() was different, and whether they differed based on the machine
they were running on -- so I couldn't reliably determine the "age"
of a file to test whether it was older than $TTL seconds.

jakekara / sheet-csv

About

Languages