jakekara / sheet-csv

sheets-csv - serve google spreadsheets from your own server, which is
usually faster

by Jake Kara
jake@jakekara.com

WHY
	Google Sheets is a good backend for news apps when you need to
	quickly give your colleagues a spreadsheet to update. This is
	especially valuable for live events, such as elections.

	The problem is that if your app has to fetch the .csv from Google's
	servers via an AJAX request, that request can be slow, and it can
	slow down your whole app.

	Instead, your app can get the data from a folder on your own
	server.

	EXAMPLE: Throughout this guide, I'll use the example of setting up
	the data backend for an election results app on a super-short
	deadline.

OVERVIEW

	The user requests a CSV file via a GET request.

	The server checks to see if it has a copy.

	If it doesn't have a copy, it fetches one from Google, sends it,
	and stores a copy for the next request. This is the slowest path.

	If it does have a copy, it serves that copy, closes the connection,
	and then updates the cache by getting a new copy from Google.
	Serving the cached copy is faster than going to Google directly,
	but it means a response can lag one refresh behind the spreadsheet.

	EXAMPLE: As your colleagues fill election results into the Google
	spreadsheet, the changes will be reflected each time a new version
	is pulled from Google's servers.

SETUP

	Copy this repo into a folder on a server that has PHP running.

	In that new folder, run the setup script setup.sh, or just make the
	following directory tree:

		  ./data
			./archive
			./master
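	If you would rather not use setup.sh, the same tree can be created
	from Python. This is a hedged sketch and not a copy of what setup.sh
	does; make_data_tree is a hypothetical name.

```python
import tempfile
from pathlib import Path


def make_data_tree(root: Path) -> None:
    """Create ./data, ./data/archive and ./data/master under root."""
    for sub in ("archive", "master"):
        # parents=True also creates ./data; exist_ok=True makes reruns harmless.
        (root / "data" / sub).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    make_data_tree(Path(tmp))
    make_data_tree(Path(tmp))  # second run changes nothing
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
```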

BASIC USE

	To get a spreadsheet, first publish it from Google Sheets as a CSV
	and copy the long string of gibberish from its URL; that string is
	the ID of the spreadsheet.

	The URL to share the spreadsheet as a CSV might look like this:

	https://docs.google.com/spreadsheets/d/BIG_STRING_OF_GIBBERISH/pub?gid=0&single=true&output=csv

	Copy out the BIG_STRING_OF_GIBBERISH part; from here on we'll call
	that the sheet_id.
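	Pulling the sheet_id out of the published URL can be automated. The
	minimal sketch below assumes the URL has exactly the
	/spreadsheets/d/<sheet_id>/ shape shown above. The helper name
	sheet_id_from_url is hypothetical.

```python
import re
from typing import Optional

# Captures the path segment between /spreadsheets/d/ and the next / ? or #.
_SHEET_ID = re.compile(r"/spreadsheets/d/([^/?#]+)")


def sheet_id_from_url(url: str) -> Optional[str]:
    match = _SHEET_ID.search(url)
    return match.group(1) if match else None


example = ("https://docs.google.com/spreadsheets/d/BIG_STRING_OF_GIBBERISH"
           "/pub?gid=0&single=true&output=csv")
sheet_id = sheet_id_from_url(example)
```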

	Next, browse to the URL of the folder where you copied this repo,
	and add ?u=BIG_STRING_OF_GIBBERISH, like so:

	http://localhost/your-election-app/sheets-csv-copy/?u=BIG_STRING_OF_GIBBERISH

	Voila. You should get your .csv.
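	Your app requests this URL instead of Google's. A small sketch for
	building it follows. The base folder is the example path above, and
	csv_request_url is a hypothetical helper. urlencode escapes the
	sheet_id in case it contains characters that are unsafe in a query
	string.

```python
from urllib.parse import urlencode


def csv_request_url(base: str, sheet_id: str) -> str:
    # base is the folder where the repo was copied.
    if not base.endswith("/"):
        base += "/"
    return f"{base}?{urlencode({'u': sheet_id})}"


url = csv_request_url("http://localhost/your-election-app/sheets-csv-copy",
                      "BIG_STRING_OF_GIBBERISH")
```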

	You'll notice this created two files on the server:
	./data/BIG_STRING_OF_GIBBERISH.csv and
	./data/archive/BIG_STRING_OF_GIBBERISH-[TIMESTAMP].csv

	We'll get to that in the next section.

THE DATA FOLDER

	 The ./data/BIG_STRING_OF_GIBBERISH.csv file is the "latest" copy
	 of the file, and it will be served for the next request. The
	 timestamped file in the ./data/archive/ folder is just an archive
	 copy (at most one per minute by default; see USAGE: SAVING FEWER
	 ARCHIVE COPIES to change how often a new file is archived), in
	 case you want to see the data changing over time or roll back to a
	 previous version of the CSV.
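	The sketch below shows how the two file names relate. It also shows
	why you get at most one archive file per minute: every update within
	the same minute formats to the same name, so each one overwrites the
	same file. The TIME_FMT value is a hypothetical minute-resolution
	format, not the actual default from conf.php. The helper names are
	also hypothetical.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical minute-resolution format; the real one lives in conf.php.
TIME_FMT = "%Y-%m-%d-%H%M"


def latest_path(data_dir: Path, sheet_id: str) -> Path:
    return data_dir / f"{sheet_id}.csv"


def archive_path(data_dir: Path, sheet_id: str, when: datetime,
                 time_fmt: str = TIME_FMT) -> Path:
    # Times within the same minute produce the same name.
    return data_dir / "archive" / f"{sheet_id}-{when.strftime(time_fmt)}.csv"


data = Path("data")
a = archive_path(data, "SHEET_ID", datetime(2016, 11, 8, 21, 5, 10))
b = archive_path(data, "SHEET_ID", datetime(2016, 11, 8, 21, 5, 50))
c = archive_path(data, "SHEET_ID", datetime(2016, 11, 8, 21, 6, 0))
```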

USAGE: OVERRIDING WITH A MASTER CSV

	 The ./data/master folder allows you to override the spreadsheet
	 completely.

	 EXAMPLE: You might want to do this when the election is over, so
	 the results are effectively "locked in" and no longer depend on
	 the Google sheet living on.

	 To use this feature, copy your file from the archive folder into
	 ./data/master and replace the timestamp part of the name with
	 MASTER, so it looks like:

	 ./data/master/BIG_STRING_OF_GIBBERISH-MASTER.csv

	 As long as that file exists, it will always be served. The system
	 will still try to update the cache in the background.
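	The override amounts to a single existence check before choosing
	which file to serve. The minimal Python sketch below uses the
	hypothetical name pick_source and is not the repo's code.

```python
import tempfile
from pathlib import Path


def pick_source(data_dir: Path, sheet_id: str) -> Path:
    """Which file to serve: the MASTER override if present, else the cache."""
    master = data_dir / "master" / f"{sheet_id}-MASTER.csv"
    if master.exists():
        return master
    # No override: serve the latest cached copy. Either way, the cache
    # would still be refreshed in the background.
    return data_dir / f"{sheet_id}.csv"


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "master").mkdir()
    before = pick_source(data_dir, "SHEET_ID").name
    (data_dir / "master" / "SHEET_ID-MASTER.csv").write_text("final\n")
    after = pick_source(data_dir, "SHEET_ID").name
```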

USAGE: SAVING FEWER ARCHIVE COPIES

	  Saving a copy of a spreadsheet every minute could waste a lot of
	  disk space, but for our example that's fine, at least on election
	  night.

	  To store an archive file only once an hour, day, month, etc.,
	  simply change the $TIME_FMT variable in conf.php to any time
	  format that strftime recognizes. I have some examples in there.
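	The coarser the format, the more update times collapse into a single
	archive name. The sketch below counts how many archive files a night
	of updates would produce under a few formats. The formats and the
	update schedule are hypothetical examples, not the ones in conf.php.
	Python's strftime uses the same C-style %Y/%m/%d/%H directives
	referred to here.

```python
from datetime import datetime

# Hypothetical $TIME_FMT choices, from finest to coarsest.
FORMATS = {
    "minute": "%Y-%m-%d-%H%M",
    "hour": "%Y-%m-%d-%H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def distinct_archives(times, fmt):
    """Number of distinct archive names the given update times produce."""
    return len({t.strftime(fmt) for t in times})


# An update every 10 minutes from 20:00 to 23:50 on election night.
night = [datetime(2016, 11, 8, h, m) for h in range(20, 24) for m in range(0, 60, 10)]
counts = {name: distinct_archives(night, fmt) for name, fmt in FORMATS.items()}
```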

	  NOTE: The current implementation overwrites archive files that
	  have the same timestamp. That saves disk space, but each update
	  still rewrites the file, so keep that in mind if writes are
	  expensive for you. I should make the program check whether the
	  file exists and skip writing it if it does.

USAGE: DON'T QUERY GOOGLE SO OFTEN

	NOT IMPLEMENTED

	I have a $TTL variable in the conf.php file, which is not
	implemented. When implemented, it would throttle cache updates to
	queries that are at least $TTL seconds apart. I didn't implement
	it because I wasn't sure why the precision of filemtime() and
	time() differed, or whether it differed depending on the machine
	they were running on, so I couldn't reliably determine the "age"
	of a file to test whether it was older than $TTL seconds.
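	The throttle could look like the minimal Python sketch below, where
	is_stale is a hypothetical helper. It compares the file's
	modification time with the current time, both in seconds since the
	epoch. For reference, PHP's time() and filemtime() both return Unix
	timestamps in whole seconds. However, PHP caches filemtime()
	results, so clearstatcache() may be needed before re-checking a file
	in the same request. Filesystems also differ in how finely they
	store modification times.

```python
import os
import tempfile
import time
from typing import Optional


def is_stale(path: str, ttl: float, now: Optional[float] = None) -> bool:
    """True if path is missing or was last modified at least ttl seconds ago."""
    if not os.path.exists(path):
        return True
    if now is None:
        now = time.time()
    # Both values are seconds since the epoch, so the difference is the
    # file's age in seconds.
    return now - os.path.getmtime(path) >= ttl


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "SHEET_ID.csv")
    missing = is_stale(cache, ttl=60)
    with open(cache, "w") as f:
        f.write("votes\n10\n")
    os.utime(cache, (1_000_000, 1_000_000))  # pin mtime for a repeatable demo
    fresh = is_stale(cache, ttl=60, now=1_000_030)
    expired = is_stale(cache, ttl=60, now=1_000_060)
```

	A server would call this before the background refresh and skip the
	Google request whenever it returns False.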
