ffffound-export

A Python 2.7 script for downloading your Ffffound images.

It downloads the images, makes HTML pages for viewing them, and saves data about them all in a JSON file. For each image it first tries the original source (to get the highest quality possible), falling back to the version cached at Ffffound if that fails.

This is based on a script by Ash Hildebrandt. The many terrible things about this hacked-together, cargo-cult script are all my fault, not his. Thanks Ash!

Running it

Install the required modules using pip, probably in a virtualenv:

pip install -r requirements.txt

Then run the script:

python ffffind.py username

Replace username with your username on Ffffound, e.g.:

python ffffind.py philgyford

It can take a long time to run, depending on how many images you ffffound and how slowly the servers respond.

To test it with only a few pages, pass the number of pages to fetch as a second argument, e.g.:

python ffffind.py philgyford 2

That will fetch the first two pages of pictures for philgyford.

Results

If all goes well you will end up with a structure like this in the script's directory (but with your username):

philgyford/
	images/
		0soundmagn01.jpg
		0001TV-1431.gif
		002.jpg
		etc...
	images.json
	page1.html
	page2.html
	page3.html
	etc...
	styles.css

Open page1.html in a web browser and enjoy those happy image-based memories.

images.json

images.json contains data about all of the images saved. It is structured something like this:

[
	{
		"page_title": "Photos of a traffic jam stuck in the woods for 70 years | Death and Taxes", 
		"backup_url": "http://img.ffffound.com/static-data/assets/6/eda34c4268e421ec4131f0a1b2ecba5e810ab12a_m.jpg", 
		"filename": "chatillon-car-graveyard-abandoned-cars-cemetery-belgium-4.jpg", 
		"image_url": "http://www.deathandtaxesmag.com/wp-content/uploads/2014/07/chatillon-car-graveyard-abandoned-cars-cemetery-belgium-4.jpg", 
		"save_time": "2014-07-11 18:36:57", 
		"page_url": "http://www.deathandtaxesmag.com/224339/photos-of-a-traffic-jam-stuck-in-the-woods-for-70-years/"
	}, 	
	...
]

page_title is the title of the web page the image was originally found on.
backup_url is the URL of the version of the image displayed on Ffffound. This might be smaller than the original.
filename is the name of the image in the images directory.
image_url is the URL of the image on the original web page.
save_time is the date and time the image was saved to Ffffound.
page_url is the URL of the web page the image was originally found on.

The script tries to fetch the image from image_url, but if that fails, it uses backup_url instead.
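
The fallback order described above can be sketched roughly as follows. This is an illustration only, not the code in ffffind.py: `fetch_with_fallback` is a hypothetical name, and `fetch` stands in for whatever function actually downloads a URL (it is assumed to return the image bytes or raise an exception on failure).

```python
def fetch_with_fallback(image, fetch):
    """Return (data, url_used) for one images.json-style entry.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    image bytes, or raises an exception if the download fails.
    """
    # Try the original source first, then the copy cached at Ffffound.
    for key in ("image_url", "backup_url"):
        url = image.get(key)
        if not url:
            continue  # this entry has no URL of that kind
        try:
            return fetch(url), url
        except Exception:
            continue  # download failed, try the next candidate
    # Neither URL was present or downloadable.
    return None, None
```

Passing the download function in as an argument keeps the sketch independent of any particular HTTP library and makes it easy to try out with a fake `fetch`.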

Some images might not have an image_url and/or page_url; for example, if the image was removed from Ffffound at the request of the copyright owner.
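
Because of this, anything that reads images.json should not assume those keys are present. Here is a minimal sketch of loading the file and counting incomplete entries. `load_images` and `summarise` are hypothetical helper names, not part of the script, and the file is assumed to be UTF-8. It isn't stated here whether a missing value shows up as an absent key, a null, or an empty string, so `.get()` is used to cover all three.

```python
import io
import json


def load_images(path):
    """Load the list of image records from an images.json file."""
    # io.open with an explicit encoding behaves the same on Python 2 and 3.
    with io.open(path, encoding="utf-8") as f:
        return json.load(f)


def summarise(images):
    """Count all entries and those lacking an image_url or page_url."""
    return {
        "total": len(images),
        # .get() returns None for absent keys; empty values are also falsy.
        "no_image_url": sum(1 for i in images if not i.get("image_url")),
        "no_page_url": sum(1 for i in images if not i.get("page_url")),
    }
```

For example, `summarise(load_images("philgyford/images.json"))` would report how many images only have the Ffffound-cached version to fall back on.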

Caveats

Encoding

There are some issues with encoding that result in odd characters appearing in the generated pages. Ffffound doesn't specify a character encoding in the Content-Type of the pages it serves, which doesn't help. I've spent way too long trying to get this working nicely.

Any suggestions welcome. To test, do:

python ffffind.py philgyford 2

Then open page2.html and search for "Something good". The next characters are "Â«" rather than just "«". Also look in images.json.
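
A common cause of exactly this symptom is UTF-8 bytes being decoded as Latin-1 (ISO-8859-1) somewhere along the way. That is a guess at what is happening here, not a confirmed diagnosis of the script. The sketch below reproduces the garbling and shows that it can be reversed; `garble` and `repair` are hypothetical names used only for illustration.

```python
def garble(text):
    """Encode text as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo garble(): recover the original bytes and decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


original = u"\u00ab"        # the left guillemet character
mangled = garble(original)  # two characters: U+00C2 followed by U+00AB

# The mangled form is the two-character sequence seen in page2.html.
assert mangled == u"\u00c2\u00ab"
assert repair(mangled) == original
```

If this is the cause, decoding the downloaded page bytes explicitly as UTF-8, rather than relying on a default, might avoid the problem. That assumes Ffffound really serves UTF-8, which is not confirmed here.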

I know the code is horrible.

Contact

Phil Gyford
phil@gyford.com
@philgyford
