keepgrabbing.py

This is a transcription of the Python script which Aaron Swartz used to download a large number of documents from the JSTOR archive between 2010 and 2011.

I'm not sure what Aaron would have wanted us to do with this code, but my instinct is that he'd want it freely available, and it's worth having in an executable, machine-readable format under version control, rather than on a hard drive somewhere that has long since stopped spinning. I guess this is a memorial of sorts.

Rest in peace.

Todo

Line 5 contains a redacted hostname/domain, does anyone know what that was?
sprky0#1 @speedplane points out that there was a second version of the script (keepgrabbing2.py), which is referenced in the indictment. If anyone has a copy of this, please get in touch or submit a pull request.
