nolauren / photogrammar

Code for getting and exploring the photogrammar data

photogrammar

Code for getting and exploring the photogrammar data.

First, download a list of all the photo ids (these uniquely determine the URLs used to scrape the rest of the data) by running the following code:

python src/get_photo_ids.py

This creates pickle/all_urls.p, a Python pickle file. Next, download the MARC records from the Library of Congress website for every photo id in all_urls.p:

python src/get_marc_records.py
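Because the MARC download reads its ids from pickle/all_urls.p, it can help to inspect that file before starting a long scrape. The following is a minimal sketch, not part of the repository: the helper name load_photo_ids is hypothetical, and the structure of the pickled object (a list, dict, or something else) is not documented here, so the function simply returns whatever was stored.

```python
import pickle
from pathlib import Path


def load_photo_ids(path="pickle/all_urls.p"):
    """Return the object stored in the pickle written by get_photo_ids.py.

    Hypothetical helper: the type of the stored object is an assumption
    left open here, so it is returned unchanged for inspection.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Calling load_photo_ids() from the repository root and printing the type and length of the result is a quick sanity check. If the file was written under Python 2 and fails to load under Python 3, passing encoding="latin1" to pickle.load may be needed. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.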

When the MARC download finishes, the marc_records directory should contain files such as 'marc_records/fsa1997000988.csv'. To finish the first stage of the scrape, download the image URLs in the same way:

python src/get_img_urls.py

This creates text files in the 'img_url' directory, such as 'img_url/fsa1997000987.txt', which contain the URLs of the photo images.
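Once the scrape has run, the per-photo text files can be gathered into a single mapping for further exploration. This is a rough sketch under stated assumptions: the function name read_img_urls is hypothetical, and it assumes each file holds one URL per line, which the description above does not confirm.

```python
from pathlib import Path


def read_img_urls(directory="img_url"):
    """Map each photo id (the file name without .txt) to its list of URLs.

    Assumption: one URL per line; blank lines are skipped.
    """
    urls = {}
    for txt in sorted(Path(directory).glob("*.txt")):
        lines = txt.read_text(encoding="utf-8").splitlines()
        urls[txt.stem] = [line.strip() for line in lines if line.strip()]
    return urls
```

Comparing the number of keys in the returned dictionary with the number of photo ids is one way to spot downloads that did not complete.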

About

License: GNU General Public License v2.0


Languages

Python 66.2%, OpenEdge ABL 33.8%