gcerretani / antenati

Tools to download data from Portale Antenati

Script fails where multiple years share same gallery ID

vinnydabody opened this issue

Some state archives do not create separate galleries for each year and record type. For example, Modugno (BA) Stato Civile Napoleonico uses one gallery number for all births, one for all deaths, etc. Image numbers are not reused: 1814 deaths might contain images 1-100 and 1815 deaths images 101-200.

When the script encounters an already existing subdirectory created in a previous run, an error is thrown and the script terminates:

~/Documents/Antenati/Modugno $ antenati.py http://dl.antenati.san.beniculturali.it/v/Archivio+di+Stato+di+Bari/Stato+civile+napoleonico/Modugno/Morti/1815/005619901_02177.jpg.html
Traceback (most recent call last):
  File "/usr/bin/antenati.py", line 74, in <module>
    main()
  File "/usr/bin/antenati.py", line 52, in main
    os.mkdir(splitting[13])
OSError: [Errno 17] File exists: '005619901'

Maybe the test for a duplicate name shouldn't be in the creation of the subdirectory but rather in looking to see if a downloaded file is going to overwrite a file in the target directory with the same name?
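A minimal sketch of that suggestion, assuming Python 3: reuse the directory if it is already there and run the duplicate check per file instead. The helper name `save_image` and its parameters are hypothetical and do not come from antenati.py.

```python
import os


def save_image(folder, filename, data, overwrite=False):
    """Write data to folder/filename, creating folder if needed.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of antenati.py.
    """
    # exist_ok=True (Python 3.2+) avoids "[Errno 17] File exists" on reruns
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # The duplicate check is done per file, not per directory
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(path)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With this approach a second run for 1815 could write into the existing '005619901' directory, and it would only stop if an image with the same name were already present.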

In the meantime I got around it by renaming the subdirectory after a run is finished.

@vinnydabody thanks a lot. I'm glad the script is useful to somebody else! I fixed it by changing how the folder name is generated. I think it's now friendlier and should work in most cases.

I do like the new logic, but now there is a different problem. If an archive for a particular year (example: Modugno births 1866 [http://dl.antenati.san.beniculturali.it/v/Archivio+di+Stato+di+Bari/Stato+civile+italiano/Modugno/Nati/1866/]) has multiple subfolders (in this case Parte 1 and Parte 2), running the script on the first image in Parte 1 creates a folder on the local drive and fetches the images, adding "Parte 1" before the folio number and image number in each filename. Running the script on the first image in Parte 2 then reports that the folder already exists and exits.

I tried to add some logic when a folder already exists to ask the user if he wants to continue using click.confirm(), but I am a hack at programming, and know very little about Python, and the indentation is causing me problems. This is what I tried to add:

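import os
import click
...
    if os.path.exists(foldername):
        # With abort=True, click.confirm raises click.Abort (ending the script)
        # when the user answers no, so no if/else around the call is needed.
        click.confirm("Directory " + foldername + " already exists. Do you want to copy images to this directory?", default=True, abort=True)
    else:
        os.mkdir(foldername)

    os.chdir(foldername)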

@vinnydabody thanks a lot for your feedback. The folder should now always be created using the full gallery name, regardless of how many subfolders it contains. I have also implemented your runtime check with a user prompt.
If the folder already exists and contains images, existing files are automatically overwritten. Thanks a lot, again, and let me know if you have other problems.
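The thread does not show how the full gallery name is built, so the following is only a sketch of one way it could be done, assuming Python 3. The helper `gallery_folder_name` is hypothetical, and it assumes that the portal's image URLs start with a "v" path segment, encode spaces as "+", and end with the image page name, as in the URL quoted earlier in this issue.

```python
from urllib.parse import urlsplit, unquote_plus


def gallery_folder_name(url):
    """Build a local folder name from the gallery part of an image URL.

    Hypothetical helper, not taken from antenati.py. Assumes URLs like
    http://host/v/<archive>/<fund>/<town>/<type>/<year>/<image>.jpg.html
    """
    # Split the path into segments and decode "+" and %xx escapes
    parts = [unquote_plus(p) for p in urlsplit(url).path.split("/") if p]
    # Drop the leading "v" marker, if present
    if parts and parts[0] == "v":
        parts = parts[1:]
    # Drop the trailing image page name; the rest identifies the gallery
    gallery = parts[:-1]
    return " - ".join(gallery)
```

If subfolders such as Parte 1 and Parte 2 appear as their own path segments (an assumption), each one would map to a distinct local folder, so the two runs would no longer collide.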