not grabbing PDBs from Alphafold
abbyjerger opened this issue · comments
PDB files should be grabbed from the Alphafold URL in the script extract_structure_representation.py. Currently the script uses urllib.request.urlretrieve(), which doesn't seem to work with certain security protocols on systems such as PNNL's HPC Deception. A different way to pull these PDBs should be used.
Closing this issue because it now seems that this is not the actual problem I'm running into. I'll open a new issue to address the actual problem.
PDB files are being successfully obtained (if they exist in the Alphafold database, as expected) when I run extract_structure_representation.py on PNNL's HPC. The earlier errors may have been caused by how I was testing.
Potential edits to extract_structure_representation.py or functionality relating to PDBs we should discuss later:
- Only create error_ids.txt if there is at least one failed ID to write to it.
- Perhaps we set up a script that allows the user to check if their IDs are in Alphafold, before running any other steps.
- At the end of extract_structure_representation.py, we might want to check that the IDs are now found in all the affected folders such as the user's PDB file location, data/contact_maps, data/resnet_data, and (from the previous retrieve_esm2_embedding step) data/esm2_data.
- Include more specific exception handling: if an ID shows up in error_ids.txt, is it because of an issue with urlretrieve() or the URL, or because the ID just doesn't exist in Alphafold?
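
To make the first and last points concrete for the later discussion, here is a rough sketch, not the current code in extract_structure_representation.py. The names fetch_pdb and download_all, the URL template, and the tab-separated "ID, reason" layout of error_ids.txt are all assumptions for illustration. It also assumes that an HTTP 404 from the server means the ID has no Alphafold model, which we would need to confirm.

```python
import urllib.error
import urllib.request
from pathlib import Path

# Assumed URL pattern; replace with the one the script actually uses.
ALPHAFOLD_URL = "https://alphafold.ebi.ac.uk/files/AF-{id}-F1-model_v4.pdb"


def fetch_pdb(protein_id, out_dir, retrieve=urllib.request.urlretrieve,
              url_template=ALPHAFOLD_URL):
    """Download one PDB file. Return None on success, else a short reason."""
    url = url_template.format(id=protein_id)
    dest = Path(out_dir) / f"{protein_id}.pdb"
    try:
        retrieve(url, str(dest))
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError (it is a subclass).
        if exc.code == 404:
            # Assumption: 404 means the ID has no model in Alphafold.
            return "not_in_alphafold"
        return f"http_error_{exc.code}"
    except urllib.error.URLError as exc:
        # DNS failures, timeouts, blocked connections, and similar.
        return f"network_error: {exc.reason}"
    return None


def download_all(ids, out_dir, error_file="error_ids.txt", **kwargs):
    """Fetch every ID; write error_file only if at least one ID failed."""
    errors = {}
    for protein_id in ids:
        reason = fetch_pdb(protein_id, out_dir, **kwargs)
        if reason is not None:
            errors[protein_id] = reason
    if errors:
        with open(error_file, "w") as fh:
            for protein_id, reason in errors.items():
                fh.write(f"{protein_id}\t{reason}\n")
    return errors
```

The retrieve argument is only there so the download call can be swapped for a stub when testing offline; in normal use it defaults to urllib.request.urlretrieve().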