Templates containing X don't work with ANARCI
pwl opened this issue
When generating the tcr_seqs.json file, I ran into the following error:
Traceback (most recent call last):
  File "run_tcrmodel2.py", line 334, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run_tcrmodel2.py", line 299, in main
    cdr3, seq=parse_tcr_seq.parse_anarci(anarci_out)
  File "/tcrmodel2/scripts/parse_tcr_seq.py", line 23, in parse_anarci
    num=int(fields[1])
ValueError: invalid literal for int() with base 10: 'Unknown'
This error was also mentioned in #2. It seems to be caused by one of the templates containing an X amino acid, which makes ANARCI raise
Error: Unknown amino acid letter found in sequence: X
in which case parse_anarci reads Unknown where it expects a residue number, so int(fields[1]) fails.
In my case this was the 5xot_D template, but there are many more templates containing X in data/databases/pdb_seqres.txt.
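To see how many templates are affected, something like the sketch below could list every entry in pdb_seqres.txt whose sequence contains X. It assumes the file is plain FASTA where the template ID (e.g. 5xot_D) is the first token after ">"; the function names are hypothetical, and only the sequence lines are checked, so an X in a header description is ignored.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}; the id is the first token after '>'."""
    seqs: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = []
        elif current is not None:
            # Sequence lines are concatenated, so wrapped FASTA also works.
            seqs[current].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def ids_with_x(seqs: Dict[str, str]) -> List[str]:
    """Return the ids whose sequence contains the unknown residue 'X'."""
    return [k for k, s in seqs.items() if "X" in s.upper()]


def templates_with_x(path: str) -> List[str]:
    """Hypothetical helper: scan a pdb_seqres-style FASTA file for X residues."""
    with open(path) as fh:
        return ids_with_x(read_fasta(fh))
```

For example, templates_with_x("data/databases/pdb_seqres.txt") would return the IDs to inspect (or exclude) before they are passed to ANARCI.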
I'm not sure how best to handle this. The error does not seem critical, since the structures have already been generated at this point. It could probably be handled by adding a special case to parse_anarci so that it returns an empty list when ANARCI cannot number the sequence.
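One minimal way to add that special case without knowing the exact layout of anarci_out is to wrap the call and turn only this specific failure (int() receiving 'Unknown') into empty results. This is a sketch under assumptions: parse_anarci_or_empty and toy_parse_anarci are hypothetical names, the toy parser only mimics the failing line num=int(fields[1]) and is not the real parse_tcr_seq.parse_anarci, and returning ([], []) assumes the downstream code tolerates empty cdr3/seq values.

```python
from typing import Any, Callable, List, Tuple


def parse_anarci_or_empty(parse_anarci: Callable[[str], Tuple[Any, Any]],
                          anarci_out: str) -> Tuple[Any, Any]:
    """Call parse_anarci, but return empty results instead of crashing when
    ANARCI reported 'Unknown' (e.g. because the sequence contains an X)."""
    try:
        return parse_anarci(anarci_out)
    except ValueError as err:
        # Only swallow the failure seen in the traceback; re-raise anything else.
        if "'Unknown'" not in str(err):
            raise
        return [], []


def toy_parse_anarci(anarci_out: str) -> Tuple[List[int], List[str]]:
    """Hypothetical stand-in that reproduces the failing conversion."""
    nums: List[int] = []
    residues: List[str] = []
    for line in anarci_out.splitlines():
        fields = line.split()
        nums.append(int(fields[1]))  # same conversion as parse_tcr_seq.py line 23
        residues.append(fields[0])
    return nums, residues
```

At the call site in run_tcrmodel2.py this would look like cdr3, seq = parse_anarci_or_empty(parse_tcr_seq.parse_anarci, anarci_out). Filtering X-containing templates before running ANARCI would be the alternative if empty results are not acceptable downstream.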
This appears to be an open issue in ANARCI.