dataset-joined pdb_residues file doesn't match the fasta sequence

Question

ProkopDivin opened this issue 2 years ago

I ran these commands, where joined.ds comes from https://github.com/rdk/p2rank-datasets:

./prank.sh analyze residues joined.ds
./prank analyze fasta-masked joined.ds

However, several residue files don't match the corresponding fasta sequences.
All the files are here:
files.zip

In these files, the sequence lengths of chains I and L are OK, but according to the CSV file, the sequence of chain H should be longer.

1hxf.pdb_residues.csv

1hxf_H.fasta
1hxf_I.fasta
1hxf_L.fasta

In these files, chain A has length 66 and chain B has length 65 (131 residues in total), but there are 232 rows in 1pts.pdb_residues.csv, and I don't get any other files.

1pts.pdb_residues.csv

1pts_A.fasta
1pts_B.fasta

For the following structures, I always get only one fasta file per residue CSV file, and its sequence is shorter than the number of rows in the CSV:

1bbs.pdb_residues.csv
1bbs_A.fasta

1chg.pdb_residues.csv
1chg_A.fasta

1djb.pdb_residues.csv
1djb_A.fasta

2cba.pdb_residues.csv
2cba_A.fasta

2fbp.pdb_residues.csv
2fbp_A.fasta

2tga.pdb_residues.csv
2tga_A.fasta

3lck.pdb_residues.csv
3lck_A.fasta

3p2p.pdb_residues.csv
3p2p_A.fasta

3ptn.pdb_residues.csv
3ptn_A.fasta

4ca2.pdb_residues.csv
4ca2_A.fasta

5dfr.pdb_residues.csv
5dfr_A.fasta
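To check the mismatch systematically for every pair listed above, a minimal sketch in Python could count the CSV rows per chain and compare them with the fasta sequence lengths. It assumes (hypothetically) that the residue CSV has a header row with a chain-ID column called `chain`; the real P2Rank output header may differ, so pass the actual column name via `chain_column`. The file-name pattern `<pdb_id>.pdb_residues.csv` / `<pdb_id>_<chain>.fasta` follows the names used in this issue; all function names are illustrative, not part of P2Rank.

```python
import csv
import io
from collections import Counter
from pathlib import Path


def count_csv_rows_per_chain(csv_text, chain_column="chain"):
    """Count residue rows per chain in a *.pdb_residues.csv file.

    ASSUMPTION: the CSV has a header row and a column holding the chain ID;
    the default name "chain" is hypothetical, check the actual header first.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return Counter(row[chain_column].strip() for row in reader)


def fasta_length(fasta_text):
    """Length of the sequence in a single-record fasta file (header lines skipped)."""
    return sum(len(line.strip()) for line in fasta_text.splitlines()
               if line.strip() and not line.startswith(">"))


def compare(csv_text, fastas, chain_column="chain"):
    """Return {chain: (csv_rows, fasta_length)} for chains where the two differ.

    `fastas` maps chain ID -> fasta text. Chains that appear only in the CSV
    are reported with fasta length None; chains with a fasta file but no
    CSV rows are reported with 0 rows.
    """
    rows = count_csv_rows_per_chain(csv_text, chain_column)
    mismatches = {}
    for chain in sorted(set(rows) | set(fastas)):
        seq_len = fasta_length(fastas[chain]) if chain in fastas else None
        if rows.get(chain, 0) != seq_len:
            mismatches[chain] = (rows.get(chain, 0), seq_len)
    return mismatches


def compare_files(directory, pdb_id, chain_column="chain"):
    """Compare <pdb_id>.pdb_residues.csv with all <pdb_id>_<chain>.fasta files."""
    d = Path(directory)
    csv_text = (d / f"{pdb_id}.pdb_residues.csv").read_text()
    # "1hxf_H.fasta" -> stem "1hxf_H" -> chain ID "H"
    fastas = {p.stem.rsplit("_", 1)[1]: p.read_text()
              for p in d.glob(f"{pdb_id}_*.fasta")}
    return compare(csv_text, fastas, chain_column)
```

For example, `compare_files("files", "1hxf")` (where `files` is a hypothetical directory with files.zip unpacked) would list chain H among the mismatches if the CSV really contains more H rows than the fasta sequence has residues.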