soedinglab / hhdatabase_cif70

Scripts to generate the pdb70 database for hh-suite on the basis of pdb's mmcif format

pdb70 contains obsolete PDB sequences

Augustin-Zidek opened this issue

You can observe this issue with, for instance, 1cco, which is obsolete and has been replaced by 2cco.

However, 1cco is still present in the most recent (as of 2020-01-07) pdb70 database while it should have been replaced by 2cco.

My guess is that this is because of the following two lines in the pdb70_update.sh script:

rsync --progress -rlpt -v -z --port=33444 rsync.wwpdb.org::ftp/data/structures/divided/mmCIF/ ${pdb_dir}
rsync --progress -rlpt -v -z --port=33444 rsync.wwpdb.org::ftp/data/structures/obsolete/mmCIF/ ${pdb_dir}/obsolete

If the directories ${pdb_dir} and ${pdb_dir}/obsolete already contain PDB mmCIF files from a previous sync, then rsync will not remove mmCIF files that were made obsolete between those two syncs.

Hence ${pdb_dir} will accumulate more and more obsolete mmCIF files over time, and the pdb70 database produced from it will contain more and more sequences pointing to obsolete PDB entries.

Assuming these two lines are causing the issue, I believe the fix is simple: the argument --delete should be added to the two rsync commands:

rsync --progress -rlpt -v -z --delete --port=33444 rsync.wwpdb.org::ftp/data/structures/divided/mmCIF/ ${pdb_dir}
rsync --progress -rlpt -v -z --delete --port=33444 rsync.wwpdb.org::ftp/data/structures/obsolete/mmCIF/ ${pdb_dir}/obsolete
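
One thing that may be worth checking before applying this: ${pdb_dir}/obsolete lives inside ${pdb_dir}, and as far as I can tell there is no matching directory under divided/mmCIF/ on the server, so --delete on the first command would treat it as extraneous unless it is excluded. Below is a minimal sketch of one way to handle that. The helper name mirror is hypothetical and not part of pdb70_update.sh; --exclude and --dry-run are standard rsync options, and with --dry-run rsync only reports what it would transfer or delete.

```shell
# Hypothetical helper (not from pdb70_update.sh): mirror SRC into DEST and
# delete files in DEST that no longer exist in SRC.
# Any further arguments are passed straight through to rsync.
mirror() {
    src=$1
    dest=$2
    shift 2
    rsync -rlpt -z --delete "$@" "$src" "$dest"
}

# Possible usage (assumes pdb_dir is set as in the update script):
#   mirror rsync.wwpdb.org::ftp/data/structures/divided/mmCIF/ "${pdb_dir}" \
#       --port=33444 --exclude=/obsolete/ --dry-run -v
#   mirror rsync.wwpdb.org::ftp/data/structures/obsolete/mmCIF/ "${pdb_dir}/obsolete" \
#       --port=33444 --dry-run -v
```

The leading slash in --exclude=/obsolete/ anchors the pattern to the top of the transfer, so only ${pdb_dir}/obsolete is protected from deletion. Dropping --dry-run performs the actual sync.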

More information on this issue. This is a complete list (741 items) of obsolete structures in pdb70_200205:

1apg, 1apr, 1aub, 1ayq, 1bgr, 1br7, 1bxj, 1cco, 1cum, 1cx0, 1ddc, 1dyv, 1e7t, 1f9y, 1gdo, 1gga,
1giy, 1gma, 1grk, 1grs, 1h77, 1hrx, 1hs0, 1hwx, 1hzr, 1ij7, 1j2h, 1jsk, 1k87, 1kc0, 1krk, 1l1b,
1lbd, 1ldx, 1m98, 1mlt, 1msl, 1mwf, 1mxm, 1n3v, 1o0z, 1o2c, 1o5y, 1o6n, 1oju, 1p0t, 1pgk, 1pmz,
1pns, 1pnu, 1pny, 1pte, 1pxf, 1q4f, 1q4m, 1q9r, 1r7k, 1r9e, 1rhe, 1rig, 1rmw, 1ru8, 1rw3, 1s1h,
1s1i, 1she, 1sia, 1spz, 1tea, 1teb, 1teo, 1ti4, 1tm8, 1tqk, 1uah, 1uf6, 1umm, 1un7, 1utw, 1uul,
1uvw, 1v8a, 1v9b, 1vfk, 1vjp, 1vlf, 1vor, 1voy, 1vp0, 1vpg, 1vsa, 1vsy, 1vsz, 1vu2, 1vw4, 1vw5,
1vwz, 1vx2, 1vx7, 1vy1, 1vy9, 1vyy, 1wb1, 1wb3, 1we3, 1wf4, 1wga, 1wll, 1wlq, 1wph, 1wuc, 1x5u,
1xc2, 1xg9, 1xm0, 1xv1, 1y0f, 1yh6, 1yku, 1yl3, 1ymf, 1yo3, 1yvv, 1zgm, 1zpy, 1zzo, 2a7v, 2ali,
2amw, 2ao0, 2ape, 2aqm, 2b66, 2b79, 2bcl, 2blk, 2bpl, 2bpt, 2bsn, 2c4o, 2cmx, 2cys, 2czz, 2d12,
2dfw, 2dfz, 2dq1, 2e58, 2e79, 2ejh, 2ekv, 2eub, 2ewd, 2fb1, 2feg, 2fgd, 2frn, 2g9m, 2gjc, 2gld,
2gm6, 2gy9, 2gya, 2gyb, 2gyc, 2h1q, 2h4a, 2h58, 2h9q, 2her, 2hfx, 2hj2, 2hkt, 2hq2, 2hv0, 2hxe,
2i4f, 2i6c, 2i8g, 2iac, 2ihi, 2ijp, 2iyh, 2iz2, 2j03, 2jci, 2jl8, 2jwx, 2kd5, 2ko4, 2ko5, 2ktg,
2kuj, 2l3k, 2lcd, 2ljo, 2lrb, 2m25, 2mp6, 2mp7, 2mxr, 2n0p, 2n22, 2n2i, 2nr3, 2nr8, 2nw5, 2o1r,
2o37, 2oq8, 2p3r, 2p8k, 2p92, 2pbt, 2phj, 2pi1, 2pib, 2pjk, 2pk1, 2pq1, 2pqh, 2pw4, 2pz4, 2q25,
2q26, 2qfm, 2qgp, 2qyx, 2r4a, 2r8c, 2rfp, 2rnp, 2uvc, 2v2y, 2v44, 2v4w, 2va4, 2vdz, 2vh8, 2vhn,
2vjg, 2vol, 2vpu, 2vzf, 2w4n, 2w74, 2w9k, 2wak, 2wav, 2wga, 2wgk, 2wir, 2x1y, 2x54, 2x5a, 2x9t,
2xai, 2xbh, 2xhq, 2xip, 2xm6, 2xt9, 2xtg, 2xw2, 2xzm, 2xzn, 2y5o, 2y7v, 2y94, 2y9a, 2y9s, 2yc6,
2yc8, 2yc9, 2yf1, 2ygr, 2yhp, 2yl4, 2yws, 2ywu, 2yyd, 2z0c, 2z2q, 2z4n, 2z9e, 2zkq, 2zlz, 3a63,
3ac6, 3agj, 3aj0, 3an0, 3b2a, 3b2b, 3b3e, 3bbn, 3bbo, 3bc7, 3bcl, 3bpy, 3bqs, 3c4i, 3c68, 3c7b,
3cpn, 3ctu, 3cum, 3cxe, 3d5c, 3df2, 3dfd, 3dm4, 3dma, 3dmq, 3dnw, 3dth, 3eds, 3em5, 3eme, 3end,
3eow, 3epq, 3esg, 3eu6, 3eyr, 3f09, 3f1g, 3f1h, 3f94, 3fic, 3fih, 3fin, 3g41, 3g63, 3gbc, 3ghx,
3gjg, 3gqw, 3gxs, 3h27, 3h88, 3hh9, 3hho, 3hnn, 3hqs, 3hrj, 3i8i, 3icg, 3ifh, 3ih1, 3ikz, 3irg,
3iyn, 3iyu, 3iz5, 3iz6, 3izb, 3izc, 3izr, 3izs, 3j00, 3j01, 3j14, 3j18, 3j20, 3j21, 3j2k, 3j36,
3j38, 3j39, 3j3a, 3j3b, 3j43, 3j44, 3j60, 3j61, 3j65, 3j6v, 3j72, 3j74, 3j9a, 3jvp, 3jyv, 3jyw,
3k95, 3ka1, 3kax, 3kcr, 3kdv, 3kh6, 3kiu, 3kpi, 3ku8, 3kub, 3kyp, 3kz2, 3l2g, 3l3e, 3l52, 3l53,
3l5q, 3l7s, 3l80, 3l99, 3lfx, 3lj4, 3loh, 3ltx, 3lwy, 3lxw, 3m2o, 3m2q, 3m5f, 3m9p, 3mjj, 3mkx,
3mnt, 3mqu, 3mub, 3n09, 3n2f, 3n3v, 3nl4, 3nuw, 3nvb, 3nzy, 3o5h, 3oaq, 3oe2, 3or8, 3or9, 3ora,
3osc, 3p0d, 3p8q, 3p9d, 3p9e, 3pkr, 3px5, 3pyt, 3qd1, 3qhu, 3qnh, 3qrz, 3r39, 3r70, 3rca, 3rcq,
3s6n, 3snr, 3t0n, 3t18, 3t68, 3tkv, 3tt5, 3tuw, 3tve, 3tvh, 3u1f, 3u42, 3u5c, 3u5e, 3u5g, 3u5i,
3usa, 3uym, 3v28, 3vdo, 3vku, 3vvj, 3w14, 3wko, 3wt8, 3wt9, 3wta, 3x2d, 3x3e, 3zdk, 3zey, 3zf7,
3zj3, 3zn9, 3zvp, 4a17, 4a18, 4a19, 4a1a, 4a1b, 4a1c, 4a1d, 4a1e, 4a1q, 4a3a, 4abr, 4agy, 4apk,
4atr, 4aus, 4azx, 4b1o, 4b1p, 4bpe, 4bpn, 4bpo, 4bpp, 4btc, 4btd, 4byc, 4c0v, 4cds, 4crr, 4cuw,
4cuy, 4cwg, 4cwl, 4cwu, 4cyx, 4d0h, 4d0i, 4d0j, 4d15, 4d8q, 4d8r, 4d93, 4dah, 4dde, 4dh9, 4di7,
4div, 4e8i, 4eaz, 4egb, 4eho, 4emc, 4ew8, 4exu, 4f77, 4f7y, 4fy1, 4fyl, 4g1s, 4gdg, 4gdv, 4gns,
4h1k, 4hg1, 4hhc, 4hrz, 4hub, 4i4a, 4igc, 4igz, 4iog, 4ioz, 4j13, 4jc9, 4jcb, 4je2, 4jni, 4jnr,
4jux, 4jyr, 4k0m, 4k0q, 4ka6, 4kbu, 4kcz, 4kd0, 4keo, 4kfh, 4kfk, 4kfl, 4kg1, 4kix, 4kj1, 4kkf,
4kkz, 4l3m, 4l6j, 4lhg, 4lu2, 4m6z, 4mxg, 4myj, 4n46, 4n5w, 4npg, 4nys, 4oe8, 4om6, 4onb, 4otw,
4pi4, 4pt3, 4pwr, 4q55, 4q61, 4q8j, 4qea, 4ql4, 4qyc, 4r1x, 4rmx, 4ro6, 4roh, 4roi, 4rre, 4rsb,
4rsf, 4ru8, 4rxk, 4s06, 4tly, 4u29, 4u9t, 4ues, 4ujo, 4ujp, 4ujr, 4ujs, 4ujt, 4uli, 4uol, 4uq7,
4v5u, 4v80, 4w28, 4w7b, 4wbn, 4wl5, 4wus, 4wwe, 4wwt, 4wz1, 4x0i, 4x26, 4xdb, 4yfc, 4ytj, 4yx8,
4zeh, 4zj4, 4zxj, 5a0k, 5aaj, 5aft, 5aje, 5ak1, 5aow, 5apb, 5azm, 5b69, 5bri, 5c3v, 5c96, 5cf7,
5cii, 5cnz, 5cvx, 5d7t, 5dcj, 5dng, 5dvg, 5e47, 5f1d, 5f45, 5f7i, 5fxb, 5gj1, 5gj8, 5gnn, 5hg6,
5hkt, 5hmt, 5hsn, 5i31, 5ina, 5iog, 5j65, 5jit, 5jnz, 5jt3, 5kbo, 5l1y, 5l9m, 5lql, 5m6w, 5mb8,
5mz9, 5n7r, 5ng8, 5npx, 5o8g, 5oag, 5oii, 5t8x, 5tkx, 5tsi, 5uc2, 5udb, 5up6, 5uz5, 5v14, 5v3e,
5v78, 5vfl, 5wfa, 5wq5, 5wsi, 5wvq, 5ww2, 5x3a, 5x3q, 5xnu, 5xpq, 5zd5, 6bll, 6c03, 6c0c, 6c2j,
6dip, 6drp, 6e79, 6edo, 6ees, 6elh, 6esr, 6etx, 6feu, 6fjr, 6fot, 6gkp, 6gww, 6h7k, 6jlw, 6mf7,
6mir, 6moc, 6mpu, 6msz, 6n3t

I produced this list by taking every PDB ID in pdb70_a3m.ffindex and checking whether it appears in the official list of obsolete PDB structures: ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat.
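
For anyone who wants to repeat the check, here is a rough sketch of that cross-reference; it is not the exact script used for the list above. It assumes that each ffindex line starts with an entry name of the form <pdbid>_<chain>, and that obsolete.dat has one "OBSLTE <date> <obsolete id> [<successor ids>]" record per line. The function name list_obsolete_ids is made up for this example.

```shell
# Print the obsolete PDB IDs (lowercase, sorted, one per line) that are
# referenced by an ffindex file.
# Assumed formats (not verified against every release):
#   ffindex:      <pdbid>_<chain> <offset> <length>
#   obsolete.dat: OBSLTE <date> <obsolete id> [<successor ids>...]
list_obsolete_ids() {
    ffindex_file=$1
    obsolete_file=$2
    awk '
        # First file (obsolete.dat): remember the ID of every OBSLTE record.
        NR == FNR { if ($1 == "OBSLTE") obsolete[tolower($3)] = 1; next }
        # Second file (ffindex): the PDB ID is the part of the name before "_".
        {
            split($1, parts, "_")
            id = tolower(parts[1])
            if ((id in obsolete) && !(id in seen)) { seen[id] = 1; print id }
        }
    ' "$obsolete_file" "$ffindex_file" | sort
}

# Possible usage:
#   list_obsolete_ids pdb70_a3m.ffindex obsolete.dat
```

IDs are lowercased on both sides so the comparison does not depend on the case used in either file, and each ID is printed once even if several chains of the same entry are present.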

This is now fixed, thanks!