Question about hla_reference_dna.fasta

Atalasia opened this issue 8 years ago

Hello,

I was looking through the FASTA files under the data directory and noticed that some of the sequences in there are combinations of coding and non-coding sequence taken from different alleles.

What puzzles me is that in some of them the two alleles differ in their first two digits (e.g. HLA07296_HLA00097 HLA-A_33:53 (introns from HLA-A_31:01:02)). It also seems that the list of exon/intron combinations that do share the first two digits is not exhaustive.

Is there a logic to how these alleles are combined? (And if I were to update this fasta to more recent alleles from the HLA db, what logic should I use to combine the alleles?)

Thanks.

Clemens Messerschmidt · Answer 1 · Thu Feb 18 2016 20:48:14 GMT+0800 (China Standard Time)

@Atalasia,

As you might have noticed, HLA00097 corresponds to HLA-A31:01:02 (just search for it in the file).
This just means that no intron sequence was available for the first allele (HLA07296), so it had to be imputed somehow.
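To see which entries were imputed this way, the header can be split into its parts. The following is only a sketch: the regular expression is inferred from the single example header quoted in this thread, so the exact layout of other headers in the file is an assumption, and `parse_imputed_header` and `two_digit_group` are hypothetical helper names.

```python
import re

# Pattern inferred from the one example header in this thread (assumption):
#   HLA07296_HLA00097 HLA-A_33:53 (introns from HLA-A_31:01:02)
HEADER_RE = re.compile(
    r"^>?(?P<exon_id>HLA\d+)_(?P<intron_id>HLA\d+)\s+"
    r"(?P<allele>\S+)\s+\(introns from (?P<donor>[^)]+)\)"
)


def two_digit_group(allele_name):
    """Return the first field of an allele name, e.g. 'HLA-A_33:53' -> '33'."""
    match = re.search(r"[_*](\d+):", allele_name)
    return match.group(1) if match else None


def parse_imputed_header(header):
    """Split an imputed-entry header into its parts, or return None if it
    does not look like the example header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    info = match.groupdict()
    # True when exon allele and intron donor share the first field.
    info["same_group"] = (
        two_digit_group(info["allele"]) == two_digit_group(info["donor"])
    )
    return info
```

Running `parse_imputed_header` over every header line of the FASTA file and keeping the entries where `same_group` is false would list the cross-group combinations the question asks about.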

Citing the paper:

We impute the missing sequence data by replacing it with its closest neighbor with respect to sequence similarity from among the complete allele sequences

So rather than updating the database yourself, it would probably be easier if OptiType itself were updated (they should also have tested code for that). @andras86, what do you think?

I would also be very interested if that improves performance.

andras86 · Answer 2 · Thu Feb 18 2016 20:56:08 GMT+0800 (China Standard Time)

Thank you @messersc, this is exactly what happens: we lift over intron sequences from the most similar alleles. Updating them yourself would be difficult, as there's a lot of sanity checking and some manual work in this process, but we're building a more-or-less automated pipeline to do it. We won't update the database for the current OptiType version, but the coming-soon OT2 (with Class II typing) will have the most recent one, and we'll keep updating it.
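For readers who want a feel for the "most similar allele" idea, here is a minimal sketch. It is not the OptiType pipeline, which, as noted above, involves sanity checks and manual work. It assumes the exon sequences are already aligned to equal length and uses a plain mismatch count as the similarity measure; both are assumptions made for illustration, and the function names and toy allele names are hypothetical.

```python
def mismatches(seq_a, seq_b):
    """Count differing positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_complete_allele(query_exons, complete_exons):
    """Pick the intron donor for an exon-only allele.

    query_exons    -- aligned exon sequence of the allele lacking introns
    complete_exons -- dict mapping allele name -> aligned exon sequence,
                      restricted to alleles with full genomic sequence
    Ties are broken by allele name so the result is deterministic.
    """
    return min(
        complete_exons,
        key=lambda name: (mismatches(query_exons, complete_exons[name]), name),
    )


# Toy data only; these are not real HLA sequences.
complete = {
    "A_31:01:02": "ACGTAA",
    "A_33:01:01": "TCGTTT",
}
donor = closest_complete_allele("ACGTAC", complete)
```

With a purely similarity-based choice like this, nothing forces the donor to share the first two digits with the query allele, which is consistent with the cross-group combinations seen in the file.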

Alvin Ng · Answer 3 · Thu Feb 25 2016 21:28:08 GMT+0800 (China Standard Time)

@andras86 Do let us know when OT2 is out; I'm definitely looking forward to it.

loreseeker · Answer 4 · Wed Mar 29 2017 18:13:03 GMT+0800 (China Standard Time)

@andras86 Hello. I was just wondering how the automated pipeline for updating the OptiType database is coming along. I'm interested in using the updated HLA alleles with OptiType, but would rather not impute the introns on my own if a validated procedure already exists. Is any of your code related to this publicly available? Thank you.

Yu Wang · Answer 5 · Tue Apr 11 2017 06:35:29 GMT+0800 (China Standard Time)

@andras86 Hi. I have the same request for an updated HLA reference database, since I realized there are two things I cannot do by myself: 1. building genomic sequences for truncated alleles, from intron 1 to intron 3, by imputing the most similar introns from another allele; 2. formatting the "alleles.h5" file. Looking forward to the update.
Thank you.

Arun Khattri · Answer 6 · Tue Apr 11 2017 06:41:11 GMT+0800 (China Standard Time)

@andras86 Looking forward to OT2. Thanks.

zimoun · Answer 7 · Tue Apr 25 2017 20:50:42 GMT+0800 (China Standard Time)

@AlfredShawn alleles.h5 is an HDF5 file, so you can explore it with any HDF5-capable package, e.g. the Python ones: h5py or pandas.
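As a starting point, a small sketch with h5py that lists every group and dataset in an HDF5 file. Nothing here is specific to OptiType: the internal layout of alleles.h5 is not described in this thread, and `describe_hdf5` is a hypothetical helper name.

```python
def describe_hdf5(path):
    """Return (name, kind, shape, dtype) for every object in an HDF5 file."""
    import h5py  # third-party package: pip install h5py

    entries = []

    def visit(name, obj):
        # Called once per group/dataset; returning None continues the walk.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return entries
```

Calling `describe_hdf5("alleles.h5")` and printing the result shows which keys exist. If the file turns out to have been written by pandas, `pandas.read_hdf(path, key)` with one of those keys is the usual way to load a table back into a DataFrame.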

All the best

lidd77 · Answer 8 · Mon Jun 17 2019 16:00:02 GMT+0800 (China Standard Time)

@zimoun Could you please provide the automated pipeline for updating the OptiType database? I would also like to use the latest HLA database from IMGT.

Sergey Mitrofanov · Answer 9 · Mon Jun 17 2019 17:11:51 GMT+0800 (China Standard Time)

I would appreciate any instructions for updating both hla_reference_dna.fasta and hla_reference_rna.fasta! Can anyone provide them, please?

zimoun · Answer 10 · Tue Jun 18 2019 22:39:48 GMT+0800 (China Standard Time)

@lidd77 @serge2016 My comment was made more than 2 years ago. :-)
I do not use OptiType anymore, so I cannot provide anything.
Also, note that my comment did not suggest any "automated pipeline for updating the OptiType database", only a way to explore alleles.h5 in order to work out its format.

All the best and Godspeed!

Sergey Mitrofanov · Answer 11 · Tue Jun 18 2019 23:00:15 GMT+0800 (China Standard Time)

@zimoun and what do you use now instead of OptiType?