Frequent reporting of a null allele for 10X data
vincentwalter opened this issue
Thanks for creating arcasHLA!
Since 10x Genomics mentions that arcasHLA does a good job at HLA genotyping, I tried using it on scRNA-seq data.
I extracted reads using the following command:
arcasHLA extract \
--single --unmapped \
sample_alignments.bam
And then genotyped:
arcasHLA genotype \
-g A,B,C,DQA1,DQB1,DRB1 \
-p caucasian \
--single \
sample_alignments.extracted.fq.gz
This results in the following genotypes being assigned:
locus | subject_1 | subject_2 | subject_3 | subject_4 | subject_5 |
---|---|---|---|---|---|
A | A*01:01:140,A*11:303 | A*24:608N,A*24:608N | A*01:01:143,A*03:01:103 | A*29:01:01,A*29:01:01 | A*03:01:119,A*24:608N |
B | B*07:386N,B*07:386N | B*07:386N,B*40:01:02 | B*07:386N,B*57:01:01 | B*07:386N,B*07:386N | B*07:386N,B*07:386N |
C | C*07:01:106,C*07:02:104 | C*03:392N,C*07:02:101 | C*07:02:128,C*07:02:128 | C*15:05:02,C*15:05:02 | C*07:02:101,C*07:02:101 |
DQA1 | DQA1*03:01:11,DQA1*03:03:01 | DQA1*01:02:01,DQA1*01:02:01 | DQA1*01:03:01,DQA1*01:05:01 | DQA1*01:01:01,DQA1*01:05:01 | DQA1*05:05:01,DQA1*01:02:01 |
DQB1 | DQB1*03:02:01,DQB1*03:01:01 | DQB1*06:304N,DQB1*06:02:01 | DQB1*06:352,DQB1*05:01:01 | DQB1*05:01:01,DQB1*05:01:01 | DQB1*03:01:46,DQB1*06:02:01 |
DRB1 | DRB1*04:01:01,DRB1*04:01:01 | DRB1*08:01:01,DRB1*08:01:01 | DRB1*10:01:01,DRB1*13:01:01 | DRB1*10:01:01,DRB1*01:02:01 | DRB1*11:04:01,DRB1*15:01:01 |
Since the subjects were selected because they share a CD8+ T cell response to a particular antigen, I'm not surprised that they share HLA-B alleles. However, I don't think it's plausible that all of them carry B*07:386N, since it is a null allele, i.e. one that is not expressed. For subjects 1, 4 and 5 the call is even homozygous for B*07:386N, which would mean these subjects express no HLA-B at all.
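In HLA nomenclature, a trailing `N` on an allele name marks a null allele. As a rough way to count how often such calls show up in the table above, here is a minimal bash sketch that takes one comma-separated table cell and counts the null calls in it. The helpers `is_null_allele` and `count_null_calls` are hypothetical and not part of arcasHLA; they look only at the name suffix and do not read arcasHLA's own output files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of arcasHLA).

# Return success if the allele name ends in "N", the nomenclature
# suffix for a null (non-expressed) allele.
is_null_allele() {
  case "$1" in
    *N) return 0 ;;
    *)  return 1 ;;
  esac
}

# Count null calls in one comma-separated genotype, e.g. "B*07:386N,B*40:01:02".
# read -a splits on commas without glob-expanding the "*" in allele names.
count_null_calls() {
  local genotype="$1" allele n=0
  local -a alleles
  IFS=',' read -r -a alleles <<< "$genotype"
  for allele in "${alleles[@]}"; do
    if is_null_allele "$allele"; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Calls copied from the table (subject_1 B, subject_2 B, subject_1 C).
for call in "B*07:386N,B*07:386N" "B*07:386N,B*40:01:02" "C*07:01:106,C*07:02:104"; do
  printf '%s -> %s null call(s)\n' "$call" "$(count_null_calls "$call")"
done
```

A count of 2 for a locus corresponds to the homozygous null calls seen for subjects 1, 4 and 5 at HLA-B.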
Do you have an explanation for this?
Is there a way to exclude null alleles from the reference?
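As a sketch of what excluding null alleles could look like at the sequence level, assuming a nucleotide FASTA in the IMGT/HLA `hla_nuc.fasta` layout (headers of the form `>HLA:<accession> <allele name> <length> bp`), the records whose allele name ends in `N` can be dropped with awk. Whether and how arcasHLA can be rebuilt from such a filtered file is an assumption this sketch does not verify; `drop_null_alleles` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop null alleles from a FASTA file.
# Assumes IMGT/HLA-style headers, where the second whitespace-separated
# field of each ">" line is the allele name (e.g. "B*07:386N").
drop_null_alleles() {
  awk '
    /^>/ { keep = ($2 !~ /N$/) }  # on each header, keep the record unless the allele ends in N
    keep                          # print header and sequence lines of kept records
  ' "$1"
}
```

For example, `drop_null_alleles hla_nuc.fasta > hla_nuc.no_null.fasta` (file names illustrative). Filtering this way also removes genuinely null alleles, so a subject who truly carries one would instead be called as the closest expressed allele.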