RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq

Frequent reporting of a null allele for 10X data

vincentwalter opened this issue

Thanks for creating arcasHLA!

Since 10X references arcasHLA as doing a good job at genotyping, I tried using it with scRNA-seq data.

I extracted reads using the following command:

arcasHLA extract \
    --single --unmapped \
    sample_alignments.bam

And then genotyped:

arcasHLA genotype \
    -g A,B,C,DQA1,DQB1,DRB1 \
    -p caucasian \
    --single \
    sample_alignments.extracted.fq.gz

This results in the following genotypes being assigned:

locus subject_1 subject_2 subject_3 subject_4 subject_5
A A*01:01:140,A*11:303 A*24:608N,A*24:608N A*01:01:143,A*03:01:103 A*29:01:01,A*29:01:01 A*03:01:119,A*24:608N
B B*07:386N,B*07:386N B*07:386N,B*40:01:02 B*07:386N,B*57:01:01 B*07:386N,B*07:386N B*07:386N,B*07:386N
C C*07:01:106,C*07:02:104 C*03:392N,C*07:02:101 C*07:02:128,C*07:02:128 C*15:05:02,C*15:05:02 C*07:02:101,C*07:02:101
DQA1 DQA1*03:01:11,DQA1*03:03:01 DQA1*01:02:01,DQA1*01:02:01 DQA1*01:03:01,DQA1*01:05:01 DQA1*01:01:01,DQA1*01:05:01 DQA1*05:05:01,DQA1*01:02:01
DQB1 DQB1*03:02:01,DQB1*03:01:01 DQB1*06:304N,DQB1*06:02:01 DQB1*06:352,DQB1*05:01:01 DQB1*05:01:01,DQB1*05:01:01 DQB1*03:01:46,DQB1*06:02:01
DRB1 DRB1*04:01:01,DRB1*04:01:01 DRB1*08:01:01,DRB1*08:01:01 DRB1*10:01:01,DRB1*13:01:01 DRB1*10:01:01,DRB1*01:02:01 DRB1*11:04:01,DRB1*15:01:01
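
To see at a glance which calls are null alleles, the table can be scanned for allele names ending in "N", the IMGT/HLA nomenclature suffix for null (non-expressed) alleles. This is a minimal bash/awk sketch, assuming the table is saved as whitespace-separated text with the header row first; the function name `list_null_calls` and the file name `genotypes.txt` are hypothetical and not part of arcasHLA.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of arcasHLA): list every null-allele call.
# Assumed input: whitespace-separated table, header row first
# ("locus subject_1 ..."), one row per locus, each cell "allele1,allele2".
# Null alleles are recognised by the trailing "N" of IMGT/HLA names.
list_null_calls() {
  awk '
    NR == 1 { for (i = 2; i <= NF; i++) subject[i] = $i; next }  # remember subject IDs
    {
      for (i = 2; i <= NF; i++) {
        n = split($i, alleles, ",")
        for (j = 1; j <= n; j++)
          if (alleles[j] ~ /N$/) print subject[i], $1, alleles[j]  # subject, locus, allele
      }
    }
  ' "$@"
}

# Example (genotypes.txt is a hypothetical file holding the table above):
# list_null_calls genotypes.txt
```

With no file argument the function reads the table from stdin. A homozygous call such as B*07:386N,B*07:386N is printed once per allele copy.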

Since the subjects were selected because they share a CD8+ T cell response to a certain antigen, I'm not surprised that they share HLA-B alleles. However, I don't think it's plausible that they all carry B*07:386N, since it's a null allele. For subjects 1, 4 and 5, the result even implies that they are homozygous for B*07:386N.
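
The pattern can be quantified per locus by counting how many subjects received at least one null call and how many are reported homozygous for one. This is a minimal bash/awk sketch under the same assumptions about the table layout; the function name `summarize_null_calls` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of arcasHLA): per-locus null-call summary.
# Assumed input: whitespace-separated table, header row first, one row per
# locus, each cell "allele1,allele2"; null alleles end in "N" (IMGT/HLA).
summarize_null_calls() {
  awk '
    NR == 1 { next }  # skip the header row
    {
      carriers = 0; homozygous = 0
      for (i = 2; i <= NF; i++) {
        split($i, a, ",")
        if (a[1] ~ /N$/ || a[2] ~ /N$/) carriers++          # at least one null copy
        if (a[1] ~ /N$/ && a[1] == a[2]) homozygous++      # same null allele twice
      }
      printf "%s carriers=%d homozygous=%d of %d\n", $1, carriers, homozygous, NF - 1
    }
  ' "$@"
}
```

Run on the table above, the B row should come out as `B carriers=5 homozygous=3 of 5`.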

Do you have an explanation for this?
Is there a way to exclude null alleles from the reference?