Missing J sequences
Januaryyiyue opened this issue
Hello,
I ran TRUST4 on my whole-genome sequencing data, and got the report.tsv file.
One thing I don't understand in the report is why some clones have only V gene usage information, with no D or J gene usage information. There are also cases where the J gene information is available but the V gene information is not.
Please find two examples below:
#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
40 1.632653e-01 TGTATGATCGAGCACAGCAGAGCTTCTCATGCTGACACACACAGGTGG CMIEHSRASHADTHRW IGLV5-45*01 . . . assemble123 0
3 3.333333e-01 TGTGACAATAACAATGACATGCGCTTT CDNNNDMRF . . TRAJ43*01 TRAC assemble2106 0
Could someone explain why this is the case? Thank you so much!
It depends on the underlying contig, which may not overlap the V or J gene by enough bases to get the annotation. Since your data is WGS, it is more likely that these contigs came from non-recombined V and J genes whose genomic sequence happens to contain the CDR3 motifs.
I'm working on a new feature to filter out these false-positive CDR3s from genomic regions, which can occur often in WGS data.
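
In the meantime, a rough way to inspect such rows is to split the report by whether both the V and J columns are annotated. The sketch below is not TRUST4's filter and not part of TRUST4. It assumes the report is tab-separated with the ten columns shown in the header above, and that a missing annotation is written as `.`. The function names (`read_report`, `has_v_and_j`, `split_by_annotation`) are made up for illustration. Note that it only flags rows lacking a V or J call; it cannot tell a genomic false positive from a real but truncated contig.

```python
import csv
import io

# Column names taken from the report header quoted in the question.
COLUMNS = ["count", "frequency", "CDR3nt", "CDR3aa",
           "V", "D", "J", "C", "cid", "cid_full_length"]


def read_report(handle):
    """Yield one dict per data row of a tab-separated report (assumed layout)."""
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the header line, which starts with '#'.
        if not fields or fields[0].startswith("#"):
            continue
        yield dict(zip(COLUMNS, fields))


def has_v_and_j(row):
    """True when both V and J are annotated ('.' is assumed to mean missing)."""
    return row["V"] != "." and row["J"] != "."


def split_by_annotation(rows):
    """Separate rows with both V and J calls from rows missing either one."""
    complete, partial = [], []
    for row in rows:
        (complete if has_v_and_j(row) else partial).append(row)
    return complete, partial


# The two example rows from the question, rebuilt with tab separators.
sample = "\n".join([
    "\t".join(["#count", "frequency", "CDR3nt", "CDR3aa",
               "V", "D", "J", "C", "cid", "cid_full_length"]),
    "\t".join(["40", "1.632653e-01",
               "TGTATGATCGAGCACAGCAGAGCTTCTCATGCTGACACACACAGGTGG",
               "CMIEHSRASHADTHRW", "IGLV5-45*01", ".", ".", ".",
               "assemble123", "0"]),
    "\t".join(["3", "3.333333e-01", "TGTGACAATAACAATGACATGCGCTTT",
               "CDNNNDMRF", ".", ".", "TRAJ43*01", "TRAC",
               "assemble2106", "0"]),
])

complete, partial = split_by_annotation(read_report(io.StringIO(sample)))
print(len(complete), len(partial))  # prints: 0 2
```

Both example rows land in `partial`, since each is missing either the V or the J call. For a real file, `open("report.tsv", newline="")` can be passed to `read_report` in place of the `io.StringIO` object.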