liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

Missing J sequences

Januaryyiyue opened this issue

Hello,
I ran TRUST4 on my whole-genome sequencing data, and got the report.tsv file.
One thing I don't understand in the report is why some clones have only V gene usage information, with no D or J gene reported. There are also cases where the J gene is reported but the V gene is not.

Please find two examples below:

#count	frequency	CDR3nt	CDR3aa	V	D	J	C	cid	cid_full_length
40	1.632653e-01	TGTATGATCGAGCACAGCAGAGCTTCTCATGCTGACACACACAGGTGG	CMIEHSRASHADTHRW	IGLV5-45*01	.	.	.	assemble123	0
3	3.333333e-01	TGTGACAATAACAATGACATGCGCTTT	CDNNNDMRF	.	.	TRAJ43*01	TRAC	assemble2106	0

Could someone explain why this is the case? Thank you so much!

It depends on the underlying contig, which may not overlap the V or J gene by enough bases to get the annotation. Since your data is WGS, it is more likely that these came from non-recombined V and J genes whose genomic sequence happens to contain the CDR3 motifs.

I'm working on a new feature to filter out these false-positive CDR3s from genomic regions, which may occur often in WGS data.
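
For anyone who wants to screen a report by hand, here is a minimal Python sketch that separates rows with both a V and a J call from rows missing either one. It assumes the column names shown in the example header and that "." marks an unannotated gene, as in the two rows above. The function name split_by_annotation is hypothetical and this is not TRUST4's own filter, only a rough way to flag the candidates described in this thread.

```python
import csv
import io

MISSING = "."  # assumption: the report uses "." for a gene that was not annotated


def split_by_annotation(tsv_text):
    """Split report rows into (both V and J called, V or J missing)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    complete, partial = [], []
    for row in reader:
        if row["V"] == MISSING or row["J"] == MISSING:
            partial.append(row)
        else:
            complete.append(row)
    return complete, partial


# The two rows quoted in this issue, with the same header line.
EXAMPLE = (
    "#count\tfrequency\tCDR3nt\tCDR3aa\tV\tD\tJ\tC\tcid\tcid_full_length\n"
    "40\t1.632653e-01\tTGTATGATCGAGCACAGCAGAGCTTCTCATGCTGACACACACAGGTGG\t"
    "CMIEHSRASHADTHRW\tIGLV5-45*01\t.\t.\t.\tassemble123\t0\n"
    "3\t3.333333e-01\tTGTGACAATAACAATGACATGCGCTTT\tCDNNNDMRF\t.\t.\t"
    "TRAJ43*01\tTRAC\tassemble2106\t0\n"
)

complete, partial = split_by_annotation(EXAMPLE)
for row in partial:
    # Prints the contig id followed by its V and J calls.
    print(row["cid"], row["V"], row["J"])
```

Both example rows end up in partial, since each lacks either the V or the J call. To run this on a real report, read the file contents into a string and pass that to the function. A missing V or J does not by itself prove that a CDR3 is a genomic false positive, so treat the partial list as candidates to inspect rather than rows to discard.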