liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

Description of columns in trust4_report.tsv, specifically column C

divya-venkat opened this issue

I cannot find any description of the columns in trust4_report.tsv. In particular, does the C column give the constant region gene or the chain type?
Also, what does it mean when the value of C is blank or "."?

Appreciate any help, thanks!

The format is described at https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#inputoutput . In particular, the report.tsv file has these columns:

read_count	frequency(proportion of read_count)	CDR3_dna	CDR3_amino_acids	V	D	J	C	consensus_id	consensus_id_full_length

The C column is the constant region gene, not the chain type. The chain type can be inferred from the gene names (for example, a C gene starting with IGH indicates a heavy chain). It is also given explicitly in the AIRR output format, in the locus column.

"." means that TRUST4 could not obtain the information from the contig.
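
For anyone who wants to derive the chain type from the report file directly, here is a minimal Python sketch of the inference described above. It is not part of TRUST4. The function names (`locus_from_gene`, `read_report`), the short column keys, and the sample rows are hypothetical and for illustration only. It assumes the file is tab-separated with the ten columns listed above, that any header line starts with `#`, and that the first three letters of a gene name identify the locus (IGH, IGK, IGL, TRA, TRB, TRD, TRG). A C value of "." or blank yields `None`.

```python
import csv
import io

# Short keys for the ten columns listed above (the keys are our own choice).
COLUMNS = [
    "read_count", "frequency", "CDR3_dna", "CDR3_amino_acids",
    "V", "D", "J", "C", "consensus_id", "consensus_id_full_length",
]

# Assumption: the locus is the first three letters of the gene name.
KNOWN_LOCI = {"IGH", "IGK", "IGL", "TRA", "TRB", "TRD", "TRG"}


def locus_from_gene(gene):
    """Return the locus prefix of a gene name, or None if it is missing."""
    gene = (gene or "").strip()
    if gene in ("", "."):
        # "." (or blank) means the gene could not be obtained from the contig.
        return None
    prefix = gene[:3].upper()
    return prefix if prefix in KNOWN_LOCI else None


def read_report(handle):
    """Parse a report-style TSV and add a 'chain' key inferred from column C."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and header/comment lines (assumed to start with '#').
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["chain"] = locus_from_gene(row.get("C"))
        rows.append(row)
    return rows


# Made-up rows for illustration only; not real TRUST4 output.
SAMPLE = (
    "#header line\n"
    "120\t0.6\tTGTGCGAGATGG\tCARW\tIGHV3-23\tIGHD3-10\tIGHJ4\tIGHG1\tc1\t0\n"
    "80\t0.4\tTGTGCCAGCTTC\tCASF\tTRBV7-9\t.\tTRBJ2-1\t.\tc2\t0\n"
)

if __name__ == "__main__":
    for row in read_report(io.StringIO(SAMPLE)):
        print(row["C"], row["chain"])
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `read_report(open("trust4_report.tsv", newline=""))`. If the AIRR output is available, reading its locus column is the more direct route.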

That clears it up. Thanks for the prompt reply!