liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

How to get the sequence of C region?

ljy-sys opened this issue · comments

Excuse me again. I analyzed the TCR information from Smart-seq3 data with trust-smartseq.pl, but I cannot find the sequence of the constant (C) region in the report or AIRR files. Could you help me with how to obtain the C region sequence?

TRUST4 only assembles the first portion of the C gene, maybe around 200 bp. To get that sequence, use the AIRR-format output: take the "sequence" column together with the J_align column, where everything after the J alignment may correspond to the C gene. Alternatively, extract the part of "sequence" that follows the "sequence_alignment" portion.

I can add a "c_cigar" column in TRUST4 later, which will give you a more accurate range of C gene on the sequence.
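As a rough illustration of the second approach above (taking whatever follows "sequence_alignment" inside "sequence"), here is a minimal Python sketch. It assumes the AIRR file is tab-separated with the standard `sequence_id`, `sequence`, `sequence_alignment` and `c_call` columns, and that `sequence_alignment` appears as a contiguous stretch of `sequence` once any gap characters are removed. The function names and the demo rows are made up for illustration and are not part of TRUST4.

```python
import csv
import io


def c_region_tail(sequence, sequence_alignment):
    """Return the part of `sequence` after `sequence_alignment`, or None if not found."""
    # Assumption: removing gap characters leaves a contiguous substring of `sequence`.
    aligned = sequence_alignment.replace("-", "").replace(".", "")
    if not aligned:
        return None
    start = sequence.upper().find(aligned.upper())
    if start < 0:
        return None
    return sequence[start + len(aligned):]


def c_tails_from_airr(handle):
    """Yield (sequence_id, c_call, tail) for rows that have sequence after the alignment."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        tail = c_region_tail(row["sequence"], row["sequence_alignment"])
        if tail:  # skip rows where nothing follows the aligned V(D)J portion
            yield row["sequence_id"], row.get("c_call", ""), tail


# Made-up demo rows; with a real file, pass open("your_airr.tsv") instead.
demo = io.StringIO(
    "sequence_id\tsequence\tsequence_alignment\tc_call\n"
    "contig1\tAAACCCGGGTTTACGT\tCCCGGG\tTRBC1\n"
    "contig2\tAAACCC\tAAACCC\tTRBC2\n"
)
results = list(c_tails_from_airr(demo))
print(results)
```

The tail obtained this way is only a candidate C gene fragment: it is whatever the contig contains after the aligned portion, so it may be short or empty.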

Yes, I got the C gene sequence by processing the AIRR file: in the "sequence" column, I extract everything after the sequence given in the "sequence_alignment" column, which is the partial sequence of the C gene. But I have two questions. First, is the FR4 region (the J gene part) not included in "sequence"? Second, with the current version we can only get a partial sequence of the C gene, right?

If the assembled contig contains the J gene part, it will be in both the "sequence" and "sequence_alignment" columns.

Right. The C gene is much less diverse, so full-length C gene assembly is not needed to identify it. Just curious, why do you need the full sequence of the C gene?

I forgot to mention that the header in the _annot.fa file from the smartseq wrapper also contains the coordinates of the C gene, which are probably more accurate than taking all the sequence after the J gene.
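A hedged sketch of reading those coordinates from a header line follows. It assumes, without this thread confirming it, that the header fields are whitespace-separated in the order id, length, coverage, V, D, J, C, and that each gene field looks like `gene(ref_length):(contig_start-contig_end):(ref_start-ref_end):identity` with 0-based, inclusive coordinates. Please check this against your own _annot.fa file before relying on it. The header, sequence and function name below are made up for illustration.

```python
import re

# Assumed gene-field layout: NAME(ref_len):(start-end):(ref_start-ref_end):identity
GENE_FIELD = re.compile(
    r"^(?P<gene>[^()\s]+)\(\d+\):\((?P<start>\d+)-(?P<end>\d+)\):\(\d+-\d+\):[\d.]+$"
)


def c_gene_span(header, c_field_index=6):
    """Return (gene, start, end) for the C gene field, or None if absent or unparsable."""
    fields = header.lstrip(">").split()
    if len(fields) <= c_field_index:
        return None
    # If several candidate genes are listed with commas, use the first one.
    match = GENE_FIELD.match(fields[c_field_index].split(",")[0])
    if match is None:
        return None
    return match.group("gene"), int(match.group("start")), int(match.group("end"))


# Hypothetical header and contig sequence, not taken from real TRUST4 output.
header = (
    ">assemble0 30 12.50 TRBV1*01(20):(0-9):(0-9):100.00 * "
    "TRBJ1-1*01(10):(12-19):(0-7):100.00 TRBC1*01(50):(20-29):(0-9):100.00"
)
contig = "A" * 20 + "CCCCCGGGGG"

span = c_gene_span(header)
if span is not None:
    gene, start, end = span
    c_seq = contig[start:end + 1]  # inclusive end coordinate is an assumption
    print(gene, c_seq)
```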

Thanks very much! I got it. The main reason I want the C gene sequence is to better understand Smart-seq data and to get clear on how to use TRUST4. Thank you again for your timely reply! Hey hey^_^