How to get the sequence of the C region?

Question

ljy-sys opened this issue 6 months ago · comments

Excuse me again. I analyzed the TCR information of my Smart-seq3 data with trust-smartseq.pl, but I could not find the sequence of the constant (C) region in the report or AIRR files. Could you help me with how to obtain the C region sequence?

Li Song · Answer 1 · Wed Jan 17 2024 15:04:37 GMT+0800 (China Standard Time)

TRUST4 only assembles the first portion of the C gene, maybe around 200 bp. To get that sequence, use the AIRR format output and combine the "sequence" column with the J_align column: everything after J_align may correspond to the C gene. Alternatively, extract the part of "sequence" that comes after the "sequence_alignment" portion.

I can add a "c_cigar" column to TRUST4 later, which will give you a more accurate range for the C gene on the sequence.
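The second approach above (taking whatever follows the "sequence_alignment" portion inside "sequence") can be sketched in Python as follows. This is only a minimal sketch, not part of TRUST4: the function names are hypothetical, it assumes the AIRR file is tab-separated with the standard `sequence_id`, `c_call`, `sequence` and `sequence_alignment` columns, and it assumes the aligned portion appears verbatim inside `sequence` once any gap characters are removed. Rows where that assumption fails are skipped rather than guessed at.

```python
import csv

# Assumption: gaps, if present at all, are written as "-" or "." in sequence_alignment.
GAP_CHARS = "-."


def c_region_from_row(sequence, sequence_alignment):
    """Return the part of `sequence` that follows the aligned portion, or None."""
    aligned = sequence_alignment.translate(str.maketrans("", "", GAP_CHARS))
    if not aligned:
        return None
    start = sequence.upper().find(aligned.upper())
    if start < 0:
        # The aligned portion is not a plain substring; do not guess.
        return None
    return sequence[start + len(aligned):]


def c_regions_from_airr(path):
    """Yield (sequence_id, c_call, tail) for rows that have bases after the alignment."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            tail = c_region_from_row(
                row.get("sequence") or "",
                row.get("sequence_alignment") or "",
            )
            if tail:
                yield row.get("sequence_id"), row.get("c_call"), tail
```

The tail returned here is only "everything after the aligned portion", so it may be a partial C gene and could include bases that do not belong to the C gene at all.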

ljy-sys · Answer 2 · Wed Jan 17 2024 16:09:43 GMT+0800 (China Standard Time)

Yes, I got the C gene sequence by processing the AIRR file: in the "sequence" column, I extract everything after the part that matches the "sequence_alignment" column, which gives the partial sequence of the C gene. But I have two questions. First, is the FR4 region (the J gene part) not included in "sequence"? Second, with the current version we can only get a partial sequence of the C gene, right?

Li Song · Answer 3 · Wed Jan 17 2024 23:54:47 GMT+0800 (China Standard Time)

If the assembled contig contains the J gene part, it will be in both the sequence and sequence_alignment columns.

Right. The C gene is much less diverse, so there is no need for full-length C gene assembly to identify it. Just curious, why do you need the full sequence of the C gene?

Li Song · Answer 4 · Thu Jan 18 2024 05:11:48 GMT+0800 (China Standard Time)

I forgot to mention that the header in the _annot.fa file from the smartseq wrapper also contains the coordinates of the C gene, which are probably more accurate than using all the sequence after the J gene.
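For reading those coordinates out of an _annot.fa header, a rough Python sketch is below. It is not TRUST4 code, and the header layout is an assumption that should be checked against the TRUST4 README and your own files: whitespace-separated fields in the order contig id, length, coverage, V, D, J, C, followed by the CDR fields; each gene field written as `gene(ref_length):(start-end):(ref_start-ref_end):identity`; `*` when a gene is missing; and contig coordinates that are 0-based and inclusive. The function names are hypothetical.

```python
import re

# Assumed gene-field layout: gene(ref_length):(start-end):(ref_start-ref_end):identity
GENE_HIT = re.compile(
    r"^(?P<gene>[^()]+)\((?P<ref_len>\d+)\)"
    r":\((?P<start>\d+)-(?P<end>\d+)\)"
    r":\((?P<ref_start>\d+)-(?P<ref_end>\d+)\)"
    r":(?P<identity>[\d.]+)$"
)

C_FIELD_INDEX = 6  # Assumption: id, length, coverage, V, D, J, then C.


def c_gene_span(header):
    """Return (gene, start, end) for the C gene on the contig, or None."""
    fields = header.lstrip(">").split()
    if len(fields) <= C_FIELD_INDEX:
        return None
    # If several hits are listed, assume they are comma-separated and keep the first.
    first_hit = fields[C_FIELD_INDEX].split(",")[0]
    match = GENE_HIT.match(first_hit)
    if match is None:
        # Covers "*" (no C gene) and any layout this sketch does not expect.
        return None
    return match.group("gene"), int(match.group("start")), int(match.group("end"))


def c_gene_sequence(header, contig):
    """Slice the C gene part out of the contig, assuming 0-based inclusive coordinates."""
    span = c_gene_span(header)
    if span is None:
        return None
    _, start, end = span
    return contig[start:end + 1]
```

If the coordinates in your files turn out to be 1-based, or the field order differs, adjust `C_FIELD_INDEX` and the slice accordingly.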

ljy-sys · Answer 5 · Thu Jan 18 2024 14:24:16 GMT+0800 (China Standard Time)

Thanks very much! I got it. The main reason I want the C gene sequence is to better understand Smart-seq data and to get clear on how to use TRUST4. Thank you again for your timely reply! Hey hey^_^