liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

should the best record of multiple records be used for downstream analysis?

chunxuan-hs opened this issue · comments

Hi TRUST4 developers,

I am a newbie to immune repertoire analysis and would like to ask some really basic/stupid questions.

After running TRUST4 on bulk RNA-seq data, I noticed some discrepancies between the "XX_anno.fa" file and the "XX_report.tsv"/"XX_airr.tsv"/"XX_cdr3.out" files.

In the "_anno.fa" file, only the best record of each assembled contig is included, while in "_report.tsv"/"_airr.tsv"/"_cdr3.out" there can be several records for one assembled contig, of which the index 0 record is the one used in "_anno.fa".

I am wondering whether I should keep only the best record (index 0) from "report.tsv/airr.tsv/cdr3.out" for downstream analysis. In the example folder of this repo, I noticed there is only one record per assembled contig. Did you remove the non-best records deliberately?

Thanks!

The assembled contigs from TRUST4 are consensus sequences, and each may encode multiple CDR3s. So the _0, _1, ... records read out the CDR3s encoded in the consensus, including those with lower abundance. The example file is very simple, so each contig encodes one CDR3.

The _report and _airr files coalesce the clonotypes from _cdr3.out that share the same V, J, C gene assignment and CDR3 nucleotide sequence, and select the contig with the most read support as the representative. Therefore, in general you don't need to filter for "_0", "_1", ...
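For readers who want to see the coalescing idea concretely, here is a minimal Python sketch. It is not TRUST4's actual code: the record fields (`contig`, `v`, `j`, `c`, `cdr3_nt`, `count`) are hypothetical names for already-parsed rows, and summing the counts within a group is an assumption made for illustration. Only the grouping key and the choice of representative follow the description above.

```python
from collections import defaultdict


def coalesce_clonotypes(records):
    """Merge records sharing V, J, C genes and CDR3 nucleotide sequence.

    `records` is a list of dicts with hypothetical keys:
    contig, v, j, c, cdr3_nt, count.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["v"], rec["j"], rec["c"], rec["cdr3_nt"])
        groups[key].append(rec)

    merged = []
    for (v, j, c, cdr3_nt), members in groups.items():
        # Representative: the contig with the most read support.
        representative = max(members, key=lambda r: r["count"])
        merged.append({
            "v": v,
            "j": j,
            "c": c,
            "cdr3_nt": cdr3_nt,
            # Assumption: the merged count is the sum over the group.
            "count": sum(r["count"] for r in members),
            "contig": representative["contig"],
        })

    # Most abundant clonotypes first.
    merged.sort(key=lambda r: r["count"], reverse=True)
    return merged


if __name__ == "__main__":
    # Made-up example records, not real TRUST4 output.
    example = [
        {"contig": "assemble0", "v": "IGHV1", "j": "IGHJ4", "c": "IGHM",
         "cdr3_nt": "TGTGCG", "count": 5},
        {"contig": "assemble1", "v": "IGHV1", "j": "IGHJ4", "c": "IGHM",
         "cdr3_nt": "TGTGCG", "count": 2},
        {"contig": "assemble1", "v": "IGKV3", "j": "IGKJ1", "c": "IGKC",
         "cdr3_nt": "TGTCAG", "count": 3},
    ]
    for row in coalesce_clonotypes(example):
        print(row)
```

Because the grouping key ignores the within-contig index, records labeled _0 and _1 are merged or kept purely on their gene assignment and CDR3 sequence, which is why no index-based filtering is needed beforehand.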

Many thanks for the quick reply and clear explanations!

Another question, about the CDR3 counts: the datasets I processed are regular RNA-seq data that are not enriched for repertoires, so quite a few of the CDR3s are assigned only 2 read counts. I am wondering whether it would be better to set a threshold to remove SHM CDR3s with low counts. If so, which value do you use in practice?

I usually include all of them. If false positives are a concern, I would just filter out the singleton CDR3s (those with a read count of 1).

Thanks for the advice!
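As a rough illustration of that filter, here is a small Python sketch. The row layout (dicts with `cdr3_aa` and `count` keys) and the `min_count` parameter are hypothetical; adapt them to however you load your report table.

```python
def drop_singletons(rows, min_count=2):
    """Keep only rows whose read count is at least `min_count`.

    With the default of 2, CDR3s supported by a single read are removed.
    `rows` is a list of dicts with a hypothetical numeric `count` key.
    """
    return [row for row in rows if row["count"] >= min_count]


if __name__ == "__main__":
    # Made-up example rows, not real TRUST4 output.
    example = [
        {"cdr3_aa": "CARDYW", "count": 12},
        {"cdr3_aa": "CQQYNSF", "count": 2},
        {"cdr3_aa": "CASSLGF", "count": 1},
    ]
    for row in drop_singletons(example):
        print(row["cdr3_aa"], row["count"])
```

Note that the default keeps CDR3s with 2 reads, in line with the suggestion to drop only singletons.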