liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

TRUST4 on 10X 3' sequencing output question

RaghadShu opened this issue

Hello,
Thank you for the great tool! I tested TRUST4 on 10X 3' scRNA-seq data to try to estimate the BCR of each plasma cell from Multiple Myeloma patients. For downstream analysis, I am using scRepertoire. I asked most of my questions there (if you need further explanation: ncborcherding/scRepertoire#277), but there is one that maybe you can help me with.

I would like to know for how many of the input cells TRUST4 was able to retrieve a BCR. In scRepertoire, I took the TRUST4 outputs, combined the BCRs using combineBCR(), and used quantContig() to get this table:
[Image: quantContig() output table]

Where:

  • Contigs is the number of unique clonotypes
  • Values is the grouping variable
  • Total is total cells recovered
  • Scaled is unique clonotypes/total

My issue is with the Total field: the input cell count for MGUS1 is around 13k. This is the number of cells from the 10X 3' sequencing, which I used as input for TRUST4. But the "Total" reported by scRepertoire is around 18k.

The developers of scRepertoire suggested that I "quantify the number of barcodes from your TRUST4 output directly", but I am not sure how to do so or which files to use. I read that the number of lines in MGUS1_barcode_report.tsv should equal the number of barcodes, but that file has 97k lines. How do you suggest I find the number of cells for which TRUST4 was able to reconstruct BCRs?

Best,
Raghad

TRUST4 may pick up BCRs from ambient barcodes, so the count can be higher than you would expect. In practice, I always intersect the TRUST4 barcodes with the barcodes that passed quality control in a standard scRNA-seq analysis workflow. You can also use the "barcode-filter.py" script in the script folder to filter out some of the false positive calls.

The "barcode_report.tsv" is the right file, but it may include many light-chain-only barcodes, so you may want to filter entries further based on your definition of a clonotype, e.g., heavy chain only or paired chains.
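
The two suggestions above (intersecting with QC-passed barcodes, then filtering by chain) could be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes the report is tab-separated with the columns barcode, cell_type, chain1, chain2 in that order, that chain1 holds the heavy chain and chain2 the light chain for B cells, and that a missing chain is written as `*`. The function name `count_bcr_barcodes` is hypothetical, not part of TRUST4. Check these assumptions against your own file before relying on the counts.

```python
import csv
from collections import Counter


def count_bcr_barcodes(report_lines, qc_barcodes):
    """Count QC-passed B-cell barcodes by which chains were assembled.

    report_lines: iterable of lines from a barcode report (assumed layout:
                  barcode, cell_type, chain1, chain2, ... separated by tabs).
    qc_barcodes:  set of barcodes that passed QC in the scRNA-seq workflow.
    """
    counts = Counter()
    for row in csv.reader(report_lines, delimiter="\t"):
        # Skip blank lines, the "#"-prefixed header, and malformed rows.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        barcode, cell_type, chain1, chain2 = row[:4]
        # Keep only B cells whose barcode also passed QC (drops ambient barcodes).
        if cell_type != "B" or barcode not in qc_barcodes:
            continue
        has_heavy = chain1 != "*"  # assumption: "*" marks a missing chain
        has_light = chain2 != "*"
        if has_heavy and has_light:
            counts["paired"] += 1
        elif has_heavy:
            counts["heavy_only"] += 1
        elif has_light:
            counts["light_only"] += 1
    return counts
```

To use it on real data, one could pass an open file handle, e.g. `count_bcr_barcodes(open("MGUS1_barcode_report.tsv"), qc_barcodes)`, where `qc_barcodes` is a set built from whatever barcode list your QC step produces. The barcode strings must match exactly, so any suffix differences (such as a trailing `-1`) need to be reconciled first. Summing `paired` and `heavy_only` gives the number of cells with a heavy chain; `paired` alone is the stricter paired-chain definition.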