TRUST4 on 10X 3' sequencing output question
RaghadShu opened this issue
Hello,
Thank you for the great tool! I ended up testing TRUST4 on 10X 3' scRNA-seq data to try to estimate the BCR of each plasma cell from Multiple Myeloma patients. For downstream analysis, I am using scRepertoire. I asked most of my questions there (if you need further explanation: ncborcherding/scRepertoire#277), but there is one that you may be able to help me with.
I would like to know for how many of my input cells TRUST4 was able to retrieve a BCR. Using scRepertoire, I loaded the TRUST4 outputs, combined the BCRs with combineBCR(), and used quantContig() to get this table:
Where:
- Contigs is the number of unique clonotypes
- Values is the grouping variable
- Total is the total number of cells recovered
- Scaled is unique clonotypes / Total
My issue is with the Total field: the number of input cells for MGUS1 is around 13k. This is the number of cells from the 10X 3' sequencing, which I used as input for TRUST4, but the Total reported by scRepertoire is around 18k.
The scRepertoire developers suggested that I "quantify the number of barcodes from your TRUST4 output directly", but I am not sure how to do so or which files to use. I read that the number of lines in MGUS1_barcode_report.tsv should equal the number of barcodes, but that file has 97k lines. How do you suggest I find the number of cells from which TRUST4 was able to reconstruct BCRs?
Best,
Raghad
TRUST4 may pick up BCRs from ambient barcodes, so the count can be higher than you would expect. In practice, I always intersect the TRUST4 barcodes with the barcodes that passed quality control in the traditional scRNA-seq analysis workflow. You can also use the "barcode-filter.py" script in the script folder to filter out some of the false-positive calls.
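A minimal Python sketch of the intersection step, assuming that barcode_report.tsv has one barcode per row in its first column with a single `#`-prefixed header line, and that the QC-passed barcodes are in a plain or gzipped file with one barcode per line (for example Cell Ranger's filtered `barcodes.tsv.gz`, or a list exported from Seurat or Scanpy). The function names are hypothetical and are not part of TRUST4 or scRepertoire. Both files must use the same barcode format; if one carries a `-1` suffix and the other does not, normalize them before intersecting.

```python
import csv
import gzip


def read_qc_barcodes(path):
    """Read QC-passed barcodes, one per line (first tab-separated column).

    Assumption: a plain or .gz text file such as Cell Ranger's filtered
    barcodes.tsv.gz or a list exported from Seurat/Scanpy.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {line.split("\t")[0].strip() for line in handle if line.strip()}


def trust4_barcodes(report_path):
    """Return every barcode listed in a TRUST4 barcode_report.tsv.

    Assumption: the barcode is the first column and the header line
    starts with '#'.
    """
    barcodes = set()
    with open(report_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and the header
            barcodes.add(row[0])
    return barcodes


def qc_passed_trust4_barcodes(report_path, qc_path):
    """Barcodes that TRUST4 reported AND that passed scRNA-seq QC."""
    return trust4_barcodes(report_path) & read_qc_barcodes(qc_path)
```

For example, `len(qc_passed_trust4_barcodes("MGUS1_barcode_report.tsv", "barcodes.tsv.gz"))` would give the number of QC-passed MGUS1 cells with any TRUST4 call (the QC file path is a placeholder). Note that this still counts light-chain-only barcodes.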
The "barcode_report.tsv" file is the right one, but it may include many light-chain-only barcodes, so you may want to further filter entries based on your definition of a clonotype, e.g., requiring a heavy chain, or requiring paired chains.
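Filtering by clonotype definition could be sketched as follows. This assumes the barcode_report.tsv column layout as I understand it from the TRUST4 README (barcode, cell_type, chain1, chain2, secondary_chain1, secondary_chain2, with chain1 the heavy chain and chain2 the light chain for B cells, and `*` marking a missing chain); please verify this against your TRUST4 version before relying on the counts. `count_bcr_barcodes` and its `definition` values are hypothetical names. Passing a set of QC-passed barcodes combines this with the intersection suggested above.

```python
import csv


def count_bcr_barcodes(report_path, definition="heavy", qc_barcodes=None):
    """Count barcodes in a TRUST4 barcode_report.tsv under a clonotype definition.

    Assumed columns: 0 barcode, 1 cell_type, 2 chain1 (heavy for BCR),
    3 chain2 (light), 4-5 secondary chains; a missing chain is '*'.

    definition:
      "any"    - heavy or light chain assembled (includes light-only barcodes)
      "heavy"  - heavy chain (chain1) assembled
      "paired" - both heavy and light chains assembled
    """
    if definition not in {"any", "heavy", "paired"}:
        raise ValueError(f"unknown definition: {definition!r}")
    count = 0
    with open(report_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and the header
            if qc_barcodes is not None and row[0] not in qc_barcodes:
                continue  # not a QC-passed cell, e.g. an ambient barcode
            has_heavy = len(row) > 2 and row[2] != "*"
            has_light = len(row) > 3 and row[3] != "*"
            if definition == "any":
                keep = has_heavy or has_light
            elif definition == "heavy":
                keep = has_heavy
            else:
                keep = has_heavy and has_light
            if keep:
                count += 1
    return count
```

Comparing the "any", "heavy" and "paired" counts shows how many reported barcodes are light-chain-only. If your run also produces T cell calls, you may additionally want to restrict rows by the cell_type column.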