alexdobin / STAR

RNA-seq aligner

Velocyto mapping summary indicating no unique reads

AnnaMaguza opened this issue

Hi!

Thank you for creating such a great tool!

I have been mapping my single-cell gene expression (GEX) data, generated with 10x v2 chemistry, using STAR version 2.7.11a. My goal is to obtain spliced and unspliced count matrices for RNA velocity analysis. The mapping completed without errors and produced all the expected files, including the ambiguous, spliced, and unspliced matrices, and I verified that none of these matrices is empty. After building AnnData objects from them, I checked the proportions with scv.pl.proportions(adata), and they look plausible: 40% spliced, 51% unspliced, and 8% ambiguous.
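The proportions can also be cross-checked straight from the matrix files, without going through scvelo. The following is only a minimal sketch: it assumes the Velocyto output directory contains spliced.mtx, unspliced.mtx, and ambiguous.mtx in MatrixMarket coordinate format, and the helper names mtx_total and layer_proportions are hypothetical, not part of STAR or scvelo.

```python
from pathlib import Path


def mtx_total(path):
    """Sum all stored values in a MatrixMarket coordinate file."""
    total = 0.0
    size_line_seen = False
    with open(path) as handle:
        for line in handle:
            # Skip the banner, comment lines, and blank lines.
            if line.startswith("%") or not line.strip():
                continue
            # The first data line holds "rows cols nonzeros", not a count.
            if not size_line_seen:
                size_line_seen = True
                continue
            # Entry lines are "row col value".
            total += float(line.split()[2])
    return total


def layer_proportions(directory, layers=("spliced", "unspliced", "ambiguous")):
    """Return each layer's share of the pooled counts (file names are assumed)."""
    totals = {name: mtx_total(Path(directory) / f"{name}.mtx") for name in layers}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all matrices are empty")
    return {name: value / grand_total for name, value in totals.items()}
```

Calling layer_proportions on the directory that holds the three matrices returns a dictionary of fractions that sum to 1. These are pooled over every barcode in the files, so they may differ somewhat from proportions computed on a filtered set of cells.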

However, the Summary.csv file in the Velocyto output reports:

Reads Mapped to Velocyto: Unique+Multiple Velocyto,NoMulti
Reads Mapped to Velocyto: Unique Velocyto,0

Does this mean that the mapping didn't work properly? Can I use the matrices I generated?

Here are the parameters I used:

STAR \
    --runThreadN 56 \
    --genomeDir "$INDEX_FILE_DIR" \
    --readFilesIn "$FILE"_S1_L001_R2_001.fastq.gz "$FILE"_S1_L001_R1_001.fastq.gz \
    --runDirPerm All_RWX \
    --soloCBwhitelist "$WHITE_LIST_DIR/737K-august-2016.txt" \
    --soloFeatures Gene GeneFull Velocyto \
    --readFilesCommand zcat \
    --soloOutFileNames "$SRA_OUTPUT_DIR"/ features.tsv barcodes.tsv matrix.mtx \
    --soloType CB_UMI_Simple \
    --soloCBstart 1 \
    --soloCBlen 16 \
    --soloUMIstart 17 \
    --soloUMIlen 10 \
    --soloStrand Forward \
    --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
    --soloUMIfiltering MultiGeneUMI_CR \
    --soloUMIdedup 1MM_CR \
    --outFilterScoreMin 30 \
    --outSAMtype BAM SortedByCoordinate \
    --clip5pAdapterSeq - - \
    --clip5pAdapterMMp 0.1 0.1 \
    --soloBarcodeReadLength 101 \
    --outFileNamePrefix "$SRA_OUTPUT_DIR/"

And here is the full output in the Velocyto Summary.csv:

Number of Reads,294990278
Reads With Valid Barcodes,0.933189
Sequencing Saturation,-inf
Q30 Bases in CB+UMI,0.976501
Q30 Bases in RNA read,0.894207
Reads Mapped to Genome: Unique+Multiple,0.808504
Reads Mapped to Genome: Unique,0.620197
Reads Mapped to Velocyto: Unique+Multiple Velocyto,NoMulti
Reads Mapped to Velocyto: Unique Velocyto,0
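
To pick out the entries in question programmatically, a short script can scan Summary.csv for values that are non-numeric, non-finite, or exactly zero. This is a sketch under assumptions: it treats the file as two-column "metric,value" CSV as shown above, the function name flag_suspect_summary_values is hypothetical, and it only surfaces such values without interpreting what they mean.

```python
import csv
import math


def flag_suspect_summary_values(path):
    """Return {metric: raw_value} for values that are non-numeric, non-finite, or zero."""
    flagged = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            metric, raw = row[0].strip(), row[-1].strip()
            try:
                value = float(raw)
            except ValueError:
                flagged[metric] = raw  # e.g. a text placeholder instead of a number
                continue
            if not math.isfinite(value) or value == 0:
                flagged[metric] = raw  # e.g. "-inf" or "0"
    return flagged
```

On the output above, this would flag the Sequencing Saturation entry and the two "Reads Mapped to Velocyto" entries, and leave the remaining metrics alone.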

Thank you in advance!

Anna Maguza