alexdobin / STAR

RNA-seq aligner

Reads With Valid Barcodes=zero in STARsolo analysis

jfoedfjwofa opened this issue

Hello everyone,

I'm trying to map my 10x scRNA-seq data to a customized reference genome using STARsolo.

The run appeared to complete normally (the "finished successfully" message appeared), but when I checked the output Summary.csv file, I found several problems:

Number of Reads 342455698
Reads With Valid Barcodes 0
Sequencing Saturation nan
Q30 Bases in CB+UMI 0.931662
Q30 Bases in RNA read 0.914494
Reads Mapped to Genome: Unique+Multiple 0.924652
Reads Mapped to Genome: Unique 0.749886
Reads Mapped to Transcriptome: Unique+Multipe Genes 0
Reads Mapped to Transcriptome: Unique Genes 0
Estimated Number of Cells 0
Reads in Cells Mapped to Unique Genes 0
Fraction of Reads in Cells nan
Mean Reads per Cell 0
Median Reads per Cell 0
UMIs in Cells 0
Mean UMI per Cell 0
Median UMI per Cell 0
Mean Genes per Cell 0
Median Genes per Cell 0
Total Genes Detected 0  

As for barcodes, the barcodes.tsv file in the output "raw" folder had content, but the one in the "filtered" folder was empty.

I don't think this result is real, because when my collaborator previously analyzed the same data with CellRanger, there were no problems with sequencing quality.
(I need to re-analyze the data myself because I want to use a customized reference genome.)

The command I used to run STARsolo was as follows:
STAR --runThreadN 16 --genomeDir STAR_reference --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz --soloType Droplet --soloCBwhitelist Fastq/737K-august-2016.txt --soloUMIlen 12 --soloCBlen 16 --outFileNamePrefix sample1_ --readFilesCommand gzcat --soloBarcodeReadLength 0

I would really appreciate it if someone could give me some advice.

Sincerely,

Hi @jfoedfjwofa

This is likely an issue with the barcode whitelist - please check that you are using the correct one for this library.
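One quick way to test a whitelist against a library is to count how many reads start with a whitelisted barcode. The following is a minimal sketch, not part of STAR: `cb_hit_rate` is a hypothetical helper, and it assumes the cell barcode occupies the first bases of the barcode read (16 by default, matching `--soloCBlen 16` in this thread) and that the FASTQ uses four lines per record.

```shell
# Sketch: count reads whose leading bases appear in a barcode whitelist.
# Hypothetical helper, not a STAR tool.
# Usage: cb_hit_rate WHITELIST FASTQ [CB_LEN]   (FASTQ may be "-" for stdin)
cb_hit_rate() {
  whitelist=$1
  fastq=$2
  cblen=${3:-16}
  awk -v n="$cblen" '
    NR == FNR { ok[$1] = 1; next }   # first file: load whitelist barcodes
    FNR % 4 == 2 {                   # second file: sequence lines only
      total++
      if (substr($0, 1, n) in ok) hits++
    }
    END { printf "%d/%d\n", hits, total }
  ' "$whitelist" "$fastq"
}

# Example on the first 100000 reads of the barcode read (R1 in this thread):
#   gzcat Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz | head -n 400000 |
#     cb_hit_rate Fastq/737K-august-2016.txt -
```

A hit count close to the number of reads checked suggests the whitelist matches the library, so the cause is more likely elsewhere in the command. Only exact matches are counted here, so the figure may come out somewhat lower than what STARsolo reports.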

Thank you very much for your prompt response.

My scRNA-seq samples are 10x 5' GEX v2 libraries, so I think the whitelist I used (737K-august-2016.txt) was correct...

After reading through the STARsolo tutorial (https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md) again, I have corrected the script as follows.

STAR --runThreadN 16 --genomeDir STAR_reference --soloCBwhitelist Fastq/737K-august-2016.txt --outFileNamePrefix sample1_ --readFilesCommand gzcat --soloBarcodeReadLength 1 --clip5pNbases 39 0 --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz

After executing this command, I got the following output:

Number of Reads 342455698
Reads With Valid Barcodes 0.903744
Sequencing Saturation 0.749502
Q30 Bases in CB+UMI 0.964867
Q30 Bases in RNA read 0.914494
Reads Mapped to Genome: Unique+Multiple 0.922693
Reads Mapped to Genome: Unique 0.646587
Reads Mapped to Transcriptome: Unique+Multipe Genes 0.0693924
Reads Mapped to Transcriptome: Unique Genes 0.0535003
Estimated Number of Cells 3857
Reads in Cells Mapped to Unique Genes 16633083
Fraction of Reads in Cells 0.907846
Mean Reads per Cell 4312
Median Reads per Cell 3533
UMIs in Cells 4194728
Mean UMI per Cell 1087
Median UMI per Cell 896
Mean Genes per Cell 493
Median Genes per Cell 422
Total Genes Detected 22030

The value of "Reads With Valid Barcodes" looks reasonable this time, but the values of "Reads Mapped to Transcriptome: Unique+Multipe Genes" and "Reads Mapped to Transcriptome: Unique Genes" seem strangely low.
I am sorry for asking so many questions, but I would be very grateful for any advice you could give me.
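To put a number on the gap: with the values above, 0.0535003 / 0.646587 ≈ 0.083, so only about 8% of uniquely mapped reads were assigned to a unique gene. The sketch below computes that ratio from a summary file. `gene_assignment_ratio` is a hypothetical helper, and it assumes the file on disk stores each metric as a `name,value` pair on its own line, with the metric names spelled as in the output above.

```shell
# Sketch: share of uniquely mapped reads that were also assigned to a unique gene.
# Hypothetical helper; assumes "metric name,value" lines in the summary file.
# Usage: gene_assignment_ratio SUMMARY_CSV
gene_assignment_ratio() {
  awk -F, '
    $1 == "Reads Mapped to Genome: Unique"              { genome = $2 }
    $1 == "Reads Mapped to Transcriptome: Unique Genes" { genes = $2 }
    END {
      if (genome > 0) printf "%.3f\n", genes / genome
      else print "nan"
    }
  ' "$1"
}

# Example (the path is an assumption based on the sample1_ output prefix):
#   gene_assignment_ratio sample1_Solo.out/Gene/Summary.csv
```

A ratio this low alongside a high genome mapping rate suggests the reads align well but are not being counted toward genes, for example because of annotation or strand settings.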

Best regards,

This could be an issue with strandedness; please try --soloStrand Reverse.

I'm very sorry for the late reply.
Thank you very much for your advice!
I added the --soloStrand Reverse option as you suggested, and the analysis worked!

I'm deeply thankful for your advice.
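For anyone landing here with the same problem, the final invocation can be pieced together from the thread as the second command above plus `--soloStrand Reverse`. The exact command was not posted, so the sketch below is a reconstruction: the wrapper function `run_starsolo` is a hypothetical name, and all paths and file names are the original poster's.

```shell
# Sketch of the final invocation, reconstructed from this thread:
# the second command plus --soloStrand Reverse.
# Paths are the poster's; run_starsolo is a hypothetical wrapper name.
run_starsolo() {
  STAR --runThreadN 16 \
    --genomeDir STAR_reference \
    --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz \
                  Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz \
    --readFilesCommand gzcat \
    --soloType CB_UMI_Simple \
    --soloCBwhitelist Fastq/737K-august-2016.txt \
    --soloCBstart 1 --soloCBlen 16 \
    --soloUMIstart 17 --soloUMIlen 10 \
    --soloBarcodeReadLength 1 \
    --clip5pNbases 39 0 \
    --soloStrand Reverse \
    --outFileNamePrefix sample1_ \
    "$@"   # any extra STAR options passed to the wrapper
}

# Run with: run_starsolo
```

Note that the cDNA read (R2) is listed first and the barcode read (R1) second in `--readFilesIn`, as in both commands above.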