DigitalExpression discrepancy v.1.13 -> v.2.0.0
Hoohm opened this issue
I'm testing v2.0.0 to integrate it into my pipeline, but I'm getting different UMI counts.
Here is some sampled data. The SAM file is from the final step, and it contains only one cell.
sample.txt
The file has a .txt extension because of GitHub's attachment restrictions.
This is the command I use:
DigitalExpression I=sample.sam O=res.tsv NUM_CORE_BARCODES=1
With version 1.13 I get:
GENE ATGCG
Gm16547 1
Gm7266 1
Rps27-ps1 1
Snrpe 1
Txn-ps1 1
With version 2.0.0 I get:
GENE ATGCG
I know that counting should be more restrictive, but I don't think this is the step that deals with the stringency. Am I missing something?
To make sure: you're saying you only get the header output with no results?
Exactly
You were right about the tagging: I was using the old version. I've gone back to TagReadWithGeneFunction, but for some reason the new tags don't show up in the BAM file.
Here is one read as an example. Before tagging with TagReadWithGeneFunction:
HISEQ:185:H5VVMBCXY:1:1101:4571:2619 16 1 3595162 0 29S16M * 0 0 CTATGGACCTAGACACTGCTCGCTCCCATCCATTAGATCCTGAAG GGAGIIIGGGAIIIGGGGGGGGAGAGGAGAGG<GGIGGGIGAAGG XC:Z:GAGTTMD:Z:16 PG:Z:STAR RG:Z:A NH:i:5 NM:i:0 XM:Z:TACACACAGA UQ:i:0 AS:i:15
After:
HISEQ:185:H5VVMBCXY:1:1101:4571:2619 16 1 3595162 0 29S16M * 0 0 CTATGGACCTAGACACTGCTCGCTCCCATCCATTAGATCCTGAAG GGAGIIIGGGAIIIGGGGGGGGAGAGGAGAGG<GGIGGGIGAAGG XC:Z:GAGTTMD:Z:16 GE:Z:Gm38148 XF:Z:CODING PG:Z:STAR RG:Z:A NH:i:5 NM:i:0 XM:Z:TACACACAGA UQ:i:0 AS:i:15 GS:Z:-
Command used:
TagReadWithGeneExonFunction INPUT=data/sample2.Aligned.merged.bam OUTPUT=data/sample2_gene_exon_tagged.bam ANNOTATIONS_FILE=ref/annotation.chr1.refFlat
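For anyone comparing records the same way, here is a rough sketch of how the before/after check could be scripted instead of done by eye. It only diffs the optional fields of two SAM records. The helper names `optional_tags` and `added_tags` are made up for illustration and are not part of Drop-seq tools. It also assumes the records are whitespace-separated as pasted above, so a tag value containing a space would be split incorrectly.

```python
# Sketch: list the optional tags that a tagging step added to a read.
# Helper names are hypothetical; records are the ones pasted in this thread.

def optional_tags(sam_line):
    """Return {TAG: value} for the optional fields of one SAM record.

    The first 11 columns of a SAM record are mandatory; everything after
    them is an optional TAG:TYPE:VALUE field. Splitting on any whitespace
    is an assumption that matches the pasted records (real SAM uses tabs).
    """
    tags = {}
    for field in sam_line.split()[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def added_tags(before, after):
    """Return the sorted tag names present in `after` but not in `before`."""
    return sorted(set(optional_tags(after)) - set(optional_tags(before)))


# Mandatory columns shared by both records.
CORE = (
    "HISEQ:185:H5VVMBCXY:1:1101:4571:2619 16 1 3595162 0 29S16M * 0 0 "
    "CTATGGACCTAGACACTGCTCGCTCCCATCCATTAGATCCTGAAG "
    "GGAGIIIGGGAIIIGGGGGGGGAGAGGAGAGG<GGIGGGIGAAGG "
)
BEFORE = CORE + (
    "XC:Z:GAGTTMD:Z:16 PG:Z:STAR RG:Z:A NH:i:5 NM:i:0 "
    "XM:Z:TACACACAGA UQ:i:0 AS:i:15"
)
AFTER = CORE + (
    "XC:Z:GAGTTMD:Z:16 GE:Z:Gm38148 XF:Z:CODING PG:Z:STAR RG:Z:A NH:i:5 "
    "NM:i:0 XM:Z:TACACACAGA UQ:i:0 AS:i:15 GS:Z:-"
)

if __name__ == "__main__":
    print(added_tags(BEFORE, AFTER))  # ['GE', 'GS', 'XF']
```

On the read above this reports GE, GS and XF as the tags added by the command I ran.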
I'll keep digging into it using the test data provided.
Is the refFlat generation also altered in v2.0.0?
OK, my bad: I found out that I was using TagReadWithGeneExonFunction instead of TagReadWithGeneFunction.