timoast / sinto

Tools for single-cell data processing

Home Page: https://timoast.github.io/sinto/

Zero-length fragments generated from Cell Ranger BAM

cflerin opened this issue

Hi, thanks for making this tool!

I've come across this issue and I'm not sure whether it is the expected behavior. I'm using Sinto 0.7.1 to create a fragments file from a Cell Ranger BAM file. In the output, I get many fragments whose start and end positions are identical (around 6000 in total). For example:

chr5    49658161        49658162        CGCACAGCACCTATTT-1      1
chr5    49658161        49658164        GATTGACCACGTTGTA-1      2
chr5    49658161        49658168        TGTGTCCGTATTGTCG-1      1
chr5    49658162        49658162        CTCTACGCAAAGGTCG-1      1 # <--- this fragment
chr5    49658162        49658168        CCGTACTCACACACAT-1      2
chr5    49658162        49658173        GTGGATTCAGCAACAG-1      1
chr5    49658166        49658432        CTGAATGAGGACTAGC-1      2
chr5    49658168        49658168        CACCTTGAGCCTGTAT-1      3

When comparing to the Cell Ranger fragments file from the same BAM, I don't see any of these. In the Cell Ranger output, the minimum fragment size seems to be 10, so maybe shorter fragments have been filtered out. Should I filter the Sinto fragments as well?
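
To get a sense of how many fragments are affected, a minimal Python sketch like the following could tally them. It assumes the five-column layout shown in the example above (chromosome, start, end, barcode, count); the function name `count_short_fragments` and the threshold of 10 are illustrative and not part of Sinto.

```python
def count_short_fragments(lines, min_length=10):
    """Tally fragments by length, where length = end - start.

    Hypothetical helper for inspection only; not a Sinto function.
    """
    total = 0
    zero_length = 0
    short = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()  # whitespace split handles tabs or spaces
        start, end = int(fields[1]), int(fields[2])
        length = end - start
        total += 1
        if length == 0:
            zero_length += 1
        if length < min_length:
            short += 1
    return {"total": total, "zero_length": zero_length, "short": short}


# The eight rows from the example above
example = """\
chr5\t49658161\t49658162\tCGCACAGCACCTATTT-1\t1
chr5\t49658161\t49658164\tGATTGACCACGTTGTA-1\t2
chr5\t49658161\t49658168\tTGTGTCCGTATTGTCG-1\t1
chr5\t49658162\t49658162\tCTCTACGCAAAGGTCG-1\t1
chr5\t49658162\t49658168\tCCGTACTCACACACAT-1\t2
chr5\t49658162\t49658173\tGTGGATTCAGCAACAG-1\t1
chr5\t49658166\t49658432\tCTGAATGAGGACTAGC-1\t2
chr5\t49658168\t49658168\tCACCTTGAGCCTGTAT-1\t3
"""

summary = count_short_fragments(example.splitlines())
print(summary)
```

For a real file, the same function could be passed an open file handle instead of the in-memory example.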

Hi, I think filtering fragments based on a minimum length is definitely a good idea. I hadn't noticed any of these very short fragments in the test cases I looked at. I can add a minimum fragment length argument to the next version, similar to how we have the --max_distance parameter.

Hi @timoast, thanks for the reply. For now, I can work around this by filtering for a minimum fragment size when sorting and compressing the output:

sort -k1,1 -k2,2n fragments.bed | awk '($3-$2) >= 10' | bgzip -c > fragments.tsv.gz

So, feel free to close this, unless you're planning on adding the filtering step in a later release.
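
The same length filter as the awk step above could also be sketched in Python, for cases where the filtering happens inside a script rather than a shell pipeline. This is only an illustration under the assumption of tab-separated input with start and end in columns 2 and 3; `filter_fragments` is a hypothetical helper, not a Sinto function, and the output would still need sorting and bgzip compression as in the command above.

```python
def filter_fragments(lines, min_length=10):
    """Yield fragment lines with (end - start) >= min_length.

    Hypothetical helper mirroring awk '($3-$2) >= 10'.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        if end - start >= min_length:
            yield line


rows = [
    "chr5\t49658162\t49658162\tCTCTACGCAAAGGTCG-1\t1",  # length 0, dropped
    "chr5\t49658162\t49658173\tGTGGATTCAGCAACAG-1\t1",  # length 11, kept
    "chr5\t49658166\t49658432\tCTGAATGAGGACTAGC-1\t2",  # length 266, kept
]

kept = list(filter_fragments(rows))
for row in kept:
    print(row)
```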

I'll leave this open until an option to filter small fragments is added to sinto.

Now added in 0.7.2