broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

Drop-seq Tools Application on Dataset with Non-dropseq Barcode Design

lzy42 opened this issue · comments

commented

Hi,

Thanks for developing such useful tools! I have a question about the barcode extraction step: can users change the algorithm themselves in order to use this pipeline with a non-Drop-seq barcode design? If so, how should I change it?

The barcode I'm working with has a design similar to the SPLiT-seq barcode design: https://teichlab.github.io/scg_lib_structs/methods_html/SPLiT-seq.html

Thanks in advance,
Zhen Li

commented

Hi @zhengyueli98,

Sorry, I can't work out what the structure would be from the SPLiT-seq document. If you could describe the structure in a few words, I might be able to answer more specifically (e.g. read1: 15 bases of cellular index, 20 bases of molecular index; read2: biological bases).

It's possible that you can do what you want, as long as all the bases for a given index (cellular or molecular) are on the same end of a paired-end read. Look at the TagBamWithReadSequenceExtended usage, in particular the BASE_RANGE, HARD_CLIP_BASES, and TAG_BARCODED_READ arguments. Note that BASE_RANGE can handle multiple non-contiguous ranges if that is what you need (as long as they are on the same end of the pair).
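To illustrate what selecting multiple non-contiguous ranges from one read means, here is a minimal Python sketch. It is not the tool's implementation. It assumes the range specification is written as colon-separated, 1-based inclusive ranges such as `1-4:17-22`; check the tool's usage output for the exact syntax. The function names `parse_base_range` and `extract_bases` are hypothetical.

```python
def parse_base_range(spec):
    """Parse a spec such as '1-4:17-22' into (start, end) tuples.

    Assumption: ranges are 1-based, inclusive, and colon-separated.
    """
    ranges = []
    for part in spec.split(":"):
        start, end = (int(x) for x in part.split("-"))
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part}")
        ranges.append((start, end))
    return ranges


def extract_bases(read, spec):
    """Concatenate the bases of `read` covered by each range, in order."""
    pieces = []
    for start, end in parse_base_range(spec):
        if end > len(read):
            raise ValueError(f"range {start}-{end} exceeds read length {len(read)}")
        # Convert the 1-based inclusive range to a Python slice.
        pieces.append(read[start - 1:end])
    return "".join(pieces)


if __name__ == "__main__":
    read = "ACGTACGTACGTACGTACGTAC"  # made-up 22-base read
    print(extract_bases(read, "1-4:17-22"))
```

All ranges are taken from a single read, which mirrors the constraint that the bases for one index must sit on the same end of the pair.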

Regards, Alec

commented

Hi Alec,

Thank you so much for getting back to me so quickly; I really appreciate it! Apologies for my late response. We recently made some modifications to the design.

The final library structure of SPLiT-seq is:
read1: biological bases.
read2: 20 bp special primer design, 12 bp cell barcode, 10 bp UMI.
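The read2 layout above can be turned into base positions with a little arithmetic. The following Python sketch assumes the three segments appear in the order listed, starting at the first base of read2, and that positions are 1-based and inclusive; the helper name `segment_ranges` is hypothetical.

```python
def segment_ranges(segments):
    """Map ordered (name, length) segments to 1-based inclusive (start, end) ranges."""
    ranges = {}
    position = 1
    for name, length in segments:
        ranges[name] = (position, position + length - 1)
        position += length
    return ranges


# Layout as described in this comment: primer, then cell barcode, then UMI.
read2_layout = [("primer", 20), ("cell_barcode", 12), ("umi", 10)]

if __name__ == "__main__":
    for name, (start, end) in segment_ranges(read2_layout).items():
        print(f"{name}: {start}-{end}")
```

Under these assumptions the cell barcode occupies bases 21-32 and the UMI bases 33-42 of read2.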

I would like to double-check: besides 1) swapping read1 and read2 when converting the raw reads to an unmapped BAM file with picard.jar, and 2) changing the base ranges when extracting the barcode and UMI with TagBamWithReadSequenceExtended, is there anything else we need to modify before using the Drop-seq pipeline?

SPLiT-seq design reference: https://teichlab.github.io/scg_lib_structs/methods_html/SPLiT-seq.html

Thank you so much for your time, have a great day!

Best,
Zhen Li

commented

Hi Zhen Li,

I assume the 20 bp special primer design can be discarded, right? If you wanted to retain it, though, you could assign it to some other tag for your own custom processing.

That question aside, this is all doable with the existing tools. You'll just need to pay close attention to the arguments for the 2 invocations of TagBamWithReadSequenceExtended.
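As a rough sketch of what those two invocations might look like for this layout, the Python snippet below only assembles and prints the argument lists; it does not run anything. Everything in it is an assumption to verify against the tool's usage output: that the barcode read ends up as read 1 after the swap, that the cell barcode is at bases 21-32 and the UMI at 33-42, that the `XC` and `XM` tag names and the quality settings from the standard Drop-seq workflow apply, and that the barcode read is discarded only after the second pass. The file names and the helper `tag_args` are hypothetical.

```python
def tag_args(input_bam, output_bam, summary, base_range, tag_name, discard_read):
    """Build an argument list for one tagging pass (illustrative only)."""
    return [
        "TagBamWithReadSequenceExtended",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"SUMMARY={summary}",
        f"BASE_RANGE={base_range}",
        "BASE_QUALITY=10",            # assumed threshold; adjust as needed
        "BARCODED_READ=1",            # assumes the barcode read is read 1 after the swap
        f"DISCARD_READ={discard_read}",
        f"TAG_NAME={tag_name}",
        "NUM_BASES_BELOW_QUALITY=1",  # assumed setting; adjust as needed
    ]


# Pass 1: tag the cell barcode and keep the barcode read for the next pass.
cell_cmd = tag_args("unaligned.bam", "tagged_cell.bam",
                    "tagged_cell_summary.txt", "21-32", "XC", "False")

# Pass 2: tag the UMI, then drop the barcode read.
umi_cmd = tag_args("tagged_cell.bam", "tagged_cell_umi.bam",
                   "tagged_umi_summary.txt", "33-42", "XM", "True")

if __name__ == "__main__":
    print(" ".join(cell_cmd))
    print(" ".join(umi_cmd))
```

The first 20 primer bases are simply left out of both ranges, which matches discarding them; retaining them would mean a further pass with a tag name of your own choosing.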

Regards, Alec

commented

Hi Alec,

Awesome, that fixed my issue. I really appreciate your help! Thanks again for developing such useful tools, and have a great day!

Best,
Zhen Li