Drop-seq Tools Application on Dataset with Non-dropseq Barcode Design

lzy42 opened this issue 4 years ago

Hi,

Thanks for developing such useful tools! I have a question about the barcode extraction step: is it possible for users to change the algorithm themselves so that this pipeline can be used with a non-Drop-seq barcode design? If so, how am I supposed to change it?

The barcode I'm working on has a design similar to the SPLiT-seq barcode design: https://teichlab.github.io/scg_lib_structs/methods_html/SPLiT-seq.html

Thanks in advance,
Zhen Li

alecw · Answer 1 · Wed Jan 06 2021 23:57:48 GMT+0800 (China Standard Time)

Hi @zhengyueli98,

Sorry, I can't follow what the structure would be from the SPLiT-seq document. If you could describe the structure in a few words, I might be able to answer more specifically (e.g. read1: 15 bases of cellular index, 20 bases of molecular index; read2: biological bases).

You may well be able to do what you want, as long as all the bases for a given index (cellular or molecular) are on the same end of a paired-end read. Look at the TagBamWithReadSequenceExtended usage, in particular the BASE_RANGE, HARD_CLIP_BASES and TAG_BARCODED_READ arguments. Note that BASE_RANGE can handle multiple non-contiguous ranges if that is what you need (as long as they are on the same end of the pair).
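To make the BASE_RANGE idea concrete, here is a small Python model of what a 1-based, inclusive, possibly multi-part range selects from a single read. It is an illustration only, not Drop-seq tools code, and the ':' separator between ranges is an assumption to check against the TagBamWithReadSequenceExtended usage text.

```python
def parse_base_range(spec: str) -> list[tuple[int, int]]:
    """Parse a BASE_RANGE-style string such as "21-32" or "1-4:9-12".

    Assumption: ranges are 1-based, inclusive, and separated by ':'.
    """
    ranges = []
    for part in spec.split(":"):
        start_str, end_str = part.split("-")
        start, end = int(start_str), int(end_str)
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part!r}")
        ranges.append((start, end))
    return ranges


def extract_bases(read: str, spec: str) -> str:
    """Concatenate the bases covered by every range in spec, in order."""
    pieces = []
    for start, end in parse_base_range(spec):
        if end > len(read):
            raise ValueError(f"range {start}-{end} exceeds read length {len(read)}")
        pieces.append(read[start - 1:end])  # 1-based inclusive -> Python slice
    return "".join(pieces)


if __name__ == "__main__":
    read = "AAAACCCCGGGGTTTT"
    print(extract_bases(read, "1-4"))       # AAAA
    print(extract_bases(read, "1-4:9-12"))  # AAAAGGGG (two non-contiguous ranges)
```

Every range is taken from the same read string, which mirrors the constraint above that all bases of one index must sit on the same end of the pair.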

Regards, Alec

lzy42 · Answer 2 · Wed Jan 20 2021 04:59:17 GMT+0800 (China Standard Time)

Hi Alec,

Thank you so much for getting back to me so quickly; I really appreciate it! Apologies for my late response: we recently made some modifications to the design.

The final library structure of SPLiT-seq is:
read1: biological bases.
read2: 20 bp special primer design, 12 bp cell barcode, 10 bp UMI.
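With this layout, the 1-based, inclusive positions of each segment on read2 follow directly from the cumulative lengths. The sketch below does that arithmetic, assuming read2 starts with the 20 bp primer and carries no extra bases before it.

```python
# Read 2 layout as described above (lengths in bases, in sequencing order).
LAYOUT = [("primer", 20), ("cell_barcode", 12), ("umi", 10)]


def base_ranges(layout: list[tuple[str, int]]) -> dict[str, str]:
    """Return {segment: "start-end"} using 1-based, inclusive coordinates."""
    ranges = {}
    position = 1
    for name, length in layout:
        ranges[name] = f"{position}-{position + length - 1}"
        position += length
    return ranges


if __name__ == "__main__":
    for name, spec in base_ranges(LAYOUT).items():
        print(name, spec)
```

For this design that gives primer 1-20, cell barcode 21-32 and UMI 33-42, i.e. the values one would plug into BASE_RANGE for the barcode read.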

I would like to double-check: besides 1) swapping read1 and read2 when converting the raw reads to an unmapped BAM file with picard.jar, and 2) changing the base ranges when extracting the cell barcode and UMI with TagBamWithReadSequenceExtended, is there anything else we need to modify before using the Drop-seq pipeline?
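A minimal sketch of step 1, assuming the conversion is done with Picard's FastqToSam: passing the barcode read (sequencing read2) as FASTQ and the cDNA read (sequencing read1) as FASTQ2 puts the barcodes on the first read of each pair, so the tagging steps can keep BARCODED_READ=1. The file and sample names are hypothetical, and the KEY=VALUE argument style is Picard's legacy syntax (newer Picard releases prefer `-KEY value`). The code only builds and prints the command; it does not run it.

```python
import shlex


def fastq_to_sam_cmd(cdna_fastq: str, barcode_fastq: str, out_bam: str, sample: str) -> list[str]:
    """Build a Picard FastqToSam command with the barcode read stored first.

    The barcode FASTQ (sequencing read2) is given as FASTQ and the cDNA FASTQ
    (sequencing read1) as FASTQ2, which swaps their order in the unaligned BAM.
    """
    return [
        "java", "-jar", "picard.jar", "FastqToSam",
        f"FASTQ={barcode_fastq}",  # becomes the first read of each pair
        f"FASTQ2={cdna_fastq}",    # becomes the second read of each pair
        f"OUTPUT={out_bam}",
        f"SAMPLE_NAME={sample}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = fastq_to_sam_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "unaligned.bam", "sample1")
    print(shlex.join(cmd))
```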

SPLiT-seq design reference: https://teichlab.github.io/scg_lib_structs/methods_html/SPLiT-seq.html

Thank you so much for your time; have a great day!

Best,
Zhen Li

alecw · Answer 3 · Wed Jan 20 2021 05:13:47 GMT+0800 (China Standard Time)

Hi Zhen Li,

I assume the 20 bp special primer design can be discarded, right? If you wanted to retain it, though, you could assign it to some other tag for your own custom processing.

That question aside, this is all doable with the existing tools. You'll just need to pay close attention to the arguments for the two invocations of TagBamWithReadSequenceExtended.
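As a sketch of those two invocations, the Python below builds (but does not run) one call that tags the cell barcode and a second that tags the UMI. It assumes the barcode read was stored as read 1 in the unaligned BAM (the swap lzy42 describes) and the 20/12/10 bp layout, giving ranges 21-32 and 33-42. The XC/XM tag names and the BASE_QUALITY/NUM_BASES_BELOW_QUALITY values follow the Drop-seq alignment cookbook's usual example; the intermediate and summary file names are hypothetical.

```python
import shlex


def tag_commands(in_bam: str, out_bam: str,
                 cell_range: str = "21-32", umi_range: str = "33-42") -> list[list[str]]:
    """Build the two TagBamWithReadSequenceExtended calls: cell barcode, then UMI."""
    cell_bam = "unaligned_tagged_cell.bam"  # hypothetical intermediate file name
    common = ["BASE_QUALITY=10", "BARCODED_READ=1", "NUM_BASES_BELOW_QUALITY=1"]
    cell_step = [
        "TagBamWithReadSequenceExtended",
        f"INPUT={in_bam}", f"OUTPUT={cell_bam}",
        "SUMMARY=cell_tag_summary.txt",  # hypothetical summary file name
        f"BASE_RANGE={cell_range}", "TAG_NAME=XC",
        "DISCARD_READ=False",  # keep the barcode read: the UMI is still on it
        *common,
    ]
    umi_step = [
        "TagBamWithReadSequenceExtended",
        f"INPUT={cell_bam}", f"OUTPUT={out_bam}",
        "SUMMARY=umi_tag_summary.txt",  # hypothetical summary file name
        f"BASE_RANGE={umi_range}", "TAG_NAME=XM",
        "DISCARD_READ=True",  # drop the barcode read once both tags are set
        *common,
    ]
    return [cell_step, umi_step]


if __name__ == "__main__":
    for cmd in tag_commands("unaligned.bam", "unaligned_tagged_cell_umi.bam"):
        print(shlex.join(cmd))
```

The second call reads the first call's output, and only the second discards the barcode read. To keep the 20 bp primer as Alec suggests, a third call with BASE_RANGE=1-20 and a tag name of your choosing could be added before the read is discarded.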

Regards, Alec

lzy42 · Answer 4 · Wed Jan 20 2021 06:30:21 GMT+0800 (China Standard Time)

Hi Alec,

Awesome, that fixed my issue; I really appreciate your help! Thanks again for developing such useful tools, and have a great day!

Best,
Zhen Li