snakemake-workflows / docs

Documentation of the Snakemake-Workflows project

dropSeqPipe - Single cell data preprocessing snakemake workflow

Hoohm opened this issue · comments

Hello,

with the latest version of my pipeline, I am trying to turn it into a Snakemake workflow.
I'm kindly asking for a review.

I have not yet worked on specific envs for each rule but this can be done in the future without too much effort.

Please tell me if there is anything else that I need to implement to pass the review.

Best wishes

Great, this looks very promising! I have only a few points that should be solved before inclusion here:

  1. I know there is no strict formatting guide yet, but I try to establish a certain standard in this organization, namely: (a) input, output, ... items on new lines and indented; (b) threads only specified if != 1; (c) no whitespace around the = operator in input, output, ... (a hypothetical example follows this list).
  2. Use a wrapper when possible (e.g., for star, fastqc and multiqc), see here.
  3. Add a conda directive to every other rule. It is fine if multiple rules point to the same conda environment file.
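
For illustration, a minimal rule following these conventions might look like this (the rule name, paths and environment file are made up):

rule sort_bam:
    input:
        'data/{sample}.bam'
    output:
        bam='data/{sample}.sorted.bam'
    conda:
        'envs/samtools.yaml'
    shell:
        'samtools sort -o {output.bam} {input}'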

Great, I'll work on those issues soon :)

@johanneskoester I have a question.
I'm not sure how I can use the fastqc wrapper, since I want to process two files at the same time and use 2 threads. Should I split them?
I'm also using the --extract option, which is not offered by the wrapper.
Any ideas?
EDIT: found out how to do it. Had to split R1 and R2 though.

Might be that it is not possible. Then, feel free to not use it for now.

@johanneskoester
I'm almost done implementing the changes. I have an issue with trimmomatic.
I'm using the log as an input for a multiqc rule, and hence I need it listed as an output. The problem is that the wrapper adds all entries from the output directive to the command, which leads to an error.

rule trim_single:
	input:
		'data/{sample}_trimmed_unmapped.fastq.gz'
	output:
		data='data/{sample}_filtered.fastq.gz',
		log='logs/{sample}_trimlog.txt'
	log:
		'logs/{sample}_trimlog.txt'
	params:
		trimmer=['LEADING:3','TRAILING:3','SLIDINGWINDOW:4:20','MINLEN:15', 'ILLUMINACLIP:$CONDA_PREFIX/share/trimmomatic/adapters/{}:2:30:10'.format(config['FILTER']['IlluminaClip'])],
		extra='-threads 2'
	threads: 2
	wrapper:
		'0.21.0/bio/trimmomatic/se'

Here is the command produced:
trimmomatic SE -threads 2 data/sample1_trimmed_unmapped.fastq.gz data/sample1_filtered.fastq.gz logs/sample1_trimlog.txt LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:15 ILLUMINACLIP:$CONDA_PREFIX/share/trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 > logs/sample1_trimlog.txt 2>&1

One way to solve this would be to change the wrapper to:

shell("trimmomatic SE {snakemake.params.extra} "
      "{snakemake.input} {snakemake.output[0]} "
      "{trimmer} "
      "{log}")

I tried to make a pull request on Bitbucket, but it failed; I'm not sure why (there was no error message).

Well, if you really need it as an input, you can simply omit it in the log directive. However, this is discouraged, since Snakemake will then delete the log upon error (which is usually not what you want). I guess the problem is the multiqc wrapper?
Maybe you can simply omit that file from the input files of multiqc and add the log path as a param instead.

I'm not sure how to do this. If I don't have the log files from trimmomatic as input, how would the multiqc rule know when to run?

You are right, this is not convincing. I just modified Snakemake to allow log files as input. This will solve your problem, as you don't need to specify it as additional output file anymore.

I will release a new version today.

Well, I guess it makes sense! Cool, I'll finish the requested changes after the Snakemake update.

Snakemake 4.6.0 has been released. You can now use the log file as input to the multiqc rule. Make sure that it is only defined as a log file, not as an output file.
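
For illustration, a minimal multiqc rule along those lines might look like this (the paths, the expand over samples.index and the wrapper version are assumptions based on the rules above; the trimmomatic log stays only under the log directive of trim_single):

rule multiqc:
    input:
        # the trimmomatic logs, declared only under log: in trim_single
        expand('logs/{sample}_trimlog.txt', sample=samples.index)
    output:
        'reports/multiqc.html'
    wrapper:
        '0.21.0/bio/multiqc'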

Hello,

coming back with an issue for a wrapper, this time the STAR wrapper.

This is what my rule looks like:

rule STAR_align:
	input:
		fq1="data/{sample}_filtered.fastq.gz",
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/SA'
	output:
		data=temp('logs/{sample}.Aligned.out.bam')
	log:
		log_out='logs/{sample}.Log.final.out'
	params:
		extra="""--outReadsUnmapped Fatsx\
			 	--outFilterMismatchNmax {}\
			 	--outFilterMismatchNoverLmax {}\
			 	--outFilterMismatchNoverReadLmax {}\
			 	--outFilterMatchNmin {}""".format(
				config['STAR_PARAMETERS']['outFilterMismatchNmax'],
				config['STAR_PARAMETERS']['outFilterMismatchNoverLmax'],
				config['STAR_PARAMETERS']['outFilterMismatchNoverReadLmax'],
				config['STAR_PARAMETERS']['outFilterMatchNmin']),
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/'	
	threads: 24
	wrapper:
		'0.21.0/bio/star/align'

I get some warnings

/share/big2/Test_data/DropSeq/.snakemake/scripts/j8re728z.wrapper.py:18: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(fq1 is not None, "input-> fq1 is a required input parameter")
/share/big2/Test_data/DropSeq/.snakemake/scripts/j8re728z.wrapper.py:23: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(len(fq1) == len(fq2), "input-> equal number of files required for fq1 and fq2")
STAR --outReadsUnmapped Fatsx			 	--outFilterMismatchNmax 10			 	--outFilterMismatchNoverLmax 0.3			 	--outFilterMismatchNoverReadLmax 1 --outFilterMatchNmin 0 --runThreadN 8 --genomeDir /home/patrick/big/references/mouse_91/STAR_INDEX/SA_100/ --readFilesIn data/sample1_filtered.fastq.gz  --readFilesCommand zcat --outSAMtype BAM Unsorted --outFileNamePrefix logs/ --outStd Log  > logs/sample1.Log.final.out 2>&1

It still runs fine, but I'm not sure this is intended.

The warnings are fixed now, but the next wrapper release is waiting on another issue. For you, it is fine to keep ignoring them for now. How close are you to inclusion here?

The changes to the code have been made. The wrappers are now in place and working. I'm now testing the few datasets I have to check that everything runs fine; I'll probably push it early next week.

It's a detail, but I'd like to know whether you have any naming convention for config files. I went with camelCase for "end variables" and CAPITALS for subsections.

SUBSECTION:
    SUBSECTION2:
        variableOne:

I would use lower case, and foo-bar for composed words. Camel case does not fit nicely with the rest of Python, where it is only used for classes. Instead of foo-bar, you can also use snake_case. But lower case is, I think, preferred by most, because it is easier on the eyes.

I would definitely like a different naming convention between sections and variables.

Would this be OK with you?

FILTER:
    cell-barcode:
        start: 1
        end: 6
        min-quality: 20
        num-below-quality: 1

The main reason is that it fits the different major steps of the pipeline. It's easy to read/understand which part of the config file has an influence on which part of the pipeline.

Yeah, that's OK. To me, all lowercase would still be sufficient, because you can also add comments or empty lines to highlight the sections, but your approach is also fine.

So there seems to be a bug regarding the use of logfiles as input.
I'm running a rule that depends on the STAR logs and it wants to rerun STAR although the files exist.
The reason for rerunning is "missing output files"

rule STAR_align:
	input:
		fq1="data/{sample}_filtered.fastq.gz",
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/SA'
	output:
		data=temp('data/{sample}/Aligned.out.bam')
	log:
		log_out='data/{sample}/Log.final.out'
	params:
		extra="""--outReadsUnmapped Fatsx\
			 	--outFilterMismatchNmax {}\
			 	--outFilterMismatchNoverLmax {}\
			 	--outFilterMismatchNoverReadLmax {}\
			 	--outFilterMatchNmin {}""".format(
				config['STAR_PARAMETERS']['outFilterMismatchNmax'],
				config['STAR_PARAMETERS']['outFilterMismatchNoverLmax'],
				config['STAR_PARAMETERS']['outFilterMismatchNoverReadLmax'],
				config['STAR_PARAMETERS']['outFilterMatchNmin']),
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/'	
	threads: 24
	wrapper:
		'0.21.0/bio/star/align'

There is one thing that might be related. I'm not sure how Snakemake works internally, but since the actual output file doesn't exist (because it's temp), could it be that while checking for the existing log file it decides it should also have those bam files and reruns the rule for that reason, even though the reason message points at the wrong file?

I have just checked it locally, and there is no such problem. It must be another job in your DAG that has to run and needs the temp bam file.

Odd, maybe I'm missing something. Here is the list of rules to run

rule STAR_align:
    input: data/L2-SCRB-Opt-2-1C_filtered.fastq.gz, /naslx/projects/pr62lo/di49qar/reference/mouse_91/STAR_INDEX/SA_100/SA
    output: data/L2-SCRB-Opt-2-1C/Aligned.out.bam
    log: data/L2-SCRB-Opt-2-1C/Log.final.out
    jobid: 3
    reason: Missing output files: data/L2-SCRB-Opt-2-1C/Log.final.out
    wildcards: sample=L2-SCRB-Opt-2-1C


localrule plot_yield:
    input: logs/L2-SCRB-Opt-2-1C_CELL_barcode.txt, logs/L2-SCRB-Opt-2-1C_UMI_barcode.txt, logs/L2-SCRB-Opt-2-1C_reads_left.txt, data/L2-SCRB-Opt-2-1C/Log.final.out, logs/L2-SCRB-Opt-2-1C_reads_left_trim.txt
    output: plots/yield.pdf
    jobid: 0
    reason: Missing output files: plots/yield.pdf; Input files updated by another job: data/L2-SCRB-Opt-2-1C/Log.final.out

Shutting down, this might take some time.
Job counts:
	count	jobs
	1	STAR_align
	1	plot_yield
	2

this is the command I run: snakemake plot_yield --dryrun -r

There are a few steps between them, but normally when I delete the plot from the plot_yield rule, it just runs the plot again based on the old logfiles.

I'll look more into it, but if you have an idea of what to look for, it would help a lot.

This is indeed a bit weird. It also explicitly lists the log file as missing... are you sure it is there? What is the output of ls -l data/L2-SCRB-Opt-2-1C/Log.final.out?

Yeah the file is there, no issue there.
-rw-r--r-- 1 di49qar pr62lo 1857 Feb 16 12:01 data/L2-SCRB-Opt-2-1C/Log.final.out
I just swapped to the old version of the STAR_align rule, not using the wrapper, and now it works again. Only plot_yield is invoked.

I don't see how this could be related to the wrapper. Could you post both versions of the rule?

The non-wrapper version:

rule STAR_align:
	input:
		data="data/{sample}_filtered.fastq.gz",
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/SA'
	output:
		sam=temp('logs/{sample}/Aligned.out.bam'),
		log_out='logs/{sample}/Log.final.out'
	params:
		prefix='logs/{sample}/',
		outFilterMismatchNmax=config['STAR_PARAMETERS']['outFilterMismatchNmax'],
		outFilterMismatchNoverLmax=config['STAR_PARAMETERS']['outFilterMismatchNoverLmax'],
		outFilterMismatchNoverReadLmax=config['STAR_PARAMETERS']['outFilterMismatchNoverReadLmax'],
		outFilterMatchNmin=config['STAR_PARAMETERS']['outFilterMatchNmin'],
		read_length=lambda wildcards: int(samples.loc[wildcards.sample,'read_length'])
	threads: 24
	shell:
		"""
			--genomeDir {star_index_prefix}_{params.read_length}/\
			--readFilesCommand zcat\
			--runThreadN {threads}\
			--readFilesIn {input.data}\
			--outSAMtype BAM Unsorted\
			--outReadsUnmapped Fatsx\
			--outFileNamePrefix {params.prefix}\
			--outFilterMismatchNmax {params.outFilterMismatchNmax}\
			--outFilterMismatchNoverLmax {params.outFilterMismatchNoverLmax}\
			--outFilterMismatchNoverReadLmax {params.outFilterMismatchNoverReadLmax}\
			--outFilterMatchNmin {params.outFilterMatchNmin}"""

I tried another rule that depends on other logfiles; I get the same issue.
I also tried adding the logfile as an output to the wrapper-based STAR_align rule; it doesn't help.

I think I might have fixed it for the STAR_align. I added the log file to the output.
Trying it on the trimmomatic now.

Yeah, but that should not be necessary. Could you try to create a minimal example, so that I can debug it on my side?

Wait, I think I have found the problem and am now also able to reproduce. Working on a fix now.

Ok, fixed in the master branch. Thanks for reporting. I will create a new release next week.

Cool! Glad I could help.
I hope this is the last fix. I'll be able to push the new release just after yours.

EDIT: I have the minimal example ready if it is still needed.

I tried the fixed version, it works properly.

New version is released as 4.7.0.

I'm running the last tests today, should be able to push the new version today.

Cool! Looking forward to moving it here!

:( an old bug came back with the wrapper modification on STAR.
It's kind of tricky to explain.

Basically, I have a rule split_species that groups species for each sample. I think it comes from plot_barnyard: it tries to match a combination of SAMPLE_SPECIES to the sample wildcard instead of just SAMPLE. Of course, this doesn't work with the index lookup in the mapping step:

rule STAR_align:
	input:
		fq1="data/{sample}_filtered.fastq.gz",
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/SA'
	output:
		temp('data/{sample}/Aligned.out.bam')
	log:
		'data/{sample}/Log.final.out'
	params:
		extra="""--outReadsUnmapped Fatsx\
			 	--outFilterMismatchNmax {}\
			 	--outFilterMismatchNoverLmax {}\
			 	--outFilterMismatchNoverReadLmax {}\
			 	--outFilterMatchNmin {}""".format(
				config['STAR_PARAMETERS']['out-filter-mismatch-nmax'],
				config['STAR_PARAMETERS']['out-filter-mismatch-nover-lmax'],
				config['STAR_PARAMETERS']['out-filter-mismatch-nover-read-lmax'],
				config['STAR_PARAMETERS']['out-filter-match-nmin']),
		index=lambda wildcards: star_index_prefix + '_' + str(samples.loc[wildcards.sample,'read_length']) + '/'	
	threads: 24
	wrapper:
		"0.22.0/bio/star/align"

Here is the error:

InputFunctionException in line 10 of /share/big2/Test_data/DropSeq_mixed/rules/map.smk:
KeyError: 'the label [Experiment_MOUSE] is not in the [index]'
Wildcards:
sample=Experiment_MOUSE

I'm looking into it and might create a minimal example to illustrate the issue.

Maybe it is a good idea to put a global wildcard_constraint on sample. E.g.:

wildcard_constraints:
    sample="({})".format("|".join(samples.index))

Then, you should be better able to see where the problem is.

I added the constraint and it seems to work now. Could you explain to me what this does?

To me, it looks like I constrain the sample wildcard to values taken only from samples.index.

Yes, exactly. It generates a regular expression that only matches those values. Sometimes this is necessary because matching can be ambiguous if you have multiple wildcards.
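
For instance, with hypothetical sample names the constraint expands to a plain regex alternation:

# assuming samples.index == ['sample1', 'sample2']
wildcard_constraints:
    sample="(sample1|sample2)"  # the sample wildcard can now only match these two values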

Do you want to move the repo here now? Or are there remaining issues? Do you already have Travis CI-based tests enabled?

I'm not familiar with Travis CI. I have subscribed and I'll look into it, but before that I will push the new release: there are a lot of modifications that I want to push and make available.
I hope this was the last big issue.

So how am I supposed to "move" the repo?

There is a transfer ownership section in the settings.

For an example Travis setup have a look at the rnaseq workflow.
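
For orientation, such a setup usually boils down to installing Miniconda plus Snakemake and then running the workflow on the small test data; a rough .travis.yml sketch (URLs, versions and the .test directory are assumptions, not the exact rnaseq config):

language: python
python:
    - "3.6"
install:
    # install Miniconda and Snakemake (assumed setup)
    - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    - bash miniconda.sh -b -p $HOME/miniconda
    - export PATH="$HOME/miniconda/bin:$PATH"
    - conda install -y -c conda-forge -c bioconda snakemake
script:
    # assumes the test config and data live in .test, as discussed further below
    - snakemake --use-conda --directory .test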

Ok, change of plans: we fork it from here and set up Backstroke to automatically create a PR from each of your commits to the main repo. Before I can fork, I still need Travis tests to be set up. Any progress on this? Test data is available here: https://github.com/snakemake-workflows/ngs-test-data; you can include it as a submodule, analogous to https://github.com/snakemake-workflows/rna-seq-star-deseq2.

Ok, I just pushed the latest version (0.31) and I can now go on and use travis.

I need some test data that is specific for single cell. I'll try to make one similar to the one you have for bulk NGS.

Ok! Note that the testing is basically only for checking that the tools and steps work. It is not a benchmark, so everything can be very small.
Also, I have just seen that the main Snakefile still contains a lot of redundant code. If you really want all these subtargets, try to share the expand invocations between them and the rule all. Usually, it is also sufficient to just list the final plots and tables in the all rule, and definitely not intermediate files like the bam files.

Not sure what you mean by "share the expand invocations between them".

Travis seems pretty straightforward to use. The only mystery to me is the submodule for the test data.
Any tips for doing it ASAP?

I mean that if you have the same expand statement twice (in two rules), just define a variable at the top, and refer to that variable from the rules:

bams = expand("...", ...)

rule all:
    input:
        bams, plots, ...
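
As a concrete (hypothetical) sketch with made-up target patterns, the same lists can then feed both the all rule and a subtarget rule:

plots = expand('plots/{sample}_yield.pdf', sample=samples.index)
reports = expand('reports/{sample}_multiqc.html', sample=samples.index)

rule all:
    input:
        plots,
        reports

rule plots_only:
    input:
        plots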

The idea with the git submodule is that the test data becomes reusable between different repositories. Further, when people checkout your repo, they don't need to check it out together with the test data, because that is in a submodule which is only checked out on request. See here: https://github.com/snakemake-workflows/rna-seq-star-deseq2/tree/master/.test
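
For completeness, checking the test data out on request is then just the standard submodule commands (the clone URL is a placeholder):

git clone https://github.com/<user>/dropSeqPipe.git
cd dropSeqPipe
git submodule update --init --recursive  # fetches the test-data submodule only now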

Any update? Can I help?

Yes, I have made a repo for the test data here
Just forked yours and changed it up a bit.

Those past two weeks were difficult and I didn't have much time to work on including the test data.

If you can help me out in terms of how I should add the submodule, I think it should not be that much more work to run it properly.

I don't think you need to fork it if you don't need any real changes. All you need to do is to issue

mkdir -p .test/data
git submodule add https://github.com/snakemake-workflows/ngs-test-data.git

in your repo clone.
Then, you commit the changes and push. This should be enough. For details, see here.

I needed to change the reads because I need some UMI and cell barcodes in read1.
That seems super easy, gonna try it out shortly.

I see. Maybe you can add the modified reads as additional samples in the NGS test data repo (as a pull request)?

I think there is still a problem. You have fastq file paths in units.tsv; I just have sample names in samples.csv. dropSeqPipe will look for files in data/ as data/{sample_name}_R1.fastq.gz
That's what I already did on the forked one.
And this is just one test case. There will also be cases for whitelisted barcodes and double species.

It would be really nice to work on one single repo for testing data. Would be useful for other people as well.

A "simple" fix is that I add a default path for data in the config.yaml. This would let it be flexible enough for the whole pipeline and allow a default path for testing purposes.

Ok, I'm trying to build on Travis, but it doesn't "see" the travis_integration branch. I did use a safelist and I did run a "failed" build on the master branch.
Have you had similar problems? Beyond the safelist and the zero builds on the GitHub project, I haven't found any new clues.

The build is a bit messy with moving files from .test/data, but it should work once the build starts.

Once you have registered the repo in travis, it is usually sufficient to push a new commit to the branch. This should then trigger the build.

Ok, travis implemented!

Nice! I have added some comments to your commit.

Ok, I've made the requests and it ran through :)

But this leaves me with a question.

Are there two different relative paths for data and rules/scripts?

Since only the .test folder is redirected, it will take this path as the default path for the pipeline to run. But all the rules and scripts are in a folder above .test. Hence my question above.

When you use --directory, it will work in the given path, but still take the Snakefile and any additional source files relative to the directory from where you invoke snakemake.
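
So a test invocation from the repository root could look like this (a sketch, assuming the test config and data live in .test):

# Snakefile, rules/ and scripts/ are taken relative to the repo root,
# while all input/output paths are resolved inside .test
snakemake --use-conda --directory .test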

Apart from that, there seem to be only two items missing for inclusion that I did not notice before (sorry that this takes so long). First, we need drop-seq-tools in Bioconda, so that no additional setup is required. Are there any particular problems that make this impossible? Second, it would be nice to have a README that directly contains stepwise usage instructions following the other workflows we have here. Your approach is in principle fine as well, but it is easier for users to browse snakemake-workflows if all pipelines follow the same pattern. Would that be agreeable to you?

I'm here to learn!

This might take some time then. The developer has no problem with Drop-seq tools being on Conda; I already asked him.

I just don't know Java and hence have no clue how to do this properly.
I will try to find someone who could help me out. If you know someone, please let me know :)

You can take the picard recipe as a blueprint: https://github.com/bioconda/bioconda-recipes/blob/f5eb63e30a76fd13c28663786d219c9f7750267c/recipes/picard/meta.yaml
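
Very roughly, the meta.yaml follows the usual conda-build layout; here is a sketch with placeholder values (not the actual Drop-seq recipe; the build.sh that installs the jar and wrapper scripts, and the test section, are omitted):

package:
  name: dropseq-tools      # placeholder name
  version: "1.13"          # placeholder version

source:
  url: https://example.org/Drop-seq_tools-1.13.zip  # placeholder URL
  sha256: <checksum of the archive>

build:
  number: 0

requirements:
  run:
    - openjdk >=8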

Let me know if problems occur. I will try to help as fast as possible.

Did you have success with the conda recipe?

I have started going down the rabbit hole, yes, but I'm clearly not there yet. Maybe you have a hint for me: I would like to use conda skeleton, but there is none for Java code. Which one should I use?

Sorry for the late reply. You can follow the Bioconda docs for Java: http://bioconda.github.io/guidelines.html#java

Hey, someone uploaded Drop-seq tools to Conda for us!!
Is there anything left to do to validate dropSeqPipe as a workflow?

Sorry for the late reply. The last weeks were pretty busy.

  1. It would be nice if you could move the changelog to a separate file and try to harmonize the README with the rest of the workflows in snakemake-workflows. This makes it easier for users to understand how to use the workflow.
  2. Please ensure that all Snakefiles are indented with 4 spaces, not 8.
  3. Add a .gitattributes file like this for syntax highlighting on GitHub (a sketch follows below).
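
For reference, the .gitattributes used in other snakemake-workflows repositories is essentially just (sketch):

# tell GitHub's linguist to highlight Snakemake files as Python
Snakefile linguist-language=Python
*.smk linguist-language=Python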

Afterwards, I think we are ready to fork.

It's been a long time, sorry. I got busy with other features for the pipeline.
I've changed a few things though: the docs are now generated via MkDocs directly on gh-pages.
The branch I'm working on for the formatting and workflow standards is this one

Do you think it is time to fork now?

Yes. Release 0.4 just came out! Very happy about it.

Thank you! Forked and announced on Twitter. Really great work!

Thank you too! This wouldn't be possible without your many contributions!

Could you also put TUM in my affiliations? If only one is possible, then TUM, because that is where my main PI is.

Sure, done. Sorry, I did not see that. Feel free to create a PR to update your affiliations, maybe also adding your preferred homepage link.