$ make data
Calls make-data.py to (roughly) do the following:
- Choose some images.
- For each image, choose a protein.
- Convert the image to black and white.
- Extract the horizontal black lines from the image.
- Scale the image width to the amino acid (aa) length of the protein.
- Break each horizontal line into some reads.
- Calculate a bit score for the reads based on how far down we are in the image.
- Add some noise (otherwise the image looks too good and you can't see the individual reads).
- Write out fake DIAMOND results for those reads.
- Write out fake FASTQ files for the reads.
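The core image-to-reads transform in the steps above can be sketched as follows. This is a minimal illustration, not the actual make-data.py: the function name, tuple format, and scoring constants are all hypothetical, but it shows the idea of turning horizontal black runs into reads whose bit scores depend on image depth.

```python
import random

def image_to_reads(image, protein_length, read_len=20, noise=2.0, seed=0):
    """Hypothetical sketch of the make-data.py idea.

    image: 2D list of 0/1 pixels (1 = black). Each horizontal run of
    black pixels is scaled to protein (aa) coordinates and broken into
    fixed-length reads; a read's bit score is larger for rows nearer
    the top of the image, with some noise so individual reads remain
    visible on the plot.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    scale = protein_length / width  # map pixel x to aa coordinate
    reads = []
    for y, row in enumerate(image):
        x = 0
        while x < width:
            if row[x] == 1:
                start = x
                while x < width and row[x] == 1:
                    x += 1
                # Scale the black run to aa units and split it into reads.
                aa_start, aa_end = int(start * scale), int(x * scale)
                for s in range(aa_start, aa_end, read_len):
                    e = min(s + read_len, aa_end)
                    # Score falls off as we go down the image, plus noise.
                    score = (height - y) * 10 + rng.gauss(0, noise)
                    reads.append((s, e, score))
            else:
                x += 1
    return reads
```

The real script additionally writes the reads out as DIAMOND results and FASTQ; this sketch only produces (start, end, score) tuples.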
The files to be injected into the pipeline appear in OUT/json and OUT/fastq.
When the pipeline for the target sample is finished with the 03-diamond-civ-rna and 025-dedup steps:
$ make add
Calls add-data.py to:
- Add the compressed DIAMOND results to the pre-existing 03-diamond-civ-rna output.
- Add the compressed FASTQ to the pre-existing 025-dedup output.
No original data is touched. Only intermediate pipeline outputs are appended to. The original intermediate files are saved.
$ make rerun
Calls rerun-pipeline.py to re-run those two pipeline steps.
The easiest/calmest way to deploy is to edit the 06-stop/stop.sh script for the sample so that it does not remove the slurm-pipeline.running file (or create slurm-pipeline.done). Instead, make it touch some other file, and wait for that file to show up. Then run make add and make rerun. After that, just mv slurm-pipeline.running slurm-pipeline.done and the sample will be considered done by monitor-run.py.
The pipeline results look like this, with "blue plots" like this: [example images not included]