Big Data Genomics Recipes
Recipes using BDG projects. Apache 2 licensed.
Introduction
This repository is a home for "recipes" that use a Big Data Genomics project to accomplish some task. By default, these recipes use EC2 to create a Spark cluster, on which we run ADAM and other BDG tools. These recipes serve three purposes:
- As a quickstart for people new to the BDG project, who would like to figure out how to use BDG software to replace their current workflows.
- As a benchmarking/regression testing environment for the various BDG tools.
- As a sandbox where we can set up head-to-head tests against other tools (e.g., for experiments for papers).
Recipes
Our recipe book contains the following recipes:
- Single node recipes:
** bqsr-head-to-head: Runs a head-to-head single node speed test of the ADAM and GATK base quality score recalibration (BQSR) engines.
** flagstat-head-to-head: Runs a head-to-head performance test of Flagstat using ADAM, samtools, and sambamba.
** indel-realignment-head-to-head: Runs a head-to-head single node speed test of the ADAM and GATK INDEL realignment engines.
** markdup-head-to-head: Runs a head-to-head performance test of duplicate marking using ADAM, samtools, sambamba, and Picard.
** sort-head-to-head: Runs a head-to-head performance test of sorting using ADAM, samtools, sambamba, and Picard.
- Multiple node recipes:
** adam-transforms: Runs scale-out performance testing on ADAM's BQSR, Flagstat, INDEL realignment, duplicate marking, and sort implementations.
Running a Single Node Recipe
To run a single node recipe, run:
fab _configure_master_aptitude
fab bake:<recipe>
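As a concrete illustration, the two commands above can be generated for a specific recipe with a small shell function. This is only a sketch: the function name `single_node_commands` is hypothetical, and it prints the commands instead of running them, so it works without Fabric installed. `bqsr-head-to-head` is one of the recipe names from the list above.

```shell
# Hypothetical dry-run helper: prints the fab commands for a single node run.
# The task names come from this README; the helper itself is an assumption.
single_node_commands() {
  recipe="$1"
  # printf reuses the format string for each argument, one command per line.
  printf 'fab %s\n' "_configure_master_aptitude" "bake:${recipe}"
}

single_node_commands bqsr-head-to-head
```

Piping the output to `sh` would execute the commands in order, assuming `fab` is on the PATH and the fabfile is in the current directory.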
Running a Multi Node Recipe
To run a multi node recipe with n nodes, run:
fab provision:<n>
fab _configure_master_yum
fab bake:<recipe>
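The multi node sequence above takes two inputs, the node count and the recipe name. A minimal sketch of a wrapper that checks the node count before printing the commands might look like this. The function name `multi_node_commands` and the validation rule are assumptions, not part of the fabfile, and the commands are printed rather than executed.

```shell
# Hypothetical dry-run helper: prints the fab commands for a multi node run.
# The task names come from this README; the helper and its checks are assumptions.
multi_node_commands() {
  nodes="$1"
  recipe="$2"
  # Reject empty values, non-digits, and values with a leading zero (including 0).
  case "$nodes" in
    ''|*[!0-9]*|0*)
      echo "node count must be a positive integer: '$nodes'" >&2
      return 1
      ;;
  esac
  printf 'fab %s\n' "provision:${nodes}" "_configure_master_yum" "bake:${recipe}"
}

multi_node_commands 4 adam-transforms
```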
Adding a New Recipe
If you are adding a new recipe, you should add a directory. Under this directory, you should create a Bash script named run.sh that runs the recipe. If the recipe needs setup, you should add the necessary details to the fabfile configuration target.
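For orientation, a run.sh for a benchmarking recipe could start from a skeleton like the one below. This is a sketch under assumptions: the `time_step` helper and the placeholder workload are hypothetical and are not taken from any existing recipe.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton for <recipe-directory>/run.sh; not copied from any existing recipe.
set -eu

# time_step LABEL CMD [ARGS...]: run CMD and print its wall-clock time in whole seconds.
time_step() {
  label="$1"
  shift
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "${label}: $((end - start))s"
}

# Placeholder workload (an assumption); a real recipe would invoke the tools under test here.
time_step "placeholder-step" true
```

A head-to-head recipe could call `time_step` once per tool so that the timings appear together in the output.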
Getting In Touch
Mailing List
The ADAM mailing list is a good way to sync up with other people who use BDG projects, including the core developers. You can subscribe by sending an email to adam-developers+subscribe@googlegroups.com, or just post using the web forum page.
IRC Channel
Many of the developers hang out in the #adamdev channel on freenode.net. Come join us and ask questions.
License
bdg-recipes is released under an Apache 2.0 license.