gregoryg / cdsw-hail-genetics-tutorial

Hail tutorial

Created by Tom White (tom@cloudera.com)

Hail is an open-source, scalable framework for exploring and analyzing genetic data. This repo contains the Hail Tutorial, lightly reformatted to run in Cloudera Data Science Workbench.

Status: In Progress
Use Case: Genetics

Steps:

  1. Go to Project > Settings > Environment and set Spark Configuration to hail-genetics-tutorial/spark-defaults.conf
  2. Open a CDSW terminal and run setup.sh
  3. Create a Python Session and run tutorial.py
  4. When finished, run cleanup.sh in the terminal

Recommended Session Sizes:

Estimated Runtime:

Notes:

  1. Hail requires Java 8. If multiple versions of Java are installed on your system, set JAVA_HOME, PATH, etc. in the Project Settings' environment variables so that sessions pick up Java 8.
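
To confirm which Java version a session actually picks up, a small check along the following lines may help. This is only a sketch and not part of the original tutorial: the helper names `parse_java_major` and `current_java_major` are hypothetical, and it assumes the `java` executable found on PATH is the one Spark will use.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x, e.g. "1.8.0_181".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def current_java_major(java_cmd="java"):
    """Run `java -version` and return the major version, or None on failure."""
    try:
        result = subprocess.run(
            [java_cmd, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # The java executable was not found on PATH.
        return None
    # `java -version` writes its banner to stderr, not stdout.
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = current_java_major()
    if major == 8:
        print("Java 8 detected")
    else:
        print("Java 8 not detected (found: {}); check JAVA_HOME and PATH".format(major))
```

Running this in a Python session before tutorial.py should show whether the JAVA_HOME and PATH settings took effect.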

Recommended Jobs/Pipeline: None

Demo Script TBD

Related Content: Video (Internal Only!): https://cloudera.webex.com/cloudera/ldr.php?RCID=af7861670238dc884a134c59ce55049e
