boalang / NR_Dataset

A Cyberinfrastructure to Analyze Large-Scale Genome Data

Home Page:http://boa.cs.iastate.edu/boag/index.php

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A Cyberinfrastructure to Analyze Large-Scale Genome Data

BoaG: Boa for Genomics

BoaG is a domain-specific language and infrastructure on top of Hadoop for genomics data. The infrastructure is publicly available here: http://boa.cs.iastate.edu/boag/index.php

BoaG example on the infrastructure:

http://boa.cs.iastate.edu/examples/boag/index.php

Dataset: Non Redundant (NR) and CD-HIT clustering information

Protobuffer schema and the step by step data generation is shown here.

Interpreting BoaG Output

BoaG can be integrated with jupyter notebook. The output can easily be further analyzed with R, Python, Bash, Perl, etc. The most time-consuming part of the analysis is done via the BoaG infrastructure. More examples can be found in the jupyter notebooks folder.

BoaG Compiler source code

BoaG compiler is written in Java and the source code is available here

  • This is a video on step by step instructions to set up programming environment on Eclipse for Boa compiler. link

About

A Cyberinfrastructure to Analyze Large-Scale Genome Data

http://boa.cs.iastate.edu/boag/index.php


Languages

Language:Jupyter Notebook 60.4%Language:Java 38.7%Language:GAP 0.7%Language:Shell 0.2%Language:Dockerfile 0.0%Language:Batchfile 0.0%