boalang / NR

Detecting and correcting misclassified sequences in the large-scale public databases

Dataset: Non Redundant (NR) and CD-HIT clustering information


Boag: Boa for genomics

Boag is a domain-specific language and infrastructure on top of Hadoop for genomics data. Website: https://boalang.github.io/bio/

Boag example on the infrastructure: http://boa.cs.iastate.edu/examples/boag/index.php

Prerequisites

You need to have Java installed, because the Boag compiler is written in Java. Java can be downloaded here.
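Before trying the compiler, it can help to confirm that a working Java toolchain is present. The class below is a minimal sketch for that check only: the name `JavaCheck` is hypothetical and not part of Boag, and this README does not say which Java version Boag requires, so the program just reports the version it finds.

```java
public class JavaCheck {
    // Returns the version string of the JVM that is running this class.
    // "java.version" is a standard system property defined by the Java platform.
    static String runtimeVersion() {
        return System.getProperty("java.version");
    }

    public static void main(String[] args) {
        // Hypothetical helper, not part of Boag: prints the detected runtime version.
        System.out.println("Java runtime version: " + runtimeVersion());
    }
}
```

Saving this as `JavaCheck.java` and running `javac JavaCheck.java` followed by `java JavaCheck` exercises both the compiler and the runtime; if either command is not found, Java is not installed or not on the `PATH`.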

Run Boag

These instructions cover four ways to run Boag: from the command line, in a Jupyter notebook, in a Docker container, and on Hadoop. You can also set up a programming environment in Eclipse.

From Jupyter notebook

From command line

On a Docker container

On Hadoop

Boag Compiler source code

  • The Boag compiler is written in Java. See the source code.
  • A video with step-by-step instructions for setting up a programming environment in Eclipse for the Boa compiler is available: link

Boag Query Script examples:

Download dataset and VirtualBox

  • Google Drive Link
  • A web interface is also implemented on Ubuntu Linux and can be viewed in the VirtualBox image.
