M5NRv2 -- non-redundant protein and rRNA database integration

myM5NR

local version of M5NR

Installation with Docker

To build this image:

git clone https://github.com/MG-RAST/myM5NR.git

There are separate Dockerfiles for the available actions: download, parse, build, and upload. They can be built with the following command:

docker build -t mgrast/m5nr .

Examples for manual invocation:

docker run -t -d --name m5nr -v /var/tmp/m5nr:/m5nr_data mgrast/m5nr bash

From here on, the steps are executed inside the container.

Set up the data directories:

mkdir -p /m5nr_data/Sources
mkdir -p /m5nr_data/Parsed
mkdir -p /m5nr_data/Build
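
The three mkdir calls above can also be written as a small loop. This is a minimal sketch, assuming the data root might differ from /m5nr_data on some hosts; the function name make_m5nr_dirs and its optional argument are hypothetical and not part of myM5NR.

```shell
# Hypothetical helper, not shipped with myM5NR.
# Creates Sources, Parsed and Build under a data root (default: /m5nr_data).
make_m5nr_dirs() {
    root="${1:-/m5nr_data}"
    for d in Sources Parsed Build; do
        # Stop at the first directory that cannot be created.
        mkdir -p "$root/$d" || return 1
    done
}
```

Calling make_m5nr_dirs with no argument is equivalent to the three commands above.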

To start the download (pass --force to delete old _part directories):

cd /m5nr_data
/myM5NR/bin/m5nr_compiler.py download --debug 2>&1 | tee /m5nr_data/Sources/logfile.txt

To start the parsing (work in progress):

cd /m5nr_data
/myM5NR/bin/m5nr_compiler.py parse --debug 2>&1 | tee /m5nr_data/Parsed/logfile.txt
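
The download and parse commands above pipe their output into tee, so by default the shell reports the exit status of tee rather than that of m5nr_compiler.py. If a script needs to react to a failed step, one option is to redirect into the log file instead. The sketch below is only an illustration: run_action, M5NR_COMPILER and M5NR_DATA are hypothetical names, and it assumes that m5nr_compiler.py exits with a non-zero status on failure, which the text above does not state.

```shell
# Hypothetical wrapper, not part of myM5NR.
# Assumption: m5nr_compiler.py returns a non-zero status when an action fails.
M5NR_COMPILER="${M5NR_COMPILER:-/myM5NR/bin/m5nr_compiler.py}"
M5NR_DATA="${M5NR_DATA:-/m5nr_data}"

run_action() {
    action=$1    # download or parse
    logdir=$2    # Sources or Parsed, matching the log paths used above
    logfile="$M5NR_DATA/$logdir/logfile.txt"
    # Redirect instead of piping into tee, so the status tested here is
    # the compiler's own status and not tee's.
    if (cd "$M5NR_DATA" && "$M5NR_COMPILER" "$action" --debug) > "$logfile" 2>&1; then
        echo "$action finished, log in $logfile"
    else
        status=$?
        echo "$action failed with status $status, see $logfile" >&2
        return "$status"
    fi
}
```

With this in place, run_action download Sources && run_action parse Parsed would only start parsing after a successful download. The trade-off is that the output is no longer shown live on the terminal; tail -f on the log file gives a similar view.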

To view the status:

cd /m5nr_data
/myM5NR/bin/m5nr_compiler.py status --debug

To run a full build round with the automated wrapper script:

docker exec m5nr m5nr_master.sh -a download
docker exec m5nr m5nr_master.sh -a parse
docker exec m5nr m5nr_master.sh -a build -v <m5nr version #>
docker exec m5nr m5nr_master.sh -a upload -v <m5nr version #> -t <shock token>
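
The four wrapper calls above can be chained so that a later step only starts when the previous one succeeded. This is a rough sketch under stated assumptions: run_full_round and the DOCKER variable are hypothetical, and it assumes that m5nr_master.sh signals failure through a non-zero exit status.

```shell
# Hypothetical driver, not part of myM5NR.
# Assumption: m5nr_master.sh exits non-zero when an action fails.
DOCKER="${DOCKER:-docker}"

run_full_round() {
    version=$1   # value for the <m5nr version #> placeholder
    token=$2     # value for the <shock token> placeholder
    if [ -z "$version" ] || [ -z "$token" ]; then
        echo "usage: run_full_round <m5nr version #> <shock token>" >&2
        return 2
    fi
    # Each step runs only if the one before it returned success.
    "$DOCKER" exec m5nr m5nr_master.sh -a download &&
    "$DOCKER" exec m5nr m5nr_master.sh -a parse &&
    "$DOCKER" exec m5nr m5nr_master.sh -a build -v "$version" &&
    "$DOCKER" exec m5nr m5nr_master.sh -a upload -v "$version" -t "$token"
}
```

Setting DOCKER=echo before calling run_full_round prints the commands without running them, which is a cheap way to check the arguments first.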

To load the build data onto the Solr server, run the following on the same host:

docker exec m5nr docker_setup.sh
docker exec m5nr solr_load.sh -n -i <shock file download url> -v <m5nr version #> -s <solr url>

To load the build data onto the Cassandra cluster, run the following:

docker exec m5nr cassandra_load.py -n -i <shock file download url> -v <m5nr version #> -t <shock token>

To check the table sizes in Cassandra for a new m5nr build:

CQLSH="/usr/bin/cqlsh --request-timeout 600 --connect-timeout 600"
for T in $(docker exec cassandra-simple $CQLSH -e "USE m5nr_v12; describe tables;"); do
    echo "$T"
    docker exec cassandra-simple $CQLSH -e "USE m5nr_v12; CONSISTENCY QUORUM; SELECT COUNT(*) FROM $T;"
done
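
The loop above hardcodes the keyspace m5nr_v12. For other builds the same check could be wrapped in a function that takes the version number. This is a sketch only: count_m5nr_tables, DOCKER and CASSANDRA_CONTAINER are hypothetical names, and the naming pattern m5nr_v<version> is an assumption inferred from the single keyspace shown above.

```shell
# Hypothetical helper, not part of myM5NR.
DOCKER="${DOCKER:-docker}"
CASSANDRA_CONTAINER="${CASSANDRA_CONTAINER:-cassandra-simple}"
CQLSH="/usr/bin/cqlsh --request-timeout 600 --connect-timeout 600"

count_m5nr_tables() {
    # Assumption: keyspaces are named m5nr_v<version>, as in m5nr_v12.
    keyspace="m5nr_v$1"
    # $CQLSH is left unquoted on purpose so its options split into words.
    tables=$("$DOCKER" exec "$CASSANDRA_CONTAINER" $CQLSH \
        -e "USE $keyspace; describe tables;") || return 1
    for t in $tables; do
        echo "$t"
        "$DOCKER" exec "$CASSANDRA_CONTAINER" $CQLSH \
            -e "USE $keyspace; CONSISTENCY QUORUM; SELECT COUNT(*) FROM $t;" || return 1
    done
}
```

For the build shown above, the call would be count_m5nr_tables 12.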


License: BSD 2-Clause "Simplified" License

