SuLab / bioshovel

Biological text mining tool.

BioShovel

For text mining of biological research articles using DeepDive

Tong Shu Li and Sandip Chatterjee of the Su Lab

Getting Started

Requirements

  • Python 3.4+
  • Java 1.8 for various NER/NLP annotators
  • PBS/Torque cluster for cluster workflows
  • Python package dependencies in requirements.txt
    • lxml may require libxml2 to be installed separately (for example, with apt-get)
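
The requirements above can be partly checked before running anything. The following is a minimal sketch, not part of BioShovel: the function name `check_requirements` is hypothetical, and it only tests that a `java` executable is on the PATH (it does not verify the Java version) and that `lxml` is importable in the current environment.

```python
import importlib.util
import shutil
import sys


def check_requirements(min_python=(3, 4)):
    """Return a dict mapping each requirement name to True or False.

    Hypothetical helper for illustration; it does not check the Java
    version or the presence of a PBS/Torque cluster.
    """
    return {
        # Compare (major, minor) of the running interpreter with the minimum.
        "python": sys.version_info[:2] >= min_python,
        # True if some `java` executable can be found on the PATH.
        "java": shutil.which("java") is not None,
        # True if lxml can be located without actually importing it.
        "lxml": importlib.util.find_spec("lxml") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{:<8}{}".format(name, "ok" if ok else "MISSING"))
```

If `lxml` is reported as missing after installing the Python dependencies, the libxml2 system packages are a likely place to look.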

Installing DeepDive (v0.8)

  • See the DeepDive main page for the latest installation instructions
  • Run bash <(curl -fsSL git.io/getdeepdive)
  • Install DeepDive by selecting the corresponding option from the installer menu
  • Install PostgreSQL by selecting the corresponding option from the same menu

Ensuring pip3 and libxml2 are installed

  • On Ubuntu (14.04), run sudo apt-get install -y python3-pip
  • Install lxml dependencies using: sudo apt-get install -y libxml2 libxml2-dev libxslt1-dev lib32z1-dev

Installing Python dependencies

  • Clone the repo and change into it: cd bioshovel
  • Create a virtualenv: $ python3 -m venv venv
  • Activate virtualenv: $ source venv/bin/activate
  • Install dependencies: (venv) $ pip install -r requirements.txt

Run BioShovel modules

  • Modules should be run from the src directory
  • Use (venv) $ python3 -m [package_name].[module_name] [args]
  • See preprocess and downloaders packages for more information
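
To illustrate why modules are run from the src directory, here is a rough sketch of what `python3 -m [package_name].[module_name]` does, using the standard library's `runpy` module. The package `demo_pkg` and module `hello` are made up for the example and are not BioShovel modules; `run_as_module` is a hypothetical helper.

```python
import os
import runpy
import sys
import tempfile


def run_as_module(src_dir, dotted_name):
    """Roughly emulate `cd src_dir && python3 -m dotted_name`."""
    # With -m, the current directory is placed at the start of sys.path,
    # which is what makes packages under src importable.
    sys.path.insert(0, src_dir)
    try:
        # Execute the module as the main program and return its globals.
        return runpy.run_module(dotted_name, run_name="__main__", alter_sys=True)
    finally:
        sys.path.remove(src_dir)


def make_demo_package(root):
    """Create a throwaway package with one module; return its dotted name."""
    pkg = os.path.join(root, "demo_pkg")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "hello.py"), "w") as f:
        f.write("RESULT = 'ran as ' + __name__\n")
    return "demo_pkg.hello"


with tempfile.TemporaryDirectory() as src:
    name = make_demo_package(src)
    namespace = run_as_module(src, name)
    print(namespace["RESULT"])  # prints "ran as __main__"
```

Because the package is resolved relative to the directory on `sys.path`, running the same command from outside src would fail to find the package.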

Running Unit Tests

  • Tests should be run from the src directory
  • Run test discovery using python3 -m unittest
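
As a sketch of what test discovery picks up: by default, `python3 -m unittest` looks for files matching `test*.py` and runs the `unittest.TestCase` subclasses they contain. The file below (imagine it saved as `test_example.py`) is purely illustrative; `normalize_whitespace` is a hypothetical helper, not a BioShovel function.

```python
import unittest


def normalize_whitespace(text):
    """Hypothetical helper: collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


class NormalizeWhitespaceTest(unittest.TestCase):
    """Methods whose names start with `test` are collected and run."""

    def test_collapses_internal_runs(self):
        self.assertEqual(
            normalize_whitespace("BRCA1   binds\tBARD1"),
            "BRCA1 binds BARD1",
        )

    def test_strips_leading_and_trailing_whitespace(self):
        self.assertEqual(normalize_whitespace("  p53 \n"), "p53")
```

No `unittest.main()` call is needed in the file when tests are launched through discovery.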
