# Deep-PICO

Experiments in deep (OK, shallow, but using embeddings) learning for PICO identification.

## Requirements

Python 2.7

Keras

    $ pip install keras

scikit-learn

    $ pip install -U scikit-learn

gensim

    $ pip install gensim
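
Both the CRF and neural-network scripts can use pretrained word embeddings (see the `--w2v` and `--wiki` options below). As a minimal sketch, here is how a binary word2vec model can be loaded with gensim; the file name is a placeholder for whichever pretrained vectors (e.g. PubMed + Wikipedia embeddings) you actually have:

    from gensim.models import KeyedVectors

    # Placeholder path: substitute the binary word2vec file you have
    # (e.g. embeddings trained on PubMed + Wikipedia).
    w2v = KeyedVectors.load_word2vec_format('pubmed_wiki_vectors.bin', binary=True)

    print(w2v['aspirin'][:5])           # first few dimensions of one word vector
    print(w2v.most_similar('aspirin'))  # nearest neighbours in embedding space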

theano

    $ pip install theano

nltk

    $ pip install nltk

geniatagger

    # go to http://www.nactem.ac.uk/GENIA/tagger/ and download the tagger
    $ tar xvzf geniatagger.tar.gz
    $ cd geniatagger
    $ make

    # install the Python wrapper
    $ pip install geniatagger-python
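
A quick way to check the tagger works, assuming the `geniatagger-python` wrapper exposes a `GeniaTagger` class whose `parse()` method returns (word, base form, POS, chunk, named entity) rows — check the wrapper's docs for your version. The path to the compiled tagger is a placeholder:

    from geniatagger import GeniaTagger

    # Placeholder path to the tagger you built with `make` above.
    tagger = GeniaTagger('./geniatagger/geniatagger')

    for word, base, pos, chunk, ne in tagger.parse('Aspirin reduced the risk of stroke.'):
        print(word, pos, chunk, ne)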

sklearn_crfsuite

    $ pip install sklearn_crfsuite

pycrfsuite

    $ pip install pycrfsuite

Installing tensorflow

    # Ubuntu/Linux 64-bit
    $ sudo apt-get install python-pip python-dev
    
    # Mac OS X
    $ sudo easy_install pip
    # Ubuntu/Linux 64-bit, CPU only:
    $ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl
    
    # Ubuntu/Linux 64-bit, GPU enabled:
    $ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl
    
    # Mac OS X, CPU only:
    $ sudo easy_install --upgrade six
    $ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.1-cp27-none-any.whl

## Usage

### Running the Conditional Random Field Model

    $ python crf.py 

### Command line arguments

    --w2v               # 1 or 0: whether to use word vectors as features
    --iters             # number of iterations to train for
    --l1                # L1 regularization term
    --l2                # L2 regularization term
    --wiki              # 1 or 0: whether to use the word vectors trained on Wikipedia and PubMed
    --shallow_parse     # 1 or 0: whether to use standard POS features
    --words_before      # number of words before each token to use as features
    --words_after       # number of words after each token to use as features
    --grid_search       # 1 or 0: whether to search for optimal hyperparameters with grid search

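These options feed token-level feature extraction for a `sklearn_crfsuite` model. A rough sketch of the idea (not the repo's exact feature set): each token gets a feature dict built from the token itself, its POS tag, and a window of neighbouring words, and a CRF is trained on sequences of those dicts. The example sentence and label scheme below are illustrative only; `c1`/`c2` correspond to the `--l1`/`--l2` options and `max_iterations` to `--iters`.

    import nltk                 # requires: nltk.download('averaged_perceptron_tagger')
    import sklearn_crfsuite

    def token_features(tokens, pos_tags, i, words_before=2, words_after=2):
        # The token itself, its POS tag, and a window of neighbouring words,
        # mirroring --shallow_parse, --words_before and --words_after.
        feats = {'word': tokens[i].lower(), 'pos': pos_tags[i]}
        for k in range(1, words_before + 1):
            if i - k >= 0:
                feats['word-%d' % k] = tokens[i - k].lower()
        for k in range(1, words_after + 1):
            if i + k < len(tokens):
                feats['word+%d' % k] = tokens[i + k].lower()
        return feats

    def sentence_features(tokens):
        pos_tags = [tag for _, tag in nltk.pos_tag(tokens)]
        return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

    # X: sentences as sequences of feature dicts; y: matching label sequences
    # (here a toy P/I/O tagging of a single sentence).
    X_train = [sentence_features('Aspirin reduced stroke risk in adults .'.split())]
    y_train = [['I', 'O', 'O', 'O', 'O', 'P', 'O']]

    crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X_train, y_train)
    print(crf.predict(X_train))
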
### Running the Convolutional or Feed-Forward Neural Network

To use the convolutional neural network or the standard feed-forward neural network:

    $ python GroupCNNExperiment.py  

### Command line arguments

    --window_size       # the number of words to use as features
    --wiki              # 1 or 0: use the word vectors trained on PubMed and Wikipedia
    --n_feature_maps    # the number of feature maps (CNN only)
    --epochs            # number of epochs to train the model for
    --undersample       # 1 or 0: whether to train the model with undersampling
    --criterion         # the loss function
    --optimizer         # the optimization algorithm
    --model             # nn or cnn: whether to use a convolutional or feed-forward neural network
    --genia             # 1 or 0: whether to use GENIA tagger features
    --tacc              # for personal use only, or if you have access to TACC for some reason
    --layers            # the sizes of the hidden layers, in the format <1,2,3,4>

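For reference, a minimal sketch of what the two model choices look like in Keras (written against the Keras 2 API; the repo itself targets an older Keras/Theano stack, and the window size, embedding dimension and class count here are placeholder values): a 1D convolution over a window of word embeddings for `cnn`, or a flattened window fed through a dense layer for `nn`, with `--criterion` and `--optimizer` passed straight to `compile()`.

    from keras.models import Sequential
    from keras.layers import Conv1D, Dense, Flatten, GlobalMaxPooling1D

    def build_model(model='cnn', window_size=5, emb_dim=200, n_feature_maps=100,
                    n_classes=4, criterion='categorical_crossentropy', optimizer='adam'):
        # Input: a window of word embeddings around each token,
        # shape (window_size, emb_dim). Output: one PICO class per token.
        net = Sequential()
        if model == 'cnn':
            net.add(Conv1D(n_feature_maps, 3, activation='relu',
                           input_shape=(window_size, emb_dim)))
            net.add(GlobalMaxPooling1D())
        else:  # plain feed-forward network
            net.add(Flatten(input_shape=(window_size, emb_dim)))
            net.add(Dense(128, activation='relu'))
        net.add(Dense(n_classes, activation='softmax'))
        net.compile(loss=criterion, optimizer=optimizer, metrics=['accuracy'])
        return net

    model = build_model(model='cnn')
    model.summary()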