SHARE v2

SHARE is creating a free, open dataset of research (meta)data.

Technical Documentation

http://share-research.readthedocs.io/en/latest/index.html

On the OSF

https://osf.io/sdxvj/

Get involved

We'll be expanding this section in the near future, but beyond using our API for your own purposes, writing harvesters is a great way to get started. You can find a few that we have in our list here.
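To give a feel for what a harvester does, here is a simplified, hypothetical sketch of the general pattern: walk a provider's API over a date range and yield identified raw records. The class name `ExampleHarvester`, the injected `fetch_page` callable, and the `(identifier, record)` return shape are illustrative assumptions, not SHARE's actual base classes; see the existing harvesters in the repository for the real interface.

```python
from datetime import date
from typing import Callable, Dict, Iterator, List, Tuple

class ExampleHarvester:
    """Illustrative harvester for an imaginary 'example.org' source.

    Not SHARE's real base class; it only demonstrates the pattern of
    paging through a source and yielding (identifier, raw_record) pairs.
    """

    def __init__(self, fetch_page: Callable[[date, date, int], List[Dict]]):
        # fetch_page is injected so this sketch works offline;
        # a real harvester would call the provider's HTTP API here.
        self.fetch_page = fetch_page

    def do_harvest(self, start: date, end: date) -> Iterator[Tuple[str, Dict]]:
        page = 1
        while True:
            records = self.fetch_page(start, end, page)
            if not records:
                break  # no more pages for this date range
            for record in records:
                yield record['id'], record
            page += 1
```

A harvester written this way can be exercised against canned data before pointing it at a live provider.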

Setup for testing

It is useful to set up a virtual environment, both to ensure Python 3 is your designated version of Python and to keep this project's requirements isolated.

mkvirtualenv share -p `which python3.5`
workon share

Once in the share virtual environment, install the necessary requirements, then set up SHARE.

pip install -Ur requirements.txt
python setup.py develop
pyenv rehash  # Only necessary when using pyenv to manage virtual environments

Using docker-compose assumes Docker is installed and running. Running ./bootstrap.sh will create and provision the database. If any SHARE containers are running, stop them with docker-compose stop before bootstrapping.

docker-compose build web
docker-compose run --rm web ./bootstrap.sh

Run

Run the API server

# In docker
docker-compose up -d web

# Locally
sharectl server

Set up Elasticsearch

sharectl search setup

Run Celery

# In docker
docker-compose up -d worker

# Locally
sharectl worker -B

Populate with data

This is particularly useful when running ember-share, a web interface for SHARE.

Harvest data from providers, for example:

sharectl harvest com.nature
sharectl harvest com.peerj.preprints

# Harvests may be scheduled to run asynchronously using the schedule command
sharectl schedule org.biorxiv.html

# Some sources provide thousands of records per day
# --limit can be used to set a maximum number of records to gather
sharectl harvest org.crossref --limit 250

If the Celery worker is running, new data will be indexed automatically every couple of minutes.

Alternatively, data may be indexed explicitly using sharectl:

sharectl search
# Forcefully re-index all data
sharectl search --all

Building docs

cd docs/
pip install -r requirements.txt
make watch

Running Tests

Unit test suite

py.test

BDD Suite

behave

License

Apache License 2.0

