blekhmanlab / biorxiv_countries

Code repository for "International authorship and collaboration in bioRxiv preprints"

Home Page: https://www.biorxiv.org/content/10.1101/2020.04.25.060756v1

rxivist spider

Running the spider for real

The web crawler runs in a lightly customized Docker container and can be launched from any server (or workstation) that has access to the database.

# start ROR API:
git clone https://github.com/ror-community/ror-api.git
cd ror-api
docker-compose up -d
docker-compose exec web python manage.py setup

# start database:
cd ..
git clone https://github.com/blekhmanlab/biorxiv_countries.git
cd biorxiv_countries/code/db
docker build . -t local_rxdbthing:latest
docker-compose up

# connect ROR API and the database:
cd ..
docker network connect ror-api_default authordb

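# launch spider (run the cd from the directory that contains the biorxiv_countries clone;
# if the previous step already left you in biorxiv_countries/code, skip it):
cd biorxiv_countries/code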
docker build . -t countryspider:latest
docker run -it --rm --name localspider -v "$(pwd)":/app --entrypoint "bash" --env RX_DBHOST --env RX_DBPASSWORD --env RX_DBUSER --net ror-api_default countryspider:latest
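
The `docker run` command above names `RX_DBHOST`, `RX_DBPASSWORD`, and `RX_DBUSER` with `--env` but gives them no values. In that form Docker copies each variable from the calling shell, so they need to be exported before the container is launched. A minimal pre-flight sketch follows. All three values are placeholders and not taken from the repository: the host name `authordb` is only borrowed from the network step above, and the user and password are hypothetical.

```shell
# Placeholder values (assumptions); replace them with your own database settings.
export RX_DBHOST="authordb"
export RX_DBUSER="your_db_user"
export RX_DBPASSWORD="your_db_password"

# Check that every variable forwarded by `docker run --env NAME` is non-empty.
missing=0
for name in RX_DBHOST RX_DBUSER RX_DBPASSWORD; do
  # Indirect lookup: read the value of the variable whose name is in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "missing: $name" >&2
    missing=1
  fi
done
echo "missing=$missing"
```

If the check reports `missing=1`, the corresponding variable would also be unset inside the container, and the spider would likely fail to reach the database.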

License: GNU Affero General Public License v3.0


Languages: TSQL 78.4%, Python 20.5%, Dockerfile 0.7%, Shell 0.3%