Scraper

Scrape experiment data off of MLab nodes and upload it to the ETL pipeline.

Development Requirements

All tests are run inside Docker containers, so the main requirement is that you are able to build and run Docker containers. The ./pre-commit.sh script runs the tests, and the ./prepare-commit-msg.sh script generates a summary of the current state of test coverage and code health. I recommend following the instructions in the comments of each of those files and making symlinks so that the scripts run automatically on every commit. Because the containers and tests are (or should be) exactly the same as the ones Travis runs, local testing should pass if and only if the Travis CI tests pass.
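
For example, assuming the scripts live at the repository root, the symlinks might look like this (git resolves hook symlinks relative to .git/hooks, hence the ../../ prefix; see the comments in each script for the authoritative instructions):

ln -s ../../pre-commit.sh .git/hooks/pre-commit
ln -s ../../prepare-commit-msg.sh .git/hooks/prepare-commit-msg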

Building and running

To build and push the image to GCR and deploy to production, type

./deploy.sh production

To build and push the image to GCR and deploy to staging, type

./deploy.sh staging
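
To confirm that a push succeeded, something like the following should list the uploaded tags (the registry path here is an assumption; check deploy.sh for the actual image name):

gcloud container images list-tags gcr.io/mlab-oti/scraper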

The rest of this doc describes how to run the image locally and how to set up a cluster from scratch.

To run the image locally, try:

sudo docker build . -t scraper && \
  sudo docker run -it -p 9090:9090 \
    -e RSYNC_MODULE=ndt \
    -e RSYNC_HOST=ndt.iupui.mlab1.yyz01.measurement-lab.org \
    scraper
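
The -p 9090:9090 mapping suggests the container serves Prometheus-style metrics on port 9090; if so, a running container can be sanity-checked with:

curl http://localhost:9090/metrics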

If you would like to run things on your own cluster, you will need your own cluster! I created the staging cluster with the following commands:

gcloud container \
  --project "mlab-oti" clusters create "scraper-cluster" \
  --zone "us-central1-a" \
  --machine-type "n1-standard-1" \
  --image-type "GCI" \
  --disk-size "40" \
  --scopes "https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/spreadsheets" \
  --num-nodes "200" \
  --network "default" \
  --enable-cloud-logging \
  --node-labels=scraper-node=true \
  --no-enable-cloud-monitoring

gcloud --project=mlab-sandbox container node-pools create prometheus-pool \
  --cluster=scraper-cluster \
  --num-nodes=2 \
  --node-labels=prometheus-node=true \
  --machine-type=n1-standard-8
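
Once both pools exist, you can point kubectl at the new cluster and confirm that the node labels are in place (a sketch; use whichever project and zone you passed to the commands above):

gcloud container clusters get-credentials scraper-cluster \
  --project=mlab-sandbox --zone=us-central1-a
kubectl get nodes -l scraper-node=true
kubectl get nodes -l prometheus-node=true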

License: Apache License 2.0