ebernhardson / analytics-integration

Integration testing environment for search platform analytics deployment.

= Overview

The integration environment is a docker-compose project that ties together
a variety of docker containers to provide many of the services the analytics
deployment typically interacts with. Notably, this is *two* separate docker-compose
environments, one for bigtop+airflow and one for everything else, because
bigtop expects to find only bigtop containers in its config.

The build is organized roughly as:

build.sh -> creates hadoop+airflow with etc/bigtop/docker-hadoop.sh
            *  This uses Dockerfile-bigtop_airflow
         -> after creation, runs bin/init-hive-state.sh to create dbs/tables/etc.

Dockerfile-bigtop_airflow -> Extended version of hadoop container that
  also contains our airflow installation.

docker-hadoop.sh -> creates running hadoop container from etc/bigtop

== Dependencies

* Docker
* Plenty of disk space
* ???

== Running

One command should build and start hadoop + airflow:

  ./build.sh --create

On success the final output will be:

  ********************
  
  Provisioning of hive state complete!
  
  ********************
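
For scripted use, the banner above can be checked for automatically. This is a
minimal sketch, assuming the build output has been captured to a log file; the
`build.log` name and the `provisioning_succeeded` helper are hypothetical and not
part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) only if the captured build
# output contains the banner printed once hive state has been provisioned.
provisioning_succeeded() {
    local log_file="$1"
    grep -q 'Provisioning of hive state complete!' "$log_file"
}

# Example usage (assumes build.sh reports progress on stdout/stderr):
#   ./build.sh --create 2>&1 | tee build.log
#   provisioning_succeeded build.log && echo "environment ready"
```

Matching on the message text rather than the rows of asterisks keeps the check
independent of the banner's exact decoration.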

=== Test invocations of airflow tasks

Airflow will be available at http://localhost:7887/. Individual airflow tasks
can be invoked as

  ./build.sh --airflow-test <dag_id> <task_id> 2001-1-15T19:00
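
When several tasks of one DAG need to be exercised for the same execution date,
the invocation above can be wrapped in a loop. This is a sketch only: the
`airflow_test_tasks` function and the `BUILD_SH` override are hypothetical, and
the DAG and task names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Command used to reach the environment; override for a dry run,
# e.g. BUILD_SH=echo prints the arguments instead of running anything.
BUILD_SH="${BUILD_SH:-./build.sh}"

# Hypothetical wrapper: run the given tasks of one DAG, in the order listed,
# for a single execution date. Stops at the first task that fails.
airflow_test_tasks() {
    local dag_id="$1" execution_date="$2"
    shift 2
    local task_id
    for task_id in "$@"; do
        "$BUILD_SH" --airflow-test "$dag_id" "$task_id" "$execution_date" || return 1
    done
}

# Example (placeholder names):
#   airflow_test_tasks my_dag 2001-1-15T19:00 first_task second_task
```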

To start the bigtop hadoop instance by hand, run docker-hadoop.sh from the root of the
integration repository. Running it from there forces it to use our docker-compose.yml
file and ensures everything is kept together.

  ./docker-hadoop.sh -C config_debian-9.yaml --stack "hadoop,hive,kafka" --create 1


=== Airflow Scheduler

By default, the airflow scheduler is not enabled. To enable it, run the
following after building the environment:

  ./build.sh --exec systemctl enable airflow-scheduler
  ./build.sh --exec systemctl start airflow-scheduler
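
To confirm that the scheduler came up, its unit state can be queried the same
way. This sketch assumes that `./build.sh --exec` passes the command's standard
output back to the caller, which has not been verified here; the
`scheduler_is_active` helper and the `BUILD_SH` override are hypothetical.

```shell
#!/usr/bin/env bash
# Command used to reach the environment; may be overridden for testing.
BUILD_SH="${BUILD_SH:-./build.sh}"

# Hypothetical helper: succeeds only if systemd reports the airflow-scheduler
# unit as "active". Assumes --exec forwards the command's stdout unchanged.
scheduler_is_active() {
    local state
    state=$("$BUILD_SH" --exec systemctl is-active airflow-scheduler 2>/dev/null)
    [ "$state" = "active" ]
}

# Example usage:
#   scheduler_is_active && echo "scheduler running" || echo "scheduler not running"
```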


=== Random notes

* test data dates from https://en.wikipedia.org/w/index.php?oldid=908493298





### OLD STUFF

# Init swift authentication

curl -v -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' http://localhost:8808/auth/v1.0
curl -v -H 'X-Auth-Token: <token>' <x-auth-url>
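
The two requests above can be chained without copying values by hand. Swift's
v1.0 auth endpoint returns the token and storage URL as the `X-Auth-Token` and
`X-Storage-Url` response headers. The sketch below assumes that `<x-auth-url>`
refers to that storage URL; the `header_value` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first response header whose
# name matches $1 (case-insensitive), reading raw HTTP headers from stdin.
header_value() {
    local name="$1"
    tr -d '\r' | awk -v name="$name" '
        BEGIN { name = tolower(name) }
        {
            i = index($0, ":")
            if (!found && i > 0 && tolower(substr($0, 1, i - 1)) == name) {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)   # drop whitespace after the colon
                print value
                found = 1
            }
        }'
}

# Example usage against the local swift container:
#   headers=$(curl -s -D - -o /dev/null \
#       -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' \
#       http://localhost:8808/auth/v1.0)
#   token=$(printf '%s\n' "$headers" | header_value X-Auth-Token)
#   storage_url=$(printf '%s\n' "$headers" | header_value X-Storage-Url)
#   curl -v -H "X-Auth-Token: $token" "$storage_url"
```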

# Perform swift upload for glent
# * swift_upload.py hacked at SwiftHelper.prefix_uri to replace localhost:8808 with swift01:8080 so it can run outside a container
# * venv runs python3, with docopt and python_swiftclient installed via pip
# * TODO: Write container?
PATH=$PWD/venv/bin:$PATH venv/bin/python swift_upload.py --event-service-url=http://localhost:8192/v1/events --swift-overwrite=true etc/swift_auth.env search_glent upload_test/20190727/


# Perform popularity_score upload
# * same hacked swift_upload.py from above
# * same venv from above

# Create indices to import into
for wiki in enwiki eswiki; do curl -XPUT localhost:9200/${wiki}_content -H 'Content-Type: application/json' -d '{"mappings":{"page":{"properties":{"popularity_score": {"type": "float"}}}}}'; done

PATH=$PWD/venv/bin:$PATH venv/bin/python swift_upload.py --event-service-url=http://localhost:8192/v1/events --swift-overwrite=true etc/swift_auth.env search_popularity_score --event-per-object=true upload_test/popularity_score/20190727/


