Integration testing environment for search platform analytics deployment. = Overview The integration environment is a docker-compose project that ties together a variety of docker containers to provide many of the services the analytics deployment typically interacts with. Notably this is *two* separate docker-compose environments, one for bigtop+airflow and one for everything else. This is because bigtop expects to only find bigtop containers in it's config. The build is organized roughly as: build.sh -> creates hadoop+airflow with etc/bigtop/docker-hadoop.sh * This uses Dockerfile-bigtop_airflow -> after creation bin/init-hive-state.sh to run create dbs/tables/etc. Dockerfile-bigtop_airflow -> Extended version of hadoop container that also contains our airflow installation. docker-hadoop.sh -> creates running hadoop container from etc/bigtop == Dependencies * Docker * Plenty of disk space * ??? == Running One command should build and start hadoop + airflow: ./build.sh --create On success the final output will be: ******************** Provisioning of hive state complete! ******************** === Test invocations of airflow tasks Airflow will be available at http://localhost:7887/. Individual airflow tasks can be invoked as ./build.sh --airflow-test <dag_id> <task_id> 2001-1-15T19:00 Start the bigtop hadoop instance. This script must be run from the root of the integration repository, as that will force it to use our docker-compose.yml file and ensure everything is kept together. ./docker-hadoop.sh -C config_debian-9.yaml --stack "hadoop,hive,kafka" --create 1 === Airflow Scheduler By default the airflow scheduler is not enabled. If you wish to enable it run the following after building the environment: ./build.sh --exec systemctl enable airflow-scheduler ./build.sh --exec systemctl start airflow-scheduler === Random notes * test data dates from https://en.wikipedia.org/w/index.php?oldid=908493298 ### OLD STUFF # Init swift authentication curl -v -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' http://localhost:8808/auth/v1.0 curl -v -H 'X-Auth-Token: <token>' <x-auth-url> # Perform swift upload for glent # * swift_upload.py hacked at SwiftHelper.prefix_uri to replace localhost:8808 with swift01:8080 so it can run outside a container # * venv runs python3, with docopt and python_swiftclient installed via pip # * TODO: Write container? PATH=$PWD/venv/bin:$PATH venv/bin/python swift_upload.py --event-service-url=http://localhost:8192/v1/events --swift-overwrite=true etc/swift_auth.env search_glent upload_test/20190727/ # Perform popularity_score upload # * same hacked swift_upload.py from above # * same venv from above # Create indices to import into for wiki in enwiki eswiki; do curl -XPUT localhost:9200/${wiki}_content -H 'Content-Type: application/json' -d '{"mappings":{"page":{"properties":{"popularity_score": {"type": "float"}}}}}'; done PATH=$PWD/venv/bin:$PATH venv/bin/python swift_upload.py --event-service-url=http://localhost:8192/v1/events --swift-overwrite=true etc/swift_auth.env search_popularity_score --event-per-object=true upload_test/popularity_score/20190727/