ranjanmanish / airflow-local

An Ubuntu (14.04) Vagrant Virtual Machine (VM) with Airflow, a data workflow management system from Airbnb

airflow-local

About

This project provides an Ubuntu (14.04) Vagrant Virtual Machine (VM) with Airflow, a data workflow management system from Airbnb.

Puppet scripts install the software automatically when the VM is started.

Connect to the VM

  1. To start the virtual machine (VM), type

    vagrant up
    
  2. Connect to the VM

    vagrant ssh
    

Initialize Airflow

  1. Set up the home directory

    export AIRFLOW_HOME=~/airflow
    
  2. Initialize the SQLite database

    airflow initdb
    
  3. Start the web server

    airflow webserver -p 8080
    
  4. Open a web browser to the UI at http://192.168.33.10:8080
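Before opening the browser, it can help to confirm that the web server is answering. The following is a minimal sketch that uses only the Python standard library; `is_reachable` and `wait_for_ui` are hypothetical helper names, and the address is the one given in step 4 (adjust it if your Vagrantfile assigns a different IP).

```python
import time
import urllib.error
import urllib.request

# Address from step 4 above; an assumption if your Vagrantfile differs.
AIRFLOW_UI_URL = "http://192.168.33.10:8080"


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP request to `url` gets any response from a server."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, host unreachable, and similar.
        return False


def wait_for_ui(url, attempts=10, delay=2.0):
    """Hypothetical helper: poll `url` until it answers or attempts run out."""
    for _ in range(attempts):
        if is_reachable(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_ui(AIRFLOW_UI_URL)` from the host machine returns `True` once the server started in step 3 responds, and `False` if it never does within the given attempts.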

Run a task

  1. List DAGs

    airflow list_dags
    
  2. List tasks for the example_bash_operator DAG

    airflow list_tasks example_bash_operator
    
  3. List tasks for example_bash_operator in a tree view

    airflow list_tasks example_bash_operator -t
    
  4. Run the runme_0 task of the example_bash_operator DAG with today's date

    airflow run example_bash_operator runme_0 `date +%Y-%m-%d`
    
  5. Backfill a DAG

    export START_DATE=$(date -d "-2 days" "+%Y-%m-%d")
    airflow backfill -s $START_DATE example_bash_operator
    
  6. Clear the history of DAG runs

    airflow clear example_bash_operator
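The backfill step relies on `date -d "-2 days"`, which is a GNU `date` option: it works inside the Ubuntu VM but not with the BSD `date` shipped on macOS. As a rough, portable alternative, the start date can be computed in Python. This is only a sketch; `backfill_command` is a hypothetical helper, and the `-s` flag is the one already used in step 5.

```python
from datetime import date, timedelta


def backfill_command(dag_id, days_back=2, today=None):
    """Build the `airflow backfill` argument list from step 5 (hypothetical helper)."""
    today = today or date.today()
    start_date = today - timedelta(days=days_back)
    # isoformat() yields YYYY-MM-DD, the same format as `date +%Y-%m-%d`.
    return ["airflow", "backfill", "-s", start_date.isoformat(), dag_id]
```

Inside the VM, the resulting list can be passed to `subprocess.check_call(backfill_command("example_bash_operator"))` to run the same backfill as the shell commands above.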
    

Add a new task

  1. Go to the Airflow config directory

    cd ~/airflow
    
  2. Set the Airflow DAGs directory in airflow.cfg by changing the dags_folder line to:

    dags_folder = /vagrant/airflow/dags
    
  3. Restart the web server

    airflow webserver -p 8080
    

Documentation

  1. Main documentation

  2. Videos on Airflow

  3. Slides

  4. Airflow reviews

  5. Airflow tips and tricks

Disable logging

  1. Change to the airflow directory

    cd /vagrant/airflow
    
  2. Set the Airflow environment

    source set_airflow_env.sh
    
  3. Run airflow without any logging messages

Set up the Airflow DAGs directory

  1. Edit the file ~/airflow/airflow.cfg

  2. Set the following:

    dags_folder = /vagrant/airflow/dags
    load_examples = False
    
  3. Start the scheduler by running the following

    airflow scheduler
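The two settings in step 2 can also be applied with a small script instead of a manual edit. The sketch below uses Python's standard `configparser` module and assumes that both options live in the `[core]` section of airflow.cfg; `set_dags_folder` is a hypothetical helper name. Because `configparser` discards comments when it rewrites a file, keep a backup of airflow.cfg if you want to preserve them.

```python
import configparser


def set_dags_folder(cfg_path, dags_folder, load_examples=False):
    """Set dags_folder and load_examples in airflow.cfg (assumed [core] section)."""
    # interpolation=None leaves '%' characters in other values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    if not parser.has_section("core"):
        parser.add_section("core")
    parser.set("core", "dags_folder", dags_folder)
    parser.set("core", "load_examples", str(load_examples))
    # Rewriting the file drops any comments it contained.
    with open(cfg_path, "w") as handle:
        parser.write(handle)
```

For the setup described here, the call would be `set_dags_folder(os.path.expanduser("~/airflow/airflow.cfg"), "/vagrant/airflow/dags")` (with `import os`), followed by restarting the web server and scheduler so the new values are picked up.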
    

Requirements

The following software is needed to get the project from GitHub and run Vagrant to set up the Python development environment. The Git environment also provides an SSH client for Windows.

How to copy the local files from laptop to Vagrant
