jaywonder20 / apache_airflow_basics

This is a simple demonstration of Apache Airflow hosted on Heroku. This project implements a simple DAG that fetches the top questions from Stack Overflow tagged airflow and forwards them to a specified email address. The DAG is set to run daily. Check out my article at https://medium.com/analytics-vidhya/apache-airflow-what-it-is-and-why-you-should-start-using-it-c6334090265d

Home Page: https://airflow-stackoverflow.herokuapp.com/


Apache Airflow instance on Heroku

This is a simple demonstration of Apache Airflow hosted on Heroku.

This project implements a simple DAG that fetches the top questions from Stack Overflow with the tag "airflow" and forwards them to a specified email address.

Admittedly, this is over-engineered: the same job could be done with a cron job or a plain .py script. It is a simple project I used to learn Apache Airflow.
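For comparison, here is a minimal sketch of what the plain .py script version might look like, using only the standard library. It is not the code in this repo's DAG: the Stack Exchange API endpoint and its query parameters are the public ones, but the function names, the page size, and the email subject are assumptions made for illustration.

```python
import gzip
import html
import json
import urllib.parse
import urllib.request
from email.message import EmailMessage

# Public Stack Exchange API endpoint for questions.
API_URL = "https://api.stackexchange.com/2.3/questions"


def build_url(tag="airflow", page_size=5):
    """Build the API URL for the top-voted questions carrying a tag."""
    query = urllib.parse.urlencode({
        "order": "desc",
        "sort": "votes",
        "tagged": tag,
        "site": "stackoverflow",
        "pagesize": page_size,
    })
    return f"{API_URL}?{query}"


def fetch_questions(url):
    """Download the questions (needs network access)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # The API compresses its responses; urllib does not decompress them.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)["items"]


def format_body(questions):
    """Turn question dicts (title, score, link) into a plain-text email body."""
    lines = []
    for q in questions:
        # Titles arrive HTML-escaped, so unescape them for plain text.
        title = html.unescape(q["title"])
        lines.append(f"{title} ({q['score']} votes)\n{q['link']}")
    return "\n\n".join(lines)


def build_message(body, sender, recipient):
    """Wrap the body in an email message (subject line is hypothetical)."""
    message = EmailMessage()
    message["Subject"] = "Top airflow questions on Stack Overflow"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)
    return message
```

Calling `format_body(fetch_questions(build_url()))` would produce the text to mail; sending it with `smtplib` and scheduling it with cron is left out of the sketch.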


To get started, a basic knowledge of Apache Airflow, the Heroku CLI, AWS S3 buckets, and Python is required.

Step 1

  • Option 1

    • 🍴 Fork this repo!
  • Option 2

    • 👯 Clone this repo to your local machine using https://github.com/jaywonder20/apache_airflow_basics.git

Step 2

  • Create a Heroku app and add the PostgreSQL add-on 🔨🔨🔨

Necessary configuration for the Heroku app

Set the following from the Heroku CLI:
heroku config:set AIRFLOW_HOME=/app

Set environment variables

Set AIRFLOW__CORE__SQL_ALCHEMY_CONN in .profile to your PostgreSQL connection string.

Heroku will automatically export .profile to the environment on dyno startup. This way, if or when your DB URL changes, the setting updates automatically.

  • NB: To prevent errors during configuration, change "dags_folder" in the airflow.cfg file to a non-existent folder, since the Airflow instance is not configured yet
  • Push the app to Heroku
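The variable name follows Airflow's convention for overriding airflow.cfg from the environment: AIRFLOW__{SECTION}__{KEY}, in upper case. The sketch below shows that mapping in Python. It assumes the connection string comes from DATABASE_URL, the variable the Heroku Postgres add-on normally sets; the helper names are hypothetical and not part of this repo.

```python
def airflow_env_name(section, key):
    """Return the environment variable that overrides [section] key in airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def export_db_conn(environ):
    """Copy DATABASE_URL (assumed to be set by Heroku) into Airflow's variable.

    `environ` is any mutable mapping, for example os.environ.
    """
    name = airflow_env_name("core", "sql_alchemy_conn")
    environ[name] = environ["DATABASE_URL"]
    return name
```

Because the value is read from the environment each time the dyno starts, a rotated database URL is picked up without editing any file.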

Step 3

Now some configuration

Configure the following in the airflow.cfg file:
sql_alchemy_conn = postgres db uri
smtp_user = xxxxx@gmail.com
smtp_password = password
smtp_port = 587
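Before pushing, it can help to confirm that these settings are actually filled in. The following is a minimal sketch, assuming the usual layout in which sql_alchemy_conn sits under [core] and the smtp_* keys under [smtp]; the function name and the REQUIRED table are illustrative only.

```python
import configparser

# Settings this walkthrough expects, grouped by airflow.cfg section.
REQUIRED = {
    "core": ["sql_alchemy_conn"],
    "smtp": ["smtp_user", "smtp_password", "smtp_port"],
}


def missing_settings(cfg_text):
    """Return "section.key" names that are absent or empty in the config text."""
    # Interpolation is disabled so "%" characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            value = parser.get(section, key, fallback="").strip()
            if not value:
                missing.append(f"{section}.{key}")
    return missing
```

Passing the contents of airflow.cfg, for example `missing_settings(open("airflow.cfg").read())`, returns an empty list when everything is set.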

Step 4

Create an S3 bucket and get the access key: https://preventdirectaccess.com/docs/amazon-s3-quick-start-guide/

Step 5

Set the following connection parameters:


Step 6

  • Create a Stack Overflow app
  • Set the parameters in the variables.json file
  • Import the variables.json file into Variables from the Airflow UI
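The Airflow UI imports variables from a JSON file containing a single object that maps variable names to values. Below is a small sketch for writing and sanity-checking such a file. The key names used in the usage note are hypothetical; use whatever names the variables.json in this repo and the DAG actually expect.

```python
import json


def write_variables(path, variables):
    """Write a name-to-value mapping as a JSON file for the Airflow UI import."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(variables, handle, indent=2)


def load_variables(path):
    """Read the file back and check it is a single JSON object."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("variables file must contain one JSON object")
    return data
```

For example, `write_variables("variables.json", {"stack_overflow_key": "...", "recipient_email": "you@example.com"})` would produce a file ready for import, with both keys being placeholders.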

Step 7

  • Run the DAG from the Airflow UI (the DAG runs successfully and sends the mail to the specified email address)

Step 8

Secure your account

Secure the app by adding an extra environment variable to the .profile file.

export AIRFLOW__WEBSERVER__AUTH_BACKEND=airflow.contrib.auth.backends.password_auth

Step 9

Open a Heroku bash session with the command

heroku run bash

Start Python in the Heroku bash session and type (you know I mean copy, right?) the following commands, as also described in Airflow's official documentation.

>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'new_user_name'
>>> user.email = 'new_user_email@example.com'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

If everything went well, you should be able to see this screen in your browser:

Proceed to modify the DAG for further customization.


Reach out to me at one of the following places!






