# End-to-end data pipeline test

Code for the post *Setting up end-to-end tests for cloud data pipelines*.
## Architecture

This is what our data pipeline architecture looks like:
For our local setup, we will use:

- An open-source SFTP server
- A Moto server to mock S3 and Lambda
- Postgres as a substitute for AWS Redshift
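As a sketch of how the tests can talk to the Moto server instead of real AWS, a `boto3` client can be pointed at the local endpoint via `endpoint_url`. The port and dummy credentials below are assumptions (a standalone Moto server listens on port 5000 by default); check the Docker configuration in this repo for the actual values.

```python
# Hypothetical helper: build boto3 client kwargs that target a local
# Moto server rather than real AWS. The endpoint and dummy credentials
# are assumptions; check the docker-compose config for the real values.
def local_s3_client_kwargs(endpoint: str = "http://localhost:5000") -> dict:
    return {
        "service_name": "s3",
        "endpoint_url": endpoint,  # route API calls to the Moto server
        "aws_access_key_id": "testing",  # Moto accepts any credentials
        "aws_secret_access_key": "testing",
        "region_name": "us-east-1",
    }

# Usage (requires the Moto container from `make up`):
#   import boto3
#   s3 = boto3.client(**local_s3_client_kwargs())
#   s3.create_bucket(Bucket="test-bucket")
```

Because the mock speaks the real S3 API, the pipeline code under test does not need to know it is talking to Moto; only the client construction differs.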
## Prerequisites & Setup

To run the code, you will need git, Python 3, make, and Docker installed.
Clone the repository, create a virtual environment, set up the Python path, and spin up the Docker containers as shown below.

```shell
git clone https://github.com/josephmachado/e2e_datapipeline_test.git
cd e2e_datapipeline_test
python -m venv ./env
source env/bin/activate # use the virtual environment
pip install -r requirements.txt
make up # spins up the SFTP, Moto server, and warehouse docker containers
export PYTHONPATH=${PYTHONPATH}:./src # set path to enable imports
```
## Run tests

We can run our tests using pytest:

```shell
pytest # runs all tests under the ./test folder
```
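The tests follow the usual end-to-end shape: push input in at one end of the pipeline, then assert on what arrives at the other. The sketch below is illustrative only; the function names are assumptions, and the real tests talk to the SFTP, Moto, and Postgres containers.

```python
# Illustrative sketch of the end-to-end test shape; names are hypothetical.
def assert_rows_match(expected, actual):
    # Order-insensitive comparison of warehouse rows.
    assert sorted(expected) == sorted(actual), (
        f"pipeline output mismatch: expected {sorted(expected)}, got {sorted(actual)}"
    )

def test_pipeline_end_to_end():
    # In the real test you would:
    #   1. upload an input file to the SFTP container,
    #   2. run the pipeline (S3 and Lambda mocked by the Moto server),
    #   3. query Postgres (standing in for Redshift) for the loaded rows.
    expected = [(1, "foo"), (2, "bar")]
    actual = [(2, "bar"), (1, "foo")]  # stand-in for a warehouse query result
    assert_rows_match(expected, actual)
```

Comparing rows order-insensitively keeps the test stable when the warehouse returns rows in a nondeterministic order.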
## Clean up

```shell
make ci
```
## Tear down

```shell
make down # spins down the docker containers
deactivate # stop using the virtual environment
```