
Netflix ETL

Netflix ETL pipelines for syncing data between Postgres and Elasticsearch.
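For illustration only, here is a minimal sketch of one sync step: extract a batch of rows from Postgres, transform them into documents, and bulk-index them into Elasticsearch. The table name (film_work), index name (movies), and default connection values are assumptions, not taken from this repository.

import os

import psycopg2
from elasticsearch import Elasticsearch, helpers

# Hypothetical connection setup; the real pipeline reads these values from .env.
pg_conn = psycopg2.connect(
    host=os.getenv("NA_DB_HOST", "localhost"),
    port=int(os.getenv("NA_DB_PORT", "5432")),
    dbname=os.getenv("NA_DB_NAME", "netflix"),
    user=os.getenv("NA_DB_USER", "test"),
    password=os.getenv("NA_DB_PASSWORD", ""),
)
es = Elasticsearch(f"http://{os.getenv('NE_ES_HOST', 'localhost')}:{os.getenv('NE_ES_PORT', '9200')}")
batch_size = int(os.getenv("NA_DB_POSTGRES_BATCH_SIZE", "500"))

with pg_conn.cursor() as cursor:
    # Extract: read rows in batches (the table name is an assumption).
    cursor.execute("SELECT id, title, description FROM film_work")
    while rows := cursor.fetchmany(batch_size):
        # Transform: map each row to an Elasticsearch document.
        actions = (
            {
                "_index": "movies",
                "_id": str(row[0]),
                "_source": {"id": str(row[0]), "title": row[1], "description": row[2]},
            }
            for row in rows
        )
        # Load: bulk-index the batch.
        helpers.bulk(es, actions)

pg_conn.close()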

Stack

Postgres, Elasticsearch, Redis

Services

Configuration

Docker containers:

  1. redis
  2. elasticsearch
  3. kibana
  4. etl
  5. db_admin
  6. server_admin

docker-compose files:

  1. docker-compose.yml - for local development.

To run the Docker containers, create a .env file in the root directory (see the sketch after the example below for how the ETL settings might be read).

.env example:

ENV=.env

# Netflix Admin
# Django
DJANGO_SETTINGS_MODULE=netflix.settings
DJANGO_CONFIGURATION=External
DJANGO_ADMIN=django-cadmin
DJANGO_SECRET_KEY=7)%31(changeme12@g87_h8vthi4dj=1pjp^ysv*tnm_fx$$fw6
ALLOWED_HOSTS=localhost,127.0.0.1

# Project
NA_PROJECT_BASE_URL=http://localhost:8000

# Media
NA_MEDIA_URL=/media/
NA_STATIC_URL=/staticfiles/

# Postgres
NA_DB_HOST=db_admin
NA_DB_PORT=5432
NA_DB_NAME=netflix
NA_DB_USER=test
NA_DB_PASSWORD=yandex

# Scripts
NA_DB_POSTGRES_BATCH_SIZE=500

# Netflix ETL
# Redis
NE_REDIS_HOST=redis
NE_REDIS_PORT=6379
NE_REDIS_DECODE_RESPONSES=1

# Elasticsearch
NE_ES_HOST=elasticsearch
NE_ES_PORT=9200
NE_ES_RETRY_ON_TIMEOUT=1
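As a hedged illustration only, the NE_* variables above might be consumed roughly like this when building the Redis and Elasticsearch clients; the helper function and default values are assumptions, and the repository's actual settings code may differ.

import os

import redis
from elasticsearch import Elasticsearch

def env_flag(name: str, default: str = "0") -> bool:
    # Flags such as NE_REDIS_DECODE_RESPONSES are passed as "0"/"1".
    return os.getenv(name, default) == "1"

redis_client = redis.Redis(
    host=os.getenv("NE_REDIS_HOST", "redis"),
    port=int(os.getenv("NE_REDIS_PORT", "6379")),
    decode_responses=env_flag("NE_REDIS_DECODE_RESPONSES"),
)

es_client = Elasticsearch(
    f"http://{os.getenv('NE_ES_HOST', 'elasticsearch')}:{os.getenv('NE_ES_PORT', '9200')}",
    retry_on_timeout=env_flag("NE_ES_RETRY_ON_TIMEOUT"),
)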

Start project:

Locally:

docker compose build
docker compose up

To fill the database with test data:

docker compose run --rm server bash -c "cd /app/scripts/load_db && python load_data.py"

Development

Sync the environment with requirements.txt / requirements.dev.txt (installs/updates missing packages and removes redundant ones):

make sync-requirements

Compile the requirements.*.txt files (re-compile after any change to requirements.*.in):

make compile-requirements

Use requirements.local.in for local dependencies; always specify constraints files (-c ...).

Example:

# requirements.local.in

-c requirements.txt

ipython

Code style:

Before pushing a commit, run all linters:

make lint
