zuzhi / practical-data-engineering

Real estate dagster pipeline

Home page: https://www.sspaeti.com/blog/data-engineering-project-in-twenty-minutes/

Practical Data Engineering Project

This is a practical example of a data engineering project built around real-estate data. You can find the accompanying blog post, Building a Data Engineering Project in 20 Minutes, on my website. The topics are:


You can find the status of the project here.

Starting Dagster

To get MinIO, Spark, Kubernetes, etc. ready, check the respective folder in this repository.

  1. MinIO is started
  2. Kubernetes is ready
  3. The Spark image, role, and namespaces are ready
  4. cd src/pipelines/real-estate and start Dagit by running dagit
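Step 4 can also be scripted. The sketch below is a hypothetical helper (`start_dagit` is not part of this repository); it assumes it is called with the repository root and that the `dagit` executable is on your PATH. With `dry_run=True` it only returns the command and working directory it would use.

```python
import shutil
import subprocess
from pathlib import Path


def start_dagit(repo_root: str, dry_run: bool = False):
    """Run `dagit` from the real-estate pipeline folder (hypothetical helper).

    Mirrors step 4: cd src/pipelines/real-estate, then run dagit.
    With dry_run=True, nothing is executed; the command and cwd are returned.
    """
    workdir = Path(repo_root) / "src" / "pipelines" / "real-estate"
    cmd = ["dagit"]
    if dry_run:
        return cmd, workdir
    # Fail early with a clear message instead of a bare subprocess error.
    if shutil.which("dagit") is None:
        raise FileNotFoundError("dagit is not on PATH; install dagit first")
    if not workdir.is_dir():
        raise FileNotFoundError(f"pipeline folder not found: {workdir}")
    # Blocks until dagit exits; check=True raises on a non-zero exit code.
    subprocess.run(cmd, cwd=workdir, check=True)
    return cmd, workdir
```

For example, calling `start_dagit(".")` from the repository root blocks until Dagit exits.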

Startup issues

  1. ImportError
ImportError: cannot import name 'SubscriptionObserver' from 'graphql_ws.gevent'

Manually install graphql_ws version 0.3.1:

pip install graphql_ws==0.3.1
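To check whether the pin is already in place, a small sketch using the standard library's `importlib.metadata` (Python 3.8+) can read the installed version. `installed_version` and `graphql_ws_needs_pin` are hypothetical helper names; the sketch assumes the distribution metadata is registered as either `graphql_ws` or `graphql-ws` and tries both spellings.

```python
from importlib import metadata
from typing import Optional

# Version that avoids the SubscriptionObserver ImportError, per this README.
REQUIRED_GRAPHQL_WS = "0.3.1"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    # Older importlib.metadata versions may not treat "-" and "_" as equal.
    for name in (dist_name, dist_name.replace("_", "-")):
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def graphql_ws_needs_pin(version: Optional[str]) -> bool:
    """True if graphql_ws is missing or not exactly the pinned version."""
    return version != REQUIRED_GRAPHQL_WS
```

If `graphql_ws_needs_pin(installed_version("graphql_ws"))` returns True, run the pip command above.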
  2. dagster.check.CheckError
dagster.check.CheckError: Invariant failed. Description: Attempted to deserialize class "InstigatorState" which is not in the whitelist.

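Use a clean Dagster home: create a new directory and point DAGSTER_HOME at it. Use $HOME rather than ~, because the shell does not expand ~ inside double quotes.

mkdir -p "$HOME/dagster-home-real-estate"
export DAGSTER_HOME="$HOME/dagster-home-real-estate"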
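The same setup can be done from Python, for example in a helper script. This sketch (`prepare_dagster_home` is a hypothetical name) expands a leading `~`, which reaches the program unexpanded when quoted in the shell, rejects relative paths, and creates the directory if needed. A new directory presumably avoids the CheckError because no instance state written by a different Dagster version gets read; that cause is an assumption, not something stated in this README.

```python
import os
from pathlib import Path


def prepare_dagster_home(raw: str) -> Path:
    """Expand and validate a DAGSTER_HOME value, creating the directory.

    A literal "~" (e.g. from export DAGSTER_HOME="~/...") is expanded here.
    Relative paths are rejected because they would depend on the cwd.
    """
    home = Path(os.path.expanduser(raw))
    if not home.is_absolute():
        raise ValueError(f"DAGSTER_HOME must be an absolute path, got {raw!r}")
    # Creating a fresh directory gives an instance with no stored state.
    home.mkdir(parents=True, exist_ok=True)
    return home
```

For example, `prepare_dagster_home(os.environ["DAGSTER_HOME"])` returns the resolved path.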
  3. ModuleNotFoundError
ModuleNotFoundError: No module named 'bs4'

Install Beautiful Soup, which provides the bs4 module:

pip install beautifulsoup4
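Missing imports like this surface one at a time at startup. This sketch lists every missing top-level module up front using `importlib.util.find_spec`; the `REQUIRED_MODULES` list is an assumption for illustration, not the project's actual dependency list.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical list of import names; bs4 comes from beautifulsoup4.
REQUIRED_MODULES = ["bs4", "dagster", "dagit"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found.

    Only pass top-level names: for dotted names find_spec imports the
    parent package and may raise instead of returning None.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules(REQUIRED_MODULES)` means every listed module can be found.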
  4. UserWarning
UserWarning: No dagster instance configuration file (dagster.yaml) found at /Users/zuzhi/dagster-home-real-estate.

Create an empty dagster.yaml file in the DAGSTER_HOME directory (/Users/zuzhi/dagster-home-real-estate in this example):

touch ~/dagster-home-real-estate/dagster.yaml
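In Python, `pathlib.Path.touch` behaves like `touch` here: it creates the file if it is missing and leaves existing contents alone. The sketch below (`ensure_instance_config` is a hypothetical helper) assumes the DAGSTER_HOME directory already exists; an empty `dagster.yaml` should keep Dagster on its default instance settings while silencing the warning.

```python
from pathlib import Path


def ensure_instance_config(dagster_home: str) -> Path:
    """Create an empty dagster.yaml in dagster_home if none exists.

    Like `touch`, this never truncates an existing file, so settings you
    add later are preserved. Raises FileNotFoundError if the directory
    does not exist.
    """
    config = Path(dagster_home) / "dagster.yaml"
    config.touch(exist_ok=True)
    return config
```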
