Repo with data engineering resources
- To be clear, this is not a roadmap for getting started with Data Engineering.
- I am not covering the books you should study, university studies, certificates, etc.
- I assume you have a satisfactory understanding of Python and SQL. Scala is good to have.
- A basic understanding of CI/CD is needed (DevOps / MLOps teams are there to help, of course).
- After you know the basics and how things work, it's up to you what to do next (or, let's say, to decide whether it's your cup of tea / coffee or not).
Remember one thing: knowing Data Engineering tools and implementing them are different things. Try to implement what you learn, even if it is only a simple program or project.
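As an illustration of what a "simple program" could look like, here is a minimal sketch of an extract, transform, load flow using only Python's standard library and SQL. The sample data, the `orders` table, and all function names are made up for this example; treat it as a starting point, not a recommended design.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount and cast fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(conn, records):
    """Create the (hypothetical) orders table and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def total_per_customer(conn):
    """Aggregate with plain SQL: total amount per customer."""
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load(conn, transform(extract(RAW_CSV)))
totals = total_per_customer(conn)
print(totals)
```

Once something this small works, you can swap the in-memory string for a real file and SQLite for whatever database or warehouse you are learning.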
Do some research on what sort of companies you want to apply to and what tools they use (you can do this just by going through those companies' job descriptions). Example:
- Databricks and Snowflake
- There are many books, but if you want me to suggest one, go for Fundamentals of Data Engineering by Joe Reis and Matt Housley.
There are many repos with great content / links. Some of them 👇 Suggestion: just search for "data engineering" and find the best ones.
- Data Engineer Handbook <-- One example out of many
- There is unlimited knowledge out there. Try to find the best sources and follow them instead of jumping among videos.
The main thing I want to highlight: practice, practice, and practice. Take help from AI assistants 👇
- Perplexity AI --> let's put it this way: it's like Google Search with LLMs built in.
- ChatGPT --> Based on your needs, free or paid version (Team, Enterprise, etc.)
- Bing Chat, Bing Enterprise.
- Hugging Chat
This page will be updated over time. Cheers!!