JagadeeshwaranM / Data_Engineering_Simplified

Data Engineering Roadmap

  1. Learn SQL: aggregations with GROUP BY, joins (INNER, LEFT, FULL OUTER), window functions, common table expressions (CTEs), etc.

You can learn from https://www.w3schools.com/
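To make the SQL topics above concrete, here is a small sketch that runs a CTE, a LEFT JOIN, a GROUP BY aggregation and a window function through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

query = """
WITH totals AS (                          -- common table expression
    SELECT c.name AS name,
           COALESCE(SUM(o.amount), 0) AS total,   -- aggregation
           COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o                 -- keeps customers with no orders
           ON o.customer_id = c.id
    GROUP BY c.id, c.name
)
SELECT name, total, n_orders,
       RANK() OVER (ORDER BY total DESC) AS spend_rank   -- window function
FROM totals
ORDER BY spend_rank, name;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Swapping LEFT JOIN for INNER JOIN drops the customer without orders, which is a quick way to see the difference between the two join types.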

  2. Learn Python/Scala: learn the basics (for/while loops, if statements), functional programming, abstract methods, and traits. Learn libraries like NumPy, pandas, scikit-learn, etc.

You can learn from https://lnkd.in/gSz45km5
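As a rough illustration of the Python basics listed above, the sketch below shows a plain loop with an `if`, the same filtering done in a functional style, and an abstract method (the closest Python counterpart to a Scala trait method). The `Transform` and `Double` classes and the `readings` list are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from functools import reduce


class Transform(ABC):
    """Abstract base class: subclasses must implement apply()."""

    @abstractmethod
    def apply(self, value: int) -> int:
        ...


class Double(Transform):
    def apply(self, value: int) -> int:
        return value * 2


readings = [3, -1, 4, -5, 9]

# Imperative style: a for loop with an if statement.
positives_loop = []
for r in readings:
    if r > 0:
        positives_loop.append(r)

# Functional style: filter, map and reduce over the same data.
positives = list(filter(lambda r: r > 0, readings))
doubled = list(map(Double().apply, positives))
total = reduce(lambda acc, x: acc + x, doubled, 0)

print(positives_loop, positives, doubled, total)
```

Trying to instantiate `Transform()` directly raises a `TypeError`, which is how the abstract method is enforced.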

  3. Learn distributed computing: Hadoop versions and architecture, and fault tolerance in Hadoop. Read about and understand MapReduce processing, and learn the optimizations used in MapReduce, etc.

  4. Learn data ingestion tools: learn Sqoop/Kafka/NiFi. Understand their functionality and job-running mechanism.

  5. Learn data processing/NoSQL: Spark architecture, RDDs/DataFrames/Datasets, lazy evaluation, DAGs/lineage graphs, optimization techniques, YARN utilization, Spark Streaming, etc.

  6. Learn data warehousing: understand how Hive stores and processes data, the different file formats and compression techniques, partitioning/bucketing, the different UDFs available in Hive, and SCD (slowly changing dimension) concepts. Example stores: HBase, Cassandra.

  7. Learn job orchestration: learn Airflow/Oozie, and learn about workflows, cron, etc.

  8. Learn cloud computing: learn Azure/AWS/GCP and understand the significance of the cloud in #dataengineering. Learn Azure Synapse/Redshift/BigQuery, and ingestion/pipeline tools like ADF, etc.

  9. Learn the basics of CI/CD and Linux commands. Read about Kubernetes/Docker and how crucial they are in data engineering.
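For the MapReduce part of the distributed computing step above, a word count is the usual first exercise. The sketch below imitates the map, shuffle and reduce phases in a single Python process. It is only meant to show the data flow and is not how Hadoop itself is implemented; the function names and sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big ideas", "data pipelines move data"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, which is where most MapReduce optimizations (for example combiners) come in.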

Languages

Language:Python 100.0%