gr4nada / data-scientist-roadmap

Jobs linked to data science are becoming more and more popular. A bunch of tutorials could easily complete this roadmap, helping whoever wants to start learning stuff about data science.

data-scientist-roadmap

I just found this data science skills roadmap, drawn by GeeksforGeeks on their cool page.


roadmap-picture


A Roadmap to Learn

Mathematics

Math skills are very important because they help us understand the machine learning algorithms that play an important role in data science.

Part 1:

  • Linear Algebra
  • Analytic Geometry
  • Matrix
  • Vector Calculus
  • Optimization

Part 2:

  • Regression
  • Dimensionality Reduction
  • Density Estimation
  • Classification

Probability

Probability underpins statistics, and it is considered a prerequisite for mastering machine learning.

  • Introduction to Probability
  • 1D Random Variable
  • The Function of One Random Variable
  • Joint Probability Distribution
  • Discrete Distribution
      • Binomial (Python | R)
      • Bernoulli
      • Geometric, etc.
  • Continuous Distribution
      • Uniform
      • Exponential
      • Gamma
      • Normal Distribution (Python | R)
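
As a small illustration of the discrete and continuous distributions listed above, here is a minimal sketch using only the Python standard library. The helper name `binomial_pmf` and the example numbers (10 coin flips, a standard normal) are assumptions made for illustration, not part of the original roadmap.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete example: probability of exactly 3 heads in 10 fair coin flips.
p_three_heads = binomial_pmf(3, 10, 0.5)

# Continuous example: P(-1 <= Z <= 1) for a standard normal variable Z.
z = NormalDist(mu=0.0, sigma=1.0)
p_within_one_sigma = z.cdf(1.0) - z.cdf(-1.0)

print(f"P(3 heads in 10 flips) = {p_three_heads:.4f}")
print(f"P(|Z| <= 1)            = {p_within_one_sigma:.4f}")
```

In practice, libraries such as SciPy provide these distributions ready-made; the hand-written version is only meant to show what the formulas compute.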

Statistics

An understanding of statistics is essential because it is a core part of data analysis.

  • Introduction to Statistics
  • Data Description
  • Random Samples
  • Sampling Distribution
  • Parameter Estimation
  • Hypothesis Testing (Python | R)
  • ANOVA (Python | R)
  • Reliability Engineering
  • Stochastic Process
  • Computer Simulation
  • Design of Experiments
  • Simple Linear Regression
  • Correlation
  • Multiple Regression (Python | R)
  • Nonparametric Statistics
      • Sign Test
      • The Wilcoxon Signed-Rank Test (R)
      • The Wilcoxon Rank Sum Test
      • The Kruskal-Wallis Test (R)
  • Statistical Quality Control
  • Basics of Graphs
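
To make two of the topics above concrete (simple linear regression and correlation), here is a minimal sketch in plain Python. The function names and the tiny `hours`/`scores` data set are made up for illustration; a real analysis would normally use a statistics library.

```python
import math


def mean(values):
    return sum(values) / len(values)


def simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = simple_linear_regression(hours, scores)
r = pearson_correlation(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}, r = {r:.3f}")
```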

Programming

One needs a good grasp of programming concepts such as data structures and algorithms. The languages commonly used are Python, R, Java, and Scala. C++ is also useful where performance is critical.

Python:

  • Python Basics
  • List
  • Set
  • Tuples
  • Dictionary
  • Function, etc.
  • NumPy
  • Pandas
  • Matplotlib/Seaborn, etc.

R:

  • R Basics
  • Vector
  • List
  • Data Frame
  • Matrix
  • Array
  • Function, etc.
  • dplyr
  • ggplot2
  • Tidyr
  • Shiny, etc.

Database:

  • SQL
  • MongoDB

Other:

  • Data Structure
  • Time Complexity
  • Web Scraping (Python | R)
  • Linux
  • Git

Machine Learning

ML is one of the most vital parts of data science and one of the most active areas of research, so new advancements are made every year. At a minimum, one needs to understand the basic algorithms of supervised and unsupervised learning. Multiple libraries are available in Python and R for implementing these algorithms.

Introduction:

  • How Model Works
  • Basic Data Exploration
  • First ML Model
  • Model Validation
  • Underfitting & Overfitting
  • Random Forests (Python | R)
  • scikit-learn

Intermediate:

  • Handling Missing Values
  • Handling Categorical Variables
  • Pipelines
  • Cross-Validation (R)
  • XGBoost (Python | R)
  • Data Leakage
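
Model validation and cross-validation from the list above can be sketched without any ML library. The example below is a minimal illustration, not a production recipe: `k_fold_indices`, `cross_val_mae`, and the `targets` values are hypothetical, and the "model" is just a baseline that predicts the mean of the training fold. The baseline is fitted on the training indices only, which is the habit that helps avoid data leakage.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_mae(y, k):
    """Cross-validated mean absolute error of a predict-the-training-mean baseline."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        # "Fit" on the training fold only, so validation data never leaks in.
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        mae = sum(abs(y[i] - prediction) for i in val_idx) / len(val_idx)
        fold_errors.append(mae)
    return sum(fold_errors) / len(fold_errors)


# Hypothetical target values.
targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
print(f"3-fold CV MAE of the mean baseline: {cross_val_mae(targets, k=3):.3f}")
```

This sketch does not shuffle the data before splitting; real data sets usually should be shuffled first (or split by time for time series). scikit-learn, listed above, offers ready-made utilities for the same idea.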

Deep Learning

In Deep Learning, use frameworks such as TensorFlow and Keras to build and train neural networks for structured data.

  • Artificial Neural Network
  • Convolutional Neural Network
  • Recurrent Neural Network
  • TensorFlow
  • Keras
  • PyTorch
  • A Single Neuron
  • Deep Neural Network
  • Stochastic Gradient Descent
  • Overfitting and Underfitting
  • Dropout and Batch Normalization
  • Binary Classification
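
The "A Single Neuron" and "Stochastic Gradient Descent" items above can be illustrated together in a few lines of plain Python. This is a minimal sketch under stated assumptions: the function name `train_single_neuron`, the learning rate, the epoch count, and the synthetic data (generated from y = 2x + 1) are all chosen for illustration. Frameworks such as TensorFlow, Keras, or PyTorch do the same thing at scale with automatic differentiation.

```python
import random


def train_single_neuron(xs, ys, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x + b (one linear neuron) with per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = train_single_neuron(xs, ys)
print(f"learned w = {w:.3f}, b = {b:.3f}")
```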

Feature Engineering

In Feature Engineering, discover the most effective ways to improve your models.

  • Baseline Model
  • Categorical Encodings
  • Feature Generation
  • Feature Selection
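
One of the simplest categorical encodings mentioned above is one-hot encoding. The following is a minimal sketch in plain Python; the `one_hot_encode` helper and the `colors` values are hypothetical and only show the idea. Libraries such as Pandas and scikit-learn provide their own encoders.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row is a list of 0/1 indicators."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```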

Natural Language Processing

In NLP, distinguish yourself by learning to work with text data.

  • Text Classification
  • Word Vectors

Data Visualization Tools

Make great data visualizations. A great way to see the power of coding!

  • Excel VBA
  • BI (Business Intelligence):
      • Tableau
      • Power BI
      • QlikView
      • Qlik Sense

Deployment

The last part is deployment. Whether you are a fresher or have 5+ or 10+ years of experience, deployment is necessary, because a deployed project demonstrates that you have done substantial end-to-end work.

  • Microsoft Azure
  • Heroku
  • Google Cloud Platform
  • Flask
  • Django

Other Points to Learn

  • Domain Knowledge
  • Communication Skill
  • Reinforcement Learning
  • Different Case Studies:
      • Data Science at Netflix
      • Data Science at Flipkart
      • Project on Credit Card Fraud Detection
      • Project on Movie Recommendation, etc.
