100 Days Machine Learning and Deep Learning
How it Started? Day 0 - 18 Sept 2023
Over the past months, I've dived into the world of data science, mastering tools like Pandas, NumPy, Matplotlib, and Seaborn. Now, I'm ready to take my skills to the next level!
This 100-day journey will be all about understanding statistics, machine learning, and deep learning algorithms at their core, along with a lot of hands-on projects. I'm eager to delve deep into the theory behind these powerful algorithms, ensuring I grasp every concept intricately. But there's a twist!
Throughout this challenge, I'll be sharing my newfound insights with our amazing community. Each day, I'll revisit these topics and create articles to teach what I've learned. You can follow me on Medium for detailed articles. My goal is simple: to enhance my own understanding while helping others on their data science journeys.
What Inspired Me?
One of my biggest inspirations is the book “Show Your Work” by Austin Kleon, and I believe it can motivate you as well. Read more about it here.
Click Here to Find Detailed Articles.
Daily Progress of 100 Days MLDL
DAY 1 (19 Sept 2023):
Topic: Pandas Revision through Handwritten Notes
- Data Structures
- Data Loading and Data Inspection
- Data Selection and Indexing
- Data Cleaning
- Data Manipulation
Detailed Medium Article: Pandas Demystified: A Comprehensive Handbook for Data Enthusiasts
Detailed Source Code: Day 1 Commit
LinkedIn post: Day 1 Update
LeetCode Problems Solved:
DAY 2 (20 Sept 2023):
Topic: Advanced Pandas Topics Revision
- Data Aggregations
- Data Visualizations
- Time Series Data Handling
- Handling Categorical Data
- Advanced Topics
Detailed Medium Article: Advanced Pandas: A Comprehensive Handbook for Data Enthusiasts
Detailed Source Code: Day 2 Commit
LinkedIn post: Day 2 Update, Pandas Complete Guide Post
LeetCode Problems Solved:
DAY 3 (21 Sept 2023):
Topic: NumPy Revision
- NumPy Array Basics
- Array Inspection
- Array Operations
- Working with NumPy Arrays
- NumPy for Data Cleaning
- NumPy for Statistical Analysis
- NumPy for Linear Algebra
- Advanced NumPy Techniques
- Performance Optimization with NumPy
Detailed Medium Article: Mastering NumPy: A Data Enthusiast’s Essential Companion
Detailed Source Code: Day 3 Commit
LinkedIn post: Day 3 Update
LeetCode Problems Solved:
DAY 4 (22 Sept 2023):
Topic: Matplotlib Fundamentals Revision
- Basic Plotting
- Plot Types
- 2.1 Bar Chart
- 2.2 Histograms
- 2.3 Scatter plots
- 2.4 Pie Charts
- 2.5 Box Plot (Box and Whisker Plot)
- 2.6 Heatmap, and Displaying Images
- 2.7 Stack Plot
Detailed Medium Article: Mastering Matplotlib: A Comprehensive Guide to Data Visualization
Detailed Source Code: Day 4 Commit
LinkedIn post: Day 4 Update
LeetCode Problems Solved:
DAY 5 (23 Sept 2023):
Topic: Advanced Matplotlib Topics Revision
- Multiple Subplots
- 1.1 Creating Multiple Plots in a Single Figure
- 1.2 Combining Different Types of Plots
- Advanced Features
- 2.1 Adding annotations and text
- 2.2 Fill the Area Between Plots
- 2.3 Plotting Time Series Data
- 2.4 Creating 3D Plots
- 2.5 Live Plot - Incorporating Animations and Interactivity.
Detailed Medium Article: Advanced Matplotlib: A Comprehensive Guide to Data Visualization
Detailed Source Code: Day 5 Commit
LinkedIn post: Day 5 Update
LeetCode Problem Solved:
DAY 6 (24 Sept 2023):
Topic: Seaborn Fundamentals Revision
- Categorical Plots
- 1.1 Count Plot
- 1.2 Swarm Plot
- 1.3 Point Plot
- 1.4 Cat Plot
- 1.5 Categorical Box Plot
- 1.6 Categorical Violin Plot
Detailed Source Code: Day 6 Commit
LinkedIn post: Day 6 Update
LeetCode Problem Solved:
DAY 7 (25 Sept 2023):
Topic: Seaborn Univariate and Bivariate Plots
- Univariate Plots
- 1.1 KDE Plot
- 1.2 Rug Plot
- 1.3 Box Plot
- 1.4 Violin Plot
- 1.5 Strip Plot
- Bivariate Plots
- 2.1 Regression Plot
- 2.2 Joint Plot
- 2.3 Hexbin Plot
Detailed Medium Article: Mastering Seaborn: Demystifying the Complex Plots!
Detailed Source Code: Day 7 Commit
LinkedIn post: Day 7 Update
LeetCode Problem Solved:
DAY 8 (26 Sept 2023):
Topic: Seaborn Multivariate and Matrix Plots
- Multivariate Plots
- 1.1 Using Parameters
- 1.2 Relational Plot
- 1.3 Facet Grid
- 1.4 Pair Plot
- 1.5 Pair Grid
- Matrix Plots
- 2.1 Heat Map
- 2.2 Cluster Map
Detailed Medium Article: Advanced Seaborn: Demystifying the Complex Plots!
Detailed Source Code: Day 8 Commit
LinkedIn post: Day 8 Update
LeetCode Problem Solved:
DAY 9 (27 Sept 2023):
Topic: Plotly Fundamentals
- Using Plotly Express to create basic plots
- Using the graph objects module to customize plots
Detailed Source Code: Day 9 Commit
LinkedIn post: Day 9 Update
LeetCode Problem Solved:
DAY 10 (28 Sept 2023):
Topic: Plotly Advanced plots
- Advanced Plots
- Box plots
- Violin Plots
- Density Heatmaps
- Scatter Matrix
- 3D Plots
- Animated Plots
Detailed Medium Article:
Detailed Source Code: Day 10 Commit
LinkedIn post: Day 10 Update
DAY 11 (29 Sept 2023):
Topic: Data Cleaning on Loan Defaulter Dataset
- Data Inspection
- Handling Missing Values
- Data Imputation
Detailed Source Code: Day 11 Commit
LinkedIn post: Day 11 Update
DAY 12 (30 Sept 2023):
Topic: Data Visualization on Loan Defaulter Dataset
- Binning of data for better visualization
- Univariate analysis
- Bivariate analysis
Detailed Source Code: Day 12 Commit
LinkedIn post: Day 12 Update
DAY 13 (1 Oct 2023):
Topic: Exploratory Data Analysis and Insights on Loan Defaulter Dataset
- Finding insights from the visualizations
Detailed Source Code: Day 13 Commit
LinkedIn post: Day 13 Update
DAY 14 (2 Oct 2023):
Topic: Descriptive Statistics
- Mean, Median, Mode: These are measures of central tendency.
- Variance and Standard Deviation: These quantify data spread or dispersion.
- Skewness and Kurtosis: These describe the shape of data distributions.
- Quantiles and Percentiles: These help analyze data distribution.
- Box Plots for Descriptive Stats: Box plots provide a visual summary of the dataset.
- Interquartile Range (IQR): The IQR is the range covered by the middle 50% of the data.
Detailed Source Code: Day 14 Commit
LinkedIn post: Day 14 Update
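To make the Day 14 measures concrete, here is a minimal sketch using only Python's built-in `statistics` module. The sample values are made up for illustration and are not taken from the Day 14 commit. The quartile method (`"inclusive"`) is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Measures of central tendency.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of spread (population versions, dividing by n).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR: the range covered by the middle 50% of the data.
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, std_dev={std_dev}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```

Swapping `pvariance` and `pstdev` for `variance` and `stdev` gives the sample versions, which divide by n - 1 instead of n.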
DAY 15 (3 Oct 2023):
Topic: Probability for Data Science
- Probability Basics: Understand the fundamental concepts like events, outcomes, and sample spaces.
- Probability Formulas: Master key formulas:
- Probability of an Event (P(A)): Number of favorable outcomes / Total number of outcomes, assuming all outcomes are equally likely.
- Conditional Probability (P(A|B)): Probability of A given that B has occurred.
- Bayes' Theorem: A powerful tool for updating probabilities based on new evidence.
- Law of Large Numbers: As you increase the sample size, the sample mean converges to the population mean. Crucial for statistical inference.
- Probability Distributions: Get acquainted with probability distributions:
- Normal Distribution: The bell curve is everywhere in data science. It's essential for hypothesis testing and confidence intervals.
- Bernoulli Distribution: For binary outcomes (like success or failure).
- Binomial Distribution: When dealing with a fixed number of independent Bernoulli trials.
- Poisson Distribution: Models the number of events in a fixed interval of time or space, like customer arrivals at a store.
Detailed Source Code: Day 15 Commit
LinkedIn post: Day 15 Update
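Two of the Day 15 ideas, Bayes' Theorem and the Law of Large Numbers, are easy to try out in a few lines. This is only a rough sketch: the function names and the test numbers (1% prior, 95% detection rate, 5% false positive rate) are hypothetical and not from the Day 15 commit.

```python
import random


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' Theorem."""
    # Total probability of the evidence B.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def sample_mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated fair six-sided die rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls


# Hypothetical scenario: a condition with 1% prevalence and a test that
# detects it 95% of the time but also fires on 5% of healthy cases.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")

# Law of Large Numbers: the sample mean drifts toward the expected value 3.5.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is exactly the kind of update Bayes' Theorem captures.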
DAY 16 (4 Oct 2023):
Topic: Inferential Statistics
- Central Limit Theorem
- Hypothesis Testing
- Deriving p-values
- Z-Test
- T-Test
Detailed Source Code: Day 16 Commit
LinkedIn post: Day 16 Update
DAY 17 (5 Oct 2023):
Topic: Inferential Statistics
- Chi-Square Test
- F-Test/ANOVA
- Covariance
- Pearson Correlation
- Spearman Rank Correlation
Detailed Source Code: Day 17 Commit
LinkedIn post: Day 17 Update
DAY 18 (6 Oct 2023):
Topic: Introduction to Machine Learning
- What is Machine Learning?
- Types of Machine Learning?
- Supervised Machine Learning
- Unsupervised Machine Learning
- Reinforcement Learning
- Semi-supervised Learning
Detailed Source Code: Day 18 Commit
LinkedIn post: Day 18 Update
DAY 19 (7 Oct 2023):
Topic: Steps in Machine Learning Project
- Data Collection
- Data Cleaning
- Exploratory Data Analysis
- Data Preprocessing
- Data Splitting
- Train the model
- Evaluation of a Model
- Deploy and Retrain
Detailed Source Code: Day 19 Commit
LinkedIn post: Day 19 Update
DAY 20 (8 Oct 2023):
Topic: Exploring Scikit-Learn
- sklearn.datasets
- sklearn.preprocessing
- sklearn.model_selection
- sklearn.feature_selection
- sklearn.linear_model, and many more...
Detailed Source Code: Day 20 Commit
LinkedIn post: Day 20 Update
DAY 21 (9 Oct 2023):
Topic: Advanced Scikit-Learn Features
- sklearn.metrics
- sklearn.compose
- sklearn.pipeline
Detailed Source Code: Day 21 Commit
LinkedIn post: Day 21 Update
DAY 22 (10 Oct 2023):
Topic: Feature Engineering 1 - Handling Missing Values
1. Handling Missing Values
- 1.1 Problems of Having Missing Values
- 1.2 Understanding Types of Missing Values
- 1.3 Dealing with Missing Values Using SimpleImputer
- 1.4 Dealing with Missing Values Using KNNImputer
2. Handling Categorical Values
- 2.1 One Hot Encoding
- 2.2 Label Encoding
- 2.3 Ordinal Encoding
- 2.4 Multi Label Binarizer
- 2.5 Count/Frequency Encoding
- 2.6 Target Guided Ordinal Encoding
Detailed Source Code: Day 22 Commit
LinkedIn post: Day 22 Update
DAY 23 (11 Oct 2023):
Topic: Feature Engineering 2 - Feature Scaling
- Feature Scaling
- 1.1 Standardization/Standard Scaler
- 1.2 Normalization/MinMax Scaler
- 1.3 Max Abs Scaler
- 1.4 Robust Scaler
Detailed Source Code: Day 23 Commit
LinkedIn post: Day 23 Update
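The two most common scalers from Day 23 boil down to short formulas. The sketch below writes them out in plain Python to show the arithmetic; it is not the scikit-learn implementation, the helper names are hypothetical, and it assumes the column is not constant (otherwise the denominators would be zero).

```python
import statistics


def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std, dividing by n
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column, for illustration only.
ages = [10, 20, 30, 40, 50]

print(standardize(ages))    # result has mean 0 and standard deviation 1
print(min_max_scale(ages))  # smallest value maps to 0, largest to 1
```

Standardization keeps outliers far from the bulk of the data, while min-max scaling squeezes everything into a fixed range, so the choice depends on the model and the data.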
DAY 24 (12 Oct 2023):
Topic: Feature Engineering 3 - Feature Selection
- Why Feature Selection Matters
- Types of Feature Selection
- Filter Methods
- Variance Threshold
- SelectKBest
- SelectPercentile
- GenericUnivariateSelect
- Wrapper Methods
- RFE
- RFECV
- SelectFromModel
- SequentialFeatureSelector
Detailed Source Code: Day 24 Commit
LinkedIn post: Day 24 Update
DAY 25 (13 Oct 2023):
Topic: Feature Engineering 4 - Feature Transformation and Pipelines
- Feature Transformation
- Understanding Q-Q Plot and P-P Plot
- Logarithmic transformation
- Reciprocal transformation
- Square root transformation
- Exponential transformation
- Box-Cox transformation
- Using Pipelines to Automate Feature Engineering
- What are Pipelines
- Accessing individual steps in pipeline
- Accessing Parameters in Pipeline
- Performing Grid Search with Pipeline
- Combining Transformers and Pipeline
- Visualizing the Pipeline
Detailed Source Code: Day 25 Commit
LinkedIn post: Day 25 Update
DAY 26 (14 Oct 2023):
Topic: Understanding Linear Regression
- Fundamentals of Linear Regression
- Exploring the Assumptions of Linear Regression
- Gradient Descent and Loss Function
- Evaluation Metrics for Linear Regression
- Applications of Linear Regression
Detailed Notes: Day 26 Commit
LinkedIn post: Day 26 Update
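As a companion to the gradient descent and loss function topic, here is a minimal sketch of fitting a straight line by gradient descent on the mean squared error. The function name, learning rate, epoch count, and toy data are all assumptions made for illustration; they are not taken from the Day 26 notes.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every point under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

w, b = fit_line_gradient_descent(xs, ys)
print(f"slope={w:.4f}, intercept={b:.4f}")
```

A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow, so this value only suits data on a similar scale.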
DAY 27 (15 Oct 2023):
Topic: Understanding Multicollinearity, and Regularization Techniques
- Multiple Linear Regression
- Multicollinearity
- Regularization Techniques
- Ridge, Lasso and Elastic Net
- Polynomial Regression
Detailed Notes: Day 27 Commit
LinkedIn post: Day 27 Update
DAY 28 (16 Oct 2023):
Topic: Understanding the Logistic Regression
- How does Logistic Regression work
- What is a sigmoid curve
- Assumptions of Logistic Regression
- Cost Function of Logistic Regression
Detailed Notes: Day 28 Commit
LinkedIn post: Day 28 Update
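The sigmoid curve and the cost function from Day 28 can both be written in a few lines. This is a small sketch in plain Python with hypothetical function names; the cost shown is the usual binary cross-entropy (log loss), and it assumes predicted probabilities are strictly between 0 and 1.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(y_true, y_prob):
    """Binary cross-entropy averaged over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.5, 0.5, 0.5, 0.5]

print(sigmoid(0))                   # the midpoint of the curve
print(log_loss(labels, confident))  # lower loss for good predictions
print(log_loss(labels, unsure))     # higher loss for coin-flip predictions
```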
DAY 29 (17 Oct 2023):
Topic: Understanding Decision Trees
- Why do we need Decision Trees
- How do Decision Trees work
- How do we select a root node
- Understanding Entropy, Information Gain
- Solving an Example on Entropy
- Understanding Gini Impurity
- Solving an Example on Gini Impurity
- Decision Trees for Regression
- Why Decision Trees Use a Greedy Approach
- Understanding Pruning
Detailed Notes: Day 29 Commit
LinkedIn post: Day 29 Update
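Entropy, Gini impurity, and information gain from the Day 29 list can be computed directly from class counts. The following is a minimal sketch with hypothetical helper names and a made-up yes/no example; it is not the worked example from the Day 29 notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_impurity(labels):
    """Gini impurity: chance of mislabeling a random sample from the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with two classes, split perfectly into two pure children.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]

print(entropy(parent))
print(gini_impurity(parent))
print(information_gain(parent, split))
```

A tree builder would evaluate this gain for every candidate split and keep the best one at each node, which is why the approach is called greedy.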
DAY 30 (18 Oct 2023):
Topic: Understanding Ensemble Techniques
- What are Ensemble Techniques
- Understanding Bagging
- Understanding Boosting
- Understanding Stacking
Detailed Notes: Day 30 Commit
LinkedIn post: Day 30 Update
DAY 31 (19 Oct 2023):
Topic: Understanding Random Forests
- Decision Trees Aggregation
- Bagging and Variance Reduction
- Feature Subspace Sampling
- Handling Overfitting
- Out of bag error
Detailed Notes: Day 31 Commit
LinkedIn post: Day 31 Update
DAY 32 (20 Oct 2023):
Topic: Understanding Boosting Algorithms
- Concept of Boosting
- Understanding AdaBoost
- Solving an Example on AdaBoost
- Understanding Gradient Boosting
- Solving an Example on Gradient Boosting
- AdaBoost vs Gradient Boosting
Detailed Notes: Day 32 Commit
LinkedIn post: Day 32 Update
DAY 33 (21 Oct 2023):
Topic: Understanding the XGBoost Algorithm
- Concept of XGBoost Algorithm
- Boosting Mechanism
- Feature Importance Interpretation
- Regularization Techniques
- Flexibility and Scalability
Detailed Notes: Day 33 Commit
LinkedIn post: Day 33 Update
DAY 34 (22 Oct 2023):
Topic: Understanding K Nearest Neighbours
- How does K-Nearest Neighbours work
- How is Distance Calculated
- Euclidean Distance
- Hamming Distance
- Manhattan Distance
- Why is KNN a Lazy Learner
- Effects of Choosing the value of K
- Different ways to perform KNN
- Understanding KD-Tree
- Solving an Example of KD Tree
- Understanding Ball Tree
Detailed Notes: Day 34 Commit
LinkedIn post: Day 34 Update
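The three distance measures from Day 34 and the "lazy learner" idea fit into one short sketch. This is a brute-force version in plain Python with hypothetical function names and toy data; it does not use a KD-Tree or Ball Tree, which exist to speed up the neighbour search.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Classify query by majority vote among its k nearest training points."""
    # Lazy learner: there is no training step, all work happens at query time.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5), distance=manhattan))
```

A small k follows the training data closely and is sensitive to noise, while a large k smooths the decision boundary, so k is usually tuned on validation data.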
DAY 35 (23 Oct 2023):
Topic: Understanding Support Vector Machines
- Understanding Concept of SVC
- What are Support Vectors
- What is Margin
- Hard Margin and Soft Margin
- Kernelized SVC
- Types of Kernels
- Understanding SVR
Detailed Notes: Day 35 Commit
LinkedIn post: Day 35 Update
DAY 36 (24 Oct 2023):
Topic: Understanding Naive Bayes Classifiers
- Why do we need Naive Bayes
- Concept of how it works
- Mathematical Intuition of Naive Bayes
- Solving an Example on Naive Bayes
- Other Bayes Classifiers
- Gaussian Naive Bayes Classifier
- Multinomial Naive Bayes Classifier
- Bernoulli Naive Bayes Classifier
Detailed Notes: Day 36 Commit
LinkedIn post: Day 36 Update
DAY 37 (25 Oct 2023):
Topic: Understanding Clustering Techniques
- How clustering is different from classification
- Applications of Clustering
- What are density based methods
- What are Hierarchical based methods
- What are partitioning methods
- What are Grid Based methods
- Main Requirements for Clustering Algorithms
Detailed Notes: Day 37 Commit
LinkedIn post: Day 37 Update
DAY 38 (26 Oct 2023):
Topic: Understanding K-Means Clustering
- Concept of K-Means Clustering
- Math Intuition Behind K-Means
- Cluster Building Process
- Edge Case Scenarios of K-Means
- Challenges and Improvements in K-Means
Detailed Notes: Day 38 Commit
LinkedIn post: Day 38 Update
DAY 39 (27 Oct 2023):
Topic: Understanding Hierarchical Clustering
- Concept of Hierarchical Clustering
- Understanding Algorithm
- Understanding Linkage Methods
Detailed Notes: Day 39 Commit
LinkedIn post: Day 39 Update
DAY 40 (28 Oct 2023):
Topic: Understanding DBSCAN Clustering
- Concept of DBSCAN
- Key Terms in Understanding DBSCAN
- Algorithm of DBSCAN
Detailed Notes: Day 40 Commit
LinkedIn post: Day 40 Update
DAY 41 (29 Oct 2023):
Topic: Evaluation of Clustering Models
- Understanding External Measures
- Rand Index
- Jaccard Coefficient
- Understanding Internal Measures
- Cohesion
- Separation
Detailed Notes: Day 41 Commit
LinkedIn post: Day 41 Update
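The two external measures from Day 41 compare a clustering against known labels by looking at every pair of points. Here is a minimal pair-counting sketch in plain Python; the function names and the four-point example are hypothetical and only meant to show the arithmetic.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count point pairs by whether they share a class and share a cluster."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1  # same class, same cluster
        elif same_true:
            sd += 1  # same class, different clusters
        elif same_pred:
            ds += 1  # different classes, same cluster
        else:
            dd += 1  # different classes, different clusters
    return ss, sd, ds, dd


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_coefficient(true_labels, pred_labels):
    """Like the Rand index, but ignoring pairs separated in both labelings."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    return ss / (ss + sd + ds)


# Hypothetical ground truth and a clustering that splits the second class.
truth = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]

print(rand_index(truth, clusters))
print(jaccard_coefficient(truth, clusters))
```

Both scores reach 1.0 only when the clustering matches the ground truth exactly, up to a renaming of cluster ids.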
DAY 42 (30 Oct 2023):
Topic: Understanding Curse of Dimensionality
- Computational Complexity
- Data Visualization Challenges
Detailed Notes: Day 42 Commit
LinkedIn post: Day 42 Update
DAY 43 (31 Oct 2023):
Topic: Understanding Principal Component Analysis
- Idea Behind PCA
- What are Principal Components
- Eigen Decomposition Approach
- Singular Value Decomposition Approach
- Why do we maximize Variance
- What is Explained Variance Ratio
- How to select the optimal number of Principal Components
- Understanding Scree plot
- Issues with PCA
- Understanding Kernel PCA
Detailed Notes: Day 43 Commit
LinkedIn post: Day 43 Update
DAY 44 (31 Oct 2023):
Topic: Supervised Algorithms Revision
Regression Algorithms
- Linear Regression
- Polynomial Regression
Classification Algorithms
- K-Nearest Neighbours
- Logistic Regression
Both Classification and Regression
- Decision Trees
- Random Forest
- Gradient Boosting
- AdaBoost
- Ridge Regression
- Lasso Regression
Detailed Notes: Day 44 Commit
LinkedIn post: Day 44 Update
DAY 45 (1 Nov 2023):
Topic: Unsupervised Algorithms Revision
Clustering Algorithms
- K-Means
- DBSCAN
- HDBSCAN
- Hierarchical
Dimensionality Reduction Techniques
- PCA
- t-SNE
- ICA
Association Rules
- Apriori
- FP-growth
- FP-Max
Detailed Notes: Day 45 Commit
LinkedIn post: Day 45 Update
DAY 46 (2 Nov 2023):
Topic: Big Mart Sales Prediction Project Understanding
- Understanding the Data
Detailed Notes: Day 46 Commit
LinkedIn post: Day 46 Update
DAY 47 (3 Nov 2023):
Topic: EDA for Big Mart Sales
- Dealing with Null Values
- Data Visualization of the Numeric Columns
- Feature Engineering of the Numeric Columns
Detailed Notes: Day 47 Commit
LinkedIn post: Day 47 Update
DAY 48 (4 Nov 2023):
Topic: Data Visualization
- Data Visualization of the Categorical Columns
- Feature Engineering of the Categorical Columns
Detailed Notes: Day 48 Commit
LinkedIn post: Day 48 Update
DAY 49 (5 Nov 2023):
Topic: Model Building and Evaluation
Detailed Notes: Day 49 Commit
LinkedIn post: Day 49 Update
DAY 50 (6 Nov 2023):
Topic: Hyperparameter Tuning the Models
Detailed Notes: Day 50 Commit
LinkedIn post: Day 50 Update
DAY 51 (7 Nov 2023):
Topic: Referring Other Kaggle Notes for the project
Detailed Notes: Day 51 Commit
LinkedIn post: Day 51 Update
DAY 52 (8 Nov 2023):
Topic: History of Deep Learning
Detailed Notes: Day 52 Commit
LinkedIn post: Day 52 Update
DAY 53 (22 Nov 2023):
Topic: Introduction to Neural Networks
Detailed Notes: Day 53 Commit
LinkedIn post: Day 53 Update
DAY 54 (22 Nov 2023):
Topic: Understanding the Perceptron Algorithm
Detailed Notes: Day 53 Commit
LinkedIn post: Day 53 Update