mandliya / ml

A 60 days+ streak of daily learning of ML/DL/Maths concepts through projects

machine-learning data-wrangling pandas python jupyter exploratory-data-analysis exploratory-data-visualizations machine-learning-projects machine-learning-algo daily time-series time-series-analysis deep-learning

Current Status	Stats
Total Machine Learning Projects	35
Current Daily Streak	76
Last Streak Dates	06/23/2019 - 07/02/2019
Current Streak Dates	04/13/2020 - 06/27/2020
Daily Log Progress	daily_log.md

On break till 07/06. I will re-start the new streak then.

Machine Learning and Deep Learning Projects

Machine Learning and Deep Learning Projects

Hands on Machine Learning

No.	Project	Description	Notebook	Notes
1.	The Machine Learning Landscape	The basics of machine learning terminology, types and challenges	To be updated	Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
2.	End to End Machine Learning Project	In this project we will go through an example project end to end, pretending to be a recently hired data scientist in a real estate company.Here are the main steps you will go through: Look at the big picture. Get the data. Discover and visualize the data to gain insights. Prepare the data for Machine Learning algorithms. Select a model and train it. Fine-tune your model. Present your solution. Launch, monitor, and maintain your system	End_to_end_machine_learning_project.ipynb	Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
3.	Classification	In this project we will go through concepts of Classification by building a digit classifier using MNIST dataset. We will learn concepts of performance measurement for classfication (e.g. Confusion Matrix, Precision and Recall, The ROC curve etc)	Classification.ipynb	Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
3.4	Classification Exercise	Buiding a Spam classifier	spam_classifier.ipynb	Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Data Wrangling

No.	Project	Description	Notebook	Notes
1.	Data Wrangling using Quandl Api	We retrieve financial data of a stock using quandl api and do basic data analysis using plain vanilla python.	data_wrangling_using_api.ipynb	-
2.	Pandas from scratch	This notebook takes an in-depth look at Pandas, the swiss army knife for data analysis. Exploring pandas data-structures (Series and DataFrame) in detail We fetch Google's stock data and perform various data analysis on it which includes reading data from various sources, filter, visualize, and apply statistics on top of it.	pandas_handson.ipynb	-
3.	Never on a Friday or Turning Tuesday	A simple exploratory data analysis of stock market data to determine if Tuesdays are Turning Tuesdays.	never_on_a_friday.ipynb	-
4.	Handling missing values in pandas	This notebook provides a good overview of how pandas handle missing values and explores functions it provides to handle missing data.	handling_missing_data.ipynb
5.	Mini-Project: Data Wrangling and Transformation with Pandas	In this mini-project we explore multiple datasets (movie, cast, release) to do an extensive data exploration, analysis and visualtion.	data_wrangling_transformations_movie.ipynb	-
6.	Data Wrangling with JSON	This notebook helps understanding Panda's JSON functionality. It also has some challenges which require some fun data-wrangling (e.g. missing values etc).	Mini_Project_Wrangling_Json_Exercise.ipynb	-
7.	67 years of Lego	An exploratory data analysis of fun dataset on every single lego block that has ever been built. Lot of good pandas aggregation	lego_analysis.ipynb	Source: Datacamp
8.	Explore the crypto-currency Bitcoin market	In this notebook we do an in-depth analysis of crypto-currency market cap analysis, and visualize top gainers and losers in a fun way! This analysis tells you how risky or profitable this market is currently.	cryptocurrency_analysis.ipynb	-
9.	Discovery of Handwashing	This notebook tells the story of discovery of handwashing, and how Dr. Ignaz Semmelweis brought down the deaths of women who just gave birth caused childbed fever .	discovery_of_handwashing.ipynb	-
10.	Exploring evaluation of linux	This notebook does exploratory data analysis on Linux git commit history. A good lot of pandas!	exploring_the_evaluation_of_linux.ipynb	-
11.	The github history of Scala Language	This notebook explore the pull requests of Scala language project on github and does interesting analysis of pull requests based on authors, year, months etc	EDA_scala_history.ipynb	Source: Datacamp
12.	Who is drunk and when in Ames, Iowa	Ames, Iowa is home to Iowa State University. Ames has had its fair share of alcohol-related incidents. (For example, Google 'VEISHEA riots 2014'). In this notebook, we analyze and visualize some breath alcohol test data from Ames that is published by the State of Iowa.	EDA_Ames_iowa_drinking.ipynb	Source: Datacamp
13.	Working with strings in Pandas	In this notebooks, we explore string manipulation and easy analysis string operations on pandas.	Working_With_Strings.ipynb	Source:Python Data Science Handbook
14.	A New Era of Data Analysis in Baseball	In this notebook, we will use Statcast data to compare the home runs of two of baseball's brightest (and largest) stars, Aaron Judge (6'7") and Giancarlo Stanton (6'6"), both of whom now play for the New York Yankees.	data_analysis_in_pandas.ipynb	source: Datacamp
15.	Name Game: Gender prediction using sound	A fun analysis of NYT's authors dataset of Children’s Picture Books. We analyze the gender distribution of authors to see if there have been changes over years based on author's names and how they sound using nysiis algorithm.	name_game.ipynb	source: Datacamp
16.	Exploring the Titanic Dataset using Pandas	An exploratory analysis of Titanic Dataset from Kaggle, few tips to get summary statistics.	Exploring_titanic_dataset_using_pandas.ipynb	source: Pandas Docs
17.	Risk and Returns: The Sharpe Ratio	In this notebook we learn about Sharpe Ratio by calculating it for the stocks of Amazon and Facebook, to figure which one is better investment. We use S&P 500 as the benchmark which measures the performance of 500 largest stocks in the US.	risk_and_returns_the_sharpe_ratio.ipynb	source: Datacamp
18.	Generating Keywords for Google Ads	A quick project to learn Pandas by generating keywords for google ad campaign .	generating_keywords_for_google_ads	source: Datacamp

SQL

No.	Project	Description	Notebook	Notes
1.	SQL Spark at scale	In this notebook, we work through a series of exercises using Spark SQL and familiarize ourselves with how SQL works with spark.	Mini_Project_SQL_with_Spark.ipynb	One of the ways to use this notebook is to try domino trial, create a pyspark workspace and launch this notebook, as we need a pyspark environment.

Mathematics

No.	Project	Description	Notebook	Notes
1.	Linear Algebra Basics	In this notebook, we explore basic concepts of Linear Algebra.	Linear_Algebra_Basics.ipynb	Source: Introduction to Linear Algebra for Applied Machine Learning with Python.
2.	Probability and Random Processes	A list of basics of Probability concepts.	Probability and Random Processes.ipynb
2.	Counting A primer	A list of basics of counting concepts.	Counting.ipynb

Time Series

No.	Project	Description	Notebook	Notes
1.	Working with time series in python	This notebook teaches basics of time series analysis. We take a fun dataset of Seattle's Fremont Bridge bicycle data and Google's stock data to visualize, understand and work through dates and time in Python	Time_series_basics.ipynb.	Data is fetched directly from web.

Kaggle

No.	Project	Description	Notebook	Notes
1.	Titanic: Machine Learning from Disaster	This notebook has the walk-through of Kaggle's iconic Titanic problem, learning from the best kernels there. Also this a solution of exercise 2 of chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow	titanic_competition.ipynb.	This notebook is downloaded from Kaggle's kernel.

Mini Projects

No.	Project	Description	Notebook	Notes
1.	Predicting Credit Card Approvals	In this notebook, we will build an automatic credit card approval predictor using machine learning techniques, just like the real banks do. We explore the data, clean it, impute it, and then apply logistic regression to predict the credit card approval.	Predicting Credit Card Approvals	Source: Datacamp
2.	Find Movie Similarity From Plot Summaries	In this notebook, we will quantify the similarity of movies based on their plot summaries available on IMDb and Wikipedia, then separate them into groups, also known as clusters. We'll create a dendrogram to represent how closely the movies are related to each other..	Find Movie Similarity From Plot Summaries	Source: Datacamp
3.	Reducing traffic mortality in the USA	In this notebook, we do a deep data analysis, data wrangling, plotting, dimensionality reduction, and unsupervised clustering on the data collected by the National Highway Traffic Safety Administration and the National Association of Insurance Commissioners	Reducing traffic mortality in the USA	Source: Datacamp
3.	Word Frequency in Moby Dick	A fun mini-project in which we perform basic NLP tasks using requests, BeautifulSoup and nltk library	Word Frequency in Moby Dick	Source: Datacamp

Courses

No.	Project	Description	Notebook	Notes
1.	Statistical Thiking in Python	This course principles of statistical inference. In this course, you will start building the foundation you need to think statistically, speak the language of your data, and understand what your data is telling you. The foundations of statistical thinking took decades to build, but can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up-to-speed and begin thinking statistically by the end of this course.	Statistical Thinking Part 1	Datacamp
2.	CS 224N NLP with Deep Learning	The Stanford course of Natural Language Processing using Deep Learning	Lecture-1-Introduction-and-Word-Vectors.ipynb , Word2Vec_from_scratch	-

Algorithms from Scratch

No.	Project	Description	Notebook	Notes
1.	Linear Regression	This notebook is everything Linear Regression, all the concepts about it.	[Linear Regression](algorithms_from_scratch\Linear Regression.ipynb)	-

About

A 60 days+ streak of daily learning of ML/DL/Maths concepts through projects

machine-learning data-wrangling pandas python jupyter exploratory-data-analysis exploratory-data-visualizations machine-learning-projects machine-learning-algo daily time-series time-series-analysis deep-learning

Languages

Language:Jupyter Notebook 78.5%Language:HTML 21.3%Language:Python 0.2%