YugalPNTL / Dataquest

Data Science Track - Exercises and activities towards Dataquest.io


Data Scientist In Python - Dataquest.io - Exercises and activities

Course Outline: 8 step Data Science Track

Step 1 of 8: Python Introduction

Course 1 of 2:
Python for Data Science: Fundamentals
  	Note: I had already completed a four-part Python programming course
on edX through GTx (CS 1301), so this course was mainly review.
   Programming in Python
  	 Variables and Data Types
  	 Lists and For Loops
  	 Conditional Statements
  	 Dictionaries and Frequency Tables
  	 Functions: Fundamentals
  	 Functions: Intermediate
  	 Project: Learn and Install Jupyter Notebook
  	 Guided Project: Profitable App Profiles for the App Store and Google Play Markets
  	 	Results in my GitHub account

Course 2 of 2: 
  Python for Data Science: Intermediate
   Cleaning and Preparing Data in Python
     Python Data Analysis Basics
     Object-Oriented Python
     Working with Dates and Times in Python
     Guided Project: Exploring Hacker News Posts
     	Results in my GitHub account

Step 2 of 8: Data Analysis and Visualization

Course 1 of 6: 
		NumPy and pandas Fundamentals
   Intro to NumPy
  	 Boolean Indexing with NumPy
  	 Intro to pandas
  	 Exploring Data with pandas: Fundamentals
  	 Exploring Data with pandas: Intermediate
  	 Data Cleaning Basics
  	 Guided Project: Exploring Ebay Car Sales Data
  	 	Results in my GitHub account

Course 2 of 6: 
 Exploratory Data Visualization
   Line charts
  	 Multiple plots
  	 Bar plots and scatter plots
  	 Histograms and box plots
  	 Guided Project: Visualize Earnings Based on College Majors
  	 	Results in my GitHub account

Course 3 of 6: 
  Storytelling through Data Visualization
   Improving Plot Aesthetics
     Color, Layout, and Annotations
     Guided Project: Visualizing the Gender Gap in College Degrees
     	Results in my GitHub account
     Conditional Plots
     Visualizing Geographic Data
 	Note: Uses the Basemap module, which is deprecated;
	it should be changed to Cartopy.

Course 4 of 6: 
  		Data Cleaning and Analysis
   Data Aggregation
     Combining Data With pandas
     Working With Strings In pandas
     Working With Missing And Duplicate Data
     Guided Project: Clean and Analyze Employee Exit Surveys
     	Results in my GitHub account

Course 5 of 6: 
  		Data Cleaning: Advanced
   Regular Expression Basics
     Advanced Regular Expressions
     List Comprehensions and Lambda Functions
     Working with Missing Data

Course 6 of 6: 
  		Data Cleaning Project Walkthrough
   Data Cleaning Walkthrough
     Data Cleaning Walkthrough: Combining the Data
     Data Cleaning Walkthrough: Analyzing and Visualizing the Data
     Guided Project: Analyzing NYC High School SAT Data
 	Note: Uses the Basemap module, which is deprecated.
	I've created a Colaboratory notebook.
     	Results in my GitHub account
     Challenge: Cleaning Data
     Guided Project: Star Wars Survey
     	Results in my GitHub account

Step 3 of 8: The Command Line

Note: Since these were command-line exercises that I was already
familiar with, I didn't create a Jupyter notebook.

Course 1 of 2: 
		Elements of the Command Line
  Intro to the Command Line
  	The Filesystem
  	Modifying the Filesystem
  	Glob Patterns and Wildcards
  	Users and Permissions

Course 2 of 2: 
	Text Processing in the Command Line
  Getting Help and Reading Documentation
  	File Inspection
  	Text Processing
  	Redirection and Pipelines
  	Standard Streams and File Descriptors

Step 4 of 8: Working with Data Sources

Course 1 of 4: 
	SQL Fundamentals
  Intro to SQL
  	Summary Statistics
  	Group Summary Statistics
  	Querying SQLite from Python
  	Guided Project: Analyzing CIA Factbook Data Using SQLite and Python

Course 2 of 4: 
	SQL Intermediate: Table Relations and Joins
  Joining Data in SQL
  	Intermediate Joins in SQL
  	Building and Organizing Complex Queries
  	Guided Project: Answering Business Questions Using SQL
  	Results in my GitHub account
  	Table Relations and Normalization
  	Guided Project: Designing and Creating a Database
	Note: The mlb.db file used is too large to upload directly. 

Course 3 of 4: 
	SQL Databases: Advanced
  Using PostgreSQL
  	Command Line PostgreSQL
  	Project: PostgreSQL installation
  	Introduction to Indexing
  	Multi-Column Indexing

Course 4 of 4: 
		APIs and Web Scraping
  Working with APIs
  	Intermediate APIs
  	Challenge: Working with the reddit API
  	Web Scraping

Step 5 of 8: Probability and Statistics

Course 1 of 5: 
	Statistics Fundamentals
  	Variables in Statistics
  	Frequency Distributions
  	Visualizing Frequency Distributions
  	Comparing Frequency Distributions
  	Guided Project: Investigating Fandango Movie Ratings

Course 2 of 5: 
	Statistics Intermediate: Averages and Variability
  The Mean
  	The Weighted Mean and the Median
  	The Mode
  	Measures of Variability
 	Guided Project: Finding the Best Markets to Advertise in
  	Results in my GitHub account

Course 3 of 5: 
	Probability Fundamentals
  Estimating Probability
 	Probability Rules
  	Solving Complex Probability Problems
  	Permutations and Combinations
  	Guided Project: Mobile App for Lottery Addiction
  	Results in my GitHub account

Course 4 of 5: 
	Conditional Probability
  Conditional Probability Fundamentals
  	Conditional Probability Intermediate
  	Bayes Theorem
  	The Naive Bayes Theorem
  	Guided Project: Building a Spam Filter with Naive Bayes
  	Results in my GitHub account

Course 5 of 5: 
	Hypothesis Testing Fundamentals
  Significance Testing
  	Chi-squared Tests
  	Multi category Chi-squared Tests
  	Guided Project: Winning Jeopardy
  	Results in my GitHub account

Step 6 of 8: Machine Learning Intro

Course 1 of 6: 
	Machine Learning Fundamentals
  Intro to K-nearest Neighbors
  	Evaluating Model Performance
  	Multivariate K-nearest Neighbors
  	Hyperparameter Optimization
  	Cross Validation
  	Guided Project: Predicting Car Prices
  	Results in my GitHub account

Course 2 of 6: 
	Calculus for Machine Learning
  Understanding Linear and Nonlinear Functions
  	Understanding Limits
  	Finding Extreme Points

Course 3 of 6: 
	Linear Algebra for Machine Learning
  Linear Systems
  	Matrix Algebra
  	Solution Sets

Course 4 of 6: 
	Linear Regression for Machine Learning
  The Linear Regression Model
  	Feature Selection
  	Gradient Descent
  	Ordinary Least Squares
  	Processing and Transforming Features
  	Guided Project: Predicting House Sale Prices

Course 5 of 6: 
	Machine Learning in Python Intermediate
  Logistic Regression
  	Intro to evaluating binary classifiers
  	Multiclass classification
  	Clustering basics
  	K-means clustering
  	Guided Project: Predicting the Stock Market
  	Results in my GitHub account

Course 6 of 6: 
	Decision Trees
  Intro to Decision Trees
  	Building a Decision Tree
  	Applying a Decision Tree
  	Intro to Random Forests
  	Guided Project: Predicting Bike Rentals
  	Results in my GitHub account

Step 7 of 8: Machine Learning Intermediate

Course 1 of 5: 
Deep Learning Fundamentals
  Representing Neural Networks
  	Nonlinear Activation Functions
  	Hidden Layers
  	Guided Project: Building a Handwritten Digits Classifier
  	Results in my GitHub account

Course 2 of 5: 
	Machine Learning Project
  Machine Learning Project Walkthrough: Data Cleaning
  	Machine Learning Project Walkthrough: Preparing the Features
  	Machine Learning Project Walkthrough: Making Predictions

Course 3 of 5: 
	Kaggle Fundamentals
  Getting Started with Kaggle
  	Feature Preparation, Selection, and Engineering
  	Model Selection and Tuning
  	Guided Project: Creating a Kaggle Workflow

Course 4 of 5: 
	Exploring Topics in Data Science
  Naive Bayes for Sentiment Analysis
  	An Intro to K-Nearest Neighbors

Course 5 of 5: 
	Natural Language Processing
  Intro to NLP

Step 8 of 8: Advanced Topics in Data Science

Course 1 of 6: 
	Functions - Advanced
  Best Practices for Writing Functions
 	Context Managers
 	Intro to Decorators
  	Decorators: Advanced

Course 2 of 6: 
Data Structures and Algorithms
  Memory and Unicode
  	Binary Search
  	Data Structures
  	Recursion and Advanced Data Structures
  	Guided Project: Investigating Airplane Accidents

Course 3 of 6: 
	Python Programming: Advanced
  	Exception Handling
  	Lambda Functions
  	Intro to Computer Arch
  	Parallel Processing

Course 4 of 6: 
Command Line Intermediate
  Working with programs
  	Command Line Python Scripting
  	Challenge: Working with the Command Line
  	Working with Jupyter Console
  	Piping and redirecting output
  	Challenge: Data munging using the Command Line
  	Data Cleaning and Exploration using Csvkit

Course 5 of 6: 
	Git and Version Control
  Intro to Git
  	Git Remotes
  	Git Branches
  	Merge Conflicts
  	Project: Git installation and GitHub integration

Course 6 of 6: 
	Spark and Map-reduce
  Intro to Spark
  	Project: Spark Installation and Jupyter Notebook integration
  	Transformations and Actions
  	Challenge: Transforming Hamlet into a Data Set
  	Spark DataFrames
  	Spark SQL

Thanks for visiting, and happy coding.


License: Apache License 2.0


Languages: Jupyter Notebook 92.2%, HTML 7.4%, Python 0.4%