Anish Shah's repositories
Social-Network-analysis-on-Twitter-Data
Collected tweets using the Twitter package in R with the Search API, grouped them by geolocation (as in the Google Maps API), and plotted them on a map of the USA according to the number of tweets per state. Technology: R | Tools: Jupyter, RStudio | Libraries: ggplot, ggmap, geom_map.
IMDb-5000-Data-analysis
Analyzed the IMDb 5000 Movie Dataset from Kaggle to predict movie ratings and drew meaningful insights using Multiple Linear Regression, Decision Tree, and Random Forest models. Technology: R | Tools: RStudio | Libraries: dplyr, ggplot2.
-DATA-AGGREGATION-BIG-DATA-ANALYSIS-AND-VISUALIZATION
Aggregated data from Twitter and NYTimes using the APIs exposed by those sources, applied the classical big data analytic method of MapReduce to the unstructured data collected, and built a visualization data product.
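The MapReduce step can be illustrated with a minimal, single-process sketch. It is not the repository's code: the word-count task, the function names, and the sample texts are all assumptions chosen to show the map, shuffle, and reduce phases on unstructured text.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for text in documents:
        for word in text.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence on an iterable of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


# Hypothetical sample texts standing in for collected tweets or articles.
sample_docs = ["big data is big", "data products need data"]
counts = word_count(sample_docs)
```

In a real cluster the map and reduce functions run in parallel on separate nodes; here they run in one process only to make the data flow visible.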
anishshah23.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Buffalo-Sewer-Authority
Gathered data using GIS (Geographic Information Systems) tools such as ArcGIS and cleaned the gathered data. In one of the projects, built a multiple regression model in Python with 80% accuracy to determine the relationship between trees and crime in the city of Buffalo, to help the City of Buffalo in future landscape and urban planning projects.
Data-Analytics-pipeline-using-Spark
Processing graph data using Spark
Data-Science-Industry-Overview
Detailed learnings, written up as multiple reports, from in-person weekly modules on application-oriented and related Data Science topics spanning various industries, from people who work as Data Scientists in Fortune 500 companies.
Handwritten-digits-classification
Implemented multilayer perceptron neural networks to classify handwritten digits from the MNIST dataset. Technology: Python | Tools: Jupyter Notebook | Libraries: NumPy, SciPy.
NYC-Uber-Data-Analysis
Using quantitative data analysis methods, visualized Uber’s ridership growth, characterized demand based on patterns identified in the time series, estimated the value of the NYC market for Uber and its revenue growth, and analyzed trip duration to determine its probability distribution model and to draw insights about usage of the service. Technology: Python | Tools: Jupyter Notebook | Libraries: Pandas, Matplotlib, Seaborn, SQL, NumPy.
Regression
Implemented regression techniques such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), linear regression, ridge regression, ridge regression using gradient descent, and non-linear regression to understand how machine learning functions work.
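As an illustration of one technique named above, here is a minimal sketch of ridge regression fitted by gradient descent. It is not taken from the repository: the function name, the learning rate, the epoch count, and the toy data are assumptions, and for simplicity the bias weight is penalized along with the others.

```python
def ridge_gradient_descent(X, y, lam=0.1, lr=0.01, epochs=5000):
    """Minimize (1/n) * sum((x_i . w - y_i)^2) + lam * ||w||^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Residuals of the current fit: prediction minus target.
        residuals = [
            sum(xj * wj for xj, wj in zip(row, w)) - target
            for row, target in zip(X, y)
        ]
        # Gradient of the objective: (2/n) * X^T r + 2 * lam * w.
        grad = [
            (2.0 / n) * sum(r * row[j] for r, row in zip(residuals, X))
            + 2.0 * lam * w[j]
            for j in range(d)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: y = 2 * x exactly, with a bias column of ones.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 2.0, 4.0, 6.0]
w_unregularized = ridge_gradient_descent(X, y, lam=0.0)
w_ridge = ridge_gradient_descent(X, y, lam=1.0)
```

With `lam=0.0` the fit approaches the ordinary least-squares weights (close to 0 for the bias and 2 for the slope on this data); a positive `lam` shrinks the weights toward zero.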
aws-codepipeline-s3-codedeploy-linux
Use this sample when creating a simple pipeline in AWS CodePipeline while following the Simple Pipeline Walkthrough tutorial. http://docs.aws.amazon.com/codepipeline/latest/userguide/getting-started-w.html
Boston-Housing-Data-Analysis-
Analyzed the areas with high crime rates and drew conclusions about the increased crime rates in these areas. Performed data analysis to determine the relationships between the predictors.
Classification-and-Regression
Implemented Logistic Regression to give the prediction results.
Clustering-demographic-data-using-a-classification-tree
Cluster the demographic data of Table 14.1 using a classification tree. Specifically, generate a reference sample the same size as the training set, by randomly permuting the values within each feature. Build a classification tree to the training sample (class 1) and the reference sample (class 0) and describe the terminal nodes having highest estimated class 1 probability.
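The reference-sample step described above can be sketched as follows. This is an illustrative assumption, not the repository's code: the function name and the small numeric table are hypothetical, and the classification tree itself is left to whichever tree implementation is used.

```python
import random


def make_reference_sample(rows, seed=0):
    """Permute the values within each feature (column) independently.

    Each column keeps its marginal distribution, but the joint
    structure between features is broken.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


# Hypothetical stand-in for the demographic data (rows = observations).
training = [[1, 10, 100], [2, 20, 200], [3, 30, 300], [4, 40, 400]]
reference = make_reference_sample(training)

# Two-class problem: training sample is class 1, reference sample is class 0.
features = training + reference
labels = [1] * len(training) + [0] * len(reference)
```

A classification tree fitted to `features` and `labels` then separates regions where the real data is denser than the permuted data; terminal nodes with high estimated class 1 probability correspond to clusters.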
coding-interview-university
A complete computer science study plan to become a software engineer.
EAS503-Programming-Fundamentals-for-Data-Scientists
All my coursework assignments for the course.
Hierarchical-clustering-on-the-states-data
Applying an unsupervised clustering technique (hierarchical clustering) to the states data.
Hierarchical-clustering-to-gene-expression-data-set
Apply hierarchical clustering to the samples using correlation-based distance.
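A correlation-based distance is commonly taken as one minus the Pearson correlation between two samples. The sketch below shows that calculation under this assumption; the function names and the toy expression values are hypothetical and are not drawn from the repository.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance_matrix(samples):
    """Pairwise distances d(a, b) = 1 - corr(a, b) between samples."""
    return [[1.0 - pearson(a, b) for b in samples] for a in samples]


# Hypothetical toy samples: the second is perfectly correlated with the
# first, and the third is perfectly anti-correlated with it.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
dist = correlation_distance_matrix(samples)
```

Under this distance, samples with the same profile shape are close even when their magnitudes differ, which is why it is often preferred over Euclidean distance for expression data. The resulting matrix can be passed to any agglomerative clustering routine that accepts precomputed distances.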
leetcode
Python & Java Solutions for LeetCode
Online-Resume
Portfolio
PCA-and-K-Means-Clustering-Of-High-Dimensional-Aircraft-Data-
Executed PCA and K-Means Clustering on a high-dimensional Delta Airlines dataset to obtain some interesting findings. Technology: R | Tools: RStudio | Libraries: stats, rgl.
PCA-and-K-means-clustering-on-the-data
Performing PCA and K-means clustering on a simulated dataset
Repeating-Topical-Data-Analysis
Attempted to recreate, using R, the charts from the CDC flu data and analysis sites (flu.gov and FluView) with data up to the week of Jan 27th, 2018
STA545-
Statistical Data Mining 1