anishshah23

Anish Shah's repositories

Social-Network-analysis-on-Twitter-Data

Collected tweets using the Twitter package in R with the search API, grouped them by geolocation (as in the Google Maps API), and plotted them on a map of the USA according to the number of tweets per state. Technology: R | Tools: Jupyter, RStudio | Libraries: ggplot, ggmap, geom_map.

Language: Jupyter Notebook

IMDb-5000-Data-analysis

Analyzed the IMDb 5000 Movie Dataset from Kaggle to predict movie ratings and drew insights using Multiple Linear Regression, Decision Tree, and Random Forest models. Technology: R | Tools: RStudio | Libraries: dplyr, ggplot2.

Language: R

-DATA-AGGREGATION-BIG-DATA-ANALYSIS-AND-VISUALIZATION

Aggregating data from Twitter and NYTimes using the APIs exposed by these data sources, applying the classical big data analytic method of MapReduce to the unstructured data collected, and building a visualization data product.

Language: JavaScript

anishshah23.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript | License: MIT

Buffalo-Sewer-Authority

Gathered data using GIS (Geographic Information Systems) tools such as ArcGIS and cleaned the gathered data. In one of the projects, built a multiple regression model in Python with 80% accuracy to determine the relationship between trees and crimes in the city of Buffalo, which would help the City of Buffalo in future Landscape and Urban Planning projects.

Data-Analytics-pipeline-using-Spark

Processing graph data using Spark

Language: Jupyter Notebook

Data-Science-Industry-Overview

Detailed learnings, in the form of multiple reports, from attending in-person weekly modules on application-oriented and related Data Science topics spanning various industries, presented by people who work as Data Scientists in Fortune 500 companies.

Handwritten-digits-classification

Implemented multilayer perceptron neural networks to classify handwritten digits from the MNIST dataset. Technology: Python | Tools: Jupyter Notebook | Libraries: NumPy, SciPy.

Language: Python

NYC-Uber-Data-Analysis

Using quantitative data analysis methods, visualized Uber's ridership growth, characterized demand based on patterns identified in the time series, estimated the value of the NYC market for Uber and its revenue growth, and analyzed trip duration to determine its probability distribution model and to gain insights into usage of the service. Technology: Python | Tools: Jupyter Notebook | Libraries: Pandas, Matplotlib, Seaborn, SQL, NumPy.

Language: Jupyter Notebook

Regression

Implemented classification and regression techniques, namely Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), linear regression, ridge regression, ridge regression using gradient descent, and non-linear regression, to understand how machine learning functions work.

Language: Jupyter Notebook
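
As an illustration of the "ridge regression using gradient descent" item above, here is a minimal pure-Python sketch. It is not taken from the repository: the function name `ridge_gradient_descent`, the objective scaling (mean squared error plus `lam` times the squared weight norm, no intercept), and the default step size are all assumptions made for the example.

```python
def ridge_gradient_descent(X, y, lam=1.0, lr=0.05, steps=500):
    """Minimize (1/n) * ||Xw - y||^2 + lam * ||w||^2 by gradient descent.

    X is a list of feature rows, y a list of targets. Hypothetical helper;
    the repository may use a different objective scaling or an intercept.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residuals of the current linear prediction.
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        # Gradient: (2/n) * X^T (Xw - y) + 2 * lam * w
        grad = [(2.0 / n) * sum(r * row[j] for r, row in zip(residuals, X))
                + 2.0 * lam * w[j]
                for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data (made up for illustration): y = 2 * x exactly.
X_toy = [[1.0], [2.0], [3.0]]
y_toy = [2.0, 4.0, 6.0]
w_plain = ridge_gradient_descent(X_toy, y_toy, lam=0.0)
w_ridge = ridge_gradient_descent(X_toy, y_toy, lam=1.0)
```

With `lam=0.0` the update reduces to ordinary least squares by gradient descent; a positive `lam` shrinks the fitted weight toward zero.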

aws-codepipeline-s3-codedeploy-linux

Use this sample when creating a simple pipeline in AWS CodePipeline while following the Simple Pipeline Walkthrough tutorial. http://docs.aws.amazon.com/codepipeline/latest/userguide/getting-started-w.html

Language: HTML | License: Apache-2.0

Boston-Housing-Data-Analysis-

Analyzed the areas with high crime rates and drew conclusions about the increased crime rates in these areas. Performed data analysis to obtain the relationship between the predictors.

Language: R

Classification-and-Regression

Implemented Logistic Regression to generate the prediction results.

Language: Jupyter Notebook

Clustering-demographic-data-using-a-classification-tree

Cluster the demographic data of Table 14.1 using a classification tree. Specifically, generate a reference sample the same size as the training set by randomly permuting the values within each feature. Fit a classification tree to the training sample (class 1) and the reference sample (class 0), and describe the terminal nodes having the highest estimated class 1 probability.

Language: R
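
The reference-sample step described above can be sketched as follows. This is an illustrative pure-Python sketch, not the repository's R code: `make_reference_sample` and the toy rows are hypothetical, and the tree-fitting step itself is left out.

```python
import random


def make_reference_sample(rows, seed=0):
    """Build a reference sample the same size as `rows` by independently
    shuffling the values within each feature (column).

    Shuffling each column on its own keeps every marginal distribution
    intact while breaking the dependence between features.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


# Made-up training rows, purely for illustration.
training = [[1, 10, 100], [2, 20, 200], [3, 30, 300], [4, 40, 400]]
reference = make_reference_sample(training)

# Stack both samples and label them: class 1 = training, class 0 = reference.
features = training + reference
labels = [1] * len(training) + [0] * len(reference)
```

A classification tree fit to `features` and `labels` would then separate real observations from the permuted ones, and the terminal nodes with a high class 1 probability would mark regions where the real data are denser than the reference.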

coding-interview-university

A complete computer science study plan to become a software engineer.

License: CC-BY-SA-4.0

EAS503-Programming-Fundamentals-for-Data-Scientists

All my coursework assignments related to the course.

Language: Jupyter Notebook

Hierarchical-clustering-on-the-states-data

Applying an unsupervised clustering technique (hierarchical clustering) to the states data.

Language: R

Hierarchical-clustering-to-gene-expression-data-set

Apply hierarchical clustering to the samples using correlation-based distance.

Language: R
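
To make "correlation-based distance" concrete, here is a small pure-Python sketch of agglomerative clustering where the distance between two samples is taken as one minus their Pearson correlation. Both function names, the choice of complete linkage, and the toy samples are assumptions for illustration; the repository's R code may differ, and samples with zero variance are not handled.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def agglomerate(samples, linkage=max):
    """Merge clusters bottom-up; `linkage=max` gives complete linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    dist = [[correlation_distance(a, b) for b in samples] for a in samples]
    clusters = [[i] for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist[p][q]
                            for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Made-up samples: the first two rise together, the third falls.
toy_samples = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
toy_merges = agglomerate(toy_samples)
```

Under this distance, samples with the same shape of profile are close even when their magnitudes differ, which is why the first two toy samples merge before the third joins them.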

leetcode

Python & Java Solutions for LeetCode

License: MIT

Online-Resume

Portfolio

Language: Jupyter Notebook

PCA-and-K-Means-Clustering-Of-High-Dimensional-Aircraft-Data-

Executed PCA and K-Means Clustering on a high-dimensional Delta Airlines dataset to obtain some interesting findings. Technology: R | Tools: RStudio | Libraries: Stats, rgl.

Language: R

PCA-and-K-means-clustering-on-the-data

Performing PCA and K-means clustering on a simulated dataset

Language: R

Repeating-Topical-Data-Analysis

Tried to recreate the charts from the CDC site of flu data and analysis, flu.gov and FluView, using R for data up to the week of Jan 27th, 2018.

Language: Jupyter Notebook

STA545-

Statistical Data Mining 1

Language: R