This project gives an overview of crime time analysis in New York City. We created Python Jupyter notebooks for spatial analysis of different crime types in the city using the Pandas, NumPy, Plotly and Leaflet packages. In the second part of the analysis, we built an ARIMA model in R to predict crime counts across localities in the city, based on correlations with the demographics of each locality.

This project highlights a Spark application built in Scala. It uses Spark Core, Spark SQL and Spark ML (the machine learning library) to predict stock prices of specific airline companies. We used Google trending search terms (relevant to the financial domain) and macroeconomic oil prices as alternative data to predict stock prices.



This is a research-focused project that applies unsupervised machine learning techniques to predicting Lymphedema without prior knowledge of the symptoms or fields. Libraries used: sklearn (sklearn.cluster KMeans and sklearn.decomposition PCA), pandas, numpy and matplotlib.
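As a rough illustration of how the PCA and KMeans pieces named above can fit together, here is a minimal sketch. It is not the project's actual pipeline: the data is synthetic, and the feature count, number of components, number of clusters and the scaling step are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for patient records (assumption: 6 numeric fields,
# two well-separated groups of 50 rows each). Not real clinical data.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 6))
group_b = rng.normal(loc=5.0, scale=1.0, size=(50, 6))
X = np.vstack([group_a, group_b])

# Standardize so that no single field dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two principal components (assumed choice, handy for plotting).
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Cluster in the reduced space; k=2 is an assumption for this toy data.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

print(X_2d.shape)
print(np.bincount(labels))
```

Because no labels are used at any point, the clusters only suggest groupings; interpreting them as disease versus no disease would need domain validation.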



This project is based on a case study that focuses on employee attrition. The data is taken from IBM Watson's sample case study data. I used data mining and basic machine learning algorithms to predict employee attrition at a pharmaceutical company. People are one of the most important resources for the successful functioning of any organization, so losing the right people can be a huge setback. That makes it all the more necessary for a company to understand the factors behind attrition.

AWS solution: Analyzing Text with Amazon Elasticsearch Service and Amazon Comprehend



Boston Housing Prices Analysis

This project is based on a Kaggle dataset of Carvana's auctioned car purchases. Given various factors, the task is to decide whether a car is a kicked (deceivingly faulty) car or not, which makes it a binary classification problem.



This project explains in detail the probable causes of the 2008-2009 housing bubble. It includes data that can be used to predict probable delinquent customers in the banking or financial sector.



This project involves extensive work with maps in R using packages such as leaflet, plotly, fossil and geosphere. The maps display different scenarios and functionalities for understanding the relationship between fire incidents (severity) and response times from the nearest fire stations.



New York Health Artificial Intelligence Society Hack Nights GitHub repo.

This project focuses on a digital solution for Market Basket Analysis using data from the e-commerce company Instacart. It is big data, with customer records for around 2 million users. I used Hive, Spark Core, MLlib and Hadoop HDFS for this solution.
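For readers unfamiliar with Market Basket Analysis, the core idea can be sketched in plain Python. This is a toy illustration only, not the Hive/Spark solution used in the project: the baskets, the `pair_rules` helper and the `min_support` threshold are all hypothetical. It counts item pairs and reports support, confidence and lift for each rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; not Instacart data.
baskets = [
    {"bananas", "milk", "bread"},
    {"bananas", "milk"},
    {"milk", "bread"},
    {"bananas", "bread"},
    {"bananas", "milk", "eggs"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for item pairs whose support is at least min_support."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # P(consequent | antecedent)
            confidence = count / item_counts[antecedent]
            # Confidence relative to the consequent's baseline frequency
            lift = confidence / (item_counts[consequent] / n)
            rules.append((antecedent, consequent, support, confidence, lift))
    # Highest-confidence rules first
    return sorted(rules, key=lambda r: r[3], reverse=True)


for rule in pair_rules(baskets):
    print(rule)
```

At the scale described above, exhaustive pair counting like this would not be practical, which is one reason distributed tools such as Spark and Hive are a natural fit.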

This project focuses on selecting the best neighborhood in New York City based on NYC Open Data available online, such as crime, road safety, cleanliness and happiness (through the provision of amenities like schools, restaurants, hospitals and subways).



Findings from Stack Overflow 2017



