Tanay Mukherjee's starred repositories
pytorch-tutorial
PyTorch Tutorial for Deep Learning Researchers
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
machine-learning-interview
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
my-awesome-AI-bookmarks
Curated list of my reads, implementations, and core concepts of Artificial Intelligence, Deep Learning, and Machine Learning by the best folk in the world.
Complex-SQL-Exercise
SQL queries of all kinds, collected in a single repository
Case-Study-Predicting-Bankruptcy
Using the available bank data and parameters, identify the variables with the most influence and predict bankruptcy for the given financial model
findjakes.com
This app helps you locate nearby public toilets and get directions to them. It also helps the government keep track of all public toilets: citizens submit regular feedback, which guides the local municipality in fixing any infrastructure problems identified and keeping the toilets clean. In this way the purpose of building public toilets is fulfilled. For women in particular, knowing where the public toilets are means they can use them whenever needed instead of rushing home.
Dimensionality-Reduction
In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Approaches can be divided into feature selection and feature extraction.
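The two families named above can be contrasted in a small sketch. This is an illustration only, not code from the repository: `select_features` and `extract_first_component` are hypothetical helper names, variance ranking stands in for feature selection, and a closed-form two-dimensional principal axis stands in for feature extraction.

```python
import math


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_features(rows, keep):
    """Feature selection: keep the `keep` original columns with the highest variance."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)), key=lambda j: variance(columns[j]), reverse=True)
    chosen = sorted(ranked[:keep])  # preserve the original column order
    return [[row[j] for j in chosen] for row in rows]


def extract_first_component(points):
    """Feature extraction: project 2-D points onto their first principal axis."""
    xs, ys = zip(*points)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cxx, cyy = variance(xs), variance(ys)
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / len(points)
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [(x - mean_x) * ux + (y - mean_y) * uy for x, y in points]
```

Selection keeps a subset of the original variables unchanged, while extraction builds a new variable as a combination of the originals.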
Dissecting-Yelp-Dataset
This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the dataset you'll find information about businesses across 11 metropolitan areas in four countries.
Exploring-SQL-with-R
The idea is to apply SQL skills in R: load data from text files into a relational database, then run SQL queries against it to filter the data
Google-Analytics-with-R
How to automate a reporting suite from Google Analytics (GA) to R, so that one can pull data at will without interacting with the Google Analytics interface. There are various things one can do, and we will cover each of them.
A-B-Testing-in-R
A/B testing (or split-testing) is a randomized experiment with two variants A and B. It includes application of statistical hypothesis testing (or two-sample hypothesis testing), as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.
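As a rough illustration of the two-sample hypothesis test described above, here is a minimal two-proportion z-test. It is a sketch written in Python rather than the repository's R code; the function name and the conversion counts in the usage note are made up for the example, and the normal approximation assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B; return (z, two-sided p-value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a hypothetical 10% conversion rate for A against 15% for B; a small p-value suggests the difference is unlikely to be due to chance alone.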
Analysing-NYC-Felony-Offenses-in-2019
In this exercise we will apply several multivariate statistics techniques to the NYC felony dataset and see if there is any association between the features.
Attribution-Modeling-in-R
Application of Markov chains and the removal effect (attribution modeling)
Bayesian-Data-Analysis-in-R
Bayesian A testing for Swedish Fish Incorporated
Building-your-own-chatbox
We will use NLTK (Natural Language Toolkit) to develop our own simple chatbox that responds to user queries using a defined corpus.
findjakes.com
This Android app helps you locate nearby public toilets and get directions to them. It also helps the government keep track of all public toilets: citizens submit regular feedback, which guides the local municipality in fixing any infrastructure problems identified and keeping the toilets clean. In this way the purpose of building public toilets is fulfilled. For women in particular, knowing where the public toilets are means they can use them whenever needed instead of rushing home.
Implementing-TrelliscopeJS-in-R
Trelliscopejs is an R package that brings faceted visualizations to life while plugging in to common analytical workflows like ggplot2 or the “tidyverse”.
Interactive-Sales-Dashboard-in-RShiny
Create a sales dashboard in R Shiny that users can customize, with some cool features and graphs
Investigating-NYC-Parking-Violations
For this project, we will analyze millions of NYC parking violations since January 2016.
K-Means-Clustering-in-R-and-Python
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
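The partitioning described above can be sketched with Lloyd's algorithm, the usual iterative heuristic for k-means. This is an illustrative sketch, not the repository's code; seeding the centroids with the first `k` points is a simplifying assumption chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=100):
    """Cluster points into k groups; return (centroids, assignments)."""
    # Assumption: naive initialisation from the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments
```

Each centroid is the mean of its cluster and acts as the prototype mentioned in the description. In practice the result depends on the initialisation, so real implementations usually run several random restarts.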
Knapsack-Problem
Implementation of the knapsack problem in Python
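The description does not say which variant the repository solves. As a sketch, assuming the common 0/1 variant (each item is taken at most once, with integer weights), a dynamic-programming solution might look like this; the function name and the `(weight, value)` item format are illustrative choices.

```python
def knapsack(items, capacity):
    """Return the best total value for (weight, value) items within capacity."""
    # best[c] holds the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

The table has one entry per unit of capacity, so the running time is proportional to the number of items times the capacity.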
Machine-Learning-Digit-classifier-in-R
For any given image of a handwritten digit, we compute its pixel values and analyse them to predict which digit it shows. This is a classic machine learning example: training data is used to predict the labels for a set of test data.
MapReduce-Exercise
This repo provides two examples and a script to easily define and run mapper and reducer functions against Hadoop MapReduce.
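To show what a mapper and reducer pair looks like, here is a word-count sketch that simulates the map, shuffle, and reduce phases locally. It is not one of the repository's two examples and does not talk to Hadoop; `mapper`, `reducer`, and `run_local` are hypothetical names chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)


def run_local(lines):
    """Simulate map, shuffle (sort and group by key), and reduce in one process."""
    pairs = sorted(pair for line in lines for pair in mapper(line))
    return [
        reducer(key, (count for _, count in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]
```

On a real cluster the framework performs the sort and grouping between the two phases, so only the mapper and reducer logic has to be supplied.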
Mastering-SQL
SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.
Regression-analysis-of-pedestrains-data-from-New-York
Case study: establishing relationships between different parameters in NYC pedestrian data
Shortest-path-algorithm
In graph theory, the shortest path problem is the problem of finding a path between two vertices in a graph such that the sum of the weights of its constituent edges is minimized.
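One standard way to solve the problem stated above is Dijkstra's algorithm, sketched here under the assumption that all edge weights are non-negative. The description does not say which algorithm the repository implements, and the adjacency-dictionary graph format is an illustrative choice.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (total_weight, path) for the lightest path, or None if unreachable.

    `graph` maps each vertex to a dict of {neighbor: edge_weight}.
    """
    queue = [(0, source, [source])]  # (cost so far, vertex, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None
```

For instance, with `graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}`, the call `shortest_path(graph, "A", "D")` follows A, B, C, D with a total weight of 4.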
Speech-Recognition-in-Python
Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.
Streaming-Finance-Data-with-AWS-Lambda
For this project, you are tasked with provisioning a few Lambda functions to generate near real time finance data records for downstream processing and interactive querying.