PyTorch Tutorial for Deep Learning Researchers



Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.



Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 333 | Issues: 34 | Issues: 15


Curated list of my reads, implementations, and core concepts of Artificial Intelligence, Deep Learning, and Machine Learning by the best folk in the world.


SQL queries of all kinds, collected in a single repository.



Using available bank data and parameters, identify the most influential variables and predict bankruptcy for the given financial model.



This app helps you locate nearby public toilets and get directions to them. It also helps the government keep track of all public toilets: regular feedback from citizens guides the local municipality in fixing any infrastructure problems identified and keeping the toilets clean. In this way the purpose of building public toilets is fulfilled. For women it is an added advantage, since they can find out where public toilets are and use them when needed instead of rushing home.



In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Approaches can be divided into feature selection and feature extraction.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 0 | Issues: 0
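
To make the feature-selection side of dimensionality reduction concrete, here is a minimal sketch that keeps the columns with the highest variance. The variance rule, the function name `select_by_variance`, and the toy data are assumptions chosen for illustration; they are not taken from the repository described above.

```python
from statistics import pvariance


def select_by_variance(rows, k):
    """Keep the k columns with the highest variance (a simple feature-selection rule)."""
    columns = list(zip(*rows))  # transpose rows into columns
    variances = [pvariance(col) for col in columns]
    # Indices of the k most variable columns, restored to original column order.
    ranked = sorted(range(len(columns)), key=lambda i: variances[i], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced


# Hypothetical toy data: the middle column is constant, so it carries no information.
data = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 30.0],
    [3.0, 5.0, 20.0],
    [4.0, 5.0, 40.0],
]
kept, reduced = select_by_variance(data, k=2)
print(kept)     # indices of the retained columns
print(reduced)  # the data restricted to those columns
```

Feature extraction (for example, principal component analysis) would instead build new variables as combinations of the original ones.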


This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the dataset you'll find information about businesses across 11 metropolitan areas in four countries.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 3 | Issues: 0


The idea is to use SQL skills in R by loading data from text files into a relational database and then running SQL queries to filter it.



How to automate a reporting suite from Google Analytics (GA) to R, so that one can pull data at will without interacting with the Google Analytics interface. There are various things one can do, and we will cover each of them.



A/B testing (or split-testing) is a randomized experiment with two variants A and B. It includes application of statistical hypothesis testing (or two-sample hypothesis testing), as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.
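
One common way to carry out the two-sample hypothesis test described above is a pooled two-proportion z-test on conversion counts. The sketch below assumes that test and uses made-up counts; the function name `two_proportion_z_test` is hypothetical, and the normal approximation it relies on is only reasonable for fairly large samples.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 1000 users converted on A, 250 of 1000 on B.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the significance threshold is a choice made before running the experiment.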



In this exercise we will apply several multivariate statistics techniques to the NYC felony dataset and see if there is any association between the features.



Application of Markov Chains and the Removal Effect (Attribution Modeling)



Bayesian A testing for Swedish Fish Incorporated



We will use NLTK (Natural Language Toolkit) to develop our own simple chatbot that responds to user queries using a defined corpus.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0


Trelliscopejs is an R package that brings faceted visualizations to life while plugging in to common analytical workflows like ggplot2 or the “tidyverse”.



Create a sales dashboard in R Shiny that users can customize, with some cool features and graphs.



For this project, we will analyze millions of NYC parking violations issued since January 2016.



k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0
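
The k-means description above (assign each observation to the nearest mean, then recompute the means) can be sketched with Lloyd's algorithm in plain Python. Seeding the centroids with the first k points is an assumption made to keep the example deterministic; practical implementations usually use random or k-means++ initialization. The toy points are made up.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    # Assumption: seed centroids with the first k points (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```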


Implementation of knapsack problem in Python
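
The description does not say which knapsack variant the repository implements. As an illustration only, here is a minimal dynamic-programming sketch of the 0/1 variant, where each item is either taken once or left out; the function name and the example items are assumptions.

```python
def knapsack(values, weights, capacity):
    """Maximum total value of items fitting in `capacity` (0/1 knapsack, integer weights)."""
    # best[c] holds the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical items: values 60, 100, 120 with weights 10, 20, 30 and capacity 50.
result = knapsack([60, 100, 120], [10, 20, 30], 50)
print(result)  # 220: take the items weighing 20 and 30
```

The one-dimensional table uses O(capacity) memory and runs in O(n * capacity) time for n items.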


For any given image of a handwritten digit, we analyze the pixel values to predict which digit it shows. This is a classic machine learning example: a model is fitted on training data and then used to predict labels for a set of test data.



This repo provides two examples and a script to easily define and run mapper and reducer functions against Hadoop MapReduce.



SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.



Case Study: Establishing relationships between different parameters in NYC pedestrian data



In graph theory, the shortest path problem is the problem of finding a path between two vertices in a graph such that the sum of the weights of its constituent edges is minimized.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0
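
For graphs with non-negative edge weights, the shortest path problem described above is commonly solved with Dijkstra's algorithm. The sketch below assumes that algorithm and a hypothetical adjacency-dictionary graph; it is not taken from the repository itself.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total weight, path) of a shortest path; assumes non-negative weights."""
    distances = {source: 0}
    previous = {}
    queue = [(0, source)]  # min-heap of (distance so far, node)
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale heap entry; a shorter route was already processed
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in distances:
        return float("inf"), []  # target is unreachable from source
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return distances[target], path[::-1]


# Hypothetical directed graph: node -> {neighbor: edge weight}.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
cost, path = dijkstra(graph, "A", "D")
print(cost, path)  # 4 ['A', 'B', 'C', 'D']
```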


Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0


For this project, you are tasked with provisioning a few Lambda functions to generate near real time finance data records for downstream processing and interactive querying.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0