Kaggle Starter Codes for common ML tasks

A repository containing some of my Kaggle notebooks, which may be helpful as starter code. My Kaggle profile: https://www.kaggle.com/parulpandey

1 Getting started with NLP

1.1 A general Introduction

This notebook explains the concepts of NLP with respect to the current competition. NLP is the field of study that focuses on the interactions between human language and computers. It sits at the intersection of computer science, artificial intelligence, and computational linguistics [source]. NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. This Kaggle notebook with basic codes t

Table of Contents

Importing the necessary libraries
Reading the datasets
Basic EDA
Text data processing
Transforming tokens to vectors
Building a Text Classification model
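
The last three steps in the list above (text processing, vectorizing, and classification) can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the toy tweets, their labels, and the `clean_text` helper are all made up for the example, and TF-IDF with logistic regression is just one reasonable choice of vectorizer and model.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def clean_text(text):
    """Hypothetical cleaning step: lowercase, drop URLs and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return " ".join(text.split())               # collapse repeated spaces


# Toy data invented for illustration (1 = disaster, 0 = not a disaster).
texts = [
    "Forest fire spreading near the highway",
    "Flood waters rising, residents evacuated",
    "Earthquake damages several buildings downtown",
    "I love this sunny weather!",
    "This new song is fire http://example.com",
    "Had a great lunch with friends today",
]
labels = [1, 1, 1, 0, 0, 0]

# Chain cleaning + vectorization + classification into one estimator.
model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_text)),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)

preds = model.predict(["Wildfire forces evacuation", "Lunch was great"])
print(preds)
```

Wrapping the steps in a `Pipeline` keeps the vectorizer's vocabulary tied to the training data, so the same transformation is applied at prediction time.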

1.2 CountVectorizers | TFIDF | Hashing Vectorizer

This notebook is the second part of the Getting started with NLP notebooks. In it, we study the various ways of vectorizing text data. Vectorization converts text data into feature vectors.
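
As a rough sketch of the three vectorizers named in the heading, the snippet below applies scikit-learn's `CountVectorizer`, `TfidfVectorizer`, and `HashingVectorizer` to a tiny corpus. The sentences are invented for illustration and `n_features=32` is an arbitrary choice; the notebook itself may use different settings.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)

# Toy corpus invented for illustration (not taken from the competition data).
docs = [
    "forest fire near the town",
    "residents asked to shelter in place",
    "fire crews reach the forest",
]

# Bag of words: one column per vocabulary term, values are raw counts.
count_vec = CountVectorizer()
X_count = count_vec.fit_transform(docs)

# TF-IDF: same vocabulary, but counts are reweighted so that terms
# appearing in many documents contribute less.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)

# Hashing: no vocabulary is stored; each term is hashed into one of a
# fixed number of columns, so no fitting is required.
hash_vec = HashingVectorizer(n_features=32)
X_hash = hash_vec.transform(docs)

print(sorted(count_vec.vocabulary_))
print(X_count.shape, X_tfidf.shape, X_hash.shape)
```

The count and TF-IDF matrices share the same shape because they learn the same vocabulary. The hashing vectorizer trades the ability to map columns back to words for a fixed memory footprint.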

Dataset

Real or Not? NLP with Disaster Tweets - A Getting Started competition on Kaggle

2 Getting started with Time Series Analysis

Time series data is a sequence of data points in chronological order that is used by businesses to analyze past data and make future predictions.

Dataset

NIFTY-50 Stock Market Data (2000-2019)

The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from NSE (National Stock Exchange) India. All datasets are at day level, with pricing and trading values split across .csv files for each stock, along with a metadata file containing some macro-information about the stocks themselves. The data spans from 1st January, 2000 to 31st December, 2019.
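
A few common first steps with day-level price data can be sketched with pandas. The series below is synthetic (a random walk standing in for one stock's closing prices), and the name `Close` is an assumption rather than a confirmed column name from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices; a placeholder for one stock's CSV file.
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", "2019-12-31", freq="D")
close = pd.Series(
    100 + rng.normal(0, 1, len(dates)).cumsum(),
    index=dates,
    name="Close",  # assumed column name
)

# 30-day rolling mean smooths out short-term fluctuations.
rolling_mean = close.rolling(window=30).mean()

# Downsample from daily to monthly frequency ("MS" = month start).
monthly_mean = close.resample("MS").mean()

# Day-over-day percentage change.
daily_return = close.pct_change()

print(monthly_mean.head())
```

With a real file, the same operations apply once the date column is parsed and set as the index.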

3 Getting started with Machine Learning Interpretability

Extracting human-understandable insights from any Machine Learning model. Some techniques explained in this notebook are:

Permutation Importance using ELI5 library
Partial Dependence Plots
SHAP Values
Advanced Uses of SHAP Values
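
To give a feel for the first technique in the list, here is a minimal sketch of permutation importance. Note the substitutions: it uses scikit-learn's built-in `permutation_importance` rather than the ELI5 library the notebook uses, and a synthetic dataset from `make_classification` rather than the diabetes data. The idea is the same in both cases: shuffle one feature at a time and measure how much the validation score drops.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 6 features, only 3 of which carry signal.
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature 10 times on held-out data and record the score drop.
result = permutation_importance(
    model, X_val, y_val, n_repeats=10, random_state=0
)

# Report features from most to least important.
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature_{idx}: {mean:.3f} +/- {std:.3f}")
```

Computing the importances on a validation split rather than the training data avoids rewarding features the model has merely memorized.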

Dataset

Pima Indians Diabetes Database

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

4 Getting started with Dimensionality Reduction Techniques in Python

A three-part series on dimensionality reduction techniques using the Kannada MNIST dataset. The techniques studied in these notebooks are PCA, t-SNE and UMAP.

Part 1: Visualizing Kannada MNIST with PCA

Part 2: Visualizing Kannada MNIST with t-SNE

Part 3: Visualizing Kannada MNIST with UMAP
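
As an illustration of the first technique, the sketch below projects image data onto two principal components with scikit-learn. It uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in, not Kannada MNIST, and the scaling step is an assumption about preprocessing rather than something taken from the notebooks.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: 1797 images of 8x8 digits, flattened to 64 features each.
digits = load_digits()

# Standardize features so that no pixel dominates because of its scale.
X = StandardScaler().fit_transform(digits.data)

# Project the 64-dimensional data onto its first two principal components.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X)

print(X_2d.shape)
print(pca.explained_variance_ratio_)
```

The two columns of `X_2d` can then be drawn as a scatter plot colored by `digits.target` to see how well the classes separate in two dimensions.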

Dataset

Kannada MNIST

The goal of this competition is to provide a simple extension to the classic MNIST competition we're all familiar with. Instead of using Arabic numerals, it uses a recently-released, MNIST-like dataset of Kannada handwritten digits.

5 Getting started with Geospatial Data in Python

1. Visualising Geospatial data to get insights

The beauty of using Python is that it offers libraries for every data visualisation need. One such library is Folium, which comes in handy for visualising geographic data (geo data). Geographic data science is a subset of data science that deals with location-based data, i.e., descriptions of objects and their relationships in space.

2. Wuhan Coronavirus : A geographical analysis

6 Getting started with H2O libraries in Python

1. Kannada MNIST with H2O DeepLearning

2. Speed up your Data munging with Python's Datatable

3. Automating the ML workflow with H2O AutoML
