kalyan678 / Kaggle-Starter-Codes

A repository containing links to some of my Kaggle starter notebooks

Kaggle Starter Codes for common ML tasks

A repository containing some of my Kaggle notebooks, which may be helpful as starter code. My Kaggle profile: https://www.kaggle.com/parulpandey

1 Getting started with NLP

1.1 A general Introduction

This notebook explains the concepts of NLP in the context of the current competition. Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. It sits at the intersection of computer science, artificial intelligence, and computational linguistics [source], and gives computers a way to analyze, understand, and derive meaning from human language in a smart and useful way. This Kaggle notebook with basic codes t

Table of Contents

  1. Importing the necessary libraries
  2. Reading the datasets
  3. Basic EDA
  4. Text data processing
  5. Transforming tokens to vectors
  6. Building a Text Classification model

1.2 CountVectorizers | TFIDF | Hashing Vectorizer

This notebook is the second part of the Getting started with NLP notebooks. In it we study the various ways of vectorizing text data. Vectorization converts text data into numerical feature vectors.
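As a rough illustration of what vectorization does, the sketch below re-implements the three ideas named in the heading in plain Python: raw term counts, TF-IDF weighting, and the hashing trick. It is not the notebook's code. scikit-learn's CountVectorizer, TfidfVectorizer and HashingVectorizer are the usual tools for this; the sample sentences, the function names and the 8-bucket hash size are assumptions made for the example.

```python
import math
import zlib
from collections import Counter

# Hypothetical mini-corpus standing in for the tweet text.
docs = [
    "forest fire near the town",
    "the town is safe",
    "fire fire everywhere",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def count_vectors(corpus):
    """Bag of words: one column per vocabulary term, values are raw counts."""
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    rows = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tfidf_vectors(corpus):
    """Counts reweighted by a smoothed inverse document frequency."""
    vocab, rows = count_vectors(corpus)
    n_docs = len(corpus)
    # Number of documents that contain each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in rows]

def hashed_vectors(corpus, n_buckets=8):
    """Hashing trick: no vocabulary is stored; tokens map straight to buckets."""
    rows = []
    for doc in corpus:
        row = [0] * n_buckets
        for tok in tokenize(doc):
            # crc32 is stable across runs, unlike the built-in hash() for str.
            row[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
        rows.append(row)
    return rows

vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashed_vectors(docs)
print(vocab)
print(counts[2])
```

Count and TF-IDF vectors need a vocabulary built from the corpus, so their width grows with the data. The hashed vectors have a fixed width chosen up front, at the cost of possible collisions between unrelated tokens.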

Dataset

Real or Not? NLP with Disaster Tweets - A Getting Started competition on Kaggle


2 Getting started with Time Series Analysis

A time series is a sequence of data points recorded in chronological order. Businesses use such data to analyze past behaviour and make predictions about the future.

Dataset

NIFTY-50 Stock Market Data (2000-2019)

The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from the NSE (National Stock Exchange) of India. All datasets are at day level, with pricing and trading values split across .csv files for each stock, along with a metadata file containing some macro-information about the stocks themselves. The data spans from 1st January, 2000 to 31st December, 2019.
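A first step with day-level price data is often a moving average, which smooths daily noise so the trend is easier to see. The following is a minimal sketch using only the standard library. The column names `Date` and `Close` and the inline sample rows are assumptions made for illustration and may not match the real files.

```python
import csv
import io
from datetime import datetime

# Made-up rows in a hypothetical layout; the real files' column names may differ.
SAMPLE_CSV = """Date,Close
2019-12-23,100.0
2019-12-24,102.0
2019-12-26,101.0
2019-12-27,105.0
2019-12-30,107.0
"""

def load_closes(handle):
    """Read (date, close) pairs and return them in chronological order."""
    rows = [
        (datetime.strptime(r["Date"], "%Y-%m-%d").date(), float(r["Close"]))
        for r in csv.DictReader(handle)
    ]
    return sorted(rows)

def rolling_mean(values, window):
    """Simple moving average; the first window-1 positions have no value."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

series = load_closes(io.StringIO(SAMPLE_CSV))
closes = [close for _, close in series]
sma3 = rolling_mean(closes, window=3)
for (day, close), avg in zip(series, sma3):
    print(day, close, avg)
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the column names to match.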


3 Getting started with Machine Learning Interpretability

Extracting human-understandable insights from any Machine Learning model. Some techniques explained in this notebook are:

  • Permutation Importance using ELI5 library
  • Partial Dependence Plots
  • SHAP Values
  • Advanced Uses of SHAP Values
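The idea behind the first technique, permutation importance, can be sketched without any library: shuffle one feature column at a time and measure how much the model's score drops. The code below is an illustration of that idea only and does not use the ELI5 API. The toy data, the stand-in `model` function and the helper names are all assumptions.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: feature 0 decides the label, feature 1 is pure noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def model(row):
    """Stand-in for a fitted classifier that only looks at feature 0."""
    return int(row[0] > 0.5)

scores = permutation_importance(model, X, y)
print(scores)
```

Shuffling the feature the model relies on should cut accuracy sharply, while shuffling the noise feature leaves it unchanged, so the first score comes out large and the second is zero.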

Dataset

Pima Indians Diabetes Database

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.


4 Getting started with Dimensionality Reduction Techniques in Python

A three-part series on dimensionality reduction techniques using the Kannada MNIST dataset. The notebooks cover three techniques: PCA, t-SNE and UMAP.

Part 1: Visualizing Kannada MNIST with PCA

Part 2: Visualizing Kannada MNIST with t-SNE

Part 3: Visualizing Kannada MNIST with UMAP
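Of the three techniques, PCA is the easiest to sketch by hand: centre the data, form the covariance matrix, and project onto its leading eigenvector. The example below does this in plain Python with power iteration on a handful of made-up 2-D points rather than the Kannada MNIST images. It is meant to show the mechanics, not to reproduce the notebooks, which would more likely rely on a library implementation.

```python
import math

def center(points):
    """Subtract the per-feature mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]

def covariance(centered):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, n_iter=200):
    """Leading eigenvector of cov via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up points that lie close to the line y = 2x.
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = center(points)
pc1 = first_component(covariance(centered))
# Project each point onto the first component: 2 features become 1.
projected = [sum(a * b for a, b in zip(p, pc1)) for p in centered]
print(pc1)
print(projected)
```

Because the points lie near y = 2x, the first component comes out roughly parallel to the direction (1, 2), and the one-dimensional projection preserves the ordering of the points along that line.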

Dataset

Kannada MNIST

An MNIST-like dataset of Kannada handwritten digits. The goal of this competition is to provide a simple extension to the classic MNIST competition we're all familiar with. Instead of using Arabic numerals, it uses a recently-released dataset of Kannada digits.


5 Getting started with Geospatial Data in Python

1. Visualising Geospatial data to get insights

The beauty of using Python is that it offers libraries for every data visualisation need. One such library is Folium, which comes in handy for visualising geographic data (geo data). Geographic data science is a subset of data science that deals with location-based data, i.e. descriptions of objects and their relationships in space.
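Location-based data usually comes down to latitude/longitude pairs, and one of the most basic relationships in space is the distance between two such points. The sketch below computes it with the haversine great-circle formula using only the standard library. It is independent of Folium; the function name is hypothetical and the coordinates are approximate sample values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate city-centre coordinates, used only as sample input.
delhi = (28.6139, 77.2090)
mumbai = (19.0760, 72.8777)
distance = haversine_km(*delhi, *mumbai)
print(round(distance), "km")
```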

2. Wuhan Coronavirus : A geographical analysis

6 Getting started with H2O libraries in Python

1. Kannada MNIST with H2O DeepLearning

2. Speed up your Data munging with Python's Datatable

3. Automating the ML workflow with H2O AutoML
