deyachatterjee / data-science-toolkit

Collection of stats, modeling, and data science tools in Python and R.

Introduction

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. Its content ranges from educational vignettes on specific topics, to tailored functions and modeling pipelines built to enhance and optimize analyses, to notes and code from various data science conferences, to general data science utilities. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please use the "Issues" tab and I will respond promptly.

Table of Contents

  1. Playground
    1. Rough Notes from ISLR Exercises -- R
    2. Rough Notes from Python Data Scientist Track -- Python
  2. Exploratory Data Analysis (EDA) and Visualization
    1. EDA and Basic Visualization -- R
    2. EDA and Basic Visualization -- Python
    3. Visualizing Geographic Data -- Python
    4. Radar Charts -- Python
  3. Hypothesis Testing
    1. Kolmogorov-Smirnov Test (KS Test) -- R
    2. Useful Hypothesis Testing Functions -- R
  4. Classification
    1. Logistic Regression (Ridge and Lasso Methods Included) -- R
    2. Useful Classification Functions -- R
    3. Basic Tree Models -- R
    4. KNN -- R
  5. Regression
    1. Linear Regression -- Python
  6. Reinforcement Learning
  7. Text Mining and Natural Language Processing (NLP)
    1. Basic Text Mining and NLP -- R
  8. Notes and Material from Data Science Conferences
    1. PyData DC 2018 Conference (Notes and Tutorial Code) -- Python
  9. Utilities
    1. HTML File Appender (Using Beautiful Soup) -- Python

Contribution Info

All are welcome and encouraged to contribute to this repository. My only requests are that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the master branch integrated before submitting the PR.

Languages

HTML 64.7%, Jupyter Notebook 33.2%, JavaScript 1.2%, R 0.5%, CSS 0.2%, Python 0.2%, Makefile 0.0%