yang-zhang / ds-utils

Data Science Utility Functions in Python.

Home page: https://yang-zhang.github.io/ds-utils/

Python utility functions for data science.

  • Base
    • R-like table function
  • Explore
    • Missing value percentage of all columns in df
    • Get all unique keys in df
    • Count(*) grouped by columns
    • Customized describe() for numerical columns
    • Describe a numerical column given the value of a categorical column
    • Describe categorical columns including distribution of values
    • Describe a categorical column given the value of a categorical column
  • Stats: vector-based and dataframe-based stat functions
    • Correlation matrix
    • Entropy and mutual information
    • t-test
    • Chi-square test
    • ANOVA
  • Math
    • Log and inverse-log of different bases
    • Median absolute deviation (MAD)
  • Preprocessing
    • Cast type for multiple columns in bulk
    • Fill in missing values in bulk
    • One-hot encoding and label encoding for training and test
  • Evaluation
    • Sklearn scorer for log-scale target variable
  • Visualization: dataframe-oriented plot functions
    • histogram
    • scatterplot
    • barplot of distribution of categorical columns
    • boxplot of a numerical column given the values of a categorical column
    • stackedbarplot of a categorical column given values of another categorical column
    • pairplot of all columns
    • plot all rows of a dataframe
    • plot the explained variance ratio in PCA
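Several of the Stats and Math helpers listed above are vector-based. As a rough illustration of what such helpers compute, here is a minimal, stdlib-only sketch of the median absolute deviation, log and inverse-log in an arbitrary base, and plug-in (frequency-based) entropy and mutual information for discrete vectors. The function names `mad`, `log_base`, `inv_log_base`, `entropy`, and `mutual_information` are hypothetical and are not taken from ds-utils; the repository's own functions may differ in names, signatures, and defaults (for example, MAD is sometimes scaled by about 1.4826).

```python
import math
from collections import Counter
from statistics import median


def mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    m = median(values)
    return median([abs(x - m) for x in values])


def log_base(x, base=10):
    """Logarithm of x in an arbitrary base."""
    return math.log(x, base)


def inv_log_base(y, base=10):
    """Inverse of log_base: returns base ** y."""
    return base ** y


def entropy(values, base=2):
    """Shannon entropy of a discrete vector, estimated from value frequencies."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def mutual_information(x, y, base=2):
    """I(X;Y) = H(X) + H(Y) - H(X, Y) for two aligned discrete vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each (x_i, y_i) pair is treated as one symbol of the joint distribution.
    joint = list(zip(x, y))
    return entropy(x, base) + entropy(y, base) - entropy(joint, base)


print(mad([1, 2, 3, 4, 100]))  # the outlier barely moves the result
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent vectors, close to 0
```

The joint entropy is computed by zipping the two vectors, so the mutual information of a vector with itself reduces to its entropy, and two independent vectors give a value near zero (up to floating-point rounding).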

Languages: Jupyter Notebook 93.4%, Python 6.6%