swarnava-96 / Feature-Engineering

Feature-Engineering

TOPICS:

Missing Values

1. Mean/Median/Mode replacement
2. Random Sample Imputation
3. Capturing NaN values with a new feature
4. End of Distribution Imputation
5. Arbitrary Imputation
6. Frequent Categories Imputation
7. Adding a variable to capture NaN
8. Replace NaN with a new category
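
As a rough illustration of a few of the numeric strategies listed above (median replacement, a NaN indicator feature, random sample imputation, and end of distribution imputation), here is a minimal pandas sketch. The `age` column and its values are hypothetical toy data, and this is not necessarily how the repository's notebooks implement these steps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing values.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, 58.0, np.nan, 41.0]})
missing = df["age"].isna()

# Capture missingness in a separate 0/1 feature before filling anything.
df["age_nan"] = missing.astype(int)

# Median replacement: fill NaN with the median of the observed values.
df["age_median"] = df["age"].fillna(df["age"].median())

# Random sample imputation: draw replacements from the observed values.
sampled = df["age"].dropna().sample(
    n=int(missing.sum()), replace=True, random_state=0
)
sampled.index = df.index[missing]  # align draws with the missing rows
df["age_random"] = df["age"].copy()
df.loc[missing, "age_random"] = sampled

# End of distribution imputation: fill with a value far in the tail
# (here mean + 3 standard deviations of the observed values).
extreme = df["age"].mean() + 3 * df["age"].std()
df["age_end_dist"] = df["age"].fillna(extreme)
```

In practice the fill values (median, tail value, sampling pool) would be computed on the training split only and then reused on the test split.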

Categorical Features

1. One Hot Encoding
2. One Hot Encoding with many features
3. Ordinal Number Encoding
4. Count or Frequency Encoding 
5. Target Guided Ordinal Encoding
6. Mean Encoding
7. Probability Ratio Encoding
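
Several of the encodings listed above can be sketched in a few lines of pandas. The `city` and `target` columns below are hypothetical toy data, and the snippet is only one possible way to express one hot, count, mean, and target guided ordinal encoding.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A"],
    "target": [1, 0, 1, 0, 1, 0],
})

# One hot encoding; drop_first avoids a redundant column.
one_hot = pd.get_dummies(df["city"], prefix="city", drop_first=True)

# Count (frequency) encoding: replace each category with its count.
counts = df["city"].value_counts()
df["city_count"] = df["city"].map(counts)

# Mean encoding: replace each category with the mean target value.
means = df.groupby("city")["target"].mean()
df["city_mean"] = df["city"].map(means)

# Target guided ordinal encoding: rank categories by mean target,
# lowest mean gets 0.
order = means.sort_values().index
ordinal_map = {category: rank for rank, category in enumerate(order)}
df["city_ordinal"] = df["city"].map(ordinal_map)
```

Because mean and target guided encodings use the target, the mappings should be learned on training data only to avoid leaking label information into validation or test sets.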

Feature Transformation

1. Standardization
2. Normalization
3. Robust Scaler
4. Gaussian Transformation:
  a. Logarithmic Transformation
  b. Reciprocal Transformation
  c. Square Root Transformation
  d. Exponential Transformation
  e. Box Cox Transformation
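
The scaling methods above reduce to short formulas, sketched here with NumPy on a hypothetical toy array. Standardization uses the population standard deviation, normalization is min-max scaling to [0, 1], and the robust scaler centers on the median and divides by the interquartile range. Box Cox is left out of this sketch; SciPy provides it as `scipy.stats.boxcox`, which requires strictly positive input.

```python
import numpy as np

# Hypothetical toy feature with one large value.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Standardization: zero mean, unit variance.
standardized = (x - x.mean()) / x.std()

# Normalization (min-max scaling) to the range [0, 1].
normalized = (x - x.min()) / (x.max() - x.min())

# Robust scaling: center on the median, scale by the IQR,
# so the single large value has less influence on the scale.
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

# Simple transformations that can pull a skewed feature toward
# a more Gaussian-like shape (all need positive input here).
log_x = np.log(x)
reciprocal_x = 1.0 / x
sqrt_x = np.sqrt(x)
```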

Handling Imbalanced Dataset

1. Under Sampling
2. Over Sampling
3. SMOTETomek
4. Ensemble Techniques
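
The first two items above can be illustrated with plain random resampling in pandas. This is a minimal sketch on hypothetical toy data, not the specific samplers the repository may use; SMOTETomek in particular comes from a separate library and is not reproduced here.

```python
import pandas as pd

# Hypothetical imbalanced toy data: 8 negatives, 2 positives.
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Random under sampling: shrink the majority class to the minority size.
under = pd.concat(
    [majority.sample(n=len(minority), random_state=0), minority]
)

# Random over sampling: resample the minority class with replacement
# until it matches the majority size.
over = pd.concat(
    [majority, minority.sample(n=len(majority), replace=True, random_state=0)]
)
```

Resampling is normally applied to the training split only, so that evaluation still reflects the original class balance.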

Outliers Intro

1. Detecting outliers using Z-score
2. Interquartile Range (IQR)
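
Both detection rules listed above can be written directly with NumPy. The data below is hypothetical, and the thresholds (|z| > 3 and 1.5 times the interquartile range) are common conventions assumed for this sketch rather than values taken from the repository.

```python
import numpy as np

# Hypothetical toy sample with one obvious outlier (95).
data = np.array(
    [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 12, 95],
    dtype=float,
)

# Z-score rule: flag points more than 3 standard deviations from the mean.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

# IQR rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]
```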

Handling Outliers

1. If The Data Is Normally Distributed
2. If Features Are Skewed
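
One common way to handle outliers in the two cases above is to cap values at a boundary: mean plus or minus 3 standard deviations for roughly normal data, and quartiles plus or minus a multiple of the IQR for skewed features. The helper `cap_outliers` below is a hypothetical name introduced for this sketch, and the default factors are assumptions.

```python
import numpy as np

def cap_outliers(values, skewed=False, factor=None):
    """Clip values to boundaries chosen by the distribution shape.

    Hypothetical helper: uses mean +/- factor * std when skewed is False
    (default factor 3) and quartiles +/- factor * IQR when skewed is True
    (default factor 1.5).
    """
    values = np.asarray(values, dtype=float)
    if skewed:
        k = 1.5 if factor is None else factor
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
    else:
        k = 3.0 if factor is None else factor
        lower = values.mean() - k * values.std()
        upper = values.mean() + k * values.std()
    # Values outside the boundaries are replaced by the boundary itself.
    return np.clip(values, lower, upper)

capped = cap_outliers([1, 2, 3, 4, 100], skewed=True)
```

For strongly skewed features a larger factor such as 3 is sometimes passed so that only extreme values are capped.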

DATASETS:

  1. https://www.kaggle.com/mlg-ulb/creditcardfraud
  2. https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/data
  3. https://www.kaggle.com/iabhishekofficial/mobile-price-classification
  4. https://www.kaggle.com/c/santander-customer-satisfaction/data
  5. https://www.kaggle.com/c/titanic/data
  6. https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data