Feature-Engineering
TOPICS:
Missing Values
1. Mean/Median/Mode replacement
2. Random Sample Imputation
3. Capturing NaN values with a new feature
4. End of Distribution Imputation
5. Arbitrary Imputation
6. Frequent Categories Imputation
7. Adding a variable to capture NaN
8. Replace NaN with a new category
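
A minimal sketch of several of the imputation strategies listed above, using pandas. The toy DataFrame, its column names (`age`, `cabin`), and the arbitrary fill value `-1` are assumptions made for illustration; they are not taken from the datasets in this repository.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0, np.nan, 41.0],
    "cabin": ["C85", None, "E46", None, "C85", "C85"],
})
missing_age = df["age"].isna()

# Capture NaN with a new indicator feature (1 = value was missing).
df["age_nan"] = missing_age.astype(int)

# Median replacement (mean or mode work the same way).
df["age_median"] = df["age"].fillna(df["age"].median())

# Random sample imputation: draw replacements from the observed values.
observed = df["age"].dropna()
sampled = observed.sample(int(missing_age.sum()), replace=True, random_state=0)
sampled.index = df.index[missing_age]  # align samples with the missing rows
df["age_random"] = df["age"].copy()
df.loc[missing_age, "age_random"] = sampled

# End-of-distribution imputation: fill with mean + 3 standard deviations.
extreme = df["age"].mean() + 3 * df["age"].std()
df["age_end_dist"] = df["age"].fillna(extreme)

# Arbitrary imputation: fill with a value outside the normal range (assumed -1).
df["age_arbitrary"] = df["age"].fillna(-1)

# Frequent-category imputation and a dedicated "Missing" category.
df["cabin_frequent"] = df["cabin"].fillna(df["cabin"].mode()[0])
df["cabin_missing"] = df["cabin"].fillna("Missing")
```

Creating the indicator column before filling keeps the information that a value was missing, which mean or median replacement alone would discard.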
Categorical Features
1. One Hot Encoding
2. One Hot Encoding with many categories
3. Ordinal Number Encoding
4. Count or Frequency Encoding
5. Target Guided Ordinal Encoding
6. Mean Encoding
7. Probability Ratio Encoding
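
The encodings listed above could be sketched with pandas roughly as follows. The `city` and `target` columns are hypothetical, and the probability ratio step assumes no category has a target mean of exactly 1 (which would need smoothing to avoid division by zero).

```python
import pandas as pd

# Hypothetical toy data with a binary target.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A"],
    "target": [1, 0, 1, 0, 1, 0],
})

# One-hot encoding; drop_first removes one redundant column.
one_hot = pd.get_dummies(df["city"], prefix="city", drop_first=True)

# Count (frequency) encoding: replace each category with its occurrence count.
counts = df["city"].value_counts().to_dict()
df["city_count"] = df["city"].map(counts)

# Mean encoding: replace each category with the mean of the target.
means = df.groupby("city")["target"].mean()
df["city_mean"] = df["city"].map(means)

# Target-guided ordinal encoding: rank categories by their target mean.
ordered = means.sort_values().index
ordinal_map = {category: rank for rank, category in enumerate(ordered)}
df["city_ordinal"] = df["city"].map(ordinal_map)

# Probability ratio encoding: P(target = 1) / P(target = 0) per category.
ratio = means / (1 - means)
df["city_ratio"] = df["city"].map(ratio)
```

In practice the target-based mappings would be learned on the training split only and then applied to validation or test data, to avoid leaking the target.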
Feature Transformation
1. Standardization
2. Normalization
3. Robust Scaler
4. Gaussian Transformation
a. Logarithmic Transformation
b. Reciprocal Transformation
c. Square Root Transformation
d. Exponential Transformation
e. Box Cox Transformation
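
As a rough illustration of the scaling and transformation topics above, here is a NumPy-only sketch. The sample array is made up, the "exponential" step is interpreted here as raising to a fixed power (the exponent `1 / 1.2` is an arbitrary choice), and the Box-Cox helper takes a hand-picked lambda instead of estimating it by maximum likelihood as library implementations typically do.

```python
import numpy as np

# Hypothetical strictly positive, right-skewed sample.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 100.0])

# Standardization: zero mean, unit variance.
standardized = (x - x.mean()) / x.std()

# Normalization (min-max scaling) to the range [0, 1].
normalized = (x - x.min()) / (x.max() - x.min())

# Robust scaling: center on the median, scale by the interquartile range.
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

# Transformations that can make skewed data look more Gaussian.
log_x = np.log(x)              # logarithmic (requires x > 0)
reciprocal_x = 1.0 / x         # reciprocal (requires x != 0)
sqrt_x = np.sqrt(x)            # square root (requires x >= 0)
power_x = x ** (1 / 1.2)       # "exponential" as a fixed power (assumed exponent)


def box_cox(values, lam):
    """Box-Cox transform for a given lambda (values must be positive)."""
    if lam == 0:
        return np.log(values)
    return (values ** lam - 1.0) / lam


box_cox_x = box_cox(x, 0.5)    # lambda = 0.5 chosen by hand for illustration
```

A Q-Q plot or histogram before and after each transformation is a common way to judge which one brings a feature closest to a normal shape.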
Handling Imbalanced Dataset
1. Under Sampling
2. Over Sampling
3. SMOTETomek
4. Ensemble Techniques
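
Random under-sampling and over-sampling can be sketched with NumPy alone, as below. The synthetic 90/10 class split and the feature matrix are assumptions for illustration. SMOTETomek and ensemble approaches are not shown here because they rely on third-party packages such as imbalanced-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 90 majority (0) and 10 minority (1) samples.
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 2))

majority_idx = np.flatnonzero(y == 0)
minority_idx = np.flatnonzero(y == 1)

# Under-sampling: keep only as many majority rows as there are minority rows.
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
under_idx = np.concatenate([kept_majority, minority_idx])
X_under, y_under = X[under_idx], y[under_idx]

# Over-sampling: resample minority rows with replacement up to the majority size.
extra_minority = rng.choice(minority_idx, size=len(majority_idx), replace=True)
over_idx = np.concatenate([majority_idx, extra_minority])
X_over, y_over = X[over_idx], y[over_idx]
```

Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.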
Outliers Intro
1. Detecting outliers using Z-score
2. Interquartile Range (IQR)
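
The two detection approaches listed above might look like this in NumPy. The sample values are invented, and the thresholds (3 standard deviations, 1.5 times the IQR) are the conventional defaults, not values prescribed by this repository.

```python
import numpy as np

# Hypothetical sample with one obvious outlier (95).
data = np.array(
    [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
     12, 10, 11, 13, 12, 11, 10, 12, 11, 95],
    dtype=float,
)

# Z-score: flag points more than 3 standard deviations from the mean.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

# IQR: flag points outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = data[(data < lower_fence) | (data > upper_fence)]
```

The Z-score check assumes roughly normal data, while the IQR fences make no such assumption and are less affected by the outliers themselves.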
Handling Outliers
1. If The Data Is Normally Distributed
2. If Features Are Skewed
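
One common way to handle outliers once they are found is to cap them at a boundary. The sketch below assumes capping (not removal) is wanted; the helper names `cap_gaussian` and `cap_skewed` and the sample values are hypothetical.

```python
import numpy as np


def cap_gaussian(values, n_std=3.0):
    """Cap values at mean +/- n_std standard deviations (roughly normal data)."""
    lower = values.mean() - n_std * values.std()
    upper = values.mean() + n_std * values.std()
    return np.clip(values, lower, upper)


def cap_skewed(values, factor=1.5):
    """Cap values at the IQR fences (skewed data); factor=3 gives wider fences."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - factor * iqr, q3 + factor * iqr)


# Hypothetical sample with one extreme value.
sample = np.array(
    [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
     12, 10, 11, 13, 12, 11, 10, 12, 11, 95],
    dtype=float,
)
capped_gaussian = cap_gaussian(sample)
capped_skewed = cap_skewed(sample)
```

Both helpers compute their boundaries from the same data they cap, so a very extreme value still widens the Gaussian boundary; the IQR variant is less sensitive to that.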
DATASETS:
- https://www.kaggle.com/mlg-ulb/creditcardfraud
- https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/data
- https://www.kaggle.com/iabhishekofficial/mobile-price-classification
- https://www.kaggle.com/c/santander-customer-satisfaction/data
- https://www.kaggle.com/c/titanic/data
- https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data