h9-tect / Data_reprocessing

14 outlier detection and handling

Amsamms opened this issue

First, I would like to thank you for your great effort on this repo.

Regarding outliers, I suggest handling them in two stages: univariate, as you did using the z-score, and multivariate, using any suitable technique, such as PCA.

To clarify my point, imagine a dataset of males with 3 columns: age, weight, and length (height).

Univariate handling limits each column independently, for instance: age from 5 to 65 years, weight from 20 kg to 200 kg, and length from 0.6 m to 2 m.

But a single instance with, say, an age of 7 years, a weight of 150 kg, and a length of 0.7 m is highly unlikely, yet it can't be removed by the z-score alone, because each value on its own lies within its column's range. It needs multivariate analysis to be detected.
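
Here is a minimal sketch of the two-stage idea, not the repo's actual code. It assumes NumPy and a purely synthetic, deliberately stylized dataset in which weight and length rise roughly linearly with age (not realistic growth curves). For the multivariate stage it uses the squared Mahalanobis distance rather than PCA itself; this is one possible technique, and it amounts to measuring distance in PCA-whitened coordinates. The cutoff of 16.27 is approximately the 99.9th percentile of a chi-square distribution with 3 degrees of freedom, chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic, stylized data (an assumption for illustration only):
# weight and length both rise roughly linearly with age.
age = rng.uniform(5, 65, n)                                      # years
weight = 20 + 3.0 * (age - 5) + rng.normal(0, 10, n)             # kg
length = 0.6 + (1.4 / 60) * (age - 5) + rng.normal(0, 0.05, n)   # m

# The suspicious record from the example: 7 years, 150 kg, 0.7 m.
suspect = np.array([7.0, 150.0, 0.7])
X = np.vstack([np.column_stack([age, weight, length]), suspect])

# Stage 1: univariate z-scores, computed column by column.
mean = X.mean(axis=0)
std = X.std(axis=0)
z = np.abs((X - mean) / std)
univariate_flag = (z > 3).any(axis=1)

# Stage 2: squared Mahalanobis distance, which accounts for the
# correlations between the columns.
cov = np.cov(X, rowvar=False)
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# Illustrative cutoff: about the 99.9th percentile of chi-square, 3 dof.
multivariate_flag = d2 > 16.27

print("suspect flagged by z-score:     ", univariate_flag[-1])
print("suspect flagged by Mahalanobis: ", multivariate_flag[-1])
```

In this sketch the suspect row passes the z-score stage, since each of its values is unremarkable within its own column, and is caught only by the multivariate stage.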

I hope my point is clear, and thanks.

Thanks, I appreciate your note. I added the 2 sections, univariate and multivariate.