Implement "Numerical outlier method" to detect anomaly/outlier points in a dataset
Halix267 opened this issue · comments
@agrawalshubham01 Can I work on this?
@Halix267 Which method are you going to implement?
@agrawalshubham01 The IQR method,
for detecting outliers.
Kindly draft your API here.
Could you elaborate? I am not following.
@Halix267 draft your model code here.
import random

import matplotlib.pyplot as plt


def dataset_developer(n, probability=(0.05, 0.05, 0.15, 0.5, 0.15)):
    """Draw n values in [0, 1) from five bins of width 0.2.

    The first four entries of `probability` weight the first four bins;
    the last bin receives the remaining probability mass.
    """
    X = []
    while len(X) < n:
        t = random.random()
        if t < probability[0]:
            X.append(random.random() * 0.2)
        elif t < probability[0] + probability[1]:
            X.append(random.random() * 0.2 + 0.2)
        elif t < probability[0] + probability[1] + probability[2]:
            X.append(random.random() * 0.2 + 0.2 * 2)
        elif t < probability[0] + probability[1] + probability[2] + probability[3]:
            X.append(random.random() * 0.2 + 0.2 * 3)
        else:
            X.append(random.random() * 0.2 + 0.2 * 4)
    return X


def numeric_outlier(X, k):
    """Return indices of points outside [Q1 - k*IQR, Q3 + k*IQR]. X must be sorted."""
    Q1 = X[int((len(X) + 1) / 4)]
    Q3 = X[int(((len(X) + 1) * 3) / 4)]
    IQR = Q3 - Q1
    outliers_index = [i for i in range(len(X)) if X[i] < Q1 - k * IQR or X[i] > Q3 + k * IQR]
    return outliers_index


def imager(X, Y, outlier_X, outlier_Y, l):
    """Scatter inliers (green) and outliers (red) at shuffled x positions."""
    X = random.sample(X, len(X))
    outlier_X = random.sample(outlier_X, l)
    plt.scatter(X, Y, color='green', marker='.')
    plt.scatter(outlier_X, outlier_Y, color='red', marker='*')
    plt.show()


def accuracy(outliers_index, X):
    """Fraction of points labelled correctly, treating [0.5, 0.9] as the inlier range."""
    correct = 0
    for j in range(len(X)):
        if j in outliers_index:
            if X[j] < 0.5 or X[j] > 0.9:
                correct += 1
        else:
            if X[j] >= 0.5 and X[j] <= 0.9:
                correct += 1
    return correct / len(X)


def main(X, k):
    outliers_index = numeric_outlier(X, k)
    outliers = [X[i] for i in outliers_index]
    X_c = [X[i] for i in range(len(X)) if i not in outliers_index]
    imager(range(len(X_c)), X_c, range(len(X_c)), outliers, len(outliers_index))
    return accuracy(outliers_index, X)


k = 0.45
random.seed(0)
X = dataset_developer(1000)
X.sort()  # numeric_outlier reads the quartiles by index, so X must be sorted
acc = main(X, k)
print(acc)
@agrawalshubham01 Done. What next?
What I am asking is: how would you structure this as a class, what would its methods and attributes be, and how would someone use it? This is a Python package, so we need to define the related functions in a class so that its properties can be inherited. How would an end user access it?
I understand the logic. What I am asking is how you would draft your code so that someone else can use it.
@rohansingh9001 The logic looks fine to me, but please help him draft an API for it. We can keep this in a folder like Preprocessors.
Yes, please guide me in drafting the API for this @agrawalshubham01 @rohansingh9001
@Halix267 Sorry for the delay, I was busy with other important commitments.
However, in an issue, try to be as descriptive as you can. No resources were given for us to understand what the Numerical Outlier method is or what it can do.
Even if we know this algorithm from our own background, we still do not know how robust your solution will be or everything it can do.
Secondly, I would recommend going through the code in the Examples directory. It contains examples of how an end user can use our library.
A user should be able to import your code and apply it to their own dataset.
For example, the Linear Regression model has a .fit() method, which trains the model on the data given to it. It has various other methods as well.
While your code might be logically correct, please describe what sort of data this model needs to train on, what kind of predictions it can make, and which functions the end user will call to train and run it.
That is what @agrawalshubham01 meant by discussing an API.
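To make the request concrete, here is a minimal sketch of what such a class-based API could look like. The class name `NumericalOutlierDetector`, its `fit`/`predict`/`transform` methods, and the `lower_`/`upper_` attributes are hypothetical names chosen for illustration; they are not taken from this library or from the solution that was eventually merged. The sketch also swaps the index-based quartiles above for `statistics.quantiles` from the standard library.

```python
from statistics import quantiles


class NumericalOutlierDetector:
    """Hypothetical IQR-based outlier detector (illustrative only)."""

    def __init__(self, k=1.5):
        self.k = k            # multiplier applied to the IQR
        self.lower_ = None    # lower fence, set by fit()
        self.upper_ = None    # upper fence, set by fit()

    def fit(self, X):
        """Learn the fences Q1 - k*IQR and Q3 + k*IQR from a list of numbers."""
        q1, _, q3 = quantiles(X, n=4, method="inclusive")
        iqr = q3 - q1
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def predict(self, X):
        """Return one boolean per value: True if it lies outside the fences."""
        if self.lower_ is None:
            raise RuntimeError("call fit() before predict()")
        return [x < self.lower_ or x > self.upper_ for x in X]

    def transform(self, X):
        """Return X with the flagged values removed."""
        return [x for x, flag in zip(X, self.predict(X)) if not flag]


# Usage: fit on a column of values, then flag or drop the outliers.
data = [10, 12, 11, 13, 12, 11, 10, 95]
detector = NumericalOutlierDetector(k=1.5).fit(data)
flags = detector.predict(data)
cleaned = detector.transform(data)
print(flags)
print(cleaned)
```

Storing the fences at `fit` time means the same thresholds can later be applied to unseen data, which mirrors the `.fit()` pattern mentioned above.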
OK @rohansingh9001, I will provide the dataset, but first can you guide me on how to proceed with this issue?
As for your question: imagine the user is preprocessing a dataset that has a height column in feet,
with values 1.2, 1.3, 4, 5.1, 5.6, 6.3, 41.2, 50 .....
Clearly 1.2, 1.3, 41.2, and 50 are outliers, which would degrade model performance.
So, to improve model performance, the user can find the outliers and exclude them.
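As a rough check of this example, the sketch below applies the usual 1.5 × IQR rule to only the eight values listed (the original list is elided, so a real column would contain more data). The variable names are illustrative. It uses `statistics.quantiles` with both of its quartile conventions, because on a sample this small the result depends on the convention.

```python
from statistics import quantiles

# Only the values quoted above; the elided remainder of the column is unknown.
heights = [1.2, 1.3, 4, 5.1, 5.6, 6.3, 41.2, 50]

results = {}
for method in ("inclusive", "exclusive"):
    q1, _, q3 = quantiles(heights, n=4, method=method)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Keep every value that falls outside the two fences.
    results[method] = [h for h in heights if h < lower or h > upper]
    print(method, results[method])
```

On these eight values the inclusive convention flags only 41.2 and 50, and the exclusive convention flags nothing, because the two extreme heights inflate Q3 and therefore the IQR. The low values 1.2 and 1.3 are not flagged under either convention, so on a sample this small a smaller multiplier or domain-specific bounds may be needed.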
@rohansingh9001 @agrawalshubham01 Please take a look at this. Can I start working on this issue?
Resources:
https://naysan.ca/2020/06/28/interquartile-range-iqr-to-detect-outliers/ (see this implementation first)
https://towardsdatascience.com/why-1-5-in-iqr-method-of-outlier-detection-5d07fdc82097 (logic behind the implementation)
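For context on the multiplier, the usual argument for 1.5 can be checked numerically under the assumption that the data is normally distributed. The sketch below uses `statistics.NormalDist` to find where the upper fence falls on a standard normal and how much of the distribution lies outside both fences. It is an illustration of that argument, not a summary of the linked articles.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

q1 = std_normal.inv_cdf(0.25)      # first quartile, about -0.674
q3 = std_normal.inv_cdf(0.75)      # third quartile, about +0.674
iqr = q3 - q1

k = 1.5
upper = q3 + k * iqr               # upper fence in units of sigma
lower = q1 - k * iqr               # lower fence (symmetric)

# Probability mass outside the two fences (the normal is symmetric).
fraction_outside = 2 * (1 - std_normal.cdf(upper))

print(round(upper, 3), round(fraction_outside, 4))
```

With k = 1.5 the fences sit at roughly ±2.7 standard deviations, so about 0.7% of normally distributed data would be flagged. A much smaller k, such as the 0.45 used in the draft above, flags far more points.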
I would like to work on this issue
Sure 👍
Thanks to @Siddharth-Singh27, the issue's been solved. Hereby, closing it.