No `predict` method for some clustering algorithms
smarbal opened this issue
The following error occurs when training some clustering algorithms:
┌──[user@packing-box]──[/mnt/share]──[main|+2…6]──────── ────[172.17.0.3]──[16:32:37]────
$ model train fs-upx-reduced-EP -a dbscan
00:00:03.385 [INFO] Selected algorithm: Density-Based Spatial Clustering of Applications with Noise
00:00:03.387 [INFO] Reference dataset: fs-upx-reduced-EP(PE32,PE64)
00:00:03.388 [INFO] Loading features...
00:00:03.429 [INFO] Making pipeline...
00:00:03.434 [INFO] Training model...
00:00:03.434 [INFO] (step 1/1) Processing dbscan
Traceback (most recent call last):
  File "/home/user/.opt/tools/model", line 120, in <module>
    getattr(name, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 594, in train
    self._train.predict = self.pipeline.predict(self._train.data)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 70, in __getattribute__
    return object.__getattribute__(object.__getattribute__(self, "pipeline"), name)
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/utils/metaestimators.py", line 127, in __get__
    if not self.check(obj):
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 46, in check
    getattr(self._final_estimator, attr)
AttributeError: 'DBSCAN' object has no attribute 'predict'
This is because the sklearn model does not implement a `predict()` method. Instead, the `fit_predict()` method should be used.
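The difference can be checked directly on the estimator. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy data and the `eps` / `min_samples` values are made up for illustration and are not taken from the `fs-upx-reduced-EP` dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Toy data (an assumption for this sketch): two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

dbscan = DBSCAN(eps=0.5, min_samples=2)

# DBSCAN exposes fit() and fit_predict(), but no predict().
dbscan_has_predict = hasattr(dbscan, "predict")
dbscan_has_fit_predict = hasattr(dbscan, "fit_predict")

# KMeans, by contrast, exposes predict().
kmeans_has_predict = hasattr(KMeans(n_clusters=2), "predict")

# fit_predict() fits the model and returns the labels of the training samples.
labels = dbscan.fit_predict(X)
n_clusters = len(set(labels) - {-1})  # -1 marks noise points

print(dbscan_has_predict, dbscan_has_fit_predict, kmeans_has_predict)
print(labels, n_clusters)
```

Note that `fit_predict()` only labels the samples it was fitted on, so it replaces the `fit()` followed by `predict()` call on the training data, not prediction on new samples.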
@smarbal The difficulty is that some algorithms, whether supervised or unsupervised, expose both `.fit(...)` and `.predict(...)`, as you can see for `RandomForestClassifier` or `KMeans`, while `DBSCAN` doesn't. I will program an alternative workflow for this case.
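One possible shape for such an alternative workflow is sketched below. It is not the fix that went into `pbox`; the helper name `fit_and_label`, the `StandardScaler` step, and the toy data are hypothetical. It relies on the fact that an sklearn `Pipeline` only exposes `predict` or `fit_predict` when its final estimator does, so `hasattr` can be used to pick the branch.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_label(pipeline, X):
    """Fit `pipeline` on X and return labels for the training samples.

    Hypothetical helper: uses fit() + predict() when the final estimator
    supports predict(), and falls back to fit_predict() otherwise.
    """
    if hasattr(pipeline, "predict"):
        pipeline.fit(X)
        return pipeline.predict(X)
    if hasattr(pipeline, "fit_predict"):
        return pipeline.fit_predict(X)
    raise TypeError("final estimator supports neither predict nor fit_predict")


# Toy data (an assumption for this sketch): two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

dbscan_pipe = Pipeline([("scale", StandardScaler()),
                        ("dbscan", DBSCAN(eps=0.5, min_samples=2))])
kmeans_pipe = Pipeline([("scale", StandardScaler()),
                        ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0))])

dbscan_labels = fit_and_label(dbscan_pipe, X)  # takes the fit_predict() branch
kmeans_labels = fit_and_label(kmeans_pipe, X)  # takes the fit() + predict() branch
print(dbscan_labels, kmeans_labels)
```

Checking for `predict` first keeps the existing behaviour for estimators such as `RandomForestClassifier` or `KMeans`, and only estimators without it take the fallback.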
This works as intended, thank you.