packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection

No `predict` method for some clustering algorithms

smarbal opened this issue · comments

The following error occurs when training some clustering algorithms:

┌──[user@packing-box]──[/mnt/share]──[main|+2…6]────────                                                                                 ────[172.17.0.3]──[16:32:37]────
$ model train fs-upx-reduced-EP -a dbscan 
00:00:03.385 [INFO] Selected algorithm: Density-Based Spatial Clustering of Applications with Noise
00:00:03.387 [INFO] Reference dataset:  fs-upx-reduced-EP(PE32,PE64)
00:00:03.388 [INFO] Loading features...
00:00:03.429 [INFO] Making pipeline...
00:00:03.434 [INFO] Training model...
00:00:03.434 [INFO] (step 1/1) Processing dbscan
Traceback (most recent call last):
  File "/home/user/.opt/tools/model", line 120, in <module>
    getattr(name, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 594, in train
    self._train.predict = self.pipeline.predict(self._train.data)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 70, in __getattribute__
    return object.__getattribute__(object.__getattribute__(self, "pipeline"), name)
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/utils/metaestimators.py", line 127, in __get__
    if not self.check(obj):
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 46, in check
    getattr(self._final_estimator, attr)
AttributeError: 'DBSCAN' object has no attribute 'predict'

This happens because scikit-learn's DBSCAN estimator does not implement a predict() method, so the pipeline cannot expose one either.
Its fit_predict() method should be used instead.
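
For reference, the behaviour can be reproduced outside packing-box with a minimal sketch like the one below. It assumes scikit-learn is installed; the toy data and the `eps`/`min_samples` values are illustrative only and have nothing to do with the fs-upx-reduced-EP dataset.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline

# Toy data, purely illustrative (not the dataset from the log above)
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# KMeans exposes predict(); DBSCAN only exposes fit_predict()
kmeans_has_predict = hasattr(KMeans(n_clusters=3), "predict")
dbscan_has_predict = hasattr(DBSCAN(), "predict")
dbscan_has_fit_predict = hasattr(DBSCAN(), "fit_predict")

# A Pipeline only exposes predict() when its final estimator does
pipe = Pipeline([("dbscan", DBSCAN(eps=1.0, min_samples=3))])
pipeline_has_predict = hasattr(pipe, "predict")

# fit_predict() is forwarded to the final estimator and labels the fitted data
labels = pipe.fit_predict(X)
```

Calling `pipe.predict(X)` here raises the same kind of AttributeError as in the traceback, while `pipe.fit_predict(X)` returns one cluster label per sample.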

commented

@smarbal The difficulty is that many algorithms, whether supervised or unsupervised, expose the .fit(...) and .predict(...) API, as you can see for RandomForestClassifier or KMeans, while DBSCAN doesn't. I will implement an alternative workflow for this case.
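
One possible shape for such an alternative workflow is sketched below. This is not the packing-box implementation: `fit_and_label` is a hypothetical helper name, and the scaler step, toy data, and hyperparameters are assumptions made for illustration.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_label(pipeline, X, y=None):
    """Hypothetical helper: fit `pipeline` on X and return labels for X."""
    if hasattr(pipeline, "predict"):
        # Usual case (e.g. RandomForestClassifier, KMeans): fit, then predict
        pipeline.fit(X, y)
        return pipeline.predict(X)
    # Estimators such as DBSCAN can only label the data they were fitted on
    return pipeline.fit_predict(X, y)


# Toy data, purely illustrative
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

results = {}
for name, algo in [
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
    ("dbscan", DBSCAN(eps=0.5, min_samples=3)),
]:
    pipe = Pipeline([("scale", StandardScaler()), ("model", algo)])
    results[name] = fit_and_label(pipe, X)
```

Note that this only covers labelling the training data; an estimator without predict() still cannot label unseen samples, so any later test or prediction step would need its own handling.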

commented

@smarbal I have not tested it yet, I must confess. If you have time, please do...

This works as intended, thank you.