packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection

Labels all set to 1 in visualization

smarbal opened this issue

Description

When visualizing a model, all executables appear as packed, even though it is not the case.

Steps to reproduce

  1. Generate dataset: dataset make upx-PE -p upx -f PE
  2. Train model: model train upx-PE -a mbkmeans
  3. Visualize model: model visualize -e upx-PE_pe32-pe64_99_mbkmeans_f111

Additional information

Printing params['target'] in visualization.py shows that all labels are indeed set to 1, so the problem is not in the visualization itself.
Datasets used:

   Name    #Executables   Size    Files    Formats    Packers  
 ───────────────────────────────────────────────────────────── 
  fs-upx   99             114KB   no      PE32,PE64   upx{35}  
  upx-PE   99             45MB    yes     PE32,PE64   upx{35} 

fs-upx is the fileless version of the dataset, which also yields the same bug.

commented

@smarbal
If you run model browse upx-PE_pe32-pe64_99_mbkmeans_f111, do you see the right labels in the cluster column?

commented

@smarbal
I got it; if you run model -v test upx-PE_pe32-pe64_99_mbkmeans_f111 upx-PE, you will see that it is the true labels that are all 1's, not the predicted labels (which may even be all correct). This likely comes from a bug in the label mapping of the y_true vector. I will try to fix this ASAP.

@dhondta
The issue seems to come from line 212 in ../learning/model.py.
After line 209, the labels of all NOT_PACKED instances have been replaced by None.
Then, at line 212, fillna() replaces those labels with NOT_LABELLED, precisely because they are None.
The mapping at line 214 therefore cannot work correctly, since NOT_PACKED instances now carry a '?' label, which is wrong.

Maybe changing the value of NOT_PACKED in LABELS_BACK_CONV from None to 0 could be a solution?
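
To illustrate the sequence described above, here is a minimal pure-Python sketch, not the actual code from model.py. The "-" marker for NOT_PACKED, the `fillna` helper (which only mimics how pandas' fillna() treats None) and the `back_convert` function are assumptions made for the example; only the '?' value for NOT_LABELLED and the None-versus-0 idea come from this thread.

```python
NOT_PACKED = "-"    # assumed marker for "not packed" (hypothetical value)
NOT_LABELLED = "?"  # marker for "no label", as mentioned in the thread


def fillna(values, fill):
    """Mimic pandas' fillna() on a plain list: replace every None by `fill`."""
    return [fill if v is None else v for v in values]


def back_convert(labels, not_packed_value):
    """Hypothetical two-step conversion, loosely following the description.

    Step 1 (cf. line 209): NOT_PACKED markers become `not_packed_value`.
    Step 2 (cf. line 212): missing labels are filled with NOT_LABELLED.
    """
    converted = [not_packed_value if v == NOT_PACKED else v for v in labels]
    return fillna(converted, NOT_LABELLED)


# One packed sample, two not-packed samples, one genuinely unlabelled sample.
raw = ["upx", NOT_PACKED, None, NOT_PACKED]

# With None, not-packed samples become indistinguishable from unlabelled ones.
buggy = back_convert(raw, None)

# With 0, fillna() leaves not-packed samples alone.
fixed = back_convert(raw, 0)

print(buggy)
print(fixed)
```

In the first call, the not-packed samples end up with the same '?' label as the sample that never had a label, so any later mapping can no longer tell them apart. In the second call they keep a distinct value, which is the idea behind the suggestion above.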

commented

Solved with 3c8a40f