packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection

`RuntimeError` using modified features set

dhondta opened this issue · comments

commented

From @smarbal :

Hello @dhondta,
My reduced features set is ready but I can't test it right now due to this bug.
Note that a bug also occurs during the feature computation phase when using a full dataset (with files) with a reduced features set:

┌──[user@packing-box]──[/mnt/share/experiments/exp-1]──[.]──[improve-visualization|+6…18]────[172.17.0.3]──[11:33:41]────
$ model train upx-PE1 -a kmeans -f conf/features.conf 
00:00:03.745 [INFO] Selected algorithm: K-Means clustering
00:00:03.747 [INFO] Reference dataset:  upx-PE1(PE32,PE64)
00:00:03.748 [INFO] Computing features...
00:00:03.901 [WARNING] Bad expression: checksum == 0
00:00:03.901 [ERROR] name 'checksum' is not defined
00:00:03.905 [WARNING] Bad expression: size_of_headers == 512
00:00:03.905 [ERROR] name 'size_of_headers' is not defined
00:00:03.909 [WARNING] Bad expression: size_of_initializeddata >= 3 * 1024 * 1024
00:00:03.909 [ERROR] name 'size_of_initializeddata' is not defined
Traceback (most recent call last):
  File "/home/user/.opt/tools/model", line 121, in <module>
    getattr(name, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 527, in train
    if not self._prepare(**kw):
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 217, in _prepare
    __parse(ds.files.listdir(is_exe), False)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 201, in __parse
    self._data = self._data.append(exe.data, ignore_index=True)
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/executable.py", line 28, in data
    return Features(self)
  File "/home/user/.local/lib/python3.10/site-packages/tinyscript/preimports/log.py", line 91, in _wrapper
    return f(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/features/__init__.py", line 153, in __init__
    for name in self:
RuntimeError: dictionary changed size during iteration
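
For reference, this `RuntimeError` is what Python raises when keys are added to or removed from a dict while it is being iterated. The traceback only shows that it happens in the `for name in self:` loop of `Features.__init__`; the sketch below is not the pbox code and only assumes, for illustration, a dict-like container from which entries that could not be computed are dropped inside the loop. The function names and sample values are hypothetical.

```python
def drop_failed_unsafe(features):
    """Delete entries while iterating over the dict itself (raises RuntimeError)."""
    for name in features:
        if features[name] is None:
            del features[name]  # changes the dict size during iteration
    return features


def drop_failed(features):
    """Delete entries while iterating over a snapshot of the keys (safe)."""
    for name in list(features):  # list() copies the keys before the loop starts
        if features[name] is None:
            del features[name]
    return features


# Hypothetical sample: None marks a feature whose expression could not be evaluated
sample = {"checksum": None, "entropy": 7.2, "size_of_headers": None}
cleaned = drop_failed(dict(sample))
```

Iterating over `list(features)` is a common way to avoid this error; whether it is the right fix here depends on what the loop in `Features.__init__` actually modifies.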

commented

@smarbal Regarding the following lines:

00:00:03.901 [WARNING] Bad expression: checksum == 0
00:00:03.901 [ERROR] name 'checksum' is not defined
00:00:03.905 [WARNING] Bad expression: size_of_headers == 512
00:00:03.905 [ERROR] name 'size_of_headers' is not defined
00:00:03.909 [WARNING] Bad expression: size_of_initializeddata >= 3 * 1024 * 1024
00:00:03.909 [ERROR] name 'size_of_initializeddata' is not defined

You probably have MS-DOS executables in your dataset. When the Features registry is computed, pefeats only applies to PE32, PE64 and .NET; this is the expected behavior, as pefeats fails to parse MS-DOS executables. The features it provides are therefore not defined for the MS-DOS files in your dataset, and you get these errors when computing features for them.
The quick fix is to remove the MS-DOS executables from your dataset.
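
To find the MS-DOS executables to remove, one option is to check the file headers directly. The sketch below is a minimal standalone check, not part of packing-box: it relies only on the documented layout in which a PE file starts with the `MZ` magic and stores, at offset 0x3C, the offset of the `PE\0\0` signature. The helper name `is_pe` is hypothetical. A file that starts with `MZ` but fails this check is a plain MS-DOS (or other non-PE) executable.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Return True if data starts with an MZ header that points to a PE signature."""
    # The DOS header is 0x40 bytes long and starts with the "MZ" magic
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (little-endian uint32 at offset 0x3C) holds the PE header offset
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Synthetic headers for illustration: a bare DOS header, then the same with a PE signature
dos_header = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x40)
pe_stub = dos_header + b"PE\x00\x00"
```

Reading the first bytes of each file in the dataset folder and keeping only those for which `is_pe` returns `True` would leave the PE32, PE64 and .NET samples that pefeats can handle.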