target / huntlib

A Python library to help with some common threat hunting data analysis operations

Benfords returns chi2 higher than 1.0

CMiksche opened this issue

Describe the bug
After reading the documentation, I expected benfords to return a chi2 value between 0.0 and 1.0, but when testing with high numbers I get higher chi2 values:

To Reproduce
Download the current version from PyPI and test the following input:

import huntlib.util

huntlib.util.benfords([234,317,211,92])
huntlib.util.benfords([235634643])
huntlib.util.benfords([123])
huntlib.util.benfords([9,9,9])

Expected behavior
Maximum chi2 value of 1

Terminal output

>>> huntlib.util.benfords([234,317,211,92])
(2.279150197628459, 0.9712356424435329, 1    0.00
2    0.50
3    0.25
4    0.00
5    0.00
6    0.00
7    0.00
8    0.00
9    0.25
Name: digits, dtype: float64)
>>> huntlib.util.benfords([235634643])
(4.681818181818183, 0.7909792203003781, 1    0.0
2    1.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    0.0
Name: digits, dtype: float64)
>>> huntlib.util.benfords([123])
(2.3222591362126246, 0.9695046717201476, 1    1.0
2    0.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    0.0
Name: digits, dtype: float64)
>>> huntlib.util.benfords([9,9,9])
(20.73913043478261, 0.007873655429909338, 1    0.0
2    0.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    1.0
Name: digits, dtype: float64)

I know this issue probably won't occur on larger, more realistic datasets, but either the documentation or the handling of these cases should be changed.

Thanks for catching this. The chi2 range is in fact 0 to infinity, so I corrected the documentation.
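
For anyone who wants to see why values above 1.0 are normal here, the following is a minimal sketch of a Pearson chi-square computed over first-digit proportions. It is not taken from the huntlib source: the rounded probabilities in `BENFORD` and the helper names `first_digit` and `benford_chi2` are assumptions made for illustration, and the sketch only handles nonzero integers.

```python
# Rounded Benford first-digit probabilities.
# Assumption: these values are illustrative, not confirmed as huntlib's own.
BENFORD = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}


def first_digit(n):
    """Return the leading decimal digit of a nonzero integer."""
    return int(str(abs(int(n)))[0])


def benford_chi2(numbers):
    """Pearson chi-square of observed first-digit proportions vs. BENFORD."""
    digits = [first_digit(n) for n in numbers]
    total = len(digits)
    chi2 = 0.0
    for digit, expected in BENFORD.items():
        # Share of the sample whose first digit is `digit`.
        observed = digits.count(digit) / total
        chi2 += (observed - expected) ** 2 / expected
    return chi2


# The four inputs from the report above.
for sample in ([234, 317, 211, 92], [235634643], [123], [9, 9, 9]):
    print(sample, benford_chi2(sample))
```

Under these assumptions the sketch appears to reproduce the chi2 values in the terminal output above, for example roughly 20.74 for `[9, 9, 9]`, where every value starts with the least likely digit. The statistic sums squared deviations divided by small expected values, so nothing limits it to 1.0. When the same statistic is computed over raw counts instead of proportions it also scales with the sample size, which is one way to see that it has no fixed upper bound.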