Define a global entropy measurement for strings and literals
maxfisher-g opened this issue
Max Fisher commented
Entropy calculations currently use per-file character frequency counts to define the expected probabilities for each character. It would be better to measure character frequencies on a large dataset of source files and then use the same frequency counts to analyse all packages.
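A rough sketch of what this could look like, assuming the global table is just a character-to-probability map built once from a corpus and reused for every string. The names `build_global_probs` and `mean_surprisal` and the `floor` value for unseen characters are hypothetical, not existing code in this project. Strictly speaking, the score here is the average surprisal of the string under the global distribution rather than the string's own Shannon entropy.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def build_global_probs(corpus: Iterable[str]) -> Dict[str, float]:
    """Count characters across all source texts and normalise to probabilities."""
    counts: Counter = Counter()
    for text in corpus:
        counts.update(text)  # updating with a str counts each character
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def mean_surprisal(s: str, probs: Dict[str, float], floor: float = 1e-6) -> float:
    """Average bits per character of s under the global probabilities.

    Characters missing from the table fall back to `floor`, an assumed
    small probability, so rare or unseen characters score as surprising.
    """
    if not s:
        return 0.0
    return sum(-math.log2(probs.get(ch, floor)) for ch in s) / len(s)


# Usage: build the table once, then score strings from any package with it.
global_probs = build_global_probs(["aab", "b"])
score = mean_surprisal("ab", global_probs)
```

Because the same `global_probs` table is used for every file, scores for strings in different packages are computed against one shared baseline instead of each file's own character distribution.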
Max Fisher commented
It will be easier to measure character frequencies once we have static analysis data in BigQuery.