endgameinc / dga_predict

Bad sized Simda domain names

Samiisd opened this issue

Hello,

The Simda generator is asked for domain names with the wrong lengths: the length argument runs over range(0, 32-8), i.e. 0 to 23, instead of 8 to 31. As a result, the dataset used for training the model is somewhat corrupted.
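A quick way to see this is to look only at the loop variable, without calling the Simda generator at all. The sketch below reuses the name simda_lengths from data.py; buggy_lengths and intended_lengths are illustrative names that are not in the repository.

```python
simda_lengths = range(8, 32)

# What the current loop passes as `length`: indices into simda_lengths.
buggy_lengths = list(range(len(simda_lengths)))

# What was presumably intended: the values of simda_lengths themselves.
intended_lengths = list(simda_lengths)

print(buggy_lengths[0], buggy_lengths[-1])        # 0 23
print(intended_lengths[0], intended_lengths[-1])  # 8 31
```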

To fix this issue, just replace this piece of code in data.py:

simda_lengths = range(8, 32)
segs_size = max(1, num_per_dga/len(simda_lengths))
for simda_length in range(len(simda_lengths)):
    domains += simda.generate_domains(segs_size,
                                          length=simda_length,
                                          tld=None,
                                          base=random.randint(2, 2**32))
labels += ['simda']*segs_size

with this one:

simda_lengths = range(8, 32)
segs_size = max(1, num_per_dga/len(simda_lengths))
for simda_length in simda_lengths:
    domains += simda.generate_domains(segs_size,
                                          length=simda_length,
                                          tld=None,
                                          base=random.randint(2, 2**32))
labels += ['simda']*segs_size

The only difference is that the new loop iterates over the values in simda_lengths instead of over range(len(simda_lengths)).
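For anyone who wants to check the effect without the real generator, here is a minimal sketch of the fixed loop. It makes several assumptions: fake_generate_domains is a hypothetical stand-in for simda.generate_domains (it just returns random lowercase strings of the requested length), build_simda_like is an illustrative wrapper, labels is extended inside the loop so that it stays aligned with domains, and // is used instead of / so that segs_size remains an integer under Python 3.

```python
import random
import string


def fake_generate_domains(nr, length):
    """Hypothetical stand-in for simda.generate_domains:
    returns `nr` random lowercase names of exactly `length` characters."""
    return [''.join(random.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(nr)]


def build_simda_like(num_per_dga):
    """Illustrative wrapper around the fixed loop (not part of data.py)."""
    domains, labels = [], []
    simda_lengths = range(8, 32)
    # Integer division keeps segs_size usable as a count and list multiplier.
    segs_size = max(1, num_per_dga // len(simda_lengths))
    for simda_length in simda_lengths:  # iterate over values, not indices
        domains += fake_generate_domains(segs_size, simda_length)
        labels += ['simda'] * segs_size  # assumption: one label per domain
    return domains, labels


domains, labels = build_simda_like(240)
observed = sorted({len(d) for d in domains})
print(observed[0], observed[-1], len(domains), len(labels))  # 8 31 240 240
```

With the original range(len(simda_lengths)) loop, the same check would report lengths from 0 to 23 instead.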

I hope this helps!