Bad sized Simda domain names
Samiisd opened this issue
Hello,
The lengths of the domain names generated by the Simda generator are wrong: they range from 0 to 23 (that is, 0 to 32-8-1) instead of from 8 to 31. As a result, the dataset used for training the model is slightly corrupted.
To fix this issue, just replace this piece of code in data.py:
    simda_lengths = range(8, 32)
    segs_size = max(1, num_per_dga/len(simda_lengths))
    for simda_length in range(len(simda_lengths)):
        domains += simda.generate_domains(segs_size,
                                          length=simda_length,
                                          tld=None,
                                          base=random.randint(2, 2**32))
        labels += ['simda']*segs_size
With this one:
    simda_lengths = range(8, 32)
    segs_size = max(1, num_per_dga/len(simda_lengths))
    for simda_length in simda_lengths:
        domains += simda.generate_domains(segs_size,
                                          length=simda_length,
                                          tld=None,
                                          base=random.randint(2, 2**32))
        labels += ['simda']*segs_size
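The only difference is that the new loop iterates over simda_lengths itself instead of over range(len(simda_lengths)), so the length passed to the generator is a value from 8 to 31 rather than an index from 0 to 23.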
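To make the difference concrete, here is a small standalone sketch that runs both loop headers. Note that `fake_generate_domains` is a hypothetical stand-in I wrote for illustration, not the real `simda.generate_domains`. It assumes the generator returns names of exactly the requested length, and `collect_lengths` is likewise just a helper for this example.

```python
import random

def fake_generate_domains(count, length, tld=None, base=0):
    """Hypothetical stand-in for simda.generate_domains (assumption):
    returns `count` random lowercase names of exactly `length` characters."""
    rng = random.Random(base)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return ["".join(rng.choice(letters) for _ in range(length))
            for _ in range(count)]

def collect_lengths(length_iter, segs_size=2):
    """Run the loop body over `length_iter` and return the sorted set of
    name lengths that end up in the dataset."""
    domains = []
    for simda_length in length_iter:
        domains += fake_generate_domains(segs_size,
                                         length=simda_length,
                                         tld=None,
                                         base=random.randint(2, 2**32))
    return sorted({len(d) for d in domains})

simda_lengths = range(8, 32)
buggy_lengths = collect_lengths(range(len(simda_lengths)))  # original loop
fixed_lengths = collect_lengths(simda_lengths)              # proposed loop

print(buggy_lengths[0], buggy_lengths[-1])  # 0 23
print(fixed_lengths[0], fixed_lengths[-1])  # 8 31
```

With the original loop the stub is asked for lengths 0 through 23, including an empty name; with the proposed loop it is asked for 8 through 31, as intended.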
I hope this helps!