endgameinc / dga_predict

Trained model usage

goooroooX opened this issue:

Hi,
Could you please post a few lines of code showing how to check a domain name against the trained model and return the result (generated/non-generated)?
Thanks!

Thanks for your interest. This code is meant to reproduce the figures in the paper
https://arxiv.org/abs/1611.00791

but you can also query the trained model directly, as follows.

After you've trained the model on data X, y and have valid_chars (and maxlen)
https://github.com/endgameinc/dga_predict/blob/master/dga_classifier/lstm.py#L28-L46

you can query the model for a domain (for example, "domain.xyz") with the following steps:
(1) remove the TLD from the domain
(2) encode domain characters as integer tokens and pad
(3) query the model
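
The query relies on valid_chars and maxlen from training. Below is a minimal sketch of how such a vocabulary could be built, assuming X is a list of domain strings with the TLD already removed. build_vocab is a hypothetical helper, not a function from this repo, and sorting the character set is my own choice to make the mapping reproducible.

```python
def build_vocab(X):
    """Return (valid_chars, maxlen) for a list of domain strings."""
    # Indices start at 1 so that 0 stays free to act as the padding token.
    chars = sorted(set(''.join(X)))
    valid_chars = {ch: idx + 1 for idx, ch in enumerate(chars)}
    # Every sequence is later padded to the length of the longest domain.
    maxlen = max(len(x) for x in X)
    return valid_chars, maxlen


# Toy data for illustration only; real training uses the full dataset.
valid_chars, maxlen = build_vocab(['google', 'facebook', 'xjwqkzpt'])
print(len(valid_chars), maxlen)
```

Whatever mapping is used at training time has to be saved alongside the model (for example with json or pickle), since a different character-to-index mapping would feed the network different tokens.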

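# assumes you've already trained the model and have access to "valid_chars" and "maxlen"
import tldextract
from keras.preprocessing import sequence

query_domain = 'domain.xyz'
# (1) remove the TLD (and any subdomain), keeping only the registered name
query_domain_stripped = tldextract.extract(query_domain).domain
# (2) encode the characters as integer tokens and pad to the training length
tokens = [valid_chars[ch] for ch in query_domain_stripped]
query = sequence.pad_sequences([tokens], maxlen=maxlen)
# (3) query the model
print(model.predict(query))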

>> [[0.00203814]]
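
To turn the raw score into the generated/non-generated answer asked for, the steps above can be wrapped in two small helpers. This is a sketch under stated assumptions, not code from the repo: encode_domain and classify are hypothetical names, the 0.5 threshold is arbitrary and should be tuned on validation data, a score near 1 is assumed to mean "generated", and lowercasing or skipping characters missing from valid_chars is my own choice to avoid a KeyError. The padding mirrors what pad_sequences does by default (zeros added at the front, overly long inputs cut from the front).

```python
def encode_domain(label, valid_chars, maxlen):
    """Encode a domain label as integer tokens, left-padded with 0 to maxlen."""
    # Assumption: training data was lowercase; unknown characters are skipped.
    tokens = [valid_chars[ch] for ch in label.lower() if ch in valid_chars]
    # Keep only the last maxlen tokens if the label is too long.
    tokens = tokens[-maxlen:]
    return [0] * (maxlen - len(tokens)) + tokens


def classify(score, threshold=0.5):
    """Map a model score in [0, 1] to a label (threshold is an assumption)."""
    return 'generated' if score >= threshold else 'non-generated'


# With a trained keras model, usage would look roughly like:
#   query = numpy.array([encode_domain('domain', valid_chars, maxlen)])
#   print(classify(float(model.predict(query)[0][0])))
print(classify(0.00203814))
```

The score printed above for "domain.xyz" falls well below 0.5, so this sketch would report it as non-generated.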

You can find more information in a related blog post:
https://www.endgame.com/blog/technical-blog/using-deep-learning-detect-dgas

Thank you for the sample.
Is it possible to avoid external libraries (keras)? I'm trying to implement a lightweight monitoring solution and am limited to native Python libraries in a sandbox.
Thanks!

This isn't straightforward, and it is beyond the scope of this repo.

One option: export the keras model as a tensorflow model, then investigate using something like https://github.com/riga/tfdeploy to make numpy the only dependency. I'm not aware of a fail-safe method for the first step (exporting keras to tensorflow), but you might find some resources here:

Another route would be to create your own model from scratch using another framework that you find suitable. For example, I believe that numpy is the CPU backend for https://github.com/chainer/chainer. In that case, this repo would only serve as a guide (and a source of data) for writing and training your own model.
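
To give a sense of what framework-free inference involves, here is a rough sketch of an embedding, a single LSTM layer, and a sigmoid output written with only the Python standard library. It is not this repo's code and it has not been checked against a trained keras model. All names (predict_score, gates, embedding, dense_w, dense_b) are hypothetical, and the weights are assumed to have been exported already as nested lists, with each gate matrix stored as one row per LSTM unit. The gate activation is a parameter because, as far as I know, some keras versions default to a hard sigmoid for the gates rather than a plain sigmoid. Padding tokens are assumed to be processed like any other token, and dropout is ignored because it does nothing at inference time.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(rows, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in rows]


def lstm_step(x, h, c, gates, gate_act=sigmoid):
    """One LSTM time step.

    gates maps 'i', 'f', 'g', 'o' (input, forget, candidate, output)
    to a tuple (W, U, b): input weights, recurrent weights, bias.
    """
    pre = {}
    for name, (W, U, b) in gates.items():
        wx, uh = matvec(W, x), matvec(U, h)
        pre[name] = [a + r + bias for a, r, bias in zip(wx, uh, b)]
    i = [gate_act(v) for v in pre['i']]
    f = [gate_act(v) for v in pre['f']]
    g = [math.tanh(v) for v in pre['g']]
    o = [gate_act(v) for v in pre['o']]
    # New cell state: keep part of the old state, add gated candidate.
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def predict_score(tokens, embedding, gates, dense_w, dense_b, gate_act=sigmoid):
    """Run the token sequence through the LSTM and a sigmoid output unit."""
    units = len(dense_w)
    h, c = [0.0] * units, [0.0] * units
    for t in tokens:
        # embedding[t] is the vector for integer token t (0 = padding).
        h, c = lstm_step(embedding[t], h, c, gates, gate_act)
    return sigmoid(sum(w * v for w, v in zip(dense_w, h)) + dense_b)
```

Pure-Python loops like these are slow compared with numpy, but a single short domain name only needs a few dozen time steps, which may be acceptable for lightweight monitoring. Before relying on such a port, compare its scores with model.predict on a handful of domains.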