mhxueshan / malware-detection-machine-learning

malware detection using machine learning

Hao Meng 2020.6

Rewritten as a notebook in 2021.10

Motivation

Recently, in order to do data mining on financial data features, I taught myself machine learning, mainly traditional statistics-based machine learning (book: An Introduction to Statistical Learning) and deep learning based on neural networks (from Hung-yi Lee's courses, https://speech.ee.ntu.edu.tw/~tlkagk/courses.html). I also studied reinforcement learning, but I found some of its concepts difficult to understand, and I think it is one of the most challenging kinds of machine learning, so I haven't actually used it yet.

I did very intensive research on malware at my previous bank and during my university days. When I discovered that machine learning is a powerful tool for data analysis, I thought: why can't I use machine learning to detect malware?

Introduction

This notebook is mainly based on an existing notebook (https://github.com/dchad/malware-detection), and the data comes from the Microsoft Malware Classification Challenge (BIG 2015) (https://www.kaggle.com/c/malware-classification/data), whose data quality is high.

Why do I base my research on Derek's notebook? Because the workload of analyzing this data from scratch is enormous, I use his method directly and go through its advantages and disadvantages to speed up the analysis.
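To give a sense of what a very simple feature extraction step on this data could look like, here is a minimal sketch. It assumes the hex-dump layout of the BIG 2015 `.bytes` files (an address column followed by hex byte tokens, with `??` marking unreadable bytes). The function name `byte_histogram` and the sample lines are hypothetical, and this is not necessarily how Derek's notebook or main.ipynb computes its features.

```python
from collections import Counter


def byte_histogram(lines):
    """Count how often each byte value (0-255) appears in a hex dump.

    Assumption: each line looks like '00401000 56 8D 44 ...', i.e. an
    address followed by hex byte tokens. '??' tokens (unknown bytes) are
    counted separately and returned as the second value.
    """
    counts = Counter()
    unknown = 0
    for line in lines:
        tokens = line.split()[1:]  # drop the leading address column
        for tok in tokens:
            if tok == "??":
                unknown += 1
            else:
                counts[int(tok, 16)] += 1
    # Fixed-length feature vector: one slot per possible byte value
    return [counts.get(b, 0) for b in range(256)], unknown


# Hypothetical sample lines, made up for illustration only
sample = [
    "00401000 56 8D 44 24 08 50 8B F1",
    "00401008 E8 1C 1B 00 00 ?? ?? 56",
]
features, unknown = byte_histogram(sample)
print(features[0x56], features[0x00], unknown)
```

A fixed-length vector like this (256 counts plus the unknown-byte count) can be fed to any mainstream classifier, which is why even coarse features of this kind are a common starting point.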

Entry point

main.ipynb

Conclusion

Even when the feature extraction stage is not done in great detail, the accuracy of these mainstream models can exceed 99%, which shows that machine learning is very powerful for malware identification.

About

License: Apache License 2.0


Languages

Jupyter Notebook 51.7%, Python 48.3%