rtwksai / MalwareDetection-ML

Used Classical Machine Learning models to detect Malware given system specs

Malware Detection - Unbalanced Dataset Problem

File Structure

MalwareDetection-ML/
├── HyperParameters.pdf
├── Malware_Detection_Report.pdf
├── Models
│   ├── #1-Voting
│   │   ├── reproduce
│   │   │   ├── consistent-preprocessing-balancedbagging.ipynb
│   │   │   ├── consistent-preprocessing_easy_ensemble_catboost.ipynb
│   │   │   ├── consistent-preprocessing-et.ipynb
│   │   │   ├── consistent-preprocessing-lgbm-smote.ipynb
│   │   │   ├── consistent-preprocessing-xgboost.ipynb
│   │   │   ├── demo.ipynb
│   │   │   └── demo.py
│   │   ├── voting.ipynb
│   │   └── voting.py
│   └── #2-XGBoost
│       ├── xgboost-submission.ipynb
│       └── xgboost-submission.py
└── README.txt

  • The report for the assignment is saved as 'Malware_Detection_Report.pdf'

  • All the hyperparameters that we had for various models are saved in 'HyperParameters.pdf'

  • We submitted our two best models; both are in the 'Models' folder

  • Our top submission, which scored 0.71103 on the private leaderboard, is in the '#1-Voting' folder

  • For voting, we combined the predictions of multiple models into a single consolidated score

  • To reproduce the CSVs used for voting, see the 'reproduce' folder

  • All notebooks in 'reproduce' share the same structure; only the model differs between them

  • The demo notebook, 'demo.ipynb', is the template on which the other notebooks in the folder are based

  • The explanation for 'demo.ipynb' is in 'demo.py'

  • In case you want to skip running all the notebooks to generate the CSVs, we have stored the CSVs on Drive: Link to Drive

Please head over to the 'Models/#1-Voting/reproduce/CSV' folder to find all relevant CSVs
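
The voting step described above combines per-model prediction CSVs into one score. A minimal sketch of one common way to do this, soft voting by (optionally weighted) averaging, is shown below. The column names `id` and `score`, the helper names, and the choice of averaging are assumptions for illustration; the actual logic lives in 'voting.ipynb' and may differ.

```python
import csv


def read_predictions(path, id_col="id", score_col="score"):
    """Load one model's CSV into {row id: predicted score}.

    The column names are hypothetical; adjust them to the real CSV headers.
    """
    with open(path, newline="") as f:
        return {row[id_col]: float(row[score_col]) for row in csv.DictReader(f)}


def soft_vote(prediction_sets, weights=None):
    """Blend several {id: score} mappings by weighted averaging."""
    if not prediction_sets:
        raise ValueError("need at least one set of predictions")
    if weights is None:
        weights = [1.0] * len(prediction_sets)  # equal weight per model
    if len(weights) != len(prediction_sets):
        raise ValueError("one weight per prediction set is required")

    # Every model must have scored exactly the same rows.
    ids = prediction_sets[0].keys()
    for preds in prediction_sets[1:]:
        if preds.keys() != ids:
            raise ValueError("all prediction sets must cover the same ids")

    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    return {
        row_id: sum(w * preds[row_id] for w, preds in zip(weights, prediction_sets)) / total
        for row_id in ids
    }
```

For example, `soft_vote([read_predictions("a.csv"), read_predictions("b.csv")])` would average two prediction files with equal weight; the file names here are placeholders, not files from the repository.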

  • Our second-best submission uses XGBoost
  • The notebook is saved as 'xgboost-submission.ipynb'
  • Its corresponding script is saved as 'xgboost-submission.py'
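
Because the task is framed as an unbalanced dataset problem, a gradient-boosted model such as XGBoost is often given a class weight. The sketch below computes the negative-to-positive ratio, which the XGBoost documentation suggests as a typical starting value for its `scale_pos_weight` parameter. Whether 'xgboost-submission.ipynb' uses this parameter is an assumption, not something stated here; the function name and the 0/1 label encoding are hypothetical as well.

```python
from collections import Counter


def negative_to_positive_ratio(labels, positive=1):
    """Return (#negative samples) / (#positive samples) for binary labels."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = sum(counts.values()) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    return n_neg / n_pos


# Toy labels: 1 = malware detected, 0 = clean (hypothetical encoding).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
ratio = negative_to_positive_ratio(labels)
print(ratio)  # 8 negatives divided by 2 positives

# The ratio could then be passed to the model, for example:
#   xgboost.XGBClassifier(scale_pos_weight=ratio)
```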

BONUS

  • We also implemented a stacking-based model. In case you want to view it, it is available on Drive: Link to Drive. Please head over to 'Models/#3-Stacking/' to find all relevant models.

Languages

Jupyter Notebook 95.9%, Python 4.1%