This project detects malware based on features extracted from API calls.
The solution achieved an AUC score of 99.18% on Kaggle's private leaderboard.
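For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The snippet below is a minimal sketch of that definition and is not the competition's scoring code. The function name `roc_auc` and the convention that label 1 means malware and 0 means benign are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive sample receives the higher score.

    Assumption: label 1 marks malware, label 0 marks benign samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of samples, so it is only meant to show the idea on small inputs.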
- Download the dataset from Kaggle and keep the extracted data in the project root directory
- Install the project's Python dependencies with pip install
- Run kfold_ensemble.py with the command python kfold_ensemble.py
- After training and prediction, the output is written to the file output.csv
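Once the run has finished, a quick sanity check of the generated file can be sketched as follows. This assumes output.csv is a comma-separated file with a header row, which this README does not state. The helper name `summarize_csv` is hypothetical.

```python
import csv


def summarize_csv(path="output.csv"):
    """Return (header, number_of_data_rows) for a CSV file.

    Assumption: the first row of the file is a header.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

Calling summarize_csv() in the project root after the run would show the column names and how many predictions were written.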
The easiest way to interact with Kaggle's dataset is via the Kaggle command-line tool (CLI). Below are the steps to set up the Kaggle CLI and use it to download the dataset.
- Install the Kaggle CLI
To get started with the Kaggle CLI we will need Python. Open a terminal and type the command
pip install kaggle
- API Credentials
Once we have Kaggle installed, type kaggle to check that it is installed. The output will show the path where we need to put the kaggle.json file. To get the kaggle.json file, go to: https://www.kaggle.com//account
In the API section, click Create New API Token, then copy the downloaded kaggle.json file to the path mentioned in the terminal output.
Type kaggle once again to check.
In some cases, the credentials will not work even though the file is placed in the correct location, because the file has incorrect permissions. Just type the exact command shown in the terminal output and it will start working.
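The credential steps above can also be sketched in Python. The snippet copies a downloaded token into the configuration directory and restricts it to owner read and write access, which is the usual remedy for the permission problem on Linux and macOS. The default directory `~/.kaggle` is an assumption here (use the path printed in your own terminal), and the function name `install_kaggle_token` is hypothetical.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy kaggle.json into config_dir and make it readable by the owner only.

    Assumption: config_dir defaults to ~/.kaggle; pass the path shown in
    the terminal output if it differs.
    """
    target_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "kaggle.json"
    shutil.copyfile(token_path, target)

    # Owner read/write only (mode 600), so other users cannot read the API key.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, install_kaggle_token("Downloads/kaggle.json") would place the token in the assumed default location. On Windows, os.chmod only toggles the read-only flag, so the permission step has little effect there.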
We can open the Kaggle help via kaggle -h
To get information on downloading competition data, we can type kaggle competitions download -h
Whatever the Kaggle CLI command is, add -h to get help.
To download the dataset, go to the Data subtab on the competition page. In the API section we will find the exact command that we can copy to the terminal to download the entire dataset.
The syntax is kaggle competitions download <competition name>
Once the dataset is downloaded, extract it and use it.
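The extraction step can be sketched with Python's standard library. This assumes the download arrives as a single zip archive. The archive name is whatever the CLI saved, so the path is left as a parameter, and the function name `extract_dataset` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir="."):
    """Extract a downloaded zip archive into dest_dir and list its members.

    Assumption: the dataset was downloaded as one zip file. dest_dir
    defaults to the current directory (the project root for this README).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

Running it from the project root keeps the extracted data where the run steps at the top expect it.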