S0urc-3 / Sherlock

This repository contains experiments for different publications at the intersection of Computer Vision and Computer Security.

We are currently ranked #1 on Papers with Code for the MalNet dataset (https://paperswithcode.com/dataset/malnet) in five tasks: malware detection, malware detection from type labels, malware detection from family labels, malware type detection, and malware family detection.

What is a binary image?

Binary images represent the bytecode of an executable as a 2D image (see figure below), and can be statically extracted from many types of software (e.g., EXE, PE, APK). We use the Android ecosystem due to its large market share, easy accessibility, and diversity of malicious software.

[Figure: Binary image]
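The idea of laying bytecode out as an image can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used to build MalNet: `bytes_to_image` is a hypothetical helper that writes each byte as one grayscale pixel (0 to 255), row by row, and zero-pads the last row. The default width of 256 is an assumption chosen for the example; the MalNet images were produced by the dataset's own tooling, which may differ in width, resizing, and color encoding.

```python
import math


def bytes_to_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay out raw bytes as a 2D grayscale image, one pixel (0-255) per byte.

    Hypothetical helper for illustration: the width is an assumed constant and
    the last row is zero-padded so every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Number of rows needed to hold every byte (at least one row for empty input).
    height = max(1, math.ceil(len(data) / width))
    # Pad with zero bytes so the data fills a complete height x width grid.
    padded = data + bytes(height * width - len(data))
    # Slicing bytes yields bytes; list() turns each row into a list of ints.
    return [list(padded[row * width:(row + 1) * width]) for row in range(height)]
```

For an Android app, the input would typically be the bytes of a DEX file read in binary mode (for example `open("classes.dex", "rb").read()`, with a placeholder path). Saving the result as an image file and resizing it to a fixed model input size are left out of the sketch.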

Inference and regenerating results

Follow these steps to evaluate each model.

  1. Download the MalNet dataset and prepare the data.

    • Download full-data-as-1GB or full-data-as-6GB and copy all the zip files to a folder.

    • To recombine file chunks after downloading, run:

      cat malnet-image* | tar xzpvf -

    • To create the data files required for binary, type, and family training or evaluation, update the following keys in the config file in the data folder, then run data/main.py as shown below.

      'groups': ['family', 'binary', 'type'],  # any of 'binary', 'type', 'family'

      'data_dir': path to the data folder where the groups should be created,

      'image_dir': path to the folder the images were extracted into in the previous step,

      'dataset_type': which split(s) to create,  # 'all', 'train', 'test', or 'val'

      'symbolic': whether to create symbolic links instead of copying the images,  # True or False

      python data/main.py
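The `cat malnet-image* | tar xzpvf -` command used in this step concatenates the downloaded chunks in name order and streams the result into `tar`, which decompresses (gzip) and extracts it. For environments without a POSIX shell, the same idea can be sketched with Python's standard library. `ChainedReader` and `recombine_and_extract` are hypothetical names, and the sketch assumes the chunks are consecutive pieces of a single gzip-compressed tar archive whose file names sort in the right order (the same ordering `cat malnet-image*` relies on). Unlike `tar xzpvf`, it applies the `data` extraction filter where the Python version supports it, which rejects unsafe paths and does not preserve every permission bit.

```python
import glob
import io
import os
import tarfile


class ChainedReader(io.RawIOBase):
    """Read several files back to back as one byte stream (like `cat a b c`)."""

    def __init__(self, paths):
        self._paths = iter(paths)
        self._current = None

    def readable(self):
        return True

    def readinto(self, buffer):
        while True:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    return 0  # all chunks consumed: end of stream
                self._current = open(path, "rb")
            n = self._current.readinto(buffer)
            if n:
                return n
            # Current chunk is exhausted; move on to the next one.
            self._current.close()
            self._current = None

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()


def recombine_and_extract(pattern="malnet-image*", dest="."):
    """Stream the chunks matching `pattern`, in sorted order, through gzip+tar extraction."""
    chunks = sorted(glob.glob(pattern))
    if not chunks:
        raise FileNotFoundError(f"no files match {pattern!r}")
    os.makedirs(dest, exist_ok=True)
    # Use the safer 'data' extraction filter on Python versions that support it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with io.BufferedReader(ChainedReader(chunks)) as stream:
        # "r|gz" reads the archive as a forward-only gzip stream, like `tar xzf -`.
        with tarfile.open(fileobj=stream, mode="r|gz") as tar:
            tar.extractall(dest, **extract_kwargs)
    return chunks
```

Streaming the chunks avoids writing a second, fully recombined copy of the archive to disk, which matters for the larger download.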

  2. Download the checkpoints to your local folder.

Experiment   Classes (nb_classes)   Checkpoint (model_path)
Binary       2                      binary.pth
Type         47                     type.pth
Family       696                    family.pth
  3. Execute the following commands to evaluate each experiment.

Experiment   Command
Binary       python regenerate_experiment_results.py --model_path model_path_to_Binary --nb_classes 2 --data_path data_path_to_Binary
Type         python regenerate_experiment_results.py --model_path model_path_to_Type --nb_classes 47 --data_path data_path_to_Type
Family       python regenerate_experiment_results.py --model_path model_path_to_Family --nb_classes 696 --data_path data_path_to_Family

  4. The previous step generates .csv files containing the results. Run the corresponding {binary/family/type}_classification_metrics_generation.py script on those .csv files to regenerate the reported metrics.
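To drive the three evaluations from Python instead of typing each command, a small wrapper can build the command line from the class counts in the checkpoint table and run it with `subprocess`. This is only a sketch: `build_command` and `run_experiment` are hypothetical helpers, the checkpoint and data paths you pass are placeholders for your own locations, and the only flags used are the three that appear in the evaluation commands (`--model_path`, `--nb_classes`, `--data_path`).

```python
import shlex
import subprocess
import sys

# Output classes per experiment, as listed in the checkpoint table.
NB_CLASSES = {"binary": 2, "type": 47, "family": 696}


def build_command(experiment, model_path, data_path):
    """Return the argument list for one regenerate_experiment_results.py run."""
    nb_classes = NB_CLASSES[experiment.lower()]  # KeyError for unknown experiments
    return [
        sys.executable,  # reuse the current Python interpreter
        "regenerate_experiment_results.py",
        "--model_path", model_path,
        "--nb_classes", str(nb_classes),
        "--data_path", data_path,
    ]


def run_experiment(experiment, model_path, data_path, dry_run=False):
    """Print the command when dry_run is True; otherwise run it and raise on failure."""
    cmd = build_command(experiment, model_path, data_path)
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

For example, `run_experiment("family", "path/to/family.pth", "path/to/family_data", dry_run=True)` only prints the command; dropping `dry_run=True` runs it from the repository root. Passing arguments as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.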

Results

Experiment   Classes   F1     Precision   Recall   Checkpoint
Binary       2         .854   .920        .810     binary.pth
Type         47        .497   .628        .447     type.pth
Family       696       .491   .568        .461     family.pth
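The metrics generation scripts are not reproduced here. As a rough illustration of how class-averaged precision, recall, and F1 can be computed from a results .csv file, the sketch below assumes a file with hypothetical `label` and `prediction` columns and uses macro averaging (the unweighted mean of per-class scores). Both the column names and the averaging mode are assumptions; the repository's {binary/family/type}_classification_metrics_generation.py scripts define what is actually reported.

```python
import csv
from collections import Counter


def macro_metrics(pairs):
    """Macro-averaged precision, recall and F1 from (label, prediction) pairs.

    Each class is scored separately and the per-class scores are averaged
    without weighting; undefined ratios (0/0) count as 0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    classes = set()
    for label, pred in pairs:
        classes.update((label, pred))
        if label == pred:
            tp[label] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[label] += 1  # missed the true class
    if not classes:
        raise ValueError("no (label, prediction) pairs given")
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        totals["precision"] += p
        totals["recall"] += r
        totals["f1"] += 2 * p * r / (p + r) if p + r else 0.0
    return {name: value / len(classes) for name, value in totals.items()}


def metrics_from_csv(path, label_col="label", pred_col="prediction"):
    """Hypothetical CSV reader: the column names are assumptions, not the repo's format."""
    with open(path, newline="") as f:
        return macro_metrics((row[label_col], row[pred_col]) for row in csv.DictReader(f))
```

With the toy pairs `("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")`, macro precision is 5/6, macro recall is 0.75, and macro F1 is 11/15 (about 0.733), whereas the harmonic mean of that precision and recall is about 0.789. Averaged F1 is therefore not, in general, equal to 2PR/(P + R) computed from the averaged precision and recall.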

Citation

@article{seneviratne2022self,
  title={Self-supervised vision transformers for malware detection},
  author={Seneviratne, Sachith and Shariffdeen, Ridwan and Rasnayaka, Sanka and Kasthuriarachchi, Nuran},
  journal={IEEE Access},
  volume={10},
  pages={103121--103135},
  year={2022},
  publisher={IEEE}
}

About

License: MIT License

