RayminQAQ / MalDetect_pcap

Created on 2024/4/29, using Cuckoo Sandbox to generate pcap files from malware samples taken from VirusShare.com (originating in VirusShare_00177).


Project name: pcap_MalDetect

Purpose:

Created on 2024/4/29, using:

  1. Cuckoo Sandbox to generate pcap files from malware
  2. a malware dataset from VirusShare.com (originating in VirusShare_00177).

VirusShare_00177 Dataset Overview:

The VirusShare_00177 dataset is a collection of malware samples that were submitted to the VirusShare website. The dataset includes both benign and malicious samples, and it can be used for a variety of machine learning tasks, such as malware detection and classification.

Project Structure

Repository file structure:
    |-- downloader.py
    |-- flow_pcap.py
    |-- Model 
        |-- ResNet.py
        |-- model.py
        |-- run.py
    |-- PEdeleter.py
    |-- Pcap2Img.py
    |-- tempDeleter.py
    |-- uploader.py
    |-- README.md

By Benson:

  • uploader.py: uploads the malware samples to the API server of Cuckoo Sandbox. Before running it, change result to the folder where you want the .pcap files to go, change PEs to the folder containing the samples, and change the API token to your own token from cuckoo.conf.

  • downloader.py: downloads the .pcap files from the API server. Be sure to change the result folder and set the API token to your own token.

  • tempDeleter.py: only used when VMware goes down. It deletes all other folders in result if the api token is larger than the number on line 20 (line>=3920).

  • PEdeleter.py: only used when VMware goes down. Run it after tempDeleter.py; it removes all files that have already been processed.
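As a rough illustration of the download step, the sketch below builds (but does not send) an HTTP request for one task's pcap file. It assumes the Cuckoo 2.x REST API conventions of a `/pcap/get/<task id>` endpoint and a `Bearer` token in the `Authorization` header; whether downloader.py calls the API this way is not stated in this README. The function name `pcap_request`, the host and port, and the token value are hypothetical.

```python
from urllib.request import Request


def pcap_request(base_url: str, task_id: int, api_token: str) -> Request:
    """Build a GET request for the pcap of one Cuckoo task (not sent here)."""
    # Assumed endpoint layout: <base_url>/pcap/get/<task_id>
    url = f"{base_url.rstrip('/')}/pcap/get/{task_id}"
    # Assumed auth scheme: the api_token from cuckoo.conf as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {api_token}"})


# Hypothetical server address and token, for illustration only.
req = pcap_request("http://localhost:8090/", 7, "YOUR_API_TOKEN")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the download against a running API server; the response body could then be written to the result folder.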

By RayminQAQ:

  • Files in "Model" folder: the whole machine learning pipeline for training and testing.
  • flow_pcap.py: splits each pcap into small pieces (separated by TCP) according to the paper.
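The splitting idea can be sketched without any pcap library. The example below assumes that "separated by TCP" means grouping packets by their pair of TCP endpoints, so that both directions of a connection land in the same piece; this is an interpretation, not a description of what flow_pcap.py actually does. The `Packet` type and both function names are hypothetical, and real packets would come from a pcap parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed TCP packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


SessionKey = Tuple[Tuple[str, int], Tuple[str, int]]


def session_key(pkt: Packet) -> SessionKey:
    """Direction-independent key: the two endpoints in sorted order."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (a, b) if a <= b else (b, a)


def split_by_tcp_session(packets: Iterable[Packet]) -> Dict[SessionKey, List[Packet]]:
    """Group packets by session, keeping their original order."""
    sessions: Dict[SessionKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


packets = [
    Packet("10.0.0.5", 49152, "93.184.216.34", 80, b"GET / HTTP/1.1\r\n"),
    Packet("93.184.216.34", 80, "10.0.0.5", 49152, b"HTTP/1.1 200 OK\r\n"),
    Packet("10.0.0.5", 49153, "93.184.216.34", 443, b"\x16\x03\x01"),
]
sessions = split_by_tcp_session(packets)
```

Each value in `sessions` corresponds to one small piece that could be written out as its own pcap file.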

By Stan:

  • Pcap2Img.py: reads each .pcap file and converts it to hex bytes. If there are fewer than 784 bytes, the program pads them with 0x00 up to 784; if there are more than 784, it keeps only the first 784. The 784 bytes are then turned into a 28*28 image, and the images are saved by category.
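The pad-or-truncate step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in Pcap2Img.py: the function name `bytes_to_image` is hypothetical, and the result is a nested list of pixel values rather than a saved .png file.

```python
from typing import List

IMAGE_SIDE = 28
TARGET_LEN = IMAGE_SIDE * IMAGE_SIDE  # 784 bytes per image


def bytes_to_image(data: bytes) -> List[List[int]]:
    """Return a 28x28 grid of byte values (0-255) built from raw bytes."""
    # Keep the first 784 bytes, then right-pad with 0x00 if the input was shorter.
    fixed = data[:TARGET_LEN].ljust(TARGET_LEN, b"\x00")
    # Cut the flat byte string into 28 rows of 28 pixel values each.
    return [
        list(fixed[row:row + IMAGE_SIDE])
        for row in range(0, TARGET_LEN, IMAGE_SIDE)
    ]


short_image = bytes_to_image(b"\x01\x02\x03")          # padded with zeros
long_image = bytes_to_image(bytes(range(256)) * 4)     # 1024 bytes, truncated
```

The resulting grid can be handed to an imaging library to write a grayscale .png; which library the project uses for that is not stated here.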

Pipeline

The project was run with Python 3.8.10 and CUDA 12.3 (RTX 3060 laptop).

To set up the environment, create a Python virtual environment, then run the following steps:

  1. Run uploader.py to upload the files to the API server (Notice: you should change the API key to your own)

    python uploader.py
  2. Run downloader.py after all the files are processed to download all .pcap files (WARNING: you may run into many problems due to the settings of Cuckoo Sandbox; see the Cuckoo Sandbox documentation for help.)

    python downloader.py
  3. Turn the pcap files into images (.png)

    python flow_pcap.py
    python Pcap2Img.py
  4. Train the machine learning model

    python run.py

Reference Papers

  1. Malware Traffic Classification Using Convolutional Neural Network for Representation Learning
  2. Image-based Neural Network Models for Malware Traffic Classification using PCAP to Picture Conversion

Contributors

  1. RayminQAQ:
    • Within the repository, processed pcap files in alignment with the referenced paper and constructed the complete machine learning pipeline.
    • Within the team, oversaw all tasks, provided leadership, and conducted survey paper research.
  2. Stan Wang: I made Pcap2Img.py to turn the pcap files into images.
  3. Benson: I made uploader.py, downloader.py, tempDeleter.py and PEdeleter.py.
