dimingyu / AliyunSecurityMaliwareDetection

Tianchi beginner competition - Aliyun security malware detection

Aliyun Security Malware Detection

https://tianchi.aliyun.com/competition/entrance/231694/introduction

Approach

scheme1

Todo

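  • Report
  • PPT
  • Plot (data, trend, target)
    • Word cloud
    • Box plot
    • Bar chart
    • Scatter plot
  • Feature importance (chi-square features, LGB)
  • Performance ceiling
    • Algorithms and mathematical models
    • Deeper domain and industry knowledge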

Feature

  • Statistical Feature

  • Model Feature

  • v1 (a single TF-IDF feature)

    • api sequence of each file_id, sorted by tid and index
  • v2

    • tid_count
    • tid_distinct_count
    • api_distinct_count
    • tid_api_count_max
    • tid_api_count_min
    • tid_api_count_mean
    • tid_api_distinct_count_max
    • tid_api_distinct_count_min
    • tid_api_distinct_count_mean
  • v3

    • v1 + v2
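
The three feature versions above can be sketched roughly as follows. This is a minimal illustration, not the notebook code: the toy rows are made up, the column names (file_id, tid, index, api) are taken from the feature list, and the reading of each v2 name (for example, tid_count as the number of rows per file) is an assumption. Default TF-IDF settings are also an assumption.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the API call log (made-up rows, not competition data).
df = pd.DataFrame({
    "file_id": [1, 1, 1, 1, 2, 2, 2],
    "tid": [10, 10, 11, 11, 20, 20, 20],
    "index": [1, 0, 0, 1, 2, 0, 1],
    "api": ["ReadFile", "OpenFile", "OpenFile", "CloseHandle",
            "CloseHandle", "RegOpenKey", "RegOpenKey"],
})

# v1: one API "sentence" per file, ordered by tid then index, fed to TF-IDF.
ordered = df.sort_values(["file_id", "tid", "index"])
api_text = ordered.groupby("file_id")["api"].agg(" ".join)
tfidf = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
X_v1 = tfidf.fit_transform(api_text)

# v2: per-file counts (assumed meaning of each name).
per_file = df.groupby("file_id").agg(
    tid_count=("tid", "count"),             # rows (API calls) per file
    tid_distinct_count=("tid", "nunique"),  # distinct threads per file
    api_distinct_count=("api", "nunique"),  # distinct APIs per file
)

# Per-thread API counts, then summarised per file with max / min / mean.
per_tid = df.groupby(["file_id", "tid"])["api"].agg(
    n_api="count", n_api_distinct="nunique"
)
tid_api = per_tid.groupby(level="file_id").agg(
    tid_api_count_max=("n_api", "max"),
    tid_api_count_min=("n_api", "min"),
    tid_api_count_mean=("n_api", "mean"),
    tid_api_distinct_count_max=("n_api_distinct", "max"),
    tid_api_distinct_count_min=("n_api_distinct", "min"),
    tid_api_distinct_count_mean=("n_api_distinct", "mean"),
)
v2 = per_file.join(tid_api)

# v3: TF-IDF columns and statistical columns side by side, rows aligned on file_id.
X_v2 = csr_matrix(v2.loc[api_text.index].to_numpy(dtype=float))
X_v3 = hstack([X_v1, X_v2]).tocsr()
```

Keeping v3 as a sparse matrix avoids densifying the TF-IDF part, which matters once the vocabulary includes n-grams.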

Model

  • N-Gram
  • TF-IDF
  • XGBoost
  • NB-LR
  • Chi-square test
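
As a rough sketch of how several of the items above could fit together: n-gram TF-IDF features, chi-square feature selection, then NB-LR. NB-LR is read here as logistic regression on features scaled by the naive Bayes log-count ratio, which is an assumption about what the repo means. The toy documents, binary labels, k=5 and C=10.0 are all made up for illustration, and XGBoost is left out.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# Made-up API sequences, one string per file, with made-up binary labels.
docs = [
    "OpenFile ReadFile CloseHandle",
    "OpenFile ReadFile ReadFile CloseHandle",
    "RegOpenKey RegSetValue CreateProcess",
    "RegOpenKey CreateProcess CreateProcess",
]
y = np.array([0, 0, 1, 1])

# N-gram + TF-IDF: unigrams and bigrams over whitespace-separated API names.
vec = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+", lowercase=False)
X = vec.fit_transform(docs)

# Chi-square test: keep the k features most associated with the label.
selector = SelectKBest(chi2, k=5)
X_sel = selector.fit_transform(X, y).tocsr()


def log_count_ratio(X, y, alpha=1.0):
    """Naive Bayes log-count ratio per feature, with additive smoothing alpha."""
    p = alpha + np.asarray(X[y == 1].sum(axis=0)).ravel()
    q = alpha + np.asarray(X[y == 0].sum(axis=0)).ravel()
    return np.log((p / p.sum()) / (q / q.sum()))


# NB-LR: scale each feature column by its log-count ratio, then fit LR.
r = log_count_ratio(X_sel, y)
X_nb = X_sel.multiply(r).tocsr()
clf = LogisticRegression(C=10.0, max_iter=1000)
clf.fit(X_nb, y)
pred = clf.predict(X_nb)
```

For a multi-class label, one option is to compute a separate ratio and classifier per class (one-vs-rest); the sketch only covers the binary case.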

Packages

  • Numpy
  • Pandas
  • Scikit-learn
  • SciPy

Languages

Jupyter Notebook 99.9%, Python 0.1%