Jasmine969 / qshgm-code

Outline for README

  • Package Overview
  • Run Scripts
  • Requirements

Package Overview

├─data
│  ├─source
│  │  ├─aac
│  │  │      negative_aac.csv
│  │  │      positive_aac.csv
│  │  └─fasta
│  │         negative.fasta
│  │         positive.fasta
│  └─train
│          data.csv
│          label.csv
│          merge.csv
│  model.py
│  nn.py
│  out.txt
  • model.py
    • the script that trains SVM, KNN and RF (random forest) models on the samples
  • nn.py
    • the script that trains a neural network on the samples
  • data/source/fasta
    • the raw FASTA files (positive and negative sequences)
  • data/source/aac
    • the Amino Acid Composition (AAC) features of the sequences; AAC is the frequency of each amino acid type in a protein or peptide sequence
  • data/train
    • data.csv -- positive_aac.csv and negative_aac.csv merged into one file, used by model.py
    • label.csv -- the positive/negative label of each sample in data.csv, used by model.py
    • merge.csv -- data.csv and label.csv merged into one file, used by nn.py
  • out.txt
    • the results we obtained
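
As an illustration of the AAC definition above, here is a minimal sketch in plain Python. It is not the code that produced the files in data/source/aac; the function names `aac` and `read_fasta` are hypothetical, and ignoring non-standard residues is an assumption.

```python
from collections import Counter

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(sequence.upper())
    # Assumption: non-standard residues (e.g. X, B, Z) are ignored,
    # so the fractions are taken over standard residues only.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Each sequence then becomes a 20-value feature vector, for example `[aac(seq)[a] for a in AMINO_ACIDS]` for every `(header, seq)` pair read from a FASTA file.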

Run Scripts

model.py

python model.py -e ESTIMATOR [-dp DATA_PATH] [-lp LABEL_PATH]
  • -e, required, the estimator to use; one of ['svm', 'knn', 'rf']
  • -dp, the path of data.csv
  • -lp, the path of label.csv
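
The options above could be declared with `argparse` roughly as follows. This is a sketch, not the actual contents of model.py: the long option names (`--estimator`, `--data-path`, `--label-path`) and the default paths are assumptions, chosen to match the directory layout in the Package Overview.

```python
import argparse

# Estimator names accepted by -e, as listed in this README.
ESTIMATORS = ["svm", "knn", "rf"]


def build_parser():
    """Build a parser mirroring the documented model.py options."""
    parser = argparse.ArgumentParser(prog="model.py")
    parser.add_argument(
        "-e", "--estimator", required=True, choices=ESTIMATORS,
        help="the estimator to use",
    )
    # Assumption: the defaults point at the files under data/train.
    parser.add_argument(
        "-dp", "--data-path", default="data/train/data.csv",
        help="the path of data.csv",
    )
    parser.add_argument(
        "-lp", "--label-path", default="data/train/label.csv",
        help="the path of label.csv",
    )
    return parser
```

With this parser, `build_parser().parse_args(["-e", "rf"])` yields `estimator == "rf"` and the assumed default paths, while an unknown estimator name is rejected with a usage error.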

nn.py

python nn.py [-e EPOCHS] [-bs BATCH_SIZE] [-lr LEARNING_RATE] [-dp DATA_PATH]
  • -e, epochs, default=30
  • -bs, batch_size, default=64
  • -lr, learning_rate, default=0.01
  • -dp, the path of merge.csv
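
merge.csv combines data.csv and label.csv, as described in the Package Overview. The sketch below shows one way such a file could be built using only the standard library. It assumes that both files have no header row, that their rows are in the same order, and that the label is appended as the last column; the real merge.csv may be laid out differently. The function names `merge_rows` and `merge_csv` are hypothetical.

```python
import csv
import io


def merge_rows(data_rows, label_rows):
    """Append each label row to the matching feature row."""
    data_rows, label_rows = list(data_rows), list(label_rows)
    if len(data_rows) != len(label_rows):
        raise ValueError("data and label files have different row counts")
    return [list(d) + list(l) for d, l in zip(data_rows, label_rows)]


def merge_csv(data_file, label_file, out_file):
    """Read two open CSV files and write the merged rows to out_file."""
    merged = merge_rows(csv.reader(data_file), csv.reader(label_file))
    csv.writer(out_file, lineterminator="\n").writerows(merged)


# Example with in-memory files standing in for data.csv and label.csv.
out = io.StringIO()
merge_csv(io.StringIO("0.1,0.2\n0.3,0.4\n"), io.StringIO("1\n0\n"), out)
print(out.getvalue())
```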

Requirements

In our experiments, all scripts were run with Python 3.7. The third-party packages we used are listed below:

numpy		1.16.2
pandas		0.25.0
scikit-learn	0.23.2
torch		1.4.0+cpu
torchvision	0.5.0+cpu

About

License: Apache License 2.0

