chuym726 / GPD

This repository contains scripts used in the manuscript "Massive expansion of human gut bacteriophage diversity"

The Gut Phage Database (GPD)

Scripts used for characterizing human gut bacteriophages in the following manuscript:

Camarillo-Guerrero LF, Almeida A, Rangel-Pineros G, Finn RD, Lawley TD (2020). Massive expansion of human gut bacteriophage diversity.

Associated data are also available on our FTP server.

classifier/classifier.py

Neural network that distinguishes phages from integrative and conjugative elements (ICEs).

Requirements:

  • Python (tested v3.6.7)
  • TensorFlow (tested v1.10)
  • Keras (tested v2.2.4)

Usage:

classifier.py <input_features_file.txt>

Notes:

  • input_features_file.txt: one feature vector per line, each describing a phage or an ICE. Each vector has 1026 dimensions: the fraction of hypothetical proteins (1), gene density (1) and the 5-mer signature (1024).
  • classifier/classifier_demo.py: runs a demo of the classifier on 50 phage and 50 ICE examples.
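A minimal sketch of reading and sanity-checking an input features file before passing it to the classifier, assuming whitespace-separated values (the delimiter classifier.py actually expects is not stated here). The helpers `parse_features` and `split_features` are hypothetical and not part of the repository; the split simply follows the feature order listed above.

```python
N_FEATURES = 1026  # 1 (hypothetical fraction) + 1 (gene density) + 1024 (5-mer signature)


def parse_features(lines):
    """Parse one feature vector per line (hypothetical helper; assumes whitespace-separated values)."""
    vectors = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != N_FEATURES:
            raise ValueError(
                f"line {line_no}: expected {N_FEATURES} values, got {len(fields)}"
            )
        vectors.append([float(value) for value in fields])
    return vectors


def split_features(vector):
    """Split one vector into (hypothetical fraction, gene density, 5-mer signature)."""
    return vector[0], vector[1], vector[2:]


# Usage with a file on disk (an open file handle is an iterable of lines):
# with open("input_features_file.txt") as handle:
#     vectors = parse_features(handle)
```

Rejecting malformed lines early gives a clearer error than a shape mismatch raised inside the network.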

Input feature generation files:

  • getGeneDensity.py: takes a GFF3 file and returns the gene density (number of genes per kb).
  • getHypothetical.py: takes a GFF3 file and returns the fraction of hypothetical proteins.
  • getKmer.py: takes a DNA sequence and returns the proportion of each of the 1024 possible 5-mers.
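A minimal sketch of the two GFF3-derived features, assuming Prokka-style annotation: sequence length is taken from `##sequence-region` pragmas, genes are counted as `CDS` features, and hypothetical proteins are those with `product=hypothetical protein` in column 9. The helper names (`parse_gff3`, `gene_density`, `hypothetical_fraction`) are hypothetical, and getGeneDensity.py / getHypothetical.py may parse the file differently.

```python
from urllib.parse import unquote


def parse_gff3(lines):
    """Return (total sequence length in bp, list of CDS attribute dicts)."""
    total_length = 0
    cds_attributes = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # Prokka appends the sequences after this pragma
        if line.startswith("##sequence-region"):
            # Pragma format: ##sequence-region <seqid> <start> <end>
            parts = line.split()
            total_length += int(parts[3]) - int(parts[2]) + 1
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only protein-coding features
        attrs = {}
        for item in cols[8].split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
        cds_attributes.append(attrs)
    return total_length, cds_attributes


def gene_density(lines):
    """Number of CDS features per kb of annotated sequence."""
    length, cds = parse_gff3(lines)
    if length == 0:
        raise ValueError("no ##sequence-region pragma found; cannot compute length")
    return len(cds) / (length / 1000)


def hypothetical_fraction(lines):
    """Fraction of CDS features whose product is 'hypothetical protein'."""
    _, cds = parse_gff3(lines)
    if not cds:
        return 0.0
    hypothetical = sum(
        1 for attrs in cds if attrs.get("product", "").lower() == "hypothetical protein"
    )
    return hypothetical / len(cds)
```

Both functions accept any iterable of lines, so an open file handle or a list of strings works equally well.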

Usage:

getGeneDensity(<gff3_file_name>)
getHypothetical(<gff3_file_name>)
getSignature_hash(<DNA_sequence>)
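A minimal sketch of a 5-mer signature, assuming overlapping windows on the given strand only (reverse complements are not merged), lexicographic k-mer order, and that windows containing non-ACGT characters are skipped. getSignature_hash may differ on any of these points; `kmer_signature` and `KMERS` are hypothetical names.

```python
from itertools import product

K = 5
# All 4**5 = 1024 possible 5-mers in lexicographic order (AAAAA, AAAAC, ...).
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_signature(sequence):
    """Return the proportion of each of the 1024 5-mers in `sequence`."""
    counts = dict.fromkeys(KMERS, 0)  # hash table keyed by k-mer
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if word in counts:  # skips windows containing N or other ambiguous bases
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[word] / total for word in KMERS]
```

Fixing the k-mer order up front keeps every signature aligned position by position, which matters because the classifier consumes the 1024 values as fixed columns of the feature vector.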

Other analysis and plotting scripts

figures/

  • 'Figure 1.py': Distribution of MIUViG scores from CheckV analysis
  • 'Figure 2.py': Viral diversity patterns across gut bacteria genera and broad host range VCs
  • 'Figure 3.py': Gut phageome profiling across human populations and correlation with gut bacteria enterotypes
  • 'Figure 4.py': Crass-like family global distribution and host-phage network of globally distributed VCs
  • 'Figure 5.py': Phylogenetic structure of the pX phage and its global distribution
  • 'Figure S1.py': Quality control assessment of GPD
  • 'Figure S2.py': Viral diversity patterns across gut bacteria phyla and host range analysis of gut phages
  • 'Figure S3.py': Correlation between sequencing depth and number of phages detected in a sample
  • 'Figure S4.py': Host range analysis of globally distributed phages

License:MIT License