Shamir-Lab / 3CAC

3CAC

3CAC is a three-class classifier designed to classify contigs in mixed metagenome assemblies as phages, plasmids, chromosomes, or uncertain.

Requirements

3CAC builds its initial classification from existing classifiers: viralVerify, PPR-Meta, PlasClass, and DeepVirFinder. These tools must therefore be installed before running 3CAC. Either viralVerify or PPR-Meta can be used, as preferred; 3CAC does not require both. PlasClass and DeepVirFinder are both required.

To run 3CAC, download the 3CAC folder. 3CAC is written in Java and requires a Java Runtime Environment.

Usage

1. Input

3CAC requires the following input files:

(1) Contig file in "fasta" format: a set of contigs to be classified.

(2) Assembly graph file in "gfa" format: the assembly graph generated by metaSPAdes or metaFlye when assembling reads to generate the input contigs.

(3) Path file: path information for each contig, such as scaffolds.path in a metaSPAdes assembly or assembly_info.txt in a metaFlye assembly.

For contigs assembled from short reads by metaSPAdes, the files scaffolds.fasta, assembly_graph_with_scaffolds.gfa, and scaffolds.path can be used as input. For contigs assembled from long reads by metaFlye, the files assembly.fasta, assembly_graph.gfa, and assembly_info.txt can be used as input.
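The mapping from assembler to default input files can be captured in a small shell helper. This is only a sketch: the function name threecac_inputs is hypothetical and not part of 3CAC; the file names are the ones listed above.

```shell
# Hypothetical helper (not part of 3CAC): print the default input file
# names for an assembler, one per line, in the order
# contigs, assembly graph, path file.
threecac_inputs() {
  case "$1" in
    SPAdes)
      printf '%s\n' scaffolds.fasta assembly_graph_with_scaffolds.gfa scaffolds.path
      ;;
    Flye)
      printf '%s\n' assembly.fasta assembly_graph.gfa assembly_info.txt
      ;;
    *)
      echo "unknown assembler: $1 (expected SPAdes or Flye)" >&2
      return 1
      ;;
  esac
}
```

For example, threecac_inputs Flye lists the three metaFlye files, which can then be looked up inside the assembly output directory.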

2. Running PPR-Meta, viralVerify, PlasClass and DeepVirFinder

(1) Run either viralVerify or PPR-Meta on the contig file to classify each of the input contigs as phage, plasmid, chromosome, or uncertain.

(2) Generate the files phageContigs.fasta and plasmidContigs.fasta, containing the contigs classified as phages and plasmids in step (1):

java PhageAndPlasmidContigs --output output_directory --contig contig_file.fasta --PPRMeta(or --viralVerify) output_file_of_PPRMeta_or_viralVerify.csv

(3) Run PlasClass on plasmidContigs.fasta and run DeepVirFinder on phageContigs.fasta.

3. Running 3CAC

Generate the 3CAC classification result:

java Classify3CAC --assembler Flye/SPAdes --output output_directory --graph assembly_graph_file.gfa --path scaffolds.path/assembly_info.txt --PPRMeta(or --viralVerify) output_file_of_PPRMeta_or_viralVerify.csv --PlasClass output_file_of_PlasClass.probs.out --deepVirFinder output_file_of_deepVirFinder.txt
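Because the command takes many arguments, it can help to assemble it with a small wrapper and inspect it before running. The sketch below only prints the command line; the function name classify3cac_cmd and its positional argument order are assumptions made for illustration, while the flags are the ones shown above.

```shell
# Hypothetical helper (not part of 3CAC): print a Classify3CAC command line.
# Arguments, in order:
#   1 assembler (Flye or SPAdes)    2 classifier (PPRMeta or viralVerify)
#   3 output directory              4 assembly graph (.gfa)
#   5 path file                     6 PPR-Meta or viralVerify output (.csv)
#   7 PlasClass output              8 DeepVirFinder output
# Note: paths are printed unquoted, so avoid paths containing spaces.
classify3cac_cmd() {
  if [ "$#" -ne 8 ]; then
    echo "expected 8 arguments, got $#" >&2
    return 2
  fi
  case "$1" in
    Flye|SPAdes) ;;
    *) echo "assembler must be Flye or SPAdes" >&2; return 1 ;;
  esac
  case "$2" in
    PPRMeta|viralVerify) ;;
    *) echo "classifier must be PPRMeta or viralVerify" >&2; return 1 ;;
  esac
  printf 'java Classify3CAC --assembler %s --output %s --graph %s --path %s --%s %s --PlasClass %s --deepVirFinder %s\n' \
    "$1" "$3" "$4" "$5" "$2" "$6" "$7" "$8"
}
```

The printed line can be reviewed and then pasted into the terminal from the directory that contains the 3CAC classes.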

4. Example

A small test dataset can be found in the test folder.

(1) To generate the 3CAC classification result based on the viralVerify solution:

  java Classify3CAC --assembler Flye --output ./test/  --graph ./test/assembly_graph.gfa --path ./test/assembly_info.txt --viralVerify ./test/assembly_viralVerify.csv --PlasClass ./test/viralVerify_plasmidContigs_plasClass.fasta.probs.out --deepVirFinder ./test/viralVerify_phageContigs_deepVirFinder.txt

(2) To generate the 3CAC classification result based on the PPR-Meta solution:

  java Classify3CAC --assembler Flye --output ./test/  --graph ./test/assembly_graph.gfa --path ./test/assembly_info.txt --PPRMeta ./test/assembly_PPRMeta.csv --PlasClass ./test/PPRMeta_plasmidContigs_plasClass.fasta.probs.out --deepVirFinder ./test/PPRMeta_phageContigs_deepVirFinder.txt
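Before running either example, it may be worth checking that every input file named on the command line exists. A minimal sketch, assuming a POSIX shell; the function name missing_inputs is hypothetical and not part of 3CAC.

```shell
# Hypothetical helper (not part of 3CAC): print every argument that is not
# an existing regular file, and return 1 if at least one is missing.
missing_inputs() {
  _missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf 'missing: %s\n' "$f"
      _missing=1
    fi
  done
  return "$_missing"
}
```

For instance, missing_inputs ./test/assembly_graph.gfa ./test/assembly_info.txt ./test/assembly_viralVerify.csv prints nothing and returns 0 when all three files are present.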

Contacts

If you have any questions or suggestions, please contact lianrong.pu@gmail.com.

About

License: MIT License
