IFDP

The Inferred Fiber Degradation computational framework couples metagenomic sequencing with careful annotation of polysaccharide-degrading enzymes and dietary fiber (DF) structures. It allows in-depth characterization of the microbiome's ability to degrade and break down dietary fibers.

The framework generates the Inferred Fiber Degradation Profile (IFDP) with a single command. It requires only the database (pre-built, or re-built with a different version of diamond) and a fasta (or fastq) input file.

Dependencies:

  • Diamond (the pre-built database was created with v0.9.9; it can be re-built from the attached fasta file to match a different version)
  • Numpy
  • Pandas
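
Before installing, it can help to see which of the dependencies listed above are already available. The following is a minimal sketch and not part of IFDP itself: the helper names `have_tool` and `have_py_module` are hypothetical, and it assumes the Python interpreter is called `python` (set `PYTHON=python3` if yours is named differently).

```shell
# Hypothetical helpers (not shipped with IFDP) to report missing dependencies.

# Succeeds if the named executable is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the named Python module can be imported.
# Assumes the interpreter is `python` unless PYTHON is set.
have_py_module() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

have_tool diamond || echo "missing: diamond"
for module in numpy pandas; do
    have_py_module "$module" || echo "missing: Python module $module"
done
```

If nothing is printed, all three dependencies were found.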

Installation

1. Install dependencies:

conda install numpy pandas diamond ## install dependencies

2. Download the repo:

git clone https://github.com/borenstein-lab/IFDP.git

3. Extract the database and build it with diamond:

gunzip ec_full.fasta.gz;
diamond makedb --in ec_full.fasta -d ec_full

4. Test the installation by running this example:

run_sample.sh -d ec_full.dmnd -i GCF_002075875.1_Bbif1898B_genomic.fna -o output

5. Run the pipeline with a single command:

run_sample.sh -d [DATABASE] -i [INPUT] -o [OUTPUT]

6. To run the pipeline from any directory, add this line to your .bashrc file or run it before running the pipeline, replacing the path with the location of your own IFDP clone:

export PATH=$PATH:/home/labs/elinav/yotamco/IFDP2/
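
As a hedged variant of the export line above, the sketch below appends a directory to PATH only when it is not already listed, so sourcing .bashrc repeatedly does not keep growing PATH. The function name `add_to_path`, the variable `IFDP_DIR`, and the default location `$HOME/IFDP` are all assumptions made for illustration; point `IFDP_DIR` at your own clone.

```shell
# Hypothetical helper: append a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append at the end, like the export line above
    esac
    export PATH
}

# IFDP_DIR is an assumed variable name; $HOME/IFDP is only a placeholder default.
IFDP_DIR="${IFDP_DIR:-$HOME/IFDP}"
add_to_path "$IFDP_DIR"
```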

Tutorial output and testing

For easy testing of the framework, we have uploaded three genomes, which are relatively small in file size, memory, and run-time requirements, as simple use cases. To run any of them, use this command, changing the genome file name as needed.

run_sample.sh -d ec_full.dmnd -i GCF_002075875.1_Bbif1898B_genomic.fna -o output

To run the pipeline and explore the results, specify the database to use, the input fasta/fastq file, and an output name for the diamond output:

run_sample.sh -d [DATABASE] -i [INPUT] -o [OUTPUT]

You can also specify the number of threads with the -p argument.
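
To profile several genomes in one go, the single-sample command above can be wrapped in a loop. This is a minimal sketch, not part of the repository. The function name `ifdp_batch` and the variable `IFDP_RUNNER` are hypothetical, and so are the choices to name each output after its input file, to default to one thread, and to pass an output prefix that includes a directory. It uses only the `-d`, `-i`, `-o`, and `-p` options described above.

```shell
# Hypothetical wrapper: run the pipeline on every .fna file in a directory.
# Usage: ifdp_batch DATABASE INPUT_DIR OUTPUT_DIR [THREADS]
ifdp_batch() {
    db="$1"; in_dir="$2"; out_dir="$3"; threads="${4:-1}"   # 1 thread is an assumed default
    mkdir -p "$out_dir" || return 1
    for genome in "$in_dir"/*.fna; do
        [ -e "$genome" ] || continue           # skip if the glob matched nothing
        sample=$(basename "$genome" .fna)      # output prefix derived from the input name
        # IFDP_RUNNER defaults to run_sample.sh; set IFDP_RUNNER=echo for a dry run.
        "${IFDP_RUNNER:-run_sample.sh}" -d "$db" -i "$genome" \
            -o "$out_dir/$sample" -p "$threads" || return 1
    done
}
```

For example, `ifdp_batch ec_full.dmnd genomes results 4` would process every `.fna` file in a (hypothetical) `genomes` directory with four threads and write the outputs under `results`. Stopping at the first failed sample is a design choice of this sketch; replace `|| return 1` with your own error handling if you prefer to continue.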

Three outputs will be visible following the completion of the run:

[OUTPUT] - The diamond mapping output file

[OUTPUT]_counts - The enzyme counts

[OUTPUT]_IFDP - The IFDP profile
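
A quick way to confirm that a run completed is to check that all three files exist and are non-empty. The sketch below assumes the files are written with exactly the names listed above, with no added extension; the function name `ifdp_check_outputs` is hypothetical.

```shell
# Hypothetical helper: verify that the three expected outputs exist and are non-empty.
# Usage: ifdp_check_outputs OUTPUT_PREFIX
ifdp_check_outputs() {
    status=0
    for path in "${1}" "${1}_counts" "${1}_IFDP"; do
        if [ -s "$path" ]; then
            echo "ok: $path"
        else
            echo "missing or empty: $path"
            status=1
        fi
    done
    return "$status"
}
```

After the test run from the installation steps, `ifdp_check_outputs output` would report on `output`, `output_counts`, and `output_IFDP`, and return a non-zero status if any of them is absent.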
