Mass23 / Viral-ecology

1. OTUs calling

1.1 Database of assembled viromes

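Two datasets were used. They appear in the merge command of section 1.2 as refseq_viral_genomes.fasta and mVGs_sequences_v2.fasta.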

1.2 Merge with contigs

cat /home/fodelian/Desktop/ViralGenomes/assembly_db/refseq_viral_genomes.fasta \
    /home/fodelian/Desktop/ViralGenomes/assembly_db/mVGs_sequences_v2.fasta  \
    /home/fodelian/Desktop/ViralGenomes/SNG/SNG_contigs.fasta  \
    /home/fodelian/Desktop/ViralGenomes/VDN/VDN_contigs.fasta  \
    /home/fodelian/Desktop/ViralGenomes/VEV/VEV_contigs.fasta  \
     > raw_db_ctgs.fasta
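
Because cat simply concatenates the files, identical sequence IDs coming from different sources would end up in raw_db_ctgs.fasta unnoticed. A minimal sketch of a sanity check, assuming uncompressed FASTA files and that the ID is the first whitespace-delimited token of each header line; `duplicate_fasta_ids` is a hypothetical helper, not part of this repository:

```python
from collections import Counter


def duplicate_fasta_ids(paths):
    """Return the sequence IDs that occur more than once across the given FASTA files."""
    counts = Counter()
    for path in paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # ID = header text up to the first whitespace, without the ">"
                    fields = line[1:].split()
                    counts[fields[0] if fields else ""] += 1
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)
```

Calling `duplicate_fasta_ids(["raw_db_ctgs.fasta"])` after the merge should return an empty list if all IDs are unique.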

1.3 Find 95% identity centroids

Vsearch: https://github.com/torognes/vsearch

vsearch --cluster_fast raw_db_ctgs.fasta --consout 95_database.fasta --id 0.95 --iddef 0 --maxseqlength 3000000 --threads 6 --usersort
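
For scripted runs, the same call can be assembled as an argument list. The sketch below only reuses the options from the command above; the function name is hypothetical. As described in the VSEARCH manual, --id 0.95 sets the identity threshold, --iddef 0 selects the CD-HIT identity definition (matching columns divided by the length of the shorter sequence), and --maxseqlength 3000000 raises the default length limit so that long genomes are not discarded. Note that --consout writes one consensus sequence per cluster; VSEARCH has a separate --centroids option for the centroid sequences themselves.

```python
import shlex


def vsearch_cluster_cmd(fasta_in, fasta_out, identity=0.95, threads=6):
    """Build the vsearch clustering command used above as an argument list."""
    return [
        "vsearch",
        "--cluster_fast", fasta_in,
        "--consout", fasta_out,
        "--id", str(identity),
        "--iddef", "0",
        "--maxseqlength", "3000000",
        "--threads", str(threads),
        "--usersort",
    ]


cmd = vsearch_cluster_cmd("raw_db_ctgs.fasta", "95_database.fasta")
print(shlex.join(cmd))
# To actually run it (requires vsearch on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```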

1.4 Mapping

Script: preprocess.py

Reads from the 214 bulk soil metagenomes were quality trimmed using Trimmomatic v0.36 [35] and then paired reads were mapped to the viral contig database with Bowtie2 [36], using default parameters. The output bam files were passed to BamM ‘filter’ v1.7.2 (http://ecogenomics.github.io/BamM/, accessed 15 December 2015) and reads that were aligned over ≥90% of their length at ≥95% nucleic acid identity were retained.
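
The two thresholds in the method quoted above can be written as a simple predicate. The sketch below assumes that identity is matching bases over aligned bases and that alignment coverage is aligned bases over read length; BamM's exact definitions may differ, and `passes_filter` is a hypothetical name used only for illustration.

```python
def passes_filter(read_length, aligned_length, mismatches,
                  min_aligned_fraction=0.90, min_identity=0.95):
    """Return True if a read alignment meets both thresholds."""
    if read_length <= 0 or aligned_length <= 0:
        return False
    # Fraction of the read that is covered by the alignment
    aligned_fraction = aligned_length / read_length
    # Fraction of aligned bases that match the reference
    identity = (aligned_length - mismatches) / aligned_length
    return aligned_fraction >= min_aligned_fraction and identity >= min_identity
```

For example, a 150 bp read aligned over 140 bp with 5 mismatches passes (about 93% aligned, about 96% identity), while the same read aligned over only 120 bp is rejected regardless of its identity.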

First, we need to merge the files per sample:

cat SNG1_R1.fq.gz SNG2_R1.fq.gz > SNG_R1.fq.gz
cat VDN1_R1.fq.gz VDN2_R1.fq.gz > VDN_R1.fq.gz
cat VEV1_R1.fq.gz VEV2_R1.fq.gz > VEV_R1.fq.gz

cat SNG1_R2.fq.gz SNG2_R2.fq.gz > SNG_R2.fq.gz
cat VDN1_R2.fq.gz VDN2_R2.fq.gz > VDN_R2.fq.gz
cat VEV1_R2.fq.gz VEV2_R2.fq.gz > VEV_R2.fq.gz
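
Concatenating .gz files with cat works because the gzip format allows several compressed members in one file, and decompressors read them back to back. A small sketch that checks this with Python's gzip module, using throwaway files with hypothetical names rather than the real reads:

```python
import gzip
import os
import shutil
import tempfile


def concat_files(parts, merged):
    """Byte-wise concatenation, equivalent to `cat parts... > merged`."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


tmp = tempfile.mkdtemp()
part1 = os.path.join(tmp, "lane1_R1.fq.gz")  # hypothetical file names
part2 = os.path.join(tmp, "lane2_R1.fq.gz")
merged = os.path.join(tmp, "merged_R1.fq.gz")

with gzip.open(part1, "wt") as fh:
    fh.write("@read1\nACGT\n+\nIIII\n")
with gzip.open(part2, "wt") as fh:
    fh.write("@read2\nTTGA\n+\nIIII\n")

concat_files([part1, part2], merged)

with gzip.open(merged, "rt") as fh:
    records = fh.read()
print(records.count("@read"))  # number of reads in the merged file
```

The R1 and R2 files have to be concatenated in the same order, as in the commands above, so that read pairs stay in sync.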

  1. Trimming: Trimmomatic
  2. Mapping: BWA
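
The README does not give the exact parameters for these two steps, so the following is only a sketch of what the calls might look like. Assumptions: a `trimmomatic` wrapper is on the PATH (as installed by conda), the trimming steps SLIDINGWINDOW:4:15 and MINLEN:36 are placeholders taken from Trimmomatic's own usage example, the output file names are hypothetical, and the reference is the 95_database.fasta file from step 1.3, already indexed with `bwa index`.

```python
def trimmomatic_pe_cmd(sample, threads=6):
    """Paired-end Trimmomatic call; the trimming steps are illustrative assumptions."""
    return [
        "trimmomatic", "PE", "-threads", str(threads),
        f"{sample}_R1.fq.gz", f"{sample}_R2.fq.gz",
        f"{sample}_R1.trim.fq.gz", f"{sample}_R1.unpaired.fq.gz",
        f"{sample}_R2.trim.fq.gz", f"{sample}_R2.unpaired.fq.gz",
        "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


def bwa_mem_cmd(sample, reference="95_database.fasta", threads=6):
    """bwa mem call on the trimmed pairs; expects `bwa index <reference>` to have been run."""
    return [
        "bwa", "mem", "-t", str(threads), reference,
        f"{sample}_R1.trim.fq.gz", f"{sample}_R2.trim.fq.gz",
    ]


# Print the commands for the three merged samples without running them
for sample in ("SNG", "VDN", "VEV"):
    print(" ".join(trimmomatic_pe_cmd(sample)))
    print(" ".join(bwa_mem_cmd(sample)))
```

bwa mem writes SAM to standard output, so in practice its output would be redirected to a file or piped into a tool such as samtools to obtain the BAM files needed for the filtering step.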

1.5 Filtering

Bam filter: BamM 'filter'
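
The README does not show the filter command itself. A possible call, applying the thresholds quoted in section 1.4 (≥95% identity over ≥90% of the read length), is sketched below. The option names --percentage_id and --percentage_aln are given as I understand them from the BamM documentation and should be checked against `bamm filter -h`; the input file name SNG.bam and the function name are hypothetical.

```python
def bamm_filter_cmd(bam, out_dir=".", min_identity=0.95, min_aligned=0.90):
    """Build a `bamm filter` call; option names should be verified with `bamm filter -h`."""
    return [
        "bamm", "filter",
        "-b", bam,
        "-o", out_dir,
        "--percentage_id", str(min_identity),
        "--percentage_aln", str(min_aligned),
    ]


print(" ".join(bamm_filter_cmd("SNG.bam")))
```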

2. Analysis

References:
