MiFish
This is the command line version of MiFish pipeline. It can also be used with any other eDNA meta-barcoding primers
Announcement
In the next release (probably in June or July), we will replace FLASH and USEARCH with VSEARCH to suit non-MiFish primers' sequencing data better.
References
If you use MiFish Pipeline in your projects, please cite:
- Zhu T, Sato Y, Sado T, Miya M, and Iwasaki W. 2023. MitoFish, MitoAnnotator, and MiFish Pipeline: Updates in ten years. Mol Biol Evol, 40:msad035. https://doi.org/10.1093/molbev/msad035
- Sato Y, Miya M, Fukunaga T, Sado T, Iwasaki W. 2018. MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding. Mol Biol Evol 35:1553-1555.
- Iwasaki W, Fukunaga T, Isagozawa R, Yamada K, Maeda Y, Satoh TP, Sado T, Mabuchi K, Takeshima H, Miya M, et al. 2013. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol 30:2531-2540.
If you use MiFish Primers in your projects, please cite:
- Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T, Yamamoto S, Yamanaka H, Araki H, et al. 2015. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci 2:150088.
Install
Currently we only support Linux. Please use conda to manage the environment. If you do not have a Linux OS, or you just want to have a quick look, you can try the Docker version
External Dependencies
Add these softwares to your system PATH. You can download all the external executable files here(except for MAFFT), or compile by yourself.
- fastp (v0.23.2)
- FLASH (v1.2.7)
- seqkit (v2.3.0)
- usearch (v11.0.667)
- NCBI BLAST+ (v2.9.0)
- MAFFT (v7.505)
- Gblocks (v0.91b)
- FastTreeMP (v2.1.11)
Install Steps
$ conda create -n MiFish python==3.9.13
$ conda activate MiFish
$ pip3 install numpy==1.23.1
$ pip3 install scikit-bio==0.5.6
$ pip3 install PyQt5==5.15.7
$ pip3 install ete3==3.1.2
$ pip3 install duckdb==0.6.1
$ pip3 install XlsxWriter==3.0.3
$ pip3 install cutadapt==4.1
$ pip3 install biopython==1.79
$ git clone https://github.com/billzt/MiFish.git
$ cd MiFish
$ python3 setup.py develop
$ mifish -h
In Ubuntu, the following library is also needed.
sudo apt-get install -y libgl1
Test
$ cd test
$ mifish seq mifishdbv3.83.fa -d seq2
There are six files in the result directory MiFishResult
Parameters
Mandatory
$ mifish /path/to/your/amplicon/sequencing/directory/ /path/to/your/ref/db/
Directory for amplicon sequencing data (FASTQ/FASTA)
Since MiFish supports multi-sample analysis, amplicon sequencing data in compressed FASTQ/FASTA format should be put in directories. Pass the path of the directory as the first parameter. Refer to MiFish's Homepage to see the rules of filenames. Here are some examples:
MiFish-example-02_S73_L001_R1_001.fastq.gz
MiFish-example-02_S73_L001_R2_001.fastq.gz
DRR126155_1.fastq.bz2
DRR126155_2.fastq.bz2
mydata.1.fq.xz
mydata.2.fq.xz
RefDB of your metabarcoding primers
Prepare your RefDB in FASTA format and index it using the makeblastdb
from NCBI BLAST+. RefDB for an old version of MiFish is in test/mifishdbv3.83.fa
The head line of RefDB (FASTA) follows this rule:
gb|accessionID|species_scientific_name
Replace blanks with underscores in the species name. Here are examples.
>gb|LC021149|Ostorhinchus_angustatus
CACCGCGGTTATACGAGAGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATGAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC579122|Ostorhinchus_angustatus
CACCGCGGTTATACGAGGGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATAAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC717543|Trachidermus_fasciatus
CACCGCGGTTATACGAGAGACTCAAGCTGACAAACACCGGCGTAAAGCGTGGTTAAGCTA
AAAATTTGCTAAAGTCAAACACCTTCAAGACTGTTATACGTACCCGAAGGCAGGAAGCAC
AACCACGAAAGTGACTTTAACTAAGCTGAATCCACGAAAGCTAAGGAA
accessionID can be any unique strings. Primers were trimmed off from the sequences.
❗️ )
Optional (importantFollowing optional parameters are designed for MiFish metabarcoding primers. If running with other eDNA primers, change them to satisfy your own primers.
Length filtering
-m MIN_READ_LEN, --min-read-len MIN_READ_LEN
Minimum read length(bp) (default: 204)
-M MAX_READ_LEN, --max-read-len MAX_READ_LEN
Maximum read length(bp) (default: 254)
The range of amplicon lengths (including primers). Adjust them to satisfy your own primers. You can estimate the range of from your reference database file.
Primer sequences
-f PRIMER_FWD, --primer-fwd PRIMER_FWD
forward sequence of primer (5->3) (default: GTCGGTAAAACTCGTGCCAGC)
-r PRIMER_REV, --primer-rev PRIMER_REV
reverse sequence of primer (5->3) (default: CATAGTGGGGTATCTAATCCCAGTTTG)
change them according to your own primers
Optional
Following optional parameters are designed for all metabarcoding primers.
Group samples
-d OTHER_DATA_DIR, --other-data-dir OTHER_DATA_DIR
other directory of the amplicon sequencing data file (FASTQ/FASTA). Can specify multiple times. Each directory is considered as a group (default: None)
If your samples are in multiple groups, please arrange them in different directories and use the -d
parameter for multiple times. e.g. -d 2nd_group_dir -d 3rd_group_dir
Threshold of BLASTN identity
-i BLAST_MIN_IDENTITY, --blast-min-identity BLAST_MIN_IDENTITY
Minimum identity (percentage) for filtering BLASTN results (default: 97.0)
Threshold of UNOISE3
-u UNOISE_MIN, --unoise-min UNOISE_MIN
value for the -minsize option in UNOISE3 (default: 8)
Decrease this value would get higher sensitivity but lower accuracy.
Skip downstream analysis
-s, --skip-downstream-analysis
Skip abandance statics, phylogenetic and bio-diversity analysis (default: False)
Turn on this option if you only want to get taxonomy identification results and do not need other analysis.
Output directory
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
directory for output (default: .)
Default is putting MiFishResult
under your current directory. If you specify another directory /path/dir/
, it will put results into /path/dir/MiFishResult
Number of threads
-t THREADS, --threads THREADS
number of threads for BLASTN and usearch (default: 2)
Pass to external programs such as usearch
Keep temporary files
-k, --keep-tmp-files Keep temporary files (default: False)
Useful for debug. If you encountered problems, turn it on and share me the Sample-*
directory in the MiFishResult
directory.
Results
There are six files in the MiFishResult
directory.
QC.zip
read_stat.xlsx
taxonomy.xlsx
tree.zip (if not using -s)
relative_abandance.json (if not using -s)
diversity.json (if not using -s but using -d)
The first four files are the same as the web version of MiFish. (Screenshots were from DRR126155 against refDB v3.83)
An example on using other eDNA primers
See Riaz
Tips
- Please make sure that in a FASTQ/FASTA file, names of reads should start with an identitcal word, such as:
@DRR231392.1
@DRR231392.2
@DRR231392.3
Otherwise usearch
cannot work properly.