pranjalpruthi / bhedi

βHΞDI (Biomarker-based Heuristic Engine for Dengue Identification) is a computational tool designed for the identification of Dengue virus serotypes in wastewater next-generation sequencing data.





βHΞDI (Biomarker-based Heuristic Engine for Dengue Identification) is a computational tool designed for the identification of Dengue virus serotypes in wastewater next-generation sequencing data. It leverages specific genomic fragments, referred to as sankets, to detect sequences associated with the Dengue virus. This repository contains the command-line interface (CLI) and API for processing FASTQ files and identifying Dengue virus serotypes.
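The README does not describe how sankets are matched against reads. As a rough illustration only, the Go sketch below assumes that each serotype is represented by a few marker fragments, and it counts the reads that contain any of them using exact substring search. The function name `countSanketHits` and the fragment sequences are hypothetical placeholders. They are not real Dengue markers and not βHΞDI's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countSanketHits returns, for each serotype, the number of reads that
// contain at least one of that serotype's marker fragments ("sankets").
// Exact substring matching is an illustrative assumption; the README does
// not describe how βHΞDI actually matches or scores sankets.
func countSanketHits(reads []string, sankets map[string][]string) map[string]int {
	hits := make(map[string]int)
	for _, read := range reads {
		for serotype, frags := range sankets {
			for _, frag := range frags {
				if strings.Contains(read, frag) {
					hits[serotype]++
					break // count each read at most once per serotype
				}
			}
		}
	}
	return hits
}

func main() {
	// Toy placeholder fragments, NOT real Dengue sanket sequences.
	sankets := map[string][]string{
		"DENV1": {"AAAACCCC"},
		"DENV2": {"GGGGTTTT", "ACACACAC"},
	}
	reads := []string{"TTAAAACCCCTT", "CCGGGGTTTTAA", "ACACACACGT", "NNNNNNNN"}

	hits := countSanketHits(reads, sankets)

	// Print serotypes in a stable order.
	serotypes := make([]string, 0, len(hits))
	for s := range hits {
		serotypes = append(serotypes, s)
	}
	sort.Strings(serotypes)
	for _, s := range serotypes {
		fmt.Printf("%s\t%d\n", s, hits[s])
	}
}
```

In this sketch a read that contains fragments from two serotypes is counted for both, and exact matching ignores sequencing errors; a production tool would need explicit rules for both cases.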


Prerequisites

  • Go (1.15 or later)
  • SeqKit

Installing SeqKit

SeqKit must be installed before you use the βHΞDI CLI. Follow the installation instructions in the SeqKit GitHub repository.

Setting Up the BHEDI CLI Tool

  1. Clone the repository:
   git clone
  2. Navigate to the cloned directory:
   cd bhedi
  3. Build the CLI tool:
   go build -o bhedi-cli


CLI Tool

To process the FASTQ files in a directory and write the analysis results as Parquet files, run:

./bhedi-cli -i <input_dir> -o <output_dir>


Replace <input_dir> with the directory containing your FASTQ files and <output_dir> with the directory where you want the results to be saved.

API Server

To start the API server, run:

go run api/main.go

The API will be available at http://localhost:3000.



CLI Dependencies

  • Standard Library Packages: bufio, encoding/csv, flag, fmt, io, log, math, os, os/exec, path/filepath, strconv, strings, sync
  • Third-Party Packages:,,,,

API Dependencies

  • Standard Library Packages: Same as CLI, minus flag
  • Third-Party Packages:,,, plus all third-party packages listed under CLI Dependencies

Notes

  • Ensure seqkit is installed and accessible in your system's PATH.
  • Manage dependencies using Go modules (go.mod and go.sum) for reproducible builds.
  • The API component requires the Fiber web framework and its middleware for CORS and logging.
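Because the CLI depends on SeqKit being on your PATH, a quick preflight check can save a failed run. This sketch uses `exec.LookPath` to locate the binary and then runs `seqkit version` to confirm that it executes. `findTool` is a hypothetical helper, not part of βHΞDI.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// findTool returns the path of an executable found on PATH.
func findTool(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	path, err := findTool("seqkit")
	if err != nil {
		fmt.Println("seqkit not found on PATH:", err)
		return
	}
	// A failure here means the binary exists but cannot run.
	out, err := exec.Command(path, "version").Output()
	if err != nil {
		fmt.Println("seqkit found at", path, "but failed to run:", err)
		return
	}
	fmt.Printf("seqkit at %s: %s\n", path, strings.TrimSpace(string(out)))
}
```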

Using SimP to Plot Reports from the βHΞDI CLI

SimP Tool


SimP (Simple Plotter) is a visualization tool for data produced by the βHΞDI CLI tool. It uses Pandas, Dask, HoloViews, and Plotly to generate plots from the Parquet files that hold the analysis results for Dengue virus serotypes in wastewater next-generation sequencing data. Supported plot types include GC percentage box plots, serotype frequency heatmaps, and B score distributions.
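The GC percentage box plots summarize GC content per sequence. This README does not say where that value is computed (SimP, the CLI, or SeqKit), so the sketch below only shows the standard formula, GC% = 100 × (G + C) / (A + C + G + T). Excluding ambiguous symbols such as N from the denominator is a design choice made for this sketch; other tools may divide by the full read length, which gives slightly lower values for reads containing N. `gcPercent` is a hypothetical name.

```go
package main

import "fmt"

// gcPercent returns 100 * (G + C) / (A + C + G + T) for seq, ignoring
// case. Ambiguous symbols such as N are excluded from the denominator.
func gcPercent(seq string) float64 {
	var gc, acgt int
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'g', 'C', 'c':
			gc++
			acgt++
		case 'A', 'a', 'T', 't':
			acgt++
		}
	}
	if acgt == 0 {
		return 0 // no unambiguous bases
	}
	return 100 * float64(gc) / float64(acgt)
}

func main() {
	fmt.Printf("%.1f\n", gcPercent("ACGTNNGC")) // prints 66.7
}
```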

Prerequisites

  • Python 3.10 or later
  • Conda or virtualenv (recommended for managing Python packages)


SimP requires the following Python packages:

  • pandas
  • dask
  • holoviews
  • plotly
  • argparse (part of the Python standard library; no separate installation needed)
  • numpy

You can install the third-party dependencies using pip:

pip install pandas dask holoviews plotly numpy

Or, if you prefer Conda or Mamba, create a new environment with the required packages using one of the following.

With Conda:

conda create -n simp_env python=3.10 pandas dask holoviews plotly numpy
conda activate simp_env

With Mamba:

mamba create -n simp_env python=3.10 pandas dask holoviews plotly numpy
mamba activate simp_env

Installing SimP

Currently, SimP is provided as a standalone Python script. Ensure the required dependencies are installed in your environment before running it.

Usage

To use SimP for plotting, you need to specify the input directory containing the Parquet files processed by βHΞDI CLI and the output directory where the plots will be saved.

python -i <input_dir> -o <output_dir>

Replace <input_dir> with the directory containing your Parquet files and <output_dir> with the directory where you want the plots to be saved.

Example

Assuming you have Parquet files in /path/to/parquet_files and you want to save the plots in /path/to/plots, run:

python -i /path/to/parquet_files -o /path/to/plots

This will generate various plots such as GC percentage box plots, serotype frequency heatmaps, and B score distributions, and save them as HTML files in the specified output directory.
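A serotype frequency heatmap is essentially a sample-by-serotype table of counts. SimP builds its plots in Python from the Parquet files, and their column layout is not documented here, so the Go sketch below only illustrates the aggregation step. It uses hypothetical `Hit` records and writes the resulting matrix as CSV with the standard `encoding/csv` package. The type and function names are assumptions.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"sort"
	"strconv"
)

// Hit is a hypothetical record: one read in Sample assigned to Serotype.
type Hit struct {
	Sample   string
	Serotype string
}

// frequencyMatrix counts hits per (sample, serotype) pair and returns the
// sorted row labels (samples), column labels (serotypes), and counts.
func frequencyMatrix(hits []Hit) ([]string, []string, map[string]map[string]int) {
	counts := make(map[string]map[string]int)
	seenSerotype := make(map[string]bool)
	var samples, serotypes []string
	for _, h := range hits {
		if counts[h.Sample] == nil {
			counts[h.Sample] = make(map[string]int)
			samples = append(samples, h.Sample)
		}
		counts[h.Sample][h.Serotype]++
		if !seenSerotype[h.Serotype] {
			seenSerotype[h.Serotype] = true
			serotypes = append(serotypes, h.Serotype)
		}
	}
	sort.Strings(samples)
	sort.Strings(serotypes)
	return samples, serotypes, counts
}

func main() {
	// Toy input; real values would come from the βHΞDI Parquet output.
	hits := []Hit{
		{"S1", "DENV1"}, {"S1", "DENV2"}, {"S1", "DENV2"}, {"S2", "DENV3"},
	}
	samples, serotypes, counts := frequencyMatrix(hits)

	// Write the matrix as CSV: one row per sample, one column per serotype.
	w := csv.NewWriter(os.Stdout)
	_ = w.Write(append([]string{"sample"}, serotypes...))
	for _, s := range samples {
		row := []string{s}
		for _, t := range serotypes {
			row = append(row, strconv.Itoa(counts[s][t])) // missing pairs count as 0
		}
		_ = w.Write(row)
	}
	w.Flush()
	if err := w.Error(); err != nil {
		fmt.Fprintln(os.Stderr, "csv write failed:", err)
	}
}
```

Each cell of the resulting table maps to one heatmap cell; the plotting itself is left to SimP.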

Running on High-Performance Computing Clusters

SimP can also be run on HPC clusters using SLURM. Here's an example SLURM script:

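#!/bin/bash
#SBATCH --job-name=SimP
#SBATCH --output=./log/SimP%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=16GB
#SBATCH --partition=short

# The ./log directory must exist before you submit the job.

# Make `conda activate` available in the non-interactive batch shell,
# then activate your Conda environment (or source a virtualenv instead)
eval "$(conda shell.bash hook)"
conda activate simp_env

# Run SimP
time python -i /path/to/parquet_files -o /path/to/plots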


Adjust the SLURM parameters according to your cluster's configuration and your job's requirements.

Contributing

Contributions to the βHΞDI project are welcome. Please refer to the file for guidelines on how to contribute.

License

This project is licensed under the AGPLv3 License - see the LICENSE file for details.



License: GNU Affero General Public License v3.0

Languages: Go 67.2%, Python 32.8%