amkram / shusher

Private, browser-based placement of genome sequences on phylogenetic trees using UShER.

private (Shh 🤫) Ultrafast Sample placement on Existing tRees

License: AGPL v3

🖱️ Access ShUShER here!
ShUShER is a browser tool for placing sensitive genome sequences on phylogenetic trees using UShER.

Usage | How it works | Installation

Usage

⚠️ This tool is intended to be used only for sequences that cannot be shared publicly. If you do not have this requirement, please use the UShER web tool and submit your sequences to an INSDC member institution (NCBI, EMBL-EBI, or DDBJ) and to GISAID.

ShUShER is currently designed for use with SARS-CoV-2 genomes. The user supplies a set of samples in FASTA or VCF format, and the provided samples are placed on a continuously growing global tree (read more). After placement, subtrees containing user samples can be visualized (using Auspice).

Loading samples

Samples can be provided to ShUShER in either FASTA (.fa, .fasta, .fna) or VCF (.vcf) format.

All samples must be in a single file. When you load your samples into ShUShER, they will not be uploaded to our servers and the data will remain on your computer.
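As a rough illustration of the accepted file types, the sketch below classifies a file name by its extension. The extension lists come from the text above; the function name `detectSampleFormat` is hypothetical and is not taken from the ShUShER source.

```javascript
// Hypothetical helper: classify a sample file by its extension.
// Only the extension lists are taken from the README; the rest is illustrative.
const FASTA_EXTENSIONS = [".fa", ".fasta", ".fna"];
const VCF_EXTENSIONS = [".vcf"];

function detectSampleFormat(fileName) {
  const lower = fileName.toLowerCase();
  if (FASTA_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "fasta";
  if (VCF_EXTENSIONS.some((ext) => lower.endsWith(ext))) return "vcf";
  return null; // unsupported extension
}
```

In a browser, a check like this could run on the name of a locally selected file before its contents are read, so nothing needs to leave the machine.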

Running UShER

After loading your samples, two input fields will appear:

The first field selects the existing tree to place your samples on. Currently, the only option is the global SARS-CoV-2 tree (maintained here).

After UShER places your samples on the global tree, it will output subtrees containing your samples. The second field allows you to select how many closely-related samples from the global tree to include in each subtree.

Interpreting results

After UShER has finished running, a table of information about your samples will be displayed.

Two numbers are reported for each sample:

Number of maximally parsimonious placements is the number of potential placements in the tree with minimal parsimony score. A higher number indicates a less confident placement.

Parsimony score is the number of mutations/changes that must be added to the tree when placing this sample. The higher the number, the more diverged the sample.
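To make the two numbers concrete, here is a minimal sketch that derives both from a list of candidate placements. It assumes each candidate is represented only by the number of mutations it would add to the tree; that representation and the function name are illustrative and do not reflect UShER's internal data model.

```javascript
// Illustrative only: each array entry is the number of mutations a candidate
// placement would add to the tree. Not UShER's actual data structures.
function summarizePlacements(mutationsPerCandidate) {
  if (mutationsPerCandidate.length === 0) {
    throw new Error("at least one candidate placement is required");
  }
  // The parsimony score is the smallest number of added mutations.
  const parsimonyScore = Math.min(...mutationsPerCandidate);
  // Every candidate that achieves that minimum is equally parsimonious.
  const numBestPlacements = mutationsPerCandidate.filter(
    (count) => count === parsimonyScore
  ).length;
  return { parsimonyScore, numBestPlacements };
}

// Three candidate branches, two of which tie at 2 added mutations.
console.log(summarizePlacements([2, 5, 2]));
// → { parsimonyScore: 2, numBestPlacements: 2 }
```

In this example the sample is only slightly diverged (score 2), but the tie between two branches means its exact position is ambiguous.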

Visualizing subtrees

Each sample in the table has a button that opens the subtree containing that sample in Auspice. The subtree visualization opens in a new browser tab, but no data is sent over the Internet.

Downloading data

Newick files for each of the generated subtrees can be downloaded at the bottom of the Auspice visualization page.
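A downloaded Newick file can be inspected with a few lines of code. The sketch below lists the leaf (sample) names in a Newick string. It is a simplified reader, not a full Newick parser: it assumes unquoted names that contain no parentheses, commas, colons, or semicolons, and the function name is hypothetical.

```javascript
// Simplified sketch: collect leaf names from a Newick string.
// A leaf name directly follows "(" or ","; internal node labels follow ")"
// and are therefore not matched. Quoted names are not supported.
function newickLeafNames(newick) {
  const names = [];
  const leafPattern = /[(,]\s*([^(),:;\s][^(),:;]*)/g;
  for (const match of newick.matchAll(leafPattern)) {
    names.push(match[1].trim());
  }
  return names;
}

console.log(newickLeafNames("((A:0.1,B:0.2)n1:0.3,C:0.4);"));
// → [ 'A', 'B', 'C' ]
```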

How it works

The ShUShER web app uses a ported version of UShER that can be run client-side in a web browser. The original C++ code base is compiled to WebAssembly with Emscripten and wrapped in a React frontend (read more about the port here). User-provided samples are not transmitted across the Internet, and computation is performed locally in the browser. We use a modified version of Auspice to display the subtrees computed by UShER. The visualization opens in a new browser tab, using localStorage to share data between tabs without transmitting any user data over the web.
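The cross-tab handoff described above can be sketched as follows. This is a minimal illustration of sharing data through a localStorage-style object, not the app's actual code: the key name `shusher:subtree`, the function names, and the `MemoryStorage` stand-in are all assumptions made for the example.

```javascript
// Hypothetical key name; the real app may use a different one.
const SUBTREE_KEY = "shusher:subtree";

// Tab A: store the subtree as JSON so another same-origin tab can read it.
function publishSubtree(storage, subtree) {
  storage.setItem(SUBTREE_KEY, JSON.stringify(subtree));
}

// Tab B: read the subtree back; returns null if nothing has been stored.
function readSubtree(storage) {
  const raw = storage.getItem(SUBTREE_KEY);
  return raw === null ? null : JSON.parse(raw);
}

// Minimal in-memory stand-in with the same getItem/setItem shape as
// window.localStorage, so the sketch also runs outside a browser.
class MemoryStorage {
  constructor() {
    this.items = new Map();
  }
  setItem(key, value) {
    this.items.set(key, String(value));
  }
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
}

const storage = new MemoryStorage();
publishSubtree(storage, { name: "subtree-1", children: [] });
console.log(readSubtree(storage).name); // → subtree-1
```

In a browser, `window.localStorage` would be passed in place of `MemoryStorage`. Because localStorage is scoped to the page's origin and lives on the user's machine, both tabs can see the data without a network request.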

FASTA to VCF conversion is performed by aligning each provided sample pairwise to the reference SARS-CoV-2 genome. The implementation of pairwise alignment is from Nextclade.
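The second half of that conversion, turning an alignment into variant records, can be sketched as below. The sketch assumes the pairwise alignment has already been computed and is given as two equal-length strings with `-` for gaps. It reports substitutions only and skips insertions, deletions, and `N` bases. It is not the Nextclade implementation, and the function name is hypothetical.

```javascript
// Sketch: list substitutions between an aligned reference and sample.
// Assumes the two strings are already aligned (same length, "-" for gaps).
function substitutionsFromAlignment(alignedRef, alignedSample) {
  if (alignedRef.length !== alignedSample.length) {
    throw new Error("aligned sequences must have the same length");
  }
  const records = [];
  let refPos = 0; // 1-based position in the unaligned reference
  for (let i = 0; i < alignedRef.length; i++) {
    const ref = alignedRef[i].toUpperCase();
    const alt = alignedSample[i].toUpperCase();
    if (ref === "-") continue; // insertion relative to the reference: skipped
    refPos += 1;
    if (alt === "-" || alt === "N") continue; // deletion or ambiguous base: skipped
    if (ref !== alt) records.push({ pos: refPos, ref, alt });
  }
  return records;
}

console.log(substitutionsFromAlignment("ACG-TA", "ATGCT-"));
// → [ { pos: 2, ref: 'C', alt: 'T' } ]
```

Each record carries the fields a VCF data line needs for a single-nucleotide change (position, reference allele, alternate allele).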

Installation (for developers)

ShUShER currently supports building only on Linux systems and has been tested on Ubuntu 20.04.

If you would like to run ShUShER locally or modify the source, first download the source code, e.g.:

wget https://github.com/amkram/shusher/archive/refs/tags/latest.tar.gz

tar xvzf latest.tar.gz

The above commands will download and extract the latest tagged release of ShUShER. View all "Releases" in the right sidebar if you want to download a specific version. Alternatively, cloning this repository will give you the latest unreleased code, which may be unstable.

The downloaded source code contains code for building both the web app and the UShER port.

Running the web app locally

Enter the web-app subdirectory and run

npm install

To build the app, run

npm run build

And to start the local server, run

npm start

You should now be able to access ShUShER in your browser at localhost:4000.

Compiling UShER to WebAssembly

The directory usher-port contains the original C++ UShER code and a script that will compile it to WebAssembly. You only need to compile UShER yourself if you want to change the UShER source code. Otherwise, the web app will automatically use the most recent pre-compiled release from this repository.

1. Install Dependencies

sudo apt-get update

sudo apt-get install wget python3 build-essential cmake protobuf-compiler dh-autoreconf

2. Compile UShER

./installUbuntuWeb.sh

This script will download the C++ library dependencies of UShER, make some modifications necessary for WebAssembly compilation, and then compile them using Emscripten. Output in the build directory includes usher.wasm, usher.js, usher.data, and usher.worker.js, all of which are used by the ShUShER web app.

3. Specify custom UShER code

By default, the web app grabs the latest tagged release of the WebAssembly UShER bundle from this repository. If you compiled UShER yourself using the above steps, you can tell ShUShER to use your compiled code instead.

In the web-app subdirectory, edit package.json and change the following entry:

"config": {
  "usherBundle": "latest"
}

to

"config": {
  "usherBundle": "[path to build output]"
}
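For context on how such a setting can be read, npm exposes entries in the package.json `config` object to scripts as environment variables named `npm_package_config_<key>`. The sketch below reads the value that way and falls back to "latest". How the ShUShER build scripts actually consume the setting is not described here, so the function name and the fallback behavior are assumptions.

```javascript
// Sketch: read the usherBundle setting from an npm-provided environment.
// npm sets npm_package_config_usherBundle when running package scripts;
// the fallback to "latest" is an assumption made for this example.
function usherBundleSetting(env) {
  const value = env.npm_package_config_usherBundle;
  return value === undefined || value === "" ? "latest" : value;
}

// Inside a script started with `npm run ...`, pass process.env:
console.log(usherBundleSetting(process.env));
```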

Contributing

We welcome and encourage contributions to ShUShER from the community. If you would like to contribute, please read the contribution guidelines and code of conduct.

About this repository

usher-port contains the scripts and files needed to compile UShER to WebAssembly. See here for details on the process.

web-app contains the React application that uses the UShER port.

Twice a day, the UShER C++ source hosted in this repository is updated from the main UShER repository.

Upon each push to the master branch, the integration test GitHub Action is run, which (1) compiles the latest source from the main UShER repo to a binary executable, (2) compiles UShER to WebAssembly with this repo's latest code, and (3) runs both on a sample file and compares the outputs to ensure they are the same.

New releases are tagged periodically and pushed to the live web app.

Acknowledgements

This project uses or adapts code from several open-source projects. We are grateful for their contributions.

Pairwise sequence alignment uses the implementation from Nextclade.

Visualization of subtrees is performed with Auspice.

Scripts to modify the Auspice server are from auspice.us.

Nextclade, Auspice, and auspice.us are part of the Nextstrain project.

The core functionality of this tool is a ported version of UShER.

License: GNU Affero General Public License v3.0