MASST+ Server

https://masst.ucsd.edu/masstplus/

MASST+ is an improved version of the GNPS Mass Spectrometry Search Tool (MASST). It provides fast, error-tolerant search of metabolomics mass spectrometry data and reduces search time by two orders of magnitude compared with MASST. It can query databases of billions of mass spectra, which was not feasible with MASST. Like MASST, MASST+ is publicly available as a web service on GNPS.

Using MASST+ Server

With a Spectrum USI

If you know the USI (Universal Spectrum Identifier) of the spectrum you want to search with MASST+, you can enter it directly at https://masst.ucsd.edu/masstplus/.

Searching a spectrum in the GNPS library

(a) First, navigate to the spectrum of interest in the GNPS library. Here, a Malyngamide C spectrum is viewed. (b) Next, click the "MASST+" link. (c) This opens the MASST+ tab, which runs a mass spectral search and presents the results.

Integration with Molecular Networking

(a) Start by submitting a new molecular networking job on GNPS (you must be logged in to a GNPS account). (b) When the job has completed, click "View All Clusters With IDs". (c) This opens a new tab, where you can click "Advanced MASST" and then "MASST+ Search" (or "MASST+ Analog Search") to start a new MASST+ search. (d) This opens a new tab for MASST+, where the search results are displayed after a few seconds.

GNPS Molecular Network

We performed molecular networking (both clustering and spectral networking) using NETWORKING+ on the entirety of GNPS. We stored the results of CLUSTERING+ and PAIRING+ in TSV format.

CLUSTERING+ Results for GNPS

We split the GNPS library into 9 divisions by precursor mass range and ran CLUSTERING+ on each of them. We provide the cluster assignment of each spectrum and the centers of all clusters.

Cluster information for each spectrum

The output is in TSV format. Each row represents one spectrum from the GNPS library. The columns are:

cluster_idx is a unique ID assigned to each cluster in the division
scan is a unique ID assigned to each spectrum in the division
mz is the precursor mass of the spectrum
RTINSECONDS is the retention time of the spectrum
MSV_source is the MSV library the spectrum belongs to
Filename is the GNPS source file of the spectrum inside the MSV library
Local_scan is the scan number of the spectrum inside its GNPS source file
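
As a minimal sketch of how these columns might be consumed, the Python snippet below counts the spectra in each cluster using only the standard library. It assumes that a division file begins with a header row naming the columns exactly as listed above. The helper name cluster_sizes and the in-memory sample rows are illustrative and not part of MASST+.

```python
import csv
import io
from collections import Counter


def cluster_sizes(tsv_file):
    """Count how many spectra are assigned to each cluster_idx.

    Assumes a tab-separated file whose first row is a header containing
    a 'cluster_idx' column (an assumption about the file layout).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(int(row["cluster_idx"]) for row in reader)


# Made-up sample rows standing in for a real division file.
SAMPLE = (
    "cluster_idx\tscan\tmz\tRTINSECONDS\tMSV_source\tFilename\tLocal_scan\n"
    "9\t1\t53.0051\t336.875\tMSV000083789\tpos_Cd10MYY_33.mgf\t1069\n"
    "9\t2\t53.0049\t340.100\tMSV000083789\tpos_Cd10MYY_33.mgf\t1075\n"
    "12\t3\t53.0302\t12.500\tMSV000083789\tpos_Cd10MYY_33.mgf\t40\n"
)

sizes = cluster_sizes(io.StringIO(SAMPLE))
for cluster_idx, size in sizes.most_common():
    print(cluster_idx, size)
```

For a downloaded file, the same function can be called on an open file handle, for example open(path, newline="").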

The CLUSTERING+ output files for all 9 divisions can be downloaded via the following links:

CLUSTERING+ output for division 0

CLUSTERING+ output for division 1

CLUSTERING+ output for division 2

CLUSTERING+ output for division 3

CLUSTERING+ output for division 4

CLUSTERING+ output for division 5

CLUSTERING+ output for division 6

CLUSTERING+ output for division 7

CLUSTERING+ output for division 8

Cluster centers

For each division, we write the representative spectrum of every cluster to an MGF file.

Each representative spectrum carries the following fields:

CLUSTERINDEX is the cluster index in the division
CLUSTERSIZE is the number of spectra in the cluster
MSV_LIB is the source MSV library of the representative spectrum
FILENAME is the source GNPS file of the representative spectrum
LOCAL_SCAN is the scan number of the representative spectrum inside the GNPS source file
PEPMASS is the precursor mass of the representative spectrum
RTINSECONDS is the retention time of the representative spectrum

BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
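
A record like the one above can be read with a few lines of Python. The sketch below is a minimal reader, not the parser used by MASST+: it assumes that every record is delimited by BEGIN IONS and END IONS, that header lines have the form KEY=value, and that peak lines hold an m/z value followed by an intensity. The function name parse_mgf is hypothetical.

```python
def parse_mgf(lines):
    """Yield one dict per BEGIN IONS / END IONS record.

    Each dict has 'params' (header fields kept as strings) and
    'peaks' (a list of (mz, intensity) float tuples).
    """
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None or not line:
            continue  # skip blank lines and text outside a record
        elif "=" in line:
            key, value = line.split("=", 1)
            spectrum["params"][key] = value
        else:
            mz, intensity = line.split()[:2]
            spectrum["peaks"].append((float(mz), float(intensity)))


# The example record shown above, reproduced as a string.
RECORD = """BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
"""

spectra = list(parse_mgf(RECORD.splitlines()))
center = spectra[0]
# Most intense peak of the representative spectrum.
base_peak = max(center["peaks"], key=lambda peak: peak[1])
print(center["params"]["CLUSTERINDEX"], len(center["peaks"]), base_peak)
```

Because parse_mgf is a generator, it can also be fed an open file handle to stream a large division file one record at a time.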

The spectrum file for each division can be downloaded via the following links:

cluster centers for division 0

cluster centers for division 1

cluster centers for division 2

cluster centers for division 3

cluster centers for division 4

cluster centers for division 5

cluster centers for division 6

cluster centers for division 7

cluster centers for division 8

PAIRING+ Results for GNPS

We apply PAIRING+ to the clusters produced by CLUSTERING+ to compute the molecular network. The network is stored in two files: one for nodes and one for edges.

Network Nodes

The first output file stores general information about the nodes of the GNPS molecular network in TSV format. The network contains over 8M nodes (the total number of non-singleton clusters produced by CLUSTERING+). Each row represents one node in the network. The columns are:

scan_number_among_centers is a unique ID assigned to each cluster in the network
component_index is a unique ID assigned to each connected component in the network
source_division is the division this cluster came from (ranges from division0 to division8)
cluster_index_in_division is the index of this cluster in its source division
cluster_size is the size of the cluster
center_MSV is the source MSV library of the representative spectrum
center_source_file is the source GNPS file of the representative spectrum
center_scan_in_source_file is the scan number of the representative spectrum inside the GNPS source file
center_pepmass is the precursor mass of the representative spectrum
center_RT is the retention time of the representative spectrum
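
As an illustration of how the node file might be summarized, the sketch below groups nodes by component_index and reports, for each connected component, the number of nodes and the total number of clustered spectra. It assumes a header row with the column names listed above. The helper name component_summary and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def component_summary(tsv_file):
    """Return {component_index: {'nodes': n, 'spectra': total cluster_size}}.

    Assumes a tab-separated node file with a header row (an assumption).
    """
    summary = defaultdict(lambda: {"nodes": 0, "spectra": 0})
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        entry = summary[int(row["component_index"])]
        entry["nodes"] += 1
        entry["spectra"] += int(row["cluster_size"])
    return dict(summary)


HEADER = [
    "scan_number_among_centers", "component_index", "source_division",
    "cluster_index_in_division", "cluster_size", "center_MSV",
    "center_source_file", "center_scan_in_source_file",
    "center_pepmass", "center_RT",
]
# Made-up rows; only the first reuses values from the MGF example above.
ROWS = [
    ["1", "1", "division0", "9", "282", "MSV000083789",
     "pos_Cd10MYY_33.mgf", "1069", "53.0051", "336.875"],
    ["2", "1", "division0", "14", "3", "MSV000083789",
     "pos_Cd10MYY_33.mgf", "2210", "67.0210", "410.2"],
    ["3", "2", "division4", "5", "7", "MSV000083789",
     "pos_Cd10MYY_33.mgf", "310", "402.1500", "95.0"],
]
sample = "\n".join("\t".join(row) for row in [HEADER] + ROWS)

summary = component_summary(io.StringIO(sample))
for component, stats in sorted(summary.items()):
    print(component, stats["nodes"], stats["spectra"])
```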

Network Edges

The second output file stores general information about the edges of the GNPS molecular network in TSV format. Each row represents one edge in the network. The columns are:

connected_component_index is a unique ID assigned to each connected component in the network
first_center_scan_number is the unique ID of the first node connected by the edge
second_center_scan_number is the unique ID of the second node connected by the edge
product is the similarity dot-product between the two nodes
product_shared is the contribution of shared peak matches to the similarity score
product_shifted is the contribution of shifted peak matches to the similarity score
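
The edge file lends itself to building an in-memory graph. The following sketch loads edges into an undirected adjacency map and optionally discards edges whose product falls below a threshold. It assumes a header row with the column names listed above. The function name load_adjacency, the min_product parameter, and the sample scores are hypothetical placeholders, and no particular relationship between the three score columns is implied.

```python
import csv
import io
from collections import defaultdict


def load_adjacency(tsv_file, min_product=0.0):
    """Build {node_id: {neighbor_id: product}} from an edge TSV.

    Edges with product < min_product are skipped. Assumes a header row
    with the documented column names (an assumption about the file).
    """
    adjacency = defaultdict(dict)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        score = float(row["product"])
        if score < min_product:
            continue
        first = int(row["first_center_scan_number"])
        second = int(row["second_center_scan_number"])
        # Store the edge in both directions: the network is undirected.
        adjacency[first][second] = score
        adjacency[second][first] = score
    return dict(adjacency)


# Made-up edges with placeholder scores.
SAMPLE = (
    "connected_component_index\tfirst_center_scan_number\t"
    "second_center_scan_number\tproduct\tproduct_shared\tproduct_shifted\n"
    "1\t1\t2\t0.85\t0.60\t0.25\n"
    "1\t2\t3\t0.72\t0.72\t0.00\n"
    "2\t4\t5\t0.65\t0.40\t0.25\n"
)

adjacency = load_adjacency(io.StringIO(SAMPLE), min_product=0.7)
for node, neighbors in sorted(adjacency.items()):
    print(node, sorted(neighbors.items()))
```

With over 8M nodes in the full network, a plain dictionary may use a lot of memory; a sparse-matrix or graph library could be substituted if that becomes a concern.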

tux2000 / MASSTplus