tux2000 / MASSTplus

MASST+ Server

https://masst.ucsd.edu/masstplus/

MASST+ is an improvement on the GNPS Mass Spectrometry Search Tool (MASST). It provides fast, error-tolerant search of metabolomics mass spectrometry data and reduces search time by two orders of magnitude. It can query databases of billions of mass spectra, which was not feasible with MASST. Like MASST, MASST+ is publicly available as a web service on GNPS.

Using MASST+ Server

With a Spectrum USI

If you know the spectrum USI of a spectrum you want to search with MASST+, you can enter it directly at https://masst.ucsd.edu/masstplus/.

Searching a spectrum in the GNPS library

(a) First, navigate to the spectrum of interest in the GNPS library. Here, a Malyngamide C spectrum is viewed. (b) Next, click the "MASST+" link. (c) This opens the MASST+ tab, which runs a mass spectral search and presents the results.

Integration with Molecular Networking

(a) Start by submitting a new molecular networking job on GNPS (this requires you to be logged in to a GNPS account). (b) When the job has completed, click "View All Clusters With IDs". (c) This opens a new tab, where you can click "Advanced MASST" and then "MASST+ Search" (or "MASST+ Analog Search") to start a new MASST+ search. (d) This opens a new tab for MASST+, where the search results are displayed after a few seconds.

GNPS Molecular Network

We performed molecular networking (both clustering and spectral networking) on the entirety of GNPS using NETWORKING+. The results of CLUSTERING+ and PAIRING+ are stored in TSV format.

Clustering+ Results for GNPS

We split the GNPS library into 9 divisions according to precursor mass range and ran CLUSTERING+ on each division. For every division we provide the cluster assignment of each spectrum and the center of each cluster.
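As a rough illustration of splitting by precursor mass range, the sketch below maps a precursor m/z value to one of 9 divisions. The boundary values are hypothetical placeholders: the actual cut points used for GNPS are not stated here, and the function name `division_for` is also an assumption made for this example.

```python
from bisect import bisect_right

# HYPOTHETICAL boundaries (m/z) between the 9 divisions. The real cut
# points used for the GNPS divisions are not given in this README.
HYPOTHETICAL_BOUNDARIES = [200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 1000.0]


def division_for(precursor_mz, boundaries=HYPOTHETICAL_BOUNDARIES):
    """Return the 0-based division index for a precursor m/z value.

    With 8 sorted boundaries there are 9 divisions (0 through 8). A value
    equal to a boundary is placed in the division above that boundary.
    """
    return bisect_right(boundaries, precursor_mz)


if __name__ == "__main__":
    for mz in (53.0051, 250.0, 1500.0):
        print(mz, "-> division", division_for(mz))
```

Binary search keeps the lookup cheap even when it is applied to a very large number of spectra; only the boundary list would need to change to match the real divisions.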

Cluster information for each spectrum

The output is in TSV format. Each row represents one spectrum from the GNPS library. The columns are:

  • cluster_idx is a unique ID assigned to each cluster in the division
  • scan is a unique ID assigned to each spectrum in the division
  • mz is the precursor mass of the spectrum
  • RTINSECONDS is the retention time of the spectrum
  • MSV_source is the MSV library the spectrum belongs to
  • Filename is the GNPS source file of the spectrum inside the MSV library
  • Local_scan is the spectrum's scan number inside its GNPS source file
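A minimal sketch of reading such a file with Python's standard library is shown here. It assumes the file is tab-separated with a header row that uses the column names listed above, and that cluster_idx and scan are integers; the sample rows and the function name `group_scans_by_cluster` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. Real files are ASSUMED to be tab-separated with a
# header row containing the column names listed above.
SAMPLE_TSV = (
    "cluster_idx\tscan\tmz\tRTINSECONDS\tMSV_source\tFilename\tLocal_scan\n"
    "0\t1\t100.5\t12.0\tMSV_PLACEHOLDER\tfile_a.mgf\t10\n"
    "0\t2\t100.5\t13.5\tMSV_PLACEHOLDER\tfile_b.mgf\t42\n"
    "1\t3\t150.2\t80.0\tMSV_PLACEHOLDER\tfile_a.mgf\t77\n"
)


def group_scans_by_cluster(handle):
    """Map each cluster_idx to the list of scan IDs assigned to it."""
    clusters = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        clusters[int(row["cluster_idx"])].append(int(row["scan"]))
    return dict(clusters)


if __name__ == "__main__":
    grouped = group_scans_by_cluster(io.StringIO(SAMPLE_TSV))
    for cluster_idx, scans in sorted(grouped.items()):
        print(cluster_idx, scans)
```

For a downloaded file, the same function can be given an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` wrapper.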

The CLUSTERING+ output files for all 9 divisions can be downloaded via the following links:

CLUSTERING+ output for division 0

CLUSTERING+ output for division 1

CLUSTERING+ output for division 2

CLUSTERING+ output for division 3

CLUSTERING+ output for division 4

CLUSTERING+ output for division 5

CLUSTERING+ output for division 6

CLUSTERING+ output for division 7

CLUSTERING+ output for division 8

Cluster centers

For each division, we write the representative spectrum of each cluster (one per cluster_idx) to an MGF file.

The representative spectrum of each cluster carries the following fields:

  • CLUSTERINDEX is the cluster index in the division
  • CLUSTERSIZE is the number of spectra in the cluster
  • MSV_LIB is the source MSV library of the representative spectrum
  • FILENAME is the source GNPS file of the representative spectrum
  • LOCAL_SCAN is the scan number of the representative spectrum inside the GNPS source file
  • PEPMASS is the precursor mass of the representative spectrum
  • RTINSECONDS is the retention time of the representative spectrum

An example entry:

BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
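Entries like the one above can be read with a few lines of Python. The following is a minimal sketch, not a full MGF reader: it assumes every spectrum sits between BEGIN IONS and END IONS, that header lines have the form KEY=VALUE, and that peak lines hold an m/z value followed by an intensity. The function name `parse_mgf` is chosen for this example.

```python
# The example entry from above, used as test input.
SAMPLE_MGF = """BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
"""


def parse_mgf(text):
    """Parse MGF text into a list of (headers, peaks) tuples.

    headers is a dict of strings; peaks is a list of (mz, intensity) floats.
    """
    spectra = []
    headers, peaks = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            headers, peaks = {}, []
        elif line == "END IONS":
            if headers is not None:
                spectra.append((headers, peaks))
            headers, peaks = None, None
        elif headers is None:
            continue  # ignore anything outside a BEGIN IONS / END IONS block
        elif "=" in line:
            key, value = line.split("=", 1)
            headers[key] = value
        else:
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
    return spectra


if __name__ == "__main__":
    for headers, peaks in parse_mgf(SAMPLE_MGF):
        base_peak = max(peaks, key=lambda p: p[1])
        print(headers["CLUSTERINDEX"], headers["CLUSTERSIZE"], base_peak)
```

Header values are kept as strings so that no field is silently misinterpreted; convert them (for example `int(headers["CLUSTERSIZE"])`) where a number is needed.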

The cluster-center MGF file for each division can be downloaded via the following links:

cluster centers for division 0

cluster centers for division 1

cluster centers for division 2

cluster centers for division 3

cluster centers for division 4

cluster centers for division 5

cluster centers for division 6

cluster centers for division 7

cluster centers for division 8

PAIRING+ Results for GNPS

We apply PAIRING+ to the clusters produced by CLUSTERING+ to compute the molecular network. The network is stored in two files.

Network Nodes

The first output file stores general information about the nodes of the GNPS molecular network in TSV format. The network contains over 8M nodes, one for each non-singleton cluster produced by CLUSTERING+. Each row represents one node in the network. The columns are:

  • scan_number_among_centers is a unique ID assigned to each cluster in the network
  • component_index is a unique ID assigned to each connected component in the network
  • source_division is the division this cluster came from (ranges from division0 to division8)
  • cluster_index_in_division is the index of this cluster in its source division
  • cluster_size is the number of spectra in the cluster
  • center_MSV is the source MSV library of the representative spectrum
  • center_source_file is the source GNPS file of the representative spectrum
  • center_scan_in_source_file is the scan number of the representative spectrum inside the GNPS source file
  • center_pepmass is the precursor mass of the representative spectrum
  • center_RT is the retention time of the representative spectrum
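One way to summarize the node file is to count, per connected component, how many clusters it contains and how many spectra those clusters hold. The sketch below assumes a tab-separated file with a header row that uses the column names listed above; the sample rows are placeholders and the function name `component_sizes` is an assumption for this example.

```python
import csv
import io
from collections import Counter

HEADER = [
    "scan_number_among_centers", "component_index", "source_division",
    "cluster_index_in_division", "cluster_size", "center_MSV",
    "center_source_file", "center_scan_in_source_file",
    "center_pepmass", "center_RT",
]
# Placeholder rows, NOT real GNPS data.
ROWS = [
    ["1", "0", "division0", "9", "5", "MSV_PLACEHOLDER", "a.mgf", "10", "100.5", "12.0"],
    ["2", "0", "division0", "12", "3", "MSV_PLACEHOLDER", "b.mgf", "42", "114.5", "13.5"],
    ["3", "1", "division4", "7", "2", "MSV_PLACEHOLDER", "c.mgf", "77", "450.2", "80.0"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def component_sizes(handle):
    """Return (nodes_per_component, spectra_per_component) as Counters."""
    nodes = Counter()
    spectra = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        component = row["component_index"]
        nodes[component] += 1                         # one node per row
        spectra[component] += int(row["cluster_size"])  # spectra in the cluster
    return nodes, spectra


if __name__ == "__main__":
    nodes, spectra = component_sizes(io.StringIO(SAMPLE_TSV))
    for component, count in nodes.most_common():
        print(component, count, spectra[component])
```

Because the rows are streamed one at a time, this approach stays within modest memory even for a node file with millions of rows.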

Network Edges

The second output file stores general information about the edges of the GNPS molecular network in TSV format. Each row represents one edge in the network. The columns are:

  • connected_component_index is the unique ID of the connected component that contains the edge
  • first_center_scan_number is the unique ID of the first node connected by the edge
  • second_center_scan_number is the unique ID of the second node connected by the edge
  • product is the similarity dot-product between the two nodes
  • product_shared is the contribution of shared peak matches to the similarity score
  • product_shifted is the contribution of shifted peak matches to the similarity score
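The edge file can be turned into an adjacency map for graph queries. The sketch below assumes a tab-separated file with a header row that uses the column names listed above, and that the two scan-number columns hold integer node IDs matching scan_number_among_centers in the node file. The sample rows, the function name `load_adjacency`, and the 0.8 threshold are illustrative assumptions only.

```python
import csv
import io
from collections import defaultdict

# Made-up sample edges, NOT real GNPS data.
SAMPLE_TSV = (
    "connected_component_index\tfirst_center_scan_number\t"
    "second_center_scan_number\tproduct\tproduct_shared\tproduct_shifted\n"
    "0\t1\t2\t0.9\t0.6\t0.3\n"
    "0\t2\t3\t0.75\t0.5\t0.25\n"
    "1\t4\t5\t0.7\t0.7\t0.0\n"
)


def load_adjacency(handle, min_product=0.0):
    """Build an undirected adjacency map from the edge table.

    Edges whose 'product' score is below min_product are skipped.
    """
    adjacency = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["product"]) < min_product:
            continue
        first = int(row["first_center_scan_number"])
        second = int(row["second_center_scan_number"])
        adjacency[first].add(second)
        adjacency[second].add(first)
    return dict(adjacency)


if __name__ == "__main__":
    # 0.8 is an arbitrary illustrative threshold.
    graph = load_adjacency(io.StringIO(SAMPLE_TSV), min_product=0.8)
    for node, neighbours in sorted(graph.items()):
        print(node, sorted(neighbours))
```

Storing neighbours in sets makes the map insensitive to duplicate rows and to the order in which the two endpoints of an edge are listed.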
