mavenMimic Introduction

I really like how interactive and exploratory MAVEN is, with its (relatively) intuitive interaction between chromatograms (single mass, many scans) and spectra (single scan, many masses). Combined with a TIC, it's easy to poke quickly at the data and get an initial sense of interesting compounds.

However, MAVEN also had a few drawbacks. For one, it's no longer being developed and is technically defunct. Still works great, but has a few shortcomings. Specifically, MAVEN only accepts the old .mzXML file format, doesn't handle MS²⁺ data, and is hard (impossible?) to color-code by treatment condition.

This app mimics MAVEN's functionality while resolving the above issues. It's built on the shiny and plotly libraries in R and critically exploits the event_data() function to pass information back and forth between the chromatograms and the spectra. The app leverages my own code snippets for getting MS data out of weird file types (thanks to the mzR package) and into a simple data frame in R, which can then be filtered, mutated, and generally easily manipulated.

Some considerations! This app is nowhere near as fast as MAVEN - it often takes a half-second or so to render the plots after a new interaction takes place. It's also quite memory-intensive, as the 25 HILIC positive files I've been looking at from the R/V Falkor cruise occupy about a gigabyte of RAM as a data frame object in the R environment. One way to get around this is to use SQL and the RSQLite package to create a database and handle the filtering. This code exists as "oldMavenDBcode.R" but the I/O times made it almost unusable.

Making the data

Hopefully you've already got some mass-spec data to play with. If so, convert it to open-source formats such as .mzML and modify makeRawData.R to point to it. (i.e. modify the lines below to output the files you want to process).

ms_data_dir <- "G:/My Drive/FalkorFactor/mzMLs"
sample_files <- list.files(ms_data_dir, pattern = "Smp|Blk", full.names = TRUE)

There's an additional metadata frame to add treatment information, so if you're sampling different conditions you'll want to append columns to that to include the relevant information. Metadata is attached to the raw data after it's been filtered to reduce sample size, and is done via dplyr's left_join function, merging by fileid number. For the Falkor cruise data that I'm working on, my metadata dataframe looks like this:

fileid	filenames	depth	spindir	time
1	190715_Blk_KM1906U14-Blk_C.mzML	Blank	Blank	Blank
2	190715_Smp_FK180310S62C1-25m_A.mzML	25m	Cyclone	Afternoon
5	190715_Smp_FK180310S62C1-25m_A.mzML	DCM	Cyclone	Afternoon
8	190715_Smp_FK180310S62C1-25m_A.mzML	25m	Cyclone	Morning
11	190715_Smp_FK180310S62C1-25m_A.mzML	DCM	Cyclone	Morning
14	190715_Smp_FK180310S62C1-25m_A.mzML	25m	Anticyclone	Afternoon
17	190715_Smp_FK180310S62C1-25m_A.mzML	DCM	Anticyclone	Afternoon
20	190715_Smp_FK180310S62C1-25m_A.mzML	25m	Anticyclone	Morning
23	190715_Smp_FK180310S62C1-25m_A.mzML	DCM	Anticyclone	Morning

with triplicate samples for each one explaining the missing fileid numbers. Additional columns will appear as coloration options in the app. Three colors are established by default; you'll need to define more if you're getting crazy.

MSMS data is handled separately in the next section of the script, where you'll again need to point to your MSMS data files in .mzML format

makeRawData will produce 3 files from your .mzMLs. The first is simply a .csv file containing your metadata - it's usually a good idea to open this in a spreadsheet GUI and make sure everything looks okay. The next two are files produced by the calls to saveRDS(), which compresses the data to about a quarter of its size in memory. Reading and writing RDS is faster and safer than using the more general save() function. When these files are loaded using loadRDS(), the data should be a simple data frame with 4 columns for MS1 data and 5 columns for MS2 data:

MS1 data (default name: raw_data_table)

fileid	rt	mz	int
1	60.25661	60.04510	41465.727
1	60.25661	60.05813	9636.746
1	60.25661	60.06448	16870.303
1	60.25661	60.99581	7046.725
1	60.25661	61.04038	762803.000
1	60.25661	61.50027	5937.636

If you don't like my extraction method or have better ideas, feel free to do other things: as long as you can point the app to an MS1 data frame that looks like this, then it'll run.

MS2 data (default name: raw_msms_table)

nrg	rt	premz	mz
20	0.4794087	214.0897	1465.727
20	0.4794087	214.0897	636.746
20	0.4794087	214.0897	870.303
20	0.4794087	214.0897	462.725
20	0.4794087	214.0897	62803.000
20	0.4794087	214.0897	372.636

If you don't have MS2 data, it should be easy enough to modify the app to run without it. Comment out the below lines from the UI:

h2("Fragments of the selected ion: "),
plotlyOutput(outputId = "MSMS", height = "300px")

and the below lines from the server:

  output$MSMS <- renderPlotly({
    EIC_data <- event_data(event = "plotly_click", source = "EIC")
    req(EIC_data)
    plotMSMS(mass = current_mass(), ret_time = EIC_data$x, 
             ppm = input$given_ppm, ret_win = 20,
             dataframe = raw_msms_data)
  })

and it'll function nicely without the MS2 data.

Hard restart instructions

All of the data should be 100% reproducible, given access to the initial data files.

Step 1. Run RunMsconvert.cmd command to convert .RAW files from the network drive to .mzMLs in the project directory subfolder mzMLs.

Step 2. Update makeRawData.R to point to the mzML folder and update the metadata.

Step 3. Update app.R to reflect any name changes to the objects in the Data folder.

Step 4. Run app.R.

wkumler / mavenMimic

mavenMimic Introduction

Making the data

MS1 data (default name: raw_data_table)

MS2 data (default name: raw_msms_table)

Hard restart instructions

About

Languages