asalt / mokapot-analyses

The code for reproducing the results from "mokapot: Fast and flexible semi-supervised learning for peptide detection"

Code for evaluating mokapot

This repository contains the code for reproducing the results from "mokapot: Fast and flexible semi-supervised learning for peptide detection."

Reproducing the manuscript

The provided code can fully reproduce the figures and analyses presented in the manuscript, provided that the necessary software is installed and the data are present. Note that some analyses (such as the benchmarking experiments) will give different results depending on the hardware they are run on.

Requirements

Operating System: Our code was written for CentOS 7 Linux machines, but should be compatible with other Linux distributions as well. The code is unlikely to work on Windows and may need slight changes for macOS.

Hardware: To most accurately reproduce our results, a 12-core machine with a minimum of 32 GB of memory should be used.

Installed Software: The analysis scripts are written for Python 3.7+. Many of the software tools are installed automatically when executing the analysis; however, some need to be installed beforehand. These include conda, Crux, and MSFragger.

You can then check that these have been successfully installed and configured with the following commands (the example output is from my machine, but yours may be slightly different).

Verify that Python is installed and configured:

$ python3 --version
Python 3.8.6

Verify that conda is installed and configured:

$ conda --version
conda 4.9.2

Verify that Crux is installed and configured:

$ crux version
INFO: Beginning version.
====================
Crux version 3.2-9d35092f
====================
Proteowizard version 3.0.20213
====================
Percolator version 3.05.nightly-1-e16f49a-dirty, Build Date Jul 30 2020 22:14:27
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
====================
Comet version 2019.0X rev. X
====================
Boost version 1_67
====================
INFO: Elapsed time: 0.000323 s
INFO: Finished crux version.
INFO: Return Code:0

Verify that MSFragger is installed and configured (your path may be different):

$ java -jar ~/bin/MSFragger-3.1.1/MSFragger-3.1.1.jar --version
MSFragger version MSFragger-3.1.1
Batmass-IO version 1.19.5
timsdata library version timsdata-2-7-0
(c) University of Michigan
RawFileReader reading tool. Copyright (c) 2016 by Thermo Fisher Scientific, Inc. All rights reserved.
System OS: Linux, Architecture: amd64
Java Info: 14.0.2, OpenJDK 64-Bit Server VM, Red Hat, Inc.

Finally, you'll need to specify the path to the MSFragger jar file:

export MSFRAGGER=~/bin/MSFragger-3.1.1/MSFragger-3.1.1.jar
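
The manual checks above can also be gathered into one pre-flight script. The following is only a minimal sketch, not part of the repository: the function names are hypothetical, the list of commands looked up on PATH is simply the set of commands invoked in this README, and memory is measured in decimal gigabytes through `os.sysconf`, which is an assumption about how the 32 GB figure should be read. Hardware shortfalls are reported as warnings rather than errors, since they affect how closely the benchmarks match rather than whether the code runs.

```python
import os
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 7)                             # scripts are written for Python 3.7+
PATH_TOOLS = ("conda", "crux", "java", "make")  # commands invoked in this README
MIN_CORES, MIN_MEMORY_GB = 12, 32               # hardware suggested for reproduction


def find_problems(python_version, tool_paths, msfragger):
    """Return blocking problems as strings; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.7 or newer is required")
    for tool, path in tool_paths.items():
        if path is None:
            problems.append(f"'{tool}' was not found on PATH")
    if not msfragger:
        problems.append("the MSFRAGGER environment variable is not set")
    elif not Path(msfragger).expanduser().is_file():
        problems.append(f"MSFRAGGER does not point to a file: {msfragger}")
    return problems


def hardware_warnings(cores, memory_gb):
    """Return non-blocking warnings; a value of None means 'could not be measured'."""
    warnings = []
    if cores is not None and cores < MIN_CORES:
        warnings.append(f"{cores} cores detected; {MIN_CORES} are suggested")
    if memory_gb is not None and memory_gb < MIN_MEMORY_GB:
        warnings.append(f"{memory_gb:.1f} GB of memory detected; {MIN_MEMORY_GB} GB are suggested")
    return warnings


def total_memory_gb():
    """Physical memory in decimal GB, or None where sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


def run_checks():
    """Print errors and warnings for this machine; return True if nothing is blocking."""
    problems = find_problems(
        sys.version_info,
        {tool: shutil.which(tool) for tool in PATH_TOOLS},
        os.environ.get("MSFRAGGER"),
    )
    for message in problems:
        print("ERROR:", message)
    for message in hardware_warnings(os.cpu_count(), total_memory_gb()):
        print("WARNING:", message)
    return not problems
```

Calling `run_checks()` in the shell session where `MSFRAGGER` was exported prints anything that looks wrong and returns `True` when no blocking problem was found. It only confirms that the tools can be located; the version commands shown above are still the way to confirm that each tool actually runs.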

Running the Analyses

Running the analyses is straightforward, but may take days to complete. First, use GNU make to install the prerequisite packages into a new conda environment:

$ make install && conda activate mokapot

Then the analyses can be run simply with:

$ make

Results

Once complete, all of the figures will be present in the figures directory.
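
To confirm that the run produced output, the contents of the figures directory can be listed. This is a small sketch under stated assumptions: `list_figures` is a hypothetical helper, the directory is assumed to be `figures` relative to the current working directory, and no particular file names or formats are assumed, so every file found is reported.

```python
from pathlib import Path


def list_figures(figures_dir="figures"):
    """Return all files under figures_dir, sorted by path (empty if it does not exist)."""
    root = Path(figures_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def print_figures(figures_dir="figures"):
    """Print one line per generated file, or a note if nothing was found."""
    figures = list_figures(figures_dir)
    if not figures:
        print(f"No files found in '{figures_dir}'; has 'make' finished?")
    for path in figures:
        print(path)
```

Running `print_figures()` from the top level of the repository after `make` completes should list the generated files.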

Questions?

If you have problems or questions, feel free to ask Will Fondrie (wfondrie@uw.edu).


License: MIT License

