The Multi-Scale Integrated Cell (MuSIC)

MuSIC is a hierarchical map of human cell architecture created by integrating immunofluorescence images from the Human Protein Atlas with affinity purification experiments from the BioPlex resource. Integration involves configuring each approach to produce a general measure of protein distance, then calibrating the two measures using machine learning.

A web portal for exploring comprehensive information about MuSIC is available at https://nrnb.org/music/.

A Step-By-Step Guide to Building a MuSIC Map

If you find MuSIC helpful for your research, please cite Qin et al., “A multi-scale map of cell structure fusing protein images and interactions”.

Set up an environment for MuSIC

  1. Requirements

    • Anaconda (optional but highly recommended)
    • APT packages including build-essential, python-dev, libxml2, libxml2-dev, zlib1g-dev, libigraph0-dev, libmpc-dev
  2. Create an Anaconda virtual environment. This is optional but highly recommended, and takes about 10 minutes.

conda create -n music python=3.6.2 anaconda
source activate music
  3. Download MuSIC and install dependencies.
git clone https://github.com/idekerlab/MuSIC.git
cd MuSIC
pip install -r ./installation/requirements.txt
  4. Install the hierarchy-building software, CliXO v1.0 and DDOT, by running the following command.
./installation/install.sh

MuSIC pipeline execution

  1. Test the pipeline: run the toy_example.sh bash script to execute the MuSIC pipeline on a toy example of 100 proteins with random embeddings.
./toy_example.sh
  • If you get errors such as "no modules named tqdm" or "no modules named dill", try reactivating the environment with the following commands:
      # reactivate environment
      conda deactivate
      source activate music
      
      # run toy example script again
      ./toy_example.sh
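
If the error persists after reactivating, it can help to check which modules the active interpreter cannot find. The following is a minimal sketch using only the Python standard library. The helper name `missing_modules` is hypothetical (it is not part of MuSIC), and the module list simply mirrors the two names from the error messages above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the active interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python is running; it should live inside the "music" environment.
    print("interpreter:", sys.executable)
    # tqdm and dill are the modules named in the error messages above.
    print("missing:", missing_modules(["tqdm", "dill"]))
```

If the interpreter path printed here is outside the music environment, the environment is probably not active in the current shell.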
      
  • toy_example.sh runs a series of Python scripts to infer a hierarchy for the query proteins using random embeddings. The resulting hierarchy is stored in two output files. Details about the file format can be found here.
    # Output: hierarchical relationship among systems and genes
    head ./Examples/toy_output/toy.louvain.ddot
    
         # column 1: the parent system
         # column 2: the child system or gene
         # column 3: property of child in the second column (default indicates column 2 is a system, gene indicates column 2 is a gene)
    
    # Output: specific protein assignment for each identified system
    head ./Examples/toy_output/toy.louvain.termStats
    
         # column 1: unique identifier for each system
         # column 2 (Number_of_proteins): total number of proteins belonging to the system
         # column 3 (Proteins): comma separated list of proteins belonging to the system
         # column 4 (median_recal_nm): median of predicted distance, in nm, among all pairs of proteins in the system
         # column 5 (Estimated_size_in_nm): predicted size, in nm, of the system
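
The two output files can also be loaded programmatically. The sketch below is one possible way to read them, not part of the MuSIC code base. It assumes both files are tab-separated, that the termStats file starts with a header row, and that columns appear in the order listed above. The function names `read_hierarchy` and `read_term_stats` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def read_hierarchy(handle):
    """Map each parent system to its child systems and member genes.

    Assumes tab-separated rows: parent, child, child type ("default" or "gene").
    """
    children = defaultdict(lambda: {"systems": [], "genes": []})
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        parent, child, kind = row[0], row[1], row[2]
        key = "genes" if kind == "gene" else "systems"
        children[parent][key].append(child)
    return dict(children)


def read_term_stats(handle, has_header=True):
    """Return {system_id: (number_of_proteins, [proteins])}.

    Assumes tab-separated rows; only the first three columns are used.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: the first row holds column names
    stats = {}
    for row in reader:
        if len(row) < 3:
            continue
        proteins = [name.strip() for name in row[2].split(",")]
        stats[row[0]] = (int(row[1]), proteins)
    return stats


if __name__ == "__main__":
    # Made-up rows for illustration; real files live in ./Examples/toy_output/.
    ddot_text = "SysA\tSysB\tdefault\nSysB\tGENE1\tgene\nSysB\tGENE2\tgene\n"
    hierarchy = read_hierarchy(io.StringIO(ddot_text))
    print(hierarchy["SysB"]["genes"])
```

To read the actual outputs, pass a file object instead of the in-memory sample, for example `open("./Examples/toy_output/toy.louvain.ddot", newline="")`. If the termStats file turns out to have no header row, call `read_term_stats(handle, has_header=False)`.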
    
  2. To run the MuSIC pipeline on user-specified input (proteins), follow the steps in this document: A Step-By-Step Guide to Building a MuSIC Map

    The commands for building the MuSIC v1 map as presented in the paper are provided both as a bash script (Accompanying Bash Script to Build MuSIC v1.0 (example_buid_music_v1.sh)) and as a Jupyter notebook (Accompanying Jupyter Notebook to Build MuSIC v1.0).

    • The CliXO version used in the original study is provided in CliXO_MuSIC.zip. To download the latest CliXO version, follow the installation instructions above. Compared with the CliXO used in the original study, the latest CliXO is faster but could yield suboptimal clusters.

License: MIT License