nikodallanoce / ComputationalHealthLaboratory

ComputationalHealthLaboratory

Computational Health Laboratory project for the a.y. 2021/2022.

Group Members

Gene Set Pathway Merging and Analysis

Starting from one or more genes, extract from interaction databases the genes they interact with. Using the expanded gene set, perform pathway analysis and obtain all disease pathways in which the genes appear. Merge the pathways to obtain a larger graph. Perform further network analysis to extract central biomarkers and communities beyond pathways. Compute a distance between the initial gene set and the various pathways (diseases).
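The steps above can be sketched on a toy example. This is a minimal illustration, not the project's implementation (that lives in the notebooks and in src): it uses only the Python standard library, a made-up interaction list, and hypothetical pathway names. Degree centrality as the centrality measure, the induced subgraph as the pathway merge, and the mean shortest-path length as the distance are all assumptions of this sketch.

```python
from collections import deque

# Toy interactions (made-up data standing in for the interaction database).
interactions = [
    ("TP53", "MDM2"), ("TP53", "BRCA1"), ("BRCA1", "BARD1"),
    ("MDM2", "CDKN1A"), ("BARD1", "ATM"), ("ATM", "CHEK2"),
]

# Hypothetical disease pathways, each given as a set of genes.
pathways = {
    "disease_A": {"MDM2", "CDKN1A"},
    "disease_B": {"ATM", "CHEK2"},
}


def build_graph(edges):
    """Build an undirected graph as a dict mapping gene -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def expand_gene_set(graph, seeds):
    """Return the seed genes plus every gene they directly interact with."""
    expanded = set(seeds)
    for gene in seeds:
        expanded |= graph.get(gene, set())
    return expanded


def merge_pathways(graph, pathways):
    """Merge pathways: subgraph induced by the union of all pathway genes."""
    genes = set().union(*pathways.values())
    return {g: graph.get(g, set()) & genes for g in genes}


def degree_centrality(graph):
    """Fraction of the other nodes each node is connected to."""
    n = len(graph)
    return {g: len(nbrs) / (n - 1) for g, nbrs in graph.items()} if n > 1 else {}


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_distance(graph, seeds, pathway_genes):
    """Mean hop distance between seed genes and the reachable pathway genes."""
    lengths = []
    for seed in seeds:
        dist = shortest_path_lengths(graph, seed)
        lengths.extend(dist[g] for g in pathway_genes if g in dist)
    return sum(lengths) / len(lengths) if lengths else float("inf")


graph = build_graph(interactions)
seeds = {"TP53"}
expanded = expand_gene_set(graph, seeds)
merged = merge_pathways(graph, pathways)
centrality = degree_centrality(graph)
distances = {name: mean_distance(graph, seeds, genes)
             for name, genes in pathways.items()}
print(sorted(expanded))
print(distances)
```

On real data the same steps would more likely run on a graph library such as NetworkX (the `protein_graph.gpickle` file suggests it), which also provides community detection; the plain-dict version here only keeps the example dependency-free.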

Repository structure

📂ComputationalHealthLaboratory
├── 💼0_Pathway_Enrichment.ipynb  # Pathway gene dataset expansion and pathway enrichment
├── 💼1_Network_Analysis.ipynb  # Network building and analysis
├── 💼2_Community_Analysis.ipynb  # Community detection and analysis
├── 💼3_Plots.ipynb  # Methods to plot the protein, disease and community graphs
├── 💼4_Project_CHL.ipynb  # Entire project, the previous four notebooks combined
├── 📄config_example.yml  # Replace this with your customized configuration file
├── 📄config.py  # Method to retrieve data from BioGRID
├── 📂datasets  # Datasets used by the project
│   ├── 🗃️BIOGRID.tab3.txt  # The starting gene interactions used for our analysis
│   ├── 🗃️BIOGRID_updated.tab3.txt  # The updated starting gene interactions
│   ├── 🗃️biomarkers.csv  # Central nodes
│   ├── 🗃️communities.csv  # Communities of the protein-to-protein graph
│   ├── 🗃️communities_metrics.csv
│   ├── 🗃️community_gene_metrics.csv
│   ├── 🗃️diseases_pathways.csv  # Disease pathways retrieved from DisGeNET
│   ├── 🗃️diseases_scores.csv  # Disease pathways with their metrics
│   ├── 🗃️genes.csv  # Expanded gene dataset
│   ├── 🗃️geneset.csv  # Starting gene interactions, retrieved from BioGRID
│   ├── 🗃️interactions.csv  # Expanded gene interactions dataset
│   ├── 🗃️mean_distances.csv
│   └── 🗃️protein_graph.gpickle  # Protein-to-protein graph
├── 📂presentation  # Project final presentation
│   ├── 📄DallaNoceRistoriZuppolini_presentation.pdf
│   └── 📄DallaNoceRistoriZuppolini_presentation.pptx
├── 📄README.md
├── 📂report  # Project report files
│   ├── 📄DallaNoceRistoriZuppolini_report.pdf  # Project report
│   └── 📄...  # Other LaTeX files for the report
├── 📄requirements.txt
└── 📂src  # Project methods
    ├── 📄communities.py
    ├── 📄disease.py
    ├── 📄plot_graphs.py
    ├── 📄protein_to_protein_graph.py
    └── 📄utilities.py
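As an illustration of how an interaction file such as `datasets/BIOGRID.tab3.txt` could be turned into an edge list, here is a minimal sketch using only the standard library. It assumes the file is tab-separated with the usual BioGRID TAB3 header columns `Official Symbol Interactor A` and `Official Symbol Interactor B`; check these names against the actual file. The sample rows are made up, and dropping self-interactions and duplicate pairs is a choice of this sketch, not necessarily what the project does.

```python
import csv
import io

# Made-up excerpt with only the two columns this sketch relies on.
SAMPLE_TAB3 = (
    "Official Symbol Interactor A\tOfficial Symbol Interactor B\n"
    "TP53\tMDM2\n"
    "MDM2\tTP53\n"
    "TP53\tBRCA1\n"
    "TP53\tTP53\n"
)


def read_interactions(handle):
    """Read gene pairs from a tab-separated file, as sorted unique edges."""
    reader = csv.DictReader(handle, delimiter="\t")
    edges = set()
    for row in reader:
        a = row["Official Symbol Interactor A"]
        b = row["Official Symbol Interactor B"]
        if a != b:  # skip self-interactions
            # Sort each pair so (A, B) and (B, A) count as the same edge.
            edges.add(tuple(sorted((a, b))))
    return sorted(edges)


edges = read_interactions(io.StringIO(SAMPLE_TAB3))
print(edges)
```

To try it on the real dataset, pass an open file instead of the in-memory sample, for example `open("datasets/BIOGRID.tab3.txt", newline="")`.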

How to run the project

First, clone the repo

git clone https://github.com/nikodallanoce/ComputationalHealthLaboratory

Install all the required packages

pip install -r requirements.txt

You can then work with the notebooks and our package. For a deeper understanding of our work, use 4_Project_CHL.ipynb to run the entire project. We strongly advise replacing the protein name with one of your choice, although you can also keep the one we worked with.
