fabbio00 / Biological_Data_Access_Guide_BDAG.ipynb

This mini-guide contains some simple, practical examples of data retrieval from the main biological data archives. In addition, specific sections of the notebook collect, summarize, and clarify the available documentation.


Practical guide to cancer data retrieval from major online archives

One of the first difficulties in developing a tool or model that works with biological data is data retrieval. Many of the sites used in this area lack easily accessible, understandable documentation on online data retrieval.

The purpose of the Python notebook developed for this presentation is to provide a practical mini-guide that clearly illustrates how to access the major online repositories of cancer data, such as TCGA, cBioPortal, FireBrowse, and icgc.org, among others. Data access is done through the APIs and/or software packages provided by these sites.

This mini-guide contains some simple, practical examples for each of these sites. In addition, because users often struggle to understand and integrate the scattered, complex data-retrieval information on these sites, specific sections of the notebook collect, summarize, and clarify the available documentation.

Table of contents

The Cancer Genome Atlas (TCGA) aims to map the genomic alterations present in different types of human cancer. TCGA has generated a large genomic, epigenomic, and clinical dataset from tumor samples of patients with different types of cancer.
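TCGA data is distributed through the NCI Genomic Data Commons (GDC), whose REST API accepts JSON filters as query parameters. The following is a minimal sketch of listing TCGA projects through that API, using only the Python standard library. The function names `build_gdc_projects_url` and `fetch_gdc_projects` are hypothetical helpers written for this illustration and are not taken from the notebook; the chosen fields and page size are also assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GDC endpoint that lists projects (TCGA is one GDC "program").
GDC_PROJECTS_ENDPOINT = "https://api.gdc.cancer.gov/projects"


def build_gdc_projects_url(program="TCGA",
                           fields=("project_id", "name", "primary_site"),
                           size=100):
    """Build a GDC /projects query URL restricted to one program.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # GDC filters are JSON objects of the form {"op": ..., "content": ...}.
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_PROJECTS_ENDPOINT + "?" + urlencode(params)


def fetch_gdc_projects(url, timeout=30):
    """Perform the HTTP request and return the list of matching projects."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # GDC wraps results as {"data": {"hits": [...], "pagination": {...}}}.
    return payload["data"]["hits"]
```

With network access, `fetch_gdc_projects(build_gdc_projects_url())` should return one dictionary per TCGA project, each containing the requested fields. Keeping URL construction separate from the request makes the query easy to inspect before anything is sent.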

cBioPortal is an online platform that provides advanced tools for the analysis and visualization of cancer genomic data. It is not a sequencing project like TCGA, but a platform for accessing and analyzing cancer genomic data from a variety of sources, including TCGA data. cBioPortal focuses on providing interactive tools for exploring and analyzing cancer genomic data, making complex information accessible to researchers and clinicians.
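cBioPortal also exposes a public REST API at `https://www.cbioportal.org/api`. As a hedged sketch of how its studies could be listed and narrowed to the TCGA-derived ones, the example below uses only the standard library. The helper names (`build_studies_request`, `fetch_studies`, `tcga_study_ids`) are hypothetical, and filtering on the substring `tcga` in the study identifier is an assumption about naming, not a rule stated by cBioPortal.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL of the public cBioPortal REST API.
CBIOPORTAL_API = "https://www.cbioportal.org/api"


def build_studies_request(keyword=None, page_size=50):
    """Build a GET request for the /studies endpoint.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {"projection": "SUMMARY", "pageSize": str(page_size)}
    if keyword:
        params["keyword"] = keyword
    url = f"{CBIOPORTAL_API}/studies?{urlencode(params)}"
    # Ask explicitly for JSON.
    return Request(url, headers={"Accept": "application/json"})


def fetch_studies(request, timeout=30):
    """Send the request and return the decoded JSON list of studies."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def tcga_study_ids(studies):
    """Keep the IDs of studies whose identifier mentions TCGA.

    Assumption: TCGA-derived studies carry "tcga" in their studyId.
    """
    return [s["studyId"] for s in studies if "tcga" in s["studyId"].lower()]
```

With network access, `tcga_study_ids(fetch_studies(build_studies_request()))` gives a quick view of which of the returned studies come from TCGA. This reflects the point above that cBioPortal aggregates data from several sources rather than generating it.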

FireBrowse is a website from the Broad Institute of MIT and Harvard that provides a user interface for exploring and analyzing cancer genomic data, particularly data from The Cancer Genome Atlas (TCGA). Its main goal is to simplify access to, and analysis of, molecular information about cancer by offering visualization and analysis tools.

The International Cancer Genome Consortium (ICGC) is a global scientific research consortium that maps the genomes of various cancers in order to better understand the genetic basis of the disease. By identifying the genomic alterations present in different cancer types, the ICGC contributes to the understanding of the causes and development of cancer. It involves numerous genomic sequencing projects around the world.

DepMap is a research project concerned with mapping gene dependencies in the context of cancer. The main goal is to understand which genes are essential for cancer cell survival, which can provide crucial information for the development of targeted anticancer therapies.

Contributors

License

This project is licensed under the MIT License; see the LICENSE file for details.
