yoonsikp / cnv-pathogenicity-prediction

Using ML to Predict Pathogenicity of Copy Number Variations

Predicting the Pathogenicity of Copy Number Variations

Copy number variations (CNVs) are one class of the many genetic variants that occur in humans. It remains difficult for researchers to predict the effect a CNV will have on an individual. CNVs exhibit a spectrum of phenotypic effects, ranging from benign to pathogenic and, in some cases, beneficial. This project aims to detect pathogenic CNVs while safely discarding CNVs that are confidently predicted to be benign.
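The goal of keeping possibly pathogenic CNVs while discarding confidently benign ones can be pictured as a threshold on a model's predicted probability. The following is a minimal sketch under stated assumptions, not the project's actual code: the function name `triage_cnvs`, the `benign_threshold` value of 0.05, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def triage_cnvs(
    scores: List[Tuple[str, float]],
    benign_threshold: float = 0.05,  # hypothetical cutoff, not taken from the project
) -> Tuple[List[str], List[str]]:
    """Split CNVs into (kept, discarded) by predicted pathogenicity probability.

    A CNV is discarded only when its predicted probability of being
    pathogenic falls below `benign_threshold`, i.e. when the model is
    confident that it is benign. Everything else is kept for review.
    """
    kept: List[str] = []
    discarded: List[str] = []
    for cnv_id, p_pathogenic in scores:
        if p_pathogenic < benign_threshold:
            discarded.append(cnv_id)
        else:
            kept.append(cnv_id)
    return kept, discarded


# Made-up scores, standing in for the output of any trained classifier.
example_scores = [("cnv_a", 0.01), ("cnv_b", 0.40), ("cnv_c", 0.93), ("cnv_d", 0.04)]
kept, discarded = triage_cnvs(example_scores)
print("kept:", kept)
print("discarded:", discarded)
```

A low threshold trades workload for safety: fewer CNVs are discarded, but a pathogenic CNV is less likely to be thrown away by mistake.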

This repository contains the code and datasets required to replicate the results of the project. The libraries used for feature extraction can also be repurposed for any project involving regions of genetic data aligned to the hg19 reference genome. Every top-level folder contains a descriptive README. The example notebooks from the project are listed below.

Feature Extraction

- Libraries
- Feature Extraction

Model Training

- Logistic Regression
- Neural Network
- XGBoost

Model Testing

- Logistic Regression
- Neural Network
- XGBoost

Presentation

- Final Presentation Excerpt
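To give a sense of what feature extraction over genomic regions can look like, here is a small sketch that derives a few numeric features from a CNV by intersecting it with annotated gene intervals. It is an illustration only and does not reproduce the repository's libraries: the `Region` type, the `overlap_bp` and `extract_features` functions, the feature names, and the coordinates are all hypothetical, and intervals are assumed to be 0-based and half-open on a single reference build such as hg19.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical type): 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def extract_features(cnv: Region, genes: List[Region]) -> Dict[str, int]:
    """Compute simple illustrative features for one CNV."""
    overlaps = [overlap_bp(cnv, gene) for gene in genes]
    return {
        "length_bp": cnv.end - cnv.start,
        "genes_overlapped": sum(1 for o in overlaps if o > 0),
        # Summed per gene, so bases covered by several genes count more than once.
        "gene_bp_overlapped": sum(overlaps),
    }


# Made-up coordinates, for illustration only.
cnv = Region("chr1", 1000, 5000)
genes = [
    Region("chr1", 500, 1500),   # overlaps the CNV by 500 bp
    Region("chr1", 4000, 6000),  # overlaps the CNV by 1000 bp
    Region("chr2", 1000, 5000),  # different chromosome, no overlap
]
features = extract_features(cnv, genes)
print(features)
```

Feature dictionaries of this kind can be stacked into a table, one row per CNV, and passed to any of the model types listed above.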

Requirements

This project depends on Python 3. The Python 3 libraries needed are listed in requirements.txt.

License: MIT License
