jiaweih / TBD

Build Status Coverage Status PyPI version License: MIT

Project TBD [Tau Be Damned]


The project TBD [Tau Be Damned] aims to use the amino acid sequence of a protein to identify whether it is disordered.


Project Objective


Our goal is to build a tool to identify whether a protein is disordered based on its amino acid sequence. We have collected amino acid sequences for ordered and disordered proteins from publicly available datasets to train a machine learning model to perform the classification task.

Mission


We share an interest in proteins. While many proteins fold into regular conformations that can be readily analyzed on a structural basis, intrinsically disordered proteins (IDPs) do not. IDPs such as tau are implicated in Alzheimer's and other neurodegenerative diseases. We aim to employ machine learning tools to improve the study of IDPs for scientific researchers and citizen scientists alike.

Requirements


Package TBD has the following major dependencies:

  1. python = 3.6
  2. tensorflow = 2.4
  3. scikit-learn = 0.23
  4. scipy = 1.5
  5. pandas = 1.1

The detailed list of dependencies can be found in the environment.yml file.

Installation


The package TBD can be installed with the following steps:

  1. Download the repository: git clone https://github.com/Intrinsically-Disordered/TBD.git
  2. Go to the root directory: cd TBD
  3. Create a virtual environment: conda env create --name tbdenv -f environment.yml
  4. Activate the environment: conda activate tbdenv
  5. Install the package: python setup.py install
  6. Check the installation: python -c "import tbd"
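
The check in the last step can be extended to confirm that the major dependencies are also importable in the active environment. The following is a minimal sketch and not part of the package: the helper name `check_imports` is hypothetical, and the module names are the usual import names for the dependencies listed under Requirements (for example, `sklearn` for scikit-learn).

```python
import importlib.util


def check_imports(module_names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Import names are assumed from the Requirements list, plus the package itself.
    status = check_imports(["tbd", "tensorflow", "sklearn", "scipy", "pandas"])
    for name, found in status.items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only looks the modules up and does not import them, so it runs quickly even when TensorFlow is installed. It does not verify versions; those are pinned in environment.yml.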

Usage


An example that runs the whole pipeline (data processing, modeling and prediction) from a single script can be found here: run_tbd.py

An example of predicting with the pretrained model can be found here: example notebook

Use Cases


This project aims to be of use to members of the general public interested in learning about protein classification, to scientists who need to determine whether a protein they are working with or have designed is disordered, and to those with experience in machine learning.


Modules Overview


  • preprocessing.py : Functions for cleaning and processing the data so that it is ready for modeling.
  • model.py : Functions related to the convolutional neural network (CNN) model.
  • predict.py : Functions for predicting whether protein sequences are ordered or disordered using the trained model.
  • evaluate.py : Functions for evaluating the trained model.
  • utils.py : Utility functions that can be used by other modules.
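
To illustrate what preparing a sequence for a CNN can look like, here is a minimal sketch of one-hot encoding an amino acid sequence into a fixed-size matrix. It is an assumption for illustration only: the function name `one_hot_encode`, the `max_len` parameter, and the padding and truncation rules are hypothetical and are not taken from preprocessing.py, which may encode sequences differently.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Encode a sequence as a max_len x 20 matrix (a list of lists).

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Residues outside the 20 standard letters
    (for example 'X') also become all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


if __name__ == "__main__":
    # Hypothetical short sequence, padded to 8 positions.
    encoded = one_hot_encode("MAEPRQ", max_len=8)
    print(len(encoded), len(encoded[0]))
```

A fixed `max_len` is used here because convolutional models are commonly trained on inputs of equal shape; the value that suits a real dataset depends on its sequence length distribution.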

Community Guidelines


We welcome members of the open-source community to extend the functionality of TBD, submit feature requests and report bugs.

Feature Request:

If you would like to suggest a feature or start a discussion on a possible extension of TBD, please feel free to raise an issue.

Bug Report:

If you would like to report a bug, please follow this link.

Contributions:

If you would like to contribute to TBD, you can fork the repository, add your contribution and open a pull request. The complete guide to making contributions can be found at this link.

About

License:MIT License

