Simboost(ML_project)
SimBoost
Drug discovery is a time-consuming, laborious, costly, and high-risk process. According to a report by the Eastern Research Group (ERG), developing a new drug usually takes 10-15 years, and the success rate for a new molecular entity is only 2.01%.
Finding a compound that selectively binds to a particular protein is a highly challenging and typically expensive procedure in the drug development process.
In this project we implement SimBoost, a machine-learning approach for predicting drug–target binding affinities using gradient boosting.
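To make the gradient-boosting idea concrete, here is a minimal pure-Python sketch of boosting for regression with depth-1 trees (stumps). It is not the project's implementation, which relies on XGBoost. The function names, the two features per drug–target pair, and the affinity values are all made up for illustration.

```python
# Toy gradient boosting for squared-error regression using decision stumps.
# Everything here (names, features, affinities) is hypothetical.

def fit_stump(X, residuals):
    """Return the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    if best is None:
        # No feature has two distinct values: fall back to a constant stump.
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(preds, y):
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


# Made-up feature vectors, one per drug-target pair:
# [average drug similarity, average target similarity]
X = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5], [0.8, 0.9], [0.3, 0.8], [0.9, 0.2]]
# Made-up binding affinities for those pairs.
y = [5.0, 5.5, 6.4, 8.1, 6.0, 6.2]

model = fit_boosting(X, y)
print("training MSE:", mse([predict(model, row) for row in X], y))
```

Each round adds a small correction fitted to the current residuals, so the training error falls below that of simply predicting the mean affinity. XGBoost builds on the same principle with deeper trees and regularisation.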
Table of contents
- 1. Setup
- 2. Feature Engineering
  - 2.1 Average Similarities and Binding values
  - 2.2 Drug/Target Similarity Networks
  - 2.3 Non-negative Matrix Factorization
  - 2.4 Building Train, Validation and Test Dataset using extracted features
- 3. XGBoost
  - 3.1 Tune Hyperparameters
  - 3.2 Plotting Feature importance
  - 3.3 Evaluation
- 4. Classification