Gene-Classification-Python

This project investigates the effectiveness of several machine learning models for classifying DNA sequences. The goal is to assign each sequence in the dataset to one of seven gene classes. The models used are Random Forest, Support Vector Machine, and Logistic Regression. The sequences are processed using the K-mer counting method with K values of 3, 5, and 7. The best result on this dataset is an F1 score of 0.963, achieved with the Logistic Regression model. Furthermore, the experiments suggest that the Random Forest model can be used with various K values, while the other two models work well only with higher K values.
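As an illustration of the K-mer counting step, here is a minimal sketch in plain Python. It is not taken from the project's notebooks: the function names `kmer_counts` and `kmer_vector`, the example sequence, and the choice to restrict the vocabulary to the letters A, C, G, and T are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count overlapping k-mers using a sliding window with step 1."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Return one count per possible k-mer, in lexicographic order.

    Assumption: k-mers containing letters outside `alphabet`
    (for example N) are simply left out of the vector.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical sequence, not from the project's dataset.
example = "ATGCATGCA"
counts = kmer_counts(example, 3)   # 7 overlapping 3-mers
vector = kmer_vector(example, 3)   # 4**3 = 64 features
```

The vector length grows as 4^K (64 features for K = 3, 1,024 for K = 5, 16,384 for K = 7), so higher K values give the classifiers a much larger and sparser feature space. Vectors of this form could then be passed to a classifier such as the ones listed above.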

About

Classification of genes from DNA sequence data using Python and scikit-learn

Languages

Jupyter Notebook: 100.0%