MArya80 / Palmer-Penguin-Dataset

In this project we take a look at the Palmer Penguin Dataset and tackle its missing and unknown values. Two rows had none of bill length, bill depth, flipper length, body mass, or sex recorded, and filling them in at random or with any other strategy could have led to problems, so we dropped them. After dropping these two rows, the only remaining unknown values were in the sex column, where it was not specified whether a penguin was male or female. After analyzing the dataset, we concluded that sex is strongly connected to flipper length, bill length, body mass, and bill depth, so we used these features to train two models: a Random Forest Classifier and a Gradient Boosting Classifier. They reached 95% and 98% accuracy on the train set and 92% and 93% accuracy on the test set, respectively, and they predicted the unknown values with 89% similarity. Finally, we replaced the missing values with the predictions and the project came to an end! :)
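The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas and scikit-learn, uses a small synthetic table in place of the real dataset (all values are made up), and the column names, the helper names `make_toy_penguins` and `impute_sex`, the 80/20 split, and the choice to fill with the Gradient Boosting predictions are all assumptions. "Agreement" here is the share of unknown rows on which both models predict the same sex, which is one possible reading of the similarity figure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed column names for the four measurements used as features.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def make_toy_penguins(n=200, seed=0):
    """Build a synthetic stand-in for the dataset (values are invented)."""
    rng = np.random.default_rng(seed)
    sex = rng.choice(["male", "female"], size=n)
    shift = (sex == "male").astype(float)  # toy assumption: males are larger
    df = pd.DataFrame({
        "bill_length_mm": 42 + 4 * shift + rng.normal(0, 2, n),
        "bill_depth_mm": 16.5 + 1.5 * shift + rng.normal(0, 1, n),
        "flipper_length_mm": 197 + 8 * shift + rng.normal(0, 5, n),
        "body_mass_g": 3900 + 700 * shift + rng.normal(0, 300, n),
        "sex": sex,
    })
    df.loc[:1, FEATURES + ["sex"]] = np.nan  # two rows with nothing known
    df.loc[2:10, "sex"] = np.nan             # rows where only sex is unknown
    return df


def impute_sex(df, seed=0):
    """Drop fully empty rows, train both classifiers, fill unknown sex."""
    # Drop rows where every measurement and sex are all missing.
    df = df.dropna(subset=FEATURES + ["sex"], how="all").copy()
    known = df[df["sex"].notna()]
    unknown = df[df["sex"].isna()]

    X_train, X_test, y_train, y_test = train_test_split(
        known[FEATURES], known["sex"],
        test_size=0.2, random_state=seed, stratify=known["sex"],
    )
    models = {
        "random_forest": RandomForestClassifier(random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores, preds = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # (train accuracy, test accuracy) for each model
        scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
        preds[name] = model.predict(unknown[FEATURES])

    # Share of unknown rows where the two models predict the same label.
    agreement = float(np.mean(preds["random_forest"] == preds["gradient_boosting"]))

    # Assumption: fill with the Gradient Boosting predictions.
    df.loc[unknown.index, "sex"] = preds["gradient_boosting"]
    return df, scores, agreement


toy = make_toy_penguins()
filled, scores, agreement = impute_sex(toy)
print(scores)
print(f"model agreement on unknown rows: {agreement:.2f}")
```

Because the toy data is random, the printed accuracies will not match the figures reported above; running the same functions on the real dataset would require loading it and adjusting the column names to match.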