TDung939 / CECS1020

CECS1020 Final Project - Titanic Prediction

Introduction

This is the group project for the CECS1020 class (Introduction to Machine Learning) at VinUniversity. We are Group 6:

  • Nguyen Tiet Nguyen Khoi
  • Nguyen Duong Tung
  • Nguyen Hoang Trung Dung

This zip folder includes:

  • Final report (written in LaTeX)
  • .ipynb notebook with the code implementation
  • Slides for the group presentation

Additional links:

Note:

  • Due to the report's length requirement, we could not include everything we did. Please see our .ipynb file for the full implementation.
  • We made our slides in Google Slides. The exported PowerPoint version may have some visualization errors; if you encounter any, please use our Google Slides link above.
  • The report's length requirement is 6 pages excluding references; ours runs 7 pages (excluding the references and the first 2 pages, which contain the outline).

The Challenge (Kaggle)

"The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (ie name, age, gender, socio-economic class, etc)."

Implementation (ipynb file)

Importing Necessary Libraries
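The notebook's import cell is not reproduced in this README; a minimal set covering the steps below might look like this (the notebook's actual cell may include more, e.g. matplotlib/seaborn for exploratory plots):

```python
# Core data handling
import numpy as np
import pandas as pd

# Models and utilities used in the three methods below
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```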

Preprocessing Part
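The preprocessing code itself lives in the notebook. As a rough sketch of the kind of pipeline that produces the `train_scaled` array used below (column names follow the Kaggle dataset; the tiny DataFrame and the imputation/encoding choices here are illustrative, not necessarily the ones we used):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for Kaggle's train.csv (the real data has 891 rows).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0],
    "Fare":     [7.25, 71.28, 7.92, 8.05],
})

# Fill missing ages with the median and encode Sex as 0/1.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})

# Scale the feature columns; the target column stays separate.
features = ["Pclass", "Sex", "Age", "Fare"]
train_scaled = StandardScaler().fit_transform(train[features])
```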

Method 1: Logistic Regression

from sklearn.model_selection import train_test_split

# Separate features and target, then hold out 20% for validation
y = train['Survived']
x = train_scaled
X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)

# Build the logistic regression model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(max_iter=10000)
lr.fit(X_train, y_train)

# Evaluate on the validation set
from sklearn.metrics import accuracy_score
y_pred = lr.predict(X_valid)
accuracy_score(y_valid, y_pred)

from sklearn.metrics import confusion_matrix
confusion_matrix(y_valid, y_pred)
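A single 80/20 split can give a noisy accuracy estimate. One standard complement (not shown in the snippet above; the data here is a synthetic stand-in for `train_scaled` and `train['Survived']`) is k-fold cross-validation, which averages accuracy over several splits:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for (train_scaled, train['Survived']).
x, y = make_classification(n_samples=200, n_features=4, random_state=42)

# 5-fold cross-validation: fit and score on five different splits.
scores = cross_val_score(LogisticRegression(max_iter=10000), x, y, cv=5)
print(scores.mean())
```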

Method 2: Decision Tree

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report as cr

# Same 80/20 split as before
X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)

# Fit the decision tree and predict on the validation set
dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)
y_predict = dtc.predict(X_valid)

confusion_matrix(y_valid, y_predict)

accuracy_score(y_valid, y_predict)

print(cr(y_valid, y_predict))

Method 3: Random Forest

from sklearn.ensemble import RandomForestClassifier
X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)

# Fit the random forest and evaluate on the validation set
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
rfc_y_pred = rfc.predict(X_valid)
accuracy_score(y_valid, rfc_y_pred)

print(cr(y_valid, rfc_y_pred))
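The Kaggle challenge expects a CSV with PassengerId and Survived columns. The submission step is not shown above; a rough sketch, assuming the test set is preprocessed the same way as the training set (the `test` DataFrame and `predictions` list here are illustrative stand-ins):

```python
import pandas as pd

# Illustrative stand-ins: in the notebook, `test` comes from test.csv and
# predictions from the fitted model, e.g. rfc.predict(test_scaled).
test = pd.DataFrame({"PassengerId": [892, 893, 894]})
predictions = [0, 1, 0]

# Kaggle's required two-column format, without the index column.
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": predictions,
})
submission.to_csv("submission.csv", index=False)
```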

Contributing

  • Nguyen Tiet Nguyen Khoi
  • Nguyen Duong Tung
  • Nguyen Hoang Trung Dung

License

MIT

Languages

Language: Jupyter Notebook 100.0%