This is the group project for CECS1020 - Introduction to Machine Learning at VinUniversity. We are Group 6:
- Nguyen Tiet Nguyen Khoi
- Nguyen Duong Tung
- Nguyen Hoang Trung Dung
This zip folder includes:
- Final report (written in LaTeX)
- The .ipynb file containing the code implementation
- Slides for the group presentation
Additional links:
Note:
- Due to the report's length requirement, we could not include everything we did in it. Please look through our .ipynb file for the full implementation.
- We made our slides in Google Slides. When exported to PowerPoint, they may show some visualization errors; if you encounter any, please use the Google Slides link above instead.
- The report's length limit is 6 pages excluding references. However, ours runs 7 pages (excluding the references and the first 2 pages, which are the outline).
"The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (ie name, age, gender, socio-economic class, etc)."
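The code below assumes the variables `train` (the Kaggle training DataFrame) and `train_scaled` (its preprocessed feature matrix) already exist. As a rough, hedged sketch of how such a matrix might be built - the notebook's actual preprocessing may differ - here is a minimal example on a toy stand-in for `train.csv`, using median imputation for missing ages, a simple gender encoding, and standard scaling:

```python
# Minimal sketch of Titanic-style preprocessing (assumption: the real
# notebook builds `train_scaled` in its own way; this is illustrative only).
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the Kaggle train.csv (column names are the real ones).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0],
    "Fare":     [7.25, 71.28, 7.92, 8.05],
})

features = train.drop(columns=["Survived"]).copy()
features["Age"] = features["Age"].fillna(features["Age"].median())  # impute missing ages
features["Sex"] = features["Sex"].map({"male": 0, "female": 1})     # encode gender as 0/1
train_scaled = StandardScaler().fit_transform(features)             # zero mean, unit variance
print(train_scaled.shape)  # one row per passenger, one column per feature
```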
# Split the data into training and validation sets (80/20)
from sklearn.model_selection import train_test_split
y = train['Survived']
x = train_scaled
X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)

# Build and evaluate a logistic regression model
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

lr = LogisticRegression(max_iter=10000)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_valid)
accuracy_score(y_valid, y_pred)
confusion_matrix(y_valid, y_pred)
# Build and evaluate a decision tree classifier on the same split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report as cr

X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)
dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)
y_predict = dtc.predict(X_valid)
confusion_matrix(y_valid, y_predict)
accuracy_score(y_valid, y_predict)
print(cr(y_valid, y_predict))
# Build and evaluate a random forest classifier on the same split
from sklearn.ensemble import RandomForestClassifier as rc

X_train, X_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)
rfc = rc()
rfc.fit(X_train, y_train)
rfc_y_pred = rfc.predict(X_valid)
accuracy_score(y_valid, rfc_y_pred)
print(cr(y_valid, rfc_y_pred))
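Since the three models above are fit and scored one at a time, the comparison can also be written as a single loop. A hedged sketch, using synthetic data from `make_classification` in place of the notebook's Titanic features (the actual accuracies in the report come from the real data, not from this toy example):

```python
# Sketch: compare the three classifiers on one train/validation split.
# Assumption: synthetic data stands in for the preprocessed Titanic features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=10000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                               # train on the same split
    scores[name] = accuracy_score(y_valid, model.predict(X_valid))
    print(f"{name}: {scores[name]:.3f}")
```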