Kevin-tati / Spark

Spark

Download the Titanic disaster train.csv file from Kaggle: https://www.kaggle.com/c/titanic/data

Databricks

  • Create an account on Databricks: https://community.cloud.databricks.com/login.html

  • Create a table and load the previously downloaded dataset

  • Create a cluster

  • Upload the notebook file to your workspace and attach it to the cluster you just created

  • Run the notebook
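
Once the dataset has been uploaded, it can be read from a notebook cell roughly as sketched below. This is a minimal sketch, not the notebook's actual code: the helper name load_titanic is hypothetical, and the path /FileStore/tables/train.csv is an assumption about where the Databricks upload dialog placed the file, so adjust it to the location shown after your upload. In a Databricks notebook the spark session object is already defined.

```python
def load_titanic(spark, path="/FileStore/tables/train.csv"):
    """Read the uploaded Titanic CSV into a Spark DataFrame.

    `path` is an assumed upload location; replace it with the path
    Databricks reports for your own upload.
    """
    # header=True uses the first row as column names;
    # inferSchema=True lets Spark guess numeric vs. string columns.
    return spark.read.csv(path, header=True, inferSchema=True)
```

In a notebook cell, `df = load_titanic(spark)` followed by `df.printSchema()` is a quick way to check that the columns were parsed as expected.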

Machine learning part

Models used for the prediction:
  • LogisticRegression
  • DecisionTreeClassifier
  • RandomForestClassifier
  • Gradient-boosted tree classifier
  • NaiveBayes
  • Support Vector Machine
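
The models listed above can be created in PySpark along the following lines. This is a sketch under assumptions rather than the notebook's own code: the mapping to GBTClassifier and LinearSVC is my reading of "Gradient-boosted tree classifier" and "Support Vector Machine", the helper build_classifiers is hypothetical, and the column names "Survived" (the Kaggle label column) and "features" (an assembled vector column) are assumed.

```python
import importlib

# Model names from the list above, mapped to estimator classes in
# pyspark.ml.classification (the last-but-one and fourth mappings are assumptions).
CLASSIFIER_CLASSES = {
    "LogisticRegression": "LogisticRegression",
    "DecisionTreeClassifier": "DecisionTreeClassifier",
    "RandomForestClassifier": "RandomForestClassifier",
    "Gradient-boosted tree classifier": "GBTClassifier",
    "NaiveBayes": "NaiveBayes",
    "Support Vector Machine": "LinearSVC",
}


def build_classifiers(label_col="Survived", features_col="features"):
    """Instantiate one estimator per listed model.

    Requires an active Spark session (as in a Databricks notebook),
    so pyspark is imported only when this function is called.
    """
    ml = importlib.import_module("pyspark.ml.classification")
    return {
        name: getattr(ml, cls)(labelCol=label_col, featuresCol=features_col)
        for name, cls in CLASSIFIER_CLASSES.items()
    }
```

Each estimator can then be trained with `model = estimator.fit(train_df)` on a DataFrame that contains the label column and a vector column of features, and compared on a held-out split with `model.transform(test_df)`.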

Languages

Language:Jupyter Notebook 100.0%