Artanic30 / CS150-Database-and-Data-Mining-Project

Final project for the CS150 Database and Data Mining course, fall 2020.

CS150 Database and Data Mining Project

Logistics

Please type your Chinese name and ID.

  • Your Name: 邱龙田

  • Your ID: 2018533107

  • Your Name: 施芊靖

  • Your ID: 2018533194

If you work as a team, please write the names and IDs of both members.

Due date: 23:59, January 10th, 2020.

You need to build a complete machine learning system on the provided dataset. You do not need to implement machine learning algorithms from scratch; you are free to use any existing data science libraries.
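The choice of model is left open. As one possible starting point (a sketch, not the method used in this repository), a per-student baseline predicts Correct First Attempt as that student's average first-attempt correctness in train.csv, falling back to the global average for students not seen in training. The `student` column name and the `fit_student_baseline` helper are hypothetical; map them to the actual headers of the provided dataset.

```python
from collections import defaultdict

# Hypothetical column names: adjust them to the real headers in train.csv.
STUDENT_COL = "student"
TARGET_COL = "Correct First Attempt"


def fit_student_baseline(train_rows):
    """Learn each student's mean first-attempt correctness from training rows.

    Returns a predict(row) function that falls back to the global mean for
    students who never appear in the training data.
    """
    totals = defaultdict(float)  # sum of labels per student
    counts = defaultdict(int)    # number of training rows per student
    for row in train_rows:
        label = float(row[TARGET_COL])
        totals[row[STUDENT_COL]] += label
        counts[row[STUDENT_COL]] += 1

    n = sum(counts.values())
    global_mean = sum(totals.values()) / n if n else 0.0

    def predict(row):
        student = row[STUDENT_COL]
        # .get() does not insert missing keys into the defaultdict
        if counts.get(student):
            return totals[student] / counts[student]
        return global_mean

    return predict


# Tiny in-memory example (made-up rows, not real course data).
train = [
    {"student": "s1", "Correct First Attempt": "1"},
    {"student": "s1", "Correct First Attempt": "0"},
    {"student": "s2", "Correct First Attempt": "1"},
]
predict = fit_student_baseline(train)
print(predict({"student": "s1"}))  # 0.5
print(predict({"student": "s3"}))  # unseen student: global mean, 2/3
```

The baseline returns a fractional score rather than a strict 0/1 label; whether to submit it as-is or rounded depends on how the test set is scored, which this README does not state. A stronger system would swap in a model from an existing library, but a baseline like this is useful for checking the rest of the pipeline end to end.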

Submission

You need to submit three parts.

  • Submit the report to Gradescope.
    • To form a team, remember to select your teammate when submitting on Gradescope.
  • Submit the completed test.csv to Gradescope.
    • To form a team, remember to select your teammate when submitting on Gradescope.
  • Submit your code to the GitHub Classroom repository.

Report

Write a report of at most 4 pages describing the entire pipeline of your work. Use the provided report template, follow the guidelines and instructions given in it, and fill in the corresponding parts.

Answers for the rest of the test set

We only provide correct answers for a subset of the test data. To submit your results, complete the missing values of the Correct First Attempt column in test.csv, i.e., replace each NaN with the value your model predicts. Then submit the completed test.csv (do not submit train.csv).
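Completing the file only requires rewriting the missing Correct First Attempt entries while leaving the provided answers untouched. A minimal standard-library sketch, assuming test.csv has a header row and that `predict` is any function mapping a row (a dict of column name to string) to a prediction; `complete_test_csv` and `is_missing` are hypothetical helpers, not part of the course materials. Empty cells are also treated as missing, since some tools write NaN that way.

```python
import csv
import io
import math

TARGET_COL = "Correct First Attempt"


def is_missing(value):
    """Treat empty cells and 'NaN' (any case) as missing labels."""
    text = value.strip()
    if text == "":
        return True
    try:
        return math.isnan(float(text))
    except ValueError:
        return False


def complete_test_csv(src, dst, predict, delimiter=","):
    """Copy rows from src to dst, filling only the missing target values.

    Rows whose label is already given are written back unchanged.
    Returns the number of rows that were filled in.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    # Accessing reader.fieldnames reads the header row.
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter=delimiter)
    writer.writeheader()
    filled = 0
    for row in reader:
        if is_missing(row[TARGET_COL]):
            row[TARGET_COL] = str(predict(row))
            filled += 1
        writer.writerow(row)
    return filled


# In-memory example with made-up rows and a constant stand-in model.
src = io.StringIO("student,Correct First Attempt\ns1,1\ns2,NaN\ns3,\n")
dst = io.StringIO()
print(complete_test_csv(src, dst, predict=lambda row: 0.5))  # 2
```

For the real files, open both paths with `newline=""`, as the `csv` module documentation recommends, and pass a different `delimiter` if the file turns out not to be comma-separated. Writing to a separate output file and checking it before uploading avoids overwriting the original test.csv by mistake.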

Note: Submissions that do not follow these rules will receive 0 points. If you have any questions about this, post them on Piazza.

Code

You also need to upload your code together with an introduction file. We will run duplicate checking on all submitted code, so do not copy other people's code.

Bonus: We will award additional points for using PySpark to implement the algorithms. To earn the bonus, clearly describe your PySpark implementation in the report.

Languages

Jupyter Notebook 45.2%, TeX 31.1%, Python 23.6%