quciet / DU_COMP4447_project

This project was created for a data science class. It analyzes movie sales in China and North America.


COMP4447_Project - Explore Movie Sales (China / North America)

1. Team member & Project objective

  • Yutang Xiong
  • Use techniques from the data science tools class to analyze the Chinese movie market. Objectives include, but are not limited to, identifying sales trends and finding which actors, actresses, directors, production companies, and distribution companies are the "money machines".

2. Instructions

  • The working folder stores all intermediate work (including Python code, temporary data files, and so on).
  • For the project report, refer directly to "Final Report.ipynb", which includes the introduction, process description, analysis, and visualization.

To run the Report in the BinderHub, click the following: Binder

  • The combined Python code is provided in the file "Final_Project_Codes_Combined.ipynb". You can use this file to reproduce all of the results shown in the final report.
  • Note that the web-scraping part of the code can take a long time to run. The scraped data has therefore been stored in the folder "data", which also contains the dataset for each stage (raw, cleaned, and subsets for analysis or visualization).
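
The idea of reusing stored scraped data instead of re-running the scraper could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the notebooks: the function name `load_or_scrape`, the `scrape` callable, and any file name passed in are hypothetical, and the real project may organize its files in "data" differently.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def load_or_scrape(cache_path: Path, scrape: Callable[[], Rows]) -> Rows:
    """Return cached rows if the CSV exists; otherwise scrape once and cache.

    `scrape` is a hypothetical stand-in for the slow web-scraping step.
    It is assumed to return a list of dicts with string values.
    """
    if cache_path.exists():
        # Fast path: read the previously stored scrape results.
        with cache_path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    rows = scrape()  # slow path: only runs when no cached file is found
    if not rows:
        return rows  # nothing to cache

    # Store the results so later runs can skip the scraping step.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With this pattern, a call such as `load_or_scrape(Path("data") / "some_raw_file.csv", my_scraper)` (both names hypothetical) would only hit the network the first time. Values read back from CSV are strings, so numeric columns would still need converting during the cleaning stage.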

To run the Python code in the BinderHub, click the following: Binder

3. Working Timeline

Process                                    Week       Date
data collection                            week 6     Feb 17th
data clean/transformation                  week 7     Feb 24th
feature engineering/statistical summary    week 8     Mar 3rd
visualization                              week 9     Mar 8th
Final Submission                           week 10+   Mar 20th


