- Yutang Xiong
- This project uses techniques from the data science tools class to analyze the Chinese movie market. Objectives include, but are not limited to, identifying sales trends and finding which actors, actresses, directors, production companies, and distribution companies are the "money machines".
- The working folder stores all intermediate work (including Python code and temporary data files).
- Please refer directly to "Final Report.ipynb" for the project report, which includes the introduction, process description, analysis, and visualization.
To run the report on BinderHub, click the following:
- The combined Python code is provided in "Final_Project_Codes_Combined.ipynb". You can use this file to reproduce all of the results shown in the final report.
- Note that the web-scraping part of the code can take a long time to run. The scraped data is therefore stored in the "data" folder, which also contains the dataset for each stage (raw, cleaned, and subsets for analysis or visualization).
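
As an illustration of why the stored data helps, here is a minimal sketch of a "load from cache, otherwise scrape" pattern. It is not the code used in the notebooks: the helper name `load_or_scrape`, the `scrape_fn` callback, and the file path in the usage comment are all hypothetical, and the sketch uses only the Python standard library.

```python
import csv
from pathlib import Path


def _read_rows(path):
    """Read a CSV file into a list of dicts (all values come back as strings)."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_or_scrape(cache_path, scrape_fn):
    """Return rows from cache_path if it exists; otherwise call scrape_fn and cache the result.

    scrape_fn is a hypothetical zero-argument callable that returns a list of
    dicts sharing the same keys (one dict per scraped record).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cached data is available, so the slow scraping step is skipped.
        return _read_rows(cache_path)

    rows = scrape_fn()
    if not rows:
        # Nothing was scraped; do not write an empty cache file.
        return rows

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Re-read so that the first call and later calls return the same string-typed rows.
    return _read_rows(cache_path)


# Hypothetical usage (path and scraper are placeholders, not files from this repository):
# rows = load_or_scrape("data/example_raw.csv", my_scraper)
```

With this pattern the scraper runs only when the cached file is missing; deleting the file forces a fresh scrape.
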
To run the Python code on BinderHub, click the following:
Process | Week | Date |
---|---|---|
data collection | week 6 | Feb 17th |
data cleaning/transformation | week 7 | Feb 24th |
feature engineering/statistical summary | week 8 | Mar 3rd |
visualization | week 9 | Mar 8th |
Final Submission | week 10+ | Mar 20th |