This is a group project research proposal for POLI3115 Politics and Public Opinion, HKU. You can read the full version of the proposal here.
Original dataset: Zhai, Yujia, 2020, "Weibo COVID dataset", https://doi.org/10.7910/DVN/DULFFJ, Harvard Dataverse, V1
Chinese emotion lexicon used for sentiment analysis, included in the code folder as 情感词汇本体.xlsx (the file name translates to "Affective Lexicon Ontology").
1_datafile_stream_processing-jsonTocsv.ipynb: Convert .json datafile to pandas dataframe and store as .csv files.
2.1_data_stream_processing-sampling.ipynb: Randomly sample 1% of the original dataset for the stage 1 exploratory pilot analysis.
2.2_sentiment_analysis_topic_modeling.Rmd: Text tokenisation, sentiment analysis, and topic modeling of the sampled data in R. Visualisation included.
3.1_data_stream_processing-sub topic.ipynb: Filter sub-datasets from the original dataset by keyword.
3.2_subtopic_sentiment_analysis.Rmd: A function that automates text processing and sentiment analysis for the sub-topic datasets. Results are exported for visualisation in Tableau.
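The three Python stream-processing steps listed above (JSON to CSV, 1% sampling, keyword filtering) can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it uses only the standard library rather than pandas, it assumes the raw file is line-delimited JSON (one post per line), and the field names `id` and `text` as well as all function names are hypothetical. Sampling here keeps each row independently with the given probability, so the result is approximately, not exactly, 1% of the rows.

```python
import csv
import json
import random


def json_lines_to_csv(json_path, csv_path, fields):
    """Stream a line-delimited JSON file into a CSV, one record at a time.

    Assumption: one JSON object per line. `fields` selects the columns to keep.
    Returns the number of records written.
    """
    count = 0
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Missing fields become empty strings; extra fields are dropped.
            writer.writerow({f: record.get(f, "") for f in fields})
            count += 1
    return count


def _copy_rows(csv_path, out_path, keep):
    """Copy the rows for which keep(row) is true; return how many were kept."""
    kept = 0
    with open(csv_path, encoding="utf-8", newline="") as src, \
            open(out_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if keep(row):
                writer.writerow(row)
                kept += 1
    return kept


def sample_rows(csv_path, out_path, fraction=0.01, seed=0):
    """Keep each row with probability `fraction` (roughly a 1% sample by default)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return _copy_rows(csv_path, out_path, lambda row: rng.random() < fraction)


def filter_by_keywords(csv_path, out_path, keywords, text_field="text"):
    """Keep rows whose text contains at least one of the keywords."""
    return _copy_rows(
        csv_path, out_path,
        lambda row: any(k in row[text_field] for k in keywords),
    )
```

A call such as `filter_by_keywords("posts.csv", "subtopic.csv", ["口罩"])` would write every post mentioning the keyword to a sub-dataset file; the file names and the keyword are placeholders. Because each function reads and writes row by row, none of them needs to hold the full dataset in memory.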
Sample data:
word_freq.rds: Cleaned word frequency by post and by date, from sampled data.