davidycliao/redguards

redguards: Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data

Abstract

In this paper, we design a new text scaling method, Swordfish (Slogan-featured Wordfish), that uses the TextRank algorithm to extract the most representative political slogans in a given context and then estimates Wordfish on those extracted text variables. We test this method on the Great Proletarian Cultural Revolution in China, using a historical archive of handwritten big-character posters and self-printed tabloids from 1966. We estimate the ideal points of student protesters by analyzing the political views expressed in their propaganda publications. Our findings point to evidence of factional re(de)alignments within the movement and show how students from different educational backgrounds followed Mao Zedong and Xiaohongshu 小紅書 (Little Red Book) and then fell into armed conflicts that divided families, classes, and society. The results estimated by our approach are consistent with the representative qualitative literature on factionalism during the Cultural Revolution.

Keywords: Text as Data, TextRank, Keyword Extraction, the Cultural Revolution
Documents: slides | paper

Replication

This package is designed to replicate the estimates and findings in the article Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data. In the paper, we design a new text scaling method that we call SWORDFISH (Slogan-based Features Wordfish). It uses the TextRank algorithm (Mihalcea and Tarau 2004) to extract the most representative keywords (such as noun collocation phrases) and scales those extracted text variables with the Bayesian IRT Generalized Wordfish Model implemented by Imai, Lo, and Olmsted (2016), which is based on Slapin and Proksch's (2008) Wordfish.
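For orientation, the baseline Wordfish model of Slapin and Proksch (2008) on which the scaling step builds can be written as follows. This is the standard form from the literature, not necessarily the exact parameterization or priors used in the replication code; the indexing (i for documents, j for extracted keywords) is a notational assumption made here.

```latex
\begin{align}
  y_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}), \\
  \log \lambda_{ij} &= \alpha_i + \psi_j + \beta_j \theta_i ,
\end{align}
```

Here y_{ij} is the count of keyword j in document i, \alpha_i is a document fixed effect (capturing document length), \psi_j is a keyword fixed effect (capturing overall frequency), \beta_j is the keyword's discrimination parameter, and \theta_i is the document's ideal point, the quantity of interest.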

The Red Guard documents analyzed in the paper are archived in The Databases for The History of Contemporary Chinese Political Movements (香港中文大學中國當代政治運動史數據庫) by The Chinese University of Hong Kong. Please note that replicating the analyses from the first stage requires access to the original corpus of textual data. We, as authors and data users, do not hold the full copyright to the sources analyzed in the paper. To comply with the terms of service, we cannot share the textual files publicly. However, we provide pre-processed textual files parsed in CoNLL-U format and a document-term matrix so that the analyses of the last stage can be replicated. The pre-processed textual materials can be found at data.

The source code in replication-code consists of five scripts that replicate all tables and figures appearing in both the main paper and the online supplemental materials:

01.tokenization-in-udpipe.R (tokenization and part-of-speech tagging on Universal Dependencies via pre-trained model)
02.keywords-extraction.R (keyword extractions using TextRank)
03.pooled-ideal-point-estimates.R (textual documents merged by individual participants and estimated by Wordfish scaling method)
04.incident-ideal-points-estimates.R (textual documents estimated by Wordfish scaling method through time)
05.visualization.R (data visualization and findings)
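To give a sense of what the keyword-extraction step (02) does conceptually, here is a minimal base-R sketch of TextRank-style word ranking: words become nodes of a co-occurrence graph, and a PageRank-style iteration scores them. It is an illustration under stated assumptions, not the code in 02.keywords-extraction.R. The function textrank_scores(), its parameters, and the toy token vector are all hypothetical, and the scores are normalised to sum to one, which rescales but does not reorder the ranking of the original formulation.

```r
# Toy token sequence (hypothetical; not taken from the Red Guard corpus).
tokens <- c("red", "guards", "defend", "chairman", "mao",
            "red", "guards", "criticize", "work", "teams",
            "work", "teams", "suppress", "red", "guards")

# Hypothetical helper: rank words by a PageRank-style score computed on an
# undirected co-occurrence graph, in the spirit of TextRank.
textrank_scores <- function(tokens, window = 2, damping = 0.85,
                            tol = 1e-8, max_iter = 200) {
  vocab <- unique(tokens)
  n <- length(vocab)
  idx <- match(tokens, vocab)
  adj <- matrix(0, n, n, dimnames = list(vocab, vocab))

  # Link every pair of distinct words that co-occur within `window` tokens.
  for (i in seq_along(idx)) {
    last <- min(i + window, length(idx))
    if (last > i) {
      for (j in (i + 1):last) {
        if (idx[i] != idx[j]) {
          adj[idx[i], idx[j]] <- 1
          adj[idx[j], idx[i]] <- 1
        }
      }
    }
  }

  # Column-normalise so each word splits its score among its neighbours.
  deg <- colSums(adj)
  trans <- sweep(adj, 2, ifelse(deg > 0, deg, 1), "/")

  # Power iteration until the scores stop changing.
  score <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    new_score <- (1 - damping) / n + damping * as.vector(trans %*% score)
    if (sum(abs(new_score - score)) < tol) break
    score <- new_score
  }

  names(score) <- vocab
  sort(score, decreasing = TRUE)
}

scores <- textrank_scores(tokens)
print(head(scores, 5))
```

In a real pipeline the tokens would typically be filtered by part-of-speech tag first (for example, keeping nouns and adjectives), and adjacent high-scoring words would be merged into multi-word keywords such as slogans.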

Replicating the estimates in this paper is straightforward. Follow the steps under Getting started: fork or download this repo, then run run_replication() in the RStudio console. The results and figures are generated automatically by the source code and stored in replication-figures. The history log of the ideal point estimation using emIRT is saved automatically in the misc folder for further inspection.

Getting started

Install the release version of R (preferably version 3.6 or above) and RStudio, then install usethis and devtools from CRAN:

install.packages(c("usethis", "devtools"))
library(usethis)
library(devtools)

Download the redguards repository from GitHub with use_course() and answer Yes or Definitely when prompted. This opens the redguards project automatically.

usethis::use_course(create_download_url("https://github.com/davidycliao/redguards"))

Then build and install the project package with install().

devtools::install()

Finally, load the package and start the replication with run_replication().

library(redguards)
run_replication()

Please note that replicating the figures requires the STHeiti font to be installed on your local computer beforehand so that Chinese characters render correctly.

Cite

@unpublished{liao2022,
  author = {Liao, Yen-Chieh and Tsai, Yi-Nung and Tene, Daniel and Zhang, Dechun},
  title = {Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data},
  note={SSRN working paper, Available at SSRN: http://ssrn.com/abstract=4200926 },
  year={2022}
}
