Repository containing manifestos of the main parties competing in the 2021 German federal elections (26 September 2021). The repository currently contains the manifesto versions listed below.
Party | Description |
---|---|
AfD | Final version |
CDU/CSU | Final version |
FDP | Final version |
Freie Wähler | Final version |
Greens | Final version |
The Left | Final version |
SPD | Final version |
The repository contains the manifestos in the following formats:
- Original PDF files: the folder
manifestos_originals
contains the manifestos published on the parties’ websites - Edited PDF files: the folder
manifestos_clean_pdf
contains cleaned PDF files. I removed title pages, page numbers, headers, footers, table of contents, and the index - Text files: the folder
manifestos_clean_txt
contains.txt
files of the cleaned PDFs - Corpus: the file
data_corpus_manifestos_ger2021.rds
contains all party manifestos as a quanteda text corpus. The corpus object also includes meta data on each variables, including the party codes for the ParlGov and the Manifesto Project datasets.
Note: I manually checked the txt files for errors resulting from hyphenation in the original documents. Yet, the txt files may still contain errors. Please check the raw texts carefully before you use them in research projects. Feel free to push cleaner txt files to this repository if you adjusted texts manually. I thank Corinna Doll for providing adjusted txt files for some of the manifestos.
This example shows how to load the corpus object and extract the most frequent terms in each manifesto (with minimal pre-processing and no compounding of multiword expressions).
## Load packages
library(quanteda)
library(quanteda.textstats)
library(ggplot2)
## Load text corpus
data_corpus_manifestos_ger2021 <- readRDS("data_corpus_manifestos_ger2021.rds")
## Get summary of corpus
textstat_summary(data_corpus_manifestos_ger2021)
## document chars sents tokens types puncts numbers symbols urls tags
## 1 AfD 192214 1432 26486 6959 2940 155 2 0 0
## 2 CDU/CSU 347388 2656 49203 8613 6078 188 0 0 0
## 3 FDP 290037 2093 39577 7940 4042 113 6 0 0
## 4 Freie Wähler 266879 2085 37568 7602 4089 91 2 0 0
## 5 Greens 543449 3678 76685 12257 8735 205 1 0 0
## 6 SPD 186762 1494 26774 5731 3113 110 0 0 0
## 7 The Left 551568 4605 79424 12457 9975 445 0 0 1
## emojis
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## 7 0
## Tokenize corpus and transform to document-feature matrix
dfmat_man <- data_corpus_manifestos_ger2021 %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE) %>%
tokens_compound(phrase("* *innen")) %>% # compound *innen
tokens_remove(pattern = c(stopwords("de"), "dass")) %>%
dfm()
## Get most frequent words by party
tstat_freq <- textstat_frequency(dfmat_man,
groups = party_name_short,
n = 10)
## Plot most frequent words
ggplot(data = tstat_freq, aes(x = factor(nrow(tstat_freq):1), y = frequency)) +
geom_point() +
facet_wrap(~group, scales = "free_y") +
coord_flip() +
scale_x_discrete(breaks = nrow(tstat_freq):1,
labels = tstat_freq$feature) +
labs(x = NULL, y = "Most frequent words") +
theme_minimal()
Feel free to use the manifestos or edited files for your own work. Please cite the data as follows:
Stefan Müller. 2021. 2021 German federal election manifestos. Version 0.2: https://github.com/stefan-mueller/data_manifestos_ger2021.
If you have any questions or suggestions, please file a GitHub issue or get in touch with me.