rpolecat makes downloading and working with the POLECAT event data easier.
The main download functionality works, and more features are planned; see our roadmap in #1.
The POLECAT data are provided via two repositories on Dataverse:
- POLECAT Weekly Data, containing the actual data files.
- POLECAT Documentation
The data are described in two papers:
- PLOVER and POLECAT: A New Political Event Ontology and Dataset: data paper with details on the POLECAT data and PLOVER ontology.
- Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks (arXiv): details on the NGEC coder that produces the data.
You can install the development version of rpolecat like so:
library(remotes)
install_github("basil-analytics/rpolecat")
rpolecat depends on the R Dataverse Client to interact with the Dataverse API. That package requires two environment variables to be set before it can talk to the API. More details are in the API Access Keys section of the dataverse R package readme.
One way to meet this requirement without having to set the variables every time you start R is to:
- Obtain an API access token from Harvard Dataverse.
- Add the following lines to your .Rprofile file:

      # dataverse API token
      Sys.setenv(DATAVERSE_SERVER = "dataverse.harvard.edu")
      Sys.setenv(DATAVERSE_KEY = "<your API token>")

  (You can find and open your .Rprofile file using usethis::edit_r_profile() if the usethis package is installed.)
- Restart R for the changes to take effect.
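After restarting R, it can be worth confirming that both variables are actually visible to the session before calling the API. The following is a minimal sketch using base R only; the helper name `check_dataverse_env` is hypothetical and is not part of rpolecat or the dataverse package.

```r
# Hypothetical helper (not part of rpolecat or dataverse): report which of
# the two environment variables described above are missing or empty.
check_dataverse_env <- function(vars = c("DATAVERSE_SERVER", "DATAVERSE_KEY")) {
  values <- Sys.getenv(vars, unset = "")
  unset_vars <- vars[!nzchar(values)]
  if (length(unset_vars) > 0) {
    warning("Not set: ", paste(unset_vars, collapse = ", "),
            ". Check your .Rprofile and restart R.")
  }
  # Return the names of the missing variables (empty if all are set)
  invisible(unset_vars)
}

check_dataverse_env()
```

The function only checks that the variables are non-empty; it does not verify that the token is valid.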
One of the main functions of this package is to download the POLECAT data without manually pointing and clicking on Dataverse:
# (not run)
# one-stop-shop for getting all data and keeping updated with new data:
# 1st time
download_polecat(local_dir = "my/data/dir", skip_existing = TRUE)
# next times
download_polecat(local_dir = "my/data/dir", years = 2023, skip_existing = TRUE)
# see ?download_polecat
This will download the files as they are: the current-year weekly
files will remain unzipped, and the historical yearly ZIP archives
will remain zipped. However, the skip_existing functionality should
be able to handle things correctly if you zip or unzip various files; see
?download_polecat for more details.
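If you prefer to have everything unzipped, the yearly archives can be extracted after downloading. This is a minimal sketch with base R, assuming the archives sit directly in `local_dir` and have a `.zip` extension; the helper `unzip_archives` is hypothetical and not an rpolecat function.

```r
# Hypothetical helper (not part of rpolecat): extract every ZIP archive
# found directly in `local_dir` into that same directory.
# The original ZIP files are left in place.
unzip_archives <- function(local_dir) {
  zips <- list.files(local_dir, pattern = "\\.zip$", full.names = TRUE,
                     ignore.case = TRUE)
  for (z in zips) {
    utils::unzip(z, exdir = local_dir)
  }
  # Return the paths of the archives that were extracted
  invisible(zips)
}

# Usage (not run):
# unzip_archives("my/data/dir")
```

Leaving the ZIP files in place is a deliberate choice here, so the files that were downloaded stay untouched.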
The package also includes information from the PLOVER ontology:
library(rpolecat)
data(contexts)
head(contexts)
#> context
#> 1 military
#> 2 intelligence
#> 3 executive
#> 4 legislative
#> 5 election
#> 6 political_institutions
data(modes)
head(modes)
#> event_type mode
#> 1 consult visit
#> 2 consult third-party
#> 3 consult multilateral
#> 4 consult phone
#> 5 retreat withdraw
#> 6 retreat release
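Because `modes` pairs each mode with its event type, base R can group it for quick lookup. The sketch below uses a stand-in data frame (`modes_head`, a name made up for this example) built from the six rows printed above, so it runs without the package installed. With rpolecat loaded, the same `split()` call should work on the full `modes` table, assuming it has the `event_type` and `mode` columns shown.

```r
# Stand-in for head(modes), copied from the printed output above.
modes_head <- data.frame(
  event_type = c("consult", "consult", "consult", "consult",
                 "retreat", "retreat"),
  mode = c("visit", "third-party", "multilateral", "phone",
           "withdraw", "release"),
  stringsAsFactors = FALSE
)

# Group the modes by event type into a named list.
modes_by_type <- split(modes_head$mode, modes_head$event_type)

modes_by_type$retreat
#> [1] "withdraw" "release"
```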
The data license is viewable on Dataverse. We copy it here for convenience:
The POLECAT data are produced by the Program on Geostrategic Risk (formerly the Political Instability Task Force). The Program on Geostrategic Risk is funded by the Central Intelligence Agency. The views expressed are the authors’ alone and do not represent the views of the U.S. Government. We are unable to provide the story text from which events are extracted or the URLs due to licensing restrictions. For any data issues or bug reports please contact the dataset points of contact. THESE MATERIALS ARE SUBJECT TO COPYRIGHT PROTECTION AND MAY ONLY BE USED AND COPIED FOR RESEARCH AND EDUCATIONAL PURPOSES. THE MATERIALS MAY NOT BE USED OR COPIED FOR ANY COMMERCIAL PURPOSES. © 2023 Leidos. All rights reserved. THE MATERIALS ARE PROVIDED ON AN AS-IS BASIS, WITH NO WARRANTIES OR GUARANTIES OF ANY KIND. THE OWNERS WILL NOT BE LIABLE FOR ANY DAMAGES ARISING FROM THEIR USE. USE OF THE MATERIALS IS ENTIRELY AT YOUR OWN RISK.