Here are two datasets that can help researchers enrich and add structure to GDELT GKG data (https://www.gdeltproject.org/data.html#rawdatafiles).
GDELT ingests a huge range of websites: spammy SEO blogs, PR newswires, and of course a ton of news media. Because the result is a bit of a soup, it is hard to isolate the kinds of outlets that academics tend to study.
To address this problem, we created a series of media lists for the categories academics tend to study: elite media, wire media, traditional news media, online partisan media, and emerging media. To view these media categories, see GDELT sources.ipynb
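As a rough sketch of how such media lists might be used, the Python snippet below tags source domains with a category and counts how many fall into each one. The domains and the MEDIA_CATEGORIES dictionary are placeholders invented for illustration, not the actual lists from GDELT sources.ipynb, and the function name is hypothetical.

```python
from collections import Counter

# Placeholder lists for illustration only; the real category lists
# are in "GDELT sources.ipynb".
MEDIA_CATEGORIES = {
    "elite": {"elite-paper.example"},
    "wire": {"wire-service.example"},
    "partisan": {"partisan-site.example"},
}


def categorize_source(domain, categories=MEDIA_CATEGORIES):
    """Return the first category whose list contains the domain, else None."""
    domain = domain.strip().lower()
    for category, domains in categories.items():
        if domain in domains:
            return category
    return None


# Source domains as they might appear in GKG records (made-up values).
sample_sources = [
    "Elite-Paper.example",
    "wire-service.example",
    "seo-blog.example",
    "wire-service.example",
]

# Count records per category; sources outside every list are counted under None.
counts = Counter(categorize_source(s) for s in sample_sources)
print(counts)
```

Sources that match no list come back as None, which makes it easy to drop the SEO blogs and newswires that are not part of a study.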
GDELT is great in that it searches the text of articles and looks for specific themes. For the full list of themes, see http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_CategoryList.xlsx
GDELT makes it easy to know which themes appear in a news event (i.e., a collection of news articles).
The problem is that these themes are a bit too granular for broader public opinion studies that often study issues (e.g., the economy).
To address this problem, Lei Guo and I created a list of GDELT themes that taken together broadly represent "issues." To access the list, see GDELT Issues.ipynb
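A minimal sketch of how a theme-to-issue list could be applied is shown below. It assumes the themes of a GKG record arrive as a single semicolon-delimited string. The ISSUE_THEMES dictionary is a small placeholder, not the actual mapping from GDELT Issues.ipynb, and the function name is hypothetical.

```python
# Placeholder mapping for illustration only; the real issue lists
# are in "GDELT Issues.ipynb".
ISSUE_THEMES = {
    "economy": {"ECON_INFLATION", "ECON_STOCKMARKET"},
    "environment": {"ENV_CLIMATECHANGE"},
}


def issues_for_record(themes_field, issue_themes=ISSUE_THEMES):
    """Return the sorted issues whose theme sets overlap the record's themes.

    Assumes themes_field is a semicolon-delimited string of theme codes.
    """
    themes = {t.strip() for t in themes_field.split(";") if t.strip()}
    return sorted(
        issue for issue, members in issue_themes.items() if themes & members
    )


# A made-up themes string for one record.
record_themes = "ECON_INFLATION;TAX_FNCACT;ENV_CLIMATECHANGE;"
print(issues_for_record(record_themes))
```

Because a record can carry many themes, one record may map to several issues at once, and a record with no matching themes maps to an empty list.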
Here are the papers where we created these lists. They outline all the content analyses that went into creating them. Please cite these papers if you end up using these lists!
Vargo, C., & Guo, L. (2017). Networks, big data, and intermedia agenda-setting: An analysis of traditional, partisan, and emerging online U.S. news. Journalism & Mass Communication Quarterly, 94(4), 1031–1055. http://chrisjvargo.com/wp-content/uploads/2016/12/1FinalPDFJMCQ.pdf
Guo, L., & Vargo, C. (2018). “Fake news” and emerging online media ecosystem: An integrated intermedia agenda-setting analysis of the 2016 U.S. presidential election. Communication Research. Preprint published online, June 4, 2018. https://www.dropbox.com/s/knjgj2ior6r9k8c/1FinalPDF.pdf?dl=1