CuriosityBits's repositories
covid19_twitter
Covid-19 Twitter dataset and pre-processing scripts. Under active development; released under CC-BY-4.0.
COVID19Twitter
Visualizing the Twitter discourse on COVID-19
Chinese-Word-Vectors
100+ pre-trained Chinese word vectors
Chinese_models_for_SpaCy
Models for SpaCy that support Chinese
coordination-network-toolkit
A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.
corex_topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
emotionAnalysis
The Expression of Nationalist and Populist Emotions
entity2embedding
A Python package for word2vec
ghapi3
Work In Progress: GitHub API v3.0 implemented in R using the gh package
HanLP
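Natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin conversion, and simplified/traditional Chinese conversion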
icore
This project introduces the interface for Communication Research (iCoRe) to access, explore, and analyze the Global Database of Events, Language and Tone (GDELT; Leetaru & Schrodt, 2013). GDELT provides a vast, open-source, and constantly updated repository of online news and event metadata collected from tens of thousands of news outlets around the world. Despite GDELT’s promise for advancing communication science, its massive scale and complex data structures have hindered communication scholars’ efforts to access and analyze it. We thus developed iCoRe, an easy-to-use web interface that (a) provides fast access to the data available in GDELT, (b) shapes and processes GDELT for theory-driven applications within communication research, and (c) enables replicability through transparent query and analysis protocols.
Nationalizing-the-truth
Explore data from Wechatscope (wechatscope.jmsc.hku.hk)
PH_Election_Tracker_2019
A Shiny app used for tracking the Twitter pulse of the candidates in the 2019 Philippine General Election
randomscripts
Random scripts to be shared with colleagues
SMMT
Social Media Mining Toolkit (SMMT) main repository
subreddit-comments-dl
Download subreddit comments
teaching_dacss
Public tutorials for COMM621 students
VoterFraud2020
A multi-modal Twitter dataset with 7.6M tweets and 25.6M retweets related to voter fraud claims.
Wechatscope
Archive of deleted WeChat official account articles