wmcz / blogpost-cswiki-in-covid-year

WIP repository, looking for differences in cswiki editing during 2020 aka covid year

blogpost-cswiki-in-covid-year

This repository contains data for a post on WMCZ's blog about Czech Wikipedia during the pandemic.

Data source

This repository uses only public data published by the Wikimedia Foundation. The data are, however, processed on WMF's Hadoop cluster using Spark queries.

Page views

Data about page and project views can be downloaded from Wikimedia Dumps as the pageviews dataset. In the Hadoop cluster, the data are available as these two tables:

  • wmf.pageview_hourly: per-page views, hourly granularity (docs)
  • wmf.projectview_hourly: per-project views, hourly granularity (docs)
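
For readers without cluster access, the downloadable pageviews dataset can be filtered locally. The following is a minimal sketch, not the code used in this repository. It assumes each line of an hourly dump file has the space-separated form `domain_code page_title count_views total_response_size`, and that the domain codes `cs` and `cs.m` denote desktop and mobile Czech Wikipedia. The function name `cswiki_views` and the sample lines are hypothetical.

```python
from collections import Counter

# Assumption: these domain codes identify Czech Wikipedia (desktop and mobile web).
CSWIKI_DOMAINS = {"cs", "cs.m"}


def cswiki_views(lines):
    """Sum view counts per page title over Czech Wikipedia lines only."""
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed 4-field format
        domain, title, views, _size = parts
        if domain in CSWIKI_DOMAINS and views.isdigit():
            totals[title] += int(views)
    return totals


# Made-up sample lines, for illustration only.
sample = [
    "cs Praha 120 0",
    "cs.m Praha 80 0",
    "en Prague 500 0",
    "cs COVID-19 300 0",
    "broken line",
]
result = cswiki_views(sample)
print(result.most_common())
```

Desktop and mobile counts are merged per title here; keeping them separate would only require using `(domain, title)` as the counter key.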

Edits

Data about edits can be downloaded from Wikimedia Dumps as the mediawiki_history dataset. In the Hadoop cluster, the data are available as wmf.mediawiki_history (docs).
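
As an illustration of the kind of Spark query that could be run against this table, here is a hedged sketch that only builds the SQL text. It is not taken from this repository's notebooks. The column names (`snapshot`, `wiki_db`, `event_entity`, `event_type`, `event_timestamp`) and the example snapshot value `2021-01` are assumptions and should be checked against the table documentation; the helper `monthly_edits_query` is hypothetical.

```python
def monthly_edits_query(wiki_db="cswiki", year=2020, snapshot="2021-01"):
    """Build a SQL string counting new revisions per month for one wiki.

    Assumptions: column names follow the documented mediawiki_history schema,
    and event_timestamp is a string that starts with 'YYYY-MM'.
    """
    year = int(year)  # guard against non-numeric input in the query text
    return (
        "SELECT substr(event_timestamp, 1, 7) AS month, COUNT(*) AS edits\n"
        "FROM wmf.mediawiki_history\n"
        f"WHERE snapshot = '{snapshot}'\n"
        f"  AND wiki_db = '{wiki_db}'\n"
        "  AND event_entity = 'revision'\n"
        "  AND event_type = 'create'\n"
        f"  AND event_timestamp LIKE '{year}-%'\n"
        "GROUP BY substr(event_timestamp, 1, 7)\n"
        "ORDER BY month"
    )


query = monthly_edits_query()
print(query)
```

On the cluster, a string like this would typically be passed to a Spark session, for example `spark.sql(query).toPandas()`, assuming a session named `spark` is available in the notebook.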
