snowick-wmf / Editing-movement-metrics

Calculation of monthly movement-level metrics related to editing activity


This repo contains all the code needed to calculate the monthly Wikimedia movement metrics related to content and contributors. It has three main dependencies:

  • This code is designed to run on one of the SWAP servers and will not work elsewhere.
  • The contributors-related metrics are calculated from the mediawiki_history dataset.
  • The content-related metrics are calculated from the AQS API.
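As a rough illustration of the AQS dependency, the sketch below builds a request URL for one public AQS route (new pages per month). The helper name `new_pages_url`, the choice of route, and the default parameter values are assumptions made for this example; the notebooks in this repo may query different routes or use a client library instead.

```python
# Minimal sketch: build an AQS (Analytics Query Service) request URL.
# Assumption: the "edited-pages/new" route is used here only as an example;
# it is not necessarily the route the notebooks in this repo call.

AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def new_pages_url(
    project,
    start,
    end,
    editor_type="all-editor-types",
    page_type="content",
    granularity="monthly",
):
    """Return the AQS URL for new-page counts on `project`.

    `start` and `end` are dates in YYYYMMDD format.
    """
    return (
        f"{AQS_BASE}/edited-pages/new/{project}/"
        f"{editor_type}/{page_type}/{granularity}/{start}/{end}"
    )


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(new_pages_url("en.wikipedia.org", "20200101", "20200201"))
```

The resulting URL can be fetched with any HTTP client. Building the URL separately from the request keeps the example runnable without network access.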

For more details about our monthly reporting process, see mw:Product Analytics/Movement metrics.

For a full list of metric definitions, see mw:Wikimedia Product/Data dictionary.

Usage

  1. Clone this repo onto one of the SWAP hosts.
  2. In any order, run the two notebooks numbered 01, which build the editor-month and new editor tables used in the next step.
  3. In any order, run the two notebooks numbered 02:
    • 02a-calculation.ipynb, which actually calculates the metrics (some of them using the editor-month and new editor tables calculated in the previous step) and inserts them into metrics.tsv.
    • 02b-diversity-calculation.ipynb, which calculates the diversity metrics (some of them using the editor-month and new editor tables calculated in the previous step) and inserts them into diversity_metrics.tsv.
  4. Run the notebook 03-report.ipynb, which does a few simple transformations on the metrics and produces the table of values needed for the final report, as well as a graph of each metric.
  5. Run the notebook 04-Visualiaztion.ipynb, which produces year-over-year (YoY) charts for the metrics in the metrics deck.
  6. Do any analysis you need to understand major trends (drawing on the analysis notes in past months' slides if needed). The analysis folder has a variety of notebooks you could reuse; if you do new analysis, consider keeping it in an existing or new notebook in this folder so it can be reused in the future.
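The run order described above can be sketched as a small helper that groups notebooks by their two-digit prefix, so that notebooks within a stage may run in any order while the stages themselves run in sequence. This is a hypothetical convenience script, not part of the repo: the function names `plan_runs` and `nbconvert_command` are made up for illustration, and it assumes Jupyter's `nbconvert` command is available on the host.

```python
import re
from collections import defaultdict


def plan_runs(filenames):
    """Group notebooks by two-digit stage prefix and return the stages in order.

    Files that do not look like "NN-name.ipynb" or "NNx-name.ipynb" are ignored.
    """
    stages = defaultdict(list)
    for name in filenames:
        match = re.match(r"(\d{2})[a-z]?-.*\.ipynb$", name)
        if match:
            stages[match.group(1)].append(name)
    # Stages run in numeric order; order within a stage does not matter,
    # so it is sorted only to make the output deterministic.
    return [sorted(stages[key]) for key in sorted(stages)]


def nbconvert_command(notebook):
    """Build the command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        notebook,
    ]


if __name__ == "__main__":
    # Notebook names taken from the steps above; the 01 notebooks would be
    # picked up by the same pattern when listing the repo directory.
    notebooks = [
        "03-report.ipynb",
        "02b-diversity-calculation.ipynb",
        "04-Visualiaztion.ipynb",
        "02a-calculation.ipynb",
    ]
    for stage in plan_runs(notebooks):
        for notebook in stage:
            # Print rather than execute, so the sketch has no side effects.
            print(" ".join(nbconvert_command(notebook)))
```

To actually execute the notebooks, each command list could be passed to `subprocess.run(..., check=True)`. Running them interactively in Jupyter, as the steps above describe, remains the simpler option when you want to inspect intermediate output.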
