Periodic Sampling (COVID-19 Data)

This package uses Bayesian inference and Gibbs sampling to explore periodic trends in real or synthetic Covid-19 case data.

Data Files

Time series may be taken from real Covid-19 data, or generated synthetically from a renewal model.

Real Data

We import Covid-19 case and death data from the Johns Hopkins database. This data is uploaded into separate .csv files on a daily basis, so routines in the analysis module are provided to generate location-specific files covering the history of the pandemic.

For example:

from analysis import generate_location_df

input_dir = "COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/"
location_key = "England, United Kingdom"

country_df = generate_location_df(input_dir, location_key)
country_df.to_csv("data/England_data.csv")

More detailed examples (along with cleaning procedures for the data) are given in data_trends.ipynb. Currently these procedures are not packaged into a separate method, but this may be updated in the future.

Further information about this data (such as collection methods) can be found in a dedicated README. Pre-generated example data files are also available.

Periodic Reporting Trends

In this data we typically observe a strong oscillatory trend, as depicted in both the case and death data from England, UK. The raw daily data is given in grey, with a 7-day moving average (as used in most publications) superimposed in colour.

There are consistent over/under reporting trends on particular weekdays across the duration of the pandemic. These may be quantified through a reporting factor, given by the ratio of observed cases on a given day to the 7-day average about that day. The distribution of reporting factors for each dataset is shown in the accompanying figure.
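The reporting factor defined above can be computed directly from a daily series. The following is a minimal sketch in plain Python, not the routine used in this package: the function name `reporting_factors` is hypothetical, and we assume the 7-day average is centred on the day in question (three days either side), with edge days left undefined.

```python
def reporting_factors(cases, window=7):
    """Ratio of each day's count to the centred `window`-day mean about it.

    Days without a complete window (at either end of the series), or with a
    zero mean, are returned as None.
    """
    half = window // 2
    factors = []
    for t in range(len(cases)):
        if t < half or t + half >= len(cases):
            factors.append(None)  # incomplete window at the edges
            continue
        mean = sum(cases[t - half:t + half + 1]) / window
        factors.append(cases[t] / mean if mean > 0 else None)
    return factors


# Illustrative (made-up) series with a repeating weekly pattern
daily_cases = [50, 140, 120, 110, 110, 110, 60] * 3
factors = reporting_factors(daily_cases)
```

A factor below one indicates under-reporting on that day relative to the surrounding week, and a factor above one indicates over-reporting.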

Synthetic Data

It is also possible to generate synthetic pandemic data using a renewal model framework. Various reporter functions are provided alongside the model; these can return or save the data in .csv format, and can apply reporting biases to replicate the trends described above.

An example of this process is given below:

from synthetic_data import RenewalModel, Reporter

model = RenewalModel(R0=0.99)
model.simulate(T=200, N_0=500)

rep = Reporter(model.case_data)
truth_df = rep.unbiased_report()
bias_df = rep.fixed_bias_report(bias=[0.5, 1.4, 1.2, 1.1, 1.1, 1.1, 0.6],
                                multinomial_dist=True)

This would generate the following data:

All functions have complete docstrings to record their functionality and expected arguments. Further detail is also given in the README for the periodic_sampling module.
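To illustrate the idea behind a fixed weekday bias, the sketch below scales each day's true count by a factor that depends only on its position in the week. This is an assumption-laden illustration rather than the logic of `Reporter.fixed_bias_report`: the helper name `apply_weekday_bias` and the `start_weekday` argument are hypothetical, counts are simply rounded, and the `multinomial_dist` option is not reproduced.

```python
def apply_weekday_bias(true_cases, bias, start_weekday=0):
    """Scale each day's count by the bias factor for its weekday.

    `bias` holds seven multiplicative factors; `start_weekday` is the index
    into `bias` that applies to the first day of the series.
    """
    if len(bias) != 7:
        raise ValueError("bias must contain exactly seven weekday factors")
    return [
        round(count * bias[(start_weekday + t) % 7])
        for t, count in enumerate(true_cases)
    ]


# Illustrative (made-up) flat series of 100 cases per day for two weeks
weekday_bias = [0.5, 1.4, 1.2, 1.1, 1.1, 1.1, 0.6]
flat_cases = [100] * 14
biased_cases = apply_weekday_bias(flat_cases, weekday_bias)
```

The seven factors used here sum to seven, so on a flat series the weekly total is unchanged and only the distribution of cases across weekdays is altered.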

Inference Methods

Both Metropolis-Hastings and Gibbs sampling methods are implemented for use in Bayesian inference. These have separate parameter and sampling classes, but a combined ('mixed') sampling method is also implemented to allow inference on multiple parameters of different types. We also utilise independent sampling for the discrete case values in inference of the ground truth time series.

This flexible implementation is applicable to a wide range of problems, with some examples from Ben Lambert's "A Student's Guide to Bayesian Statistics" given in exampler.ipynb. These methods are then applied to the inference of the true time series from the biased time series, under various assumptions described in a separate README.
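For readers unfamiliar with Gibbs sampling, the following is a minimal, self-contained sketch on a textbook target: a standard bivariate normal with correlation rho, where each full conditional is itself normal. It does not use the parameter or sampler classes from this package, and the function name `gibbs_bivariate_normal` is hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each coordinate is drawn in turn from its full conditional:
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
```

Because every draw comes from an exact conditional distribution, no proposal is ever rejected; this is the main practical difference from Metropolis-Hastings, which is needed when the conditionals cannot be sampled directly.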

We also introduce a number of methods in Stan using the No-U-Turn Sampler (NUTS), to handle larger populations without the computational limits imposed on our mixed sampler by the use of independent sampling on the time series. An example of predictions for the time series and reproduction number profile (based on the posterior mean) is given below:

KCGallagher / periodic-sampling