k-sys / covid-19

A collection of work related to COVID-19

Rt estimates early in outbreaks are low

batson opened this issue

Most estimates of R0 for COVID-19 are above 2 (e.g., the Los Alamos paper on Wuhan).

I would expect the Rt estimates for most states to begin around R0 (before any responses were made) and then decrease as societal and personal behavior changed.

On epiforecasts.io, this is the case: the 50% credible interval is above 2 for the first week in each of the states they highlight, and the median estimate on March 9 in New Jersey, Illinois, and Pennsylvania is around 2.

In the current rt.live model, all point estimates for Rt are below 2, including on March 9.

There are many potential causes, including the value of sigma used in the Brownian motion and edge effects. To diagnose this, it may be worth simulating some data in which Rt begins at 2.5 and then decreases steadily over a month to 1 or below. Besides, a simulation or two would make a good sanity check for the model in general!
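
A minimal sketch of that sanity check, assuming a discrete renewal-equation simulator (a swapped-in choice; the rt.live notebooks may generate test data differently). The gamma serial interval with shape 12 and rate 2 (mean 6 days) follows a suggestion made later in this thread; `serial_interval_weights`, `simulate_renewal`, the 10 seed cases, and the exact Rt schedule are all hypothetical.

```python
import numpy as np

def serial_interval_weights(shape=12.0, rate=2.0, max_days=20):
    # Approximate a gamma serial interval (mean = shape / rate days) by
    # evaluating its density at days 1..max_days and normalizing.
    s = np.arange(1, max_days + 1, dtype=float)
    w = s ** (shape - 1) * np.exp(-rate * s)
    return w / w.sum()

def simulate_renewal(rt, w, seed_cases=10.0, rng=None):
    # Discrete renewal equation: expected cases on day t are
    # R_t * sum_s w[s-1] * cases[t - s]. With rng=None the expected
    # (noise-free) incidence is returned; otherwise Poisson counts are drawn.
    n, m = len(rt), len(w)
    cases = np.zeros(n)
    cases[0] = seed_cases
    for t in range(1, n):
        past = cases[max(0, t - m):t][::-1]  # cases[t-1], cases[t-2], ...
        lam = rt[t] * np.dot(w[:len(past)], past)
        cases[t] = lam if rng is None else rng.poisson(lam)
    return cases

# Hypothetical scenario: Rt falls linearly from 2.5 to 1.0 over 30 days,
# then stays at 1.0 for another 15 days.
rt = np.concatenate([np.linspace(2.5, 1.0, 30), np.full(15, 1.0)])
w = serial_interval_weights()
expected = simulate_renewal(rt, w)
observed = simulate_renewal(rt, w, rng=np.random.default_rng(42))
```

Fitting the model to `observed` (optionally after adding a reporting delay) and comparing its posterior Rt against `rt` would show whether the early estimates are pulled below 2.5.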

Very cool to see all the progress in the last week, btw.

Looking into this, thanks for the note. I'm in contact with the epiforecasts folks. I can't tell yet whether it's a prior issue or something else. Generally, though, we converge on the most recent values. By the way, I'm assuming this is for the MCMC version?

Awesome, glad to hear you and the epiforecasts folks are working together 🤝!

This was for the MCMC version posted yesterday. (@mnielsen actually flagged the issue on Twitter and I confirmed.)

The prior for the serial interval is set too low; you may want to try something like alpha=12, beta=2 for that gamma distribution (the mean is alpha/beta, i.e. 6 days). You may also want to set the prior for theta a bit higher. I tried making these adjustments, and they do make a difference in the final values: starting with the higher priors, the final estimates are 0.1 to 0.2 higher.
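
To see why the serial-interval prior shifts the estimates, here is a sketch of the standard relation between an exponential growth rate r and R for a gamma-distributed generation interval (Wallinga & Lipsitch, 2007), treating the serial interval as the generation interval. This is a general result, not necessarily how the rt.live MCMC model links the two; the growth rate of 0.2/day and the shorter 4-day comparison interval are hypothetical.

```python
import math

def gamma_mean_sd(shape, rate):
    # Gamma(shape=alpha, rate=beta): mean alpha/beta, variance alpha/beta**2.
    return shape / rate, math.sqrt(shape) / rate

def growth_rate_to_R(r, shape, rate):
    # R = 1 / M(-r), where M is the moment-generating function of the
    # generation interval. For a gamma(shape, rate) interval,
    # M(-r) = (1 + r / rate) ** (-shape), so R = (1 + r / rate) ** shape.
    return (1.0 + r / rate) ** shape

r = 0.2                                 # hypothetical growth rate per day (doubling in ~3.5 days)
mean_6 = gamma_mean_sd(12, 2)           # alpha=12, beta=2: mean 6 days, sd ~1.73 days
R_mean_6 = growth_rate_to_R(r, 12, 2)   # ~3.14
R_mean_4 = growth_rate_to_R(r, 12, 3)   # hypothetical 4-day mean: ~2.17
```

For the same observed growth, a longer mean interval implies a larger R, which fits the direction of the change reported in this comment.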

I have run into the same issue with the MCMC model adapted to European data. Rt is underestimated early in the outbreak for all countries. I've reproduced the problem on synthetic data, so I suspect it's a bug.

@gkossakowski Same here: I adapted it to European data and the estimates look a little low.

Would you mind sharing how you generated your toy data? And when you tested on the generated data, was Rt also underestimated?

The model underestimates R0 because it does not include asymptomatic and undetected cases. I created graphs and estimation figures for NSW, Australia.

The current model certainly underestimates R0 in the early stages of the outbreak, which is quite concerning. My model resolves this issue by also modeling undetected cases. More details are in https://github.com/https-seyhan/COVID-19/wiki/Real-time-COVID-19-Infection-Prediction-of-Australian-NSW-population-using-Bayesian-Approach-and-Improved-Poisson-Likelihood-Function

The model I introduced improves the Bayesian approach significantly: it solves the omitted-variable bias by also modeling the uncertainty of undetected and asymptomatic cases.

Best,

Seyhan

@bastienboutonnet the MCMC model trained on European data: https://github.com/gkossakowski/covid-19/blob/master/Realtime%20Rt%20mcmc.ipynb

The generated data came from a simple SIR model with a predefined R (held fixed over stretches of time to simplify the analysis). I expected the model to recover R from the case numbers, but we found that R was underestimated. On further investigation, we realized that the case -> onset transformation was the culprit. Surprisingly, what happens in the far tail of the pdelay distribution (as defined in the notebook) has an outsized impact on the R estimates.
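
A minimal sketch of the case -> onset step, assuming it is a backward convolution of confirmed counts with the delay distribution (the notebook's own implementation may differ in details such as date alignment or the right-censoring adjustment). The helper `confirmed_to_onset`, the exponential case curve, and both delay kernels are hypothetical.

```python
import numpy as np

def confirmed_to_onset(confirmed, p_delay):
    # Hypothetical helper: spread each day's confirmations back over onset
    # dates, with p_delay[d] = P(onset-to-confirmation delay == d days).
    # Index 0 of the result is len(p_delay) - 1 days before the first
    # confirmation day.
    confirmed = np.asarray(confirmed, dtype=float)
    p_delay = np.asarray(p_delay, dtype=float)
    m = len(p_delay)
    onset = np.zeros(len(confirmed) + m - 1)
    for t, count in enumerate(confirmed):
        # A case confirmed on day t with delay d had onset on day t - d,
        # stored at index t - d + (m - 1); that is p_delay reversed.
        onset[t:t + m] += count * p_delay[::-1]
    return onset

days = np.arange(40)
confirmed = np.round(5 * np.exp(0.15 * days))   # hypothetical growing counts

short = 0.7 ** np.arange(15)                    # delays of 0..14 days
short /= short.sum()
# Same bulk, plus 3% of the mass spread evenly over delays of 15..44 days.
long_tail = np.concatenate([0.97 * short, np.full(30, 0.03 / 30)])

onset_short = confirmed_to_onset(confirmed, short)
onset_long = confirmed_to_onset(confirmed, long_tail)

# Onset mass assigned to dates before the first confirmation day.
pre_short = onset_short[:len(short) - 1].sum()
pre_long = onset_long[:len(long_tail) - 1].sum()
```

Because counts are growing, the small tail of `long_tail` moves a disproportionate number of recent confirmations onto early onset dates, where baseline counts are small. This is one plausible way the far tail could distort early Rt, offered as an illustration rather than as the diagnosis from the notebook.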

Have you been digging more into this yourself?