Pathologic Simulated Curves
jrgant opened this issue · comments
Curve simulations in curvefit.R seem to be working, more or less. The code still needs some refactoring to handle a large number of repetitions.
However, some sampled parameter sets appear to produce pathologic (i.e., impossible) curves. Note two sorts of bad curves evident in the plots below:
- Massive spike in hospitalizations during week 1 (epiweek 40)
- Negative hospitalization values (impossible)
Two possibilities:
- I've implemented something incorrectly in translating the Brooks et al. sampling algorithm/equation into the hospitalization context (as opposed to the wILI curves they modeled).
- The curve equation needs to be altered in some way so as to constrain it to [0, Inf].
Current kludge-y fix is to calculate hospitalizations in week i as max[0, f(i)] to avoid negative predictions.
@kmcconeghy and I are going to modify f(i) so that predictions are naturally constrained to [0, Inf].
See updated analysis plan. I add a transformation of the simulated hospitalization count yhat(i) as follows: 0.5 * [|yhat(i)| + yhat(i)]. Functionally, I'm pretty sure this is the same as doing max[0, yhat(i)].
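The equivalence can be checked directly: when yhat(i) >= 0, |yhat(i)| = yhat(i) and the expression returns yhat(i); when yhat(i) < 0, |yhat(i)| = -yhat(i) and the expression returns 0. A minimal R sketch, assuming a made-up vector of simulated counts (the values and the variable names `yhat`, `abs_form`, and `max_form` are illustrative and not taken from curvefit.R):

```r
# Hypothetical simulated counts, including negative (impossible) values.
yhat <- c(-12.5, -1, 0, 0.4, 7, 230)

# Transformation from the analysis plan: 0.5 * [|yhat| + yhat].
abs_form <- 0.5 * (abs(yhat) + yhat)

# Element-wise max(0, yhat); pmax() is the vectorized form of max().
max_form <- pmax(0, yhat)

# Compare the two forms (all.equal() tolerates floating-point noise).
all.equal(abs_form, max_form)
```

Either form should be usable interchangeably; `pmax(0, yhat)` is probably the easier one to read in code.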
My instinct is that we should still figure out an appropriate transformation for the error distribution (currently Gaussian with mean 0). Otherwise, any negative hospitalization count produced by f(i) + ε, ε ~ N(0, σ^2), gets transformed to 0, and we lose the random error.
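To make the concern concrete, here is a rough simulation of what clamping does when f(i) is small relative to the error SD. The values of `f_i` and `sigma` are invented for illustration and are not the ones used in curvefit.R:

```r
set.seed(1)                       # reproducible draws

f_i   <- 2                        # hypothetical expected count in week i
sigma <- 5                        # hypothetical error SD
eps   <- rnorm(10000, mean = 0, sd = sigma)

raw     <- f_i + eps              # f(i) + epsilon, can go negative
clamped <- pmax(0, raw)           # max[0, yhat(i)] applied after adding error

# Share of draws collapsed to exactly 0, plus the means before and after.
frac_zero <- mean(clamped == 0)
c(frac_zero = frac_zero, mean_raw = mean(raw), mean_clamped = mean(clamped))
```

Under these assumptions the share of zeros should sit close to `pnorm(-f_i / sigma)`, and the clamped mean ends up above the unclamped mean, so the clamp both removes variation in the lower tail and shifts the average upward.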
I suspect the goal should be this: 0.5 * [|f(i)| + f(i)] + ε, ε ~ dist(x).
Let me know if you have any ideas.
The problem with the curves has been fixed. I was erroneously labeling the integer weeks for each simulated curve as 1:31. However, because of the curve stretching parameter, the predictions actually covered integer weeks from -10 to 40 across the 100 simulations I ran during the fix.
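A toy illustration of how this kind of mislabeling can arise. The function `stretch_weeks()` and the parameters `nu` (stretch), `mu` (shift), and `peak_week` are hypothetical stand-ins, not the actual Brooks et al. transformation or the code in curvefit.R:

```r
# Hypothetical template curve defined on 31 integer weeks.
template_weeks <- 1:31
peak_week      <- 15

# Hypothetical stretch (nu) and shift (mu) of the week axis around the peak.
stretch_weeks <- function(weeks, peak, nu, mu) {
  round(nu * (weeks - peak) + peak + mu)
}

stretched <- stretch_weeks(template_weeks, peak_week, nu = 1.5, mu = 2)

# The stretched weeks no longer coincide with 1:31, so labeling the
# predictions with a hard-coded 1:31 would attach them to the wrong weeks.
week_range <- range(stretched)
mislabeled <- !all(stretched %in% template_weeks)

# Safer: carry the computed week alongside each prediction, then filter.
sim <- data.frame(week = stretched, in_season = stretched %in% template_weeks)
```

Keeping the computed week in the same data frame as the prediction, and filtering on it afterwards, avoids relying on row position to identify the week.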