idontgetoutmuch / abm-particle-filter


Next Steps

idontgetoutmuch opened this issue

  1. Try running the model with several different seeds and then either plotting all the results (this might become messy) or plotting just the mean together with two lines two standard deviations above and below it. You will have to calculate the mean and standard deviation at each time point; Julia's Statistics standard library provides mean and std for this. See the sketch after this list.
  2. Try https://juliadynamics.github.io/Agents.jl/dev/examples/optim/ to see if you can find the "optimal" parameters and then compare the simulation(s) with these parameters against the actual data.
  3. Optionally, try playing with the parameters to see if you can get a fit "by eye". β is the infection rate and γ is the recovery rate (I am using the word "rate" very loosely here).
  4. Try implementing a particle filter. I'll put some description in a comment below.
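
For step 1, a minimal sketch, assuming a hypothetical run_model(seed) that runs the ABM with the given seed and returns a vector of total infected counts, one per time step:

using Statistics
using Plots

# run_model is hypothetical: assumed to run the ABM with the given
# seed and return a vector of total infected counts per time step.
n_runs = 50
runs = [run_model(seed) for seed in 1:n_runs]
M = hcat(runs...)                    # time points x runs

mu = mean(M, dims=2)[:, 1]           # mean at each time point
sig = std(M, dims=2)[:, 1]           # standard deviation at each time point

Plots.plot(mu, label="mean")
Plots.plot!(mu .+ 2 .* sig, label="mean + 2 sd", linestyle=:dash)
Plots.plot!(mu .- 2 .* sig, label="mean - 2 sd", linestyle=:dash)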

IMPORTANT: #1 (comment)

1 Particle Filtering Method

Particle filtering [1, 2] is a popular method for estimating the state of a noisy process from noisy observations. In most expositions of the method, for example [1] and [2], the underlying Markov process is assumed to be given by a probability density function. This is not necessary: all that is needed is for the process to be Markov and for it to be possible to sample from the process; see [3] and [4].
From a mathematical point of view, these methods can be interpreted as stochastic numerical approximations of Feynman-Kac measures, which represent the distribution of the paths of a reference Markov process weighted by a collection of potential functions.

1.1 Bootstrap Particle Filter

We describe the simplest particle filter: the Bootstrap Particle Filter (BPF). Although we want to estimate the parameters of the model, we defer this and explain how the BPF can be used to estimate the state of a model. In words:

  1. Given an initial prior distribution for the parameters, we sample this distribution n times and simulate the model to the first data observation. These samples are traditionally known as particles.
  2. At this time point (the first data observation), we calculate the residual between the data and the model observation for each of the n particles and, using this residual, the probability that each particle produced the observation (see the sketch below).
  3. Using these probabilities, we resample the particles and simulate the model to the next data observation.
  4. We repeat steps 2–3 until the data is exhausted.

N.B. it is important that the model is not deterministic; if it is, noise must be added artificially, otherwise we end up repeatedly resampling a single particle (so-called particle collapse). In our case the ABM is stochastic, so no such artificial noise need be introduced.
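
To make step 2 concrete, here is a minimal sketch of the weighting step, assuming Gaussian observation noise; the numbers and the noise level sigma are made up for illustration:

using Distributions

y_obs = 25.0                              # hypothetical observed data point
predicted = [23.1, 30.7, 24.9, 18.2]      # hypothetical model observation per particle
sigma = 2.0                               # assumed observation noise

# Probability (density) that each particle produced the observation,
# computed in log space; subtracting the maximum avoids underflow.
log_w = logpdf.(Normal.(predicted, sigma), y_obs)
w = exp.(log_w .- maximum(log_w))
w = w / sum(w)                            # normalised resampling weights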

Algorithm

[image: pseudocode for the bootstrap particle filter]

1.2 Parameters

[image: parameter estimation]

References
[1] Arnaud Doucet and Adam M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2009.
[2] S. Särkkä. Bayesian Filtering and Smoothing. Cambridge University Press, 2013.
[3] N. Chopin and O. Papaspiliopoulos. An Introduction to Sequential Monte Carlo. Springer Series in Statistics. Springer International Publishing, 2020.
[4] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer, New York, 2004.
[5] Nikolas Kantas, Arnaud Doucet, Sumeetpal S. Singh, Jan Maciejowski, and Nicolas Chopin. On particle methods for parameter estimation in state-space models. Statistical Science, 30(3):328–351, Aug 2015.
[6] Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. Bayesian workflow, 2020.
[7] Nicolas Chopin, Pierre E. Jacob, and Omiros Papaspiliopoulos. SMC²: an efficient algorithm for sequential analysis of state-space models, 2012.

Eureka! I had already coded up a particle filter in Julia: https://github.com/idontgetoutmuch/pmh-smc2/blob/master/simple.jl#L117

Hopefully the words above and the code match (they certainly should do).

Black box optimisation does not give an optimal result, at least by eye. This is because the example is optimising to minimise the number of COVID infections. We need a different cost function: one which minimises the difference between the actual and the predicted infections. @idontgetoutmuch to produce a skeleton for such a function; a sketch follows.
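
A minimal sketch of such a cost function, assuming a hypothetical run_model(beta, gamma) that runs the simulation and returns predicted infected counts aligned with the actuals:

# run_model is hypothetical: assumed to run the ABM with the candidate
# parameters and return predicted infected counts, one per observed day.
function cost(beta, gamma, actuals)
    predicted = run_model(beta, gamma)
    return sum((predicted .- actuals) .^ 2)   # sum of squared residuals
end

Minimising this should pull the simulation towards the data rather than towards zero infections.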

Done

Can we produce better multiple line charts? @idontgetoutmuch to investigate.

I used

julia> Plots.plot(model_df_2.step, model_df_2.total_infected)
julia> Plots.plot!(model_df_2.step, actuals[1:11])

But looking here https://docs.juliaplots.org/latest/tutorial/ I think we could do much better.
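
For instance, the two calls above could carry labels, axis names and a title; a sketch using standard Plots.jl attributes:

Plots.plot(model_df_2.step, model_df_2.total_infected,
           label="simulated", xlabel="day", ylabel="number infected",
           title="Simulated vs actual infections", linewidth=2)
Plots.plot!(model_df_2.step, actuals[1:11],
            label="actual", linewidth=2, linestyle=:dash)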

@idontgetoutmuch to write notes on how to use the particle filter code, including e.g. systematic resampling (a stratified resampling sketch is given after the code below).

@hrishika-alaj to produce a chart showing a) the actuals, b) the black-box-optimised infected (and susceptible and recovered), and c) the fit that looks good by eye.

@hrishika-alaj to look at Gadfly and see if it produces better charts (probably yes).

# DJS to change this to accept HA's function which takes S, I, R, beta
# and gamma and returns S, I, R one time step ahead.

using Distributions

function pf(inits, N, f, h, y, Q, R, nx)
    # inits - initial values

    # N - number of particles

    # f is the state update function - for us this is the ABM - takes
    # a state and the parameters and returns a new state one time step
    # forward

    # h is the observation function - for us the state is S, I, R
    # (although S + I + R = total boys) but we can only observe I

    # y is the data - for us this is the number infected for each of
    # the 14 days

    # To avoid particle collapse we need to perturb the parameters -
    # we do this by sampling from a multivariate normal distribution
    # with a given covariance matrix Q

    T = length(y)
    log_w = zeros(T,N);
    x_pf = zeros(nx,N,T);
    x_pf[:,:,1] = inits;
    wn = zeros(N);

    for t = 1:T
        if t >= 2
            a = resample_stratified(wn);

            # We need to change this line to update the state (S I R)
            # with the ABM and update the parameters by adding a
            # sample from a multivariate normal distribution - since
            # we have two parameters this is a 2 dimensional
            # multivariate normal distribution.

            # So take the S I R beta and gamma and produce a new S I R
            # using the ABM and produce new beta and gamma using a
            # random sample.

            # beta is about 2.0 and gamma is about 0.5 so we should
            # probably take the covariance matrix to be

            # [0.1 0.0; 0.0 0.01]

            x_pf[:, :, t] = hcat(f(x_pf[:, a, t-1])...) + rand(MvNormal(zeros(nx), Q), N)
        end

        # For now choose R to be [0.1]

        log_w[t, :] = logpdf(MvNormal(y[t, :], R), h(x_pf[:,:,t]));

        # To avoid underflow subtract the maximum before moving from
        # log space

        wn = exp.(log_w[t, :] .- maximum(log_w[t, :]));
        wn = wn / sum(wn);
    end

    # Log of the estimated marginal likelihood: the sum over time of
    # the log mean unnormalised weight, again using the max-subtraction
    # trick to avoid underflow.
    log_W = sum(1:T) do t
        m = maximum(log_w[t, :])
        m + log(sum(exp.(log_w[t, :] .- m))) - log(N)
    end

    return(x_pf, log_w, log_W)

end
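
Note that pf calls resample_stratified, which is not shown above. A minimal sketch of stratified resampling, assuming the input weights are already normalised to sum to one:

# Stratified resampling: one uniform draw per stratum [(j-1)/N, j/N),
# then an inverse-CDF lookup against the cumulative weights. This has
# lower variance than plain multinomial resampling because each
# stratum contributes exactly one draw.
function resample_stratified(w)
    N = length(w)
    u = ((0:N-1) .+ rand(N)) ./ N   # sorted uniforms, one per stratum
    c = cumsum(w)
    idx = zeros(Int, N)
    i = 1
    for j in 1:N
        while i < N && u[j] > c[i]
            i += 1
        end
        idx[j] = i                  # smallest i with c[i] >= u[j]
    end
    return idx
end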

I pushed the functions and they are in flu_blackop_ver.jl

@hrishika-alaj we should put the code into a Jupyter notebook with explanations so Adrianne can read it and try out the code.