wikimedia-research / survey-interval-responses

Estimation with non-linearly scaled interval responses in surveys

Home Page: https://wikimedia-research.github.io/survey-interval-responses/



Mixture distribution is not identifiable

Dpananos opened this issue.

I've coded up a mixture of two normal distributions for our model, just as a test example.

Under the model, there is a uniform prior on the mixture proportion (i.e., the weights given to each distribution), and the component parameters (means and SDs) are treated as exchangeable. I think these are relatively realistic priors given the problem statement.
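A minimal sketch, in Python rather than Stan, of why exchangeable priors make this mixture hard to sample: the two-normal mixture log-likelihood (here with a shared SD, as in the relaxed model) does not change when the component labels are swapped. With a uniform prior on the weight and exchangeable priors on the means, the posterior therefore has two mirror-image modes ("label switching"). The data are hypothetical simulated values and the function names are illustrative, not the project's code.

```python
import math
import random

def normal_logpdf(x, mu, sd):
    """Log density of a normal distribution N(mu, sd)."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)

def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mixture_loglik(data, w, mu1, mu2, sd):
    """Log-likelihood of a two-component normal mixture with a shared sd.

    Each observation contributes log(w * N(x | mu1, sd) + (1 - w) * N(x | mu2, sd)).
    """
    total = 0.0
    for x in data:
        total += log_sum_exp(
            math.log(w) + normal_logpdf(x, mu1, sd),
            math.log1p(-w) + normal_logpdf(x, mu2, sd),
        )
    return total

# Hypothetical data: roughly 75% from N(0, 1) and 25% from N(3, 1).
random.seed(1)
data = [random.gauss(0.0, 1.0) if random.random() < 0.75 else random.gauss(3.0, 1.0)
        for _ in range(200)]

original = mixture_loglik(data, 0.75, 0.0, 3.0, 1.0)
swapped = mixture_loglik(data, 0.25, 3.0, 0.0, 1.0)  # same mixture, labels swapped
print(math.isclose(original, swapped))  # True: the likelihood cannot tell the labels apart
```

If the means are expected to differ, a common way to break the symmetry is an ordering constraint on them (for example, declaring the means with Stan's `ordered` type). That is one standard option, not necessarily what ended up fixing the model in this thread.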

I've tried to relax the model a little by assuming the two distributions have the same standard deviation but different means. I still get poor mixing (large Rhat values) and divergent transitions.
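To illustrate how label switching shows up as large Rhat values, here is a hypothetical sketch of the classic split-Rhat statistic applied to fake draws of one component mean: two chains that each settled on a different labeling, versus two chains exploring the same mode. The draws are simulated with `random.gauss`, not taken from the actual model, and `split_rhat` is an illustrative helper.

```python
import random
import statistics

def split_rhat(chains):
    """Classic split-Rhat for a list of equal-length chains of one scalar parameter."""
    # Split each chain in half so that drift within a chain also inflates Rhat.
    halves = []
    for chain in chains:
        n = len(chain) // 2
        halves.append(chain[:n])
        halves.append(chain[n:2 * n])
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    between = n * statistics.variance(means)  # between-chain variance B
    within = statistics.fmean([statistics.variance(h) for h in halves])  # mean within-chain variance W
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

random.seed(2)
# Two chains stuck on different labelings: one finds mu1 near 0, the other near 3.
stuck = [[random.gauss(0.0, 0.1) for _ in range(1000)],
         [random.gauss(3.0, 0.1) for _ in range(1000)]]
# Two chains exploring the same mode.
mixed = [[random.gauss(0.0, 0.1) for _ in range(1000)] for _ in range(2)]

print(split_rhat(stuck))  # far above 1
print(split_rhat(mixed))  # close to 1
```

Recent Stan releases report a rank-normalized variant of split-Rhat, so its numbers will not match this sketch exactly, but the qualitative behavior is the same: chains sitting in different label modes push Rhat well above 1.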

Maybe we should reconsider the mixture example? Or we could keep it, but it would take me some time to get it working right. Maybe I can reach out to the folks over at the Stan forums for help.

I just created a pull request with some simulated data if you want to give it another try with that, but otherwise I'm a-OK with moving the mixture idea to the extensions/further work list.

What do you think of changing the mixture to a regression? In practice these kinds of surveys are likely to collect demographic info and other data that can be used to compare differences between groups of responders.

I think I just got the mixture working. Regression is a natural extension too, though.

Might it make sense to rethink the data we use for the simulation? Here is the mixture density we are using right now. It doesn't look much like a mixture, which I think is why the model is having a tough time finding it.

Could we think of a story based on the 1% (the top 1% owning more than the remaining 99% of the population), or might it make sense to base the data on some real income statistics? If that is too onerous, maybe we can just can it. I'll let you decide, as I think you've got more skin in the game than me.

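```r
x <- seq(0, 5e5, by = 10)  # grid from 0 to 500,000
# 75/25 mixture of lognormal(meanlog = 9, sdlog = 1) and lognormal(meanlog = 11, sdlog = 2)
plot(x, 0.75 * dlnorm(x, 9, 1) + 0.25 * dlnorm(x, 11, 2), type = "l")
```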

[Image: plot of the simulated mixture density over x from 0 to 500,000]
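One way to see why this density "doesn't look much like a mixture": a lognormal's mode is exp(meanlog - sdlog^2), which puts the two components' peaks at about e^8 ≈ 2,981 and e^7 ≈ 1,097, fairly close together, while the second component (sdlog = 2) is very spread out. The Python sketch below re-evaluates the same mixture on the same grid as the R snippet and counts its peaks; `dlnorm` here is a hand-written stand-in mirroring R's function, and `lnorm_mode` is an illustrative helper.

```python
import math

def dlnorm(x, meanlog, sdlog):
    """Lognormal density, mirroring R's dlnorm(x, meanlog, sdlog); zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (x * sdlog * math.sqrt(2 * math.pi))

def lnorm_mode(meanlog, sdlog):
    """Mode of a lognormal distribution: exp(meanlog - sdlog**2)."""
    return math.exp(meanlog - sdlog ** 2)

# Same grid and mixture as the R snippet: x = seq(0, 5e5, 10).
xs = [10 * i for i in range(50001)]
dens = [0.75 * dlnorm(x, 9, 1) + 0.25 * dlnorm(x, 11, 2) for x in xs]

# Interior local maxima of the mixture density on the grid.
peaks = [xs[i] for i in range(1, len(xs) - 1)
         if dens[i - 1] < dens[i] > dens[i + 1]]

print(round(lnorm_mode(9, 1)), round(lnorm_mode(11, 2)))  # 2981 1097
print(len(peaks))  # 1
```

On this grid there is a single peak, lying between the two component modes, so the mixture reads as one skewed, heavy-tailed bump rather than two distinct humps, which leaves the sampler little structure to separate the components.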

We decided to can it after giving it a solid try.

Good effort, @Dpananos!