[Request] Alternative parameterization for negative binomial distributions?

Question

diaorch opened this issue 5 months ago · comments

Currently, the struct NegativeBinomial uses the parameters r and p, as in the interpretation of negative binomial distribution as the distribution of the number of failures in a sequence of Bernoulli trials with probability of success p that continue until r successes occur.

Another way to describe a negative binomial distribution is through the mean and the size/dispersion parameters. This is worth implementing especially given that a lot of the statistical questions center around the mean of the distribution.

r and p can be calculated from the mean and the dispersion parameter.
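For reference, the conversion can be written out as below. This is a sketch under the failures-before-r-successes convention described above, with μ for the mean, σ² for the variance, and the size parameter taken to be r itself. Some texts instead report the dispersion as 1/r, so the naming here is an assumption.

```latex
\mu = \frac{r(1-p)}{p}, \qquad
\sigma^2 = \frac{r(1-p)}{p^2} = \mu + \frac{\mu^2}{r}
\quad\Longrightarrow\quad
p = \frac{r}{r+\mu}, \qquad
r = \frac{\mu^2}{\sigma^2 - \mu} \quad (\sigma^2 > \mu).
```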

Perhaps the alternative parameterization of negative binomial distributions can be a useful addition to the crate's functionality?

Orion Yeung · Answer 1 · Thu Feb 29 2024 23:28:33 GMT+0800 (China Standard Time)

I expect that you'd want this at initialization, but I don't have the background to know when else a different parametrization would be useful. Can you help me understand such a scenario?

diaorch · Answer 2 · Tue Apr 02 2024 03:35:29 GMT+0800 (China Standard Time)

Thank you for getting back to me!

Here's a quick summary of the parameterizations we are discussing: the current parameterization of the negative binomial distribution, as implemented, uses r and p, which have specific meanings in terms of Bernoulli trials. The alternative parameterization I am interested in uses the mean and dispersion as parameters, which is described in more detail in the last two paragraphs of "Alternative formulation" on the Wikipedia page for the negative binomial distribution. Also, I appreciate the warning in the current documentation to note carefully the meaning of the parameters.

The mean-and-dispersion is more widely used in regression analysis, because the explanatory variables can be linked to the mean, similar to linear regression. This is also, in my opinion, more intuitive when it comes to interpreting the effects of explanatory variables on the negative binomial counts. (Personally, I don't think I have seen negative binomial regression done with the r-and-p parameterization.) In my work, having the mean in parameterization also helps me compare a negative binomial distribution with a Poisson distribution of the same mean, since I can more quickly evaluate the dispersion level in the negative binomial distribution. I would also imagine that the mean-and-dispersion parameterization is more useful outside of a frequentist statistics context.

The parameters, or statistics like means and variances, can be derived using the current parameterization. In my own code, I just wrote a separate struct that "wraps" the calls to statrs::distribution::NegativeBinomial, with calculations to convert between the r-and-p parameterization and the mean-and-dispersion parameterization. I imagine that something similar can be done natively within statrs, which would make the use of the negative binomial distribution more consistent, and it shouldn't break anything. I also understand that it may not be a priority, since conversion on the user end, such as the one I have done, is reasonably doable.
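A minimal sketch of what such a wrapper might look like is below. It is not the actual struct mentioned in this comment: the name `MeanSizeNegBin` and its methods are hypothetical, and it assumes the "size" parameter is r itself (with dispersion often reported as 1/size). It has no dependency on statrs, so it only produces the `(r, p)` pair.

```rust
/// Hypothetical wrapper: a negative binomial described by its mean and size.
/// `size` plays the role of `r`; the dispersion is often reported as 1 / size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeanSizeNegBin {
    mean: f64,
    size: f64,
}

impl MeanSizeNegBin {
    /// Returns `None` unless both parameters are finite and strictly positive.
    pub fn new(mean: f64, size: f64) -> Option<Self> {
        let ok = mean.is_finite() && size.is_finite() && mean > 0.0 && size > 0.0;
        if ok { Some(Self { mean, size }) } else { None }
    }

    /// Converts to `(r, p)`, where `p` is the success probability and the
    /// distribution counts failures before the `r`-th success.
    pub fn to_r_p(&self) -> (f64, f64) {
        (self.size, self.size / (self.size + self.mean))
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Variance is mean + mean^2 / size; a Poisson with the same mean
    /// has variance equal to the mean, so the second term is the excess.
    pub fn variance(&self) -> f64 {
        self.mean + self.mean * self.mean / self.size
    }
}
```

The resulting pair could then be handed to the existing r-and-p constructor, and comparing `variance()` against `mean()` gives the overdispersion relative to a Poisson distribution of the same mean.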

tessob · Answer 3 · Tue Apr 02 2024 14:52:20 GMT+0800 (China Standard Time)

This "Alternative Parameterization" is usually called the "Method of moments estimator". Such a parameterization exists for many distributions, so in my opinion it is better expressed as a trait.

Orion Yeung · Answer 4 · Fri Apr 05 2024 02:00:31 GMT+0800 (China Standard Time)

Hmm, were it a trait, how would you define that, i.e. what methods and generics would go with it?

tessob · Answer 5 · Fri Apr 05 2024 16:09:13 GMT+0800 (China Standard Time)

I don't really think such a feature is one of the highest priorities. In any case, I think the implementation should look like the following:

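// Moments a distribution (or a sample) can report; `None` when undefined.
trait StandardizedMoments {
    fn mean(&self) -> Option<f64>;
    fn variance(&self) -> Option<f64>;
    fn skewness(&self) -> Option<f64>;
    fn kurtosis(&self) -> Option<f64>;
}

// Builds a distribution from moments. `Result` is assumed to be the crate's
// single-parameter alias; `Sized` is required to return `Self` by value.
trait MethodOfMomentsEstimator: Sized {
    fn from_moments<M: StandardizedMoments>(moments: M) -> Result<Self>;
}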

Skewness and kurtosis are included because the negative binomial distribution (and not only this distribution) can be fitted not just to the first 2 moments but to the first 4. There are multiple ways to fit data to distributions.
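To make the proposal concrete, here is a rough sketch of the two-moment case for the negative binomial. Everything beyond the two trait names is an assumption for illustration: the traits are trimmed to mean and variance, `MomentsError`, `NegBinParams`, and `MeanVar` are hypothetical, and an explicit error type stands in for whatever the crate would actually use.

```rust
/// The moments trait proposed above, reduced to the two moments used here.
pub trait StandardizedMoments {
    fn mean(&self) -> Option<f64>;
    fn variance(&self) -> Option<f64>;
}

/// Error type assumed for this sketch; the proposal leaves it unspecified.
#[derive(Debug, PartialEq)]
pub enum MomentsError {
    /// A required moment was not provided.
    Missing,
    /// The negative binomial needs variance > mean > 0.
    NotOverdispersed,
}

pub trait MethodOfMomentsEstimator: Sized {
    fn from_moments<M: StandardizedMoments>(moments: M) -> Result<Self, MomentsError>;
}

/// Hypothetical stand-in for the crate's distribution struct.
#[derive(Debug, PartialEq)]
pub struct NegBinParams {
    pub r: f64,
    pub p: f64,
}

/// Simple container for a mean and a variance.
pub struct MeanVar {
    pub mean: f64,
    pub variance: f64,
}

impl StandardizedMoments for MeanVar {
    fn mean(&self) -> Option<f64> {
        Some(self.mean)
    }
    fn variance(&self) -> Option<f64> {
        Some(self.variance)
    }
}

impl MethodOfMomentsEstimator for NegBinParams {
    fn from_moments<M: StandardizedMoments>(moments: M) -> Result<Self, MomentsError> {
        let mean = moments.mean().ok_or(MomentsError::Missing)?;
        let var = moments.variance().ok_or(MomentsError::Missing)?;
        // Written as a negated conjunction so that NaN inputs are rejected too.
        if !(mean > 0.0 && var > mean) {
            return Err(MomentsError::NotOverdispersed);
        }
        // Solve mean = r(1-p)/p and var = r(1-p)/p^2 for r and p.
        Ok(NegBinParams {
            r: mean * mean / (var - mean),
            p: mean / var,
        })
    }
}
```

Fitting to skewness or kurtosis as well would need more than this closed-form solve, which is outside the scope of the sketch.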

Orion Yeung · Answer 6 · Mon Apr 08 2024 00:04:00 GMT+0800 (China Standard Time)

I see this as a way to refer to distributions or estimators with constraints by their moments instead of other parameters. I was expecting something narrower: something that would specify a distribution given its family and the number of moments that fully determine its parameters.

I see usefulness in thinking about moments by specifying mean [and variance [and skew...]] (the notation is bash-style optionals), but I think these are a little far off from where the crate is now:

specifying a moment without specifying all moments of lower order
overconstrained number of moments relative to parameters under some sense of fit

Not opposed to them, but think they would take a significant amount of work.

@diaorch how close is this to what you were wanting?

diaorch · Answer 7 · Mon Apr 08 2024 11:51:45 GMT+0800 (China Standard Time)

The discussion brings up several good points. Firstly, I agree that it's likely not of the highest priority for the crate right now.

Secondly, I wasn't considering generalizing to the Method of Moments Estimator, but I agree that if we are going with the Method of Moments Estimator in general, a trait would be a better choice than, say, a method specific to a distribution struct. As for how exactly the trait should work, I have to admit it's a bit beyond my ability to confidently conclude the discussion.

Hope this is still helpful.

tessob · Answer 8 · Mon Apr 08 2024 23:58:38 GMT+0800 (China Standard Time)

@YeungOnion I don't have a ready-to-implement API design. Maybe instead of a trait it could be a builder pattern... this way the set of parameters can be effectively constrained if, for instance, setting the variance returns an "extended" builder.

As an additional example – Gamma distribution can be parametrized by:

Mean only – and it will be identical to exponential distribution.
Mean and variance – using method of moments from Wiki.
Mean, variance with skewness and/or kurtosis – using numerical optimizer.
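A rough sketch of the builder idea applied to the first two Gamma cases is below. All names (`GammaBuilder`, `WithMean`, `WithMeanVariance`, `GammaParams`) are hypothetical, the shape-and-rate convention is an assumption, and the third case (fitting skewness or kurtosis with a numerical optimizer) is left out.

```rust
/// Gamma parameters as shape and rate (hypothetical stand-in for a real distribution type).
#[derive(Debug, PartialEq)]
pub struct GammaParams {
    pub shape: f64,
    pub rate: f64,
}

/// Builder state once only the mean is known.
pub struct WithMean {
    mean: f64,
}

/// "Extended" builder state returned after the variance is also set.
pub struct WithMeanVariance {
    mean: f64,
    variance: f64,
}

pub struct GammaBuilder;

impl GammaBuilder {
    pub fn mean(mean: f64) -> WithMean {
        WithMean { mean }
    }
}

impl WithMean {
    /// Setting the variance moves to a different type with a different `build`.
    pub fn variance(self, variance: f64) -> WithMeanVariance {
        WithMeanVariance { mean: self.mean, variance }
    }

    /// Mean only: shape fixed to 1, i.e. an exponential with rate 1 / mean.
    pub fn build(self) -> Option<GammaParams> {
        (self.mean.is_finite() && self.mean > 0.0)
            .then(|| GammaParams { shape: 1.0, rate: 1.0 / self.mean })
    }
}

impl WithMeanVariance {
    /// Method of moments: shape = mean^2 / variance, rate = mean / variance.
    pub fn build(self) -> Option<GammaParams> {
        let ok = self.mean.is_finite()
            && self.variance.is_finite()
            && self.mean > 0.0
            && self.variance > 0.0;
        ok.then(|| GammaParams {
            shape: self.mean * self.mean / self.variance,
            rate: self.mean / self.variance,
        })
    }
}
```

Because each state is its own type, the compiler decides which `build` is available: `GammaBuilder::mean(2.0).build()` gives the exponential case, while `GammaBuilder::mean(2.0).variance(1.0).build()` uses both moments.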

Orion Yeung · Answer 9 · Tue Apr 09 2024 00:23:54 GMT+0800 (China Standard Time)

@diaorch that's alright by me, but I think as a user you have some great input for how you'd like it to work as a library, regardless of implementation. Could you share the struct you wrapped the negative binomial distribution in?

Orion Yeung · Answer 10 · Tue Apr 09 2024 00:24:32 GMT+0800 (China Standard Time)

@tessob yeah, I think it will take some digging, perhaps as its own feature request. I also think a builder pattern would work well, as it seems like there are a few things that can come out depending on what info is specified, assuming the family of distribution is specified in every case:

Underconstrained parameters for the distribution. If you also specify a fit function and an optimizer, then you'd specify a parametrized distribution.
Exactly constrained parameters for the distribution. This is equivalent to specifying a distribution, but perhaps not in the parameters we use in the existing constructors.
Overconstrained parameters for the distribution, with a fit function. I don't think this makes sense without a fit function. Then adding an optimizer would specify parameters for a distribution?

Thoughts?