sdv-dev / Copulas

A library to model multivariate data using copulas.

Home Page: https://sdv.dev/Copulas/

Improve the fit of the Beta distribution: Use the new `loc` and `scale`

npatki opened this issue

Problem Description

In the beta univariate fit function, we perform the following steps:

  1. Estimate the loc and scale parameters
  2. Call the scipy fit function using the loc and scale as starting guesses
  3. After the fit is complete, get the values for a and b

The issue is that the scipy fit in step 2 also returns new loc and scale parameters; the values we pass in are only starting guesses. Because we keep the loc and scale from step 1, they are out of sync with the fitted a and b parameters.
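A minimal sketch of the mismatch, assuming synthetic beta-distributed data (the sample parameters and variable names below are illustrative, not taken from the library). scipy's `beta.fit` returns `a`, `b`, `loc`, and `scale` together, so keeping only `a` and `b` pairs them with the step 1 guesses:

```python
import numpy as np
from scipy.stats import beta

# Hypothetical sample data: draws from a shifted and scaled beta distribution.
rng = np.random.default_rng(0)
X = beta.rvs(2.0, 5.0, loc=10.0, scale=4.0, size=500, random_state=rng)

# Step 1: initial estimates of loc and scale from the data range.
loc_guess = np.min(X)
scale_guess = np.max(X) - loc_guess

# Step 2: scipy treats the loc/scale keyword arguments as starting guesses
# and returns all four fitted parameters.
a, b, loc_fit, scale_fit = beta.fit(X, loc=loc_guess, scale=scale_guess)

# Step 3 (current behavior): a and b are paired with the initial guesses.
stale_params = {'a': a, 'b': b, 'loc': loc_guess, 'scale': scale_guess}
# Proposed behavior: keep the loc/scale that were fitted together with a and b.
fitted_params = {'a': a, 'b': b, 'loc': loc_fit, 'scale': scale_fit}

print('initial guesses:', loc_guess, scale_guess)
print('fitted values:  ', loc_fit, scale_fit)
```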

Expected behavior

Stop using the initial guesses for loc and scale. Update them when setting a and b.

I.e., change line 30 so that `_fit` keeps all four parameters returned by `beta.fit`:

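```python
import numpy as np
from scipy.stats import beta


def _fit(self, X):
    # Initial estimates; beta.fit treats these only as starting guesses.
    loc = np.min(X)
    scale = np.max(X) - loc
    # Keep the loc and scale returned by the fit so they match a and b.
    a, b, loc, scale = beta.fit(X, loc=loc, scale=scale)
    self._params = {
        'loc': loc,
        'scale': scale,
        'a': a,
        'b': b
    }
```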
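To try the proposed `_fit` outside the library, it can be placed in a small stand-in class. `BetaUnivariateSketch`, its `cdf` helper, and the sample data are hypothetical; only the body of `_fit` comes from the issue.

```python
import numpy as np
from scipy.stats import beta


class BetaUnivariateSketch:
    """Hypothetical stand-in for the library's Beta univariate; only _fit is modeled."""

    def _fit(self, X):
        # Starting guesses for loc and scale taken from the data range.
        loc = np.min(X)
        scale = np.max(X) - loc
        # Keep all four fitted parameters so loc/scale stay in sync with a/b.
        a, b, loc, scale = beta.fit(X, loc=loc, scale=scale)
        self._params = {'loc': loc, 'scale': scale, 'a': a, 'b': b}

    def cdf(self, X):
        # Hypothetical helper: evaluate the fitted distribution's CDF.
        p = self._params
        return beta.cdf(X, p['a'], p['b'], loc=p['loc'], scale=p['scale'])


rng = np.random.default_rng(1)
X = beta.rvs(3.0, 2.0, loc=-1.0, scale=2.0, size=500, random_state=rng)

model = BetaUnivariateSketch()
model._fit(X)
print(model._params)
```

Because the stored `loc` and `scale` now come from the same optimization as `a` and `b`, any method that reads `self._params` evaluates the distribution scipy actually fitted.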

Additional context

We verified this by comparing our fitted distribution to scipy's. Scipy's fit is better because it actually uses the updated loc and scale parameters.
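The comparison above was visual. As a numeric alternative, the sketch below scores both parameter sets with a Kolmogorov-Smirnov statistic from `scipy.stats.kstest`; the choice of metric and the synthetic data are assumptions, not part of the original report. A smaller statistic means the model CDF is closer to the empirical CDF of the sample.

```python
import numpy as np
from scipy.stats import beta, kstest

# Hypothetical sample data for the comparison.
rng = np.random.default_rng(2)
X = beta.rvs(2.0, 3.0, loc=5.0, scale=3.0, size=1000, random_state=rng)

loc_guess = np.min(X)
scale_guess = np.max(X) - loc_guess
a, b, loc_fit, scale_fit = beta.fit(X, loc=loc_guess, scale=scale_guess)

# KS statistic: maximum distance between the empirical CDF and the model CDF.
# args are passed to beta.cdf positionally as (a, b, loc, scale).
ks_stale = kstest(X, 'beta', args=(a, b, loc_guess, scale_guess)).statistic
ks_fitted = kstest(X, 'beta', args=(a, b, loc_fit, scale_fit)).statistic

print(f'KS with initial loc/scale: {ks_stale:.4f}')
print(f'KS with fitted loc/scale:  {ks_fitted:.4f}')
```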

Sometimes it's off by a little bit:

[image]

Sometimes, by a lot:

[image]