sdv-dev / Copulas

A library to model multivariate data using copulas.

Home Page:https://sdv.dev/Copulas/

When Copulas univariate fit fails, produce a log instead of a warning

npatki opened this issue

Problem Description

When using the multivariate Gaussian Copula, scipy occasionally fails to fit a univariate distribution for a given shape. In this case, the default behavior is to catch the error and fall back to the Gaussian distribution, which is the most stable option and can be fit in almost all cases.

When we do this fallback, we produce a warning to let the user know this is happening.

UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.

Expected behavior

This information would be better off as a logged item rather than a warning.
(In other parts of the SDV ecosystem, we are using logger to dump any info or debug messages.)
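
A minimal sketch of the requested behavior, using only the standard library. The names `fit_with_fallback`, `fit_gaussian`, `fit_preferred` and `preferred_name` are hypothetical and are not part of the Copulas API; the sketch only illustrates catching a failed fit and reporting the fallback through a module-level logger instead of `warnings.warn`.

```python
import logging
import statistics

# Module-level logger, following the usual `logging` convention.
LOGGER = logging.getLogger(__name__)


def fit_gaussian(data):
    """Hypothetical stand-in for fitting a Gaussian: mean and std dev."""
    return {'loc': statistics.fmean(data), 'scale': statistics.pstdev(data)}


def fit_with_fallback(data, column_name, fit_preferred, preferred_name):
    """Try the preferred fit; on failure, log the fallback and fit a Gaussian.

    `fit_preferred` is any callable that fits `data` and may raise
    (an assumption made for this sketch, not the library's interface).
    """
    try:
        return preferred_name, fit_preferred(data)
    except Exception:
        # Logged at INFO level, so nothing is printed unless the user
        # has configured logging to show it.
        LOGGER.info(
            'Unable to fit to a %s distribution for column %s. '
            'Using a Gaussian distribution instead.',
            preferred_name,
            column_name,
        )
        return 'gaussian', fit_gaussian(data)
```

With this approach the message stays available to anyone who enables `INFO` logging (for example via `logging.basicConfig(level=logging.INFO)`), but it no longer prints to the console by default.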

Additional context

We should be careful when producing warnings to the end user. There are several reasons why it may be better to log info rather than produce a warning.

  1. In this case, there's nothing explicitly wrong -- certain scipy distributions just aren't great at fitting specific marginals.
  2. A warning captures the user's attention and indicates that something should be done differently. In this case, the user can choose a different marginal distribution, but it is very data-dependent and not always needed.
  3. A warning will disrupt any progress bars. If the information isn't actionable, this becomes a nuisance. For example, the SDV's GaussianCopulaSynthesizer produces a progress bar to show the fit progress, but it is interrupted every time the univariate falls back from Beta to Gaussian. There is nothing I can do as a user about this.
Learning relationships:
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   0%|          | 0/1565 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   1%|▏         | 22/1565 [00:01<02:15, 11.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   3%|▎         | 45/1565 [00:04<03:17,  7.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   8%|▊         | 125/1565 [00:11<02:24,  9.98it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):  15%|█▍        | 228/1565 [00:19<01:45, 12.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):  16%|█▌        | 254/1565 [00:21<01:04, 20.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
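
Until the library changes, one possible stopgap on the user side is to filter this specific message with the standard `warnings` module. This is only a sketch: the helper name `silence_fit_fallback_warnings` is hypothetical, and the pattern assumes the warning text keeps the wording shown in the output above.

```python
import warnings


def silence_fit_fallback_warnings():
    """Ignore only the univariate fit-fallback UserWarning.

    The `message` argument is a regular expression matched against the
    start of the warning text, so other UserWarnings are left untouched.
    """
    warnings.filterwarnings(
        'ignore',
        message=r'Unable to fit to a .* distribution',
        category=UserWarning,
    )
```

Calling `silence_fit_fallback_warnings()` before fitting should keep the progress bar intact, at the cost of hiding the fallback information entirely, which is why a logged message remains the preferable fix.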