DanielBok / copulae

Multivariate data modelling with Copulas in Python

Home Page: https://copulae.readthedocs.io/en/latest/

Question about BaseCopula.random() Method

Hamzehn opened this issue

I'm new to using copulas and have been finding this package really helpful. Going through the explainer on copulas here, I noticed that it follows these steps when generating random variables from copulas:

  1. Draw random samples from a multivariate distribution that describes the "dependency structure" between the target random variables in the dataset. The example used a multivariate normal distribution to describe the dependency structure.
  2. Convert the above random samples to probabilities using the dependency structure's marginal CDFs. The example called norm.cdf on each column of the random samples obtained from STEP1 to arrive at x1 and x2.
  3. Use the results obtained in STEP2 to arrive at the final random values for the target random variables by applying their marginal distributions' PPFs. The example assumed one of the variables followed a Student's t distribution and the other a Laplace distribution.
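For reference, the three steps above can be sketched with only the Python standard library. This is a minimal illustration under my own assumptions, not the explainer's code: the function names are hypothetical, the dependency structure is a bivariate normal with an arbitrary correlation, and the Student's t marginal is fixed at 2 degrees of freedom because its PPF then has a closed form.

```python
import math
import random
from statistics import NormalDist


def student_t2_ppf(u):
    """PPF of a Student's t distribution with 2 degrees of freedom (closed form)."""
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))


def laplace_ppf(u):
    """PPF of a standard Laplace distribution (location 0, scale 1)."""
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))


def gaussian_copula_sample(n, rho, seed=0):
    """Return lists (z, u, x) holding the output of STEP1, STEP2 and STEP3."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal: mean 0, sigma 1
    z, u, x = [], [], []
    for _ in range(n):
        # STEP1: one draw from a bivariate normal with correlation rho
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = e1, rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        # STEP2: the normal CDF maps each coordinate to a uniform on (0, 1)
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        # STEP3: the target marginals' PPFs map the uniforms to the final values
        x1, x2 = student_t2_ppf(u1), laplace_ppf(u2)
        z.append((z1, z2))
        u.append((u1, u2))
        x.append((x1, x2))
    return z, u, x


z, u, x = gaussian_copula_sample(1000, rho=0.8)
```

Here `u` is the STEP2 output (dependent uniforms) and `x` is the STEP3 output with the desired marginals.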

My question is: if I'm trying to implement the above workflow entirely with the copulae package, and I start by creating, say, a Gumbel copula object and then call its random() method to generate 1000 samples, do I get the equivalent of what STEP1 above produced, or STEP2?

Do I take the result of random() and just apply my target distributions' PPFs to it like the below?

from copulae import GumbelCopula
import pandas as pd
from scipy import stats

alphas = [2.43, 1.97, 2.21, 3.01, 3.22] # Target random variables are 5 pareto distributed vars with these parameters

ndim = len(alphas)
gmbl_cop = GumbelCopula(theta=2, dim=ndim)

u_df = pd.DataFrame(gmbl_cop.random(1000))

# Apply each target marginal's PPF to the matching column of copula samples
x_df = pd.DataFrame()
for i in range(ndim):
    col = u_df.columns[i]
    x_df[col] = u_df[col].apply(lambda x: stats.pareto.ppf(x, alphas[i]-1))

Thanks

Ok, I think I've answered my own question by looking at the source for the Gaussian copula. The random() method there does both steps 1 and 2 above, so it sounds like after calling random() one can go straight to applying the target marginal distributions' PPFs, as in the code I shared.
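As a sanity check on that last step, here is a small dependency-free sketch of what applying the Pareto PPFs does. It relies on the fact that a Pareto distribution with shape b and minimum 1 (scipy's default parameterisation of stats.pareto) has CDF 1 - x^(-b), so its PPF is (1 - u)^(-1/b). The helper names are hypothetical, and the uniforms are independent draws standing in for copula output, not real Gumbel copula samples.

```python
import random


def pareto_ppf(u, b):
    """Inverse CDF of a Pareto with shape b and minimum 1: F(x) = 1 - x**(-b)."""
    return (1.0 - u) ** (-1.0 / b)


def apply_pareto_marginals(u_rows, shapes):
    """Map each column of a table of uniforms through its own Pareto PPF."""
    return [[pareto_ppf(u, b) for u, b in zip(row, shapes)] for row in u_rows]


alphas = [2.43, 1.97, 2.21, 3.01, 3.22]
shapes = [a - 1 for a in alphas]  # same alphas[i] - 1 shape as in the code I shared

# Stand-in for copula output: independent uniforms, NOT samples from a Gumbel copula
rng = random.Random(42)
u_rows = [[rng.random() for _ in shapes] for _ in range(1000)]

x_rows = apply_pareto_marginals(u_rows, shapes)
```

Every value in `x_rows` is at least 1, and pushing a value back through the CDF recovers the uniform it came from, which is what you would expect if the input really is on the probability scale.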

Closing this. Thanks a lot for the great work on this package!