py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.

Home Page: https://causal-learn.readthedocs.io/en/latest/


ValueError: math domain error in PC with missing data

priamai opened this issue · comments

Hi there,
my input data is like this:

[image: screenshot of the input data]

I then want to run causal discovery on it with missing values:

from causallearn.search.ConstraintBased.PC import pc
dataset = X.to_numpy()
sub_cols = X.columns
# default parameters
cg = pc(dataset, alpha=0.05, indep_test='mv_fisherz', mvpc=True)

Full error:


ValueError                                Traceback (most recent call last)
Cell In[206], line 5
      3 sub_cols = X.columns
      4 # default parameters
----> 5 cg = pc(dataset,alpha=0.05,indep_test='mv_fisherz',mvpc=True)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:41, in pc(data, alpha, indep_test, stable, uc_rule, uc_priority, mvpc, correction_name, background_knowledge, verbose, show_progress, node_names, **kwargs)
     39     if indep_test == fisherz:
     40         indep_test = mv_fisherz
---> 41     return mvpc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, correction_name=correction_name, stable=stable,
     42                     uc_rule=uc_rule, uc_priority=uc_priority, background_knowledge=background_knowledge,
     43                     verbose=verbose,
     44                     show_progress=show_progress, **kwargs)
     45 else:
     46     return pc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, stable=stable, uc_rule=uc_rule,
     47                   uc_priority=uc_priority, background_knowledge=background_knowledge, verbose=verbose,
     48                   show_progress=show_progress, **kwargs)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:200, in mvpc_alg(data, node_names, alpha, indep_test, correction_name, stable, uc_rule, uc_priority, background_knowledge, verbose, show_progress, **kwargs)
    198 indep_test = CIT(data, indep_test, **kwargs)
    199 ## Step 1: detect the direct causes of missingness indicators
--> 200 prt_m = get_parent_missingness_pairs(data, alpha, indep_test, stable)
    201 # print('Finish detecting the parents of missingness indicators.  ')
    202 
    203 ## Step 2:
    204 ## a) Run PC algorithm with the 1st step skeleton;
    205 cg_pre = SkeletonDiscovery.skeleton_discovery(data, alpha, indep_test, stable,
    206                                               background_knowledge=background_knowledge,
    207                                               verbose=verbose, show_progress=show_progress, node_names=node_names)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:275, in get_parent_missingness_pairs(data, alpha, indep_test, stable)
    272 ## Get the index of parents of missingness indicators
    273 # If the missingness indicator has no parent, then it will not be collected in prt_m
    274 for missingness_i in missingness_index:
--> 275     parent_of_missingness_i = detect_parent(missingness_i, data, alpha, indep_test, stable)
    276     if not isempty(parent_of_missingness_i):
    277         parent_missingness_pairs['prt'].append(parent_of_missingness_i)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:363, in detect_parent(r, data_, alpha, indep_test, stable)
    361 if len(Neigh_x) >= depth:
    362     for S in combinations(Neigh_x, depth):
--> 363         p = cg.ci_test(x, y, S)
    364         if p > alpha:
    365             if not stable:  # Unstable: Remove x---y right away

File /opt/conda/lib/python3.10/site-packages/causallearn/graph/GraphClass.py:58, in CausalGraph.ci_test(self, i, j, S)
     56 # assert i != j and not i in S and not j in S
     57 if self.test.method == 'mc_fisherz': return self.test(i, j, S, self.nx_skel, self.prt_m)
---> 58 return self.test(i, j, S)

File /opt/conda/lib/python3.10/site-packages/causallearn/utils/cit.py:388, in MV_FisherZ.__call__(self, X, Y, condition_set)
    386 if abs(r) >= 1: r = (1. - np.finfo(float).eps) * np.sign(r) # may happen when samplesize is very small or relation is deterministic
    387 Z = 0.5 * log((1 + r) / (1 - r))
--> 388 X = sqrt(len(test_wise_deletion_XYcond_rows_index) - len(condition_set) - 3) * abs(Z)
    389 p = 2 * (1 - norm.cdf(abs(X)))
    390 self.pvalue_cache[cache_key] = p

ValueError: math domain error
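
For context, the line flagged in the traceback takes the square root of (rows left after test-wise deletion) - (size of the conditioning set) - 3. The sketch below assumes that `sqrt` in `cit.py` is `math.sqrt`, which the traceback does not show directly; it would explain the error, because `math.sqrt` raises `ValueError` on a negative argument. The function names are hypothetical and only mirror the expression on line 388.

```python
from math import sqrt


def fisherz_scale(n_complete_rows, cond_set_size):
    # Hypothetical helper mirroring the expression on line 388 of the traceback:
    # sqrt(len(test_wise_deletion_XYcond_rows_index) - len(condition_set) - 3)
    return sqrt(n_complete_rows - cond_set_size - 3)


def min_rows_needed(cond_set_size):
    # Smallest number of complete rows for which the argument is non-negative.
    return cond_set_size + 3


try:
    # 2 complete rows and a conditioning set of size 1 give sqrt(-2).
    fisherz_scale(2, 1)
except ValueError as err:
    print("ValueError:", err)
```

If this is the cause, some pair of variables (plus conditioning set) has very few rows in which all of them are observed at the same time. Counting the rows without NaN for the columns involved would confirm or rule this out.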

Hi, it seems that #119 and #29 are related to this issue. Could you please try adding some random noise to the data and see whether the error remains? My guess is that the data violate some assumption of the test, such as a singularity somewhere.
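
A minimal sketch of the noise suggestion, assuming the data is a float NumPy array with missing values encoded as NaN. The helper name `add_jitter` and the default `scale` are hypothetical, and the scale should be chosen to be small relative to the data.

```python
import numpy as np


def add_jitter(data, scale=1e-6, seed=0):
    # Hypothetical helper: add small Gaussian noise to every entry.
    # NaN + noise is still NaN, so the missing-value pattern is unchanged.
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    noise = rng.normal(0.0, scale, size=data.shape)
    return data + noise


example = np.array([[1.0, 2.0, np.nan],
                    [1.0, 2.0, 3.0],
                    [np.nan, 2.0, 3.0]])
jittered = add_jitter(example)
print(jittered)
```

The jittered array can then be passed to `pc` in place of `dataset`, with the same arguments as in the original call.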

Hi there, that sounds likely, but why doesn't it raise the singularity exception that was discussed in that thread?
Maybe it was never implemented, even though the issue was closed with the suggestion that it would produce a meaningful error?

We did update the code, but perhaps your case was not covered (#58). Would you mind sending us a minimal reproducing example for your issue (perhaps via email: yujiazh@cmu.edu)?