cdt15 / lingam

Python package for causal discovery based on LiNGAM.

Home Page: https://sites.google.com/view/sshimizu06/lingam

no_paths doesn't work when using prior_knowledge

kargo113 opened this issue · comments

commented

Hi,

I really appreciate this repository because it lets me apply LiNGAM to our system very quickly.

I have one question: does no_paths work correctly?

For example, following this notebook: https://github.com/cdt15/lingam/blob/master/examples/DirectLiNGAM(PriorKnowledge).ipynb

I generate the prior knowledge like this:

prior_knowledge = make_prior_knowledge(
    n_variables=6,
    exogenous_variables=[0],
    no_paths=[[2,1]])
print(prior_knowledge)
make_prior_knowledge_graph(prior_knowledge)

The output is:

[[ 0  0  0  0  0  0]
 [-1  0  0 -1 -1 -1]
 [-1 -1  0 -1 -1 -1]
 [-1 -1 -1  0 -1 -1]
 [-1 -1 -1 -1  0 -1]
 [-1 -1 -1 -1 -1  0]]

So the entry for the path "2 -> 1" is zero, as expected.

However, when I fit the model to the data,

model = lingam.DirectLiNGAM(prior_knowledge=prior_knowledge)
model.fit(X)

the resulting model.adjacency_matrix_ is:


	0	1	2	3	4	5
0	0.000000	0.0	0.000000	0.000000	0.0	0.0
1	2.986726	0.0	2.006062	0.000000	0.0	0.0
2	0.000000	0.0	0.000000	6.016333	0.0	0.0
3	0.299046	0.0	0.000000	0.000000	0.0	0.0
4	7.984485	0.0	-0.990590	0.000000	0.0	0.0
5	3.952478	0.0	0.000000	0.000000	0.0	0.0


The path "2 -> 1" has the value 2.006062.

Is this the correct output?
I would expect the value to be zero when "no_paths" is used.
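The comparison above can also be done programmatically. The following is a minimal sketch, not part of the lingam package: the helper name find_forbidden_edges and the tolerance are assumptions made for illustration. It takes the two matrices printed in this issue and assumes the convention they suggest: entry [i, j] describes the effect of variable j on variable i, and 0 in the prior knowledge matrix means "no path". It only checks direct edges, so it is a necessary check rather than a full test for directed paths.

```python
import numpy as np

# Prior knowledge matrix printed above (0 = no path, -1 = unknown).
prior_knowledge = np.array([
    [ 0,  0,  0,  0,  0,  0],
    [-1,  0,  0, -1, -1, -1],
    [-1, -1,  0, -1, -1, -1],
    [-1, -1, -1,  0, -1, -1],
    [-1, -1, -1, -1,  0, -1],
    [-1, -1, -1, -1, -1,  0],
])

# Estimated adjacency matrix reported above (row = effect, column = cause).
adjacency = np.array([
    [0.000000, 0.0,  0.000000, 0.000000, 0.0, 0.0],
    [2.986726, 0.0,  2.006062, 0.000000, 0.0, 0.0],
    [0.000000, 0.0,  0.000000, 6.016333, 0.0, 0.0],
    [0.299046, 0.0,  0.000000, 0.000000, 0.0, 0.0],
    [7.984485, 0.0, -0.990590, 0.000000, 0.0, 0.0],
    [3.952478, 0.0,  0.000000, 0.000000, 0.0, 0.0],
])


def find_forbidden_edges(prior_knowledge, adjacency, tol=1e-8):
    """Hypothetical helper: list (cause, effect) pairs that have a nonzero
    direct edge although the prior knowledge matrix marks them as 'no path'."""
    pk = np.asarray(prior_knowledge)
    B = np.asarray(adjacency, dtype=float)
    forbidden = pk == 0
    np.fill_diagonal(forbidden, False)  # the diagonal is always 0, ignore it
    effect_idx, cause_idx = np.nonzero(forbidden & (np.abs(B) > tol))
    return [(int(c), int(e)) for e, c in zip(effect_idx, cause_idx)]


print(find_forbidden_edges(prior_knowledge, adjacency))
```

On the matrices from this issue, the only flagged pair is the edge from variable 2 to variable 1.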

Hi, this prior knowledge option does not necessarily force the estimated graph to satisfy the prior knowledge given by users. The DirectLiNGAM algorithm implemented in this library estimates the causal order of the variables one by one. Therefore, if estimation of the causal order of some variables fails before the algorithm reaches the variables for which prior knowledge is available, the prior knowledge sometimes cannot be used, or the algorithm may have to settle on a causal order that is wrong. Prior knowledge about exogenous variables and sink variables is more likely to be reflected in the output of DirectLiNGAM. Although this "soft" way of using prior knowledge may differ from what some users expect, we thought the option would still help improve the estimation.

commented

Thank you for your answer.
I understand that, because prior knowledge is applied in a soft way, "no_paths" doesn't force the value to zero.

In addition, is there a better way to make certain path values zero or very small?

When LiNGAM is applied to a business problem, some values should be zero.
In other words, even when a causal effect between variable A and variable B obviously does not exist, LiNGAM sometimes estimates a value that is not zero but large.

  1. DirectLiNGAM roughly consists of two steps. First, it estimates the causal order of the variables. Second, it estimates the coefficients. If the estimated causal order is acceptable, a compromise might be to fix the coefficient from A to B to zero and estimate the other coefficients based on the estimated causal order. This can be done with a traditional path analysis or structural equation modeling package.

  2. Another way might be to compute the bootstrap probability of the directed edge from A to B. That probability may turn out not to be very large.
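The first suggestion can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the library's implementation: the function name estimate_with_forbidden_edges and the synthetic data are hypothetical, the causal order is taken as given, and plain least squares is used, so unlike a sparse estimator it will not shrink small coefficients to exactly zero. Each variable is regressed on the variables that precede it in the causal order, leaving out any parent that is declared forbidden.

```python
import numpy as np


def estimate_with_forbidden_edges(X, causal_order, forbidden_edges):
    """Hypothetical helper: estimate linear coefficients along a given causal
    order by ordinary least squares, keeping forbidden (cause, effect) edges at 0.

    Returns B with B[i, j] = coefficient from variable j to variable i.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center so that no intercept is needed
    n_features = X.shape[1]
    B = np.zeros((n_features, n_features))
    forbidden = set(forbidden_edges)

    for pos, target in enumerate(causal_order):
        # Candidate parents: variables earlier in the order, minus forbidden ones.
        parents = [p for p in causal_order[:pos] if (p, target) not in forbidden]
        if not parents:
            continue
        coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, target], rcond=None)
        B[target, parents] = coef
    return B


# Synthetic chain x0 -> x1 -> x2 with non-Gaussian (uniform) noise.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.uniform(-1, 1, n)
x1 = 2.0 * x0 + rng.uniform(-1, 1, n)
x2 = -1.5 * x1 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])

# Suppose we know that x0 has no direct effect on x2.
B = estimate_with_forbidden_edges(X, causal_order=[0, 1, 2], forbidden_edges=[(0, 2)])
print(np.round(B, 2))
```

With a fitted DirectLiNGAM model, the causal order would come from the model's estimated ordering; whether fixing a coefficient to zero is appropriate still depends on that ordering being acceptable, as noted above.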

commented

Thanks for your suggestion. I understand these solutions.

Thank you so much.

As of v1.5.2, you can FORCE prior knowledge about causal ORDERS to be respected in the estimation, e.g., that x1 cannot cause x2.

commented

Thank you for adding the "FORCE prior knowledge" option.
I'll try it when analyzing our data.