no_paths doesn't work when using prior_knowledge

Question

kargo113 opened this issue 4 years ago · comments

kargo commented 4 years ago

Hi,

I really appreciate this repository because I can apply LINGAM to the system very quickly.

I have one question: does no_paths work correctly?

For example, following this notebook: https://github.com/cdt15/lingam/blob/master/examples/DirectLiNGAM(PriorKnowledge).ipynb,

I generate the prior knowledge:

prior_knowledge = make_prior_knowledge(
    n_variables=6,
    exogenous_variables=[0],
    no_paths=[[2,1]])
print(prior_knowledge)
make_prior_knowledge_graph(prior_knowledge)

The output is:

[[ 0  0  0  0  0  0]
 [-1  0  0 -1 -1 -1]
 [-1 -1  0 -1 -1 -1]
 [-1 -1 -1  0 -1 -1]
 [-1 -1 -1 -1  0 -1]
 [-1 -1 -1 -1 -1  0]]

The entry for the path "2 -> 1" (row 1, column 2) is zero, as expected.
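For readers unfamiliar with the matrix layout, here is a minimal NumPy sketch that reproduces the matrix printed above. `build_prior_knowledge` is a hypothetical stand-in written for illustration, not the library's `make_prior_knowledge`. It assumes the convention visible in the output: entry [i, j] describes the path from variable j to variable i, -1 means "unknown", and 0 means "no directed path".

```python
import numpy as np


def build_prior_knowledge(n_variables, exogenous_variables=(), no_paths=()):
    """Hypothetical re-creation of the printed matrix (not the library's code).

    Entry [i, j] refers to the path x_j -> x_i:
    -1 = unknown, 0 = no directed path.
    """
    pk = np.full((n_variables, n_variables), -1, dtype=int)
    # A forbidden path is given as [source, target].
    for source, target in no_paths:
        pk[target, source] = 0
    # An exogenous variable has no incoming paths, so its whole row is 0.
    for var in exogenous_variables:
        pk[var, :] = 0
    # The printed output has zeros on the diagonal.
    np.fill_diagonal(pk, 0)
    return pk


pk = build_prior_knowledge(6, exogenous_variables=[0], no_paths=[[2, 1]])
print(pk)
```

With `no_paths=[[2, 1]]`, the only entry that changes outside row 0 and the diagonal is `pk[1, 2]`.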

However, when I fit the model to the data:

model = lingam.DirectLiNGAM(prior_knowledge=prior_knowledge)
model.fit(X)

the resulting model.adjacency_matrix_ is:


	0	1	2	3	4	5
0	0.000000	0.0	0.000000	0.000000	0.0	0.0
1	2.986726	0.0	2.006062	0.000000	0.0	0.0
2	0.000000	0.0	0.000000	6.016333	0.0	0.0
3	0.299046	0.0	0.000000	0.000000	0.0	0.0
4	7.984485	0.0	-0.990590	0.000000	0.0	0.0
5	3.952478	0.0	0.000000	0.000000	0.0	0.0

The path "2 -> 1" has value 2.006062.

Is this the correct output?
I would expect the value to be zero when no_paths is used.

Shohei Shimizu · Answer 1 · Tue Aug 04 2020 19:49:54 GMT+0800 (China Standard Time)

Hi, the prior knowledge option does not necessarily force the estimated graph to satisfy the prior knowledge supplied by the user. The DirectLiNGAM algorithm implemented in this library estimates the causal order of the variables one by one. So if, for example, the estimation of the causal order goes wrong for some variables before it reaches the variables covered by the prior knowledge, the prior knowledge sometimes cannot be used, or the algorithm may have to settle on causal orders that are wrong. Prior knowledge about exogenous variables and sink variables is more likely to be reflected in the DirectLiNGAM output. This "soft" use of prior knowledge may differ from what some users expect, but we thought the option would still help improve the estimation.

kargo · Answer 2 · Wed Aug 05 2020 09:35:35 GMT+0800 (China Standard Time)

Thank you for your answer.
I understand that, because the prior knowledge is applied softly, no_paths does not force the value to zero.

Is there a way to make sure that certain path values come out as zero, or at least very small?

I ask because some values have to be zero when LiNGAM is applied to a business problem.
In other words, even when there is obviously no causal effect between variable A and variable B, LiNGAM sometimes estimates a value that is not only nonzero but large.

Shohei Shimizu · Answer 3 · Wed Aug 05 2020 13:21:57 GMT+0800 (China Standard Time)

DirectLiNGAM roughly consists of two steps. First, it estimates the causal order of the variables. Second, it estimates the coefficients. If the estimated causal order is acceptable, one compromise is to fix the coefficient from A to B at zero and estimate the remaining coefficients based on that causal order. This can be done with a traditional path analysis or structural equation modeling package.
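The compromise just described can be sketched with plain NumPy. This is an illustration under stated assumptions, not what the library does internally: `refit_with_zero_constraints` is a hypothetical helper, it uses ordinary least squares in place of a path analysis package, and it takes the causal order as given (for example from a fitted model's causal order).

```python
import numpy as np


def refit_with_zero_constraints(X, causal_order, forbidden_edges):
    """Re-estimate coefficients by OLS for a fixed causal order.

    forbidden_edges: iterable of (source, target) pairs whose coefficient
    is held at zero. Returns B with B[i, j] = effect of x_j on x_i.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center so no intercept is needed
    n_features = X.shape[1]
    forbidden = set(forbidden_edges)
    B = np.zeros((n_features, n_features))
    for pos, target in enumerate(causal_order):
        # Candidate parents: variables earlier in the order, minus forbidden ones.
        parents = [p for p in causal_order[:pos] if (p, target) not in forbidden]
        if not parents:
            continue
        coef, *_ = np.linalg.lstsq(X[:, parents], X[:, target], rcond=None)
        B[target, parents] = coef
    return B


# Synthetic demo: x0 drives x1 and x2; x1 has no effect on x2.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.uniform(-1, 1, n)
x1 = 2.0 * x0 + rng.uniform(-1, 1, n)
x2 = 3.0 * x0 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])

B = refit_with_zero_constraints(X, causal_order=[0, 1, 2], forbidden_edges=[(1, 2)])
print(np.round(B, 2))
```

The forbidden coefficient `B[2, 1]` stays exactly zero because x1 is never offered as a regressor for x2. The remaining coefficients are re-estimated without it.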
Another way might be to compute the bootstrap probability of the directed edge from A to B. That probability may turn out not to be very large.
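To make the bootstrap idea concrete, here is a library-independent sketch. `bootstrap_edge_probability` and `ols_fixed_order` are hypothetical names. The estimator is passed in as a callable, so in practice it would wrap a LiNGAM fit. The toy estimator below exists only so the example runs on its own. The library has its own bootstrap support, which is the better choice in practice.

```python
import numpy as np


def bootstrap_edge_probability(X, estimate_adjacency, source, target,
                               n_sampling=100, min_effect=0.01, seed=0):
    """Fraction of bootstrap resamples in which the edge source -> target
    is estimated with absolute coefficient above min_effect.

    estimate_adjacency: callable mapping a data matrix to an adjacency
    matrix B, where B[i, j] is the effect of x_j on x_i.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sampling):
        idx = rng.integers(0, len(X), size=len(X))  # resample rows with replacement
        B = estimate_adjacency(X[idx])
        if abs(B[target, source]) > min_effect:
            hits += 1
    return hits / n_sampling


def ols_fixed_order(data):
    # Stand-in estimator for this demo only: assumes the order x0 -> x1.
    data = data - data.mean(axis=0)
    B = np.zeros((2, 2))
    B[1, 0] = data[:, 0] @ data[:, 1] / (data[:, 0] @ data[:, 0])
    return B


rng = np.random.default_rng(1)
x0 = rng.uniform(-1, 1, 500)
x1 = 2.0 * x0 + rng.uniform(-1, 1, 500)
X = np.column_stack([x0, x1])

p_forward = bootstrap_edge_probability(X, ols_fixed_order, source=0, target=1, n_sampling=50)
p_reverse = bootstrap_edge_probability(X, ols_fixed_order, source=1, target=0, n_sampling=50)
print(p_forward, p_reverse)
```

A low probability for the edge from A to B across resamples would suggest that a large point estimate is not stable.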

kargo · Answer 4 · Wed Aug 05 2020 14:30:03 GMT+0800 (China Standard Time)

Thanks for your suggestion. I understand these solutions.

Thank you so much.

Shohei Shimizu · Answer 5 · Mon Feb 22 2021 15:20:57 GMT+0800 (China Standard Time)

As of v1.5.2, you can FORCE prior knowledge on causal ORDERS into the estimation, e.g., that x1 cannot cause x2.
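Whichever mode is used, it is easy to check afterwards whether a fitted adjacency matrix respects the prior knowledge. The sketch below uses a hypothetical helper, `find_violated_no_paths`, and the two matrices posted at the top of this thread. It assumes entry [i, j] refers to x_j -> x_i and that a prior value of 0 forbids any directed path, not just a direct edge. As far as I know, soft versus forced application is selected by the `apply_prior_knowledge_softly` argument of `DirectLiNGAM`; treat that name as an assumption and confirm it against the documentation of your installed version.

```python
import numpy as np


def find_violated_no_paths(adjacency, prior_knowledge, tol=1e-8):
    """Return (source, target) pairs that the prior forbids but the fit contains.

    Both matrices use entry [i, j] for x_j -> x_i. A prior value of 0 is
    taken to forbid any directed path from x_j to x_i.
    """
    edges = np.abs(np.asarray(adjacency, dtype=float)) > tol
    n = edges.shape[0]
    # reach[i, j] becomes True once some directed path x_j -> ... -> x_i exists.
    reach = edges.copy()
    for _ in range(n - 1):
        reach = reach | ((reach.astype(int) @ edges.astype(int)) > 0)
    pk = np.asarray(prior_knowledge)
    return [(j, i) for i in range(n) for j in range(n)
            if i != j and pk[i, j] == 0 and reach[i, j]]


# Matrices copied from the original post.
prior_knowledge = np.array([
    [0, 0, 0, 0, 0, 0],
    [-1, 0, 0, -1, -1, -1],
    [-1, -1, 0, -1, -1, -1],
    [-1, -1, -1, 0, -1, -1],
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, -1, -1, 0],
])
adjacency = np.array([
    [0.000000, 0.0, 0.000000, 0.000000, 0.0, 0.0],
    [2.986726, 0.0, 2.006062, 0.000000, 0.0, 0.0],
    [0.000000, 0.0, 0.000000, 6.016333, 0.0, 0.0],
    [0.299046, 0.0, 0.000000, 0.000000, 0.0, 0.0],
    [7.984485, 0.0, -0.990590, 0.000000, 0.0, 0.0],
    [3.952478, 0.0, 0.000000, 0.000000, 0.0, 0.0],
])

violations = find_violated_no_paths(adjacency, prior_knowledge)
print(violations)  # [(2, 1)]
```

For the softly constrained fit reported above, the check flags exactly the 2 -> 1 path from the original question. An empty list would mean the prior knowledge was honoured.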

kargo · Answer 6 · Wed Feb 24 2021 09:39:16 GMT+0800 (China Standard Time)

Thank you for adding the option to FORCE prior knowledge.
I'll try to use this method when analyzing our data.