py-why / causal-learn

Causal Discovery in Python. It also includes (conditional) independence tests and score functions.

Home Page: https://causal-learn.readthedocs.io/en/latest/

FCM lags suspicious results

priamai opened this issue · comments

Hi there,
I wanted to see the effect of lagged variables on the car dataset, so I did a quick sanity check:

from causallearn.search.FCMBased import lingam

# data: the car dataset as an (n_samples, n_features) array
for lags in range(1, 4):
    print("\n\nLags %d" % lags)
    print("Total columns %d " % data.shape[1])
    model = lingam.VARLiNGAM(lags=lags, random_state=1, prune=False)
    model.fit(data)

    print("Causal Order features %d " % len(model.causal_order_))
    print(model.causal_order_)
    print("Matrix lags %d " % model.adjacency_matrices_.shape[0])
    # print(model.adjacency_matrices_[0])
    # print(model.adjacency_matrices_[1])
    # print(model.residuals_)

The output doesn't show any influence of the lag factor; see below:

Lags 1
Total columns 10 
Causal Order features 10 
[4, 8, 1, 6, 3, 0, 2, 7, 5, 9]
Matrix lags 2 


Lags 2
Total columns 10 
Causal Order features 10 
[4, 8, 1, 6, 3, 0, 2, 7, 5, 9]
Matrix lags 2 


Lags 3
Total columns 10 
Causal Order features 10 
[4, 8, 1, 6, 3, 0, 2, 7, 5, 9]
Matrix lags 2 

I would expect the number of matrix lags and causal order features to increase as the lag increases.
For example, with lags equal to 2, shouldn't I see double the columns and features?

Thanks for the question. The optimal order (lag) for the vector autoregressive model is determined automatically using the Bayesian information criterion (BIC). As a result, even if we set lags to 100, the optimal order might still be selected as 2. To avoid that optimization step, you may consider setting criterion to None, e.g.,

model = lingam.VARLiNGAM(lags=3, criterion=None)
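
With criterion set to None the requested order is kept, so (if I remember the implementation correctly, please double-check) adjacency_matrices_ will then stack the contemporaneous matrix B0 together with one matrix per lag, e.g.:

model = lingam.VARLiNGAM(lags=3, criterion=None, random_state=1)
model.fit(data)
print("Matrix lags %d " % model.adjacency_matrices_.shape[0])  # expected: 4 (B0 plus 3 lagged matrices)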

What's the expected result when lags=3? Is it going to permute all the possible columns 3 times?

Sorry, I'm not sure I fully understand the question. The lags here refer to lagged causal relations (as opposed to contemporaneous causal relations).
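
To make the distinction concrete (as far as I understand the LiNGAM convention, B[i, j] is the effect of variable j on variable i), given a fitted model you can list both kinds of relations like this:

import numpy as np

B = model.adjacency_matrices_  # B[0] = contemporaneous effects, B[k] = effects of time t-k on time t
for k, Bk in enumerate(B):
    src_time = "t" if k == 0 else f"t-{k}"
    for i, j in zip(*np.nonzero(np.abs(Bk) > 0.1)):  # 0.1 is an arbitrary display threshold
        print(f"x{j}({src_time}) -> x{i}(t)  weight={Bk[i, j]:.2f}")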

Okay, so this is my understanding so far (we should really put some basic examples in the test folder):
let's say I have a classical fork DAG: X <- A -> Y.
The generation process implies that A affects X with a lag of 5 and Y with a lag of 7, i.e. A -(lag 5)-> X and A -(lag 7)-> Y.
What is the output going to look like?
Will each edge have a different lag value?
Can we build a simple toy example to verify the desired behaviour?
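
As a rough sketch of such a toy example (I used lags 2 and 3 instead of 5 and 7 to keep it short, and the coefficients 0.8 / 0.6 are made up):

import numpy as np
from causallearn.search.FCMBased import lingam

rng = np.random.default_rng(0)
T = 2000

# Non-Gaussian (uniform) disturbances; LiNGAM-family methods rely on non-Gaussianity.
eA, eX, eY = (rng.uniform(-1.0, 1.0, T) for _ in range(3))

A = eA
X = np.zeros(T)
Y = np.zeros(T)
for t in range(3, T):
    X[t] = 0.8 * A[t - 2] + 0.3 * eX[t]  # A -> X with lag 2
    Y[t] = 0.6 * A[t - 3] + 0.3 * eY[t]  # A -> Y with lag 3

data = np.column_stack([A, X, Y])  # columns: 0 = A, 1 = X, 2 = Y

model = lingam.VARLiNGAM(lags=3, criterion=None)
model.fit(data)

for k, Bk in enumerate(model.adjacency_matrices_):
    print(f"lag {k}:\n{np.round(Bk, 2)}")
# If this behaves as I expect, the lag-2 matrix should have an entry near 0.8 at [1, 0]
# (A -> X) and the lag-3 matrix an entry near 0.6 at [2, 0] (A -> Y), everything else near zero.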

It would also be interesting to see how to import the output into a DAG for DoWhy.
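
One route I can think of (untested, and it only covers the contemporaneous matrix B0, since a plain DoWhy DAG has no notion of time; the helper below is something I made up, not part of causal-learn):

import networkx as nx

# Hypothetical helper: read B0[i, j] != 0 as an edge j -> i and build a networkx DiGraph.
def b0_to_digraph(b0, names, tol=0.05):  # tol is an arbitrary cut-off
    g = nx.DiGraph()
    g.add_nodes_from(names)
    for i in range(b0.shape[0]):
        for j in range(b0.shape[1]):
            if abs(b0[i, j]) > tol:
                g.add_edge(names[j], names[i])
    return g

g = b0_to_digraph(model.adjacency_matrices_[0], ["A", "X", "Y"])
gml = "\n".join(nx.generate_gml(g))

# DoWhy's CausalModel accepts a GML graph string, so something like this should work
# (df being a pandas DataFrame with columns A, X, Y):
# from dowhy import CausalModel
# cm = CausalModel(data=df, treatment="A", outcome="X", graph=gml)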