bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.

Home Page: https://bashtage.github.io/linearmodels/

PanelOLS: Produced different Std. errors from Stata when clustered by the same variable in linearmodels

yuz0101 opened this issue · comments

Hi there, it seems that only one cov_type can be used at a time. Comparing a model clustered by one variable in linearmodels and in Stata, I found that Stata's std. errors are robust-adjusted while those from linearmodels are not (all coefficients are the same).

May I ask how I can get the same results as Stata using linearmodels?

Hi bashtage, thanks for your quick reply. I follow that. For example,

mod = PanelOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

The standard errors from reg are not robust-adjusted.

If I change it to reg = mod.fit(cov_type='robust'), then the results are no longer clustered by var.

Could you please help me with this? How can I get results that are both robust and clustered? I appreciate your time.

What is formula? The definition of robust depends on whether entity effects are included. Clustered std errors are robust to heteroskedasticity.

Thanks for that. y ~ 1 + a*b*c + a + b + c is the formula I'm using. There are no entity effects in my regression model. : )

What Stata command are you using?

gen abc = a*b*c
reg y abc a b c, vce (cluster var)

What you have looks correct to me

mod = PanelOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

You could also use

mod = PooledOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

and the results should be the same.

Thank you Kevin, for all your efforts.

I still got different results. I noticed that the std. error from Stata is robust (please have a look at the screenshots below). Sorry for the messy output; the variables are shown in different orders. Would you please look at the last line of pic 1 (black background) and the first line of pic 2 (white background)? They are the same variable.

The results from linearmodels:
image

The results from Stata:
image

What happens when you take the ratio of the parameter variance from Stata to that from linearmodels? Stata has a lot of magic small-sample adjustments it makes. If this ratio is the same for all parameters, this indicates that it is a scalar adjustment.
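The ratio check described above can be sketched in a few lines of Python. This is only an illustration: the standard errors are made-up placeholder numbers, not values from this thread, and `variance_ratios` and `is_scalar_adjustment` are hypothetical helper names.

```python
import math

def variance_ratios(se_a, se_b):
    """Ratio of parameter variances (squared std. errors), one per parameter."""
    return [(a / b) ** 2 for a, b in zip(se_a, se_b)]

def is_scalar_adjustment(ratios, rel_tol=1e-3):
    """True if every ratio matches the first one within a relative tolerance."""
    first = ratios[0]
    return all(math.isclose(r, first, rel_tol=rel_tol) for r in ratios)

# Placeholder std. errors (assumed values), listed in the same parameter order.
stata_se = [0.18, 0.09, 0.27]
linearmodels_se = [0.20, 0.10, 0.30]

ratios = variance_ratios(stata_se, linearmodels_se)
print(ratios)
print(is_scalar_adjustment(ratios))
```

If the ratios agree across parameters, the two covariance estimates differ only by a constant factor, which points to a degrees-of-freedom correction rather than a different estimator. Make sure both lists are in the same parameter order before comparing.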

I did a quick check and this looks to be the issue. What does changing the value of debiased do?
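For reference, the finite-sample correction that Stata documents for `regress` with `vce(cluster ...)` multiplies the cluster-robust covariance by G/(G-1) × (N-1)/(N-K), where G is the number of clusters, N the number of observations and K the number of estimated parameters. A minimal sketch of that factor follows; the function name is hypothetical, the inputs are assumed example values, and it does not model which parts of this correction linearmodels applies under `debiased` or other options.

```python
def stata_cluster_factor(nobs, nparams, nclusters):
    """Scalar applied to the cluster-robust covariance: G/(G-1) * (N-1)/(N-K)."""
    if nclusters < 2:
        raise ValueError("need at least two clusters")
    return (nclusters / (nclusters - 1)) * ((nobs - 1) / (nobs - nparams))

# Assumed example sizes: 100 observations, 4 parameters.
few_clusters = stata_cluster_factor(nobs=100, nparams=4, nclusters=2)
many_clusters = stata_cluster_factor(nobs=100, nparams=4, nclusters=50)
print(few_clusters, many_clusters)
```

With only two clusters, as when clustering on a dummy variable, the G/(G-1) term alone doubles the covariance, so a scalar difference between packages is most visible in that case.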

I just checked, and most of the std. errors from Stata are 70% of the std. errors from linearmodels. I also tried changing debiased and found that it did not affect the results.

Hi Kevin, you are right. Many thanks for your time. The std. errors in my model are clustered by a single dummy variable. However, when I tried that with xtreg, it produced errors, mainly because of the clusters. I then clustered by id instead, which worked in both linearmodels and Stata (using the reg command), and the two produced the same results. However, the same setting with xtreg in Stata produces results that differ from both reg and linearmodels.

Q: Is it reasonable to cluster the std. errors with a dummy variable rather than a variable consisting of a number of clusters?