PanelOLS: Produced different Std. errors from Stata when clustered by the same variable in linearmodels

Question

yuz0101 opened this issue 2 years ago · comments

Hi there, it seems that we can only use one cov_type. While comparing results from a model clustered by a variable between linearmodels and Stata, I found that Stata's std. errors are robust-adjusted while those from linearmodels are not (all coefficients are the same).

Please may I know how I can get the same results as Stata by using linearmodels?

Kevin Sheppard · Answer 1 · Thu Oct 13 2022 00:08:02 GMT+0800 (China Standard Time)

You need to change the cov_type in a call to fit.

https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.model.PanelOLS.fit.html#linearmodels.panel.model.PanelOLS.fit

yuz0101 · Answer 2 · Thu Oct 13 2022 00:18:09 GMT+0800 (China Standard Time)

Hi bashtage, thanks for your quick reply. I follow that. For example,

mod = PanelOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

The standard errors from the reg are not robust adjusted.

If I change it to reg = mod.fit(cov_type='robust'), then the results are no longer clustered by var.

Could you please help me with it? How can I get both robust- and clustered-results? I appreciate your time.

Kevin Sheppard · Answer 3 · Thu Oct 13 2022 00:36:18 GMT+0800 (China Standard Time)

What is formula? The definition of robust depends on whether entity effects are included. Clustered std errors are robust to heteroskedasticity.
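To illustrate why clustered standard errors are already robust to heteroskedasticity, here is a minimal NumPy sketch of the one-way cluster sandwich estimator with no small-sample adjustment. It is not linearmodels' internal code; the function name cluster_cov and the simulated data are hypothetical. When every observation is its own cluster, the estimator reduces to White's heteroskedasticity-robust (HC0) covariance.

```python
import numpy as np

def cluster_cov(x, resid, clusters):
    """One-way cluster-robust sandwich covariance, no small-sample adjustment."""
    bread = np.linalg.inv(x.T @ x)
    k = x.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        # Sum the scores x_i * e_i within the cluster, then take the outer product
        mask = clusters == g
        score = x[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return bread @ meat @ bread

# Simulated data (illustration only)
rng = np.random.default_rng(0)
n = 200
x = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
# Heteroskedastic errors: the error scale depends on the first regressor
y = x @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n) * (1 + np.abs(x[:, 1]))
beta = np.linalg.lstsq(x, y, rcond=None)[0]
resid = y - x @ beta

# White (HC0) covariance computed directly
bread = np.linalg.inv(x.T @ x)
hc0 = bread @ (x.T @ (x * resid[:, None] ** 2)) @ bread

# Singleton clusters: each observation is its own cluster
singleton = cluster_cov(x, resid, np.arange(n))
same = np.allclose(singleton, hc0)
```

With coarser clusters the same formula additionally allows for correlation within each cluster, which is why a separate "robust" option is not needed on top of clustering.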

yuz0101 · Answer 4 · Thu Oct 13 2022 01:18:09 GMT+0800 (China Standard Time)

Thanks for that. y ~ 1 + a*b*c + a + b + c is the formula I'm using. There are no entity effects in my regression model. : )

Kevin Sheppard · Answer 5 · Thu Oct 13 2022 01:21:03 GMT+0800 (China Standard Time)

What Stata command are you using?

yuz0101 · Answer 6 · Thu Oct 13 2022 01:33:02 GMT+0800 (China Standard Time)

gen abc = a*b*c
reg y abc a b c, vce (cluster var)

Kevin Sheppard · Answer 7 · Thu Oct 13 2022 01:51:27 GMT+0800 (China Standard Time)

What you have looks correct to me

mod = PanelOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

You could also use

mod = PooledOLS.from_formula(formula, data)
reg = mod.fit(cov_type='clustered', clusters=data['var'])

and the results should be the same.

yuz0101 · Answer 8 · Thu Oct 13 2022 02:00:58 GMT+0800 (China Standard Time)

Thank you Kevin, for all your efforts.

I still got different results. I noticed that the std. errors from Stata are robust (please have a look at the screenshots below). Sorry for the messy output shown below; the variables appear in different orders. Would you please look at the last line of pic 1 (black background) and the first line of pic 2 (white background)? They are the same variable.

The results from linearmodels:

The results from Stata:

Kevin Sheppard · Answer 9 · Thu Oct 13 2022 02:05:44 GMT+0800 (China Standard Time)

What happens when you take the ratio of the parameter variance from Stata to that from linearmodels? Stata has a lot of magic small sample adjustments it makes. If this ratio is the same for all parameters, this indicates that it is a scalar adjustment.

I did a quick check and this looks to be the issue. What does changing the value of debiased do?
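The ratio check described above can be sketched as follows. The helper scalar_adjustment is hypothetical, and the standard errors are made-up numbers, not the values from this issue. In linearmodels the standard errors would come from reg.std_errors; the Stata values would have to be copied over by hand.

```python
import numpy as np

def scalar_adjustment(var_a, var_b, rtol=1e-3):
    """Return the common ratio var_a / var_b if it is the same for all parameters, else None."""
    ratio = np.asarray(var_a, dtype=float) / np.asarray(var_b, dtype=float)
    if np.allclose(ratio, ratio[0], rtol=rtol):
        return float(ratio.mean())
    return None

# Made-up standard errors, for illustration only
se_stata = np.array([0.10, 0.20, 0.05])
se_lm = se_stata / np.sqrt(1.5)  # constructed so that the variance ratio is 1.5

# Variances are squared standard errors
factor = scalar_adjustment(se_stata ** 2, se_lm ** 2)
print(factor)
```

A single common factor points to a degrees-of-freedom style rescaling of the whole covariance matrix; ratios that differ across parameters would suggest the two programs are computing different estimators.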

yuz0101 · Answer 10 · Thu Oct 13 2022 02:35:29 GMT+0800 (China Standard Time)

I just did a check, and most of the std. errors from Stata are 70% of the std. errors from linearmodels. As for debiased, I changed it and it did not affect the results.

Kevin Sheppard · Answer 11 · Thu Oct 13 2022 02:54:06 GMT+0800 (China Standard Time)

How large are your clusters and how many do you have? The ratio of the variances looks a lot like 2. If you use xtreg in Stata, do you get the same as reg? IME the adjustments can differ across the different estimators in Stata.
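As a rough illustration of why the number of clusters matters, the sketch below evaluates the finite-sample factor G/(G-1) * (N-1)/(N-K) that Stata's documentation gives for regress with vce(cluster). The function name and the sample sizes are hypothetical, and the thread does not confirm that this factor is the full explanation for the gap seen here.

```python
def stata_cluster_factor(n_obs, n_params, n_clusters):
    """Finite-sample factor G/(G-1) * (N-1)/(N-K) applied to the clustered covariance."""
    g, n, k = n_clusters, n_obs, n_params
    return (g / (g - 1)) * ((n - 1) / (n - k))

# A standard-error ratio of about 0.7 corresponds to a variance ratio of about 0.49,
# which is roughly a factor of 2 when read in the other direction
se_ratio = 0.7
var_ratio = se_ratio ** 2

# Hypothetical sample: 10,000 observations and 5 estimated parameters
two_clusters = stata_cluster_factor(10_000, 5, 2)     # close to 2
many_clusters = stata_cluster_factor(10_000, 5, 500)  # close to 1
```

With only two clusters the G/(G-1) term alone doubles the variance, while with hundreds of clusters the factor is close to 1 and the two programs would be expected to agree closely.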

yuz0101 · Answer 12 · Thu Oct 13 2022 18:31:34 GMT+0800 (China Standard Time)

Hi Kevin, you are right. Massive thanks for your time. The std. errors in my model are clustered by only one dummy variable. I tried the same with xtreg, but it raised errors, mainly because of the clusters. Then I tried clustering by id, which worked in both linearmodels and Stata (using the reg command), and they produced the same results. However, the same setting with xtreg in Stata produces results that differ from both reg and linearmodels.

Q: Is it reasonable to cluster the std. errors with a dummy variable rather than a variable consisting of a number of clusters?