bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.

Home Page: https://bashtage.github.io/linearmodels/

Save *both* estimated effects

effedp opened this issue

Hey there,

I ran the textbook example of PanelOLS with both entity and time FEs. I am trying to save the two estimated effects using linearmodels.panel.results.PanelEffectsResults.estimated_effects, but the output DataFrame has only one column, and it does not seem to match either of the two FEs (I am double-checking everything against Stata's reghdfe).

```python
import pandas as pd
import statsmodels.api as sm
from linearmodels.datasets import wage_panel
from linearmodels import PanelOLS

data = wage_panel.load()
year = pd.Categorical(data.year)  # not used below; kept from the textbook example
data = data.set_index(["nr", "year"])

exog_vars = ['expersq']
exog = sm.add_constant(data[exog_vars])

mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
result = mod.fit(cov_type='clustered')

result.estimated_effects
```

This gives me the following output:

```
                  estimated_effects
nr     year
13     1980               -0.993351
       1981               -0.835877
       1982               -0.728156
       1983               -0.620804
       1984               -0.479181
...    ...                      ...
12548  1983               -0.212544
       1984               -0.070921
       1985                0.059622
       1986                0.212194
       1987                0.382053

4360 rows × 1 columns
```

How can I save both the estimated effects?
Am I missing something, or is this a bug?

Thank you for your help!

estimated_effects are the total combined effects included in the model. The method used to remove the FEs does not directly produce a separate estimate of each effect. In balanced panels it should be easy to recover them by fitting an auxiliary model that uses the estimated effects as the LHS variable and includes only entity effects. The estimated effects from this auxiliary model will be the entity effects, and its residuals will be the time effects.
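A minimal sketch of the balanced-panel idea, assuming the combined effect is additive, e_it = a_i + g_t. Rather than fitting the auxiliary PanelOLS model, it uses pandas groupby operations, which give the same split in a balanced panel: the entity mean of e_it equals a_i plus the average time effect, and the remainder (e_it minus that mean) equals g_t minus the same average, so it is identical for every entity. The function name `split_effects_balanced` and the toy data are hypothetical.

```python
import pandas as pd


def split_effects_balanced(effects: pd.Series) -> pd.DataFrame:
    """Split combined effects e_it = a_i + g_t from a *balanced* panel.

    `effects` is a Series indexed by (entity, time). Entity FEs are the
    entity means; time FEs are the deviations from them, averaging to zero.
    """
    # Entity mean of e_it = a_i + (average time effect).
    entity_fe = effects.groupby(level=0).transform("mean")
    # In a balanced panel, e_it - entity mean = g_t - (average time effect),
    # which is the same for every entity observed in period t.
    time_fe = (effects - entity_fe).groupby(level=1).transform("mean")
    return pd.DataFrame({"entity_fe": entity_fe, "time_fe": time_fe})


# Hypothetical balanced example with known effects a_i and g_t.
a_true = {1: 1.0, 2: -2.0, 3: 0.5}
g_true = {2000: 0.0, 2001: 0.3}
idx = pd.MultiIndex.from_product([list(a_true), list(g_true)], names=["nr", "year"])
example = pd.Series([a_true[i] + g_true[t] for i, t in idx], index=idx)
parts = split_effects_balanced(example)
```

With the model from the question, the input would be `result.estimated_effects["estimated_effects"]`. The two sets of effects are only identified up to a constant that can be shifted from one set to the other, so this split may differ from another normalization by that constant. In an unbalanced panel the remainder depends on which periods each entity is observed in, and the averaging is no longer exact.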

Hello Kevin, and thanks for your reply. I am not sure it is that easy. If I understand correctly, your solution would imply that regressing the dependent variable on individual effects and then taking the residuals as the time effects is the same as regressing it on both time and individual effects, which is not the case.

Do you think we will see an option for a separate estimate of the effects soon? It would be handy for many applications, and there is no way to do it in Python to the best of my knowledge.

Hello, as bashtage stated in his reply, his method works for balanced panels but not for unbalanced ones. Like the OP, I needed to extract both individual and time FEs from an unbalanced panel. Here's my quick-and-dirty solution; I hope it helps others who stumble across this.

```python
import numpy as np
import pandas as pd
from linearmodels import PanelOLS

# Fit some model with PanelOLS (df is indexed by (individual, time); exog is defined as usual).
mod = PanelOLS(
    dependent=df['y'],
    exog=exog,
    entity_effects=True,
    time_effects=True,
)

twfe = mod.fit()

# Get the estimated (combined) effects.
ees = twfe.estimated_effects.copy()
ees = ees.reset_index()
ees.columns = ['individuals', 'time', 'estimated_effects']
ees = ees.drop_duplicates(subset=['individuals', 'time'])
ees = ees.reset_index(drop=True)

# Sorted list of all periods. The first period is the base: its time FE is normalized to 0,
# so it gets a row in the time-FE table instead of ending up NaN after the merge.
time = np.sort(ees['time'].unique())
period, time_fe, running_sum = [time[0]], [0.0], 0.0

for b, c in zip(time[:-1], time[1:]):
    # Find an individual with data recorded in the base period b AND in the next period c.
    in_base = set(ees.loc[ees['time'] == b, 'individuals'])
    in_next = set(ees.loc[ees['time'] == c, 'individuals'])
    common = in_base & in_next
    if not common:
        raise ValueError(f"No individual observed in both {b} and {c}. Try another method. Sorry.")
    ind = min(common)

    # Within that individual, the change in the combined effect from b to c is the
    # change in the time FE, because the individual FE cancels out.
    eff = ees.loc[ees['individuals'] == ind].set_index('time')['estimated_effects']
    year_year_diff = eff.loc[c] - eff.loc[b]
    running_sum += year_year_diff
    time_fe.append(running_sum)
    period.append(c)

# Merge the time FEs back onto the estimated effects to get the individual FEs.
df_time = pd.DataFrame({'time': period, 'time_fe': time_fe})
ees = ees.merge(df_time, how='left', on='time')
ees['individual_fe'] = ees['estimated_effects'] - ees['time_fe']

# And you've got time + individual FEs.
```
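A different technique for the unbalanced case, not taken from the thread: treat the combined effects as the linear system e_it = a_i + g_t, build one dummy per individual and one per period except the first, and solve it by least squares with `numpy.linalg.lstsq`. Like the chaining approach, this fixes the first period's time effect at 0, but it uses every observation rather than one individual per pair of periods, and it does not require consecutive periods to share an individual, only that individuals and periods form a single connected set. The function `split_effects_lstsq` and the toy panel are hypothetical; the input columns match the `ees` frame built in the code above.

```python
import numpy as np
import pandas as pd


def split_effects_lstsq(ees: pd.DataFrame) -> pd.DataFrame:
    """Recover individual and time FEs from combined effects by least squares.

    Expects columns 'individuals', 'time', 'estimated_effects' (one row per
    individual-period). Normalization: the first period's time FE is 0.
    """
    inds, ind_codes = np.unique(ees["individuals"].to_numpy(), return_inverse=True)
    times, time_codes = np.unique(ees["time"].to_numpy(), return_inverse=True)
    n, n_ind, n_time = len(ees), len(inds), len(times)

    # Dense design matrix: one dummy per individual, one per period except the first.
    X = np.zeros((n, n_ind + n_time - 1))
    rows = np.arange(n)
    X[rows, ind_codes] = 1.0
    later = time_codes > 0
    X[rows[later], n_ind + time_codes[later] - 1] = 1.0

    coef, *_ = np.linalg.lstsq(X, ees["estimated_effects"].to_numpy(), rcond=None)
    ind_fe = coef[:n_ind]
    time_fe = np.concatenate([[0.0], coef[n_ind:]])  # base period fixed at 0

    out = ees.copy()
    out["individual_fe"] = ind_fe[ind_codes]
    out["time_fe"] = time_fe[time_codes]
    return out


# Hypothetical unbalanced example with known effects; no individual is observed in every period.
a_true = {"A": 1.0, "B": -0.5, "C": 2.0}
g_true = {1980: 0.0, 1981: 0.2, 1982: 0.5}
obs = [("A", 1980), ("A", 1981), ("B", 1981), ("B", 1982), ("C", 1980), ("C", 1982)]
panel = pd.DataFrame(obs, columns=["individuals", "time"])
panel["estimated_effects"] = [a_true[i] + g_true[t] for i, t in obs]
recovered = split_effects_lstsq(panel)
```

The dense matrix is fine for panels the size of wage_panel; for very large panels a sparse least-squares solver would be the natural swap. If the panel is not connected, the system is rank-deficient and `lstsq` returns the minimum-norm solution, so the split is then not unique.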