bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.

Home Page: https://bashtage.github.io/linearmodels/

ValueError raised without a dimension mismatch.

baharcos opened this issue

I am getting "ValueError: dependent and exog must have the same number of observations." from PanelOLS, raised by line 414 of the source code: if y.shape[0] != x.shape[0]:
However, when I run the same comparison myself I get False, so y.shape[0] == x.shape[0].

Can you post some example code that will produce the problem? Any chance all of your x data has missing values somewhere in a row?
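One quick way to answer the missing-values question is to count the rows where the dependent or exogenous data contain a NaN. This is a pandas-only sketch; y and x below are small hypothetical frames standing in for the reporter's data, not anything from linearmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the dependent (y) and exogenous (x) data
y = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})
x = pd.DataFrame({"x1": [0.5, np.nan, 1.5, 2.0],
                  "x2": [1.0, 1.0, np.nan, 3.0]})

# True for each row where y or any column of x is missing
missing_rows = y.isna().any(axis=1) | x.isna().any(axis=1)

print(f"rows with missing values: {int(missing_rows.sum())} of {len(y)}")
# rows with missing values: 2 of 4
print(x[missing_rows])  # the offending rows of x
```

If the count is non-zero, those rows are the first place to look.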

Sorry, the problem was the missing indexes, which took me a while to figure out from the error message.
I think stating the expected data type and format of the arguments more explicitly in the PanelOLS documentation could improve the user experience. It would have for me, at least!

@bashtage I have a similar problem with my own data:

import pandas as pd
from linearmodels.panel.model import PanelOLS
from patsy import dmatrices

df = pd.DataFrame({
    'HUD_TOTAL_UNITS': {0: 287, 1: 309, 2: 106, 3: 48, 4: 133,
                        5: 2767, 6: 354, 7: 78, 8: 1063, 9: 87},
    'ACS_SHARE_STUDENT': {0: 0.1667319663924078, 1: 0.17409332238503, 2: 0.1531424340974591,
                          3: 0.140645770849126, 4: 0.1874776433804533, 5: 0.1661870742518456,
                          6: 0.1171084956864537, 7: 0.1359496792732456, 8: 0.2012157348613907,
                          9: 0.181423395015628},
    'SD_TOTALREV_w': {0: 169245923.26139638, 1: 505645392.4999371, 2: 130964271.42492916,
                      3: 224003870.101752, 4: 212067226.9743809, 5: 1074644265.1849256,
                      6: 230845244.15128115, 7: 241304003.7668597, 8: 211527019.86738232,
                      9: 219589384.34815124},
})

# df has a plain integer index, not an (entity, time) MultiIndex
y, X = dmatrices('HUD_TOTAL_UNITS ~ ACS_SHARE_STUDENT + SD_TOTALREV_w', data=df, return_type='dataframe')
model = PanelOLS(y, X, entity_effects=True, time_effects=True).fit()

Running the code above raises the following error:

ValueError: dependent and exog must have the same number of observations. The number of observations in dependent is 10, and the number of observations in exog is 30.

even though print(y.shape[0] == X.shape[0]) prints True, with y.shape == (10, 1) and X.shape == (10, 3).
@baharcos's point does not seem to apply to my issue.

  1. What is the full error?
  2. What is the index of x and y?
  3. What are their full shapes?
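Questions 2 and 3 can be answered in one step by printing the index type, the number of index levels, and the shape of each frame. The sketch below is pandas-only: describe_panel_input is a hypothetical debugging helper (not part of linearmodels), and y and X are built directly from three rows of the reporter's data, with an Intercept column added the way dmatrices would, instead of calling patsy.

```python
import pandas as pd


def describe_panel_input(name: str, frame: pd.DataFrame) -> dict:
    """Summarize the index and shape of a frame before passing it to PanelOLS.

    Hypothetical debugging helper; not part of linearmodels.
    """
    info = {
        "index_type": type(frame.index).__name__,
        "index_levels": frame.index.nlevels,
        "shape": frame.shape,
    }
    print(name, info)
    return info


# Three rows of the reporter's data, with pandas' default single-level integer index
df = pd.DataFrame({
    "HUD_TOTAL_UNITS": [287, 309, 106],
    "ACS_SHARE_STUDENT": [0.1667319663924078, 0.17409332238503, 0.1531424340974591],
    "SD_TOTALREV_w": [169245923.26139638, 505645392.4999371, 130964271.42492916],
})
y = df[["HUD_TOTAL_UNITS"]]
X = df[["ACS_SHARE_STUDENT", "SD_TOTALREV_w"]].assign(Intercept=1.0)

y_info = describe_panel_input("y", y)
X_info = describe_panel_input("X", X)
# index_levels == 1 is the red flag: PanelOLS needs 2 levels (entity, time)
```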

This happens because PanelOLS requires a 2-level MultiIndex (entity, time). When you pass X as a DataFrame without a MultiIndex, it assumes the columns are one of the levels of the index, so the (10, 3) X matrix is reshaped to (30, 1), which does not agree with the 10 observations in y.
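The reshaping described above can be imitated with plain pandas: moving the three columns of a (10, 3) frame into a second index level leaves 30 rows of a single variable. This sketch only imitates the effect and does not call linearmodels; the frame X is a hypothetical stand-in for the dmatrices output, filled with placeholder numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the (10, 3) X from dmatrices, with a plain RangeIndex
X = pd.DataFrame(np.arange(30.0).reshape(10, 3),
                 columns=["Intercept", "ACS_SHARE_STUDENT", "SD_TOTALREV_w"])

# Turn the columns into a second index level: each (row, column) pair becomes
# one observation of a single variable. Row-major reshape order matches the
# order of the index produced by MultiIndex.from_product.
long_index = pd.MultiIndex.from_product([X.index, X.columns], names=["row", "column"])
long_X = pd.DataFrame(X.to_numpy().reshape(-1, 1), index=long_index, columns=["value"])

print(X.shape)               # (10, 3)
print(long_X.shape)          # (30, 1)
print(long_X.index.nlevels)  # 2
```

Applying the same treatment to y, which has a single column, leaves it with 10 observations, which is the 10 versus 30 mismatch reported in the error message.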

See https://bashtage.github.io/linearmodels/panel/examples/data-formats.html
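A minimal sketch of the fix, assuming each row of the reporter's data can be labelled with an entity and a time period. The posted data has no such columns, so entity_id and year below are hypothetical placeholders; with real data they would come from the dataset itself.

```python
import pandas as pd

df = pd.DataFrame({
    "HUD_TOTAL_UNITS": [287, 309, 106, 48, 133, 2767, 354, 78, 1063, 87],
    "ACS_SHARE_STUDENT": [0.1667319663924078, 0.17409332238503, 0.1531424340974591,
                          0.140645770849126, 0.1874776433804533, 0.1661870742518456,
                          0.1171084956864537, 0.1359496792732456, 0.2012157348613907,
                          0.181423395015628],
    "SD_TOTALREV_w": [169245923.26139638, 505645392.4999371, 130964271.42492916,
                      224003870.101752, 212067226.9743809, 1074644265.1849256,
                      230845244.15128115, 241304003.7668597, 211527019.86738232,
                      219589384.34815124],
})

# Hypothetical identifiers: 5 entities observed in 2 years each.
# Replace these with the real entity and time columns from your data.
df["entity_id"] = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
df["year"] = [2018, 2019] * 5

# PanelOLS expects a 2-level MultiIndex with the entity first and time second
df = df.set_index(["entity_id", "year"])

print(df.index.nlevels)  # 2
print(df.shape)          # (10, 3)
```

With this index in place, the y and X returned by dmatrices(..., return_type='dataframe') should carry the same 2-level index, since patsy keeps the pandas index of the input frame, so PanelOLS(y, X, entity_effects=True, time_effects=True) sees 10 observations in both. linearmodels also provides PanelOLS.from_formula, which accepts EntityEffects and TimeEffects terms in the formula and works on the same MultiIndexed frame.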