matthieugomez / statar

R package for data manipulation — inspired by Stata's API

tlag does not produce correct result

The match() call inside tlag looks up dates across all individuals at once, so a year that is missing for one individual gets matched to the row of another individual for whom that year is present.

df <- dplyr::data_frame(
    id    = c(1, 1, 1, 1, 1, 2, 2, 2),
    date  = c(1992, 1989, 1991, 1990, 1994, 1992, 1991, 1993),
    value = c(4.1, NA , 3.3, 5.3, 3.0, 3.2, 5.2, 7.8)
)
df <- df[order(df$id, df$date), ]
df
Source: local data frame [8 x 3]

     id  date value
  (dbl) (dbl) (dbl)
1     1  1989    NA
2     1  1990   5.3
3     1  1991   3.3
4     1  1992   4.1
5     1  1994   3.0
6     2  1991   5.2
7     2  1992   3.2
8     2  1993   7.8
df$value_lag <- statar::tlag(df$value, n = 1, time = df$date)
df 
Source: local data frame [8 x 4]

     id  date value value_lag
  (dbl) (dbl) (dbl)     (dbl)
1     1  1989    NA        NA
2     1  1990   5.3        NA
3     1  1991   3.3       5.3
4     1  1992   4.1       3.3
5     1  1994   3.0       7.8
6     2  1991   5.2       5.3
7     2  1992   3.2       3.3
8     2  1993   7.8       4.1

value_lag for id = 1 and date = 1994 is 7.8, but it should be NA, because id = 1 has no observation for 1994 - 1 = 1993. The 7.8 is the 1993 value of id = 2.

One solution would be to do the matching within each individual (splitting the times by individual) rather than over the pooled times of all individuals.
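A minimal base-R sketch of the difference, assuming the lookup in tlag behaves roughly like `x[match(time - n, time)]`. The helper `lag_by_time` is hypothetical and is not statar's actual code:

```r
# Hypothetical helper (an assumption about how a match()-based time lag
# works, not statar's implementation): for each row, return the value
# observed at time - n, or NA if no such time exists.
lag_by_time <- function(x, time, n = 1) {
  x[match(time - n, time)]
}

# Same data as above, already sorted by id and date.
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2),
  date  = c(1989, 1990, 1991, 1992, 1994, 1991, 1992, 1993),
  value = c(NA, 5.3, 3.3, 4.1, 3.0, 5.2, 3.2, 7.8)
)

# Pooled matching: dates of id 1 and id 2 are looked up together,
# so id 1 / 1994 picks up the 1993 value of id 2.
pooled <- lag_by_time(df$value, df$date)
# pooled is NA NA 5.3 3.3 7.8 5.3 3.3 4.1

# Per-individual matching: split the row indices by id and match
# only within each individual's own dates.
grouped <- rep(NA_real_, nrow(df))
for (rows in split(seq_len(nrow(df)), df$id)) {
  grouped[rows] <- lag_by_time(df$value[rows], df$date[rows])
}
# grouped is NA NA 5.3 3.3 NA NA 5.2 3.2
```

With per-individual matching, id = 1 / 1994 becomes NA, and the id = 2 rows only see id = 2 values.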

This is not a bug. To apply tlag within groups defined by id, use group_by:

df %>% group_by(id) %>% mutate(valuel = tlag(value, n = 1, time = date))

Thank you for the clarification! I thought the panel / group structure was already taken care of (I thought that is what Stata does with L.variable once it knows about the panel structure).
Maybe adapt the website a bit? http://www.princeton.edu/~mattg/statar/group-by.html

Thanks for spotting the mistake on the website! PS: tlag now returns an error when the time variable contains duplicates.
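As a rough illustration of why such a check catches the ungrouped call, here is a sketch in base R. The helper `lag_by_time_strict` and its error message are hypothetical and are not taken from statar:

```r
# Hypothetical helper, not statar's actual code: refuse to lag when the
# time vector contains duplicates, because match() would otherwise
# silently use the first matching row.
lag_by_time_strict <- function(x, time, n = 1) {
  if (anyDuplicated(time) > 0) {
    stop("time contains duplicates; did you forget to group by id?")
  }
  x[match(time - n, time)]
}

date  <- c(1989, 1990, 1991, 1992, 1994, 1991, 1992, 1993)
value <- c(NA, 5.3, 3.3, 4.1, 3.0, 5.2, 3.2, 7.8)

# Pooled call: 1991 and 1992 appear for both ids, so the check fires.
pooled_result <- tryCatch(
  lag_by_time_strict(value, date),
  error = function(e) conditionMessage(e)
)

# Within a single id (rows 1 to 5 belong to id 1) the dates are unique,
# so the lag is computed normally.
id1_rows   <- 1:5
id1_result <- lag_by_time_strict(value[id1_rows], date[id1_rows])
```

In this example the pooled call fails loudly instead of returning the misleading 7.8, while the call restricted to one individual still works.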