tlag does not produce correct result
The match() call inside tlag looks up year t - 1 across all rows, so when that year is missing for one individual it is matched to another individual for whom it is not missing. Example:
df <- dplyr::data_frame(
id = c(1, 1, 1, 1, 1, 2, 2, 2),
date = c(1992, 1989, 1991, 1990, 1994, 1992, 1991, 1993),
value = c(4.1, NA , 3.3, 5.3, 3.0, 3.2, 5.2, 7.8)
)
df <- df[order(df$id, df$date), ]
df
Source: local data frame [8 x 3]
id date value
(dbl) (dbl) (dbl)
1 1 1989 NA
2 1 1990 5.3
3 1 1991 3.3
4 1 1992 4.1
5 1 1994 3.0
6 2 1991 5.2
7 2 1992 3.2
8 2 1993 7.8
df$value_lag <- statar::tlag(df$value, n = 1, time = df$date)
df
Source: local data frame [8 x 4]
id date value value_lag
(dbl) (dbl) (dbl) (dbl)
1 1 1989 NA NA
2 1 1990 5.3 NA
3 1 1991 3.3 5.3
4 1 1992 4.1 3.3
5 1 1994 3.0 7.8
6 2 1991 5.2 5.3
7 2 1992 3.2 3.3
8 2 1993 7.8 4.1
value_lag for id = 1 and year = 1994 is 7.8, but it should be NA, because id = 1 has no observation for year 1993 (1994 - 1). The 7.8 is the 1993 value of id = 2.
One solution would be to do the matching separately for each individual (splitting the times by id) rather than over the pooled times of all individuals.
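To illustrate the difference, here is a minimal base R sketch. `lag_by_time` is a hypothetical helper written for this example, assuming a match()-based lookup like the one described above; it is not statar's actual code.

```r
# Hypothetical helper (not statar's implementation): lag x by n time
# units by looking up the value observed at time - n.
lag_by_time <- function(x, time, n = 1) {
  x[match(time - n, time)]
}

# Same data as above, already sorted by id and date
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2),
  date  = c(1989, 1990, 1991, 1992, 1994, 1991, 1992, 1993),
  value = c(NA, 5.3, 3.3, 4.1, 3.0, 5.2, 3.2, 7.8)
)

# Pooled: times are matched across all individuals, which is what
# produces the 7.8 for id = 1 in 1994
pooled <- lag_by_time(df$value, df$date)

# Per individual: split by id, lag within each piece, reassemble
pieces  <- lapply(split(df, df$id), function(d) lag_by_time(d$value, d$date))
grouped <- unsplit(pieces, df$id)

cbind(df, pooled, grouped)
```

In the grouped column, id = 1 in 1994 and id = 2 in 1991 are both NA, since neither individual has an observation for the preceding year.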
This is not a bug. To apply tlag within groups defined by id, use group_by:
df %>% group_by(id) %>% mutate(valuel = tlag(value, n = 1, time = date))
Thank you for the clarification! I thought the panel / group structure was already taken care of (I thought that is what Stata does with L.variable once it knows about the panel structure).
Maybe adapt the website a bit? http://www.princeton.edu/~mattg/statar/group-by.html
Thanks for spotting the mistake on the website! PS: now, tlag returns an error in the case of duplicate times.
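For readers wondering why a duplicate-time check catches this case: in the pooled call, the same year appears once per individual. A rough sketch of such a guard follows. `lag_by_time_strict` and its error message are made up for illustration and are not taken from statar.

```r
# Hypothetical guard (function name and message are assumptions,
# not statar's actual code)
lag_by_time_strict <- function(x, time, n = 1) {
  if (anyDuplicated(time) > 0) {
    stop("time contains duplicates; apply the lag within groups")
  }
  x[match(time - n, time)]
}

date  <- c(1989, 1990, 1991, 1992, 1994, 1991, 1992, 1993)
value <- c(NA, 5.3, 3.3, 4.1, 3.0, 5.2, 3.2, 7.8)

# Pooled call: 1991 and 1992 occur for both ids, so this errors
pooled_result <- tryCatch(
  lag_by_time_strict(value, date),
  error = function(e) conditionMessage(e)
)

# Within a single id the times are unique, so the lag goes through
id1_result <- lag_by_time_strict(value[1:5], date[1:5])
```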