IQSS / Zelig

A statistical framework that serves as a common interface to a large range of models

Home Page: http://zeligproject.org

Impossible to reproduce ATT() results with setx() and sim()

albertostefanelli opened this issue

Although the differences are small, I cannot reproduce the results of Zelig's ATT() function with Zelig's setx() and sim() functions. The example below uses a matched sample created with the MatchIt package. The outcome of interest is support for social spending (servicies_res), and the treatment variable (W) is race. The R script and data are attached.

Snapshot of the data frame:

r$> head(dput(github_exemple,"github_exemple.txt"))                         
   age race edu income gender number_childers servicies_res                 
2   26    0  13     17      1               0             4
4   58    0   9     20      1               0             3
6   60    0  14      1      1               1             3                 
8   56    0   9     28      2               0             2
10  30    0  10     19      2               2             3
11  40    0   7     11      2               0             4   

First, I run the native Zelig ATT() function. The mean difference between the units actually treated (or exposed) and their counterfactuals is 0.7060121.

z.out1 <- zelig(servicies_res ~ age + race,
  data = matched, #control
  model = "ls")

set.seed(12)
z.att_treat <- z.out1 %>%
             ATT(treatment = "race",treat=1) %>% 
             get_qi(qi = "ATT", xvalue = "TE")

mean(z.att_treat)

Let's now try to get the ATT using the parametric method suggested in the MatchIt package.
Now the ATT (the so-called first difference in the sim() output) is 0.7049261. The predicted mean of y is 3.796565 for the control group and 4.501491 for the treated group. As far as I can assess from the syntax, this estimate should be the same as the one ATT() produces, since in both cases the model is estimated on the entire sample with the treatment variable (race) as a predictor. However, even with the same seed, the results are slightly different.

z.out1 <- zelig(servicies_res ~ age + race, data = matched,
model = "ls")

set.seed(12)
x.out1 <- setx(z.out1, race=0) #majority 
set.seed(12)
x.out2 <- setx1(z.out1, race=1) #minority
s.out1 <- Zelig::sim(z.out1, x = x.out1, x1=x.out2)
summary(s.out1)
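One possible explanation can be sketched in base R without Zelig. The sketch assumes (this is not confirmed from the Zelig source) that ATT() contrasts the observed outcomes of the treated units with simulated counterfactual predictions, while the first difference from sim() is purely model-based. The data are synthetic and the helper `draw_betas` is hypothetical. For OLS with the treatment dummy in the model, the two point estimates coincide, so any remaining gap between the simulated means would be simulation noise.

```r
# Synthetic data, NOT the data attached to this issue
set.seed(1)
n <- 500
d <- data.frame(age = round(runif(n, 18, 80)), race = rbinom(n, 1, 0.5))
d$y <- 3 + 0.01 * d$age + 0.7 * d$race + rnorm(n)

fit <- lm(y ~ age + race, data = d)

# Hypothetical helper: draw coefficient vectors from N(beta_hat, vcov(fit))
draw_betas <- function(fit, n_sims) {
  b <- coef(fit)
  L <- t(chol(vcov(fit)))                        # lower-triangular factor of vcov
  z <- matrix(rnorm(length(b) * n_sims), nrow = length(b))
  out <- t(b + L %*% z)                          # n_sims x p matrix of draws
  colnames(out) <- names(b)
  out
}
sims <- draw_betas(fit, 1000)

# Model-based first difference (race = 1 vs race = 0, age held fixed)
fd_sims <- sims[, "race"]

# ATT-style contrast: observed y of the treated minus predicted y with race set to 0
treated <- d[d$race == 1, ]
cf <- transform(treated, race = 0)
X_cf <- model.matrix(~ age + race, data = cf)
att_sims <- mean(treated$y) - rowMeans(sims %*% t(X_cf))

# Point estimates agree; simulated means agree only up to Monte Carlo error
att_point <- mean(treated$y) - mean(predict(fit, newdata = cf))
c(coef_race = coef(fit)[["race"]], att_point = att_point,
  fd_mean = mean(fd_sims), att_mean = mean(att_sims))
```

If this reading is right, the two simulated quantities are different functions of the same coefficient draws, so they need not match draw by draw even with an identical seed.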

It might be that the ATT() function uses a different approach. The MatchIt package suggests to (1) run two different models, one for the control group and one for the treatment group, (2) simulate the mean value of y using the covariate values of the reference group, and (3) take the mean difference y_treatment - y_control.

First, the coefficients estimated from the control group are combined (imputed) with the values of the covariates of the treated units. Here the expected value of y is 3.797495.

# run the model for the control 
z.out1 <- zelig(servicies_res ~ age, 
  data = control_majority,
  model = "ls")
summary(z.out1)

# set the X using the value of the treated group
# impute the counterfactual outcome for the treatment  group
set.seed(12)
x.out1 <- setx(z.out1, data=treated_minority, cond=FALSE)

# simulate the expected y if the control units had the treated units' values
# for all the independent variables
# betas * X for all scenarios
att_treatment <- sim(z.out1, x = x.out1)
summary(att_treatment)

Second, the coefficients estimated from the treatment group are combined (imputed) with the values of the covariates of the control units. In this case the predicted mean value of y is 4.497277.

# run the model for the treated group
z.out2 <- zelig(servicies_res ~ age, 
  data = treated_minority,
  model = "ls")
summary(z.out2)

# set the X using the values of the control group
# impute the counterfactual outcome for the control group
set.seed(12)
x.out2 <- setx(z.out2, data=control_majority, cond=TRUE)

# simulate the expected y if the treated units had the control units' values
# for all the independent variables 
# betas * X for all scenarios
att_control <- sim(z.out2, x = x.out2)
summary(att_control)

Third, the difference between the two means of y is our ATT, which in this case equals 0.6997821. If I understood correctly, the small difference from the parametric estimate above arises because here the models are estimated on the control and treatment groups separately rather than on the entire sample. However, the estimate differs from both the parametric example (0.7049261) and the Zelig ATT() function (0.7060121).

(mean(att_control$sim.out[[1]][1][[1]][[1]]) - mean(att_treatment$sim.out[[1]][1][[1]][[1]]))

In addition, the MatchIt documentation suggests using conditional prediction (which means using the observed values) in setx(). Switching the argument cond from TRUE to FALSE leads to the same results.
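For comparison, here is a minimal base-R sketch of one reading of the two-group recipe, in which the ATT is the mean of the observed treated outcomes minus the counterfactual outcomes imputed from the control-group model. Whether this matches what MatchIt or Zelig do internally is an assumption. The data are synthetic and all object names are made up for illustration. The last lines compute the cross-prediction difference used in the steps above, which contrasts predictions made at two different sets of covariate values and so is not the same estimand in general.

```r
# Synthetic "matched" sample, NOT the data attached to this issue
set.seed(2)
n <- 400
d <- data.frame(age = round(runif(n, 18, 80)), race = rep(0:1, each = n / 2))
d$y <- 3 + 0.01 * d$age + 0.7 * d$race + rnorm(n)

control <- d[d$race == 0, ]
treated <- d[d$race == 1, ]

# Outcome model fitted on the control group only
fit_control <- lm(y ~ age, data = control)

# Imputed counterfactual: expected y of the treated units had they been controls
y0_hat <- predict(fit_control, newdata = treated)

# ATT-style estimate: observed treated outcomes minus imputed counterfactuals
att_hat <- mean(treated$y - y0_hat)

# Cross-prediction difference as computed in the issue:
# treated-group model at control covariates minus control-group model at treated covariates
fit_treated <- lm(y ~ age, data = treated)
cross_diff <- mean(predict(fit_treated, newdata = control)) - mean(y0_hat)

c(att_hat = att_hat, cross_diff = cross_diff)
```

After good matching the covariate distributions of the two groups are similar, so the two numbers would be expected to be close but not identical.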

github example_r_script.txt
github_exemple.RData.txt

I am encountering the same problem. Did you manage to fix it?

Unfortunately not. My guess is that setx draws a higher number of values when imputing the counterfactual outcome.
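The role of the number of draws can be checked with a rough base-R sketch. It uses synthetic data and a hypothetical helper `sim_mean`, and it draws only the race coefficient from its normal sampling distribution rather than calling Zelig. The spread of the simulated mean across repeated runs shrinks as the number of draws grows, so two procedures that use different numbers of draws, or consume the random stream differently, would be expected to disagree in the later decimals.

```r
# Synthetic data, NOT the data attached to this issue
set.seed(3)
n <- 300
d <- data.frame(age = round(runif(n, 18, 80)), race = rbinom(n, 1, 0.5))
d$y <- 3 + 0.01 * d$age + 0.7 * d$race + rnorm(n)
fit <- lm(y ~ age + race, data = d)

b_race  <- coef(fit)[["race"]]
se_race <- sqrt(vcov(fit)["race", "race"])

# Hypothetical helper: mean of n_sims simulated draws of the race coefficient
sim_mean <- function(n_sims) mean(rnorm(n_sims, mean = b_race, sd = se_race))

# Standard deviation of that mean over 200 repeated runs, for three simulation sizes
sizes <- c(100, 1000, 10000)
spread <- sapply(sizes, function(k) sd(replicate(200, sim_mean(k))))
names(spread) <- sizes
spread
```

In Zelig the number of draws is controlled by the num argument of sim(). Raising it in both procedures and checking whether the gap shrinks would be one way to test this guess.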