This package provides a `causal_tbl` class for causal inference. A
`causal_tbl` is a subclass of tibble which keeps track of information
on the roles of variables like treatment and outcome, and provides
functionality to store models and their fitted values as columns in a
data frame.
You can install the development version of causaltbl from GitHub with:
# install.packages("remotes")
remotes::install_github("CoryMcCartan/causaltbl")
A causal tibble, `causal_tbl`, is a data frame with attributes
identifying which columns correspond to common inputs in causal
inference analyses. At the most basic level, you can indicate the
outcome and treatment columns. For more involved analyses, `causal_tbl`s
can keep track of additional columns including multiple outcomes and
multiple treatments.
The primary entryway to `causaltbl` is to create a `causal_tbl`
directly via `causal_tbl()`.
Suppose we have data from a very simple difference-in-differences design. Our data look like this:
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
There are two units (`id`), `a` and `b`. We have 4 yearly observations
from 2015 to 2018 (`year`) for each unit. `a` is never treated and `b`
is treated in 2017 and 2018 (`trt`). Some outcome (`y`) is measured
yearly.
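As a quick check of the description above, the following base R sketch confirms the panel layout and works through the textbook two-group, two-period difference-in-differences arithmetic by hand. It does not use causaltbl at all; the pre/post split at 2017 and the variable names (`obs_per_unit`, `treated_years`, `did_est`) are illustrative assumptions, not part of the package.

```r
# Same toy data as above, repeated so the sketch is self-contained.
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

# Each unit should contribute 4 yearly observations.
obs_per_unit <- table(df$id)

# Years in which each unit is treated (only "b" appears here).
treated_years <- split(df$year[df$trt == 1], df$id[df$trt == 1])

# Hand-computed 2x2 estimate, assuming 2015-2016 is "pre" and 2017-2018 is "post".
post <- df$year >= 2017
change_b <- mean(df$y[df$id == "b" & post]) - mean(df$y[df$id == "b" & !post])
change_a <- mean(df$y[df$id == "a" & post]) - mean(df$y[df$id == "a" & !post])
did_est <- change_b - change_a
did_est
#> [1] 1
```

The treated unit's mean outcome rises by 1.5 and the untreated unit's by 0.5, so the difference of the two changes is 1.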
We can first make a `causal_tbl` by passing `df` to `causal_tbl()`. We
don’t need to specify any options.
library(causaltbl)
did <- causal_tbl(df)
Now `did` is a `causal_tbl` version of `df`.
did
#> # A <causal_tbl> [8 × 4]
#>
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
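Because a `causal_tbl` is described as a subclass of tibble, it should still behave like an ordinary data frame. The sketch below checks this with base R's `inherits()`. The class names `"causal_tbl"` and `"tbl_df"` are assumptions inferred from the printed header and from tibble's usual class vector, so treat the results as something to verify rather than a guarantee.

```r
library(causaltbl)

# Same toy data as above, repeated so the sketch is self-contained.
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
did <- causal_tbl(df)

# Assumed class names; adjust if the package uses different ones.
is_causal <- inherits(did, "causal_tbl")
is_tibble <- inherits(did, "tbl_df")

# A causal_tbl should still be a data frame with the original shape.
is_df <- is.data.frame(did)
shape <- dim(did)
```

If these checks hold, code written for plain data frames or tibbles can accept `did` without conversion.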
To set the outcome, we can use the corresponding function, `set_outcome()`.
`causal_tbl` uses tidy evaluation, so we can use the bare column name.
did <- did |>
  set_outcome(outcome = y)
did
#> # A <causal_tbl> [8 × 4]
#> [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
Similarly, we can indicate that `did` has a treatment column `trt` or a
panel structure for each `id`-`year` with the corresponding
`set_treatment()` and `set_panel()` functions.
did <- did |>
  set_treatment(treatment = trt) |>
  set_panel(unit = id, time = year)
did
#> # A <causal_tbl> [8 × 4]
#> [unit] [time] [trt] [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
This sets attributes that are used down the line by other packages. We
can retrieve them by calling their getters. For the outcome, use
`get_outcome()`:
get_outcome(did)
#> [1] "y"
For the treatment, use `get_treatment()`:
get_treatment(did)
#> y
#> "trt"
And for the panel structure, use `get_panel()`:
get_panel(did)
#> $unit
#> [1] "id"
#>
#> $time
#> [1] "year"
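To illustrate how downstream code might consume these getters, here is a minimal sketch that builds a model formula from the stored roles instead of hard-coding column names. It uses only the causaltbl functions shown above plus base R's `reformulate()` and `tapply()`. The variable names (`outcome_col`, `treatment_col`, `f`) are hypothetical, and this is not presented as how any particular package uses the attributes.

```r
library(causaltbl)

# Same toy data as above, repeated so the sketch is self-contained.
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

# Assign all roles in one pipeline.
did <- causal_tbl(df) |>
  set_outcome(outcome = y) |>
  set_treatment(treatment = trt) |>
  set_panel(unit = id, time = year)

# Look the roles up rather than hard-coding "y" and "trt".
outcome_col <- get_outcome(did)
treatment_col <- unname(get_treatment(did))  # drop the name carried by the getter

# Build a formula of the form <outcome> ~ <treatment>.
f <- reformulate(treatment_col, response = outcome_col)

# Mean outcome by treatment status, using the looked-up column names.
means_by_trt <- tapply(did[[outcome_col]], did[[treatment_col]], mean)
```

Writing functions against the getters means the same code works for any `causal_tbl`, whatever its columns happen to be called.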
For more information on using `causal_tbl`s or designing functions that
use `causal_tbl`s, see the Advanced causal_tbl vignette.