This software produces the interaction effect between two variables in generalized linear models of probabilities and counts and provides additional statistics and plotting utilities for evaluating and describing this effect. For continuous variables, this is quantified as the second-order cross-partial derivative. When one variable is continuous and the other is discrete, it is quantified as the discrete difference in the partial derivative. Finally, when both variables are discrete, it is defined as the discrete double difference.
The following are instructions for using the functions provided by our modglm
code. We strongly encourage that users first read our accompanying manuscript before using this code. This manuscript is published online in Multivariate Behavioral Research:
McCabe, C. J., Halvorson, M. A., King, K. M., Cao, X., & Kim, D. S. (2020). Interpreting interaction effects in generalized linear models of nonlinear probabilities and counts. Multivariate Behavioral Research, 1-27. doi: https://doi.org/10.1080/00273171.2020.1868966
This code is largely adapted from the intEff
function in the DAMisc
package for logit and probit models (Armstrong & Armstrong, 2020), which itself was an adaptation of the inteff
command in Stata (Norton, Wang, & Ai, 2004). Citations for each of these excellent resouces are as follows:
Armstrong, D., & Armstrong, M. D. (2020). Package DAMisc
.
URL: ftp://cygwin.uib.no/pub/cran/web/packages/DAMisc/DAMisc.pdf
Norton, E. C., Wang, H., & Ai, C. (2004). Computing interaction effects and standard errors in logit and probit models. The Stata Journal, 4(2), 154-167. URL: https://journals.sagepub.com/doi/abs/10.1177/1536867X0400400206
However, we note key differences between our code and those provided by the above resources. First, modglm
provides functionality for computing interaction effects in Poisson and negative binomial models in addition to logit models. Second, we provide additional functionality that accomodates Implication 1 of our manuscript. This includes 1.) allowing users to more flexibly estimate interaction effects conditioned on user-specified hypothetical scenarios; 2.) creating a plot that summarizes point estimates of the interaction effect in the observed data; and 3.) providing additional output that may be relevant in summarizing the results (e.g., the proportion of significant effects in the sample). Third, interaction effects can be estimated without the inclusion of a product term in these models to accomodate Implication 2 of our manuscript (i.e. that product terms are not required to estimate interaction in GLMs).
The modglm
package can be installed using the following:
devtools::install_github("connorjmccabe/modglm")
The following instructions are based upon the simulated Poisson example presented in Equation 17 of our manuscript. Code for generating this data are as follows:
set.seed(1678)
b0 <- -3.8 ##Intercept
b1 <- .35 ###X1 Effect
b2 <- .9 #X2 Effect
b3 <- 1.1 #Sex covariate effect
b13<- .2 #product term coefficient
n<-1000 #Sample Size
mu<-rep(0,2) #Specify means
S<-matrix(c(1,.5,.5,1),nrow=2,ncol=2) #Specify covariance matrix
sigma <- 1 #Level 1 error
require(MASS)
rawvars<-mvrnorm(n=n, mu=mu, Sigma=S) #simulates our continuous predictors from a multivariate normal distribution
cat<-rbinom(n=n,1,.5)
id<-seq(1:n)
eij <- rep(rnorm(id, 0, sigma))
xb<- (b0) + (b1) * (rawvars[,1]) + (b2) * (rawvars[,2]) + (b3)*cat + b13*cat*(rawvars[,1]) + eij
# (b3) * (rawvars[,1]*rawvars[,2]) +
#Generate Poisson data
ct <- exp(xb)
y <- rpois(n,ct)
df <- data.frame(y=y,x1=rawvars[,1],x2=rawvars[,2],female=cat)
We may then use this data to estimate the model as follows:
pois<-glm(y ~ x1 + x2 + female + x1:female, data=df,family="poisson")
Finally, we can load in modglm
using the following code:
require(modglm)
Assume that we aim to use modglm
to estimate the interaction between x1 and female in the above model. We estimate interaction effects using modglm
as follows:
pois.ints<-modglm(model=pois, vars=c("x1","female"), data=df, type="fd", hyps="means")
Above, modglm
requires at minimum four inputs, with one additional optional input.
The first is the estimated model object (e.g., model=pois
). Currently, model
may take logit or Poisson model objects estimated using the stats
package and negative binomial model objects estimating using the MASS
package. We hope to include additional GLMs in modglm
in the near future, including zero-inflated and hurdle models and generalized estimating equation (GEE) GLMs.
The second is a 2-element vector of the variables included in the interaction term (e.g., vars=c("x1","female")
). As noted above, modglm
makes no requirement that a product term need be specified in the model.
The third is the data frame used in estimating the model (e.g., data=df
).
The fourth is the type of interaction being specified (e.g., type="fd"
). This corresponds directly with the definition of interaction being used based on the variable type of the predictors. Three options are available, which mirrior our definition of interaction in Equation 10 in our manuscript. For continuous variable interactions, specify type="cpd"
for computing the second-order cross-partial derivative. For continuous-by-discrete variable interactions, specify type="fd"
for computing the finite difference in the partial derivative. For discrete variable interactions, specify type="dd"
for computing the double finite difference.
Finally, modglm
will optionally produce an interaction point estimate at a specified hypothetical condition using the hyp
input. By default, this is specified at the mean values of all included covariates. However, this can be modified by providing a vector of hypothetical values for the predictors involved in the model. For example, using hyps=c(c(1,-.5,.25,0)
will provide estimates for when x1 is -0.5, x2 is 0.25, and female is 0. Note that a 1 must be provided at the beginning of this vector to carry forward the intercept value.
modglm
produces a list of objects summarizing the results. Noting that we have defined our output as pois.ints
, use (for example) names(pois.ints)
to view the elements of this list, which provides the following:
> names(pois.ints)
[1] "obints" "inthyp" "prop.sig" "model.summary" "intsplot"
obints
provides a data frame of values computed observation-wise in the data. In order of columns, these include the interaction point estimates in the data (int.est
), the predicted value of the outcome (hat
), the delta method standard error of interaction point estimate (se.int.est
), the t-value (t.val
), and the significance designation based on the t-value (sig
).
inthyp
provides the results of the hypothetical condition specified by hyps
as described above. This has the identical format as obints
but contains only a single row of values.
prop.sig
provides the proportion of significant values among the observed data. This value may be helpful to users in summarizing the results in-text.
model.summary
is strictly a reproduction of a results summary table provided by the model (i.e. summary(pois)
).
Finally, intsplot
provides a graphical depiction of the interaction point estimates computed observation-wise, plotted against the model-predicted outcome (see also Ai & Norton, 2003). This plot is created using ggplot2
. This provides a snapshot summary of the interaction effects present in the data, including the significance values and the potential range in the observed interaction effects.
margplot
provides a means of plotting the marginal effects corresponding with a given interaction effect. Estimates are generated straightforwardly via producing conditional expected values based on the estimated model. Confidence regions are generated using parametric bootstrapping (i.e., Monte Carlo simulation) following guidelines by King, Tomz, & Wittenberg (2000) and Kim & McCabe (under review).
The input structure of margplot
was designed to be very similar to modglm
and are as follows:
p<-margplot(model=pois, vars=c("x1","female"),foc="x1", mod="female", modlevels=c(0,1),data=df, hyps="means",sm=F,modnames=c("Male","Female"))
The model
and vars
parameters are identical in meaning to modglm
discussed above. hyps
is also similar, but refers to the hypothetical values at which to hold the covariates constant that are otherwise included in the model. For instance, in the example above, the x2
variable is held at its own mean when computing displayed marginal effects.
foc
refers to the focal (i.e., x-axis) variable to be displayed as the primary predictor on the generated plot.
Conversely, mod
refers to the moderating (i.e., legend) variable.
modlevels
refer to the levels at which to "probe" and display marginal effects that illustrate the interaction effect. These are conceptually identical to selecting levels at which to probe simple slopes in linear interaction models (for instance, at -2, -1, 1, or 2 SDs from the mean; see McCabe, Kim, & King, 2018). In this example, given female
is a dummy varible, we are selecting 0 and 1 to represent the effects for males and females, respectively.
sm
refers to "small mulitples". If TRUE, the plot output will show each marginal effect produced on its own small multiple plot. If FALSE, all effects will display on a single plot.
modnames
helps to redefine the names of the modlevels supplied to be more "sensible". For instance, here, "Male" and "Female" will appear on the plot instead of the dummy values "0" and "1" for readability.
Note that the object produced is a ggplot2
object that can be readily modified further. E.g., I can add more sensible axis labels to the produced plot like so:
p + ylab("Count") + xlab("X1")