# Source scripts that contains various helper functions for reading and plotting
# data.
source(file = "analysis/baci_glmm/helper-functions.r")
load_install_packages()
## glmmTMB AICcmodavg lsmeans DHARMa parallel data.table
## TRUE TRUE TRUE TRUE TRUE TRUE
## magrittr ggplot2 directlabels
## TRUE TRUE TRUE
# Read data
baci_dt <- fread("data/variables_for_BACI.csv", stringsAsFactors = TRUE)
# Rename some columns:
setnames(baci_dt,
old = c("BACItime", "BACIpos", "Site", "Ind./m2", "% EPT", "Total organisms", "Total EPT organisms"),
new = c("period_ba", "treatment_ci", "site_f", "ind_m2", "ept_prc", "N", "N_EPT"))
# Create a factor column `year_f` where I rounded the year to integer and then
# converted to factor type.
baci_dt[, year_f := Year %>% round(digits = 0) %>% factor]
# This will be needed for testing for overdispersation
baci_dt[, obs := 1:.N]
# Variable of interest
varb <- "E20"
hist(baci_dt[[varb]],
main = paste("Histogram of" , varb),
xlab = varb)
preliminary_plot <- exploratory_plot(varb)
preliminary_plot
We can observe already that there is no clear difference between “impact” and “control”. The two mean lines are mostly overlap.
Specify fixed and random effects and test for random effects structure.
The general model formula with fixed effects is varb ~ period_ba + treatment_ci + period_ba:treatment_ci
Note the interaction term period_ba:treatment_ci
which is the “BACI effect” (see Schwarz, 2015). Testing for its statistical significance is equivalent to testing for an environmental impact (Schwarz, 2015).
To the fixed effects model, random effects are added: site_f
and year_f
.
Below, it was tested if site_f:year_f
interaction should be kept in the model.
# Create an empty list to be populated with models
models <- list()
# Fit model without the interaction of random effects (site_f and year_f)
models[[1]] <- glmmTMB(E20 ~ period_ba * treatment_ci + (1|site_f) + (1|year_f),
data = baci_dt, family = beta_family(link = "logit"))
# Model with interaction in the random effects structure
models[[2]] <- glmmTMB(E20 ~ period_ba * treatment_ci + (1|site_f) + (1|year_f) + (1|site_f:year_f),
data = baci_dt, family = beta_family(link = "logit"))
Note that the warnings 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
are expected - see comment of Ben Bolker here.
Model summary:
summary(models[[1]])
## Family: beta ( logit )
## Formula: E20 ~ period_ba * treatment_ci + (1 | site_f) + (1 | year_f)
## Data: baci_dt
##
## AIC BIC logLik deviance df.resid
## -808.3 -785.9 411.1 -822.3 173
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## site_f (Intercept) 0.01582 0.1258
## year_f (Intercept) 0.01358 0.1165
## Number of obs: 180, groups: site_f, 6; year_f, 5
##
## Overdispersion parameter for beta family (): 95.5
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.65363 0.16078 -16.505 <2e-16 ***
## period_babefore 0.40531 0.20688 1.959 0.0501 .
## treatment_ciimpact -0.12177 0.16461 -0.740 0.4594
## period_babefore:treatment_ciimpact -0.08636 0.17840 -0.484 0.6283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(models[[2]])
## Family: beta ( logit )
## Formula:
## E20 ~ period_ba * treatment_ci + (1 | site_f) + (1 | year_f) +
## (1 | site_f:year_f)
## Data: baci_dt
##
## AIC BIC logLik deviance df.resid
## -810.0 -784.4 413.0 -826.0 172
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## site_f (Intercept) 0.009848 0.09924
## year_f (Intercept) 0.008781 0.09371
## site_f:year_f (Intercept) 0.021765 0.14753
## Number of obs: 180, groups: site_f, 6; year_f, 5; site_f:year_f, 30
##
## Overdispersion parameter for beta family (): 105
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.65853 0.15360 -17.308 <2e-16 ***
## period_babefore 0.40176 0.24922 1.612 0.107
## treatment_ciimpact -0.12122 0.16041 -0.756 0.450
## period_babefore:treatment_ciimpact -0.08796 0.24919 -0.353 0.724
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The models are compared using the Akaike Information Criterion (AIC). The preferred model should be the one with a smaller AIC value. A model selection approach similar to the one proposed by Burnham & Anderson (2002) is used. The authors suggest the rule of thumb according to which models with AIC difference smaller or equal than 2 (delta-AIC ≤ 2) have substantial empirical support, those with 4 ≤ delta-AIC ≤ 7 have considerably less and those with delta-AIC > 10 have essentially none. On the other hand, they also suggest that the model with delta-AIC > 10 might still be used for inference if the sample size is large.
# AICc comparison
aictab(cand.set = models,
modnames = c("1-without site_f:year_f",
"2-with site_f:year_f"),
second.ord = TRUE)
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## 2-with site_f:year_f 8 -809.14 0.00 0.68 0.68 412.99
## 1-without site_f:year_f 7 -807.63 1.51 0.32 1.00 411.14
Additionally to the AIC test, we can also run the likelihood ratio test (LRT) as per Bates (2015).
anova(models[[1]], models[[2]])
## Data: baci_dt
## Models:
## models[[1]]: E20 ~ period_ba * treatment_ci + (1 | site_f) + (1 | year_f), zi=~0, disp=~1
## models[[2]]: E20 ~ period_ba * treatment_ci + (1 | site_f) + (1 | year_f) + , zi=~0, disp=~1
## models[[2]]: (1 | site_f:year_f), zi=~0, disp=~1
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## models[[1]] 7 -808.28 -785.93 411.14 -822.28
## models[[2]] 8 -809.99 -784.44 412.99 -825.99 3.7041 1 0.05428 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The AIC test favors that the more complex model (with site_f:year_f
) but the less complex one has substantial empirical support. The LRT test indicates that there are no significant differences between the two models. According to the rule of parsimony, we pick the simpler model for further analysis.
Testing for overdispersion in the mixed model.
“Overdispersion: the occurrence of more variance in the data than predicted by a statistical model.” (Bolker, 2009). In case of overdispersed then we can fit a model with observation-level random effects see Harrison XA (2014) or Harrison XA (2015). This approach was also suggested by Ben Bolker here, where further references are mentioned.
The tests below indicate no significant overdispersion.
# Test for overdispersion with DHARMa residuals diagnostics
simulation <- DHARMa::simulateResiduals(fittedModel = models[[1]])
plot(simulation)
# A non-parametric test on the simulated residuals
DHARMa::testDispersion(simulation)
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.0667, p-value = 0.384
## alternative hypothesis: two.sided
We keep the model without any overdispersion adjustments for further analysis.
model <- models[[1]]
# Residuals
qqnorm(residuals(model), main = "Q-Q plot - residuals")
qqline(residuals(model), col = "red")
# inspecting the random effects (see also Bolker, 2009 - supp 1)
qqnorm(unlist(ranef(model)), main = "Q-Q plot, random effects")
qqline(unlist(ranef(model)), col = "red")
# fitted vs residuals
scatter.smooth(fitted(model),
residuals(model, type = "pearson"),
main = "fitted vs residuals",
xlab = "Fitted Values",
ylab = "Residuals")
abline(h = 0, col = "red")
# fitted vs observed
scatter.smooth(fitted(model), baci_dt[[varb]],
xlab = "Fitted Values",
ylab = "Observed")
abline(0, 1, col = "red")
We will test for statistical significance of BACI interaction term using the likelihood ratio test (LRT).
The nested models that we compare are:
# Model without the BACI interaction term
model.no.interaction <- glmmTMB(E20 ~ period_ba + treatment_ci + (1|site_f) + (1|year_f),
data = baci_dt, family = beta_family(link = "logit"))
# LRT
lrt <- anova(model, model.no.interaction)
lrt
## Data: baci_dt
## Models:
## model.no.interaction: E20 ~ period_ba + treatment_ci + (1 | site_f) + (1 | year_f), zi=~0, disp=~1
## model: E20 ~ period_ba * treatment_ci + (1 | site_f) + (1 | year_f), zi=~0, disp=~1
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model.no.interaction 6 -810.05 -790.89 411.03 -822.05
## model 7 -808.28 -785.93 411.14 -822.28 0.231 1 0.6307
The likelihood ratio (LRT) gives no significant p-value. There are no significant differences between the two models. Therefore, the interaction term (BACI effect) is not statistically significant (there is no environmental impact).
Extract coefficients from final model.
Remove the intercept for extracting group means and their confidence intervals as suggested in Schielzeth H (2010).
final.model <- model
final.model.noIntercept <- update(final.model, . ~ . -1)
To get the real proportions (Least-squares means/predicted marginal means/treatment means) one can do:
estimates <- lsmeans::lsmeans(final.model.noIntercept, ~ treatment_ci:period_ba, type = "response")
# https://stats.stackexchange.com/questions/192062/issue-calculating-adjusted-means-for-glmer-model
# Confidence level used: 0.95
estimates
## treatment_ci period_ba response SE df lower.CL upper.CL
## control after 0.0658 0.00988 173 0.0488 0.0882
## impact after 0.0587 0.00498 173 0.0496 0.0693
## control before 0.0955 0.01906 173 0.0639 0.1403
## impact before 0.0790 0.01061 173 0.0604 0.1026
##
## Confidence level used: 0.95
## Intervals are back-transformed from the logit scale
As indicated in Schwarz CJ (2015), the BACI effect is computed as BACI = avgCA - avgCB - (avgIA - avgIB):
# get the estimates only (without CI)
est <- predict(ref.grid(final.model.noIntercept), type = "response")
# give names to the estimates vector
names(est) <- c("CA","CB","IA","IB"); est
## CA CB IA IB
## 0.06576588 0.09549480 0.05866820 0.07896813
baci <- est["CA"]-est["CB"]-(est["IA"]-est["IB"])
baci
## CA
## -0.009428997
One can also get the BACI effect like:
contrast(regrid(estimates), list(baci=c(1,-1,-1,1)))
## contrast estimate SE df t.ratio p.value
## baci -0.00943 0.0145 173 -0.650 0.5164
Or with asymptotic CI-s:
baci_ci <- confint(contrast(regrid(estimates), list(baci=c(1,-1,-1,1))))
baci_ci
## contrast estimate SE df lower.CL upper.CL
## baci -0.00943 0.0145 173 -0.038 0.0192
##
## Confidence level used: 0.95
# Confidence level used: 0.95
# https://stats.stackexchange.com/questions/241523/testing-for-pairwise-proportion-differences
Below is presented the coefficient of determination of the correlation between the fitted and observed values as suggested by Ben Bolker here.
r2_corr <- function(m) {
lmfit <- lm(model.response(model.frame(m)) ~ fitted(m))
summary(lmfit)$r.squared
}
R2 <- r2_corr(final.model)
R2
## [1] 0.2638866
This plot will confirm as well visually that there is no BACI effect. The confidence intervals of the means overlap and the two lines are almost parallel.
interaction_plot(estimates, varb)
We can compare the observed mean values (marked with x symbol in the graph below) with the estimated mean values (model coefficients, marked with a plain dot and connected with lines). The estimated means for E20 are close to the observed mean values.
interaction_plot(estimates, varb) +
# Add observed means as dots with bootstrapped CIs
stat_summary(data = baci_dt,
aes(x = period_ba,
y = get(varb),
group = treatment_ci,
color = treatment_ci),
fun.data = "mean_cl_boot",
# geom = "point",
size = 0.5,
shape = 4,
position = position_dodge(width = 0.1),
show.legend = TRUE)
The final GLMM model explained 26.39 % of the variance in E20.
There was no significant BACI period × treatment effect (LRT, p = 0.631). The BACI effect estimated from the model, which is the difference of the two changes (control after − control before) − (impact after − impact before]), was -0.01 ± 0.01 standard error.
The estimated mean E20 in the control sites varied from 0.1 to 0.07, and in the impact sites varied from 0.08 to 0.06, before and after the construction of the dam (Fig 2 & Fig 3).
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.
Bolker BM et al. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in ecology & evolution 24:127–135 at http://www.sciencedirect.com/science/article/pii/S0169534709000196
Burnham, K. & Anderson, D., 2002. Model selection and multimodel inference: a practical information-theoretic approach, New York: Springer.
Harrison XA (2014) Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2:e616 https://doi.org/10.7717/peerj.616
Harrison XA (2015) A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution. PeerJ 3:e1114 at https://peerj.com/articles/1114.pdf
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305-315.
Schwarz CJ (2015) Analysis of BACI experiments. In Course Notes for Beginning and Intermediate Statistics. at http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-R/R-part013.pdf
Schielzeth H (2010) Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2), 103-113.