Lab 1 - Multilevel Models - Ans

Lab 1: MLM

Today’s lab uses data from a study conducted by Coyne et al. (2019), which examined the impact of a high-quality, evidence-based vocabulary instruction intervention in kindergarten. The study collected data on 1,428 students nested in 233 clusters, or classrooms (clu_id); the analysis file used in this lab contains 1,427 students in 222 classrooms.

Within each classroom, a small sample of at-risk students (about N = 6 per classroom) was identified; half were allocated to treatment and the other half to control. The treatment group received supplemental small-group vocabulary instruction in addition to the whole-group instruction, while the control group received only whole-group vocabulary instruction. Since the observations were not independent (students were nested within classrooms), the researchers needed to account for this in their analysis.

The main question that the researchers aimed to answer was whether the supplemental small-group kindergarten vocabulary instruction intervention increased students’ knowledge of the vocabulary words taught in the intervention. To measure vocabulary knowledge, the researchers used an Expressive Target Word (ETW; ETW_SpringK) assessment, which evaluated students’ ability to explain the meaning of a given word. The assessment was administered after the intervention concluded, in the spring of kindergarten. In the sample, ETW scores ranged from 0 to 52 (the maximum score), with a mean of 13.65 and a standard deviation of 11.10. To answer the research question, the researchers used two fixed effects and their interaction: TRT (1 = treatment, 0 = control) and PPVT (Peabody Picture Vocabulary Test, which measures students’ vocabulary before the intervention; PPVT_FallK).

Coyne, M. D., McCoach, D. B., Ware, S., Austin, C. R., Loftus-Rattan, S. M., & Baker, D. L. (2019). Racing Against the Vocabulary Gap: Matthew Effects in Early Vocabulary Instruction and Intervention. Exceptional Children, 85(2), 163–179. https://doi.org/10.1177/0014402918789162

Load packages

Code
# fill in packages you need as you go here
library(tidyverse) # data wrangling
library(knitr) # nice tables
library(patchwork) # combine figures
library(lme4) # fit mixed models
library(lmerTest) # mixed models
library(broom.mixed) # tidy output of mixed models
library(afex) # fit mixed models for lrt test
library(emmeans) # marginal means
library(ggeffects) # marginal means
library(ggrain) # rain plots
library(easystats) # nice ecosystem of packages
library(interactions) # interaction plots

Load data

Code
# read in data file 
data <- read.csv("Ch3_MLM_R.csv")
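A quick look at the data structure (a sketch; it assumes the CSV loaded cleanly and shows the columns used throughout this lab):

Code
# inspect the variables used in this lab
dplyr::glimpse(data) # expect clu_id, TRT, PPVT_FallK, ETW_SpringK among the columns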

Lab 1

Data structure

  1. What are the Level 1 and Level 2 variables in this study? How many units are in Level 1? Level 2? Are the fixed effects at Level 1 or Level 2?
  • Level 1 variables: ETW_SpringK, TRT, and PPVT_FallK. The Level 2 unit is the classroom, identified by clu_id.

  • There are 1,427 units in Level 1. There are 222 units in Level 2.

  • The fixed effects are at Level 1 because they pertain to the individual student-level variables and their interactions.

Code
# Assuming 'data' is your dataframe and 'clu_id' identifies the clusters/classrooms
summary_stats <- data %>%
  group_by(clu_id) %>%
  summarise(N_students = n()) %>%
  ungroup() %>%
  summarise(
    N_clusters = n(),
    Total_students = sum(N_students),
    Average_students_per_classroom = mean(N_students),
    SD_students_per_classroom = sd(N_students),
    Min_students_in_classroom = min(N_students),
    Max_students_in_classroom = max(N_students)
  )

# Extracting the values to variables for easier printing
n_clusters <- summary_stats$N_clusters
total_students <- summary_stats$Total_students
average_students <- summary_stats$Average_students_per_classroom
sd_students <- summary_stats$SD_students_per_classroom
min_students <- summary_stats$Min_students_in_classroom
max_students <- summary_stats$Max_students_in_classroom

# Printing the summary using cat()
cat("Dataset Statistics:\n")
Dataset Statistics:
Code
cat("Number of clusters (classrooms):", n_clusters, "\n")
Number of clusters (classrooms): 222 
Code
cat("Total number of students:", total_students, "\n")
Total number of students: 1427 
Code
cat("Average number of students per classroom:", round(average_students, 1), "\n")
Average number of students per classroom: 6.4 
Code
cat("Standard deviation of students per classroom:", round(sd_students, 1), "\n")
Standard deviation of students per classroom: 2.2 
Code
cat("Minimum number of students in a classroom:", min_students, "\n")
Minimum number of students in a classroom: 2 
Code
cat("Maximum number of students in a classroom:", max_students, "\n")
Maximum number of students in a classroom: 18 
  2. Deviation code (0.5, -0.5) the treatment variable
Code
#deviation code the TRT var: treatment = 0.5, control = -0.5
data <- data %>%
  mutate(TRT = ifelse(TRT == 1, 0.5, -0.5))
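With this ±0.5 coding, the TRT coefficient equals the treatment-control difference and the intercept falls halfway between the two conditions. A quick sanity check (a sketch, assuming TRT was originally coded 0/1):

Code
# confirm the recode took: only -0.5 and 0.5 should remain
table(data$TRT)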
  3. Group mean center PPVT_FallK
Code
#within-cluster centering
data <- data %>%
  mutate(datawizard::demean(data, select = c("PPVT_FallK"), group = "clu_id"))
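For reference, datawizard::demean() appends *_between and *_within columns to the data; the within-cluster centering it performs is equivalent to this manual sketch (PPVT_FallK_gmc is a hypothetical column name):

Code
# manual group-mean centering, equivalent to the demean() call above
data <- data %>%
  group_by(clu_id) %>%
  mutate(PPVT_FallK_gmc = PPVT_FallK - mean(PPVT_FallK, na.rm = TRUE)) %>%
  ungroup()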
  4. Create two nice-looking visualizations: one for the relationship between PPVT and ETW, and one for the relationship between TRT and ETW. Make sure you plot the between-cluster (class) variability when you graph these (given how many clusters there are, randomly sample 20 of them to plot).
Code
# fig 1

# Randomly sample 20 clusters to plot
set.seed(500) # for reproducibility
clusters_sample <- data %>% 
  distinct(clu_id) %>%
  mutate(clu_id = as.character(clu_id)) %>%
  sample_n(20) %>%
  pull(clu_id)

data_sample <- data %>% 
  filter(clu_id %in% clusters_sample)

ggplot(data_sample, aes(x = PPVT_FallK, y = ETW_SpringK, group = clu_id)) +
  geom_point(aes(color = as.factor(clu_id))) +
  geom_smooth(method = "lm", se = FALSE, aes(group = clu_id, color = as.factor(clu_id))) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "PPVT scores vs ETW scores by classroom", x = "PPVT scores", y = "ETW scores")

Code
# fig 2
# bubble plots

# PPVT_FallK and ETW_SpringK
p1_df <- data  %>%
  group_by(clu_id) %>%
  summarise(
    mean_PPVT_FallK = mean(PPVT_FallK, na.rm = TRUE),
    mean_ETW_SpringK = mean(ETW_SpringK, na.rm = TRUE),
    n_students = n())

p1 = ggplot(p1_df, aes(x = mean_PPVT_FallK, y = mean_ETW_SpringK, size = n_students)) +
  geom_point(alpha = .5, color = "#6096ba") +
  scale_size_continuous(range = c(1, 12), guide = "none") +
  labs(title = "Peabody Picture Vocabulary Test (PPVT) Scores by\nExpressive Target Word Measure (ETW) Scores",
       x = "Mean PPVT",
       y = "Mean ETW") +
  theme_classic() +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )
  
# ETW_SpringK and TRT
p2_df <- data %>%
  group_by(clu_id, TRT) %>%
  summarise(mean_ETW_SpringK = mean(ETW_SpringK, na.rm = TRUE),
            n_students = n())

p2 <- ggplot(p2_df, aes(x = factor(TRT), y = mean_ETW_SpringK)) +
  ggdist::stat_halfeye(
    adjust = .5,
    width = .4,
    .width = 0,
    justification = -.9,
    point_colour = NA,
    fill = "#6096ba"
  ) +
  geom_point(aes(group = factor(clu_id), size = n_students),
             position = position_dodge(width = .5), alpha = .5, color = "#6096ba") +
  guides(size = "none") +
  labs(title = "Expressive Target Word Measure (ETW) Scores\nby Condition",
       x = "Condition",
       y = "") +
  scale_x_discrete(labels = c("Control", "Treatment")) +
  theme_classic() +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )
  
combined_plot <- p1 | p2

caption_text <- "Figure 1.\nA: Mean classroom scores on the PPVT and ETW. Each 'bubble' represents a cluster or classroom. Larger bubbles represent classes with more students.\nB: ETW scores by experimental condition. Each bubble represents the average ETW score for a classroom-condition pair.\nLarger bubbles represent pairs containing more students."

annotated_plot <- combined_plot +
  plot_annotation(
    tag_levels = "A",
    caption = caption_text,
    theme = theme(
      plot.caption = element_text(size = 12, hjust = 0)
    )
  )

annotated_plot


## Model comparisons

5.  Fit an unconditional means model (null model)

::: callout-tip
Make sure you load `lmerTest` to get *p*-values.
:::

::: {.cell}

```{.r .cell-code}
model1 <- lmer(ETW_SpringK ~ 1 + (1|clu_id), 
                data = data, REML=TRUE)

summary(model1)
```

```
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ETW_SpringK ~ 1 + (1 | clu_id)
   Data: data

REML criterion at convergence: 10855.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8972 -0.6979 -0.2156  0.5325  3.5095 

Random effects:
 Groups   Name        Variance Std.Dev.
 clu_id   (Intercept)  19.91    4.462  
 Residual             104.43   10.219  
Number of obs: 1427, groups:  clu_id, 222

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  13.7248     0.4094 201.4834   33.52   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

:::

What is the ICC? Calculate it by hand from the output.
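By hand, the ICC is the between-cluster share of the total variance, using the variance components from the output above:

\[ \text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2} = \frac{19.91}{19.91 + 104.43} \approx 0.16 \]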

Code
# Extract the variance components from the model output
cluster_variance <- 19.91
residual_variance <-  104.43

# Calculate ICC
icc <- cluster_variance / (cluster_variance + residual_variance)

round(icc, 4)
[1] 0.1601

Use the icc() function from easystats to calculate the ICC

Code
#easystats
performance::icc(model1)
# Intraclass Correlation Coefficient

    Adjusted ICC: 0.160
  Unadjusted ICC: 0.160

Is multilevel modeling warranted here? Yep! What does the ICC mean? It tells us how much of the variability in the outcome lies between clusters, and, equivalently, how correlated our Level 1 units are within a cluster. In this case, we observe an ICC of 0.160, which indicates that 16% of the total variance in our outcome variable (students’ vocabulary knowledge scores, ETW_SpringK) is attributable to differences between classrooms (Level 2 variance).
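Rather than hard-coding the variances, they can also be pulled straight from the fitted model; a small sketch using lme4's VarCorr():

Code
# extract variance components programmatically and recompute the ICC
vc <- as.data.frame(VarCorr(model1))
tau00  <- vc$vcov[vc$grp == "clu_id"]   # between-classroom variance
sigma2 <- vc$vcov[vc$grp == "Residual"] # within-classroom (residual) variance
tau00 / (tau00 + sigma2)                # ~0.16, matching the hand calculation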

  6. Build up from the last model. Fit a model that includes all Level 1 variables (no interaction)
Code
level_1_model <- lmer(ETW_SpringK ~ TRT + PPVT_FallK + (1|clu_id), data = data)

summary(level_1_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ETW_SpringK ~ TRT + PPVT_FallK + (1 | clu_id)
   Data: data

REML criterion at convergence: 10349.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4814 -0.6259 -0.1126  0.5149  3.9787 

Random effects:
 Groups   Name        Variance Std.Dev.
 clu_id   (Intercept) 21.60    4.647   
 Residual             69.93    8.362   
Number of obs: 1427, groups:  clu_id, 222

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)  -25.36477    3.95423 1398.06475  -6.415 1.93e-10 ***
TRT           10.44860    0.45197 1250.92144  23.118  < 2e-16 ***
PPVT_FallK     0.46073    0.04665 1386.53447   9.876  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr) TRT   
TRT        -0.022       
PPVT_FallK -0.995  0.020
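As an aside, the near-perfect correlation between the intercept and PPVT_FallK (-0.995) is a symptom of leaving PPVT_FallK uncentered. A hypothetical refit using the within-cluster-centered column created by demean() earlier would remove it (a sketch, not the model used below):

Code
# refit with the group-mean-centered predictor; fixed-effect correlations shrink
level_1_model_c <- lmer(ETW_SpringK ~ TRT + PPVT_FallK_within + (1 | clu_id), data = data)
summary(level_1_model_c)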
  7. Fit a model that includes the fixed interaction between the Level 1 variables
Code
interaction_model <- lmer(ETW_SpringK ~ TRT * PPVT_FallK + (1|clu_id), data = data)
summary(interaction_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ETW_SpringK ~ TRT * PPVT_FallK + (1 | clu_id)
   Data: data

REML criterion at convergence: 10339.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6149 -0.6202 -0.1202  0.5219  4.1549 

Random effects:
 Groups   Name        Variance Std.Dev.
 clu_id   (Intercept) 21.66    4.655   
 Residual             69.25    8.322   
Number of obs: 1427, groups:  clu_id, 222

Fixed effects:
                 Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)     -25.30463    3.93672 1396.23692  -6.428 1.77e-10 ***
TRT             -16.10042    7.33721 1244.64815  -2.194   0.0284 *  
PPVT_FallK        0.46021    0.04644 1384.47855   9.909  < 2e-16 ***
TRT:PPVT_FallK    0.31512    0.08693 1245.16529   3.625   0.0003 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) TRT    PPVT_F
TRT         -0.005              
PPVT_FallK  -0.995  0.004       
TRT:PPVT_FK  0.004 -0.998 -0.002
  8. Compare the main effects and interaction models. Which model is the best?
Code
test_likelihoodratio(level_1_model, interaction_model, REML=FALSE) %>% kable()
        Name   Model             df   df_diff   Chi2       p
..1     ..1    lmerModLmerTest    5        NA         NA          NA
..2     ..2    lmerModLmerTest    6         1   13.10682   0.0002942
  • The interaction model fits better than the main-effects model, as indicated by a statistically significant likelihood ratio test (\(\chi^2(1) = 13.11\), \(p < .001\)). This suggests that including the interaction between TRT and PPVT_FallK provides a better fit to the data.
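Information criteria, which the LRT table above does not print, can corroborate this; a sketch using performance:

Code
# AIC/BIC comparison (lower is better; expected to favor the interaction model)
performance::compare_performance(level_1_model, interaction_model, metrics = c("AIC", "BIC"))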
  9. Use the best model from above and fit a model that adds random slopes for TRT
Code
best_model <- lmer(ETW_SpringK ~ TRT * PPVT_FallK + (1+ TRT|clu_id), data = data)
  10. Now create a model with a random slope for treatment and PPVT scores. This will be our maximal model.
::: callout-warning
We could include a random slope for the interaction between the two, but we only have about 6 students per classroom, and that would make our model too complex.
:::
Code
complex_model <- lmer(ETW_SpringK ~ TRT * PPVT_FallK + (1 + TRT + PPVT_FallK|clu_id), data = data)
  11. Compare the random-slopes-for-TRT model to the model with random slopes for treatment and PPVT. Which one is better?
Code
test_likelihoodratio(best_model, complex_model) %>%  kable()
                Name            Model             df   df_diff   Chi2       p
best_model      best_model      lmerModLmerTest    8        NA         NA          NA
complex_model   complex_model   lmerModLmerTest   11         3   1.026711   0.7947889

Take a look at the maximal model output. What is the warning message?

Code
isSingular(complex_model)
[1] FALSE

Why do you think we are getting that message? How can we get rid of the warning?

  • Fitting the maximal model produces a boundary ("singular") fit warning (even though isSingular() with its default tolerance returns FALSE here), which likely results from overfitting the data. In particular, the more complex model contains random slopes for both TRT and PPVT_FallK within clusters (clu_id). We can infer that there is not enough variation in TRT or PPVT_FallK within clusters, so the model may not be able to estimate all of the random effects reliably. To get rid of the warning, we could reduce the complexity of the model by removing some of the random slopes.
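One way to probe this (a sketch reusing models already fit above) is to inspect the estimated random-effect structure and refit without the slope that contributes little:

Code
# inspect the random-effect variances/correlations of the maximal model
VarCorr(complex_model)

# simplification: dropping the PPVT_FallK random slope recovers best_model
simpler_model <- lmer(ETW_SpringK ~ TRT * PPVT_FallK + (1 + TRT | clu_id), data = data)

Model interpretation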
  12. What is the best model?
  • The model with a random slope for TRT and random intercepts for classrooms.
  13. Using the best model, examine the fixed effects. Please interpret these effects/coefs in a sentence or two. You can use whatever method you would like (e.g., t-tests or LRT). Follow these up with the appropriate tests (e.g., simple slopes analysis if interaction is significant).
Code
# fit the best model and output a nice summary table of results. 

library(afex) # load afex in 

m <- mixed(ETW_SpringK ~ TRT * PPVT_FallK + (1+ TRT|clu_id), data=data, method = "LRT") # fit lmer using afex

nice(m) %>%
  kable()
Effect           df   Chisq        p.value
TRT               1   3.15 +       .076
PPVT_FallK        1   107.89 ***   <.001
TRT:PPVT_FallK    1   10.74 **     .001
Code
emtrends(best_model, ~TRT, var="PPVT_FallK") %>%
  test() %>%
  kable()
 TRT   PPVT_FallK.trend          SE         df    t.ratio   p.value
-0.5          0.3120067   0.0565135   1180.010   5.520926         0
 0.5          0.5827390   0.0617509   1167.729   9.436926         0
  • Both simple slopes are positive and reliable: the PPVT–ETW slope is steeper in the treatment group (b = 0.58) than in the control group (b = 0.31), consistent with the positive TRT × PPVT_FallK interaction.
  14. Evaluate the variance components of your model. Explain what each means
Code
model_parameters(best_model, effects="random") %>% kable()
Parameter             Coefficient   SE     CI   CI_low   CI_high   Effects   Group
SD (Intercept)          4.7798625   NA   0.95       NA        NA   random    clu_id
SD (TRT)                6.6910483   NA   0.95       NA        NA   random    clu_id
Cor (Intercept~TRT)     0.8922365   NA   0.95       NA        NA   random    clu_id
SD (Observations)       7.5474782   NA   0.95       NA        NA   random    Residual
  • The intercept variance across the different classrooms (clu_id) is 22.85 (SD = 4.78). This suggests there is substantial variability in baseline ETW_SpringK scores attributable to differences between classrooms, before considering TRT or PPVT_FallK scores.
  • The variance associated with the treatment effect (TRT) across classrooms is 44.77 (SD = 6.69). This suggests substantial variability in how the treatment affects ETW_SpringK scores from one classroom to another.
  • The residual variance is 56.96 (SD = 7.55). This is the remaining within-classroom variance in student scores that cannot be explained by the model’s fixed effects or the classroom-level random effects.
  • Overall, the variance components are substantial for both the random intercept and the random slope for TRT. This suggests that multilevel modeling is, indeed, appropriate and necessary to account for the clustering of student scores within classrooms and the variability in treatment effects across classrooms.
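The variances quoted above are simply the squared SDs from the table; they can also be pulled directly from the model (a sketch):

Code
# variance components (vcov) alongside the SDs reported above
as.data.frame(VarCorr(best_model))[, c("grp", "var1", "var2", "vcov", "sdcor")]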

Model fit

  15. Calculate the conditional and marginal pseudo-\(R^2\) of the model
Code
r2(best_model)
# R2 for Mixed Models

  Conditional R2: 0.543
     Marginal R2: 0.262
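These are the Nakagawa & Schielzeth pseudo-\(R^2\)s: the marginal version credits only the fixed effects, while the conditional version also credits the random effects:

\[ R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_u + \sigma^2_\epsilon}, \qquad R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_u}{\sigma^2_f + \sigma^2_u + \sigma^2_\epsilon} \]

where \(\sigma^2_f\) is the variance of the fixed-effect predictions, \(\sigma^2_u\) the summed random-effect variances, and \(\sigma^2_\epsilon\) the residual variance.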
  16. Calculate the semi-\(R^2\) for the PPVT variable using the partR2 function from the partR2 package
Code
library(partR2)
partR2(best_model, data = data, partvars = c("PPVT_FallK"), R2_type = "marginal", nboot = 10, CI = 0.95)


R2 (marginal) and 95% CI for the full model: 
 R2     CI_lower CI_upper nboot ndf
 0.2619 0.2324   0.3103   10    4  

----------

Part (semi-partial) R2:
 Predictor(s) R2     CI_lower CI_upper nboot ndf
 Model        0.2619 0.2324   0.3103   10    4  
 PPVT_FallK   0.0446 0.0068   0.1114   10    3  
  17. Calculate Cohen’s \(d\) for the treatment effect
Code
# sigma is the sqrt of all variance components (intercept + slope + residual),
# i.e., sqrt(22.85 + 44.77 + 56.96) ~ 11.16, with 10.44 the TRT coefficient
d <- 10.44 / sqrt(22.85 + 44.77 + 56.96) # ~0.94
d
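A more reproducible version (a sketch) pulls the pieces from the models fit above, using the TRT coefficient from the main-effects model and best_model's variance components:

Code
# total SD from the variance components (skips the intercept-slope covariance row)
vc <- as.data.frame(VarCorr(best_model))
sigma_total <- sqrt(sum(vc$vcov[is.na(vc$var2)]))
d <- fixef(level_1_model)["TRT"] / sigma_total # ~0.94
d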
  18. Visualize the random intercepts and slopes in your model
Code
random <- estimate_grouplevel(best_model)
plot(random) + theme_lucid()

Assumptions

  19. Check model assumptions with check_model. Do any of the assumptions appear to be violated?
Code
check_model(best_model)

  • Some outliers
  • Level 1 residuals look skewed
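Individual assumptions can also be tested one at a time (a sketch using performance's helper functions):

Code
performance::check_outliers(best_model)           # flags influential observations
performance::check_normality(best_model)          # tests normality of residuals
performance::check_heteroscedasticity(best_model) # tests constant error variance

Table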
  20. Create a table with a summary of your model information
Code
modelsummary::modelsummary(best_model)
                               (1)
(Intercept)                -24.259
                            (3.584)
TRT                        -12.381
                            (6.981)
PPVT_FallK                   0.447
                            (0.042)
TRT × PPVT_FallK             0.271
                            (0.083)
SD (Intercept clu_id)        4.780
SD (TRT clu_id)              6.691
Cor (Intercept~TRT clu_id)   0.892
SD (Observations)            7.547
Num.Obs.                      1427
R2 Marg.                     0.262
R2 Cond.                     0.543
AIC                        10210.1
BIC                        10252.2
ICC                            0.4
RMSE                          6.98

Visualization

  21. Visualize the interaction between TRT and PPVT score on ETW (I would check out the interactions or ggeffects packages. They have some nice features for plotting interactions.)
Code
interact_plot(best_model, pred = "PPVT_FallK", modx = "TRT", data = data)
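An equivalent plot can be produced with ggeffects (a sketch; ggpredict() computes model-based predictions over the PPVT range for each condition):

Code
ggeffects::ggpredict(best_model, terms = c("PPVT_FallK", "TRT")) %>%
  plot()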

Reporting Results

  22. Write up the results of your MLM model, incorporating the above information. This write-up should resemble what you would find in a published paper. You can use the textbook for guidance and the report function from easystats.

The present study investigated the impact of a vocabulary instruction intervention on students’ knowledge, using a nested data structure with two levels: students (Level 1) nested within classrooms (Level 2). The analysis included 1,427 students distributed across 222 classrooms. Classrooms varied in size, with an average of approximately 6.4 students per classroom (\(SD = 2.2\), \(range = [2, 18]\)).

The relationship between the intervention and vocabulary knowledge was modeled using a multilevel linear model, represented by the following equation:

Code
equatiomatic::extract_eq(best_model)

\[ \begin{aligned} \operatorname{ETW\_SpringK}_{i} &\sim N \left(\mu, \sigma^2 \right) \\ \mu &=\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{TRT}) + \beta_{2}(\operatorname{PPVT\_FallK}) + \beta_{3}(\operatorname{PPVT\_FallK} \times \operatorname{TRT}) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\mu_{\alpha_{j}} \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for clu\_id j = 1,} \dots \text{,J} \end{aligned} \]

where \(ETW\_SpringK\) is the student’s score on the vocabulary knowledge assessment, \(TRT\) is the treatment condition, \(PPVT\_FallK\) is the Peabody Picture Vocabulary Test score, and \((1 + TRT | clu\_id)\) represents the random intercepts and random slopes for treatment across classrooms. Note that \(ETW\_SpringK\), \(TRT\), and \(PPVT\_FallK\) are all Level 1 variables, and \(clu\_id\) is at Level 2. For the above analysis, we deviation coded \(TRT\) (treatment = 0.5, control = -0.5); \(PPVT\_FallK\) entered the model on its original scale.

Before settling on the above model, we estimated an unconditional means (null) model and a maximal model, which included random slopes for both treatment and PPVT scores. We compared the above model (\(df = 8\)) to the maximal model (\(df = 11\)) using a likelihood ratio test from the easystats package (version 0.6.0). We found that the more complex model did not significantly improve model fit (\(\chi^2(3) = 1.03\), \(p = .795\)). We therefore proceeded with the simpler model for our analyses, as there was not sufficient evidence to justify the additional complexity.

We checked our model assumptions (linearity, normality of residuals, homoscedasticity, collinearity) visually, using diagnostic plots from the performance package (version 0.10.5). Based on this visual assessment, the assumptions were reasonably satisfied, although the Level 1 residuals showed some skew and a few outlying observations. The Intraclass Correlation Coefficient (ICC) for the outcome variable was 0.16, suggesting that 16% of the variance in vocabulary knowledge scores could be attributed to differences between classrooms. The models were estimated using Restricted Maximum Likelihood (REML) to account for the nested data structure. The lme4 package (version 1.1.34) was used for fitting the model, with the interactions package (version 1.1.5) for estimating and plotting interactions.

The fixed effects analysis revealed a significant positive association between fall PPVT scores and spring ETW scores (\(b = 0.45\), \(SE = 0.04\), \(p < .001\)), indicating that higher initial vocabulary levels were associated with higher post-intervention vocabulary knowledge. The interaction between treatment and PPVT scores was also significant (\(b = 0.27\), \(SE = 0.08\), \(\chi^2(1) = 10.74\), \(p = .001\)), suggesting that the treatment effect varied as a function of initial vocabulary levels. Follow-up simple slopes analyses showed that the PPVT–ETW slope was steeper in the treatment group (\(b = 0.58\), \(SE = 0.06\), \(p < .001\)) than in the control group (\(b = 0.31\), \(SE = 0.06\), \(p < .001\)). Because PPVT scores were not centered, the TRT coefficient (\(b = -12.38\), \(SE = 6.98\)) reflects the treatment contrast extrapolated to a fall PPVT score of zero and is not directly interpretable on its own; in the main-effects model, the average treatment effect was large and positive (\(b = 10.45\), \(SE = 0.45\), \(p < .001\)), indicating a substantial positive impact of the supplemental instruction on vocabulary knowledge.

Regarding random effects, we observed a variance of 22.85 (\(SD = 4.78\)) for the intercepts across the different classrooms. This suggests there is substantial variability in baseline ETW scores attributable to differences between classrooms, before considering treatment or PPVT scores. The variance associated with the treatment effect across classrooms was 44.77 (\(SD = 6.69\)), suggesting considerable variability in how the treatment affected ETW scores across classrooms. Finally, the residual variance was 56.96 (\(SD = 7.55\)), indicating within-classroom variance in student scores that cannot be explained by the model’s fixed effects or the classroom-level random effects. Overall, the variance components are substantial for both the random intercept and the random slope for treatment. This suggests that multilevel modeling is, indeed, appropriate and necessary to account for the clustering of student scores within classrooms and the variability in treatment effects across classrooms.

The model estimated the standard deviation of the random intercepts for classrooms at \(SD = 4.78\), indicating variability in baseline ETW scores across classrooms. The correlation between the random intercepts and slopes for treatment within classrooms was estimated at 0.89, suggesting a strong positive relationship between a classroom’s baseline scores and its treatment effect. The standard deviation of the random slopes for treatment (\(SD = 6.69\)) indicated substantial variability in the treatment effect across classrooms, and the residual standard deviation (\(SD = 7.55\)) reflected the within-classroom variability in ETW scores not explained by the model’s fixed and random effects.

Taken together, these results suggest that both the treatment and initial PPVT scores are significant predictors of students’ vocabulary knowledge. There was a notable interaction effect, suggesting that the benefit of the treatment varies depending on students’ initial vocabulary levels. The significant random effects for treatment across classrooms underscore the intervention’s differential effectiveness depending on classroom context. These findings highlight the importance of considering both the individual and contextual factors in educational interventions.