Ordinal Regression Lab Answers

Lab 3- Ordinal Regression

Instructions

  • If you are fitting a model, display the model output in a neatly formatted table. (The tidy and kable functions can help!)

  • If you are creating a plot, use clear labels for all axes, titles, etc.

  • Commit and push your work to GitHub regularly, at least after each exercise. Write short and informative commit messages.

  • When you’re done, we should be able to knit the final version of the QMD in your GitHub as a HTML.

Lab

I am a huge fan of Great British Bake-off. The data for this week’s lab is taken from https://bakeoff.netlify.app/. In this lab you will be looking at Gender and Age as a predictor of technical rank. For this exercise, we will only be looking at those who were in top 3 of technical.

In the GBBO technical, the bakers are usually provided with a list of ingredients and basic instructions, but they may not have access to specific measurements or details on how to prepare the ingredients. The judges evaluate the bakers’ finished products based on factors such as appearance, texture, and flavor, but also compare the bakers’ results to a standard version of the recipe that has been prepared in advance by the judges or a baking expert.

The dataset contains 3 variables:

  • Gender: M = MALE, F = FEMALE

  • Age: Age of baker

  • Technical Rank: Rank in technical (1,2,3)

Load packages:

Code
library(tidyverse)
library(broom)
library(performance)
library(ordinal) #clm
library(car) # anova
library(ggeffects) #  viz
library(gofcat) # brant
library(brms)
library(emmeans) # contrasts
library(knitr)

Load data

  • Make sure only the top 3 ranks are being used. For some reason, there are missing ranks (my guess is they did not announce rank on TV)
Code
gbbo <- read_csv("https://raw.githubusercontent.com/jgeller112/psy504-advanced-stats/main/slides/Ordinal_Regression/data/GBBO.csv")

gb <- gbbo %>%
  filter(`Technical Rank` <=3) %>%
  mutate(Rank = as.factor(`Technical Rank`))

Explore

  • Plot two figures showing the percentage of bakers in each rank— create one for Gender and Age

    Code
    gb %>% 
      ggplot( aes(x=Rank, y=Age) ) +
        geom_boxplot(fill="#69b3a2") +
        xlab("Rank")

Code
ggplot(gb, aes(x=Gender, fill=Rank)) + geom_bar() 

Ordinal Analysis

  • Convert the outcome variable to an ordered factor. What does the order here represent?

    Code
    gb <- gb %>%
      mutate(Rank = ordered(Rank))
    
    gb$Rank
      [1] 2 3 1 2 1 3 1 3 2 3 2 1 2 1 3 2 1 3 1 2 3 3 1 2 3 1 2 1 2 3 3 1 2 1 3 2 2
     [38] 1 3 2 3 1 3 2 1 3 2 1 1 3 2 3 1 2 1 3 2 2 1 3 2 1 3 3 1 2 2 1 2 3 2 1 2 1
     [75] 3 1 2 3 2 1 3 1 3 2 2 3 1 1 3 2 3 2 1 3 2 1 3 1 2 3 2 1 3 2 1 1 2 3 3 2 1
    [112] 2 3 1 2 3 1 3 2 1 1 2 3 1 3 2 2 1 3 2 3 1 1 2 3 2 1 3 3 1 2 3 1 2 2 3 1 1
    [149] 2 3 3 2 1 1 3 2 2 1 3 3 1 2 3 2 1 2 3 1 1 2 3 2 1 3 2 3 1 1 3 2 2 1 3 2 3
    [186] 1 1 2 3 2 3 1 3 2 1 2 1 3 2 3 1 3 2 1 2 3 1 2 3 1 2 3 1 1 2 3 3 2 1 2 3 1
    [223] 3 1 2 2 3 1 1 3 2 2 3 1 1 2 3 2 1 3 1 2 3 3 2 1 1 2 3 1 2 3 1 2 3 2 1 3 2
    [260] 3 1 1 3 2 1 2 3 2 3 1 3 1 2 2 1 3 2 1 3 2 3 1 1 3 2 1 2 3 2 1 3 2 3 1 3 2
    [297] 1 1 3 2 3 1 2 3 2 1 1 3 2
    Levels: 1 < 2 < 3
    • The model is ordered from 1st -3. You could also reverse this with ftc_rev. We can interpret this as positive slope = higher prob of being in 2-3rd place.
  • Convert input variables to categorical factors as appropriate.

    Code
    gb <- gb %>%
      mutate(Gender=as.factor(Gender))
    
    contrasts(gb$Gender) <- c(-.5, .5)
  • Run a ordinal logistic regression model against all relevant input variables. Interpret the effects for Gender, Age and Gender*Age (even if they are non-significant).

    Code
    ols_clm = clm(Rank~Gender*Age, data=gb)
    
    tidy(ols_clm) %>%
      kable()
    term estimate std.error statistic p.value coef.type
    1|2 -0.8419558 0.3422080 -2.460363 0.0138797 intercept
    2|3 0.5796803 0.3404226 1.702826 0.0886007 intercept
    Gender1 -1.1494724 0.6729011 -1.708234 0.0875930 location
    Age -0.0037139 0.0092343 -0.402186 0.6875471 location
    Gender1:Age 0.0387873 0.0185316 2.093039 0.0363456 location
    Note
    • Gender: Males more likely to place better in technical, b = -1.15, SE = .67, z = -1.71, p = 0.087. The log-odds of a baker achieving a higher rank as a male instead of a female is -1.149473 (95% CI [-2.46, 0.16]). Therefore, being male compared to being female is associated with a higher probability of achieving a higher rank. Converting this result to an odd ratio reveals that males are 0.32 times as likely to have a higher rank compared to female bakers (i.e. they are more likely to place higher in technical) (95% CI [0.08, 1.18]).

    • Age: An Increase in age results in increased log odds of placing better in technical, b = -0.004, SE = .009, z = -0.402, p = 0.688. Increase in age leads a decreases of odds by .997 of placing worse in technical.

    • Gender*Age: The effect of age on the log odds of achieving a better rank differs between genders, b = 0.038 (95% CI [0.002, 0.075]). Specifically, there is an increase in odds of achieving a better rank with age that is more pronounced for females than for males. Converting this result to an odd ratio reveals that the effect is 1.039 times more pronounced (95% CI [1.00, 1.07]).

  • Test if the interaction is warranted

    Code
    ols_clm_main = clm(Rank~Gender+Age, data=gb)
    
    ols_clm_inter = clm(Rank~Gender*Age, data=gb)
    
    anova(ols_clm_main, ols_clm_inter) %>% kable()
    no.par AIC logLik LR.stat df Pr(>Chisq)
    ols_clm_main 4 685.7206 -338.8603 NA NA NA
    ols_clm_inter 5 683.2836 -336.6418 4.436959 1 0.0351688
  • It is!

  • Use ggemmeans to create a figure showing the interaction between Gender and Age as a function of rank. Plot predicted probabilities from the model.

    Code
    ggemmeans(ols_clm, terms= c("Age[all]", "Gender")) %>%
      plot()

Latent Visualization

Note

ggeffects updated package to include latent viz.

Code
ols_clm = MASS::polr(Rank~Gender*Age, data=gb)

ggeffect(ols_clm, c("Age[all]", "Gender"), latent=TRUE) %>% plot()

  • Use the Brant test to support or reject the hypothesis that the proportional odds assumption holds for your simplified model.

    Code
    brant.test(ols_clm)
    
    Brant Test:
                   chi-sq   df   pr(>chi)
    Omnibus         1.295    3       0.73
    Gender1         0.585    1       0.44
    Age             0.415    1       0.52
    Gender1:Age     0.924    1       0.34
    
    H0: Proportional odds assumption holds

    brms

  • Reference the readings from this week and run the above model in brms. Do not worry about priors we will just use the default priors for this. You will notice there are no p-values (the horror!!!). Focus on the population-level effects from the output and look at the 95% credible intervals (if they don’t include zero you can use this as an index of significance, or lack of). Are there any differences between the model you fit with clm and the one you fit with brms?

Code
  ols2_brm = brm(Rank ~  Gender*Age, data=gb, family = cumulative, cores = 4,chains = 4)
  • Use the conditional_effects function to plot predicted probabilities by Gender and Age across each rank. Note: the papers use marginal_effects but that has been replaced with conditional_effects.

    Code
    conditional_effects(ols2_brm, categorical = T)

  • Use check_predictions from the easystats performance package. It is useful for examining model fit (i.e., does the data fit the model being used?). Run the below code. What do you think?

Code
check_predictions(ols2_brm)