Lab 4 - Answers

Author

Jason Geller

Data

The data for this assignment comes from an online Ipsos survey that was conducted for the FiveThirtyEight article “Why Many Americans Don’t Vote”. You can read more about the survey design and respondents in the README of the GitHub repo for the data.

Respondents were asked a variety of questions about their political beliefs, thoughts on multiple issues, and voting behavior. We will focus on using the demographic variables and someone’s party identification to understand whether a person is a probable voter.

The variables we’ll focus on were (definitions from the codebook in data set GitHub repo):

  • ppage: Age of respondent

  • educ: Highest educational attainment category.

  • race: Race of respondent, census categories. Note: all categories except Hispanic were non-Hispanic.

  • gender: Gender of respondent

  • income_cat: Household income category of respondent

  • Q30: Response to the question “Generally speaking, do you think of yourself as a…”

    • 1: Republican
    • 2: Democrat
    • 3: Independent
    • 4: Another party, please specify
    • 5: No preference
    • -1: No response
  • voter_category: past voting behavior:

    • always: respondent voted in all or all-but-one of the elections they were eligible in
    • sporadic: respondent voted in at least two, but fewer than all-but-one of the elections they were eligible in
    • rarely/never: respondent voted in 0 or 1 of the elections they were eligible in

You can read in the data directly from the GitHub repo:

library(nnet)
library(car)
Loading required package: carData
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(emmeans)
library(ggeffects)
library(knitr)
library(patchwork)
library(broom)
library(parameters)
library(easystats)
# Attaching packages: easystats 0.7.0 (red = needs update)
✔ bayestestR  0.13.2   ✔ correlation 0.8.4 
✔ datawizard  0.9.1    ✔ effectsize  0.8.6 
✖ insight     0.19.8   ✔ modelbased  0.8.7 
✖ performance 0.10.9   ✔ report      0.5.8 
✖ see         0.8.2    
Restart the R-Session and update packages with `easystats::easystats_update()`.


Attaching package: 'easystats'

The following object is masked from 'package:ggeffects':

    install_latest
voter_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv")
Rows: 5836 Columns: 119
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (5): educ, race, gender, income_cat, voter_category
dbl (114): RespId, weight, Q1, Q2_1, Q2_2, Q2_3, Q2_4, Q2_5, Q2_6, Q2_7, Q2_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Lab

  • The variable Q30 contains the respondent’s political party identification. Make a new variable that simplifies Q30 into four categories: “Democrat”, “Republican”, “Independent”, “Other” (“Other” also includes respondents who did not answer the question).

    voter_data <- voter_data %>%
      mutate(pol_ident_new = case_when(
        Q30==1 ~ "Rep", 
        Q30==2 ~ "Dem", 
        Q30==3 ~ "Indep", 
        TRUE ~ "Other"
      ))
  • The variable voter_category identifies the respondent’s past voter behavior. Relevel the variable to make rarely/never the baseline level, followed by sporadic, then always

    voter_data$voter_category <- factor(voter_data$voter_category, levels =c("rarely/never", "sporadic", "always"))
# center var
voter_data$ppage <- datawizard::center(voter_data$ppage)
  • In the FiveThirtyEight article, the authors include visualizations of the relationship between the voter category and demographic variables such as race, age, education, etc. Select two demographic variables. For each variable, try to replicate the visualizations and interpret the plot to describe its relationship with voter category. Have fun with it: https://www.mikelee.co/posts/2020-02-08-recreate-fivethirtyeight-chicklet-stacked-bar-chart-in-ggplot2.

     # library
    library(ggplot2)
    library(viridis)
    Loading required package: viridisLite
    library(cowplot)
    
    Attaching package: 'cowplot'
    The following object is masked from 'package:patchwork':
    
        align_plots
    The following object is masked from 'package:ggeffects':
    
        get_title
    The following object is masked from 'package:lubridate':
    
        stamp
    voter_data$race <- factor(voter_data$race, levels =c("Black", "Hispanic", "Other/Mixed", "White"))
    
    p_race <- ggplot(data = voter_data, aes(x = fct_rev(race), fill = voter_category)) +
      geom_bar(position = "fill") +
        labs(x="Race", y="Percentage") +
      theme(text = element_text(size = 16)) +  
       scale_x_discrete(limits = rev(levels("race")))+
      scale_fill_viridis(discrete = TRUE) + 
      scale_y_continuous(labels = scales::percent_format(accuracy = 1)) + 
      coord_flip()
    
    p_race

    voter_data <- voter_data %>%
      mutate(pol =  fct_relevel(pol_ident_new,"Dem", "Rep", "Indep", "Other"))
    
    
    p_id <- ggplot(voter_data, aes(x = fct_rev(pol), fill = voter_category)) +
      geom_bar(position = "fill") +
        labs(x="Political ID", y="Percentage") + 
      theme(text = element_text(size = 16)) +  
      scale_fill_viridis(discrete = TRUE) + 
      scale_y_continuous(labels = scales::percent_format(accuracy = 1)) + 
      coord_flip()
    
    p_id

    library(patchwork)
    p_id+ p_race + plot_layout(guides = "collect") + plot_annotation(title = 'Demographic information of survey repsondents, by voting history')

  • Fit a model using mean-centered age, race, gender, income, and education to predict voter category. Show the code used to fit the model, but do not display the model output.

    library(nnet)

    mm <- multinom(voter_category~ ppage + race + gender+ income_cat + educ, data=voter_data)

    model_parameters(mm)

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald normal distribution approximation.

The model has a log- or logit-link. Consider using `exponentiate =
  TRUE` to interpret coefficients as ratios.
  • Should party identification be added to the model?
mm_red<- multinom(voter_category~ ppage + race + gender+ income_cat + educ, data=voter_data)

mm_full <- multinom(voter_category~ ppage + race + gender+ income_cat + educ + pol, data=voter_data)

    anova(mm_red, mm_full)
> Yes. It should be included.

Use the model you select for the remainder of the assignment.

LRT

  • Run the full model and report overall significance of each of the terms

    mm_use <- multinom(voter_category~ ppage+ educ + pol, data=voter_data)
    
    tidy(car::Anova(mm_use)) %>%
      kable()

    ppage, \(\chi^2(2)\) = 666.41, educ, \(\chi^2(4)\) = 252.81, p < .001, and pol \(\chi^2(6)\) = 171.91, p < .001

    mm_use %>%
      tidy() %>%
      kable()
    y.level term estimate std.error statistic p.value
    sporadic (Intercept) 1.5221680 0.0822932 18.4968772 0.0000000
    sporadic ppage 0.0457443 0.0023031 19.8621499 0.0000000
    sporadic educHigh school or less -1.0372010 0.0877549 -11.8192889 0.0000000
    sporadic educSome college -0.3864438 0.0904127 -4.2742199 0.0000192
    sporadic polRep -0.0388733 0.0964455 -0.4030594 0.6869045
    sporadic polIndep -0.3802996 0.0941814 -4.0379495 0.0000539
    sporadic polOther -0.9621902 0.1042842 -9.2266187 0.0000000
    always (Intercept) 1.3124074 0.0867341 15.1313876 0.0000000
    always ppage 0.0590465 0.0025373 23.2711612 0.0000000
    always educHigh school or less -1.4771769 0.0984861 -14.9988400 0.0000000
    always educSome college -0.4481505 0.0969629 -4.6218753 0.0000038
    always polRep -0.0020977 0.1027836 -0.0204086 0.9837174
    always polIndep -0.4879135 0.1025275 -4.7588553 0.0000019
    always polOther -1.4043029 0.1287482 -10.9073562 0.0000000

Marginal Effects Political Group - Emmeans

multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1",  by="pol")

update(coefs, by = "contrast") %>% 
  kable(format = "markdown", digits = 3)
contrast pol estimate SE df t.ratio p.value
sporadic - (rarely/never) Dem 0.963 0.064 14 15.001 0.000
always - (rarely/never) Dem 0.608 0.067 14 9.051 0.000
sporadic - (rarely/never) Rep 0.925 0.071 14 13.089 0.000
always - (rarely/never) Rep 0.606 0.074 14 8.233 0.000
sporadic - (rarely/never) Indep 0.600 0.069 14 8.660 0.000
always - (rarely/never) Indep 0.144 0.075 14 1.933 0.217
sporadic - (rarely/never) Other 0.054 0.081 14 0.673 0.866
always - (rarely/never) Other -0.723 0.105 14 -6.873 0.000

For every political party listed, voting is more likely. That is, Democrats (sporadic - (rarely/never) 2.6195433, Republicans(sporadic - (rarely/never 2.5092904, and Independents (sporadic - (rarely/never1.8221188 were more likely to sporadically vote compared to rarely/never. Voters affiliated with a political party were also more likely to always vote compared to never/rarely vote (Democrats:1.8367542; Republicans: 1.8404314; Independents: 1.1502738 ). The exception here is Other. Others were more likely to rarely vote compared to always vote (0.4852942

Marginal Effects of Education - Emmeans

multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1",  by="educ")

update(coefs, by = "contrast") %>% 
  kable(format = "markdown", digits = 3)
contrast educ estimate SE df t.ratio p.value
sporadic - (rarely/never) College 1.101 0.068 14 16.299 0.000
always - (rarely/never) College 0.781 0.070 14 11.193 0.000
sporadic - (rarely/never) High school or less 0.112 0.057 14 1.974 0.167
always - (rarely/never) High school or less -0.629 0.068 14 -9.296 0.000
sporadic - (rarely/never) Some college 0.728 0.067 14 10.811 0.000
always - (rarely/never) Some college 0.352 0.071 14 4.951 0.001

College educated (OR = 3.004166), High school or less (OR = 1.1185129), and Some college (OR = 1.1185129) were more likely to sporadically vote compared to rarely/never. College educated voters (OR = 1.1185129), and some college (OR = 1.1185129) were always more likely to vote compared to rarely/never. High school or less (OR = 0.5331247) were less likely to always vote compared to rarely/never.

  • Next, plot the predicted probabilities of voter category as a function of Age and Party ID

    ggemmeans(mm_use, terms = c("ppage")) %>% ggplot(., aes(x = x, y = predicted, fill = response.level)) +
      geom_area() + 
      geom_rug(sides = "b", position = "jitter", alpha = .5) + 
      labs(x = "\nAge", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Age") +
      scale_fill_manual(
        name = NULL,
        values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
        labels = c("RARELY OR NEVER VOTE    ", "SOMETIMES VOTE    ", "ALMOST ALWAYS VOTE    "),
        breaks = c("rarely/never", "sporadic", "always")
      ) +
      theme_minimal()
    Data were 'prettified'. Consider using `terms="ppage [all]"` to get
      smooth plots.

    ggemmeans(mm_use, terms = c("educ")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) +
      geom_bar(stat = "identity" ) +
      geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4)  + 
      labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Educational Attainment") +
      scale_fill_manual(
        name = NULL,
        values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
        labels = c("RARELY OR NEVER VOTE    ", "SOMETIMES VOTE    ", "ALMOST ALWAYS VOTE    "),
        breaks = c("rarely/never", "sporadic", "always")
      ) +
      theme_minimal()

    ggemmeans(mm_use, terms = c("pol")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) + 
      geom_bar(stat = "identity" ) +
      geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4)  + 
      labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Party Identification") +
      scale_fill_manual(
        name = NULL,
        values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
        labels = c("RARELY OR NEVER VOTE    ", "SOMETIMES VOTE    ", "ALMOST ALWAYS VOTE    "),
        breaks = c("rarely/never", "sporadic", "always")
      ) +
      theme_minimal()

    ## Write-up

Differences between political groups and voting behavior - Emmeans

multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline

coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1",  by="pol")

update(coefs, by = "contrast") %>% 
  kable(format = "markdown", digits = 3)
contrast pol estimate SE df t.ratio p.value
sporadic - (rarely/never) Dem 0.963 0.064 14 15.001 0.000
always - (rarely/never) Dem 0.608 0.067 14 9.051 0.000
sporadic - (rarely/never) Rep 0.925 0.071 14 13.089 0.000
always - (rarely/never) Rep 0.606 0.074 14 8.233 0.000
sporadic - (rarely/never) Indep 0.600 0.069 14 8.660 0.000
always - (rarely/never) Indep 0.144 0.075 14 1.933 0.217
sporadic - (rarely/never) Other 0.054 0.081 14 0.673 0.866
always - (rarely/never) Other -0.723 0.105 14 -6.873 0.000
# get difference between yes-no and fair-excellent
contrast(coefs, "revpairwise", by = "contrast") %>%
  kable(format = "markdown", digits = 3)
contrast1 contrast estimate SE df t.ratio p.value
Rep - Dem sporadic - (rarely/never) -0.039 0.094 14 -0.412 0.976
Indep - Dem sporadic - (rarely/never) -0.363 0.091 14 -3.974 0.007
Indep - Rep sporadic - (rarely/never) -0.324 0.098 14 -3.323 0.023
Other - Dem sporadic - (rarely/never) -0.909 0.102 14 -8.934 0.000
Other - Rep sporadic - (rarely/never) -0.871 0.106 14 -8.183 0.000
Other - Indep sporadic - (rarely/never) -0.546 0.106 14 -5.176 0.001
Rep - Dem always - (rarely/never) -0.002 0.098 14 -0.018 1.000
Indep - Dem always - (rarely/never) -0.464 0.097 14 -4.755 0.002
Indep - Rep always - (rarely/never) -0.462 0.103 14 -4.464 0.003
Other - Dem always - (rarely/never) -1.331 0.124 14 -10.725 0.000
Other - Rep always - (rarely/never) -1.329 0.128 14 -10.400 0.000
Other - Indep always - (rarely/never) -0.867 0.129 14 -6.739 0.000

Differences between education level and voting behavior - Emmeans

multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1",  by="educ")

update(coefs, by = "contrast") %>% 
  kable(format = "markdown", digits = 3)
contrast educ estimate SE df t.ratio p.value
sporadic - (rarely/never) College 1.101 0.068 14 16.299 0.000
always - (rarely/never) College 0.781 0.070 14 11.193 0.000
sporadic - (rarely/never) High school or less 0.112 0.057 14 1.974 0.167
always - (rarely/never) High school or less -0.629 0.068 14 -9.296 0.000
sporadic - (rarely/never) Some college 0.728 0.067 14 10.811 0.000
always - (rarely/never) Some college 0.352 0.071 14 4.951 0.001
# get difference between yes-no and fair-excellent
contrast(coefs, "revpairwise", by = "contrast") %>%
  kable(format = "markdown", digits = 3)
contrast1 contrast estimate SE df t.ratio p.value
High school or less - College sporadic - (rarely/never) -0.989 0.087 14 -11.414 0.000
Some college - College sporadic - (rarely/never) -0.372 0.088 14 -4.212 0.002
Some college - High school or less sporadic - (rarely/never) 0.616 0.087 14 7.086 0.000
High school or less - College always - (rarely/never) -1.410 0.096 14 -14.696 0.000
Some college - College always - (rarely/never) -0.429 0.093 14 -4.599 0.001
Some college - High school or less always - (rarely/never) 0.981 0.097 14 10.150 0.000

A multinomial model was estimated using the nnet package in R to investigate whether political party identification (Democrat, Independent, Republican, Other), education (high school or less, some college, college degree), and age (grand mean centered; M = 51.69) influence voting frequency (rarely or never vote, vote sporadically, almost always vote). All three predictors were significantly associated with voting frequency: party identification, \(\chi^2\) (6) = 171.91, p < .001; education, \(\chi^2\) (4) = 252.81, p < .001; and age, \(\chi^2\) (2) = 666.41, p < .001, \(R^2_{mcfadden}\) = .09. The odds of Independents (and those who support other parties or none) voting sporadically (versus rarely or never) were lower. Specifically, independents were times less likely compared to Democrats, lower compared to Republicans. Others were 0.4029269 times lower compared to Democrats, lower compared to Republicans. The odds that Republicans, relative to Democrats, voted sporadically was negligible. The pattern of results is similar when comparing the odds of always voting versus rarely or never voting. Supporters of all other parties in our data had lower odds of always voting compared to Democrats (Independents: OR = 0.6287636; Other: OR = 0.2644773) and Republicans (Independents: OR = 0.6300223; Other: OR = 0.2671353.

Those with high school and some college education were more likely to rarely vote compared to sporadically vote compared to college educated persons (High School: OR = 0.3719485, p < .001) and some college (OR = 0.6893542, p < .001) or always (High School: OR = 0.2441433, p < .001) and (some college: OR = 0.6511599, p < .001). Stated a bit differently, college voters were more likely to vote than those with a high school or some college education. We also see that those with some college education vs. High school or less were more likely to sporadically vote (Some College: OR = 1.8515072, p < .001) or always vote (Some College: OR = 2.667122, p < .001) compared to rarely/never vote.

For each one-year increase in age beyond 52 (the mean), the odds of voting sporadically (versus rarely or never) were 1.05 times higher (p < .001) relative to the baseline voter. They were 1.06 times higher for always voting (versus rarely or never).