Ordinal regression in R: part 1

A theoretical and applied walkthrough of ordinal regression. Part 1: the frequentist approach with ordinal.

R
regression
ordinal
frequentist statistics
Author
Published

March 15, 2020

R setup
library(tidyverse)
library(dunnr)
library(gt)
library(broom)
library(patchwork)

extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()

wine_red <- "#58181F"
update_geom_defaults("point", list(color = wine_red))
update_geom_defaults("line", list(color = wine_red))

The purpose of this post is to learn more about ordinal regression models (a.k.a. cumulative link, proportional odds, ordered logit models, etc.) and practice their implementation in R. This is part 1, where I’ll be taking the frequentist approach via the ordinal package. There are other options, like MASS::polr, but two features in particular drew me to ordinal: (1) it allows for random effects, and (2) it has broom::tidy methods available.

Particularly, I’ll be following along with

Setup

Import ordinal, and the included data set wine:

library(ordinal)
data(wine)
wine <- as_tibble(wine)
glimpse(wine)
Rows: 72
Columns: 6
$ response <dbl> 36, 48, 47, 67, 77, 60, 83, 90, 17, 22, 14, 50, 30, 51, 90, 7…
$ rating   <ord> 2, 3, 3, 4, 4, 4, 5, 5, 1, 2, 1, 3, 2, 3, 5, 4, 2, 3, 3, 2, 5…
$ temp     <fct> cold, cold, cold, cold, warm, warm, warm, warm, cold, cold, c…
$ contact  <fct> no, no, yes, yes, no, no, yes, yes, no, no, yes, yes, no, no,…
$ bottle   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5…
$ judge    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3…

wine is a data set from Randall (1989) of wine bitterness ratings from multiple judges. The variables are as follows:

  • Outcome:
    • response: wine bitterness rating on a 0-100 scale
    • rating: ordered factor with 5 levels (grouped version of response) with 1 = “least bitter” and 5 = “most bitter”
  • Treatment factors:
    • temp: temperature during wine production (cold and warm)
    • contact: contact between juice and skins during wine production (no and yes)
  • Random effects
    • bottle with 8 levels
    • judge with 9 levels

Relationship between response and rating:

wine %>%
  ggplot(aes(y = rating, x = response)) +
  geom_boxplot(width = 0.5) +
  geom_jitter(alpha = 0.5)

Note that there is no overlap between the levels.

There are 72 total observations with the following ratings distribution by treatment and random effects:

wine %>%
  transmute(temp, contact, bottle, judge, rating = as.numeric(rating)) %>%
  pivot_wider(names_from = judge, values_from = rating) %>%
  gt() %>%
  tab_spanner(columns = `1`:`9`, label = "judge") %>%
  data_color(
    columns = `1`:`9`,
    colors = scales::col_numeric(
      palette = c("white", wine_red), domain = c(1, 5)
    )
  )
temp contact bottle judge
1 2 3 4 5 6 7 8 9
cold no 1 2 1 2 3 2 3 1 2 1
cold no 2 3 2 3 2 3 2 1 2 2
cold yes 3 3 1 3 3 4 3 2 2 3
cold yes 4 4 3 2 2 3 2 2 3 2
warm no 5 4 2 5 3 3 2 2 3 3
warm no 6 4 3 5 2 3 4 3 3 2
warm yes 7 5 5 4 5 3 5 2 3 4
warm yes 8 5 4 4 3 3 4 3 4 4

So each bottle had a particular temp and contact (2 bottles for each of the 4 combinations), and each judge rated the bitterness each bottle.

Before modeling, can we see a clear effect of temp and contact?

wine %>%
  count(contact, rating, temp) %>%
  mutate(temp = fct_rev(temp)) %>%
  ggplot(aes(x = temp, y = rating, color = temp)) +
  geom_point(aes(group = temp, size = n)) +
  facet_wrap(~contact, scales = "free_x",
             labeller = labeller(contact = label_both)) +
  scale_size(breaks = c(1, 2, 4, 6, 8)) +
  add_facet_borders()

At a glance, it looks like the temp = warm and contact = yes is associated with higher ratings.

Conclusion

Thanks to detailed documentation, fitting cumulative link (mixed) models is very easy with ordinal. In this post, we first learned the theoretical basis for these models, then worked through examples using wine bitterness ratings from multiple judges.

In the next post, I’ll explore the Bayesian approach to ordinal regression with the brms package.

Reproducibility

Session info
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
Git repository
Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

References

Randall, J. H. 1989. The Analysis of Sensory Data by Generalized Linear Model.” Biometrical Journal 31 (7): 781–93. https://doi.org/10.1002/bimj.4710310703.

Reuse

Citation

BibTeX citation:
@online{dunn2020,
  author = {Dunn, Taylor},
  title = {Ordinal Regression in {R:} Part 1},
  date = {2020-03-15},
  url = {https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1},
  langid = {en}
}
For attribution, please cite this work as:
Dunn, Taylor. 2020. “Ordinal Regression in R: Part 1.” March 15, 2020. https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1.