tdunn - Ordinal regression in R: part 1

R setup

library(tidyverse)
library(dunnr)
library(gt)
library(broom)
library(patchwork)

extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()

wine_red <- "#58181F"
update_geom_defaults("point", list(color = wine_red))
update_geom_defaults("line", list(color = wine_red))

The purpose of this post is to learn more about ordinal regression models (a.k.a. cumulative link, proportional odds, ordered logit models, etc.) and practice their implementation in R. This is part 1, where I’ll be taking the frequentist approach via the ordinal package. There are other options, like MASS::polr, but two features in particular drew me to ordinal: (1) it allows for random effects, and (2) it has broom::tidy methods available.

Particularly, I’ll be following along with

this excellent primer which includes theory and application, and
this vignette which is a tutorial on incorporating random effects.

Setup

Import ordinal, and the included data set wine:

library(ordinal)
data(wine)
wine <- as_tibble(wine)
glimpse(wine)

Rows: 72
Columns: 6
$ response <dbl> 36, 48, 47, 67, 77, 60, 83, 90, 17, 22, 14, 50, 30, 51, 90, 7…
$ rating   <ord> 2, 3, 3, 4, 4, 4, 5, 5, 1, 2, 1, 3, 2, 3, 5, 4, 2, 3, 3, 2, 5…
$ temp     <fct> cold, cold, cold, cold, warm, warm, warm, warm, cold, cold, c…
$ contact  <fct> no, no, yes, yes, no, no, yes, yes, no, no, yes, yes, no, no,…
$ bottle   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5…
$ judge    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3…

wine is a data set from Randall (1989) of wine bitterness ratings from multiple judges. The variables are as follows:

Outcome:
- response: wine bitterness rating on a 0-100 scale
- rating: ordered factor with 5 levels (grouped version of response) with 1 = “least bitter” and 5 = “most bitter”
Treatment factors:
- temp: temperature during wine production (cold and warm)
- contact: contact between juice and skins during wine production (no and yes)
Random effects
- bottle with 8 levels
- judge with 9 levels

Relationship between response and rating:

wine %>%
  ggplot(aes(y = rating, x = response)) +
  geom_boxplot(width = 0.5) +
  geom_jitter(alpha = 0.5)

Note that there is no overlap between the levels.

There are 72 total observations with the following ratings distribution by treatment and random effects:

wine %>%
  transmute(temp, contact, bottle, judge, rating = as.numeric(rating)) %>%
  pivot_wider(names_from = judge, values_from = rating) %>%
  gt() %>%
  tab_spanner(columns = `1`:`9`, label = "judge") %>%
  data_color(
    columns = `1`:`9`,
    colors = scales::col_numeric(
      palette = c("white", wine_red), domain = c(1, 5)
    )
  )

temp	contact	bottle	judge
temp	contact	bottle	1	2	3	4	5	6	7	8	9
cold	no	1	2	1	2	3	2	3	1	2	1
cold	no	2	3	2	3	2	3	2	1	2	2
cold	yes	3	3	1	3	3	4	3	2	2	3
cold	yes	4	4	3	2	2	3	2	2	3	2
warm	no	5	4	2	5	3	3	2	2	3	3
warm	no	6	4	3	5	2	3	4	3	3	2
warm	yes	7	5	5	4	5	3	5	2	3	4
warm	yes	8	5	4	4	3	3	4	3	4	4

So each bottle had a particular temp and contact (2 bottles for each of the 4 combinations), and each judge rated the bitterness each bottle.

Before modeling, can we see a clear effect of temp and contact?

wine %>%
  count(contact, rating, temp) %>%
  mutate(temp = fct_rev(temp)) %>%
  ggplot(aes(x = temp, y = rating, color = temp)) +
  geom_point(aes(group = temp, size = n)) +
  facet_wrap(~contact, scales = "free_x",
             labeller = labeller(contact = label_both)) +
  scale_size(breaks = c(1, 2, 4, 6, 8)) +
  add_facet_borders()

At a glance, it looks like the temp = warm and contact = yes is associated with higher ratings.

The cumulative link model

Theory

The ordinal response \(y_i\) falls into response category \(j\) (out of \(J\) total) with probability \(\pi_{ij}\). The cumulative probabilities are defined:

\[ P(y_i \leq j) = \pi_{i1} + \dots + \pi_{ij}. \]

As an oversimplification, suppose that each probability \(\pi_{ij}\) is equal to the proportion of that response in the wine data. Then the cumulative “probability” can be visualized:

wine_prop <- wine %>%
  count(rating) %>%
  mutate(p = n / sum(n), cumsum_p = cumsum(p))

(
  ggplot(wine_prop, aes(x = rating, y = p)) +
    geom_col(fill = wine_red) +
    scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
    labs(x = "j", y = "proportion")
) +
  (
    ggplot(wine_prop, aes(x = as.integer(rating), y = cumsum_p)) +
      geom_point(size = 2) +
      geom_line(size = 1) +
      labs(x = "j", y = "cumulative proportion")
  ) +
  (
    ggplot(wine_prop,
        aes(x = as.integer(rating), y = log(cumsum_p) - log(1 - cumsum_p))) +
      geom_point(size = 2) +
      geom_line(size = 1) +
      labs(x = "j", y = "logit(cumulative proportion)")
  )

We will explore other links, but first the most common, the logit link, which is depicted in the right-most panel of the above figure:

\[ \text{logit} (P(y_i \leq j) = \log \frac{P(y_i \leq j)}{1 - P(y_i \leq j)} \]

Note that the above function is defined for all but the last category \(j = J\), because \(1 - P(Y_i \leq J) = 1 - 1 = 0\).

For the wine data, where we have \(J\) = 5 rating categories, we will build up to the following mixed effects model:

\[ \begin{align} \text{logit}(p(y_i \leq j)) &= \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - u( \text{judge}_i) \\ i &= 1, \dots n \; \; \; \; \; \; j = 1, \dots, J - 1 \end{align} \]

where \(\theta_j\) is called the threshold parameter, or cutpoint, of category \(j\). These thresholds can also be thought of as \(J-1\) = 4 intercepts. Note that the fixed effect parameters \(\beta_1\) and \(\beta_2\) are independent of \(j\), so each \(\beta\) has the same effect for each of the \(J-1\) cumulative logits. The judge effects, which are also independent of \(j\), are assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\). We are using the logit link because it is the most popular for this kind of model (and the one I am familiar with), but there are other options we will briefly explore later.

The subtraction of terms in the above model is new to me. The main reason seems to be for familiar interpretation: the larger the value of any independent term \(\beta x\), the smaller the thresholds \(\theta_j\), and therefore a larger probability of the a response falling into a category at the upper end of the scale. This way, \(\beta\) has the same direction of effect as in ordinary linear regression.

We are essentially modeling a “chain” of logistic regressions where the binary response is “less than or equal to a certain level” vs “greater than that level”. In this case, with \(J\) = 5, the thresholds \(\theta_j\) are capturing the adjusted log-odds of observing:

\(j\) = 1: log-odds of rating = 1 vs. 2-5
\(j\) = 2: log-odds of rating = 1-2 vs. 3-5
\(j\) = 3: log-odds of rating = 1-3 vs. 4-5
\(j\) = 4: log-odds of rating = 1-4 vs. 5

Fitting

Now with a surface-level understanding of what is being modeled, we will fit the data using ordinal::clm (cumulative link models) and ordinal::clmm (cumulative link mixed models), and logit links.

Fixed effects model

First, fit a simple model, by maximum likelihood, with contact as the sole predictor:

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_2 \text{contact}_i \]

clm_rating_contact <-
  clm(
    rating ~ contact,
    data = wine, link = "logit"
  )
summary(clm_rating_contact)

formula: rating ~ contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H 
 logit flexible  72   -99.96 209.91 5(0)  1.67e-07 1.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)   
contactyes   1.2070     0.4499   2.683   0.0073 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2 -2.13933    0.48981  -4.368
2|3  0.04257    0.32063   0.133
3|4  1.71449    0.38637   4.437
4|5  2.97875    0.50207   5.933

The model gives us \(K - 1 = 4\) threshold coefficients, as expected. The \(\beta_2\) coefficient estimate was statistically significant (by a Wald test), and tells us that contact = yes decreases the thresholds \(\theta_j\) by \(\beta_2\) = 1.21 (because of the subtraction of model terms), and therefore is associated with higher ratings.

The condition number of the Hessian for this model is 16.98. The ordinal primer says that larger values (like > 1e4) might indicate that the model is ill-defined.

It is nicely illustrative to compare this model to 4 separate logistic regressions with a dichotomized response:

\[ \begin{align} \text{logit} (p(y_i \leq 1)) &= \theta_1 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 2)) &= \theta_2 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 3)) &= \theta_3 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 4)) &= \theta_4 + \beta_2 \text{contact}_i \end{align} \]

wine %>%
  crossing(j = 1:4) %>%
  # Create a binary (0 or 1) to indicate where rating <= j
  mutate(rating_leq_j = as.numeric(rating) <= j) %>%
  group_by(j) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    mod = map(
      data,
      ~glm(rating_leq_j ~ 1 + contact,
           data = ., family = binomial(link = "logit")) %>% broom::tidy()
    )
  ) %>%
  unnest(mod) %>%
  transmute(
    j, term,
    estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
  ) %>%
  pivot_wider(names_from = term, values_from = estimate_se) %>%
  left_join(
    tidy(clm_rating_contact) %>%
      transmute(
        j = as.integer(substr(term, 1, 1)),
        term = if_else(!is.na(j), "theta_j", term),
        estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
      ) %>%
      mutate(j = replace_na(j, 1)) %>%
      spread(term, estimate_se),
    by = "j"
  ) %>%
  ungroup() %>%
  gt() %>%
  tab_spanner(label = "Logistic regression",
              columns = c(`(Intercept)`, contactyes.x)) %>%
  tab_spanner(label = "CLM",
              columns = c(theta_j, contactyes.y)) %>%
  fmt_missing(columns = everything(), missing_text = "")

j	Logistic regression		CLM
j	(Intercept)	contactyes.x	theta_j	contactyes.y
1	-2.08 (0.53)	-1.48 (1.14)	-2.14 (0.49)	1.21 (0.45)
2	0 (0.33)	-1.1 (0.51)	0.04 (0.32)
3	1.82 (0.48)	-1.37 (0.59)	1.71 (0.39)
4	2.83 (0.73)	-1.01 (0.87)	2.98 (0.5)

The intercepts from the ordinary logistic regression correspond closely to the threshold parameters \(\theta_j\) from the cumulative link model. In the fixed effect of contact (\(\beta_2\)), first note the sign difference, and second notice that the estimate from CLM is about the average of the 4 estimates from logistic regression. The advantage of the CLM is seen in the small standard error in the \(\beta_2\) estimate.

To quote the primer:

The cumulative logit model can be seen as the model that combines these four ordinary logistic regression models into a single model and therefore makes better use of the information in the data.

For the second model, we add the \(\beta_1 \text{temp}_i\) term:

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i \]

clm_rating_contact_temp <-
  clm(
    rating ~ contact + temp,
    data = wine, link = "logit"
  )
summary(clm_rating_contact_temp)

formula: rating ~ contact + temp
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H 
 logit flexible  72   -86.49 184.98 6(0)  4.01e-12 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
contactyes   1.5278     0.4766   3.205  0.00135 ** 
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850

Both fixed effects (contact = yes and temp = warm) are strongly associated with higher probability of higher ratings. The summary function provides \(p\)-values from Wald tests, but more accurate likelihood ratio tests can be done via the drop1 function, which evaluates each fixed effect while controlling the other:

drop1(clm_rating_contact_temp, test = "Chisq")

Single term deletions

Model:
rating ~ contact + temp
        Df    AIC    LRT  Pr(>Chi)    
<none>     184.98                     
contact  1 194.03 11.043 0.0008902 ***
temp     1 209.91 26.928 2.112e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or the reverse via the add1() function, which evaluates each fixed effect while ignoring the other:

# Fit the null model first
clm_rating_null <- clm(rating ~ 1, data = wine, link = "logit")
add1(clm_rating_null, scope = ~ contact + temp, test = "Chisq")

Single term additions

Model:
rating ~ 1
        Df    AIC     LRT  Pr(>Chi)    
<none>     215.44                      
contact  1 209.91  7.5263   0.00608 ** 
temp     1 194.03 23.4113 1.308e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Symmetric Wald confidence intervals can be extracted with confint or with broom::tidy:

tidy(clm_rating_contact_temp, conf.int = TRUE, conf.type = "Wald") %>%
  ggplot(aes(y = term, x = estimate)) +
  geom_point(size = 2) +
  geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high))

In these types of analyses, we are often interested in the odds ratios. For the two categorical fixed effects, which have two levels each, the odds ratios \(y \leq j\) comparing the two levels are:

\[ \begin{align} \text{OR} &= \frac{\gamma_j (\text{temp} = \text{warm})}{\gamma_j (\text{temp} = \text{cold})} = \frac{\exp(\theta_j - \beta_1 - \beta_2 \text{contact})}{\exp (\theta_j - 0 - \beta_2 \text{contact}}) = \exp(\beta_1) \\ \text{OR} &= \frac{\gamma_j (\text{contact} = \text{yes})}{\gamma_j (\text{contact} = \text{no})} = \frac{\exp(\theta_j - \beta_1 \text{temp} - \beta_2 )}{\exp (\theta_j - \beta_1 \text{temp} - 0)}) = \exp(\beta_2) \end{align} \]

where we have introduced the shorthand \(\gamma_j = \text{logit} (p(y \leq j))\). Compute those odds ratios, and their corresponding Wald 95% CIs:

tidy(clm_rating_contact_temp, conf.int = T, conf.type = "Wald") %>%
  transmute(
    term, across(c(estimate, conf.low, conf.high), exp)
  ) %>%
  gt() %>%
  fmt_number(c(estimate, conf.low, conf.high), decimals = 2)

term	estimate	conf.low	conf.high
1\|2	0.26	0.09	0.72
2\|3	3.49	1.48	8.24
3\|4	32.04	9.93	103.39
4\|5	149.37	35.65	625.75
contactyes	4.61	1.81	11.73
tempwarm	12.22	4.34	34.44

One last thing to check: does the data support an interaction between \(\text{temp}_i\) and \(\text{contact}_i\)?

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - \beta_3 \text{temp}_i \text{contact}_i \]

clm_rating_contact_temp_inter <-
  clm(
    rating ~ contact * temp, data = wine, link = "logit"
  )

#drop1(clm_rating_contact_temp_inter, test = "Chisq") # this accomplishes the same thing as anova()
anova(clm_rating_contact_temp, clm_rating_contact_temp_inter)

Likelihood ratio tests of cumulative link models:
 
                              formula:                link: threshold:
clm_rating_contact_temp       rating ~ contact + temp logit flexible  
clm_rating_contact_temp_inter rating ~ contact * temp logit flexible  

                              no.par    AIC  logLik LR.stat df Pr(>Chisq)
clm_rating_contact_temp            6 184.98 -86.492                      
clm_rating_contact_temp_inter      7 186.83 -86.416  0.1514  1     0.6972

No, The interaction term contact:temp is not supported by the data.

Comparison to linear model

Consider the following linear model which treats rating as continuous:

\[ y_i = \alpha + \beta_1 \text{temp}_i + \beta_2 \text{contact}_i + \epsilon_i \]

where \(\epsilon_i \sim N(0, \sigma_{\epsilon}^2)\).

lm_rating_contact_temp <- lm(as.numeric(rating) ~ contact + temp, data = wine)

To compare this to a CLM, we must use the probit link:

clm_rating_contact_temp_probit <-
  clm(
    rating ~ contact + temp, data = wine, link = "probit"
  )
tidy(clm_rating_contact_temp_probit) %>%
  filter(coef.type == "location") %>%
  mutate(model = "CLM") %>%
  select(-coef.type) %>%
  bind_rows(
    tidy(lm_rating_contact_temp) %>%
      filter(term != "(Intercept)") %>%
      # Need to divide by the residual SE here to get the right scale
      mutate(estimate = estimate / summary(lm_rating_contact_temp)$sigma,
             model = "LM")
  ) %>%
  group_by(model) %>%
  gt() %>%
  fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
  fmt(p.value, fns = scales::pvalue)

term	estimate	std.error	statistic	p.value
CLM
contactyes	0.87	0.27	3.25	0.001
tempwarm	1.50	0.29	5.14	<0.001
LM
contactyes	0.79	0.20	3.36	0.001
tempwarm	1.38	0.20	5.87	<0.001

The relative estimates from the linear model are lower than those from the CLM (probit link), indicating that the assumptions of the linear model are not met. In particular, the distance between thresholds is not equidistant, as we can see from differences in the CLM coefficients:

diff(coef(clm_rating_contact_temp_probit)[1:4]) %>% round(2)

 2|3  3|4  4|5 
1.51 1.31 0.90

Mixed effects model

Now that we have explored ordinal regression with just fixed effects, we will fit the following random effects model:

where the judge effects are independent of \(j\), and assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\).

Each judge has 8 ratings each (two per combination of temp and contact). See if we can spot the judge variance in a plot of ratings:

wine %>%
  count(judge, rating) %>%
  ggplot(aes(x = judge, y = rating)) +
  geom_tile(aes(fill = n)) +
  geom_text(aes(label = n), color = "white") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(legend.position = "none") +
  labs(title = "Number of ratings by judge")

There is definitely some judge-specific variability in the perception of bitterness of wine. judge 5, for instance, doesn’t stray far from rating = 3, while judge 7 didn’t consider any of the wines particularly bitter.

Fit the full model with ordinal::clmm and logit link:

clmm_rating_contact_temp <-
  clmm(
    rating ~ temp + contact + (1|judge),
    data = wine, link = "logit"
  )
# This is an older function, which we need to run stats::profile later
clmm2_rating_contact_temp <-
  clmm2(
    rating ~ temp + contact, random = judge,
    data = wine, link = "logistic"
  )
summary(clmm_rating_contact_temp)

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: rating ~ temp + contact + (1 | judge)
data:    wine

 link  threshold nobs logLik AIC    niter    max.grad cond.H 
 logit flexible  72   -81.57 177.13 332(999) 1.03e-05 2.8e+01

Random effects:
 Groups Name        Variance Std.Dev.
 judge  (Intercept) 1.279    1.131   
Number of groups:  judge 9 

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
tempwarm     3.0630     0.5954   5.145 2.68e-07 ***
contactyes   1.8349     0.5125   3.580 0.000344 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.6237     0.6824  -2.379
2|3   1.5134     0.6038   2.507
3|4   4.2285     0.8090   5.227
4|5   6.0888     0.9725   6.261

Compare model coefficients:

bind_rows(
  CLM = tidy(clm_rating_contact_temp),
  CLMM = tidy(clmm_rating_contact_temp),
  .id = "model"
) %>%
  select(-coef.type) %>%
  group_by(model) %>%
  gt() %>%
  fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
  fmt(p.value, fns = scales::pvalue)

term	estimate	std.error	statistic	p.value
CLM
1\|2	−1.34	0.52	−2.60	0.009
2\|3	1.25	0.44	2.86	0.004
3\|4	3.47	0.60	5.80	<0.001
4\|5	5.01	0.73	6.85	<0.001
contactyes	1.53	0.48	3.21	0.001
tempwarm	2.50	0.53	4.73	<0.001
CLMM
1\|2	−1.62	0.68	−2.38	0.017
2\|3	1.51	0.60	2.51	0.012
3\|4	4.23	0.81	5.23	<0.001
4\|5	6.09	0.97	6.26	<0.001
tempwarm	3.06	0.60	5.14	<0.001
contactyes	1.83	0.51	3.58	<0.001

Both fixed effect estimates \(\beta_1\) and \(\beta_2\) are higher in the CLMM. Use anova to compare the CLMM to the CLM:

anova(clm_rating_contact_temp, clmm_rating_contact_temp)

Likelihood ratio tests of cumulative link models:
 
                         formula:                              link: threshold:
clm_rating_contact_temp  rating ~ contact + temp               logit flexible  
clmm_rating_contact_temp rating ~ temp + contact + (1 | judge) logit flexible  

                         no.par    AIC  logLik LR.stat df Pr(>Chisq)   
clm_rating_contact_temp       6 184.98 -86.492                         
clmm_rating_contact_temp      7 177.13 -81.565   9.853  1   0.001696 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unsurprisingly, the judge term makes a significant improvement to the fit. We can extract profile confidence intervals on the variance \(\sigma_u\) using stats::profile:

profile(clmm2_rating_contact_temp,
        range = c(0.1, 4), nSteps = 30, trace = 0) %>%
  confint()

          2.5 %  97.5 %
stDev 0.5014584 2.26678

Note that these intervals are asymmetric (\(\sigma_u\) = 1.28), unlike the less accurate Wald tests. We can produce “best guess” estimates for judge effects using conditional modes:

tibble(
  judge_effect = clmm_rating_contact_temp$ranef,
  cond_var = clmm_rating_contact_temp$condVar
) %>%
  mutate(
    judge = fct_reorder(factor(1:n()), judge_effect),
    conf.low = judge_effect - qnorm(0.975) * sqrt(cond_var),
    conf.high = judge_effect + qnorm(0.975) * sqrt(cond_var)
  ) %>%
  ggplot(aes(y = judge, x = judge_effect)) +
  geom_point(size = 2) +
  geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high)) +
  theme(panel.grid.major.x = element_line(color = "grey"))

Predictions

There are different ways to extract predicted probabilities. First, and most obviously, with the predict function:

wine %>%
  bind_cols(
    pred =  predict(
      # Have to use clmm2 for predict
      clmm2_rating_contact_temp, newdata = wine
    )
  ) %>%
  # These are predicted probabilities for the average judge, so we can
  #  exclude the judge variable
  distinct(rating, temp, contact, pred) %>%
  arrange(temp, contact, rating)

# A tibble: 15 × 4
   rating temp  contact   pred
   <ord>  <fct> <fct>    <dbl>
 1 1      cold  no      0.165 
 2 2      cold  no      0.655 
 3 3      cold  no      0.166 
 4 1      cold  yes     0.0305
 5 2      cold  yes     0.390 
 6 3      cold  yes     0.496 
 7 4      cold  yes     0.0696
 8 2      warm  no      0.166 
 9 3      warm  no      0.587 
10 4      warm  no      0.191 
11 5      warm  no      0.0463
12 2      warm  yes     0.0313
13 3      warm  yes     0.306 
14 4      warm  yes     0.428 
15 5      warm  yes     0.233

This only gives us predictions for rating, temp and contact values which exist in the data. There is no predicted probability for rating > 3, temp cold and contact no, for example.

Another way is to pre-specify which values we want to predict:

nd <-
  crossing(
    temp = factor(c("cold", "warm")),
    contact = factor(c("no", "yes")),
    rating = factor(1:5, ordered = T)
  )
nd %>%
  bind_cols(pred = predict(clmm2_rating_contact_temp, nd)) %>%
  ggplot(aes(x = glue::glue("{temp}-{contact}"), y = pred, fill = rating)) +
  geom_col() +
  scale_fill_td(palette = "div5") +
  scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
  labs(x = "temp-contact", y = "predicted probability")

We can also get model-estimated cumulative probabilities by considering the model coefficients. For example, for a cold temp and contact, the cumulative probability of a bitterness rating \(j\) or less:

\[ P(y_i \leq j) = \text{logit}^{-1} [\theta_j - \beta_2 \text{contact}_i] \]

where we are considering the average judge (\(u(\text{judge}_i) = 0\)). The inverse logit is \(\text{logit}^{-1}(x) = 1 / (1 + \exp(-x))\), and can be calculated with plogis as a shorthand (brms::inv_logit_scaled is another). We can subtract cumulative probabilities to get non-cumulative probabilities of a rating \(j\). For example, \(j\) = 3:

plogis(clmm_rating_contact_temp$Theta[3] - clmm_rating_contact_temp$beta[2]) -
  plogis(clmm_rating_contact_temp$Theta[2] - clmm_rating_contact_temp$beta[2])

contactyes 
 0.4960357

which matches the value calculated previously using predict.

Estimated marginal means

The emmeans package provides functionality for estimating marginal mean effects of ordinal models. The package documentation also provides an example using ordinal and wine data here.

library(emmeans)

In the “Models supported by emmeans” document, we see the following:

Object.class	Package	Group	Arguments/notes
clm	ordinal	O	`mode = c("latent", "linear.predictor", "cum.prob", "exc.prob", "prob", "mean.class", "scale")`
clmm	ordinal	O	Like `clm` but no `"scale"` mode

emmeans(clmm_rating_contact_temp,
        specs = list(pairwise ~ temp, pairwise ~ contact), mode = "latent")

$`emmeans of temp`
 temp emmean    SE  df asymp.LCL asymp.UCL
 cold  -1.63 0.547 Inf    -2.707    -0.562
 warm   1.43 0.532 Inf     0.387     2.470

Results are averaged over the levels of: contact 
Confidence level used: 0.95 

$`pairwise differences of temp`
 1           estimate    SE  df z.ratio p.value
 cold - warm    -3.06 0.595 Inf  -5.145  <.0001

Results are averaged over the levels of: contact 

$`emmeans of contact`
 contact emmean    SE  df asymp.LCL asymp.UCL
 no      -1.020 0.522 Inf    -2.043   0.00274
 yes      0.815 0.513 Inf    -0.191   1.82072

Results are averaged over the levels of: temp 
Confidence level used: 0.95 

$`pairwise differences of contact`
 1        estimate    SE  df z.ratio p.value
 no - yes    -1.83 0.513 Inf  -3.580  0.0003

Results are averaged over the levels of: temp

The contrast estimates are in terms of the latent (underlying unobserved) bitterness rating.

Using mode = "cum.prob" and mode = "exc.prob“, we can get cumulative probabilities and exceedance (1 - cumulative) probabilities. For example, the probability of a rating of at least 4 for different temp:

emmeans(clmm_rating_contact_temp, ~ temp,
        mode = "exc.prob", at = list(cut = "3|4"))

 temp exc.prob     SE  df asymp.LCL asymp.UCL
 cold    0.049 0.0304 Inf   -0.0107     0.109
 warm    0.450 0.1084 Inf    0.2371     0.662

Results are averaged over the levels of: contact 
Confidence level used: 0.95

mode = "prob" gives us probability distributions of each rating, which have a nice auto plot functionality:

emmeans(clmm_rating_contact_temp,
        ~ rating | temp, mode = "prob") %>%
  plot() +
  add_facet_borders()

Choice of link function

So far, we have used the logit link (and briefly the probit link to compare estimates with linear regression). The links available to ordinal::clmm are logit, probit, cloglog, loglog, and cauchit.

We can fit the CLMM using all of these links and compare log-likelihoods:

wine %>%
  nest(data = everything()) %>%
  crossing(
    link = c("logit", "probit", "cloglog", "loglog", "cauchit")
  ) %>%
  mutate(
    mod = map2(
      data, link,
      ~clmm(
        rating ~ 1 + contact + temp + (1|judge),
        data = .x, link = .y
      )
    )
  ) %>%
  mutate(mod_summary = map(mod, glance)) %>%
  unnest(mod_summary) %>%
  select(link, logLik, AIC, BIC) %>%
  arrange(logLik) %>%
  gt()

link	logLik	AIC	BIC
cauchit	-86.83499	187.6700	203.6066
cloglog	-82.72936	179.4587	195.3954
logit	-81.56541	177.1308	193.0675
loglog	-81.54137	177.0827	193.0194
probit	-80.93061	175.8612	191.7979

The probit model appears to be the best description of the data.

We can also consider the effect of “flexible” vs “equidistant” thresholds:

wine %>%
  nest(data = everything()) %>%
  crossing(
    link = c("logit", "probit", "cloglog", "loglog", "cauchit"),
    threshold = c("flexible", "equidistant")
  ) %>%
  mutate(
    mod = pmap(
      list(data, link, threshold),
      function(a, b, c) {
        clmm(
          rating ~ 1 + contact + temp + (1|judge),
          data = a, link = b, threshold = c
        )
      }
    )
  ) %>%
  #mutate(mod_summary = map(mod, glance)) %>%
  mutate(
    mod_summary = map(
      mod,
      # glance() on a clmm object returns a <logLik> variable type which
      #  can't be bound together by unnest(), so need to convert it to numeric
      ~glance(.x) %>% mutate(logLik = as.numeric(logLik))
    )
  ) %>%
  unnest(mod_summary) %>%
  select(link, threshold, logLik, edf, AIC, BIC) %>%
  arrange(logLik) %>%
  gt()

link	threshold	logLik	edf	AIC	BIC
cauchit	equidistant	-87.75021	5	185.5004	196.8837
cauchit	flexible	-86.83499	7	187.6700	203.6066
loglog	equidistant	-84.36440	5	178.7288	190.1121
cloglog	equidistant	-83.32634	5	176.6527	188.0360
logit	equidistant	-83.05497	5	176.1099	187.4933
cloglog	flexible	-82.72936	7	179.4587	195.3954
probit	equidistant	-82.52622	5	175.0524	186.4358
logit	flexible	-81.56541	7	177.1308	193.0675
loglog	flexible	-81.54137	7	177.0827	193.0194
probit	flexible	-80.93061	7	175.8612	191.7979

Note the change in degrees of freedom, resulting in the equidistant probit model having the lowest BIC. In terms of log likelihood, however, flexible always outperform equidistant thresholds.

Conclusion

Thanks to detailed documentation, fitting cumulative link (mixed) models is very easy with ordinal. In this post, we first learned the theoretical basis for these models, then worked through examples using wine bitterness ratings from multiple judges.

In the next post, I’ll explore the Bayesian approach to ordinal regression with the brms package.

Reproducibility

Session info

 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

Git repository

Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

References

Randall, J. H. 1989. “The Analysis of Sensory Data by Generalized Linear Model.” Biometrical Journal 31 (7): 781–93. https://doi.org/10.1002/bimj.4710310703.

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dunn2020,
  author = {Dunn, Taylor},
  title = {Ordinal Regression in {R:} Part 1},
  date = {2020-03-15},
  url = {https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1},
  langid = {en}
}

For attribution, please cite this work as:

Dunn, Taylor. 2020. “Ordinal Regression in R: Part 1.” March 15, 2020. https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1.