Ordinal regression in R: part 1

regression ordinal frequentist statistics

A theoretical and applied walkthrough of ordinal regression. Part 1: the frequentist approach with ordinal.

Taylor Dunn
2020-03-15
R packages
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(dunnr)
library(gt)
library(broom)
library(patchwork)

extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()

wine_red <- "#58181F"
update_geom_defaults("point", list(color = wine_red))
update_geom_defaults("line", list(color = wine_red))

The purpose of this post is to learn more about ordinal regression models (a.k.a. cumulative link, proportional odds, ordered logit models, etc.) and practice their implementation in R. This is part 1, where I’ll be taking the frequentist approach via the ordinal package. There are other options, like MASS::polr, but two features in particular drew me to ordinal: (1) it allows for random effects, and (2) it has broom::tidy methods available.

In particular, I’ll be following along with the ordinal package primer.

Setup

Import ordinal, and the included data set wine:

library(ordinal)
data(wine)
wine <- as_tibble(wine)
glimpse(wine)
Rows: 72
Columns: 6
$ response <dbl> 36, 48, 47, 67, 77, 60, 83, 90, 17, 22, 14, 50, 30,~
$ rating   <ord> 2, 3, 3, 4, 4, 4, 5, 5, 1, 2, 1, 3, 2, 3, 5, 4, 2, ~
$ temp     <fct> cold, cold, cold, cold, warm, warm, warm, warm, col~
$ contact  <fct> no, no, yes, yes, no, no, yes, yes, no, no, yes, ye~
$ bottle   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, ~
$ judge    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, ~

wine is a data set from Randall (1989) of wine bitterness ratings from multiple judges. The variables are as follows:

- response: wine bitterness rating on a continuous 0-100 scale
- rating: an ordered factor with 5 levels, a grouped version of response
- temp: temperature during wine production (cold or warm)
- contact: contact between the juice and skins during production (no or yes)
- bottle: bottle identifier (1-8)
- judge: judge identifier (1-9)

Relationship between response and rating:

wine %>%
  ggplot(aes(y = rating, x = response)) +
  geom_boxplot(width = 0.5) +
  geom_jitter(alpha = 0.5)

Note that the response values do not overlap between rating levels.

There are 72 total observations with the following ratings distribution by treatment and random effects:

wine %>%
  transmute(temp, contact, bottle, judge, rating = as.numeric(rating)) %>%
  pivot_wider(names_from = judge, values_from = rating) %>%
  gt() %>%
  tab_spanner(columns = `1`:`9`, label = "judge") %>%
  data_color(
    columns = `1`:`9`,
    colors = scales::col_numeric(
      palette = c("white", wine_red), domain = c(1, 5)
    )
  )
temp  contact  bottle  judge: 1 2 3 4 5 6 7 8 9
cold  no       1              2 1 2 3 2 3 1 2 1
cold  no       2              3 2 3 2 3 2 1 2 2
cold  yes      3              3 1 3 3 4 3 2 2 3
cold  yes      4              4 3 2 2 3 2 2 3 2
warm  no       5              4 2 5 3 3 2 2 3 3
warm  no       6              4 3 5 2 3 4 3 3 2
warm  yes      7              5 5 4 5 3 5 2 3 4
warm  yes      8              5 4 4 3 3 4 3 4 4

So each bottle had a particular temp and contact (2 bottles for each of the 4 combinations), and each judge rated the bitterness of each bottle.

Before modeling, can we see a clear effect of temp and contact?

wine %>%
  count(contact, rating, temp) %>%
  mutate(temp = fct_rev(temp)) %>%
  ggplot(aes(x = temp, y = rating, color = temp)) +
  geom_point(aes(group = temp, size = n)) +
  facet_wrap(~contact, scales = "free_x",
             labeller = labeller(contact = label_both)) +
  scale_size(breaks = c(1, 2, 4, 6, 8)) +
  add_facet_borders()

At a glance, it looks like temp = warm and contact = yes are associated with higher ratings.

Theory

The ordinal response \(y_i\) falls into response category \(j\) (out of \(J\) total) with probability \(\pi_{ij}\). The cumulative probabilities are defined:

\[ P(y_i \leq j) = \pi_{i1} + \dots + \pi_{ij}. \]

As an oversimplification, suppose that each probability \(\pi_{ij}\) is equal to the proportion of that response in the wine data. Then the cumulative “probability” can be visualized:

wine_prop <- wine %>%
  count(rating) %>%
  mutate(p = n / sum(n), cumsum_p = cumsum(p))

(
  ggplot(wine_prop, aes(x = rating, y = p)) +
    geom_col(fill = wine_red) +
    scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
    labs(x = "j", y = "proportion")
) +
  (
    ggplot(wine_prop, aes(x = as.integer(rating), y = cumsum_p)) +
      geom_point(size = 2) +
      geom_line(size = 1) +
      labs(x = "j", y = "cumulative proportion")
  ) +
  (
    ggplot(wine_prop,
        aes(x = as.integer(rating), y = log(cumsum_p) - log(1 - cumsum_p))) +
      geom_point(size = 2) +
      geom_line(size = 1) +
      labs(x = "j", y = "logit(cumulative proportion)")
  )

We will explore other links later, but first consider the most common, the logit link, which is depicted in the right-most panel of the above figure:

\[ \text{logit} (P(y_i \leq j)) = \log \frac{P(y_i \leq j)}{1 - P(y_i \leq j)} \]

Note that the above function is defined for all but the last category \(j = J\), because \(1 - P(y_i \leq J) = 1 - 1 = 0\).
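Base R already provides this link and its inverse: qlogis() is the logit, and plogis() is the inverse logit. A quick sketch (the probability values here are made up for illustration):

```r
# logit(p) = log(p / (1 - p)), defined for p in (0, 1)
logit <- function(p) log(p / (1 - p))

p <- c(0.105, 0.511, 0.847)                        # arbitrary example values
stopifnot(isTRUE(all.equal(logit(p), qlogis(p))))  # same as base R's qlogis()
stopifnot(isTRUE(all.equal(plogis(logit(p)), p)))  # plogis() inverts the logit
```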

For the wine data, where we have \(J\) = 5 rating categories, we will build up to the following mixed effects model:

\[ \begin{align} \text{logit}(p(y_i \leq j)) &= \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - u( \text{judge}_i) \\ i &= 1, \dots n \; \; \; \; \; \; j = 1, \dots, J - 1 \end{align} \]

where \(\theta_j\) is called the threshold parameter, or cutpoint, of category \(j\). These thresholds can also be thought of as \(J-1\) = 4 intercepts. Note that the fixed effect parameters \(\beta_1\) and \(\beta_2\) are independent of \(j\), so each \(\beta\) has the same effect for each of the \(J-1\) cumulative logits. The judge effects, which are also independent of \(j\), are assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\). We are using the logit link because it is the most popular for this kind of model (and the one I am familiar with), but there are other options we will briefly explore later.

The subtraction of terms in the above model is new to me. The main reason seems to be familiar interpretation: the larger the value of any term \(\beta x\), the smaller the thresholds \(\theta_j\), and therefore the larger the probability of a response falling into a category at the upper end of the scale. This way, \(\beta\) has the same direction of effect as in ordinary linear regression.
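To see this sign convention in action, here is a small numeric sketch with made-up thresholds and a made-up coefficient (not fitted values):

```r
theta <- c(-2, 0, 1.7, 3)  # hypothetical thresholds (J - 1 = 4 of them)
beta  <- 1.2               # hypothetical positive effect

p_le_x0 <- plogis(theta)         # P(y <= j) when x = 0
p_le_x1 <- plogis(theta - beta)  # P(y <= j) when x = 1

# A positive beta lowers every cumulative probability P(y <= j),
# shifting probability mass toward the upper categories
stopifnot(all(p_le_x1 < p_le_x0))
```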

We are essentially modeling a “chain” of logistic regressions where the binary response is “less than or equal to a certain level” vs “greater than that level.” In this case, with \(J\) = 5, the thresholds \(\theta_j\) capture the adjusted log-odds of observing: rating 1 vs ratings 2-5 (\(\theta_1\)), ratings 1-2 vs 3-5 (\(\theta_2\)), ratings 1-3 vs 4-5 (\(\theta_3\)), and ratings 1-4 vs rating 5 (\(\theta_4\)).

Fitting

Now with a surface-level understanding of what is being modeled, we will fit the data using ordinal::clm (cumulative link models) and ordinal::clmm (cumulative link mixed models), and logit links.

Fixed effects model

First, fit a simple model, by maximum likelihood, with contact as the sole predictor:

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_2 \text{contact}_i \]

clm_rating_contact <-
  clm(
    rating ~ contact,
    data = wine, link = "logit"
  )
summary(clm_rating_contact)
formula: rating ~ contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H 
 logit flexible  72   -99.96 209.91 5(0)  1.67e-07 1.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)   
contactyes   1.2070     0.4499   2.683   0.0073 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2 -2.13933    0.48981  -4.368
2|3  0.04257    0.32063   0.133
3|4  1.71449    0.38637   4.437
4|5  2.97875    0.50207   5.933

The model gives us \(J - 1 = 4\) threshold coefficients, as expected. The \(\beta_2\) coefficient estimate was statistically significant (by a Wald test), and tells us that contact = yes decreases the thresholds \(\theta_j\) by \(\beta_2\) = 1.21 (because of the subtraction of model terms), and therefore is associated with higher ratings.

The condition number of the Hessian for this model is 16.98. The ordinal primer says that larger values (like > 1e4) might indicate that the model is ill-defined.
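To make the direction of the effect concrete, we can hand-compute the fitted cumulative probabilities from the coefficients (hard-coded here, rounded from the summary above):

```r
# Threshold and contact estimates from summary(clm_rating_contact)
theta <- c("1|2" = -2.139, "2|3" = 0.043, "3|4" = 1.714, "4|5" = 2.979)
beta_contact <- 1.207

p_no  <- plogis(theta)                 # P(rating <= j), contact = no
p_yes <- plogis(theta - beta_contact)  # P(rating <= j), contact = yes
round(cbind(no = p_no, yes = p_yes), 2)

# Every cumulative probability is lower with contact = yes:
# probability mass has shifted toward higher ratings
stopifnot(all(p_yes < p_no))
```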

It is nicely illustrative to compare this model to 4 separate logistic regressions with a dichotomized response:

\[ \begin{align} \text{logit} (p(y_i \leq 1)) &= \theta_1 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 2)) &= \theta_2 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 3)) &= \theta_3 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 4)) &= \theta_4 + \beta_2 \text{contact}_i \end{align} \]

wine %>%
  crossing(j = 1:4) %>%
  # Create an indicator for rating <= j (logical, treated as 0/1 by glm)
  mutate(rating_leq_j = as.numeric(rating) <= j) %>%
  group_by(j) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    mod = map(
      data,
      ~glm(rating_leq_j ~ 1 + contact,
           data = ., family = binomial(link = "logit")) %>% broom::tidy()
    )
  ) %>%
  unnest(mod) %>%
  transmute(
    j, term,
    estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
  ) %>%
  pivot_wider(names_from = term, values_from = estimate_se) %>%
  left_join(
    tidy(clm_rating_contact) %>%
      transmute(
        j = as.integer(substr(term, 1, 1)),
        term = if_else(!is.na(j), "theta_j", term),
        estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
      ) %>%
      mutate(j = replace_na(j, 1)) %>%
      spread(term, estimate_se),
    by = "j"
  ) %>%
  ungroup() %>%
  gt() %>%
  tab_spanner(label = "Logistic regression",
              columns = vars(`(Intercept)`, contactyes.x)) %>%
  tab_spanner(label = "CLM",
              columns = vars(theta_j, contactyes.y)) %>%
  fmt_missing(columns = everything(), missing_text = "")
    Logistic regression             CLM
 j  (Intercept)    contactyes      theta_j       contactyes
 1  -2.08 (0.53)   -1.48 (1.14)   -2.14 (0.49)   1.21 (0.45)
 2   0.00 (0.33)   -1.10 (0.51)    0.04 (0.32)
 3   1.82 (0.48)   -1.37 (0.59)    1.71 (0.39)
 4   2.83 (0.73)   -1.01 (0.87)    2.98 (0.50)

The intercepts from the ordinary logistic regressions correspond closely to the threshold parameters \(\theta_j\) from the cumulative link model. For the fixed effect of contact (\(\beta_2\)), first note the sign difference (due to the subtraction parameterization), and second note that the CLM estimate is roughly the average of the 4 logistic regression estimates. The advantage of the CLM shows in the much smaller standard error of the \(\beta_2\) estimate.

To quote the primer:

The cumulative logit model can be seen as the model that combines these four ordinary logistic regression models into a single model and therefore makes better use of the information in the data.

For the second model, we add the \(\beta_1 \text{temp}_i\) term:

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i \]

clm_rating_contact_temp <-
  clm(
    rating ~ contact + temp,
    data = wine, link = "logit"
  )
summary(clm_rating_contact_temp)
formula: rating ~ contact + temp
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H 
 logit flexible  72   -86.49 184.98 6(0)  4.01e-12 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
contactyes   1.5278     0.4766   3.205  0.00135 ** 
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850

Both fixed effects (contact = yes and temp = warm) are strongly associated with higher probability of higher ratings. The summary function provides \(p\)-values from Wald tests, but more accurate likelihood ratio tests can be performed with the drop1 function, which evaluates each fixed effect while controlling for the other:

drop1(clm_rating_contact_temp, test = "Chisq")
Single term deletions

Model:
rating ~ contact + temp
        Df    AIC    LRT  Pr(>Chi)    
<none>     184.98                     
contact  1 194.03 11.043 0.0008902 ***
temp     1 209.91 26.928 2.112e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or the reverse via the add1() function, which evaluates each fixed effect while ignoring the other:

# Fit the null model first
clm_rating_null <- clm(rating ~ 1, data = wine, link = "logit")
add1(clm_rating_null, scope = ~ contact + temp, test = "Chisq")
Single term additions

Model:
rating ~ 1
        Df    AIC     LRT  Pr(>Chi)    
<none>     215.44                      
contact  1 209.91  7.5263   0.00608 ** 
temp     1 194.03 23.4113 1.308e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Symmetric Wald confidence intervals can be extracted with confint or with broom::tidy:

tidy(clm_rating_contact_temp, conf.int = TRUE, conf.type = "Wald") %>%
  ggplot(aes(y = term, x = estimate)) +
  geom_point(size = 2) +
  geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high))

In these types of analyses, we are often interested in odds ratios. For the two categorical fixed effects, which have two levels each, the odds ratios of \(y \leq j\) comparing the two levels are:

\[ \begin{align} \text{OR} &= \frac{\exp(\gamma_j (\text{temp} = \text{cold}))}{\exp(\gamma_j (\text{temp} = \text{warm}))} = \frac{\exp(\theta_j - \beta_2 \text{contact})}{\exp (\theta_j - \beta_1 - \beta_2 \text{contact})} = \exp(\beta_1) \\ \text{OR} &= \frac{\exp(\gamma_j (\text{contact} = \text{no}))}{\exp(\gamma_j (\text{contact} = \text{yes}))} = \frac{\exp(\theta_j - \beta_1 \text{temp})}{\exp (\theta_j - \beta_1 \text{temp} - \beta_2)} = \exp(\beta_2) \end{align} \]

where we have introduced the shorthand \(\gamma_j = \text{logit} (p(y \leq j))\), so that \(\exp(\gamma_j)\) is the cumulative odds of \(y \leq j\). Equivalently, \(\exp(\beta_1)\) and \(\exp(\beta_2)\) are the odds ratios of observing a rating above \(j\) for warm versus cold, and for contact versus no contact. Compute those odds ratios, and their corresponding Wald 95% CIs:

tidy(clm_rating_contact_temp, conf.int = TRUE, conf.type = "Wald") %>%
  transmute(
    term, across(c(estimate, conf.low, conf.high), exp)
  ) %>%
  gt() %>%
  fmt_number(c(estimate, conf.low, conf.high), decimals = 2)
term estimate conf.low conf.high
1|2 0.26 0.09 0.72
2|3 3.49 1.48 8.24
3|4 32.04 9.93 103.39
4|5 149.37 35.65 625.75
contactyes 4.61 1.81 11.73
tempwarm 12.22 4.34 34.44
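These odds ratios are just the exponentiated coefficient estimates; a quick hand-check, with the estimates hard-coded from the model summary above:

```r
# Coefficient estimates from summary(clm_rating_contact_temp)
beta <- c(contactyes = 1.5278, tempwarm = 2.5031)
round(exp(beta), 2)
#> contactyes   tempwarm
#>       4.61      12.22
```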

One last thing to check: does the data support an interaction between \(\text{temp}_i\) and \(\text{contact}_i\)?

\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - \beta_3 \text{temp}_i \text{contact}_i \]

clm_rating_contact_temp_inter <-
  clm(
    rating ~ contact * temp, data = wine, link = "logit"
  )

#drop1(clm_rating_contact_temp_inter, test = "Chisq") # this accomplishes the same thing as anova()
anova(clm_rating_contact_temp, clm_rating_contact_temp_inter)
Likelihood ratio tests of cumulative link models:
 
                              formula:                link:
clm_rating_contact_temp       rating ~ contact + temp logit
clm_rating_contact_temp_inter rating ~ contact * temp logit
                              threshold:
clm_rating_contact_temp       flexible  
clm_rating_contact_temp_inter flexible  

                              no.par    AIC  logLik LR.stat df
clm_rating_contact_temp            6 184.98 -86.492           
clm_rating_contact_temp_inter      7 186.83 -86.416  0.1514  1
                              Pr(>Chisq)
clm_rating_contact_temp                 
clm_rating_contact_temp_inter     0.6972

No: the interaction term contact:temp is not supported by the data.

Comparison to linear model

Consider the following linear model which treats rating as continuous:

\[ y_i = \alpha + \beta_1 \text{temp}_i + \beta_2 \text{contact}_i + \epsilon_i \]

where \(\epsilon_i \sim N(0, \sigma_{\epsilon}^2)\).

lm_rating_contact_temp <- lm(as.numeric(rating) ~ contact + temp, data = wine)

To compare this to a CLM, we must use the probit link:

clm_rating_contact_temp_probit <-
  clm(
    rating ~ contact + temp, data = wine, link = "probit"
  )
tidy(clm_rating_contact_temp_probit) %>%
  filter(coef.type == "location") %>%
  mutate(model = "CLM") %>%
  select(-coef.type) %>%
  bind_rows(
    tidy(lm_rating_contact_temp) %>%
      filter(term != "(Intercept)") %>%
      # Need to divide by the residual SE here to get the right scale
      mutate(estimate = estimate / summary(lm_rating_contact_temp)$sigma,
             model = "LM")
  ) %>%
  group_by(model) %>%
  gt() %>%
  fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
  fmt(p.value, fns = scales::pvalue)
term estimate std.error statistic p.value
CLM
contactyes 0.87 0.27 3.25 0.001
tempwarm 1.50 0.29 5.14 <0.001
LM
contactyes 0.79 0.20 3.36 0.001
tempwarm 1.38 0.20 5.87 <0.001

The rescaled estimates from the linear model are lower than those from the CLM (probit link), indicating that the assumptions of the linear model are not met. In particular, the thresholds are not equidistant, as we can see from the differences between successive CLM threshold estimates:

diff(coef(clm_rating_contact_temp_probit)[1:4]) %>% round(2)
 2|3  3|4  4|5 
1.51 1.31 0.90 

Mixed effects model

Now that we have explored ordinal regression with just fixed effects, we will fit the following random effects model:

\[ \begin{align} \text{logit}(p(y_i \leq j)) &= \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - u( \text{judge}_i) \\ i &= 1, \dots n \; \; \; \; \; \; j = 1, \dots, J - 1 \end{align} \]

where the judge effects are independent of \(j\), and assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\).

Each judge gave 8 ratings (two per combination of temp and contact). Let's see if we can spot the judge variance in a plot of ratings:

wine %>%
  count(judge, rating) %>%
  ggplot(aes(x = judge, y = rating)) +
  geom_tile(aes(fill = n)) +
  geom_text(aes(label = n), color = "white") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(legend.position = "none") +
  labs(title = "Number of ratings by judge")

There is definitely some judge-specific variability in the perception of bitterness of wine. judge 5, for instance, doesn’t stray far from rating = 3, while judge 7 didn’t consider any of the wines particularly bitter.

Fit the full model with ordinal::clmm and logit link:

clmm_rating_contact_temp <-
  clmm(
    rating ~ temp + contact + (1|judge),
    data = wine, link = "logit"
  )
# clmm2 is an older function, which we need in order to run profile() later
clmm2_rating_contact_temp <-
  clmm2(
    rating ~ temp + contact, random = judge,
    data = wine, link = "logistic"
  )
summary(clmm_rating_contact_temp)
Cumulative Link Mixed Model fitted with the Laplace approximation

formula: rating ~ temp + contact + (1 | judge)
data:    wine

 link  threshold nobs logLik AIC    niter    max.grad cond.H 
 logit flexible  72   -81.57 177.13 332(999) 1.03e-05 2.8e+01

Random effects:
 Groups Name        Variance Std.Dev.
 judge  (Intercept) 1.279    1.131   
Number of groups:  judge 9 

Coefficients:
           Estimate Std. Error z value Pr(>|z|)    
tempwarm     3.0630     0.5954   5.145 2.68e-07 ***
contactyes   1.8349     0.5125   3.580 0.000344 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.6237     0.6824  -2.379
2|3   1.5134     0.6038   2.507
3|4   4.2285     0.8090   5.227
4|5   6.0888     0.9725   6.261

Compare model coefficients:

bind_rows(
  CLM = tidy(clm_rating_contact_temp),
  CLMM = tidy(clmm_rating_contact_temp),
  .id = "model"
) %>%
  select(-coef.type) %>%
  group_by(model) %>%
  gt() %>%
  fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
  fmt(p.value, fns = scales::pvalue)
term estimate std.error statistic p.value
CLM
1|2 −1.34 0.52 −2.60 0.009
2|3 1.25 0.44 2.86 0.004
3|4 3.47 0.60 5.80 <0.001
4|5 5.01 0.73 6.85 <0.001
contactyes 1.53 0.48 3.21 0.001
tempwarm 2.50 0.53 4.73 <0.001
CLMM
1|2 −1.62 0.68 −2.38 0.017
2|3 1.51 0.60 2.51 0.012
3|4 4.23 0.81 5.23 <0.001
4|5 6.09 0.97 6.26 <0.001
tempwarm 3.06 0.60 5.14 <0.001
contactyes 1.83 0.51 3.58 <0.001

Both fixed effect estimates \(\beta_1\) and \(\beta_2\) are higher in the CLMM. Use anova to compare the CLMM to the CLM:

anova(clm_rating_contact_temp, clmm_rating_contact_temp)
Likelihood ratio tests of cumulative link models:
 
                         formula:                              link:
clm_rating_contact_temp  rating ~ contact + temp               logit
clmm_rating_contact_temp rating ~ temp + contact + (1 | judge) logit
                         threshold:
clm_rating_contact_temp  flexible  
clmm_rating_contact_temp flexible  

                         no.par    AIC  logLik LR.stat df Pr(>Chisq)
clm_rating_contact_temp       6 184.98 -86.492                      
clmm_rating_contact_temp      7 177.13 -81.565   9.853  1   0.001696
                           
clm_rating_contact_temp    
clmm_rating_contact_temp **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unsurprisingly, the judge term makes a significant improvement to the fit. We can extract profile likelihood confidence intervals for the random effect standard deviation \(\sigma_u\) using profile:

profile(clmm2_rating_contact_temp,
        range = c(0.1, 4), nSteps = 30, trace = 0) %>%
  confint()
          2.5 %  97.5 %
stDev 0.5014584 2.26678

Note that these intervals are asymmetric around the estimate (\(\hat{\sigma}_u\) = 1.13), unlike Wald intervals, which are symmetric and less accurate for variance parameters. We can produce “best guess” estimates for the judge effects using the conditional modes:

tibble(
  judge_effect = clmm_rating_contact_temp$ranef,
  cond_var = clmm_rating_contact_temp$condVar
) %>%
  mutate(
    judge = fct_reorder(factor(1:n()), judge_effect),
    conf.low = judge_effect - qnorm(0.975) * sqrt(cond_var),
    conf.high = judge_effect + qnorm(0.975) * sqrt(cond_var)
  ) %>%
  ggplot(aes(y = judge, x = judge_effect)) +
  geom_point(size = 2) +
  geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high)) +
  theme(panel.grid.major.x = element_line(color = "grey"))

Predictions

There are different ways to extract predicted probabilities. First, and most obviously, with the predict function:

wine %>%
  bind_cols(
    pred =  predict(
      # Have to use clmm2 for predict
      clmm2_rating_contact_temp, newdata = wine
    )
  ) %>%
  # These are predicted probabilities for the average judge, so we can
  #  exclude the judge variable
  distinct(rating, temp, contact, pred) %>%
  arrange(temp, contact, rating)
# A tibble: 15 x 4
   rating temp  contact   pred
   <ord>  <fct> <fct>    <dbl>
 1 1      cold  no      0.165 
 2 2      cold  no      0.655 
 3 3      cold  no      0.166 
 4 1      cold  yes     0.0305
 5 2      cold  yes     0.390 
 6 3      cold  yes     0.496 
 7 4      cold  yes     0.0696
 8 2      warm  no      0.166 
 9 3      warm  no      0.587 
10 4      warm  no      0.191 
11 5      warm  no      0.0463
12 2      warm  yes     0.0313
13 3      warm  yes     0.306 
14 4      warm  yes     0.428 
15 5      warm  yes     0.233 

This only gives us predictions for combinations of rating, temp, and contact that exist in the data. There is no predicted probability for rating > 3 with temp = cold and contact = no, for example.

Another way is to pre-specify which values we want to predict:

nd <-
  crossing(
    temp = factor(c("cold", "warm")),
    contact = factor(c("no", "yes")),
    rating = factor(1:5, ordered = T)
  )
nd %>%
  bind_cols(pred = predict(clmm2_rating_contact_temp, nd)) %>%
  ggplot(aes(x = glue::glue("{temp}-{contact}"), y = pred, fill = rating)) +
  geom_col() +
  scale_fill_td(palette = "div5") +
  scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
  labs(x = "temp-contact", y = "predicted probability")

We can also get model-estimated cumulative probabilities directly from the model coefficients. For example, for temp = cold (the reference level) and contact = yes, the cumulative probability of a bitterness rating of \(j\) or less is:

\[ P(y_i \leq j) = \text{logit}^{-1} [\theta_j - \beta_2 \text{contact}_i] \]

where we are considering the average judge (\(u(\text{judge}_i) = 0\)). The inverse logit is \(\text{logit}^{-1}(x) = 1 / (1 + \exp(-x))\), and can be calculated with plogis as a shorthand (brms::inv_logit_scaled is another). We can subtract cumulative probabilities to get non-cumulative probabilities of a rating \(j\). For example, \(j\) = 3:

plogis(clmm_rating_contact_temp$Theta[3] - clmm_rating_contact_temp$beta[2]) -
  plogis(clmm_rating_contact_temp$Theta[2] - clmm_rating_contact_temp$beta[2])
contactyes 
 0.4960357 

which matches the value calculated previously using predict.
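Generalizing this, the entire predicted distribution for a given condition, say temp = cold and contact = yes, can be rebuilt by differencing consecutive cumulative probabilities (coefficients hard-coded here from the clmm summary above):

```r
theta <- c(-1.6237, 1.5134, 4.2285, 6.0888)  # thresholds from the summary
beta_contact <- 1.8349                       # contactyes estimate

# P(rating <= j) for the average judge, temp = cold, contact = yes
cum_p <- plogis(theta - beta_contact)

# P(rating = j) = P(rating <= j) - P(rating <= j - 1)
probs <- diff(c(0, cum_p, 1))
round(probs, 3)  # close to the predict() values above: 0.031, 0.390, 0.496, ...
```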

Estimated marginal means

The emmeans package provides functionality for estimating marginal mean effects of ordinal models. The package documentation also provides an example using ordinal and the wine data.

In the “Models supported by emmeans” document, we see the following:

Object.class Package Group Arguments/notes
clm ordinal O mode = c("latent", "linear.predictor", "cum.prob", "exc.prob", "prob", "mean.class", "scale")
clmm ordinal O Like clm but no "scale" mode
emmeans(clmm_rating_contact_temp,
        specs = list(pairwise ~ temp, pairwise ~ contact), mode = "latent")
$`emmeans of temp`
 temp emmean    SE  df asymp.LCL asymp.UCL
 cold  -1.63 0.547 Inf    -2.707    -0.562
 warm   1.43 0.532 Inf     0.387     2.470

Results are averaged over the levels of: contact 
Confidence level used: 0.95 

$`pairwise differences of temp`
 1           estimate    SE  df z.ratio p.value
 cold - warm    -3.06 0.595 Inf  -5.145  <.0001

Results are averaged over the levels of: contact 

$`emmeans of contact`
 contact emmean    SE  df asymp.LCL asymp.UCL
 no      -1.020 0.522 Inf    -2.043   0.00274
 yes      0.815 0.513 Inf    -0.191   1.82072

Results are averaged over the levels of: temp 
Confidence level used: 0.95 

$`pairwise differences of contact`
 1        estimate    SE  df z.ratio p.value
 no - yes    -1.83 0.513 Inf  -3.580  0.0003

Results are averaged over the levels of: temp 

The contrast estimates are in terms of the latent (underlying, unobserved) bitterness rating. Note that the cold - warm estimate of -3.06 is just \(-\hat{\beta}_1\) from the CLMM fit.

Using mode = "cum.prob" and mode = "exc.prob", we can get cumulative probabilities and exceedance (1 - cumulative) probabilities. For example, the probability of a rating of at least 4 for different temp:

emmeans(clmm_rating_contact_temp, ~ temp,
        mode = "exc.prob", at = list(cut = "3|4"))
 temp exc.prob     SE  df asymp.LCL asymp.UCL
 cold    0.049 0.0304 Inf   -0.0107     0.109
 warm    0.450 0.1084 Inf    0.2371     0.662

Results are averaged over the levels of: contact 
Confidence level used: 0.95 

mode = "prob" gives us the probability distribution over ratings, which has a convenient plot method:

emmeans(clmm_rating_contact_temp,
        ~ rating | temp, mode = "prob") %>%
  plot() +
  add_facet_borders()

Link functions

So far, we have used the logit link (and briefly the probit link, to compare estimates with linear regression). The links available to ordinal::clmm are logit, probit, cloglog, loglog, and cauchit.
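For reference, each option's inverse link (i.e. the assumed CDF of the latent variable) can be sketched in base R; the cloglog and loglog definitions below are the usual Gumbel-type CDFs, written out by hand:

```r
inv_link <- list(
  logit   = plogis,                        # standard logistic CDF
  probit  = pnorm,                         # standard normal CDF
  cloglog = function(x) 1 - exp(-exp(x)),  # Gumbel (minimum) type
  loglog  = function(x) exp(-exp(-x)),     # Gumbel (maximum) type
  cauchit = pcauchy                        # Cauchy CDF (heavy tails)
)

# All map the real line to (0, 1); the symmetric links give 0.5 at zero,
# while the asymmetric cloglog and loglog do not
sapply(inv_link, function(f) round(f(0), 3))
#>   logit  probit cloglog  loglog cauchit
#>   0.500   0.500   0.632   0.368   0.500
```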

We can fit the CLMM using all of these links and compare log-likelihoods:

wine %>%
  nest(data = everything()) %>%
  crossing(
    link = c("logit", "probit", "cloglog", "loglog", "cauchit")
  ) %>%
  mutate(
    mod = map2(
      data, link,
      ~clmm(
        rating ~ 1 + contact + temp + (1|judge),
        data = .x, link = .y
      )
    )
  ) %>%
  mutate(mod_summary = map(mod, glance)) %>%
  unnest(mod_summary) %>%
  select(link, logLik, AIC, BIC) %>%
  arrange(logLik)
# A tibble: 5 x 4
  link    logLik      AIC   BIC
  <chr>   <logLik>  <dbl> <dbl>
1 cauchit -86.83499  188.  204.
2 cloglog -82.72936  179.  195.
3 logit   -81.56541  177.  193.
4 loglog  -81.54137  177.  193.
5 probit  -80.93061  176.  192.

The probit model appears to be the best description of the data.

We can also consider the effect of “flexible” vs “equidistant” thresholds:

wine %>%
  nest(data = everything()) %>%
  crossing(
    link = c("logit", "probit", "cloglog", "loglog", "cauchit"),
    threshold = c("flexible", "equidistant")
  ) %>%
  mutate(
    mod = pmap(
      list(data, link, threshold),
      function(a, b, c) {
        clmm(
          rating ~ 1 + contact + temp + (1|judge),
          data = a, link = b, threshold = c
        )
      }
    )
  ) %>%
  #mutate(mod_summary = map(mod, glance)) %>%
  mutate(
    mod_summary = map(
      mod,
      # glance() on a clmm object returns a <logLik> variable type which
      #  can't be bound together by unnest(), so need to convert it to numeric
      ~glance(.x) %>% mutate(logLik = as.numeric(logLik))
    )
  ) %>%
  unnest(mod_summary) %>%
  select(link, threshold, logLik, edf, AIC, BIC) %>%
  arrange(logLik)
# A tibble: 10 x 6
   link    threshold   logLik   edf   AIC   BIC
   <chr>   <chr>        <dbl> <dbl> <dbl> <dbl>
 1 cauchit equidistant  -87.8     5  186.  197.
 2 cauchit flexible     -86.8     7  188.  204.
 3 loglog  equidistant  -84.4     5  179.  190.
 4 cloglog equidistant  -83.3     5  177.  188.
 5 logit   equidistant  -83.1     5  176.  187.
 6 cloglog flexible     -82.7     7  179.  195.
 7 probit  equidistant  -82.5     5  175.  186.
 8 logit   flexible     -81.6     7  177.  193.
 9 loglog  flexible     -81.5     7  177.  193.
10 probit  flexible     -80.9     7  176.  192.

Note the change in degrees of freedom: equidistant thresholds are described by just two parameters (a first threshold and a common spacing) instead of \(J - 1 = 4\) free thresholds, which is why the equidistant probit model attains the lowest BIC. In terms of log-likelihood, however, flexible thresholds always outperform equidistant ones.

Conclusion

Thanks to detailed documentation, fitting cumulative link (mixed) models is very easy with ordinal. In this post, we first learned the theoretical basis for these models, then worked through examples using wine bitterness ratings from multiple judges.

In the next post, I’ll explore the Bayesian approach to ordinal regression with the brms package.

Reproducibility

Session info
 setting  value                       
 version  R version 4.1.2 (2021-11-01)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_Canada.1252         
 ctype    English_Canada.1252         
 tz       America/Curacao             
 date     2022-01-10                  
Git repository
Local:    main C:/Users/tdunn/Documents/tdunn
Remote:   main @ origin (https://github.com/taylordunn/tdunn)
Head:     [da88cbd] 2022-01-08: Build on cloud

Source code

References

Randall, J. H. 1989. “The Analysis of Sensory Data by Generalized Linear Model.” Biometrical Journal 31 (7): 781–93. https://doi.org/10.1002/bimj.4710310703.

Citation

For attribution, please cite this work as

Dunn (2020, March 15). tdunn: Ordinal regression in R: part 1. Retrieved from https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1/

BibTeX citation

@misc{dunn2020ordinal,
  author = {Dunn, Taylor},
  title = {tdunn: Ordinal regression in R: part 1},
  url = {https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1/},
  year = {2020}
}