A theoretical and applied walkthrough of ordinal regression. Part 1: the frequentist approach with ordinal
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(dunnr)
library(gt)
library(broom)
library(patchwork)
extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()
wine_red <- "#58181F"
update_geom_defaults("point", list(color = wine_red))
update_geom_defaults("line", list(color = wine_red))
The purpose of this post is to learn more about ordinal regression models (a.k.a. cumulative link, proportional odds, ordered logit models, etc.) and practice their implementation in R. This is part 1, where I'll be taking the frequentist approach via the ordinal package. There are other options, like MASS::polr, but two features in particular drew me to ordinal: (1) it allows for random effects, and (2) it has broom::tidy methods available.
In particular, I'll be following along with the ordinal package primer.
Import ordinal and the included data set wine:
library(ordinal)
data(wine)
glimpse(wine)
Rows: 72
Columns: 6
$ response <dbl> 36, 48, 47, 67, 77, 60, 83, 90, 17, 22, 14, 50, 30,~
$ rating <ord> 2, 3, 3, 4, 4, 4, 5, 5, 1, 2, 1, 3, 2, 3, 5, 4, 2, ~
$ temp <fct> cold, cold, cold, cold, warm, warm, warm, warm, col~
$ contact <fct> no, no, yes, yes, no, no, yes, yes, no, no, yes, ye~
$ bottle <fct> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, ~
$ judge <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, ~
wine is a data set from Randall (1989) of wine bitterness ratings from multiple judges. The variables are as follows:

- response: wine bitterness rating on a 0-100 scale
- rating: ordered factor with 5 levels (a grouped version of response), with 1 = "least bitter" and 5 = "most bitter"
- temp: temperature during wine production (cold and warm)
- contact: contact between juice and skins during wine production (no and yes)
- bottle: factor with 8 levels
- judge: factor with 9 levels

Relationship between response and rating:
wine %>%
ggplot(aes(y = rating, x = response)) +
geom_boxplot(width = 0.5) +
geom_jitter(alpha = 0.5)
Note that there is no overlap between the levels.
There are 72 total observations with the following ratings distribution by treatment and random effects:
wine %>%
transmute(temp, contact, bottle, judge, rating = as.numeric(rating)) %>%
pivot_wider(names_from = judge, values_from = rating) %>%
gt() %>%
tab_spanner(columns = `1`:`9`, label = "judge") %>%
data_color(
columns = `1`:`9`,
colors = scales::col_numeric(
palette = c("white", wine_red), domain = c(1, 5)
)
)
temp  contact  bottle  judge: 1  2  3  4  5  6  7  8  9
cold  no  1  2  1  2  3  2  3  1  2  1 
cold  no  2  3  2  3  2  3  2  1  2  2 
cold  yes  3  3  1  3  3  4  3  2  2  3 
cold  yes  4  4  3  2  2  3  2  2  3  2 
warm  no  5  4  2  5  3  3  2  2  3  3 
warm  no  6  4  3  5  2  3  4  3  3  2 
warm  yes  7  5  5  4  5  3  5  2  3  4 
warm  yes  8  5  4  4  3  3  4  3  4  4 
So each bottle had a particular temp and contact (2 bottles for each of the 4 combinations), and each judge rated the bitterness of each bottle.
Before modeling, can we see a clear effect of temp
and contact
?
wine %>%
count(contact, rating, temp) %>%
mutate(temp = fct_rev(temp)) %>%
ggplot(aes(x = temp, y = rating, color = temp)) +
geom_point(aes(group = temp, size = n)) +
facet_wrap(~contact, scales = "free_x",
labeller = labeller(contact = label_both)) +
scale_size(breaks = c(1, 2, 4, 6, 8)) +
add_facet_borders()
At a glance, it looks like temp = warm and contact = yes are associated with higher ratings.
The ordinal response \(y_i\) falls into response category \(j\) (out of \(J\) total) with probability \(\pi_{ij}\). The cumulative probabilities are defined:
\[ P(y_i \leq j) = \pi_{i1} + \dots + \pi_{ij}. \]
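As a small numeric sketch of this definition (Python here, with made-up category probabilities \(\pi_{ij}\) for a single observation), the cumulative probabilities are just partial sums:

```python
import itertools

# Hypothetical category probabilities pi_ij for one observation i, J = 5
pi = [0.10, 0.25, 0.30, 0.20, 0.15]
assert abs(sum(pi) - 1.0) < 1e-9  # probabilities over all J categories sum to 1

# P(y_i <= j) = pi_i1 + ... + pi_ij
cum_p = list(itertools.accumulate(pi))
print([round(c, 2) for c in cum_p])  # [0.1, 0.35, 0.65, 0.85, 1.0]
```

Note that the last cumulative probability is always 1, which is why the model below only needs \(J - 1\) thresholds.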
As an oversimplification, suppose that each probability \(\pi_{ij}\) is equal to the proportion of that response in the wine
data. Then the cumulative "probability" can be visualized:
wine_prop <- wine %>%
count(rating) %>%
mutate(p = n / sum(n), cumsum_p = cumsum(p))
(
ggplot(wine_prop, aes(x = rating, y = p)) +
geom_col(fill = wine_red) +
scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
labs(x = "j", y = "proportion")
) +
(
ggplot(wine_prop, aes(x = as.integer(rating), y = cumsum_p)) +
geom_point(size = 2) +
geom_line(size = 1) +
labs(x = "j", y = "cumulative proportion")
) +
(
ggplot(wine_prop,
aes(x = as.integer(rating), y = log(cumsum_p) - log(1 - cumsum_p))) +
geom_point(size = 2) +
geom_line(size = 1) +
labs(x = "j", y = "logit(cumulative proportion)")
)
We will explore other links, but first the most common, the logit link, which is depicted in the rightmost panel of the above figure:
\[ \text{logit} (P(y_i \leq j)) = \log \frac{P(y_i \leq j)}{1 - P(y_i \leq j)} \]
Note that the above function is defined for all but the last category \(j = J\), because \(1 - P(y_i \leq J) = 1 - 1 = 0\) would put a zero in the denominator.
For the wine
data, where we have \(J\) = 5 rating categories, we will build up to the following mixed effects model:
\[ \begin{align} \text{logit}(p(y_i \leq j)) &= \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - u(\text{judge}_i) \\ i &= 1, \dots, n \; \; \; \; \; \; j = 1, \dots, J - 1 \end{align} \]
where \(\theta_j\) is called the threshold parameter, or cutpoint, of category \(j\). These thresholds can also be thought of as \(J - 1\) = 4 intercepts. Note that the fixed effect parameters \(\beta_1\) and \(\beta_2\) are independent of \(j\), so each \(\beta\) has the same effect for each of the \(J - 1\) cumulative logits. The judge effects, which are also independent of \(j\), are assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\). We are using the logit link because it is the most popular for this kind of model (and the one I am familiar with), but there are other options we will briefly explore later.
The subtraction of terms in the above model is new to me. The main reason seems to be familiar interpretation: the larger the value of any independent term \(\beta x\), the smaller the cumulative logit \(\theta_j - \beta x\), and therefore the larger the probability of a response falling into a category at the upper end of the scale. This way, \(\beta\) has the same direction of effect as in ordinary linear regression.
We are essentially modeling a "chain" of logistic regressions where the binary response is "less than or equal to a certain level" vs. "greater than that level." In this case, with \(J\) = 5, the thresholds \(\theta_j\) capture the adjusted log-odds of observing:

- rating = 1 vs. 2-5
- rating = 1-2 vs. 3-5
- rating = 1-3 vs. 4-5
- rating = 1-4 vs. 5

Now, with a surface-level understanding of what is being modeled, we will fit the data using ordinal::clm (cumulative link models) and ordinal::clmm (cumulative link mixed models), with logit links.
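To make the dichotomization concrete, here is a small illustration (in Python, purely for the bookkeeping) using the first ten ratings shown in the data glimpse above:

```python
# First ten ratings from the wine data glimpse above (1-5 ordinal scale)
ratings = [2, 3, 3, 4, 4, 4, 5, 5, 1, 2]
J = 5

# For each threshold j = 1, ..., J-1, dichotomize the ordinal response:
# 1 if rating <= j ("at or below the threshold"), else 0
for j in range(1, J):
    binary = [int(r <= j) for r in ratings]
    print(j, binary)
```

Each of the four binary vectors could be fed to an ordinary logistic regression; the CLM fits all four "splits" jointly with a shared slope.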
First, fit a simple model, by maximum likelihood, with contact
as the sole predictor:
\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_2 \text{contact}_i \]
clm_rating_contact <-
clm(
rating ~ contact,
data = wine, link = "logit"
)
summary(clm_rating_contact)
formula: rating ~ contact
data: wine
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -99.96 209.91 5(0) 1.67e-07 1.7e+01
Coefficients:
Estimate Std. Error z value Pr(>|z|)
contactyes 1.2070 0.4499 2.683 0.0073 **

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -2.13933 0.48981 -4.368
2|3  0.04257 0.32063  0.133
3|4  1.71449 0.38637  4.437
4|5  2.97875 0.50207  5.933
The model gives us \(J - 1 = 4\) threshold coefficients, as expected. The \(\beta_2\) coefficient estimate is statistically significant (by a Wald test), and tells us that contact = yes decreases the thresholds \(\theta_j\) by \(\beta_2\) = 1.21 (because of the subtraction of model terms), and is therefore associated with higher ratings.
The condition number of the Hessian for this model is 16.98. The ordinal primer says that larger values (like > 1e4) might indicate that the model is ill-defined.
It is nicely illustrative to compare this model to 4 separate logistic regressions with a dichotomized response:
\[ \begin{align} \text{logit} (p(y_i \leq 1)) = \theta_1 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 2)) = \theta_2 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 3)) = \theta_3 + \beta_2 \text{contact}_i \\ \text{logit} (p(y_i \leq 4)) = \theta_4 + \beta_2 \text{contact}_i \\ \end{align} \]
wine %>%
crossing(j = 1:4) %>%
# Create a binary (0 or 1) to indicate where rating <= j
mutate(rating_leq_j = as.numeric(rating) <= j) %>%
group_by(j) %>%
nest() %>%
ungroup() %>%
mutate(
mod = map(
data,
~glm(rating_leq_j ~ 1 + contact,
data = ., family = binomial(link = "logit")) %>% broom::tidy()
)
) %>%
unnest(mod) %>%
transmute(
j, term,
estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
) %>%
pivot_wider(names_from = term, values_from = estimate_se) %>%
left_join(
tidy(clm_rating_contact) %>%
transmute(
j = as.integer(substr(term, 1, 1)),
term = if_else(!is.na(j), "theta_j", term),
estimate_se = str_c(round(estimate, 2), " (", round(std.error, 2), ")")
) %>%
mutate(j = replace_na(j, 1)) %>%
spread(term, estimate_se),
by = "j"
) %>%
ungroup() %>%
gt() %>%
tab_spanner(label = "Logistic regression",
columns = vars(`(Intercept)`, contactyes.x)) %>%
tab_spanner(label = "CLM",
columns = vars(theta_j, contactyes.y)) %>%
fmt_missing(columns = everything(), missing_text = "")
j  Logistic regression  CLM  

(Intercept)  contactyes.x  theta_j  contactyes.y  
1  -2.08 (0.53)  -1.48 (1.14)  -2.14 (0.49)  1.21 (0.45) 
2  0 (0.33)  -1.1 (0.51)  0.04 (0.32)  
3  1.82 (0.48)  -1.37 (0.59)  1.71 (0.39)  
4  2.83 (0.73)  -1.01 (0.87)  2.98 (0.5) 
The intercepts from the ordinary logistic regressions correspond closely to the threshold parameters \(\theta_j\) from the cumulative link model. In the fixed effect of contact (\(\beta_2\)), first note the sign difference, and second notice that the CLM estimate is about the average of the 4 estimates from logistic regression. The advantage of the CLM is seen in the smaller standard error of the \(\beta_2\) estimate.
To quote the primer:
The cumulative logit model can be seen as the model that combines these four ordinary logistic regression models into a single model and therefore makes better use of the information in the data.
For the second model, we add the \(\beta_1 \text{temp}_i\) term:
\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i \]
clm_rating_contact_temp <-
clm(
rating ~ contact + temp,
data = wine, link = "logit"
)
summary(clm_rating_contact_temp)
formula: rating ~ contact + temp
data: wine
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -86.49 184.98 6(0) 4.01e-12 2.7e+01
Coefficients:
Estimate Std. Error z value Pr(>|z|)
contactyes 1.5278 0.4766 3.205 0.00135 **
tempwarm 2.5031 0.5287 4.735 2.19e06 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.3444 0.5171 -2.600
2|3  1.2508 0.4379  2.857
3|4  3.4669 0.5978  5.800
4|5  5.0064 0.7309  6.850
Both fixed effects (contact
= yes and temp
= warm) are strongly associated with higher probability of higher ratings. The summary function provides \(p\)-values from Wald tests, but more accurate likelihood ratio tests can be done via the drop1 function, which evaluates each fixed effect while controlling for the other:
drop1(clm_rating_contact_temp, test = "Chisq")
Single term deletions
Model:
rating ~ contact + temp
Df AIC LRT Pr(>Chi)
<none> 184.98
contact 1 194.03 11.043 0.0008902 ***
temp 1 209.91 26.928 2.112e-07 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or the reverse via the add1()
function, which evaluates each fixed effect while ignoring the other:
# Fit the null model first
clm_rating_null <- clm(rating ~ 1, data = wine, link = "logit")
add1(clm_rating_null, scope = ~ contact + temp, test = "Chisq")
Single term additions
Model:
rating ~ 1
Df AIC LRT Pr(>Chi)
<none> 215.44
contact 1 209.91 7.5263 0.00608 **
temp 1 194.03 23.4113 1.308e-06 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Symmetric Wald confidence intervals can be extracted with confint
or with broom::tidy
:
tidy(clm_rating_contact_temp, conf.int = TRUE, conf.type = "Wald") %>%
ggplot(aes(y = term, x = estimate)) +
geom_point(size = 2) +
geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high))
In these types of analyses, we are often interested in the odds ratios. For the two categorical fixed effects, which have two levels each, the odds ratios of \(y \leq j\) comparing the two levels are:
\[ \begin{align} \text{OR} &= \frac{\exp(\gamma_j (\text{temp} = \text{cold}))}{\exp(\gamma_j (\text{temp} = \text{warm}))} = \frac{\exp(\theta_j - \beta_2 \text{contact})}{\exp(\theta_j - \beta_1 - \beta_2 \text{contact})} = \exp(\beta_1) \\ \text{OR} &= \frac{\exp(\gamma_j (\text{contact} = \text{no}))}{\exp(\gamma_j (\text{contact} = \text{yes}))} = \frac{\exp(\theta_j - \beta_1 \text{temp})}{\exp(\theta_j - \beta_1 \text{temp} - \beta_2)} = \exp(\beta_2) \end{align} \]
where we have introduced the shorthand \(\gamma_j = \text{logit} (p(y \leq j))\), so that \(\exp(\gamma_j)\) is the cumulative odds of \(y \leq j\). (Equivalently, \(\exp(\beta_1)\) is the odds ratio of \(y > j\) for warm vs. cold.) Compute those odds ratios, and their corresponding Wald 95% CIs:
tidy(clm_rating_contact_temp, conf.int = T, conf.type = "Wald") %>%
transmute(
term, across(c(estimate, conf.low, conf.high), exp)
) %>%
gt() %>%
fmt_number(c(estimate, conf.low, conf.high), decimals = 2)
term  estimate  conf.low  conf.high 

1|2  0.26  0.09  0.72 
2|3  3.49  1.48  8.24 
3|4  32.04  9.93  103.39 
4|5  149.37  35.65  625.75 
contactyes  4.61  1.81  11.73 
tempwarm  12.22  4.34  34.44 
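The tabulated odds ratios are just the exponentiated coefficients; a quick arithmetic check (Python, using the estimates reported in the model summary above):

```python
import math

# Fixed effect estimates from the fitted CLM (logit link)
beta_contact = 1.5278  # contactyes
beta_temp = 2.5031     # tempwarm

# Odds ratios are exp(beta), matching the table above
print(round(math.exp(beta_contact), 2))  # 4.61
print(round(math.exp(beta_temp), 2))     # 12.22
```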
One last thing to check: does the data support an interaction between \(\text{temp}_i\) and \(\text{contact}_i\)?
\[ \text{logit}(p(y_i \leq j)) = \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - \beta_3 \text{temp}_i \text{contact}_i \]
clm_rating_contact_temp_inter <-
clm(
rating ~ contact * temp, data = wine, link = "logit"
)
#drop1(clm_rating_contact_temp_inter, test = "Chisq") # this accomplishes the same thing as anova()
anova(clm_rating_contact_temp, clm_rating_contact_temp_inter)
Likelihood ratio tests of cumulative link models:
formula: link:
clm_rating_contact_temp rating ~ contact + temp logit
clm_rating_contact_temp_inter rating ~ contact * temp logit
threshold:
clm_rating_contact_temp flexible
clm_rating_contact_temp_inter flexible
no.par AIC logLik LR.stat df
clm_rating_contact_temp 6 184.98 -86.492
clm_rating_contact_temp_inter 7 186.83 -86.416 0.1514 1
Pr(>Chisq)
clm_rating_contact_temp
clm_rating_contact_temp_inter 0.6972
No: the interaction term contact:temp is not supported by the data.
Consider the following linear model which treats rating
as continuous:
\[ y_i = \alpha + \beta_1 \text{temp}_i + \beta_2 \text{contact}_i + \epsilon_i \]
where \(\epsilon_i \sim N(0, \sigma_{\epsilon}^2)\).
lm_rating_contact_temp <- lm(as.numeric(rating) ~ contact + temp, data = wine)
To compare this to a CLM, we must use the probit link:
clm_rating_contact_temp_probit <-
clm(
rating ~ contact + temp, data = wine, link = "probit"
)
tidy(clm_rating_contact_temp_probit) %>%
filter(coef.type == "location") %>%
mutate(model = "CLM") %>%
select(-coef.type) %>%
bind_rows(
tidy(lm_rating_contact_temp) %>%
filter(term != "(Intercept)") %>%
# Need to divide by the residual SE here to get the right scale
mutate(estimate = estimate / summary(lm_rating_contact_temp)$sigma,
model = "LM")
) %>%
group_by(model) %>%
gt() %>%
fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
fmt(p.value, fns = scales::pvalue)
term  estimate  std.error  statistic  p.value 

CLM  
contactyes  0.87  0.27  3.25  0.001 
tempwarm  1.50  0.29  5.14  <0.001 
LM  
contactyes  0.79  0.20  3.36  0.001 
tempwarm  1.38  0.20  5.87  <0.001 
The relative estimates from the linear model are lower than those from the CLM (probit link), suggesting that the assumptions of the linear model are not met. In particular, the thresholds are not equidistant, as we can see from the differences between successive CLM threshold estimates.
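One way to see the unequal spacing is to difference the successive threshold estimates from the logit-link CLM fitted earlier (a quick Python illustration; the \(\theta_j\) values are the reported estimates):

```python
# Threshold estimates theta_j from the logit CLM (rating ~ contact + temp)
theta = [-1.3444, 1.2508, 3.4669, 5.0064]

# Successive differences: truly equidistant thresholds would make these equal
diffs = [round(b - a, 2) for a, b in zip(theta, theta[1:])]
print(diffs)  # [2.6, 2.22, 1.54]
```

The gaps shrink toward the bitter end of the scale, which a linear model on the raw 1-5 scores cannot accommodate.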
Now that we have explored ordinal regression with just fixed effects, we will fit the following random effects model:
\[ \begin{align} \text{logit}(p(y_i \leq j)) &= \theta_j - \beta_1 \text{temp}_i - \beta_2 \text{contact}_i - u(\text{judge}_i) \\ i &= 1, \dots, n \; \; \; \; \; \; j = 1, \dots, J - 1 \end{align} \]
where the judge effects are independent of \(j\), and assumed normal: \(u(\text{judge}_i) \sim N(0, \sigma_u^2)\).
Each judge has 8 ratings (two per combination of temp and contact). See if we can spot the judge variance in a plot of ratings:
There is definitely some judge-specific variability in the perception of the bitterness of wine. judge 5, for instance, doesn't stray far from rating = 3, while judge 7 didn't consider any of the wines particularly bitter.
Fit the full model with ordinal::clmm
and logit link:
clmm_rating_contact_temp <-
clmm(
rating ~ temp + contact + (1 | judge),
data = wine, link = "logit"
)
# This is an older function, which we need to run stats::profile later
clmm2_rating_contact_temp <-
clmm2(
rating ~ temp + contact, random = judge,
data = wine, link = "logistic"
)
summary(clmm_rating_contact_temp)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: wine
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 332(999) 1.03e-05 2.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.279 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 3.0630 0.5954 5.145 2.68e07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.6237 0.6824 -2.379
2|3  1.5134 0.6038  2.507
3|4  4.2285 0.8090  5.227
4|5  6.0888 0.9725  6.261
Compare model coefficients:
bind_rows(
CLM = tidy(clm_rating_contact_temp),
CLMM = tidy(clmm_rating_contact_temp),
.id = "model"
) %>%
select(-coef.type) %>%
group_by(model) %>%
gt() %>%
fmt_number(c(estimate, std.error, statistic), decimals = 2) %>%
fmt(p.value, fns = scales::pvalue)
term  estimate  std.error  statistic  p.value 

CLM  
1|2  −1.34  0.52  −2.60  0.009 
2|3  1.25  0.44  2.86  0.004 
3|4  3.47  0.60  5.80  <0.001 
4|5  5.01  0.73  6.85  <0.001 
contactyes  1.53  0.48  3.21  0.001 
tempwarm  2.50  0.53  4.73  <0.001 
CLMM  
1|2  −1.62  0.68  −2.38  0.017 
2|3  1.51  0.60  2.51  0.012 
3|4  4.23  0.81  5.23  <0.001 
4|5  6.09  0.97  6.26  <0.001 
tempwarm  3.06  0.60  5.14  <0.001 
contactyes  1.83  0.51  3.58  <0.001 
Both fixed effect estimates \(\beta_1\) and \(\beta_2\) are higher in the CLMM. Use anova
to compare the CLMM to the CLM:
anova(clm_rating_contact_temp, clmm_rating_contact_temp)
Likelihood ratio tests of cumulative link models:
formula: link:
clm_rating_contact_temp rating ~ contact + temp logit
clmm_rating_contact_temp rating ~ temp + contact + (1 | judge) logit
threshold:
clm_rating_contact_temp flexible
clmm_rating_contact_temp flexible
no.par AIC logLik LR.stat df Pr(>Chisq)
clm_rating_contact_temp 6 184.98 -86.492
clmm_rating_contact_temp 7 177.13 -81.565 9.853 1 0.001696
clm_rating_contact_temp
clmm_rating_contact_temp **

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Unsurprisingly, the judge term makes a significant improvement to the fit. We can extract profile likelihood confidence intervals for the random-effect standard deviation \(\sigma_u\) using stats::profile:
2.5 % 97.5 %
stDev 0.5014584 2.26678
Note that these intervals are asymmetric around the estimate (\(\hat{\sigma}_u\) = 1.13), unlike the less accurate symmetric Wald intervals. We can produce "best guess" estimates of the judge effects using the conditional modes:
tibble(
judge_effect = clmm_rating_contact_temp$ranef,
cond_var = clmm_rating_contact_temp$condVar
) %>%
mutate(
judge = fct_reorder(factor(1:n()), judge_effect),
conf.low = judge_effect - qnorm(0.975) * sqrt(cond_var),
conf.high = judge_effect + qnorm(0.975) * sqrt(cond_var)
) %>%
ggplot(aes(y = judge, x = judge_effect)) +
geom_point(size = 2) +
geom_linerange(size = 1, aes(xmin = conf.low, xmax = conf.high)) +
theme(panel.grid.major.x = element_line(color = "grey"))
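To get a feel for what this between-judge spread means on the probability scale, here is a rough sketch (Python, purely illustrative) comparing judges one standard deviation apart, using the CLMM estimates reported above:

```python
import math

def inv_logit(x):
    # logit^-1(x) = 1 / (1 + exp(-x)); R's plogis()
    return 1.0 / (1.0 + math.exp(-x))

# CLMM estimates (logit link): threshold 3|4 and fixed effects
theta_3 = 4.2285
beta_temp, beta_contact = 3.0630, 1.8349  # tempwarm, contactyes
sigma_u = 1.131                           # judge standard deviation

# P(rating >= 4 | warm, contact = yes) for judges at -1 SD, 0, and +1 SD
probs = []
for u in (-sigma_u, 0.0, sigma_u):
    p = 1 - inv_logit(theta_3 - beta_temp - beta_contact - u)
    probs.append(round(p, 2))
print(probs)  # [0.39, 0.66, 0.86]
```

For the same wine, a lenient and a harsh judge (each 1 SD from the average) differ by nearly 50 percentage points in the probability of a bitter (4-5) rating.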
There are different ways to extract predicted probabilities. First, and most obviously, with the predict
function:
wine %>%
bind_cols(
pred = predict(
# Have to use clmm2 for predict
clmm2_rating_contact_temp, newdata = wine
)
) %>%
# These are predicted probabilities for the average judge, so we can
# exclude the judge variable
distinct(rating, temp, contact, pred) %>%
arrange(temp, contact, rating)
# A tibble: 15 x 4
rating temp contact pred
<ord> <fct> <fct> <dbl>
1 1 cold no 0.165
2 2 cold no 0.655
3 3 cold no 0.166
4 1 cold yes 0.0305
5 2 cold yes 0.390
6 3 cold yes 0.496
7 4 cold yes 0.0696
8 2 warm no 0.166
9 3 warm no 0.587
10 4 warm no 0.191
11 5 warm no 0.0463
12 2 warm yes 0.0313
13 3 warm yes 0.306
14 4 warm yes 0.428
15 5 warm yes 0.233
This only gives us predictions for rating, temp, and contact combinations which exist in the data. There is no predicted probability for rating > 3 with temp = cold and contact = no, for example.
Another way is to prespecify which values we want to predict:
nd <-
crossing(
temp = factor(c("cold", "warm")),
contact = factor(c("no", "yes")),
rating = factor(1:5, ordered = T)
)
nd %>%
bind_cols(pred = predict(clmm2_rating_contact_temp, nd)) %>%
ggplot(aes(x = glue::glue("{temp}-{contact}"), y = pred, fill = rating)) +
geom_col() +
scale_fill_td(palette = "div5") +
scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
labs(x = "temp-contact", y = "predicted probability")
We can also get model-estimated cumulative probabilities directly from the model coefficients. For example, for temp = cold and contact = yes, the cumulative probability of a bitterness rating of \(j\) or less is:
\[ P(y_i \leq j) = \text{logit}^{-1} [\theta_j - \beta_2 \text{contact}_i] \]
where we are considering the average judge (\(u(\text{judge}_i) = 0\)). The inverse logit is \(\text{logit}^{-1}(x) = 1 / (1 + \exp(-x))\), and can be calculated with plogis as a shorthand (brms::inv_logit_scaled is another). We can subtract cumulative probabilities to get non-cumulative probabilities of a rating \(j\). For example, \(j\) = 3:
plogis(clmm_rating_contact_temp$Theta[3] - clmm_rating_contact_temp$beta[2]) -
plogis(clmm_rating_contact_temp$Theta[2] - clmm_rating_contact_temp$beta[2])
contactyes
0.4960357
which matches the value calculated previously using predict
.
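As a sanity check on the arithmetic, the same subtraction of inverse logits can be reproduced by hand (Python, using the rounded CLMM estimates from the summary above):

```python
import math

def inv_logit(x):
    # logit^-1(x) = 1 / (1 + exp(-x)); R's plogis()
    return 1.0 / (1.0 + math.exp(-x))

# CLMM estimates: thresholds 2|3 and 3|4, and the contactyes effect
theta_2, theta_3 = 1.5134, 4.2285
beta_contact = 1.8349

# P(rating = 3 | cold, contact = yes) = P(y <= 3) - P(y <= 2)
p3 = inv_logit(theta_3 - beta_contact) - inv_logit(theta_2 - beta_contact)
print(round(p3, 4))  # 0.496
```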
The emmeans
package provides functionality for estimating marginal mean effects of ordinal models. The package documentation also provides an example using ordinal
and wine
data here.
In the "Models supported by emmeans" document, we see the following:
Object.class  Package  Group  Arguments/notes  

clm  ordinal  O  mode = c("latent", "linear.predictor", "cum.prob", "exc.prob", "prob", "mean.class", "scale") 

clmm  ordinal  O  Like clm but no "scale" mode 

emmeans(clmm_rating_contact_temp,
specs = list(pairwise ~ temp, pairwise ~ contact), mode = "latent")
$`emmeans of temp`
temp emmean SE df asymp.LCL asymp.UCL
cold -1.63 0.547 Inf -2.707 -0.562
warm 1.43 0.532 Inf 0.387 2.470
Results are averaged over the levels of: contact
Confidence level used: 0.95
$`pairwise differences of temp`
1 estimate SE df z.ratio p.value
cold - warm -3.06 0.595 Inf -5.145 <.0001
Results are averaged over the levels of: contact
$`emmeans of contact`
contact emmean SE df asymp.LCL asymp.UCL
no -1.020 0.522 Inf -2.043 0.00274
yes 0.815 0.513 Inf -0.191 1.82072
Results are averaged over the levels of: temp
Confidence level used: 0.95
$`pairwise differences of contact`
1 estimate SE df z.ratio p.value
no - yes -1.83 0.513 Inf -3.580 0.0003
Results are averaged over the levels of: temp
The contrast estimates are in terms of the latent (underlying unobserved) bitterness rating
.
Using mode = "cum.prob" and mode = "exc.prob", we can get cumulative probabilities and exceedance (1 - cumulative) probabilities. For example, the probability of a rating of at least 4 at different temp values:
temp exc.prob SE df asymp.LCL asymp.UCL
cold 0.049 0.0304 Inf 0.0107 0.109
warm 0.450 0.1084 Inf 0.2371 0.662
Results are averaged over the levels of: contact
Confidence level used: 0.95
mode = "prob" gives us probability distributions over the rating categories, which come with a nice plot method:
emmeans(clmm_rating_contact_temp,
~ rating | temp, mode = "prob") %>%
plot() +
add_facet_borders()
So far, we have used the logit link (and briefly the probit link, to compare estimates with linear regression). The links available to ordinal::clmm are logit, probit, cloglog, loglog, and cauchit.
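The links differ in the latent distribution they assume; a small sketch (Python, standard library only) of the corresponding inverse link functions, each mapping a cumulative linear predictor to a probability:

```python
import math
from statistics import NormalDist

# Inverse link functions: each maps eta_ij = theta_j - x_i'beta to P(y_i <= j)
def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def inv_probit(x):
    return NormalDist().cdf(x)  # standard normal CDF

def inv_cloglog(x):
    return 1.0 - math.exp(-math.exp(x))

def inv_loglog(x):
    return math.exp(-math.exp(-x))

def inv_cauchit(x):
    return 0.5 + math.atan(x) / math.pi

# logit, probit, and cauchit are symmetric (probability 0.5 at x = 0);
# cloglog and loglog are asymmetric around 0
for f in (inv_logit, inv_probit, inv_cloglog, inv_loglog, inv_cauchit):
    print(f.__name__, round(f(0.0), 3))
```

The cauchit link has the heaviest tails and the probit the lightest, which is one intuition for why they can fit the same data differently.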
We can fit the CLMM with each of these links and compare log-likelihoods:
wine %>%
nest(data = everything()) %>%
crossing(
link = c("logit", "probit", "cloglog", "loglog", "cauchit")
) %>%
mutate(
mod = map2(
data, link,
~clmm(
rating ~ 1 + contact + temp + (1 | judge),
data = .x, link = .y
)
)
) %>%
mutate(mod_summary = map(mod, glance)) %>%
unnest(mod_summary) %>%
select(link, logLik, AIC, BIC) %>%
arrange(logLik)
# A tibble: 5 x 4
link logLik AIC BIC
<chr> <logLik> <dbl> <dbl>
1 cauchit -86.83499 188. 204.
2 cloglog -82.72936 179. 195.
3 logit -81.56541 177. 193.
4 loglog -81.54137 177. 193.
5 probit -80.93061 176. 192.
The probit model appears to be the best description of the data.
We can also consider the effect of â€śflexibleâ€ť vs â€śequidistantâ€ť thresholds:
wine %>%
nest(data = everything()) %>%
crossing(
link = c("logit", "probit", "cloglog", "loglog", "cauchit"),
threshold = c("flexible", "equidistant")
) %>%
mutate(
mod = pmap(
list(data, link, threshold),
function(a, b, c) {
clmm(
rating ~ 1 + contact + temp + (1 | judge),
data = a, link = b, threshold = c
)
}
)
) %>%
#mutate(mod_summary = map(mod, glance)) %>%
mutate(
mod_summary = map(
mod,
# glance() on a clmm object returns a <logLik> variable type which
# can't be bound together by unnest(), so need to convert it to numeric
~glance(.x) %>% mutate(logLik = as.numeric(logLik))
)
) %>%
unnest(mod_summary) %>%
select(link, threshold, logLik, edf, AIC, BIC) %>%
arrange(logLik)
# A tibble: 10 x 6
link threshold logLik edf AIC BIC
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 cauchit equidistant -87.8 5 186. 197.
2 cauchit flexible -86.8 7 188. 204.
3 loglog equidistant -84.4 5 179. 190.
4 cloglog equidistant -83.3 5 177. 188.
5 logit equidistant -83.1 5 176. 187.
6 cloglog flexible -82.7 7 179. 195.
7 probit equidistant -82.5 5 175. 186.
8 logit flexible -81.6 7 177. 193.
9 loglog flexible -81.5 7 177. 193.
10 probit flexible -80.9 7 176. 192.
Note the change in degrees of freedom, which results in the equidistant probit model having the lowest BIC. In terms of log-likelihood, however, flexible thresholds always outperform equidistant ones.
Thanks to detailed documentation, fitting cumulative link (mixed) models is very easy with ordinal
. In this post, we first learned the theoretical basis for these models, then worked through examples using wine
bitterness ratings from multiple judges.
In the next post, Iâ€™ll explore the Bayesian approach to ordinal regression with the brms
package.
setting value
version R version 4.1.2 (2021-11-01)
os Windows 10 x64
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Canada.1252
ctype English_Canada.1252
tz America/Curacao
date 2022-01-10
Local: main C:/Users/tdunn/Documents/tdunn
Remote: main @ origin (https://github.com/taylordunn/tdunn)
Head: [da88cbd] 2022-01-08: Build on cloud
For attribution, please cite this work as
Dunn (2020, March 15). tdunn: Ordinal regression in R: part 1. Retrieved from https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1/
BibTeX citation
@misc{dunn2020ordinal, author = {Dunn, Taylor}, title = {tdunn: Ordinal regression in R: part 1}, url = {https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1/}, year = {2020} }