tdunn - Canada COVID-19 data in R: exploring the API

R setup

library(tidyverse)
library(patchwork)
library(httr)
library(gt)

library(dunnr)
extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()

Introduction

With this post, I will explore the Canadian COVID-19 tracker API and, depending on how it goes, turn some of the code into an R package. For an introduction to working with APIs, see this vignette from the httr package.

Note

In 2022, I ported my website from Distill to Quarto. This required me to re-run all the code in this post, so the data will go up to 2022-08-21, not 2021-12-28 (the original date of the post).

Summary

The first data I will retrieve is the data summaries overall, by province, and by health region. To save typing it every time, the following base_url is required for all GET requests:

base_url <- "https://api.covid19tracker.ca/"

Overall

Modify the URL with summary to get the latest data across all provinces:

api_url <- paste0(base_url, "summary")

Send the GET request with httr:

resp <- httr::GET(api_url)
resp

Response [https://api.covid19tracker.ca/summary]
  Date: 2022-10-27 16:55
  Status: 200
  Content-Type: application/json
  Size: 690 B

This returned a response object with the following structure:

str(resp, max.level = 1)

List of 10
 $ url        : chr "https://api.covid19tracker.ca/summary"
 $ status_code: int 200
 $ headers    :List of 12
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 1
 $ cookies    :'data.frame':    0 obs. of  7 variables:
 $ content    : raw [1:690] 7b 22 64 61 ...
 $ date       : POSIXct[1:1], format: "2022-10-27 16:55:54"
 $ times      : Named num [1:6] 0 0.0453 0.0862 0.179 0.2352 ...
  ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
 $ request    :List of 7
  ..- attr(*, "class")= chr "request"
 $ handle     :Class 'curl_handle' <externalptr> 
 - attr(*, "class")= chr "response"

The status_code is the first thing to check:

resp$status_code

[1] 200

An HTTP status code of 200 is the standard indicator of a successful request.

Once confirmed successful, the content returned from the request is:

head(resp$content, 25)

 [1] 7b 22 64 61 74 61 22 3a 5b 7b 22 6c 61 74 65 73 74 5f 64 61 74 65 22 3a 22

Looks like the raw data is represented in hexadecimal. The httr::content() function can parse this data:

content_parsed <- httr::content(resp, as = "parsed")
str(content_parsed)

List of 2
 $ data        :List of 1
  ..$ :List of 23
  .. ..$ latest_date                : chr "2022-10-26"
  .. ..$ change_cases               : chr "2317"
  .. ..$ change_fatalities          : chr "52"
  .. ..$ change_tests               : chr "0"
  .. ..$ change_hospitalizations    : chr "24"
  .. ..$ change_criticals           : chr "-3"
  .. ..$ change_recoveries          : chr "0"
  .. ..$ change_vaccinations        : chr "84199"
  .. ..$ change_vaccinated          : chr "2099"
  .. ..$ change_boosters_1          : chr "6693"
  .. ..$ change_boosters_2          : chr "59646"
  .. ..$ change_vaccines_distributed: chr "0"
  .. ..$ total_cases                : chr "4278841"
  .. ..$ total_fatalities           : chr "45808"
  .. ..$ total_tests                : chr "61472226"
  .. ..$ total_hospitalizations     : chr "5629"
  .. ..$ total_criticals            : chr "279"
  .. ..$ total_recoveries           : chr "3815086"
  .. ..$ total_vaccinations         : chr "91945442"
  .. ..$ total_vaccinated           : chr "31624598"
  .. ..$ total_boosters_1           : chr "19362861"
  .. ..$ total_boosters_2           : chr "7266271"
  .. ..$ total_vaccines_distributed : chr "101586018"
 $ last_updated: chr "2022-10-26 23:19:02"

Per the documentation of httr, it is good practice to check the mime type of the response before parsing it.

httr::http_type(resp)

[1] "application/json"

Here, the data is JSON, which is a type that can be parsed with httr::content (via the jsonlite package). We can also parse this data manually as follows (not run):

jsonlite::fromJSON(
  content(resp, "text"),
  simplifyVector = FALSE
)

The returned data structure is a list of lists. data is a list with all of the summary statistics, while last_updated gives a timestamp of when the data was last updated. Put the data into a data frame:

summary_overall <- content_parsed$data %>% data.frame()
glimpse(summary_overall)

Rows: 1
Columns: 23
$ latest_date                 <chr> "2022-10-26"
$ change_cases                <chr> "2317"
$ change_fatalities           <chr> "52"
$ change_tests                <chr> "0"
$ change_hospitalizations     <chr> "24"
$ change_criticals            <chr> "-3"
$ change_recoveries           <chr> "0"
$ change_vaccinations         <chr> "84199"
$ change_vaccinated           <chr> "2099"
$ change_boosters_1           <chr> "6693"
$ change_boosters_2           <chr> "59646"
$ change_vaccines_distributed <chr> "0"
$ total_cases                 <chr> "4278841"
$ total_fatalities            <chr> "45808"
$ total_tests                 <chr> "61472226"
$ total_hospitalizations      <chr> "5629"
$ total_criticals             <chr> "279"
$ total_recoveries            <chr> "3815086"
$ total_vaccinations          <chr> "91945442"
$ total_vaccinated            <chr> "31624598"
$ total_boosters_1            <chr> "19362861"
$ total_boosters_2            <chr> "7266271"
$ total_vaccines_distributed  <chr> "101586018"

All of these variables are character type, and should be converted into integer and Date types:

summary_overall <- summary_overall %>%
  mutate(
    across(matches("^change|^total"), as.integer),
    across(matches("date"), as.Date)
  )
glimpse(summary_overall)

Rows: 1
Columns: 23
$ latest_date                 <date> 2022-10-26
$ change_cases                <int> 2317
$ change_fatalities           <int> 52
$ change_tests                <int> 0
$ change_hospitalizations     <int> 24
$ change_criticals            <int> -3
$ change_recoveries           <int> 0
$ change_vaccinations         <int> 84199
$ change_vaccinated           <int> 2099
$ change_boosters_1           <int> 6693
$ change_boosters_2           <int> 59646
$ change_vaccines_distributed <int> 0
$ total_cases                 <int> 4278841
$ total_fatalities            <int> 45808
$ total_tests                 <int> 61472226
$ total_hospitalizations      <int> 5629
$ total_criticals             <int> 279
$ total_recoveries            <int> 3815086
$ total_vaccinations          <int> 91945442
$ total_vaccinated            <int> 31624598
$ total_boosters_1            <int> 19362861
$ total_boosters_2            <int> 7266271
$ total_vaccines_distributed  <int> 101586018

Province

Instead of aggregating over all provinces, I can use /summary/split to get province-level summaries:

resp <- httr::GET(paste0(base_url, "summary/split"))

content_parsed <- content(resp, as = "parsed")
str(content_parsed, max.level = 2)

List of 2
 $ data        :List of 13
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
 $ last_updated: chr "2022-10-26 23:19:02"

The data list now has 13 lists corresponding to the 13 provinces and territories. Look at the structure of one of them:

str(content_parsed$data[[1]])

List of 24
 $ province                   : chr "ON"
 $ date                       : chr "2022-10-21"
 $ change_cases               : int 0
 $ change_fatalities          : int 0
 $ change_tests               : int 0
 $ change_hospitalizations    : int 0
 $ change_criticals           : int 0
 $ change_recoveries          : int 0
 $ change_vaccinations        : int 0
 $ change_vaccinated          : int 0
 $ change_boosters_1          : int 0
 $ change_boosters_2          : int 0
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 1477669
 $ total_fatalities           : int 14603
 $ total_tests                : int 25776503
 $ total_hospitalizations     : int 1663
 $ total_criticals            : int 158
 $ total_recoveries           : int 1448143
 $ total_vaccinations         : int 35825406
 $ total_vaccinated           : int 12319953
 $ total_boosters_1           : int 7624409
 $ total_boosters_2           : int 3076429
 $ total_vaccines_distributed : int 38554461

This is the same structure as the overall summary, but with the extra variable province indicating that these numbers are specific to Ontario.

A shortcut to compiling all of these lists into a single data frame with a row per province/territory is to use dplyr::bind_rows():

summary_province <- bind_rows(content_parsed$data)
glimpse(summary_province)

Rows: 13
Columns: 24
$ province                    <chr> "ON", "QC", "NS", "NB", "MB", "BC", "PE", …
$ date                        <chr> "2022-10-21", "2022-10-26", "2022-10-21", …
$ change_cases                <int> 0, 1013, 0, 0, 0, 0, 0, 0, 1304, 0, 0, 0, 0
$ change_fatalities           <int> 0, 24, 0, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0
$ change_tests                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ change_hospitalizations     <int> 0, -24, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0, 0
$ change_criticals            <int> 0, 2, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0
$ change_recoveries           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ change_vaccinations         <int> 0, 26264, 0, 0, 0, 0, 0, 0, 57935, 0, 0, 0…
$ change_vaccinated           <int> 0, 253, 0, 0, 0, 0, 0, 0, 1846, 0, 0, 0, 0
$ change_boosters_1           <int> 0, 1121, 0, 0, 0, 0, 0, 0, 5572, 0, 0, 0, 0
$ change_boosters_2           <int> 0, 10290, 0, 0, 0, 0, 0, 0, 49356, 0, 0, 0…
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ total_cases                 <int> 1477669, 1223930, 106158, 70545, 150957, 3…
$ total_fatalities            <int> 14603, 16904, 451, 438, 2199, 4423, 46, 14…
$ total_tests                 <int> 25776503, 16733572, 1777959, 782596, 15498…
$ total_hospitalizations      <int> 1663, 1942, 44, 22, 291, 389, 10, 147, 111…
$ total_criticals             <int> 158, 52, 8, 1, 5, 21, 1, 5, 28, 0, 0, 0, 0
$ total_recoveries            <int> 1448143, 1099129, 43018, 65138, 129338, 30…
$ total_vaccinations          <int> 35825406, 22125898, 2235685, 1757172, 3001…
$ total_vaccinated            <int> 12319953, 7214095, 842600, 661271, 1101542…
$ total_boosters_1            <int> 7624409, 4638755, 503257, 395689, 607635, …
$ total_boosters_2            <int> 3076429, 2108496, 0, 0, 146806, 1091493, 7…
$ total_vaccines_distributed  <int> 38554461, 23092789, 2518832, 2212445, 3842…

bind_rows() also automatically converts the numeric columns to integer, but the date column is still character:

summary_province <- summary_province %>% mutate(date = as.Date(date))

Health region

Data my be split even further by health region with summary/split/hr:

resp <- httr::GET(paste0(base_url, "summary/split/hr"))

content_parsed <- content(resp, as = "parsed")
str(content_parsed, max.level = 1)

List of 2
 $ data        :List of 92
 $ last_updated: chr "2022-10-26 23:19:02"

This data consists of 92 entries with mostly the same variables as previous summaries:

str(content_parsed$data[[1]])

List of 22
 $ hr_uid                 : int 1201
 $ date                   : chr "2022-08-10"
 $ change_cases           : NULL
 $ change_fatalities      : NULL
 $ change_tests           : NULL
 $ change_hospitalizations: NULL
 $ change_criticals       : NULL
 $ change_recoveries      : NULL
 $ change_vaccinations    : NULL
 $ change_vaccinated      : NULL
 $ change_boosters_1      : NULL
 $ change_boosters_2      : NULL
 $ total_cases            : int 0
 $ total_fatalities       : int 1
 $ total_tests            : NULL
 $ total_hospitalizations : NULL
 $ total_criticals        : NULL
 $ total_recoveries       : int 640
 $ total_vaccinations     : NULL
 $ total_vaccinated       : NULL
 $ total_boosters_1       : NULL
 $ total_boosters_2       : NULL

The differences are the hr_uid column in place of province, and the lack of change_vaccines_distributed and total_vaccines_distributed, presumably because these numbers aren’t available at this granularity.

summary_region <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(summary_region)

Rows: 92
Columns: 21
$ hr_uid                  <int> 1201, 1202, 1203, 1204, 1301, 1302, 1303, 1304…
$ date                    <date> 2022-08-10, 2022-08-10, 2022-08-10, 2022-08-1…
$ total_cases             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4…
$ total_fatalities        <int> 1, 1, 5, 58, 2, 1, 0, 0, 4, 0, 0, 3, 0, 1, 0, …
$ total_recoveries        <int> 640, 943, 947, 6344, 15766, 11569, 8848, 4374,…
$ total_tests             <int> NA, NA, NA, NA, 293294, 135345, 146674, 78585,…
$ total_hospitalizations  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,…
$ total_criticals         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,…
$ total_vaccinations      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ total_vaccinated        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ total_boosters_1        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ total_boosters_2        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_vaccinations     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_vaccinated       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_boosters_1       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_cases            <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_fatalities       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_hospitalizations <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_criticals        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_boosters_2       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ change_recoveries       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

hr_uid is a unique identifier for each health region. A lookup table is available through the API with regions:

resp <- httr::GET(paste0(base_url, "regions"))

content_parsed <- content(resp, as = "parsed")
str(content_parsed, max.level = 1)

List of 1
 $ data:List of 92

There are 92 elements, matching the 92 health regions in the summary data, with the following structure:

regions <- bind_rows(content_parsed$data)
glimpse(regions)

Rows: 92
Columns: 4
$ hr_uid   <int> 471, 472, 473, 474, 475, 476, 591, 592, 593, 594, 595, 1011, …
$ province <chr> "SK", "SK", "SK", "SK", "SK", "SK", "BC", "BC", "BC", "BC", "…
$ engname  <chr> "Far North", "North", "Central", "Saskatoon", "Regina", "Sout…
$ frename  <chr> "Far North", "North", "Central", "Saskatoon", "Regina", "Sout…

Add the health region to the summary_region data:

summary_region <- regions %>%
  left_join(summary_region, by = "hr_uid")
glimpse(summary_region)

Rows: 92
Columns: 24
$ hr_uid                  <int> 471, 472, 473, 474, 475, 476, 591, 592, 593, 5…
$ province                <chr> "SK", "SK", "SK", "SK", "SK", "SK", "BC", "BC"…
$ engname                 <chr> "Far North", "North", "Central", "Saskatoon", …
$ frename                 <chr> "Far North", "North", "Central", "Saskatoon", …
$ date                    <date> 2022-10-08, 2022-10-08, 2022-10-08, 2022-10-0…
$ total_cases             <int> 0, 0, 0, 0, 0, 0, 68979, 169214, 78383, 38762,…
$ total_fatalities        <int> 8, 16, 5, 6, 3, 7, 670, 1807, 1049, 511, 386, …
$ total_recoveries        <int> 10248, 24271, 9069, 29440, 23925, 14978, 39899…
$ total_tests             <int> 66969, 206300, 115789, 380252, 290508, 176859,…
$ total_hospitalizations  <int> 2, 61, 22, 171, 54, 22, 48, 135, 110, 77, 15, …
$ total_criticals         <int> 0, 7, 1, 17, 5, 1, 3, 10, 4, 3, 1, 1, 1, 3, 0,…
$ total_vaccinations      <int> 70534, 319807, 201758, 513380, 430413, 273939,…
$ total_vaccinated        <int> 33236, 152611, 98324, 249513, 208970, 133557, …
$ total_boosters_1        <int> NA, NA, NA, NA, NA, NA, 413203, 967396, 829512…
$ total_boosters_2        <int> NA, NA, NA, NA, NA, NA, 181063, 348400, 268145…
$ change_vaccinations     <int> NA, NA, NA, NA, NA, NA, 23869, 56412, 39975, 3…
$ change_vaccinated       <int> NA, NA, NA, NA, NA, NA, 272, 1033, 1053, 595, …
$ change_boosters_1       <int> NA, NA, NA, NA, NA, NA, 1261, 2998, 2300, 1876…
$ change_cases            <int> NA, NA, NA, NA, NA, NA, 122, 171, 153, 114, 67…
$ change_fatalities       <int> NA, NA, NA, NA, NA, NA, 11, 13, 17, 8, 4, NA, …
$ change_hospitalizations <int> NA, NA, NA, NA, NA, NA, -24, 21, 15, 15, -5, N…
$ change_criticals        <int> NA, NA, NA, NA, NA, NA, -1, 2, 0, 0, 1, NA, NA…
$ change_boosters_2       <int> NA, NA, NA, NA, NA, NA, 15716, 40202, 30843, 2…
$ change_recoveries       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Reports

Overall

Reports are much like summaries, but for every day instead of just the most recent.

resp <- httr::GET(paste0(base_url, "reports"))

content_parsed <- content(resp, as = "parsed")
str(content_parsed, max.level = 1)

List of 3
 $ province    : chr "All"
 $ last_updated: chr "2022-10-26 23:19:02"
 $ data        :List of 1004

An additional top-level variable province defines the scope of the report. The data list here consists of 705 elements with the following structure:

str(content_parsed$data[[1]])

List of 23
 $ date                       : chr "2020-01-25"
 $ change_cases               : int 1
 $ change_fatalities          : int 0
 $ change_tests               : int 2
 $ change_hospitalizations    : int 0
 $ change_criticals           : int 0
 $ change_recoveries          : int 0
 $ change_vaccinations        : int 0
 $ change_vaccinated          : int 0
 $ change_boosters_1          : int 0
 $ change_boosters_2          : int 0
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 1
 $ total_fatalities           : int 0
 $ total_tests                : int 2
 $ total_hospitalizations     : int 0
 $ total_criticals            : int 0
 $ total_recoveries           : int 0
 $ total_vaccinations         : int 0
 $ total_vaccinated           : int 0
 $ total_boosters_1           : int 0
 $ total_boosters_2           : int 0
 $ total_vaccines_distributed : int 0

This first element has many zeroes, which makes sense as the date (January 25th, 2020) corresponds to the first confirmed case of COVID in Canada. The last element of this list should have today’s data:

str(content_parsed$data[[length(content_parsed$data)]])

List of 23
 $ date                       : chr "2022-10-26"
 $ change_cases               : int 2317
 $ change_fatalities          : int 52
 $ change_tests               : int 0
 $ change_hospitalizations    : int 24
 $ change_criticals           : int -3
 $ change_recoveries          : int 0
 $ change_vaccinations        : int 84199
 $ change_vaccinated          : int 2099
 $ change_boosters_1          : int 6693
 $ change_boosters_2          : int 59646
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 1836623
 $ total_fatalities           : int 21915
 $ total_tests                : int 23647154
 $ total_hospitalizations     : int 3060
 $ total_criticals            : int 80
 $ total_recoveries           : int 1622808
 $ total_vaccinations         : int 31580797
 $ total_vaccinated           : int 10646491
 $ total_boosters_1           : int 6401148
 $ total_boosters_2           : int 2733841
 $ total_vaccines_distributed : int 35371888

The data may be bound together in the same way:

report_overall <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))

Province

To split data by province, the two-letter code is provided as reports/province/{code}:

resp <- httr::GET(paste0(base_url, "reports/province/ns"))

content_parsed <- content(resp, as = "parsed")
report_ns <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(report_ns)

Rows: 1,001
Columns: 23
$ date                        <date> 2020-01-25, 2020-01-26, 2020-01-27, 2020-…
$ change_cases                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_fatalities           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_tests                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_hospitalizations     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_criticals            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_recoveries           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_vaccinations         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_vaccinated           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_boosters_1           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_cases                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_fatalities            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_tests                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_hospitalizations      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_criticals             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_recoveries            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_vaccinations          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_vaccinated            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_boosters_1            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_boosters_2            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_vaccines_distributed  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

Health region

Similarly, provide the hr_uid in reports/regions/{hr_uid} to get health region reports:

resp <- httr::GET(paste0(base_url, "reports/regions/1204"))

content_parsed <- content(resp, as = "parsed")
report_ns_central <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(report_ns_central)

Rows: 856
Columns: 7
$ date              <date> 2020-01-15, 2020-01-16, 2020-01-17, 2020-01-18, 202…
$ change_cases      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ change_fatalities <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ total_cases       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ total_fatalities  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ change_recoveries <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ total_recoveries  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

I chose Nova Scotia central because it is where I live and, looking at this data, it clearly isn’t being updated day-to-day:

report_ns_central %>%
  filter(date >= "2021-12-20", date < "2021-12-28") %>%
  glimpse()

Rows: 8
Columns: 7
$ date              <date> 2021-12-20, 2021-12-21, 2021-12-22, 2021-12-23, 202…
$ change_cases      <int> 136, 63, 0, NA, NA, NA, NA, NA
$ change_fatalities <int> 0, 0, 0, NA, NA, NA, NA, NA
$ total_cases       <int> 6718, 6781, 6781, 6781, 6781, 6781, 6781, 6781
$ total_fatalities  <int> 87, 87, 87, 87, 87, 87, 87, 87
$ change_recoveries <int> 100, 113, 0, NA, NA, NA, NA, NA
$ total_recoveries  <int> 6231, 6344, 6344, 6344, 6344, 6344, 6344, 6344

There has, unfortunately, been hundreds of cases per day here recently. These numbers are reflected in the province report however:

report_ns %>%
  filter(date >= "2021-12-20", date < "2021-12-28") %>%
  glimpse()

Rows: 8
Columns: 23
$ date                        <date> 2021-12-20, 2021-12-21, 2021-12-22, 2021-…
$ change_cases                <int> 485, 522, 537, 689, 611, 569, 578, 581
$ change_fatalities           <int> 0, 0, 1, 0, 0, 0, 0, 0
$ change_tests                <int> 34815, 0, 10254, 0, 0, 0, 0, 0
$ change_hospitalizations     <int> 2, 1, 0, 4, 0, 0, 0, 0
$ change_criticals            <int> 0, 1, 0, 1, 0, 0, 0, 0
$ change_recoveries           <int> 263, 0, 0, 0, 0, 0, 0, 0
$ change_vaccinations         <int> 13018, 8953, 10225, 7903, 0, 0, 0, 0
$ change_vaccinated           <int> 423, 296, 366, 275, 0, 0, 0, 0
$ change_boosters_1           <int> 9403, 7298, 8283, 6367, 0, 0, 0, 0
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0, 0, 0
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0, 0, 0
$ total_cases                 <int> 11318, 11840, 12377, 13066, 13677, 14246, …
$ total_fatalities            <int> 110, 110, 111, 111, 111, 111, 111, 111
$ total_tests                 <int> 1506218, 1506218, 1516472, 1516472, 151647…
$ total_hospitalizations      <int> 9, 10, 10, 14, 14, 14, 14, 14
$ total_criticals             <int> 2, 3, 3, 4, 4, 4, 4, 4
$ total_recoveries            <int> 8643, 8643, 8643, 8643, 8643, 8643, 8643, …
$ total_vaccinations          <int> 1731205, 1740158, 1750383, 1758286, 175828…
$ total_vaccinated            <int> 792552, 792848, 793214, 793489, 793489, 79…
$ total_boosters_1            <int> 83071, 90369, 98652, 105019, 105019, 10501…
$ total_boosters_2            <int> 0, 0, 0, 0, 0, 0, 0, 0
$ total_vaccines_distributed  <int> 1950040, 1950040, 1950040, 1950040, 195004…

Parameters

The reports have a number of optional parameters available to alter the API request.

The fill_dates option fills dates with missing entries:

content_parsed <- paste0(base_url, "reports/regions/1204?fill_dates=false") %>%
  httr::GET() %>%
  content(as = "parsed")

bind_rows(content_parsed$data) %>% glimpse()

Rows: 939
Columns: 8
$ date              <chr> "2020-01-15", "2020-01-16", "2020-01-17", "2020-01-1…
$ change_cases      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ change_fatalities <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ total_cases       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ total_fatalities  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ change_recoveries <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ total_recoveries  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ fill              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

The stat argument allows one to specify a single statistic to return:

content_parsed <- paste0(base_url, "reports/province/ns?stat=cases") %>%
  httr::GET() %>%
  content(as = "parsed")

bind_rows(content_parsed$data) %>% glimpse()

Rows: 1,001
Columns: 3
$ date         <chr> "2020-01-25", "2020-01-26", "2020-01-27", "2020-01-28", "…
$ change_cases <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_cases  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

The date parameter returns a report from a single date:

content_parsed <- paste0(base_url, "reports/province/ab?date=2021-12-25") %>%
  httr::GET() %>%
  content(as = "parsed")

bind_rows(content_parsed$data) %>% glimpse()

Rows: 1
Columns: 23
$ date                        <chr> "2021-12-25"
$ change_cases                <int> 2484
$ change_fatalities           <int> 0
$ change_tests                <int> 11479
$ change_hospitalizations     <int> 0
$ change_criticals            <int> 0
$ change_recoveries           <int> 0
$ change_vaccinations         <int> 0
$ change_vaccinated           <int> 0
$ change_boosters_1           <int> 0
$ change_boosters_2           <int> 0
$ change_vaccines_distributed <int> 0
$ total_cases                 <int> 351199
$ total_fatalities            <int> 3299
$ total_tests                 <int> 6347374
$ total_hospitalizations      <int> 318
$ total_criticals             <int> 64
$ total_recoveries            <int> 335047
$ total_vaccinations          <int> 7452649
$ total_vaccinated            <int> 3211241
$ total_boosters_1            <int> 761153
$ total_boosters_2            <int> 0
$ total_vaccines_distributed  <int> 8799859

Lastly, the after and before parameters return on/after and on/before specific dates:

content_parsed <-
  paste0(base_url, "reports/province/qc?after=2021-12-24&before=2021-12-26") %>%
  httr::GET() %>%
  content(as = "parsed")

bind_rows(content_parsed$data) %>% glimpse()

Rows: 3
Columns: 23
$ date                        <chr> "2021-12-24", "2021-12-25", "2021-12-26"
$ change_cases                <int> 10031, 9206, 8231
$ change_fatalities           <int> 2, 4, 10
$ change_tests                <int> 55863, 53334, 44022
$ change_hospitalizations     <int> 0, 0, 0
$ change_criticals            <int> 0, 0, 0
$ change_recoveries           <int> 3017, 3559, 0
$ change_vaccinations         <int> 85039, 24435, 117
$ change_vaccinated           <int> 2120, 638, 9
$ change_boosters_1           <int> 78745, 22612, 98
$ change_boosters_2           <int> 186, 124, 2
$ change_vaccines_distributed <int> 0, 0, 0
$ total_cases                 <int> 521126, 530332, 538563
$ total_fatalities            <int> 11660, 11664, 11674
$ total_tests                 <int> 14573238, 14626572, 14670594
$ total_hospitalizations      <int> 473, 473, 473
$ total_criticals             <int> 91, 91, 91
$ total_recoveries            <int> 460647, 464206, 464206
$ total_vaccinations          <int> 15034784, 15059219, 15059336
$ total_vaccinated            <int> 6701798, 6702436, 6702445
$ total_boosters_1            <int> 1021886, 1044498, 1044596
$ total_boosters_2            <int> 3469, 3593, 3595
$ total_vaccines_distributed  <int> 16179459, 16179459, 16179459

Note how parameters can be combined as above, by separating the arguments with &.

Vaccination data

We have already seen the vaccination data returned by summary and report requests. The variables include:

vaccinations: total doses administered
vaccinated: total number of people with two doses
boosters_1: total number of boosters (3rd dose) administered
vaccines_administered: total number of doses delivered to province

At the summary level:

summary_province %>%
  filter(province == "NS") %>%
  select(date, matches("vacc|boost")) %>%
  glimpse()

Rows: 1
Columns: 11
$ date                        <date> 2022-10-21
$ change_vaccinations         <int> 0
$ change_vaccinated           <int> 0
$ change_boosters_1           <int> 0
$ change_boosters_2           <int> 0
$ change_vaccines_distributed <int> 0
$ total_vaccinations          <int> 2235685
$ total_vaccinated            <int> 842600
$ total_boosters_1            <int> 503257
$ total_boosters_2            <int> 0
$ total_vaccines_distributed  <int> 2518832

At the report level:

report_ns %>%
  select(date, matches("vacc|boost")) %>%
  tail() %>%
  glimpse()

Rows: 6
Columns: 11
$ date                        <date> 2022-10-16, 2022-10-17, 2022-10-18, 2022-…
$ change_vaccinations         <int> 0, 0, 0, 0, 0, 0
$ change_vaccinated           <int> 0, 0, 0, 0, 0, 0
$ change_boosters_1           <int> 0, 0, 0, 0, 0, 0
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0
$ total_vaccinations          <int> 2235685, 2235685, 2235685, 2235685, 223568…
$ total_vaccinated            <int> 842600, 842600, 842600, 842600, 842600, 84…
$ total_boosters_1            <int> 503257, 503257, 503257, 503257, 503257, 50…
$ total_boosters_2            <int> 0, 0, 0, 0, 0, 0
$ total_vaccines_distributed  <int> 2518832, 2518832, 2518832, 2518832, 251883…

Subregions

Vaccination date is also available at the subregion level for certain provinces and territories. The API documentation doesn’t actually specify which provinces and territories, but I can find out by requesting the data as follows:

resp <- httr::GET(paste0(base_url, "reports/sub-regions/summary"))

content_parsed <- content(resp, as = "parsed")
subregion_vacc_summary <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))

glimpse(subregion_vacc_summary)

Rows: 806
Columns: 11
$ code           <chr> "SK001", "SK002", "SK003", "SK004", "SK005", "SK006", "…
$ date           <date> 2022-02-06, 2022-02-06, 2022-02-06, 2022-02-06, 2022-0…
$ total_dose_1   <int> 18733, 1959, 16606, 66614, 67900, 32682, 263867, 27576,…
$ percent_dose_1 <chr> "0.70048", "0.68282", "0.75147", "0.86337", "0.79184", …
$ source_dose_1  <chr> "total", "total", "total", "total", "total", "total", "…
$ total_dose_2   <int> 16875, 1706, 14655, 58628, 63117, 30866, 249513, 26220,…
$ percent_dose_2 <chr> "0.63101", "0.59463", "0.66318", "0.75986", "0.73606", …
$ source_dose_2  <chr> "total", "total", "total", "total", "total", "total", "…
$ total_dose_3   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 367…
$ percent_dose_3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "0.…
$ source_dose_3  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "pe…

The code labels can be retrieved via sub-regions:

resp <- httr::GET(paste0(base_url, "sub-regions"))

content_parsed <- content(resp, as = "parsed")
subregions <- bind_rows(content_parsed$data)

glimpse(subregions)

Rows: 806
Columns: 5
$ code       <chr> "AB001", "AB002", "AB003", "AB004", "AB005", "AB006", "AB00…
$ province   <chr> "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB",…
$ zone       <chr> "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUT…
$ region     <chr> "CROWSNEST PASS", "PINCHER CREEK", "FORT MACLEOD", "CARDSTO…
$ population <int> 6280, 8344, 6753, 16595, 25820, 19028, 11104, 6409, 27753, …

806 subregions, which matches the count from the summary, with the following distribution by province:

subregions %>% count(province) %>% gt()

province	n
AB	132
MB	79
NL	38
NT	30
ON	514
SK	13

Vaccine age groups

Overall

Vaccine data by age groups is reported week-by-week, and accessed with vaccines/age-groups:

resp <- httr::GET(paste0(base_url, "vaccines/age-groups"))

content_parsed <- content(resp, as = "parsed")
vaccine_age_groups <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date)) %>%
  filter(date <= "2021-12-28")

glimpse(vaccine_age_groups)

Rows: 54
Columns: 2
$ date <date> 2020-12-19, 2020-12-26, 2021-01-02, 2021-01-09, 2021-01-16, 2021…
$ data <chr> "{\"80+\": {\"full\": 0, \"group\": \"80+\", \"partial\": 335, \"…

The data here is returned as an un-parsed JSON string. Per the API documentation, it has to do with shifting reporting standards across weeks:

due to reporting standard shifts overtime, the JSON string data may not be consistent across weeks. Minimal effort is taken to normalize some of this data.

Look at the first element of data:

vaccine_age_groups$data[[1]] %>% str_trunc(80)

[1] "{\"80+\": {\"full\": 0, \"group\": \"80+\", \"partial\": 335, \"atleast1\": 335}, \"0-15\":..."

Parse the JSON:

jsonlite::fromJSON(vaccine_age_groups$data[[1]]) %>%
  str()

List of 8
 $ 80+         :List of 4
  ..$ full    : int 0
  ..$ group   : chr "80+"
  ..$ partial : int 335
  ..$ atleast1: int 335
 $ 0-15        :List of 4
  ..$ full    : int 0
  ..$ group   : chr "0-15"
  ..$ partial : int 0
  ..$ atleast1: int 0
 $ 16-69       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "16-69"
  ..$ partial : int 11768
  ..$ atleast1: int 11768
 $ 70-74       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "70-74"
  ..$ partial : int 174
  ..$ atleast1: int 174
 $ 75-79       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "75-79"
  ..$ partial : int 85
  ..$ atleast1: int 85
 $ unknown     :List of 4
  ..$ full    : int 0
  ..$ group   : chr "Unknown"
  ..$ partial : int 0
  ..$ atleast1: int 0
 $ all_ages    :List of 4
  ..$ full    : int 0
  ..$ group   : chr "All ages"
  ..$ partial : int 12362
  ..$ atleast1: int 12362
 $ not_reported:List of 4
  ..$ full    : int 0
  ..$ group   : chr "Not reported"
  ..$ partial : int 0
  ..$ atleast1: int 0

To see how the reporting has changed over time, here is the most recent age group vaccination data:

jsonlite::fromJSON(
  vaccine_age_groups$data[[length(vaccine_age_groups$data)]]
) %>%
  str()

List of 13
 $ 0-4         :List of 4
  ..$ full    : int 0
  ..$ group   : chr "0-4"
  ..$ partial : int 276
  ..$ atleast1: int 276
 $ 80+         :List of 4
  ..$ full    : int 1640558
  ..$ group   : chr "80+"
  ..$ partial : int 30717
  ..$ atleast1: int 1671275
 $ 05-11       :List of 4
  ..$ full    : int 44794
  ..$ group   : chr "05-11"
  ..$ partial : int 1206532
  ..$ atleast1: int 1251326
 $ 12-17       :List of 4
  ..$ full    : int 2039683
  ..$ group   : chr "12-17"
  ..$ partial : int 111096
  ..$ atleast1: int 2150779
 $ 18-29       :List of 4
  ..$ full    : int 4916940
  ..$ group   : chr "18-29"
  ..$ partial : int 259662
  ..$ atleast1: int 5176602
 $ 30-39       :List of 4
  ..$ full    : int 4475399
  ..$ group   : chr "30-39"
  ..$ partial : int 185294
  ..$ atleast1: int 4660693
 $ 40-49       :List of 4
  ..$ full    : int 4250041
  ..$ group   : chr "40-49"
  ..$ partial : int 125031
  ..$ atleast1: int 4375072
 $ 50-59       :List of 4
  ..$ full    : int 4526830
  ..$ group   : chr "50-59"
  ..$ partial : int 104559
  ..$ atleast1: int 4631389
 $ 60-69       :List of 4
  ..$ full    : int 4459359
  ..$ group   : chr "60-69"
  ..$ partial : int 78044
  ..$ atleast1: int 4537403
 $ 70-79       :List of 4
  ..$ full    : int 2970726
  ..$ group   : chr "70-79"
  ..$ partial : int 41346
  ..$ atleast1: int 3012072
 $ unknown     :List of 4
  ..$ full    : int 2021
  ..$ group   : chr "Unknown"
  ..$ partial : int 804
  ..$ atleast1: int 2825
 $ all_ages    :List of 4
  ..$ full    : int 29326351
  ..$ group   : chr "All ages"
  ..$ partial : int 2143361
  ..$ atleast1: int 31469712
 $ not_reported:List of 4
  ..$ full    : int 0
  ..$ group   : chr "Not reported"
  ..$ partial : int 0
  ..$ atleast1: int 0

Each JSON data point can be converted to a data frame as follows:

jsonlite::fromJSON(vaccine_age_groups$data[[1]]) %>%
  bind_rows(.id = "group_code") %>%
  gt()

group_code	group	partial	atleast1
80+	80+	335	335
0-15	0-15	0	0
16-69	16-69	11768	11768
70-74	70-74	174	174
75-79	75-79	85	85
unknown	Unknown	0	0
all_ages	All ages	12362	12362
not_reported	Not reported	0	0

Use map and unnest to apply this to each row of the data:

vaccine_age_groups <- vaccine_age_groups %>%
  mutate(
    data = map(
      data,
      ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code")
    )
  ) %>%
  unnest(data)
glimpse(vaccine_age_groups)

Rows: 603
Columns: 6
$ date       <date> 2020-12-19, 2020-12-19, 2020-12-19, 2020-12-19, 2020-12-19…
$ group_code <chr> "80+", "0-15", "16-69", "70-74", "75-79", "unknown", "all_a…
$ full       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ group      <chr> "80+", "0-15", "16-69", "70-74", "75-79", "Unknown", "All a…
$ partial    <int> 335, 0, 11768, 174, 85, 0, 12362, 0, 2229, 5, 40170, 649, 4…
$ atleast1   <int> 335, 0, 11768, 174, 85, 0, 12362, 0, 2229, 5, 40170, 649, 4…

The unique groups:

vaccine_age_groups %>% count(group_code, group) %>% rmarkdown::paged_table()

ABCDEFGHIJ0123456789

group_code <chr>	group <chr>	n <int>
0-15	0-15	3
0-17	0-17	21
0-4	0-4	30
05-11	05-11	30
12-17	12-17	30
16-69	16-69	3
18-29	18-29	37
18-69	18-69	14
30-39	30-39	37
40-49	40-49	37

Visualize how the age ranges evolve over time:

# Make it a function that will allow splits later
plot_age_ranges <- function(vaccine_age_groups, split = "overall", ncol = 3) {
  p <- vaccine_age_groups %>%
    filter(str_detect(group, "\\d")) %>%
    separate(group, into = c("age_min", "age_max"),
             sep = "-", fill = "right", remove = FALSE) %>%
    mutate(
      age_min = readr::parse_number(age_min),
      # Set the upper range of the age to 100 (arbitrarility)
      age_max = replace_na(age_max, replace = "100") %>% as.numeric(),
      age_mid = (age_max + age_min) / 2,
      group = fct_reorder(group, age_mid)
    ) %>%
    ggplot(aes(x = date, color = group)) +
    geom_errorbar(aes(ymin = age_min, ymax = age_max)) +
    geom_text(
      data = . %>%
        slice_min(date) %>%
        mutate(age_mid = (age_max + age_min) / 2),
      aes(label = group, y = age_mid),
      hjust = 1, nudge_x = -3, show.legend = FALSE
    ) +
    geom_text(
      data = . %>%
        slice_max(date) %>%
        mutate(age_mid = (age_max + age_min) / 2),
      aes(label = group, y = age_mid),
      hjust = 0, nudge_x = 3, show.legend = FALSE
    ) +
    expand_limits(x = c(min(vaccine_age_groups$date) - 10,
                        max(vaccine_age_groups$date) + 10)) +
    scale_color_viridis_d(end = 0.8) +
    theme(legend.position = "none") +
    labs(x = "Date", y = "Age",
         title = "Age ranges for weekly vaccination reports, by date")
  
  if (split == "province") p + facet_wrap(~province, ncol = ncol)
  else if (split == "region") p + facet_wrap(~hr_uid, ncol = ncol)
  else {p}
}

plot_age_ranges(vaccine_age_groups)

Unsurprisingly, the age ranges become more granular overtime, with the exception of 70-79 which was originally split into 70-74 and 75-79.

Province

As with the other data, adding /split to the query returns vaccination data by province:

content_parsed <- paste0(base_url, "vaccines/age-groups/split") %>%
  httr::GET() %>%
  content(as = "parsed")

vaccine_age_groups_province <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date)) %>%
  filter(date <= "2021-12-28")
glimpse(vaccine_age_groups_province)

Rows: 1,564
Columns: 3
$ date     <date> 2020-12-14, 2020-12-15, 2020-12-16, 2020-12-16, 2020-12-17, …
$ data     <chr> "{\"0-4\": {\"full\": 0, \"group\": \"0-4\", \"partial\": 1, …
$ province <chr> "QC", "QC", "QC", "ON", "QC", "ON", "QC", "ON", "BC", "NL", "…

vaccine_age_groups_province <- vaccine_age_groups_province %>%
  mutate(
    data = map(
      data,
      ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code")
    )
  ) %>%
  unnest(data)
glimpse(vaccine_age_groups_province)

Rows: 15,715
Columns: 7
$ date       <date> 2020-12-14, 2020-12-14, 2020-12-14, 2020-12-14, 2020-12-14…
$ group_code <chr> "0-4", "80+", "05-11", "12-17", "18-29", "30-39", "40-49", …
$ full       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ group      <chr> "0-4", "80+", "05-11", "12-17", "18-29", "30-39", "40-49", …
$ partial    <int> 1, 169, 0, 0, 9, 11, 23, 22, 26, 35, 1, 328, 0, 2, 127, 216…
$ atleast1   <int> 1, 169, 0, 0, 9, 11, 23, 22, 26, 35, 1, 328, 0, 2, 127, 216…
$ province   <chr> "QC", "QC", "QC", "QC", "QC", "QC", "QC", "QC", "QC", "QC",…

vaccine_age_groups_province %>%
  filter(province == "QC") %>%
  plot_age_ranges(split = "province", ncol = 1)

A single province can also be obtained by altering the query with vaccines/age-groups/province/{code}:

content_parsed <- paste0(base_url, "vaccines/age-groups/province/ns") %>%
  httr::GET() %>%
  content(as = "parsed")

vaccine_age_groups_ns <- bind_rows(content_parsed$data) %>%
  mutate(
    date = as.Date(date),
    data = map(data, ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code"))
  ) %>%
  unnest(data) %>%
  filter(date <= "2021-12-28")
plot_age_ranges(vaccine_age_groups_ns)

Parameters

This query also has the after and before parameters available:

content_parsed <- paste0(base_url,
                         "vaccines/age-groups/province/ns?after=2021-11-01") %>%
  httr::GET() %>%
  content(as = "parsed")

glimpse(bind_rows(content_parsed$data))

Rows: 13
Columns: 2
$ date <chr> "2021-11-06", "2021-11-13", "2021-11-20", "2021-11-27", "2021-12-…
$ data <chr> "{\"0-4\": {\"full\": 0, \"group\": \"0-4\", \"partial\": 0, \"at…

A specific age group can also be queried with the group parameter. The value must be passed in URL encoding. For example, the 80+ range:

content_parsed <- paste0(base_url,
                         "vaccines/age-groups?after=2021-11-01&group=80%2B") %>%
  httr::GET() %>%
  content(as = "parsed")

bind_rows(content_parsed$data) %>%
  mutate(
    date = as.Date(date),
    data = map(data, ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code"))
  ) %>%
  unnest(data) %>%
  filter(date <= "2021-12-28") %>%
  glimpse()

Rows: 8
Columns: 5
$ date     <date> 2021-11-06, 2021-11-13, 2021-11-20, 2021-11-27, 2021-12-04, …
$ full     <int> 1581895, 1585409, 1588815, 1592112, 1630884, 1633436, 1637238…
$ group    <chr> "80+", "80+", "80+", "80+", "80+", "80+", "80+", "80+"
$ partial  <int> 39515, 38628, 37810, 37087, 28833, 28621, 28542, 30717
$ atleast1 <int> 1621410, 1624037, 1626625, 1629199, 1659717, 1662057, 1665780…

The utils package has a URLencode function for translating the age groups:

vaccine_age_groups %>%
  distinct(group_code) %>%
  mutate(group_encoded = utils::URLencode(group_code, reserved = TRUE)) %>%
  gt()

group_code	group_encoded
80+	80%2B
0-15	0-15
16-69	16-69
70-74	70-74
75-79	75-79
unknown	unknown
all_ages	all_ages
not_reported	not_reported
0-17	0-17
18-69	18-69
18-29	18-29
30-39	30-39
40-49	40-49
50-59	50-59
60-69	60-69
70-79	70-79
0-4	0-4
05-11	05-11
12-17	12-17

Provinces

The API also provides a list of provinces and some population/geographical data:

content_parsed <- paste0(base_url, "provinces") %>%
  httr::GET() %>%
  content(as = "parsed")

provinces <- bind_rows(content_parsed)
glimpse(provinces)

Rows: 16
Columns: 10
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
$ code        <chr> "ON", "QC", "NS", "NB", "MB", "BC", "PE", "SK", "AB", "NL"…
$ name        <chr> "Ontario", "Quebec", "Nova Scotia", "New Brunswick", "Mani…
$ population  <int> 14826276, 8604495, 992055, 789225, 1383765, 5214805, 16431…
$ area        <int> 917741, 1356128, 53338, 71450, 553556, 925186, 5660, 59167…
$ gdp         <int> 857384, 439375, 44354, 36966, 72688, 295401, 6994, 80679, …
$ geographic  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0
$ data_status <chr> "Reported", "Reported", "No report expected today", "Repor…
$ updated_at  <chr> "2022-10-22T00:09:07.000000Z", "2022-10-27T05:09:04.000000…
$ density     <dbl> 16.15518540, 6.34489886, 18.59940380, 11.04583625, 2.49977…

The extra elements reported here are not related to any particular province/territory:

provinces %>% filter(is.na(population)) %>% glimpse()

Rows: 3
Columns: 10
$ id          <int> 14, 15, 16
$ code        <chr> "_RC", "FA", "NFR"
$ name        <chr> "Repatriated Canadians", "Federal Allocation", "National F…
$ population  <int> NA, NA, NA
$ area        <int> NA, NA, NA
$ gdp         <int> NA, NA, NA
$ geographic  <int> 0, 0, 0
$ data_status <chr> "", "", ""
$ updated_at  <chr> NA, "2022-03-10T20:41:40.000000Z", "2022-08-15T22:51:36.00…
$ density     <dbl> NA, NA, NA

The geo_only parameter can be set to true to exclude these:

paste0(base_url, "provinces?geo_only=true") %>%
  httr::GET() %>%
  content(as = "parsed") %>%
  bind_rows() %>%
  glimpse()

Rows: 13
Columns: 10
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
$ code        <chr> "ON", "QC", "NS", "NB", "MB", "BC", "PE", "SK", "AB", "NL"…
$ name        <chr> "Ontario", "Quebec", "Nova Scotia", "New Brunswick", "Mani…
$ population  <int> 14826276, 8604495, 992055, 789225, 1383765, 5214805, 16431…
$ area        <int> 917741, 1356128, 53338, 71450, 553556, 925186, 5660, 59167…
$ gdp         <int> 857384, 439375, 44354, 36966, 72688, 295401, 6994, 80679, …
$ geographic  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
$ data_status <chr> "Reported", "Reported", "No report expected today", "Repor…
$ updated_at  <chr> "2022-10-22T00:09:07.000000Z", "2022-10-27T05:09:04.000000…
$ density     <dbl> 16.15518540, 6.34489886, 18.59940380, 11.04583625, 2.49977…

A helpful variable is data_status, which indicates if the daily numbers have been reported:

provinces %>%
  select(name, data_status, updated_at) %>%
  gt()

name	data_status	updated_at
Ontario	Reported	2022-10-22T00:09:07.000000Z
Quebec	Reported	2022-10-27T05:09:04.000000Z
Nova Scotia	No report expected today	2022-08-15T22:52:42.000000Z
New Brunswick	Reported	2022-08-15T22:50:10.000000Z
Manitoba	No report expected today	2022-10-21T04:17:34.000000Z
British Columbia	Reported	2022-10-21T04:50:19.000000Z
Prince Edward Island	Reported	2022-08-15T22:50:27.000000Z
Saskatchewan	No report expected today	2022-10-21T16:51:04.000000Z
Alberta	Reported	2022-10-27T05:18:23.000000Z
Newfoundland and Labrador	No report expected today	2022-08-15T22:50:18.000000Z
Northwest Territories	No report expected today	2022-08-15T22:50:34.000000Z
Yukon	Reported	2022-08-15T22:51:21.000000Z
Nunavut	No report expected today	2022-08-15T22:51:02.000000Z
Repatriated Canadians		NA
Federal Allocation		2022-03-10T20:41:40.000000Z
National Federal Reserve		2022-08-15T22:51:36.000000Z

data_status may take on the following values:

`data_status`	Meaning
Waiting for report	This status indicated that an update is expected to happen in the current day, but has not yet occurred.
In progress	This status indicates that an update is in-progress and will be completed soon. Note that when this status is indicated, some or all data may not be updated yet.
Reported	When this status is indicated, the province has been updated with final data for the day, and the update is complete.
No report expected today	When this status is indicated, the province is not expected to provide an update on the current day, and one should not be expected.
Custom	Custom statuses are used to communicate certain issues with a province’s update including delays or partial updates.

The density variable is population density, which is computed by dividing population by area:

provinces %>%
  transmute(name, population, area, density,
            density_manual = population / area) %>%
  gt()

name	population	area	density	density_manual
Ontario	14826276	917741	16.15518540	16.15518540
Quebec	8604495	1356128	6.34489886	6.34489886
Nova Scotia	992055	53338	18.59940380	18.59940380
New Brunswick	789225	71450	11.04583625	11.04583625
Manitoba	1383765	553556	2.49977419	2.49977419
British Columbia	5214805	925186	5.63649363	5.63649363
Prince Edward Island	164318	5660	29.03144876	29.03144876
Saskatchewan	1179844	591670	1.99409130	1.99409130
Alberta	4442879	642317	6.91695689	6.91695689
Newfoundland and Labrador	520553	373872	1.39232946	1.39232946
Northwest Territories	45504	1183085	0.03846216	0.03846216
Yukon	42986	474391	0.09061302	0.09061302
Nunavut	39403	1936113	0.02035160	0.02035160
Repatriated Canadians	NA	NA	NA	NA
Federal Allocation	NA	NA	NA	NA
National Federal Reserve	NA	NA	NA	NA

Next steps

I’m impressed by the organization and accessibility of this API, and decided to write a simple R package to wrap it. In my next post, I’ll detail my steps and thought process.

Reproducibility

Session info

 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

ABCDEFGHIJ0123456789

	package <chr>	ondiskversion <chr>	loadedversion <chr>
assertthat	assertthat	0.2.1	0.2.1
backports	backports	1.4.1	1.4.1
broom	broom	1.0.0	1.0.0
cachem	cachem	1.0.6	1.0.6
callr	callr	3.7.1	3.7.1
cellranger	cellranger	1.1.0	1.1.0
cli	cli	3.3.0	3.4.1
codetools	codetools	0.2.18	0.2-18
colorspace	colorspace	2.0.3	2.0-3
crayon	crayon	1.5.1	1.5.1

Git repository

Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dunn2021,
  author = {Dunn, Taylor},
  title = {Canada {COVID-19} Data in {R:} Exploring the {API}},
  date = {2021-12-28},
  url = {https://tdunn.ca/posts/2021-12-28-canada-covid-19-data-in-r-exploring-the-api},
  langid = {en}
}

For attribution, please cite this work as:

Dunn, Taylor. 2021. “Canada COVID-19 Data in R: Exploring the API.” December 28, 2021. https://tdunn.ca/posts/2021-12-28-canada-covid-19-data-in-r-exploring-the-api.