Canada COVID-19 data in R: exploring the API

R COVID-19 API

An exploration of the Canadian COVID-19 tracker API.

Taylor Dunn
2021-12-28
Setup
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(patchwork)
library(httr)

library(dunnr)
extrafont::loadfonts(device = "win", quiet = TRUE)
theme_set(theme_td())
set_geom_fonts()
set_palette()

Introduction

With this post, I will explore the Canadian COVID-19 tracker API and, depending on how it goes, turn some of the code into an R package. For an introduction to working with APIs, see this vignette from the httr package.

Summary

The first data I will retrieve are the summaries: overall, by province, and by health region. Every GET request starts from the same base URL, so to save typing it each time I store it in base_url:

base_url <- "https://api.covid19tracker.ca/"

Overall

Modify the URL with summary to get the latest data across all provinces:

api_url <- paste0(base_url, "summary")

Send the GET request with httr:

resp <- httr::GET(api_url)
resp
Response [https://api.covid19tracker.ca/summary]
  Date: 2022-02-08 02:23
  Status: 200
  Content-Type: application/json
  Size: 701 B

This returned a response object with the following structure:

str(resp, max.level = 1)
List of 10
 $ url        : chr "https://api.covid19tracker.ca/summary"
 $ status_code: int 200
 $ headers    :List of 12
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 1
 $ cookies    :'data.frame':    0 obs. of  7 variables:
 $ content    : raw [1:701] 7b 22 64 61 ...
 $ date       : POSIXct[1:1], format: "2022-02-08 02:23:53"
 $ times      : Named num [1:6] 0 0.00378 0.0416 0.1281 0.18053 ...
  ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
 $ request    :List of 7
  ..- attr(*, "class")= chr "request"
 $ handle     :Class 'curl_handle' <externalptr> 
 - attr(*, "class")= chr "response"

The status_code is the first thing to check:

resp$status_code
[1] 200

An HTTP status code of 200 is the standard indicator of a successful request.
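In a script, this check can be automated so that a failed request stops early instead of causing confusing parse errors later. The sketch below is one way to do it; is_success and get_parsed are hypothetical helper names, not part of httr or the API. httr also provides stop_for_status, which serves a similar purpose.

```r
# Hypothetical helper: TRUE for any 2xx status code
is_success <- function(status_code) {
  status_code >= 200 & status_code < 300
}

# Hypothetical wrapper: send a GET request and parse the body only on success
get_parsed <- function(url) {
  resp <- httr::GET(url)
  status <- httr::status_code(resp)
  if (!is_success(status)) {
    stop("Request to ", url, " failed with status ", status)
  }
  httr::content(resp, as = "parsed")
}

is_success(c(200, 404))
```

The wrapper is not called here so that the example runs without a network connection; get_parsed(api_url) would be the intended usage.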

Once confirmed successful, the content returned from the request is:

head(resp$content, 25)
 [1] 7b 22 64 61 74 61 22 3a 5b 7b 22 6c 61 74 65 73 74 5f 64 61 74 65
[23] 22 3a 22

The content is a raw vector: the bytes of the response body, printed in hexadecimal. The httr::content function can parse this data:

content_parsed <- httr::content(resp, as = "parsed")
str(content_parsed)
List of 2
 $ data        :List of 1
  ..$ :List of 23
  .. ..$ latest_date                : chr "2022-02-07"
  .. ..$ change_cases               : chr "10053"
  .. ..$ change_fatalities          : chr "85"
  .. ..$ change_tests               : chr "32403"
  .. ..$ change_hospitalizations    : chr "-108"
  .. ..$ change_criticals           : chr "-4"
  .. ..$ change_recoveries          : chr "10723"
  .. ..$ change_vaccinations        : chr "120982"
  .. ..$ change_vaccinated          : chr "47192"
  .. ..$ change_boosters_1          : chr "67481"
  .. ..$ change_boosters_2          : chr "816"
  .. ..$ change_vaccines_distributed: chr "0"
  .. ..$ total_cases                : chr "3133071"
  .. ..$ total_fatalities           : chr "34804"
  .. ..$ total_tests                : chr "57312382"
  .. ..$ total_hospitalizations     : chr "8397"
  .. ..$ total_criticals            : chr "1040"
  .. ..$ total_recoveries           : chr "2917286"
  .. ..$ total_vaccinations         : chr "79029175"
  .. ..$ total_vaccinated           : chr "30395498"
  .. ..$ total_boosters_1           : chr "16111453"
  .. ..$ total_boosters_2           : chr "115246"
  .. ..$ total_vaccines_distributed : chr "86403442"
 $ last_updated: chr "2022-02-07 18:17:09"
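As an aside, the 25 bytes shown by head(resp$content, 25) can be decoded by hand with base R's rawToChar. This only illustrates the byte-to-text step that comes before JSON parsing; the parsing itself is still best left to httr::content.

```r
# The first 25 bytes of resp$content, copied from the earlier output
first_bytes <- as.raw(c(
  0x7b, 0x22, 0x64, 0x61, 0x74, 0x61, 0x22, 0x3a, 0x5b, 0x7b,
  0x22, 0x6c, 0x61, 0x74, 0x65, 0x73, 0x74, 0x5f, 0x64, 0x61,
  0x74, 0x65, 0x22, 0x3a, 0x22
))

# Each byte is one ASCII character of the JSON text
first_text <- rawToChar(first_bytes)
cat(first_text, "\n")
```

The decoded text is the opening of the JSON document, which lines up with the data and latest_date names in the parsed list.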

The returned data structure is a list of lists. data is a list with all of the summary statistics, while last_updated gives a timestamp of when the data was last updated. Put the data into a data frame:

summary_overall <- content_parsed$data %>% data.frame()
glimpse(summary_overall)
Rows: 1
Columns: 23
$ latest_date                 <chr> "2022-02-07"
$ change_cases                <chr> "10053"
$ change_fatalities           <chr> "85"
$ change_tests                <chr> "32403"
$ change_hospitalizations     <chr> "-108"
$ change_criticals            <chr> "-4"
$ change_recoveries           <chr> "10723"
$ change_vaccinations         <chr> "120982"
$ change_vaccinated           <chr> "47192"
$ change_boosters_1           <chr> "67481"
$ change_boosters_2           <chr> "816"
$ change_vaccines_distributed <chr> "0"
$ total_cases                 <chr> "3133071"
$ total_fatalities            <chr> "34804"
$ total_tests                 <chr> "57312382"
$ total_hospitalizations      <chr> "8397"
$ total_criticals             <chr> "1040"
$ total_recoveries            <chr> "2917286"
$ total_vaccinations          <chr> "79029175"
$ total_vaccinated            <chr> "30395498"
$ total_boosters_1            <chr> "16111453"
$ total_boosters_2            <chr> "115246"
$ total_vaccines_distributed  <chr> "86403442"

All of these variables are character type, and should be converted into integer and Date types:

summary_overall <- summary_overall %>%
  mutate(
    across(matches("^change|^total"), as.integer),
    across(matches("date"), as.Date)
  )
glimpse(summary_overall)
Rows: 1
Columns: 23
$ latest_date                 <date> 2022-02-07
$ change_cases                <int> 10053
$ change_fatalities           <int> 85
$ change_tests                <int> 32403
$ change_hospitalizations     <int> -108
$ change_criticals            <int> -4
$ change_recoveries           <int> 10723
$ change_vaccinations         <int> 120982
$ change_vaccinated           <int> 47192
$ change_boosters_1           <int> 67481
$ change_boosters_2           <int> 816
$ change_vaccines_distributed <int> 0
$ total_cases                 <int> 3133071
$ total_fatalities            <int> 34804
$ total_tests                 <int> 57312382
$ total_hospitalizations      <int> 8397
$ total_criticals             <int> 1040
$ total_recoveries            <int> 2917286
$ total_vaccinations          <int> 79029175
$ total_vaccinated            <int> 30395498
$ total_boosters_1            <int> 16111453
$ total_boosters_2            <int> 115246
$ total_vaccines_distributed  <int> 86403442
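One caveat with as.integer, offered as a general R fact rather than something this data currently triggers: R integers are 32-bit, so a count above .Machine$integer.max (2147483647) is converted to NA with a warning. All totals above are far below that limit, but a more defensive conversion could fall back to doubles. The helper name to_count is hypothetical.

```r
# Hypothetical helper: convert count strings to integer when they fit,
# otherwise keep them as doubles so that no value is lost
to_count <- function(x) {
  num <- as.numeric(x)
  if (all(abs(num) <= .Machine$integer.max, na.rm = TRUE)) {
    as.integer(num)
  } else {
    num
  }
}

to_count(c("57312382", "-108"))  # fits, returned as integer
to_count("3000000000")           # too large, returned as double
```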

Province

Instead of aggregating over all provinces, I can use /summary/split to get province-level summaries:

api_url <- paste0(base_url, "summary/split")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

str(content_parsed, max.level = 2)
List of 2
 $ data        :List of 13
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
  ..$ :List of 24
 $ last_updated: chr "2022-02-07 18:17:09"

The data list now has 13 lists corresponding to the 13 provinces and territories. Look at the structure of one of them:

str(content_parsed$data[[1]])
List of 24
 $ province                   : chr "ON"
 $ date                       : chr "2022-02-07"
 $ change_cases               : int 2088
 $ change_fatalities          : int 11
 $ change_tests               : int 12880
 $ change_hospitalizations    : int -75
 $ change_criticals           : int 0
 $ change_recoveries          : int 3556
 $ change_vaccinations        : int 76139
 $ change_vaccinated          : int 25058
 $ change_boosters_1          : int 43057
 $ change_boosters_2          : int 0
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 1056149
 $ total_fatalities           : int 11836
 $ total_tests                : int 22758270
 $ total_hospitalizations     : int 2155
 $ total_criticals            : int 486
 $ total_recoveries           : int 1010878
 $ total_vaccinations         : int 31025150
 $ total_vaccinated           : int 11827116
 $ total_boosters_1           : int 6604089
 $ total_boosters_2           : int 92886
 $ total_vaccines_distributed : int 33390981

This is the same structure as the overall summary, but with the extra variable province indicating that these numbers are specific to Ontario.

A shortcut to compiling all of these lists into a single data frame with a row per province/territory is to use dplyr::bind_rows:

summary_province <- bind_rows(content_parsed$data)
glimpse(summary_province)
Rows: 13
Columns: 24
$ province                    <chr> "ON", "QC", "NS", "NB", "MB", "B~
$ date                        <chr> "2022-02-07", "2022-02-07", "202~
$ change_cases                <int> 2088, 2240, 0, 0, 1107, 0, 0, 0,~
$ change_fatalities           <int> 11, 20, 0, 0, 15, 0, 0, 0, 39, 0~
$ change_tests                <int> 12880, 0, 0, 0, 4611, 0, 0, 0, 1~
$ change_hospitalizations     <int> -75, 14, 0, 0, -5, 0, 0, 0, -42,~
$ change_criticals            <int> 0, 1, 0, 0, -5, 0, 0, 0, 0, 0, 0~
$ change_recoveries           <int> 3556, 0, 0, 0, -29, 0, 0, 0, 719~
$ change_vaccinations         <int> 76139, 0, 0, 0, 13615, 0, 0, 0, ~
$ change_vaccinated           <int> 25058, 0, 0, 0, 4552, 0, 0, 0, 1~
$ change_boosters_1           <int> 43057, 0, 0, 0, 8081, 0, 0, 0, 1~
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0, 0, 0, 816, 0, ~
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_cases                 <int> 1056149, 883192, 39245, 30756, 1~
$ total_fatalities            <int> 11836, 13495, 158, 259, 1600, 26~
$ total_tests                 <int> 22758270, 16203377, 1723072, 698~
$ total_hospitalizations      <int> 2155, 2425, 95, 159, 702, 946, 1~
$ total_criticals             <int> 486, 178, 13, 17, 47, 139, 2, 31~
$ total_recoveries            <int> 1010878, 818774, 34587, 26941, 1~
$ total_vaccinations          <int> 31025150, 18256163, 2145666, 168~
$ total_vaccinated            <int> 11827116, 6955609, 809686, 64020~
$ total_boosters_1            <int> 6604089, 3858192, 458668, 347621~
$ total_boosters_2            <int> 92886, NA, NA, NA, NA, NA, NA, 1~
$ total_vaccines_distributed  <int> 33390981, 19822969, 2243162, 196~

Unlike the overall summary, the numeric values here were already parsed as integers, so only the date column still needs converting from character:

summary_province <- summary_province %>% mutate(date = as.Date(date))

Health region

Data may be split even further by health region with summary/split/hr:

api_url <- paste0(base_url, "summary/split/hr")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

str(content_parsed, max.level = 1)
List of 2
 $ data        :List of 92
 $ last_updated: chr "2022-02-07 18:17:09"

This data consists of 92 entries with mostly the same variables as previous summaries:

str(content_parsed$data[[1]])
List of 22
 $ hr_uid                 : int 6001
 $ date                   : chr "2022-02-03"
 $ change_cases           : int 22
 $ change_fatalities      : int 2
 $ change_tests           : int -28
 $ change_hospitalizations: NULL
 $ change_criticals       : NULL
 $ change_recoveries      : int 15
 $ change_vaccinations    : int 0
 $ change_vaccinated      : int 0
 $ change_boosters_1      : int 0
 $ change_boosters_2      : NULL
 $ total_cases            : int 3168
 $ total_fatalities       : int 18
 $ total_tests            : int 29464
 $ total_hospitalizations : NULL
 $ total_criticals        : NULL
 $ total_recoveries       : int 3041
 $ total_vaccinations     : int 90574
 $ total_vaccinated       : int 34856
 $ total_boosters_1       : int 18308
 $ total_boosters_2       : NULL

The differences are the hr_uid column in place of province, and the lack of change_vaccines_distributed and total_vaccines_distributed, presumably because these numbers aren’t available at this granularity.

summary_region <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(summary_region)
Rows: 92
Columns: 20
$ hr_uid                  <int> 6001, 6101, 6201, 591, 592, 593, 594~
$ date                    <date> 2022-02-03, 2022-02-03, 2022-02-03,~
$ change_cases            <int> 22, 142, 71, 177945, 533, 218, 216, ~
$ change_fatalities       <int> 2, 2, 0, 1438, 0, 6, 2, 6, NA, NA, N~
$ change_tests            <int> -28, 316, 78, 7141, 204, 50, 29, 593~
$ change_recoveries       <int> 15, 118, 91, NA, NA, NA, NA, NA, NA,~
$ change_vaccinations     <int> 0, NA, NA, 10944, 3203, 5453, 990, 9~
$ change_vaccinated       <int> 0, NA, NA, 1266, 503, 321, 264, 1199~
$ change_boosters_1       <int> 0, NA, NA, 9126, 2510, 4997, 643, 79~
$ total_cases             <int> 3168, 6363, 1891, 330638, 54112, 287~
$ total_fatalities        <int> 18, 17, 5, 2675, 322, 185, 287, 639,~
$ total_tests             <int> 29464, 39858, 31609, 10349, 1874, 85~
$ total_recoveries        <int> 3041, 5379, 1444, 123400, 39899, 210~
$ total_vaccinations      <int> 90574, 96926, 69834, 3852814, 165917~
$ total_vaccinated        <int> 34856, 38000, 25925, 1522022, 637703~
$ total_boosters_1        <int> 18308, 18272, 11770, 712827, 349656,~
$ total_hospitalizations  <int> NA, 0, NA, 946, 167, 121, 32, 249, 1~
$ total_criticals         <int> NA, 0, NA, 139, 31, 10, 14, 30, 5, 0~
$ change_hospitalizations <int> NA, NA, NA, 556, -10, 4, -2, -5, NA,~
$ change_criticals        <int> NA, NA, NA, 82, -1, 0, 2, -2, NA, NA~
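The single record shown earlier has 22 fields, yet the bound data frame has 20 columns. This is consistent with NULL fields being dropped before binding: a field that is NULL in every region (here the boosters_2 pair) never becomes a column, while one that is NULL in only some regions is filled with NA. The base R sketch below mimics that behaviour on two cut-down records using values from the output above; rows_to_df is a hypothetical helper and not how dplyr implements bind_rows.

```r
records <- list(
  list(hr_uid = 6001L, total_cases = 3168L,
       total_hospitalizations = NULL, total_boosters_2 = NULL),
  list(hr_uid = 6101L, total_cases = 6363L,
       total_hospitalizations = 0L, total_boosters_2 = NULL)
)

# Hypothetical helper: bind records row-wise, ignoring NULL fields
rows_to_df <- function(records) {
  # Drop NULL fields from every record
  records <- lapply(records, function(rec) Filter(Negate(is.null), rec))
  # Collect column names in order of first appearance
  cols <- unique(unlist(lapply(records, names)))
  # Build one-row data frames, filling absent fields with NA
  rows <- lapply(records, function(rec) {
    row <- lapply(cols, function(col) {
      if (is.null(rec[[col]])) NA else rec[[col]]
    })
    names(row) <- cols
    as.data.frame(row, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

small_region <- rows_to_df(records)
small_region
```

In this sketch total_boosters_2 disappears entirely and total_hospitalizations is NA for the first record, which matches the pattern in the glimpse output.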

hr_uid is a unique identifier for each health region. A lookup table is available through the API's regions endpoint:

api_url <- paste0(base_url, "regions")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

str(content_parsed, max.level = 1)
List of 1
 $ data:List of 92

There are 92 elements, matching the 92 health regions in the summary data, with the following structure:

regions <- bind_rows(content_parsed$data)
glimpse(regions)
Rows: 92
Columns: 4
$ hr_uid   <int> 471, 472, 473, 474, 475, 476, 591, 592, 593, 594, 5~
$ province <chr> "SK", "SK", "SK", "SK", "SK", "SK", "BC", "BC", "BC~
$ engname  <chr> "Far North", "North", "Central", "Saskatoon", "Regi~
$ frename  <chr> "Far North", "North", "Central", "Saskatoon", "Regi~

Add the health region to the summary_region data:

summary_region <- regions %>%
  left_join(summary_region, by = "hr_uid")
glimpse(summary_region)
Rows: 92
Columns: 23
$ hr_uid                  <int> 471, 472, 473, 474, 475, 476, 591, 5~
$ province                <chr> "SK", "SK", "SK", "SK", "SK", "SK", ~
$ engname                 <chr> "Far North", "North", "Central", "Sa~
$ frename                 <chr> "Far North", "North", "Central", "Sa~
$ date                    <date> 2022-02-06, 2022-02-06, 2022-02-06,~
$ change_cases            <int> 37, 366, 139, 509, 413, 361, 177945,~
$ change_fatalities       <int> 0, 1, 1, 0, 3, 2, 1438, 0, 6, 2, 6, ~
$ change_tests            <int> 109, -1098, 375, 1105, 883, 771, 714~
$ change_recoveries       <int> 83, 428, 266, 829, 459, 416, NA, NA,~
$ change_vaccinations     <int> 117, 412, 192, 527, 432, 261, 10944,~
$ change_vaccinated       <int> 70, 287, 150, 355, 325, 201, 1266, 5~
$ change_boosters_1       <int> NA, NA, NA, NA, NA, NA, 9126, 2510, ~
$ total_cases             <int> 10614, 26088, 10035, 32115, 26257, 1~
$ total_fatalities        <int> 85, 258, 90, 190, 219, 169, 2675, 32~
$ total_tests             <int> 66969, 206300, 115789, 380252, 29050~
$ total_recoveries        <int> 10248, 24271, 9069, 29440, 23925, 14~
$ total_vaccinations      <int> 70534, 319807, 201758, 513380, 43041~
$ total_vaccinated        <int> 33236, 152611, 98324, 249513, 208970~
$ total_boosters_1        <int> NA, NA, NA, NA, NA, NA, 712827, 3496~
$ total_hospitalizations  <int> 2, 61, 22, 171, 54, 22, 946, 167, 12~
$ total_criticals         <int> 0, 7, 1, 17, 5, 1, 139, 31, 10, 14, ~
$ change_hospitalizations <int> -1, -1, -3, -15, -9, -2, 556, -10, 4~
$ change_criticals        <int> 0, 2, -1, -3, -1, 0, 82, -1, 0, 2, -~

Reports

Overall

Reports are much like summaries, but for every day instead of just the most recent.

api_url <- paste0(base_url, "reports")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

str(content_parsed, max.level = 1)
List of 3
 $ province    : chr "All"
 $ last_updated: chr "2022-02-07 18:17:09"
 $ data        :List of 745

An additional top-level variable province defines the scope of the report. The data list here consists of 745 elements, one per day, with the following structure:

str(content_parsed$data[[1]])
List of 23
 $ date                       : chr "2020-01-25"
 $ change_cases               : int 1
 $ change_fatalities          : int 0
 $ change_tests               : int 2
 $ change_hospitalizations    : int 0
 $ change_criticals           : int 0
 $ change_recoveries          : int 0
 $ change_vaccinations        : int 0
 $ change_vaccinated          : int 0
 $ change_boosters_1          : int 0
 $ change_boosters_2          : NULL
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 1
 $ total_fatalities           : int 0
 $ total_tests                : int 2
 $ total_hospitalizations     : int 0
 $ total_criticals            : int 0
 $ total_recoveries           : int 0
 $ total_vaccinations         : int 0
 $ total_vaccinated           : int 0
 $ total_boosters_1           : int 0
 $ total_boosters_2           : NULL
 $ total_vaccines_distributed : int 0

This first element has many zeroes, which makes sense as the date (January 25th, 2020) corresponds to the first confirmed case of COVID in Canada. The last element of this list should have today’s data:

str(content_parsed$data[[length(content_parsed$data)]])
List of 23
 $ date                       : chr "2022-02-07"
 $ change_cases               : int 10053
 $ change_fatalities          : int 85
 $ change_tests               : int 32403
 $ change_hospitalizations    : int -108
 $ change_criticals           : int -4
 $ change_recoveries          : int 10723
 $ change_vaccinations        : int 120982
 $ change_vaccinated          : int 47192
 $ change_boosters_1          : int 67481
 $ change_boosters_2          : int 816
 $ change_vaccines_distributed: int 0
 $ total_cases                : int 3133071
 $ total_fatalities           : int 34804
 $ total_tests                : int 57312382
 $ total_hospitalizations     : int 8397
 $ total_criticals            : int 1040
 $ total_recoveries           : int 2917286
 $ total_vaccinations         : int 79029175
 $ total_vaccinated           : int 30395498
 $ total_boosters_1           : int 16111453
 $ total_boosters_2           : int 115246
 $ total_vaccines_distributed : int 86403442

The data may be bound together in the same way:

report_overall <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))

Province

To split data by province, the two-letter code is provided as reports/province/{code}:

api_url <- paste0(base_url, "reports/province/ns")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

report_ns <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(report_ns)
Rows: 745
Columns: 22
$ date                        <date> 2020-01-25, 2020-01-26, 2020-01~
$ change_cases                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_fatalities           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_tests                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_hospitalizations     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_criticals            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_recoveries           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_vaccinations         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_vaccinated           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_boosters_1           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_cases                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_fatalities            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_tests                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_hospitalizations      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_criticals             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_recoveries            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_vaccinations          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_vaccinated            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_boosters_1            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_vaccines_distributed  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ change_boosters_2           <int> NA, NA, NA, NA, NA, NA, NA, NA, ~
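Since the province code is just the last path segment, report URLs for several provinces can be generated in one go. A small sketch, with province_report_url as a hypothetical helper name; the codes are lower-cased only to match the form used in this post, and I have not checked whether the API also accepts upper case.

```r
base_url <- "https://api.covid19tracker.ca/"

# Hypothetical helper: build the report URL for one or more province codes
province_report_url <- function(code) {
  paste0(base_url, "reports/province/", tolower(code))
}

province_report_url(c("NS", "AB", "QC"))
```

Each resulting URL could then be passed to httr::GET in the same way as above.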

Health region

Similarly, provide the hr_uid in reports/regions/{hr_uid} to get health region reports:

api_url <- paste0(base_url, "reports/regions/1204")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

report_ns_central <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(report_ns_central)
Rows: 729
Columns: 7
$ date              <date> 2020-01-17, 2020-01-18, 2020-01-19, 2020-~
$ change_cases      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ change_fatalities <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ total_cases       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ total_fatalities  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ change_recoveries <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ total_recoveries  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~

I chose Nova Scotia central because it is where I live and, looking at this data, it clearly isn’t being updated day-to-day:

tail(report_ns_central) %>% glimpse()
Rows: 6
Columns: 7
$ date              <date> 2022-02-01, 2022-02-02, 2022-02-03, 2022-~
$ change_cases      <int> NA, NA, NA, NA, NA, NA
$ change_fatalities <int> NA, NA, NA, NA, NA, NA
$ total_cases       <int> 6781, 6781, 6781, 6781, 6781, 6781
$ total_fatalities  <int> 87, 87, 87, 87, 87, 87
$ change_recoveries <int> NA, NA, NA, NA, NA, NA
$ total_recoveries  <int> 6344, 6344, 6344, 6344, 6344, 6344

There have, unfortunately, been hundreds of cases per day here recently. Those numbers do show up in the province report, however:

tail(report_ns) %>% glimpse()
Rows: 6
Columns: 22
$ date                        <date> 2022-02-02, 2022-02-03, 2022-02~
$ change_cases                <int> 395, 401, 594, 382, 349, 0
$ change_fatalities           <int> 6, 4, 1, 0, 0, 0
$ change_tests                <int> 3115, 3950, 3239, 0, 0, 0
$ change_hospitalizations     <int> -3, 5, 2, 3, -7, 0
$ change_criticals            <int> 0, 0, 2, -1, -1, 0
$ change_recoveries           <int> 387, 382, 471, 0, 0, 0
$ change_vaccinations         <int> 9855, 10978, 9559, 0, 0, 0
$ change_vaccinated           <int> 1787, 2050, 1799, 0, 0, 0
$ change_boosters_1           <int> 7501, 8461, 7300, 0, 0, 0
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0
$ total_cases                 <int> 37519, 37920, 38514, 38896, 3924~
$ total_fatalities            <int> 153, 157, 158, 158, 158, 158
$ total_tests                 <int> 1715883, 1719833, 1723072, 17230~
$ total_hospitalizations      <int> 92, 97, 99, 102, 95, 95
$ total_criticals             <int> 13, 13, 15, 14, 13, 13
$ total_recoveries            <int> 33734, 34116, 34587, 34587, 3458~
$ total_vaccinations          <int> 2125129, 2136107, 2145666, 21456~
$ total_vaccinated            <int> 805837, 807887, 809686, 809686, ~
$ total_boosters_1            <int> 442907, 451368, 458668, 458668, ~
$ total_vaccines_distributed  <int> 2243162, 2243162, 2243162, 22431~
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0
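A quick way to flag a report that has stopped updating is to check whether its cumulative totals have changed at all over the last few days. The sketch below uses values copied from the two outputs above; is_stale is a hypothetical helper, and a flat total is only a hint, since a region can genuinely have no new cases.

```r
# Hypothetical helper: TRUE when a cumulative series has not changed at all
is_stale <- function(total) {
  all(diff(total) == 0)
}

# total_cases from tail(report_ns_central)
is_stale(c(6781, 6781, 6781, 6781, 6781, 6781))  # TRUE

# First four total_cases values from tail(report_ns)
is_stale(c(37519, 37920, 38514, 38896))          # FALSE
```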

Parameters

The reports have a number of optional parameters available to alter the API request.

The fill_dates option controls whether dates with missing entries are filled in:

content_parsed <- paste0(base_url, "reports/regions/1204?fill_dates=false") %>%
  httr::GET() %>%
  content(as = "parsed")
bind_rows(content_parsed$data) %>% glimpse()
Rows: 752
Columns: 8
$ date              <chr> "2020-01-17", "2020-01-18", "2020-01-19", ~
$ change_cases      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ change_fatalities <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ total_cases       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ total_fatalities  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ change_recoveries <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ total_recoveries  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ fill              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~

There aren’t any missing dates in this report, so the fill_dates parameter makes no difference here.

The stat argument allows one to specify a single statistic to return:

content_parsed <- paste0(base_url, "reports/province/ns?stat=cases") %>%
  httr::GET() %>%
  content(as = "parsed")
bind_rows(content_parsed$data) %>% glimpse()
Rows: 745
Columns: 3
$ date         <chr> "2020-01-25", "2020-01-26", "2020-01-27", "2020~
$ change_cases <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ total_cases  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~

The date parameter returns a report from a single date:

content_parsed <- paste0(base_url, "reports/province/ab?date=2021-12-25") %>%
  httr::GET() %>%
  content(as = "parsed")
bind_rows(content_parsed$data) %>% glimpse()
Rows: 1
Columns: 21
$ date                        <chr> "2021-12-25"
$ change_cases                <int> 2484
$ change_fatalities           <int> 0
$ change_tests                <int> 11479
$ change_hospitalizations     <int> 0
$ change_criticals            <int> 0
$ change_recoveries           <int> 0
$ change_vaccinations         <int> 0
$ change_vaccinated           <int> 0
$ change_boosters_1           <int> 0
$ change_vaccines_distributed <int> 0
$ total_cases                 <int> 351199
$ total_fatalities            <int> 3299
$ total_tests                 <int> 6347374
$ total_hospitalizations      <int> 318
$ total_criticals             <int> 64
$ total_recoveries            <int> 335047
$ total_vaccinations          <int> 7452649
$ total_vaccinated            <int> 3211241
$ total_boosters_1            <int> 761153
$ total_vaccines_distributed  <int> 8799859

Lastly, the after and before parameters return reports on or after, and on or before, the specified dates:

content_parsed <-
  paste0(base_url, "reports/province/qc?after=2021-12-24&before=2021-12-26") %>%
  httr::GET() %>%
  content(as = "parsed")
bind_rows(content_parsed$data) %>% glimpse()
Rows: 3
Columns: 21
$ date                        <chr> "2021-12-24", "2021-12-25", "202~
$ change_cases                <int> 10031, 9206, 8231
$ change_fatalities           <int> 2, 4, 10
$ change_tests                <int> 55863, 53334, 44022
$ change_hospitalizations     <int> 0, 0, 0
$ change_criticals            <int> 0, 0, 0
$ change_recoveries           <int> 3017, 3559, 0
$ change_vaccinations         <int> 97263, 0, 0
$ change_vaccinated           <int> 3523, 0, 0
$ change_boosters_1           <int> 87581, 0, 0
$ change_vaccines_distributed <int> 0, 0, 0
$ total_cases                 <int> 521126, 530332, 538563
$ total_fatalities            <int> 11660, 11664, 11674
$ total_tests                 <int> 14573238, 14626572, 14670594
$ total_hospitalizations      <int> 473, 473, 473
$ total_criticals             <int> 91, 91, 91
$ total_recoveries            <int> 460647, 464206, 464206
$ total_vaccinations          <int> 14963214, 14963214, 14963214
$ total_vaccinated            <int> 6678444, 6678444, 6678444
$ total_boosters_1            <int> 997853, 997853, 997853
$ total_vaccines_distributed  <int> 16179459, 16179459, 16179459

Note how parameters can be combined as above, by separating the arguments with &.
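Assembling the query string by hand gets error-prone once there are several parameters. The sketch below builds it from a named list using base R only; add_query is a hypothetical helper. httr::GET also accepts a query argument (a named list), which does the same job and handles URL encoding.

```r
base_url <- "https://api.covid19tracker.ca/"

# Hypothetical helper: append named parameters to a URL as a query string.
# Values are pasted as-is (no URL encoding), which is fine for the dates
# and simple keywords used in this post.
add_query <- function(url, params) {
  if (length(params) == 0) return(url)
  pairs <- paste0(names(params), "=", unlist(params))
  paste0(url, "?", paste(pairs, collapse = "&"))
}

add_query(
  paste0(base_url, "reports/province/qc"),
  list(after = "2021-12-24", before = "2021-12-26")
)
```

This reproduces the URL used in the Quebec request above.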

Vaccination data

We have already seen the vaccination data returned by summary and report requests. The variables include:

At the summary level:

summary_province %>%
  filter(province == "NS") %>%
  select(date, matches("vacc|boost")) %>%
  glimpse()
Rows: 1
Columns: 11
$ date                        <date> 2022-02-07
$ change_vaccinations         <int> 0
$ change_vaccinated           <int> 0
$ change_boosters_1           <int> 0
$ change_boosters_2           <int> 0
$ change_vaccines_distributed <int> 0
$ total_vaccinations          <int> 2145666
$ total_vaccinated            <int> 809686
$ total_boosters_1            <int> 458668
$ total_boosters_2            <int> NA
$ total_vaccines_distributed  <int> 2243162

At the report level:

report_ns %>%
  select(date, matches("vacc|boost")) %>%
  tail() %>%
  glimpse()
Rows: 6
Columns: 10
$ date                        <date> 2022-02-02, 2022-02-03, 2022-02~
$ change_vaccinations         <int> 9855, 10978, 9559, 0, 0, 0
$ change_vaccinated           <int> 1787, 2050, 1799, 0, 0, 0
$ change_boosters_1           <int> 7501, 8461, 7300, 0, 0, 0
$ change_vaccines_distributed <int> 0, 0, 0, 0, 0, 0
$ total_vaccinations          <int> 2125129, 2136107, 2145666, 21456~
$ total_vaccinated            <int> 805837, 807887, 809686, 809686, ~
$ total_boosters_1            <int> 442907, 451368, 458668, 458668, ~
$ total_vaccines_distributed  <int> 2243162, 2243162, 2243162, 22431~
$ change_boosters_2           <int> 0, 0, 0, 0, 0, 0

Subregions

Vaccination data is also available at the subregion level for certain provinces and territories. The API documentation doesn’t actually specify which provinces and territories, but I can find out by requesting the data as follows:

api_url <- paste0(base_url, "reports/sub-regions/summary")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")

subregion_vacc_summary <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(subregion_vacc_summary)
Rows: 806
Columns: 11
$ code           <chr> "ON001", "ON002", "ON003", "ON004", "ON005", ~
$ date           <date> 2021-12-10, 2021-12-10, 2021-12-10, 2021-12-~
$ total_dose_1   <int> 94206, 16667, 42626, 32777, 33222, 38548, 264~
$ percent_dose_1 <chr> "0.80220", "0.74250", "0.76330", "0.78180", "~
$ source_dose_1  <chr> "percent", "percent", "percent", "percent", "~
$ total_dose_2   <int> 90037, 16086, 41001, 31590, 32136, 36948, 254~
$ percent_dose_2 <chr> "0.76670", "0.71660", "0.73420", "0.75350", "~
$ source_dose_2  <chr> "percent", "percent", "percent", "percent", "~
$ total_dose_3   <int> 5249, 644, 2167, 1711, 2193, 3099, 1278, 6384~
$ percent_dose_3 <chr> "0.04470", "0.02870", "0.03880", "0.04080", "~
$ source_dose_3  <chr> "percent", "percent", "percent", "percent", "~
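Note that the percent_* columns come back as character strings holding proportions. For plotting or arithmetic they would need converting; a minimal sketch using the first three percent_dose_1 values from the output above:

```r
percent_dose_1 <- c("0.80220", "0.74250", "0.76330")

# Convert proportion strings to numeric percentages
pct_dose_1 <- as.numeric(percent_dose_1) * 100
round(pct_dose_1, 2)
```

Within the tidyverse pipeline used in this post, the same conversion could be written with mutate and across over the columns matching "^percent".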

The code labels can be retrieved via sub-regions:

api_url <- paste0(base_url, "sub-regions")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")
subregions <- bind_rows(content_parsed$data)
glimpse(subregions)
Rows: 806
Columns: 5
$ code       <chr> "AB001", "AB002", "AB003", "AB004", "AB005", "AB0~
$ province   <chr> "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "~
$ zone       <chr> "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOU~
$ region     <chr> "CROWSNEST PASS", "PINCHER CREEK", "FORT MACLEOD"~
$ population <int> 6280, 8344, 6753, 16595, 25820, 19028, 11104, 640~

There are 806 subregions, which matches the count from the summary, with the following distribution by province:

subregions %>% count(province)
# A tibble: 6 x 2
  province     n
  <chr>    <int>
1 AB         132
2 MB          79
3 NL          38
4 NT          30
5 ON         514
6 SK          13

Vaccine age groups

Overall

Vaccine data by age groups is reported week-by-week, and accessed with vaccines/age-groups:

api_url <- paste0(base_url, "vaccines/age-groups")
resp <- httr::GET(api_url)
content_parsed <- content(resp, as = "parsed")
vaccine_age_groups <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(vaccine_age_groups)
Rows: 59
Columns: 2
$ date <date> 2020-12-19, 2020-12-26, 2021-01-02, 2021-01-09, 2021-0~
$ data <chr> "{\"80+\": {\"full\": 0, \"group\": \"80+\", \"partial\~

The data here is returned as an un-parsed JSON string. Per the API documentation, it has to do with shifting reporting standards across weeks:

due to reporting standard shifts overtime, the JSON string data may not be consistent across weeks. Minimal effort is taken to normalize some of this data.

Look at the first element of data:

vaccine_age_groups$data[[1]] %>% str_trunc(80)
[1] "{\"80+\": {\"full\": 0, \"group\": \"80+\", \"partial\": 335, \"atleast1\": 335}, \"0-15\":..."

Parse the JSON:

jsonlite::fromJSON(vaccine_age_groups$data[[1]]) %>%
  str()
List of 8
 $ 80+         :List of 4
  ..$ full    : int 0
  ..$ group   : chr "80+"
  ..$ partial : int 335
  ..$ atleast1: int 335
 $ 0-15        :List of 4
  ..$ full    : int 0
  ..$ group   : chr "0-15"
  ..$ partial : int 0
  ..$ atleast1: int 0
 $ 16-69       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "16-69"
  ..$ partial : int 11768
  ..$ atleast1: int 11768
 $ 70-74       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "70-74"
  ..$ partial : int 174
  ..$ atleast1: int 174
 $ 75-79       :List of 4
  ..$ full    : int 0
  ..$ group   : chr "75-79"
  ..$ partial : int 85
  ..$ atleast1: int 85
 $ unknown     :List of 4
  ..$ full    : int 0
  ..$ group   : chr "Unknown"
  ..$ partial : int 0
  ..$ atleast1: int 0
 $ all_ages    :List of 4
  ..$ full    : int 0
  ..$ group   : chr "All ages"
  ..$ partial : int 12362
  ..$ atleast1: int 12362
 $ not_reported:List of 4
  ..$ full    : int 0
  ..$ group   : chr "Not reported"
  ..$ partial : int 0
  ..$ atleast1: int 0

To see how the reporting has changed over time, here is the most recent age group vaccination data:

jsonlite::fromJSON(
  vaccine_age_groups$data[[length(vaccine_age_groups$data)]]
) %>%
  str()
List of 13
 $ 0-4         :List of 4
  ..$ full    : int 5
  ..$ group   : chr "0-4"
  ..$ partial : int 316
  ..$ atleast1: int 321
 $ 80+         :List of 4
  ..$ full    : int 1655401
  ..$ group   : chr "80+"
  ..$ partial : int 30058
  ..$ atleast1: int 1685459
 $ 05-11       :List of 4
  ..$ full    : int 463426
  ..$ group   : chr "05-11"
  ..$ partial : int 1108954
  ..$ atleast1: int 1572380
 $ 12-17       :List of 4
  ..$ full    : int 2053646
  ..$ group   : chr "12-17"
  ..$ partial : int 112114
  ..$ atleast1: int 2165760
 $ 18-29       :List of 4
  ..$ full    : int 5015063
  ..$ group   : chr "18-29"
  ..$ partial : int 257319
  ..$ atleast1: int 5272382
 $ 30-39       :List of 4
  ..$ full    : int 4550284
  ..$ group   : chr "30-39"
  ..$ partial : int 182607
  ..$ atleast1: int 4732891
 $ 40-49       :List of 4
  ..$ full    : int 4301162
  ..$ group   : chr "40-49"
  ..$ partial : int 120528
  ..$ atleast1: int 4421690
 $ 50-59       :List of 4
  ..$ full    : int 4558823
  ..$ group   : chr "50-59"
  ..$ partial : int 102525
  ..$ atleast1: int 4661348
 $ 60-69       :List of 4
  ..$ full    : int 4498215
  ..$ group   : chr "60-69"
  ..$ partial : int 80600
  ..$ atleast1: int 4578815
 $ 70-79       :List of 4
  ..$ full    : int 2996554
  ..$ group   : chr "70-79"
  ..$ partial : int 42435
  ..$ atleast1: int 3038989
 $ unknown     :List of 4
  ..$ full    : int 1989
  ..$ group   : chr "Unknown"
  ..$ partial : int 2322
  ..$ atleast1: int 4311
 $ all_ages    :List of 4
  ..$ full    : int 30094568
  ..$ group   : chr "All ages"
  ..$ partial : int 2039778
  ..$ atleast1: int 32134346
 $ not_reported:List of 4
  ..$ full    : int 0
  ..$ group   : chr "Not reported"
  ..$ partial : int 0
  ..$ atleast1: int 0
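
One way to make the change in reporting explicit is to compare the group codes of two weeks with set operations. The sketch below assumes abbreviated JSON strings that keep only three groups from each of the two weeks printed above, so the resulting sets are illustrative rather than complete.

```r
library(jsonlite)

# Abbreviated JSON modelled on the first reported week (three groups only)
first_week <- '{
  "80+":   {"full": 0, "group": "80+",   "partial": 335,   "atleast1": 335},
  "0-15":  {"full": 0, "group": "0-15",  "partial": 0,     "atleast1": 0},
  "16-69": {"full": 0, "group": "16-69", "partial": 11768, "atleast1": 11768}
}'

# Abbreviated JSON modelled on the most recent week (three groups only)
latest_week <- '{
  "0-4":   {"full": 5,       "group": "0-4",   "partial": 316,     "atleast1": 321},
  "80+":   {"full": 1655401, "group": "80+",   "partial": 30058,   "atleast1": 1685459},
  "05-11": {"full": 463426,  "group": "05-11", "partial": 1108954, "atleast1": 1572380}
}'

# The top-level names of each parsed object are the group codes
groups_first <- names(fromJSON(first_week))
groups_latest <- names(fromJSON(latest_week))

dropped <- setdiff(groups_first, groups_latest)  # reported then, not now
added <- setdiff(groups_latest, groups_first)    # reported now, not then
kept <- intersect(groups_first, groups_latest)   # reported in both weeks
```

Applied to the full strings in vaccine_age_groups$data, the same three calls would list every group that was retired, introduced, or kept.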

Each JSON data point can be converted to a data frame as follows:

jsonlite::fromJSON(vaccine_age_groups$data[[1]]) %>%
  bind_rows(.id = "group_code")
# A tibble: 8 x 5
  group_code    full group        partial atleast1
  <chr>        <int> <chr>          <int>    <int>
1 80+              0 80+              335      335
2 0-15             0 0-15               0        0
3 16-69            0 16-69          11768    11768
4 70-74            0 70-74            174      174
5 75-79            0 75-79             85       85
6 unknown          0 Unknown            0        0
7 all_ages         0 All ages       12362    12362
8 not_reported     0 Not reported       0        0

Use map and unnest to apply this to each row of the data:

vaccine_age_groups <- vaccine_age_groups %>%
  mutate(
    data = map(
      data,
      ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code")
    )
  ) %>%
  unnest(data)
glimpse(vaccine_age_groups)
Rows: 668
Columns: 6
$ date       <date> 2020-12-19, 2020-12-19, 2020-12-19, 2020-12-19, ~
$ group_code <chr> "80+", "0-15", "16-69", "70-74", "75-79", "unknow~
$ full       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ group      <chr> "80+", "0-15", "16-69", "70-74", "75-79", "Unknow~
$ partial    <int> 335, 0, 11768, 174, 85, 0, 12362, 0, 2229, 5, 401~
$ atleast1   <int> 335, 0, 11768, 174, 85, 0, 12362, 0, 2229, 5, 401~
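
Because the same parse-and-unnest step is needed for every age-group query, it could be wrapped in a small helper. The sketch below is one possible version: parse_age_groups is a hypothetical name, not part of any package, and the toy input keeps only two groups from the first two weeks shown above.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: convert date to Date, parse each JSON string in `data`
# into a data frame with one row per group, then unnest into a long format
parse_age_groups <- function(df) {
  df %>%
    mutate(
      date = as.Date(date),
      data = map(
        data,
        ~ jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code")
      )
    ) %>%
    unnest(data)
}

# Toy input: two weeks, each abbreviated to two age groups
toy <- tibble(
  date = c("2020-12-19", "2020-12-26"),
  data = c(
    '{"80+": {"full": 0, "group": "80+", "partial": 335, "atleast1": 335},
      "0-15": {"full": 0, "group": "0-15", "partial": 0, "atleast1": 0}}',
    '{"80+": {"full": 0, "group": "80+", "partial": 2229, "atleast1": 2229},
      "0-15": {"full": 0, "group": "0-15", "partial": 5, "atleast1": 5}}'
  )
)

parsed <- parse_age_groups(toy)
parsed
```

The result has one row per date and group, with the same six columns as the glimpse output above.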

The unique groups:

vaccine_age_groups %>% count(group_code, group) %>% rmarkdown::paged_table()

Visualize how the age ranges evolve over time:

# Make it a function that will allow splits later
plot_age_ranges <- function(vaccine_age_groups, split = "overall", ncol = 3) {
  p <- vaccine_age_groups %>%
    filter(str_detect(group, "\\d")) %>%
    separate(group, into = c("age_min", "age_max"),
             sep = "-", fill = "right", remove = FALSE) %>%
    mutate(
      age_min = readr::parse_number(age_min),
      # Set the upper range of the age to 100 (arbitrarily)
      age_max = replace_na(age_max, replace = "100") %>% as.numeric(),
      age_mid = (age_max + age_min) / 2,
      group = fct_reorder(group, age_mid)
    ) %>%
    ggplot(aes(x = date, color = group)) +
    geom_errorbar(aes(ymin = age_min, ymax = age_max)) +
    geom_text(
      data = . %>%
        slice_min(date) %>%
        mutate(age_mid = (age_max + age_min) / 2),
      aes(label = group, y = age_mid),
      hjust = 1, nudge_x = -3, show.legend = FALSE
    ) +
    geom_text(
      data = . %>%
        slice_max(date) %>%
        mutate(age_mid = (age_max + age_min) / 2),
      aes(label = group, y = age_mid),
      hjust = 0, nudge_x = 3, show.legend = FALSE
    ) +
    expand_limits(x = c(min(vaccine_age_groups$date) - 10,
                        max(vaccine_age_groups$date) + 10)) +
    scale_color_viridis_d(end = 0.8) +
    theme(legend.position = "none") +
    labs(x = "Date", y = "Age",
         title = "Age ranges for weekly vaccination reports, by date")
  
  if (split == "province") p + facet_wrap(~province, ncol = ncol)
  else if (split == "region") p + facet_wrap(~hr_uid, ncol = ncol)
  else {p}
}

plot_age_ranges(vaccine_age_groups)

Unsurprisingly, the age ranges become more granular over time, with the exception of 70-79, which was originally split into 70-74 and 75-79.

Province

As with the other data, adding /split to the query returns vaccination data by province:

content_parsed <- paste0(base_url, "vaccines/age-groups/split") %>%
  httr::GET() %>%
  content(as = "parsed")
vaccine_age_groups_province <- bind_rows(content_parsed$data) %>%
  mutate(date = as.Date(date))
glimpse(vaccine_age_groups_province)
Rows: 1,727
Columns: 3
$ date     <date> 2020-12-14, 2020-12-15, 2020-12-16, 2020-12-16, 20~
$ data     <chr> "{\"0-4\": {\"full\": 0, \"group\": \"0-4\", \"part~
$ province <chr> "QC", "QC", "QC", "ON", "QC", "ON", "QC", "ON", "BC~

Parse the JSON in each row, as before:

vaccine_age_groups_province <- vaccine_age_groups_province %>%
  mutate(
    data = map(
      data,
      ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code")
    )
  ) %>%
  unnest(data)
glimpse(vaccine_age_groups_province)
Rows: 17,495
Columns: 7
$ date       <date> 2020-12-14, 2020-12-14, 2020-12-14, 2020-12-14, ~
$ group_code <chr> "0-4", "80+", "05-11", "12-17", "18-29", "30-39",~
$ full       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ group      <chr> "0-4", "80+", "05-11", "12-17", "18-29", "30-39",~
$ partial    <int> 1, 169, 0, 0, 9, 11, 23, 22, 26, 35, 1, 328, 0, 2~
$ atleast1   <int> 1, 169, 0, 0, 9, 11, 23, 22, 26, 35, 1, 328, 0, 2~
$ province   <chr> "QC", "QC", "QC", "QC", "QC", "QC", "QC", "QC", "~

Plot the age ranges for a single province, here Quebec:

vaccine_age_groups_province %>%
  filter(province == "QC") %>%
  plot_age_ranges(split = "province", ncol = 1)

A single province can also be obtained by altering the query with vaccines/age-groups/province/{code}:

content_parsed <- paste0(base_url, "vaccines/age-groups/province/ns") %>%
  httr::GET() %>%
  content(as = "parsed")
vaccine_age_groups_ns <- bind_rows(content_parsed$data) %>%
  mutate(
    date = as.Date(date),
    data = map(data, ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code"))
  ) %>%
  unnest(data)
plot_age_ranges(vaccine_age_groups_ns)

Parameters

This query also has the after and before parameters available:

content_parsed <- paste0(base_url,
                         "vaccines/age-groups/province/ns?after=2021-11-01") %>%
  httr::GET() %>%
  content(as = "parsed")
glimpse(bind_rows(content_parsed$data))
Rows: 13
Columns: 2
$ date <chr> "2021-11-06", "2021-11-13", "2021-11-20", "2021-11-27",~
$ data <chr> "{\"0-4\": {\"full\": 0, \"group\": \"0-4\", \"partial\~

A specific age group can also be queried with the group parameter. The value must be URL-encoded. For example, the 80+ range:

content_parsed <- paste0(base_url,
                         "vaccines/age-groups?after=2021-11-01&group=80%2B") %>%
  httr::GET() %>%
  content(as = "parsed")
bind_rows(content_parsed$data) %>%
  mutate(
    date = as.Date(date),
    data = map(data, ~jsonlite::fromJSON(.x) %>% bind_rows(.id = "group_code"))
  ) %>%
  unnest(data) %>%
  glimpse()
Rows: 13
Columns: 5
$ date     <date> 2021-11-06, 2021-11-13, 2021-11-20, 2021-11-27, 20~
$ full     <int> 1581895, 1585409, 1588815, 1592112, 1630884, 163343~
$ group    <chr> "80+", "80+", "80+", "80+", "80+", "80+", "80+", "8~
$ partial  <int> 39515, 38628, 37810, 37087, 28833, 28621, 28542, 30~
$ atleast1 <int> 1621410, 1624037, 1626625, 1629199, 1659717, 166205~

The utils package has a URLencode function for translating the age groups:

vaccine_age_groups %>%
  distinct(group_code) %>%
  mutate(group_encoded = utils::URLencode(group_code, reserved = TRUE))
# A tibble: 19 x 2
   group_code   group_encoded
   <chr>        <chr>        
 1 80+          80%2B        
 2 0-15         0-15         
 3 16-69        16-69        
 4 70-74        70-74        
 5 75-79        75-79        
 6 unknown      unknown      
 7 all_ages     all_ages     
 8 not_reported not_reported 
 9 0-17         0-17         
10 18-69        18-69        
11 18-29        18-29        
12 30-39        30-39        
13 40-49        40-49        
14 50-59        50-59        
15 60-69        60-69        
16 70-79        70-79        
17 0-4          0-4          
18 05-11        05-11        
19 12-17        12-17        
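
To avoid hand-writing the query string, the encoding step can be folded into a small URL builder. The following is a minimal sketch in base R; build_query_url is a hypothetical helper, not part of httr or the API, and it only assembles the URL without sending a request.

```r
# Hypothetical helper: append URL-encoded query parameters to an endpoint
build_query_url <- function(base_url, path, ...) {
  params <- list(...)
  # Drop parameters that were passed as NULL
  params <- params[!vapply(params, is.null, logical(1))]
  if (length(params) == 0) {
    return(paste0(base_url, path))
  }
  # Encode reserved characters such as "+" in the parameter values
  encoded <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  paste0(
    base_url, path, "?",
    paste0(names(params), "=", encoded, collapse = "&")
  )
}

api_url <- build_query_url(
  "https://api.covid19tracker.ca/", "vaccines/age-groups",
  after = "2021-11-01", group = "80+"
)
api_url
# [1] "https://api.covid19tracker.ca/vaccines/age-groups?after=2021-11-01&group=80%2B"
```

The resulting URL can be passed to httr::GET() as before. Alternatively, httr::GET() accepts a query argument (a named list), which should take care of the encoding itself.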

Provinces

The API also provides a list of provinces and some population/geographical data:

content_parsed <- paste0(base_url, "provinces") %>%
  httr::GET() %>%
  content(as = "parsed")
provinces <- bind_rows(content_parsed)
glimpse(provinces)
Rows: 16
Columns: 10
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1~
$ code        <chr> "ON", "QC", "NS", "NB", "MB", "BC", "PE", "SK", ~
$ name        <chr> "Ontario", "Quebec", "Nova Scotia", "New Brunswi~
$ population  <int> 14826276, 8604495, 992055, 789225, 1383765, 5214~
$ area        <int> 917741, 1356128, 53338, 71450, 553556, 925186, 5~
$ gdp         <int> 857384, 439375, 44354, 36966, 72688, 295401, 699~
$ geographic  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0
$ data_status <chr> "Reported", "In progress", "Waiting for report",~
$ updated_at  <chr> "2022-02-08T00:16:55.000000Z", "2022-02-07T16:09~
$ density     <dbl> 16.15518540, 6.34489886, 18.59940380, 11.0458362~

The extra elements reported here are not related to any particular province/territory:

provinces %>% filter(is.na(population)) %>% glimpse()
Rows: 3
Columns: 10
$ id          <int> 14, 15, 16
$ code        <chr> "_RC", "FA", "NFR"
$ name        <chr> "Repatriated Canadians", "Federal Allocation", "~
$ population  <int> NA, NA, NA
$ area        <int> NA, NA, NA
$ gdp         <int> NA, NA, NA
$ geographic  <int> 0, 0, 0
$ data_status <chr> "", "", ""
$ updated_at  <chr> NA, "2022-01-08T16:57:15.000000Z", "2022-01-08T1~
$ density     <dbl> NA, NA, NA

The geo_only parameter can be set to true to exclude these:

paste0(base_url, "provinces?geo_only=true") %>%
  httr::GET() %>%
  content(as = "parsed") %>%
  bind_rows() %>%
  glimpse()
Rows: 13
Columns: 10
$ id          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
$ code        <chr> "ON", "QC", "NS", "NB", "MB", "BC", "PE", "SK", ~
$ name        <chr> "Ontario", "Quebec", "Nova Scotia", "New Brunswi~
$ population  <int> 14826276, 8604495, 992055, 789225, 1383765, 5214~
$ area        <int> 917741, 1356128, 53338, 71450, 553556, 925186, 5~
$ gdp         <int> 857384, 439375, 44354, 36966, 72688, 295401, 699~
$ geographic  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
$ data_status <chr> "Reported", "In progress", "Waiting for report",~
$ updated_at  <chr> "2022-02-08T00:16:55.000000Z", "2022-02-07T16:09~
$ density     <dbl> 16.15518540, 6.34489886, 18.59940380, 11.0458362~

A helpful variable is data_status, which indicates if the daily numbers have been reported:

provinces %>% select(name, data_status, updated_at)
# A tibble: 16 x 3
   name                      data_status                   updated_at 
   <chr>                     <chr>                         <chr>      
 1 Ontario                   "Reported"                    2022-02-08~
 2 Quebec                    "In progress"                 2022-02-07~
 3 Nova Scotia               "Waiting for report"          2022-02-07~
 4 New Brunswick             "Waiting for report"          2022-02-07~
 5 Manitoba                  "Reported"                    2022-02-07~
 6 British Columbia          "Waiting for report"          2022-02-07~
 7 Prince Edward Island      "Waiting for report"          2022-02-07~
 8 Saskatchewan              "DAILY REPORT DISCONT. BY SK" 2022-02-07~
 9 Alberta                   "Reported"                    2022-02-07~
10 Newfoundland and Labrador "Waiting for report"          2022-02-07~
11 Northwest Territories     "Waiting for report"          2022-02-07~
12 Yukon                     "Waiting for report"          2022-02-07~
13 Nunavut                   "Waiting for report"          2022-02-07~
14 Repatriated Canadians     ""                            <NA>       
15 Federal Allocation        ""                            2022-01-08~
16 National Federal Reserve  ""                            2022-01-08~

data_status may take on the following values:

| data_status | Meaning |
|---|---|
| Waiting for report | This status indicates that an update is expected to happen in the current day, but has not yet occurred. |
| In progress | This status indicates that an update is in progress and will be completed soon. Note that when this status is indicated, some or all data may not be updated yet. |
| Reported | When this status is indicated, the province has been updated with final data for the day, and the update is complete. |
| No report expected today | When this status is indicated, the province is not expected to provide an update on the current day, and one should not be expected. |
| Custom | Custom statuses are used to communicate certain issues with a province’s update, including delays or partial updates. |
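
As a rough sketch of how data_status might be used, the code below flags which provinces have final numbers for the day and which carry a custom status. The toy data frame copies four rows from the output above; treating any value outside the four named statuses as custom is an assumption based on the descriptions listed here.

```r
# Four rows copied from the provinces output above
provinces_toy <- data.frame(
  name = c("Ontario", "Quebec", "Nova Scotia", "Saskatchewan"),
  data_status = c("Reported", "In progress", "Waiting for report",
                  "DAILY REPORT DISCONT. BY SK"),
  stringsAsFactors = FALSE
)

# The four named statuses; anything else is assumed to be a custom status
standard_statuses <- c("Waiting for report", "In progress", "Reported",
                       "No report expected today")

provinces_toy$status_type <- ifelse(
  provinces_toy$data_status %in% standard_statuses, "standard", "custom"
)
# Only "Reported" means the data for the day are final
provinces_toy$final_today <- provinces_toy$data_status == "Reported"
provinces_toy
```

A check like this could run before pulling the daily summary, so that partially updated provinces are not mistaken for final counts.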

The density variable is population density, which is computed by dividing population by area:

provinces %>%
  transmute(name, population, area, density, density_manual = population / area)
# A tibble: 16 x 5
   name                      population    area density density_manual
   <chr>                          <int>   <int>   <dbl>          <dbl>
 1 Ontario                     14826276  917741 16.2           16.2   
 2 Quebec                       8604495 1356128  6.34           6.34  
 3 Nova Scotia                   992055   53338 18.6           18.6   
 4 New Brunswick                 789225   71450 11.0           11.0   
 5 Manitoba                     1383765  553556  2.50           2.50  
 6 British Columbia             5214805  925186  5.64           5.64  
 7 Prince Edward Island          164318    5660 29.0           29.0   
 8 Saskatchewan                 1179844  591670  1.99           1.99  
 9 Alberta                      4442879  642317  6.92           6.92  
10 Newfoundland and Labrador     520553  373872  1.39           1.39  
11 Northwest Territories          45504 1183085  0.0385         0.0385
12 Yukon                          42986  474391  0.0906         0.0906
13 Nunavut                        39403 1936113  0.0204         0.0204
14 Repatriated Canadians             NA      NA NA             NA     
15 Federal Allocation                NA      NA NA             NA     
16 National Federal Reserve          NA      NA NA             NA     

Next steps

I’m impressed by the organization and accessibility of this API, and have decided to write a simple R package to wrap it. In my next post, I’ll detail my steps and thought process.

Reproducibility

Session info
 setting  value                       
 version  R version 4.1.2 (2021-11-01)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_Canada.1252         
 ctype    English_Canada.1252         
 tz       America/Curacao             
 date     2022-02-07                  
Git repository
Local:    main C:/Users/tdunn/Documents/tdunn
Remote:   main @ origin (https://github.com/taylordunn/tdunn)
Head:     [6658077] 2022-02-08: Rebuild site

Source code

Citation

For attribution, please cite this work as

Dunn (2021, Dec. 28). tdunn: Canada COVID-19 data in R: exploring the API. Retrieved from https://tdunn.ca/posts/2021-12-28-canada-covid-19-data-in-r-exploring-the-api/

BibTeX citation

@misc{dunn2021canada,
  author = {Dunn, Taylor},
  title = {tdunn: Canada COVID-19 data in R: exploring the API},
  url = {https://tdunn.ca/posts/2021-12-28-canada-covid-19-data-in-r-exploring-the-api/},
  year = {2021}
}