tdunn - Advent of Code 2021: Days 6-10

R setup

library(tidyverse)
library(gt)
library(lubridate)

Python setup

library(reticulate)
use_virtualenv("r-reticulate")

import numpy as np
import pandas as pd

Day 6: Lanternfish

day6 <- read_lines("day06-input.txt")
str_trunc(day6, 70)

[1] "5,1,4,1,5,1,1,5,4,4,4,4,5,1,2,2,1,3,4,1,1,5,1,5,2,2,2,2,1,4,2,4,3,3..."

Part 1

A massive school of glowing lanternfish swims past. They must spawn quickly to reach such large numbers - maybe exponentially quickly? You should model their growth rate to be sure. Although you know nothing about this specific species of lanternfish, you make some guesses about their attributes. Surely, each lanternfish creates a new lanternfish once every 7 days. However, this process isn’t necessarily synchronized between every lanternfish - one lanternfish might have 2 days left until it creates another lanternfish, while another might have 4. So, you can model each fish as a single number that represents the number of days until it creates a new lanternfish. Furthermore, you reason, a new lanternfish would surely need slightly longer before it’s capable of producing more lanternfish: two more days for its first cycle.

Each day, a 0 becomes a 6 and adds a new 8 to the end of the list, while each other number decreases by 1 if it was present at the start of the day. Find a way to simulate lanternfish. How many lanternfish would there be after 80 days?

This is simple enough to do with a vector and a loop over 80 days:

fish_timers <- strsplit(day6, ",") %>% unlist() %>% as.numeric()

for (day in seq(1:80)) {
  # Decrease each timer by one
  fish_timers <- map_dbl(fish_timers, ~{.x - 1})
  
  # Look for elapsed timers
  elapsed_timers <- fish_timers < 0
  # Reset them to 6
  fish_timers[elapsed_timers] <- 6
  # Add new fish with timers starting at 8
  fish_timers <- c(fish_timers, rep(8, sum(elapsed_timers)))
}

# Count the number of fish after 80 days
length(fish_timers)

[1] 355386

For Python, I’ll use a list:

fish_timers = [int(x) for x in r.day6.split(',')]

for day in range(80):
  fish_timers = [f - 1 for f in fish_timers]
  
  n_new_fish = fish_timers.count(-1)
  fish_timers = [6 if f == -1 else f for f in fish_timers]
  
  if n_new_fish > 0:
    fish_timers.extend([8] * n_new_fish)

len(fish_timers)

Part 2

Suppose the lanternfish live forever and have unlimited food and space. Would they take over the entire ocean? How many lanternfish would there be after 256 days?

For this part, the list of fish would quickly become too large to store in memory after 256 days. Instead, find the number of fish at each timer value:

fish_timers_df <- tibble(
  timer = strsplit(day6, ",") %>% unlist() %>% as.numeric()
) %>%
  count(timer, name = "n_fish") %>%
  # Need to fill in the missing days with 0 fish
  bind_rows(tibble(timer = c(0, 6, 7, 8), n_fish = c(0, 0, 0, 0))) %>%
  arrange(timer)
fish_timers_df

# A tibble: 9 × 2
  timer n_fish
  <dbl>  <dbl>
1     0      0
2     1     96
3     2     54
4     3     48
5     4     51
6     5     51
7     6      0
8     7      0
9     8      0

Now it is a matter of looping through the days and updating the counts:

for (day in 1:256) {
  # Decrease the timer values by 1
  fish_timers_df <- fish_timers_df %>% mutate(timer = timer - 1)
  # Count new fish
  new_fish <- fish_timers_df$n_fish[fish_timers_df$timer == -1]
  
  # Add new fish with timers at 8 
  fish_timers_df <- fish_timers_df %>%
    bind_rows(tibble(timer = 8, n_fish = new_fish))
 
  # Reset elapsed timers to 6
  fish_timers_df$n_fish[fish_timers_df$timer == 6] <-
    fish_timers_df$n_fish[fish_timers_df$timer == 6] +
    new_fish
  
  # Remove the -1 values now that they've been accounted for
  fish_timers_df <- fish_timers_df %>% filter(timer != -1)
}

fish_timers_df

# A tibble: 9 × 2
  timer       n_fish
  <dbl>        <dbl>
1     0 141979186737
2     1 163480438824
3     2 172999320549
4     3 189419502198
5     4 211569179655
6     5 220260612873
7     6 255636828738
8     7 118721109675
9     8 139349146560

That’s a lot of fish. To get the answer, I need to format the sum so that it is not printed in scientific notation:

format(sum(fish_timers_df$n_fish), scientific = FALSE)

[1] "1613415325809"

Python:

fish_timers_df = pd.DataFrame({'timer': [int(x) for x in r.day6.split(',')]})
fish_timers_df = fish_timers_df.value_counts('timer') \
  .rename_axis('n_fish') \
  .reindex(range(9), fill_value = 0)
  
for day in range(256):
  n_new_fish = fish_timers_df[0]
  
  for i in range(1, len(fish_timers_df)):
    fish_timers_df[i-1] = fish_timers_df[i]
  
  fish_timers_df[6] += n_new_fish
  fish_timers_df[8] = n_new_fish
  
fish_timers_df.sum()

1613415325809

Day 7: The Treachery of Whales

day7 <- read_lines("day07-input.txt")
str_trunc(day7, 70)

[1] "1101,1,29,67,1102,0,1,65,1008,65,35,66,1005,66,28,1,67,65,20,4,0,10..."

Part 1

A giant whale has decided your submarine is its next meal, and it’s much faster than you are. There’s nowhere to run!

Suddenly, a swarm of crabs (each in its own tiny submarine - it’s too deep for them otherwise) zooms in to rescue you! They seem to be preparing to blast a hole in the ocean floor; sensors indicate a massive underground cave system just beyond where they’re aiming!

The crab submarines all need to be aligned before they’ll have enough power to blast a large enough hole for your submarine to get through. However, it doesn’t look like they’ll be aligned before the whale catches you! Maybe you can help?

There’s one major catch - crab submarines can only move horizontally.

You quickly make a list of the horizontal position of each crab (your puzzle input). Crab submarines have limited fuel, so you need to find a way to make all of their horizontal positions match while requiring them to spend as little fuel as possible.

Each change of 1 step in horizontal position of a single crab costs 1 fuel. Determine the horizontal position that the crabs can align to using the least fuel possible. How much fuel must they spend to align to that position?

The amount of fuel spent by a crab $i$ at position $x_{i}$ moving to position $x_{0}$ can be represented as the absolute deviation $| x_{i} - x_{0} |$ . We seek the position $x_{0}$ that minimizes the sum of absolute deviations for all $N$ crabs:

$\sum_{i}^{N} | x_{i} - x_{0} |$

My mathematical proof skills are rusty (it has been about 8 years since I took Mathematical Reasoning at UPEI), but Google tells me that the median minimizes the sum of absolute deviations. The median position of the input is:

crab_positions <- strsplit(day7, ",") %>% unlist() %>% as.numeric()
median(crab_positions)

[1] 331

And so the total fuel taken for each crab to move to this position is:

sum(abs(crab_positions - median(crab_positions)))

[1] 349769

Python:

crab_positions = [int(x) for x in r.day7.split(',')]
sum([np.abs(x - np.median(crab_positions)) for x in crab_positions])

349769.0

Part 2

The crabs don’t seem interested in your proposed solution. Perhaps you misunderstand crab engineering?

As it turns out, crab submarine engines don’t burn fuel at a constant rate. Instead, each change of 1 step in horizontal position costs 1 more unit of fuel than the last: the first step costs 1, the second step costs 2, the third step costs 3, and so on.

Determine the horizontal position that the crabs can align to using the least fuel possible so they can make you an escape route! How much fuel must they spend to align to that position?

My first thought for calculating this fuel was the cumulative sum. For example, moving 7 steps would cost fuel equal to the last value of this cumsum():

cumsum(1:7)

[1]  1  3  6 10 15 21 28

But then I figured there is probably a useful formula for this sum, and another lazy Google search tells me that these are called triangular numbers with the simple formula:

$\sum_{i = 1}^{n} k = \frac{n (n + 1)}{2} .$

And so we want to find $x_{0}$ that minimizes:

$\sum_{i}^{N} \frac{| x_{i} - x_{0} | (| x_{i} - x_{0} | + 1)}{2} .$

I’m not aware of a simple solution (like the median in part 1) to this optimization problem. The brute force way is to loop over values of $x_{0}$ , ranging from $min (x_{i})$ to $max (x_{i})$ :

fuel_spent <- tibble(
  x0 = seq(min(crab_positions), max(crab_positions))
) %>%
  mutate(
    crab_dist = map(x0, ~abs(crab_positions - .x)),
    fuel_spent = map_dbl(crab_dist, ~sum(.x * (.x + 1) / 2))
  ) %>%
  arrange(fuel_spent)
head(fuel_spent)

# A tibble: 6 × 3
     x0 crab_dist     fuel_spent
  <int> <list>             <dbl>
1   479 <dbl [1,000]>   99540554
2   480 <dbl [1,000]>   99540639
3   478 <dbl [1,000]>   99541469
4   481 <dbl [1,000]>   99541724
5   477 <dbl [1,000]>   99543385
6   482 <dbl [1,000]>   99543809

We find that $x_{0}$ = 479 minimizes the fuel with 99540554 spent.

Use a pandas.Series to do the same:

fuel_spent = pd.Series(dtype = np.float64)
for x0 in range(min(crab_positions), max(crab_positions)):
  crab_dist = [np.abs(x - x0) for x in crab_positions]
  fuel_spent.at[x0] = sum([d * (d + 1) / 2 for d in crab_dist])
  
fuel_spent.sort_values().head()

479    99540554.0
480    99540639.0
478    99541469.0
481    99541724.0
477    99543385.0
dtype: float64

Day 8: Seven Segment Search

day8 <- read_lines("day08-input.txt")
head(day8)

[1] "cbgefad agc fdega cgdf ecdgfa efgca gaefbd edagbc cg ecafb | cdgafbe cfdg cg gac"   
[2] "gcbdfe cgefb bgadcfe de cfabeg cbgade dbfe ced acfdg egdcf | de bedf ecdgf ecd"     
[3] "dcafgb eagdfb bagdefc abg ba afbc bdgec cdgab gaedfc gadfc | abg ba agfdc gab"      
[4] "agfbe fc fgbadec bdgcef gcdbe fbcead ebgcf cfe gdfc dcbega | edfabc efgab cdfg ecf" 
[5] "fcdeg cfbedg dcfeba fge fcdga ebdg cfebd edgcabf efbagc ge | cbdgef begcaf ge dbcfe"
[6] "fceadbg ecbg cab cb bdfcea agfce cfaedg gadbf bfagc gceabf | cba cab bc bgce"

Part 1

You barely reach the safety of the cave when the whale smashes into the cave mouth, collapsing it. Sensors indicate another exit to this cave at a much greater depth, so you have no choice but to press on.

As your submarine slowly makes its way through the cave system, you notice that the four-digit seven-segment displays in your submarine are malfunctioning; they must have been damaged during the escape. You’ll be in a lot of trouble without them, so you’d better figure out what’s wrong.

Each digit of a seven-segment display is rendered by turning on or off any of seven segments named a through g:

  0:      1:      2:      3:      4:     5:      6:      7:      8:      9:
 aaaa    ....    aaaa    aaaa    ....   aaaa    aaaa    aaaa    aaaa    aaaa
b    c  .    c  .    c  .    c  b    c b    .  b    .  .    c  b    c  b    c
b    c  .    c  .    c  .    c  b    c b    .  b    .  .    c  b    c  b    c
 ....    ....    dddd    dddd    dddd  dddd    dddd    ....    dddd    dddd
e    f  .    f  e    .  .    f  .    f .    f  e    f  .    f  e    f  .    f  
e    f  .    f  e    .  .    f  .    f .    f  e    f  .    f  e    f  .    f
 gggg    ....    gggg    gggg    ....   gggg    gggg    ....    gggg    gggg

So, to render a 1, only segments c and f would be turned on; the rest would be off. To render a 7, only segments a, c, and f would be turned on.

The problem is that the signals which control the segments have been mixed up on each display. The submarine is still trying to display numbers by producing output on signal wires a through g, but those wires are connected to segments randomly. Worse, the wire/segment connections are mixed up separately for each four-digit display! (All of the digits within a display use the same connections, though.)

For each display, you watch the changing signals for a while, make a note of all ten unique signal patterns you see, and then write down a single four digit output value (your puzzle input). Using the signal patterns, you should be able to work out which pattern corresponds to which digit.

Each entry consists of ten unique signal patterns, a | delimiter, and finally the four digit output value. Within an entry, the same wire/segment connections are used (but you don’t know what the connections actually are). The unique signal patterns correspond to the ten different ways the submarine tries to render a digit using the current wire/segment connections.

In the output values, how many times do digits 1, 4, 7, or 8 appear?

First, for my own reference, compile the number of segments used by each digit:

digit_segments_count <- tribble(
  ~digit, ~n_segments,
  0, 6,
  1, 2,
  2, 5,
  3, 5,
  4, 4,
  5, 5,
  6, 6,
  7, 3,
  8, 7,
  9, 6
)
digit_segments_count %>%
  group_by(n_segments) %>%
  summarise(digits = str_c(digit, collapse = ","), .groups = "drop")

# A tibble: 6 × 2
  n_segments digits
       <dbl> <chr> 
1          2 1     
2          3 7     
3          4 4     
4          5 2,3,5 
5          6 0,6,9 
6          7 8

I see why part 1 is asking about just the digits 1, 4, 7, and 8 – they consist of a unique number of segments (2, 4, 3, and 7, respectively).

Split the input into the the signal patterns (ten unique values) and four digit output values:

day8_split <- strsplit(day8, " \\| ") %>%
  map(strsplit, " ")

day8_df <-
  tibble(
    signal_patterns = map(day8_split, 1),
    output = map(day8_split, 2)
  )
day8_df

# A tibble: 200 × 2
   signal_patterns output   
   <list>          <list>   
 1 <chr [10]>      <chr [4]>
 2 <chr [10]>      <chr [4]>
 3 <chr [10]>      <chr [4]>
 4 <chr [10]>      <chr [4]>
 5 <chr [10]>      <chr [4]>
 6 <chr [10]>      <chr [4]>
 7 <chr [10]>      <chr [4]>
 8 <chr [10]>      <chr [4]>
 9 <chr [10]>      <chr [4]>
10 <chr [10]>      <chr [4]>
# … with 190 more rows
# ℹ Use `print(n = ...)` to see more rows

Consider a single row of this data:

signals1 <- day8_df %>% slice(2) %>%
  unnest(signal_patterns) %>% pull(signal_patterns)
output1 <- day8_df %>% slice(2) %>% unnest(output) %>% pull(output)
signals1; output1

 [1] "gcbdfe"  "cgefb"   "bgadcfe" "de"      "cfabeg"  "cbgade"  "dbfe"   
 [8] "ced"     "acfdg"   "egdcf"

[1] "de"    "bedf"  "ecdgf" "ecd"

We actually don’t need the 10 signal_patterns for part 1 – we just need to find the string length of the output values:

nchar(output1)

[1] 2 4 5 3

From these lengths, I know that the first digit is 1, the second digit is 4, the third digit is one of {2, 3, 5}, and the fourth digit is 7. We want to count the occurrences of 1, 4, 7 and 8, so this row contributes 3 to that count. Apply this logic to the full input:

part1_count <- day8_df %>%
  mutate(
    n_1_4_7_8 = map_int(
      output,
      ~sum(nchar(.x) %in% c(2, 3, 4, 7))
    )
  )
part1_count

# A tibble: 200 × 3
   signal_patterns output    n_1_4_7_8
   <list>          <list>        <int>
 1 <chr [10]>      <chr [4]>         4
 2 <chr [10]>      <chr [4]>         3
 3 <chr [10]>      <chr [4]>         3
 4 <chr [10]>      <chr [4]>         2
 5 <chr [10]>      <chr [4]>         1
 6 <chr [10]>      <chr [4]>         4
 7 <chr [10]>      <chr [4]>         3
 8 <chr [10]>      <chr [4]>         2
 9 <chr [10]>      <chr [4]>         2
10 <chr [10]>      <chr [4]>         4
# … with 190 more rows
# ℹ Use `print(n = ...)` to see more rows

sum(part1_count$n_1_4_7_8)

[1] 514

Part 2

For each entry, determine all of the wire/segment connections and decode the four-digit output values. What do you get if you add up all of the output values?

Consider again the segment patterns for each digit:

  0:      1:      2:      3:      4:     5:      6:      7:      8:      9:
 aaaa    ....    aaaa    aaaa    ....   aaaa    aaaa    aaaa    aaaa    aaaa
b    c  .    c  .    c  .    c  b    c b    .  b    .  .    c  b    c  b    c
b    c  .    c  .    c  .    c  b    c b    .  b    .  .    c  b    c  b    c
 ....    ....    dddd    dddd    dddd  dddd    dddd    ....    dddd    dddd
e    f  .    f  e    .  .    f  .    f .    f  e    f  .    f  e    f  .    f  
e    f  .    f  e    .  .    f  .    f .    f  e    f  .    f  e    f  .    f
 gggg    ....    gggg    gggg    ....   gggg    gggg    ....    gggg    gggg

digit_segments <- c("abcefg", "cf", "acdeg", "acdfg", "bcdf",
                    "abdfg", "abdefg", "acf", "abcdefg", "abcdfg")

We know digits 1, 4, 7 and 8 are uniquely identified by their number of segments. We also know that the digit 1 has two overlapping segments (c and f) with the digit 0. This can be derived with intersect on the letters:

compute_overlap <- function(segments1, segments2) {
  length(
    intersect(strsplit(segments1, "") %>% unlist(),
              strsplit(segments2, "") %>% unlist())
  )
}
compute_overlap(digit_segments[1], digit_segments[2])

[1] 2

Compute the amount of overlap between each pair of digits:

digit_overlap <- 
  crossing(
    tibble(digit1 = 0:9, segments1 = digit_segments),
    tibble(digit2 = 0:9, segments2 = digit_segments)
  ) %>%
  filter(digit1 != digit2) %>%
  mutate(
    n_segment_overlap = map2_int(
      segments1, segments2,
      compute_overlap
    )
  )
digit_overlap %>%
  select(digit1, digit2, n_segment_overlap) %>%
  pivot_wider(names_from = digit2, values_from = n_segment_overlap) %>%
  relocate(`0`, .before = `1`) %>% # Need to correct order of columns
  gt(rowname_col = "digit1") %>%
  sub_missing(everything(), missing_text = "")

	0	1	2	3	4	5	6	7	8	9
0		2	4	4	3	4	5	3	6	5
1	2		1	2	2	1	1	2	2	2
2	4	1		4	2	3	4	2	5	4
3	4	2	4		3	4	4	3	5	5
4	3	2	2	3		3	3	2	4	4
5	4	1	3	4	3		5	2	5	5
6	5	1	4	4	3	5		2	6	5
7	3	2	2	3	2	2	2		3	3
8	6	2	5	5	4	5	6	3		6
9	5	2	4	5	4	5	5	3	6

Using these relationships, it is simply a matter of elimination to determine which pattern of segments corresponds to which digit:

The digit 1 has 2 segments.
The digit 4 has 4 segments.
The digit 7 has 3 segments.
The digit 8 has 7 segments.
The digits 2, 3 and 5 have 5 segments.
- The digit 3 shares 2 segments with 1.
- The digit 2 shares 2 segments with 4.
- The digit 5 remains.
The digits 0, 6 and 9 have 6 segments.
- The digit 6 shares 2 segments with 7.
- The digit 9 shares 4 segments with 4.
- The digit 0 remains.

Write a function that follows this logic, and returns a list mapping segment patterns to digits:

get_digit_map <- function(signals) {
  digit_map <- list()
  
  d1 <- signals[nchar(signals) == 2]
  digit_map[d1] <- 1
  
  d4 <- signals[nchar(signals) == 4]
  digit_map[d4] <- 4
  
  d7 <- signals[nchar(signals) == 3]
  digit_map[d7] <- 7
  
  d8 <- signals[nchar(signals) == 7]
  digit_map[d8] <- 8
  
  # Remove the four digits identified so far
  signals <- setdiff(signals, names(digit_map))
  
  d3 <- signals[nchar(signals) == 5 & map_int(signals, compute_overlap, d1) == 2]
  digit_map[d3] <- 3
  signals <- setdiff(signals, d3)
  
  d2 <- signals[nchar(signals) == 5 & map_int(signals, compute_overlap, d4) == 2]
  digit_map[d2] <- 2
  signals <- setdiff(signals, d2)
  
  d5 <- signals[nchar(signals) == 5]
  digit_map[d5] <- 5
  signals <- setdiff(signals, d5)
  
  d6 <- signals[nchar(signals) == 6 & map_int(signals, compute_overlap, d7) == 2]
  digit_map[d6] <- 6
  signals <- setdiff(signals, d6)
  
  d9 <- signals[nchar(signals) == 6 & map_int(signals, compute_overlap, d4) == 4]
  digit_map[d9] <- 9
  signals <- setdiff(signals, d9)
  
  # The last digit is 0
  digit_map[signals] <- 0
  
  return(digit_map)
}

One last step before applying this function is the sort the segments alphabetically, because the output does not necessarily match the patterns in the signals, e.g. “bedf” may appear as “dbfe” in the output.

sort_segment <- function(segment) {
  strsplit(segment, "")[[1]] %>%
    sort() %>%
    paste0(collapse = "")
}

day8_df <- day8_df %>%
  mutate(
    signal_patterns = map(
      signal_patterns, ~map_chr(.x, sort_segment)
    ),
    output = map(
      output, ~map_chr(.x, sort_segment)
    )
  )
day8_df$signal_patterns[[5]]

 [1] "cdefg"   "bcdefg"  "abcdef"  "efg"     "acdfg"   "bdeg"    "bcdef"  
 [8] "abcdefg" "abcefg"  "eg"

Now find the digit mapping for each set of signals:

day8_df <- day8_df %>%
  mutate(digit_map = map(signal_patterns, get_digit_map))
day8_df$digit_map[[5]] %>% glimpse()

List of 10
 $ eg     : num 1
 $ bdeg   : num 4
 $ efg    : num 7
 $ abcdefg: num 8
 $ cdefg  : num 3
 $ acdfg  : num 2
 $ bcdef  : num 5
 $ abcdef : num 6
 $ bcdefg : num 9
 $ abcefg : num 0

Now use each digit_map to determine the 4 digit output codes:

day8_df <- day8_df %>%
  mutate(
    output_decoded = map2_int(
      output, digit_map,
      ~as.integer(paste0(.y[.x], collapse = ""))
    )
  )
day8_df

# A tibble: 200 × 4
   signal_patterns output    digit_map         output_decoded
   <list>          <list>    <list>                     <int>
 1 <chr [10]>      <chr [4]> <named list [10]>           8417
 2 <chr [10]>      <chr [4]> <named list [10]>           1437
 3 <chr [10]>      <chr [4]> <named list [10]>           7157
 4 <chr [10]>      <chr [4]> <named list [10]>            247
 5 <chr [10]>      <chr [4]> <named list [10]>           9015
 6 <chr [10]>      <chr [4]> <named list [10]>           7714
 7 <chr [10]>      <chr [4]> <named list [10]>           7764
 8 <chr [10]>      <chr [4]> <named list [10]>           1425
 9 <chr [10]>      <chr [4]> <named list [10]>           9677
10 <chr [10]>      <chr [4]> <named list [10]>           4784
# … with 190 more rows
# ℹ Use `print(n = ...)` to see more rows

And finally, the sum:

sum(day8_df$output_decoded)

[1] 1012272

Day 9: Smoke Basin

day9 <- read_lines("day09-input.txt")
head(day9) %>% str_trunc(70)

[1] "4210129998765678999876598999987654567899891099876534567899989543458..."
[2] "4321298987674567998965467898996543478998789987654323478998775412367..."
[3] "5734987794543678987654356987987656567897678998943214589987654323456..."
[4] "7649876543212347898765567896598767678976567899876323678998775454868..."
[5] "8756989652103456999898678966459878989785456989965456789999976565699..."
[6] "9897898773212367895939989656346989799654347778976587997896989876989..."

Part 1

These caves seem to be lava tubes. Parts are even still volcanically active; small hydrothermal vents release smoke into the caves that slowly settles like rain.

If you can model how the smoke flows through the caves, you might be able to avoid it and be that much safer. The submarine generates a heightmap of the floor of the nearby caves for you (your puzzle input). Smoke flows to the lowest point of the area it’s in. Each number corresponds to the height of a particular location, where 9 is the highest and 0 is the lowest a location can be.

Your first goal is to find the low points - the locations that are lower than any of its adjacent locations. Most locations have four adjacent locations (up, down, left, and right); locations on the edge or corner of the map have three or two adjacent locations, respectively. (Diagonal locations do not count as adjacent.)

The risk level of a low point is 1 plus its height. Find all of the low points on your heightmap. What is the sum of the risk levels of all low points on your heightmap?

I’ll use a base R matrix to represent the height map of the cave floor:

# Split the columns and convert to integers
height_map <- day9 %>% map(~unlist(strsplit(.x, "")) %>% as.integer)
# Convert it to a matrix
height_map <- matrix(unlist(height_map), nrow = length(height_map), byrow = TRUE)

height_map[1:10, 1:10]

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    4    2    1    0    1    2    9    9    9     8
 [2,]    4    3    2    1    2    9    8    9    8     7
 [3,]    5    7    3    4    9    8    7    7    9     4
 [4,]    7    6    4    9    8    7    6    5    4     3
 [5,]    8    7    5    6    9    8    9    6    5     2
 [6,]    9    8    9    7    8    9    8    7    7     3
 [7,]    5    9    9    8    9    9    9    7    6     5
 [8,]    4    5    6    9    8    9    9    8    7     5
 [9,]    3    4    5    6    7    8    9    9    7     6
[10,]    2    3    4    8    9    9    9    9    8     7

I’ll also define a helper function that finds neighboring points (while still within the bounds of the matrix):

get_neighbor_coords <- function(row, col, max_row, max_col) {
  neighbor_coords <- cbind(row + c(-1, 1, 0, 0),
                           col + c(0, 0, -1, 1))
  
  in_bounds <- (neighbor_coords[,1] <= max_row) & (neighbor_coords[,1] > 0) &
    (neighbor_coords[,2] <= max_col) & (neighbor_coords[,2] > 0)
  neighbor_coords[in_bounds, , drop = FALSE]
}

Loop over all points and determine if it is a low point:

low_point_heights <- c()
for (row in 1:nrow(height_map)) {
  for (col in 1:ncol(height_map)) {
    height <- height_map[row, col]
    
    # Get adjacent points
    neighbor_coords <- get_neighbor_coords(row, col,
                                           nrow(height_map), ncol(height_map))
    neighbor_heights <- height_map[neighbor_coords]
    
    # Is it the lowest point?
    if (all(height < neighbor_heights)) {
      low_point_heights <- c(low_point_heights, height)
    } 
  }
}
low_point_heights

  [1] 0 0 0 0 1 3 1 3 1 0 2 2 2 0 1 1 0 0 2 1 0 2 0 0 0 1 3 0 6 2 2 1 1 0 2 1 1
 [38] 0 3 1 0 3 0 1 1 0 0 0 4 1 5 0 0 3 2 0 0 2 0 3 1 0 2 0 1 2 0 0 0 2 2 1 1 5
 [75] 2 1 0 2 0 1 1 4 1 2 2 1 0 2 1 1 1 0 0 0 2 0 0 2 1 0 0 1 0 4 0 0 0 0 1 5 1
[112] 4 2 2 0 4 2 2 0 1 0 1 0 0 0 1 1 0 3 3 1 1 3 3 0 1 0 2 1 1 0 0 3 0 0 0 1 0
[149] 3 2 1 0 0 1 1 0 0 0 0 0 2 0 0 1 0 0 3 1 0 0 1 3 0 1 0 0 0 0 1 0 2 2 1 1 4
[186] 0 0 3 1 1 6 5 1 0 1 0 0 0

Then add 1 to get the risk level and sum it up:

sum(low_point_heights + 1)

[1] 417

In Python, represent the height map with a 2d numpy.array:

height_map = np.array([[height for height in row] for row in r.day9],
                      dtype = int)
height_map[0:9, 0:9]

array([[4, 2, 1, 0, 1, 2, 9, 9, 9],
       [4, 3, 2, 1, 2, 9, 8, 9, 8],
       [5, 7, 3, 4, 9, 8, 7, 7, 9],
       [7, 6, 4, 9, 8, 7, 6, 5, 4],
       [8, 7, 5, 6, 9, 8, 9, 6, 5],
       [9, 8, 9, 7, 8, 9, 8, 7, 7],
       [5, 9, 9, 8, 9, 9, 9, 7, 6],
       [4, 5, 6, 9, 8, 9, 9, 8, 7],
       [3, 4, 5, 6, 7, 8, 9, 9, 7]])

Then apply the same logic to get the sum of risk levels:

def get_neighbor_coords(row, col, max_row, max_col):
  neighbor_coords = [[row + d for d in [-1, 1, 0, 0]],
                     [col + d for d in [0, 0, -1, 1]]]
  neighbor_coords = np.array(neighbor_coords).T
  
  in_bounds = (neighbor_coords[:, 0] >= 0) & \
    (neighbor_coords[:, 0] < max_row) & \
    (neighbor_coords[:, 1] >= 0) & \
    (neighbor_coords[:, 1] < max_col)
    
  return(neighbor_coords[in_bounds, :])

low_point_heights = []

for row in range(height_map.shape[0]):
  for col in range(height_map.shape[1]):
    height = height_map[row, col]
    
    neighbor_coords = get_neighbor_coords(row, col,
                                          height_map.shape[0],
                                          height_map.shape[1])
    neighbor_heights = [height_map[nc[0], nc[1]] for nc in neighbor_coords]
    
    if all([height < nh for nh in neighbor_heights]):
      low_point_heights.append(height)
    
sum([h + 1 for h in low_point_heights])

Part 2

Next, you need to find the largest basins so you know what areas are most important to avoid.

A basin is all locations that eventually flow downward to a single low point. Therefore, every low point has a basin, although some basins are very small. Locations of height 9 do not count as being in any basin, and all other locations will always be part of exactly one basin.

The size of a basin is the number of locations within the basin, including the low point. The example above has four basins.

What do you get if you multiply together the sizes of the three largest basins?

My strategy for part 2 is:

Loop over each point.
- If that point has a height <9, start a new basin.
- Rewrite that point with 9 (to prevent overcounting).
- For each of that points neighbors:
  - If that point has a height <9, add it to the basin.
  - Re-write that point with height 9.
  - Add the points neighbors to a new list.
- Repeat the above loop with new neighbors until no neighbors remain.
- Compute the number of points in the basin.

Basically, I’m finding points that are not 9, spreading out from those points, filling the basins with 9s, and counting those points along the way to get the basin size.

basin_size <- c()

for (row in 1:nrow(height_map)) {
  for (col in 1:ncol(height_map)) {
    height <- height_map[row, col]
    if (height < 9) {
      basin_points <- c(height)
      height_map[row, col] <- 9
      
      neighbor_coords <- get_neighbor_coords(row, col,
                                             nrow(height_map), ncol(height_map)) 
      
      while (nrow(neighbor_coords) > 0) {
        new_neighbor_coords <- matrix(nrow = 0, ncol = 2)
        for (neighbor in 1:nrow(neighbor_coords)) {
          nc <- neighbor_coords[neighbor, , drop = FALSE]
          height <- height_map[nc]
          if (height < 9) {
            basin_points <- c(basin_points, height)
            height_map[nc] <- 9
            
            new_neighbor_coords <- rbind(
              new_neighbor_coords,
              get_neighbor_coords(nc[1], nc[2],
                                  nrow(height_map), ncol(height_map))
            )
          }
        }
        neighbor_coords <- new_neighbor_coords
      }
      basin_size <- c(basin_size, length(basin_points))
    }
  }
}

basin_size

  [1]  26  96  34  37  64   2  58  52   2  53  32  54  55  60   4   6  68  39
 [19]  14  69  30  22   5  21  33  64  38  96  45  21  15  29  16  89   3   7
 [37]  55  39   7  32   3  85  74  41  60  23  18  84   4  32  15  97   8   8
 [55]  50  15  24  66  20  68  45  84  42  67  68  70  10  16  49  63   4  19
 [73]   2  31  15  51   9  22  52  36   3  28  13   2  79  18  12   9  68   2
 [91]  83   4  39  95  75 115  41   7  79   2   4  27  85  26  25  87  16   8
[109]   5  62  48   8  22  59  39   2  25  61  89  25  69  31  22  40   7  73
[127]   8   4  38  50  23  57  30  18  41  92   4  69   2  48  86  20  52  94
[145]  29  36   7  30  37  36  75  84  91  24  26   4  16 103  93  74  44   5
[163]  31   6  51   5  15  83  93  73  18  94  22  61  72   8   8  53  53  45
[181]  40  55   6  48  33  32  20   9   3  18  14  20  19   7   4   3   4   2

The product of the three largest basins:

sort(basin_size, decreasing = TRUE)[1:3] %>% prod()

[1] 1148965

And in Python:

basin_size = []

for row in range(height_map.shape[0]):
  for col in range(height_map.shape[1]):
    height = height_map[row, col]
    
    if height < 9:
      basin_points = [height]
      height_map[row, col] = 9
      
      neighbor_coords = get_neighbor_coords(row, col,
                                            height_map.shape[0],
                                            height_map.shape[1])
      while len(neighbor_coords) > 0:
        new_neighbor_coords = np.empty((0, 2), dtype = int)
        
        for nc in neighbor_coords:
          height = height_map[nc[0], nc[1]]
          if height < 9:
            basin_points.append(height)
            height_map[nc[0], nc[1]] = 9
            
            new_neighbor_coords = np.append(
              new_neighbor_coords,
              get_neighbor_coords(nc[0], nc[1],
                                  height_map.shape[0], height_map.shape[1]),
              axis = 0
            )
            
        neighbor_coords = new_neighbor_coords 
        
      basin_size.append(len(basin_points))

basin_size.sort(reverse = True)
np.prod(basin_size[0:3])

Day 10: Syntax Scoring

day10 <- read_lines("day10-input.txt")
head(day10) %>% str_trunc(70)

[1] "(<{([{([[[<{(<(){}>({}()))}[(<[]<>>{<>})[{{}()}<{}[]>]]><[[<{}{}>([..."
[2] "<[<<{<[(([[[[<[][]>][(<>()){<>()}]]][<{<<>{}>(()<>)}><([<>{}]{()()}..."
[3] "<(({(<{<({[<[[[]()][<>[]]]{{{}()}({}[])}>]{[{<<>[]>[()[]]}][[[{}[]]..."
[4] "{{[{[[[{{{{{[([]())(<>())](<[]<>>{[]{}})}([[(){}]<<>()>]<<()()><{}{..."
[5] "({([({([[({([{[]<>}]<<<>{}>[<>[]]>)})({{{[()[]]<()>}(([]<>)<[]()>>}..."
[6] "<(((<<<{([[[<{()<>}{{}()}>[({}())([][])]]{{(()())(<>{})}(<{}><[]{}>..."

Part 1

You ask the submarine to determine the best route out of the deep-sea cave, but it only replies: Syntax error in navigation subsystem on line: all of them All of them?! The damage is worse than you thought. You bring up a copy of the navigation subsystem (your puzzle input).

The navigation subsystem syntax is made of several lines containing chunks. There are one or more chunks on each line, and chunks contain zero or more other chunks. Adjacent chunks are not separated by any delimiter; if one chunk stops, the next chunk (if any) can immediately start. Every chunk must open and close with one of four legal pairs of matching characters:

If a chunk opens with (, it must close with ). If a chunk opens with [, it must close with ]. If a chunk opens with {, it must close with }. If a chunk opens with <, it must close with >.

So, () is a legal chunk that contains no other chunks, as is []. More complex but valid chunks include ([]), {()()()}, <([{}])>, [<>({}){}[([])<>]], and even (((((((((()))))))))).

Some lines are incomplete, but others are corrupted. Find and discard the corrupted lines first.

A corrupted line is one where a chunk closes with the wrong character - that is, where the characters it opens and closes with do not form one of the four legal pairs listed above.

Examples of corrupted chunks include (], {()()()>, (((()))}, and <([]){()}[{}]). Such a chunk can appear anywhere within a line, and its presence causes the whole line to be considered corrupted.

To calculate the syntax error score for a line, take the first illegal character on the line and look it up in the following table:

): 3 points. ]: 57 points. }: 1197 points. >: 25137 points.

Find the first illegal character in each corrupted line of the navigation subsystem. What is the total syntax error score for those errors?

I’ll make heavy use of stringr and regular expressions to solve this problem. First, define a function that iterates over a line and determines if it is corrupt:

# Regex representations of the character pairs (requires \\ to escape)
pairs <- c("\\(\\)" = "", "\\[\\]" = "", "\\{\\}" = "", "\\<\\>" = "")

remove_pairs <- function(line) {
  repeat {
    new_line <- line %>% str_replace_all(pairs)
    if (new_line == line) return(new_line)
    else line <- new_line
  }
}

line <- "([{<>}])[](<>)"
line_corrupt <- "([{<>]])"

remove_pairs(line)

[1] ""

remove_pairs(line_corrupt)

[1] "([{]])"

Then for corrupt lines, I need to find the location of the wrong closing characters:

# Regex representations of mismatched pairs "[>", "(>", "{>", "[}", etc.
mismatched_pairs <- c("[\\(\\[\\{]\\>",
                      "[\\(\\[\\<]\\}",
                      "[\\(\\{\\<]\\]",
                      "[\\[\\{\\<]\\)")

find_corrupt_loc <- function(line_reduced) {
  map_dfr(
    mismatched_pairs,
    ~remove_pairs(line_reduced) %>% str_locate(.x) %>% as.data.frame()
  ) %>%
    drop_na()
}

corrupt_loc <- find_corrupt_loc(remove_pairs(line_corrupt))
corrupt_loc

  start end
1     3   4

Then I use the end location (the smallest value to get the first occurrence) to determine which character is incorrect:

remove_pairs(line_corrupt) %>%
  str_sub(start = min(corrupt_loc$end), end = min(corrupt_loc$end))

[1] "]"

Apply this to the full input:

day10_df <- tibble(line = day10) %>%
  mutate(
    line_reduced = map_chr(line, remove_pairs),
    corrupt_loc = map(line_reduced, find_corrupt_loc)
  )

corrupt_chars <- day10_df %>%
  mutate(line_num = 1:n()) %>%
  # Filter down to just corrupt lines
  filter(map_int(corrupt_loc, nrow) > 0) %>%
  transmute(
    line_num,
    corrupt_char = map2_chr(
      line_reduced, corrupt_loc,
      ~str_sub(.x, start = min(.y$end), end = min(.y$end))
    )
  )
corrupt_chars

# A tibble: 47 × 2
   line_num corrupt_char
      <int> <chr>       
 1        1 }           
 2        5 >           
 3        9 }           
 4       12 }           
 5       14 }           
 6       16 ]           
 7       17 }           
 8       18 >           
 9       21 ]           
10       22 ]           
# … with 37 more rows
# ℹ Use `print(n = ...)` to see more rows

Finally, add up the points of the characters:

corrupt_chars %>%
  mutate(score = recode(corrupt_char,
                        `)` = 3, `]` = 57, `}` = 1197, `>` = 25137)) %>%
  pull(score) %>% sum()

[1] 469755

Part 2

Now, discard the corrupted lines. The remaining lines are incomplete.

Incomplete lines don’t have any incorrect characters - instead, they’re missing some closing characters at the end of the line. To repair the navigation subsystem, you just need to figure out the sequence of closing characters that complete all open chunks in the line.

You can only use closing characters ((, ], }, or >), and you must add them in the correct order so that only legal pairs are formed and all chunks end up closed.

The score is determined by considering the completion string character-by-character. Start with a total score of 0. Then, for each character, multiply the total score by 5 and then increase the total score by the point value given for the character in the following table:

): 1 point. ]: 2 points. }: 3 points. >: 4 points.

Autocomplete tools are an odd bunch: the winner is found by sorting all of the scores and then taking the middle score. (There will always be an odd number of scores to consider.) In this example, the middle score is 288957 because there are the same number of scores smaller and larger than it.

Find the completion string for each incomplete line, score the completion strings, and sort the scores. What is the middle score?

From part 1, we have these incomplete (and non-corrupt) lines:

day10_incomplete <- day10_df %>%
  filter(map_int(corrupt_loc, nrow) == 0) %>%
  select(-corrupt_loc)
glimpse(day10_incomplete)

Rows: 47
Columns: 2
$ line         <chr> "<[<<{<[(([[[[<[][]>][(<>()){<>()}]]][<{<<>{}>(()<>)}><([…
$ line_reduced <chr> "<[<<{<[({{<{", "<(({(<{<((", "{{[{[[[{{((<<({", "<(((<<<…

As expected, we have an odd number of lines so we can get a middle score. Because I’ve already reduced the lines down to line_reduced without matching pairs, I can simply reverse the order, and count the scores. For example:

reversed_chars <- remove_pairs("[({(<(())[]>[[{[]{<()<>>") %>%
  strsplit("") %>%
  unlist() %>%
  rev()
 
total_score <- 0
for (char in reversed_chars) {
  total_score <- total_score * 5
  # No need to replace the opening with closing characters
  total_score <- total_score + switch(char, `(` = 1, `[` = 2, `{` = 3, `<` = 4)
}
total_score

[1] 288957

day10_incomplete <- day10_incomplete %>%
  mutate(
    total_score = map_dbl(
      line_reduced,
      ~{
        reversed_chars <- str_split(.x, "") %>% unlist() %>% rev()
        
        total_score <- 0
        for (char in reversed_chars) {
          total_score <- total_score * 5
          total_score <- total_score + switch(char, `(` = 1, `[` = 2, `{` = 3, `<` = 4)
        }
        total_score
      }
    )
  )
sort(day10_incomplete$total_score)[(nrow(day10_incomplete) + 1) / 2]

[1] 2762335572

Stats

My personal stats for this period:

tibble::tribble(
  ~Part, ~Day, ~Time, ~Rank,
  1, 10, "11:11:48", 34029,
  2, 10, "11:25:31", 32045,
  1, 9, "14:58:48", 42395,
  2, 9, "16:04:08", 33905,
  1, 8, "16:37:13", 49102,
  2, 8, "16:55:20", 34954,
  1, 7, "11:23:20", 45816,
  2, 7, "11:42:48", 43886,
  1, 6, "10:38:42", 41671,
  2, 6, "10:58:31", 34999
) %>%
  pivot_wider(names_from = Part, values_from = c(Time, Rank),
              names_glue = "Part {Part}_{.value}") %>%
  mutate(
    `Time between parts` = as.numeric(hms(`Part 2_Time`) - hms(`Part 1_Time`),
                                      "minutes") %>% round(1)
  ) %>%
  gt() %>%
  tab_spanner_delim(delim = "_", split = "first") %>%
  sub_missing(columns = "Time between parts", missing_text = "")

Day	Time		Rank		Time between parts
Day	Part 1	Part 2	Part 1	Part 2	Time between parts
10	11:11:48	11:25:31	34029	32045	13.7
9	14:58:48	16:04:08	42395	33905	65.3
8	16:37:13	16:55:20	49102	34954	18.1
7	11:23:20	11:42:48	45816	43886	19.5
6	10:38:42	10:58:31	41671	34999	19.8

Reproducibility

Session info

 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

ABCDEFGHIJ0123456789

	package <chr>	ondiskversion <chr>	loadedversion <chr>
assertthat	assertthat	0.2.1	0.2.1
backports	backports	1.4.1	1.4.1
bit	bit	4.0.4	4.0.4
bit64	bit64	4.0.5	4.0.5
broom	broom	1.0.0	1.0.0
cachem	cachem	1.0.6	1.0.6
callr	callr	3.7.1	3.7.1
cellranger	cellranger	1.1.0	1.1.0
checkmate	checkmate	2.1.0	2.1.0
cli	cli	3.3.0	3.4.1

python:         C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Scripts/python.exe
libpython:      C:/Users/tdunn/AppData/Local/r-reticulate/r-reticulate/pyenv/pyenv-win/versions/3.9.13/python39.dll
pythonhome:     C:/Users/tdunn/Documents/.virtualenvs/r-reticulate
version:        3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.23.3

NOTE: Python version was forced by use_python function

ABCDEFGHIJ0123456789

package <chr>	version <chr>	requirement <chr>
contourpy	1.0.5	contourpy==1.0.5
cycler	0.11.0	cycler==0.11.0
fonttools	4.37.2	fonttools==4.37.2
kiwisolver	1.4.4	kiwisolver==1.4.4
matplotlib	3.6.0	matplotlib==3.6.0
numpy	1.23.3	numpy==1.23.3
packaging	21.3	packaging==21.3
pandas	1.4.4	pandas==1.4.4
Pillow	9.2.0	Pillow==9.2.0
pyparsing	3.0.9	pyparsing==3.0.9

Git repository

Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dunn2021,
  author = {Dunn, Taylor},
  title = {Advent of {Code} 2021: {Days} 6-10},
  date = {2021-12-06},
  url = {https://tdunn.ca/posts/2021-12-06-advent-of-code-2021-days-6-10},
  langid = {en}
}

For attribution, please cite this work as:

Dunn, Taylor. 2021. “Advent of Code 2021: Days 6-10.” December 6, 2021. https://tdunn.ca/posts/2021-12-06-advent-of-code-2021-days-6-10.