Advent of Code 2021: Days 1-5

My solutions to the #AdventOfCode2021 coding challenges, days 1 through 5.

Advent of Code
Python
R
Author
Published

December 1, 2021

R setup
library(tidyverse)
library(lubridate)
library(gt)

The Advent of Code has begun for 2021, and I decided to participate this year to work on my programming and problem solving skills in R and, when I have the time, I’ll try to translate the solutions to Python. Load the reticulate package and activate my virtual Python environment:

library(reticulate)
use_virtualenv("r-reticulate")

I’ll also be “competing” in a private leaderboard started by Tan Ho. I don’t expect to rank highly here because the puzzles are released at 1AM my time (and scores are based on time from release) but it’ll be a good source of motivation throughout the month. There are 25 days of challenges, so my current plan is to split up the posts into 5-day chunks.

Day 1: Sonar Sweep

Part 1

The first order of business is to figure out how quickly the depth increases, just so you know what you’re dealing with - you never know if the keys will get carried into deeper water by an ocean current or a fish or something. To do this, count the number of times a depth measurement increases from the previous measurement. (There is no measurement before the first measurement.)

Import the measurements:

day1 <- read_lines("day01-input.txt") %>%
  as.integer()
head(day1)
[1] 191 185 188 189 204 213

The tidyverse solution to this problem is to use the dplyr::lag()/lead() function to refer to previous/next values. For example, for a vector of values 1-10 (in random order), I can show the cases where the value has increased like this:

d <- sample(1:10)
bind_cols(
  value = d,
  increased = d > dplyr::lag(d)
)
# A tibble: 10 × 2
   value increased
   <int> <lgl>    
 1    10 NA       
 2     9 FALSE    
 3     8 FALSE    
 4     2 FALSE    
 5     3 TRUE     
 6     7 TRUE     
 7     1 FALSE    
 8     5 TRUE     
 9     6 TRUE     
10     4 FALSE    

Excluding the NA value, which occurs due to being the first element, sum() up the cases of larger measurements:

sum(lag(day1) < day1, na.rm = TRUE)
[1] 1709

For the Python solution, I will use the numpy.diff function to calculate the difference between consecutive values:

import numpy as np

# Reference an object from the R session with r.obj
(np.diff(r.day1) > 0)
array([False,  True,  True, ...,  True,  True,  True])

Then chain the .sum() function to add up the True values:

(np.diff(r.day1) > 0).sum()
1709

Note that this method is also possible in base R, and is a bit simpler than the tidyverse solution:

sum(diff(day1) > 0)
[1] 1709

Part 2

Your goal now is to count the number of times the sum of measurements in this sliding window increases from the previous sum. So, compare A with B, then compare B with C, then C with D, and so on. Stop when there aren’t enough measurements left to create a new three-measurement sum.

Here, I will use both lag and lead to compute the sum of the window:

d_sum3 <- lag(d) + d + lead(d)
bind_cols(
  value = d,
  sum3 = d_sum3,
  increased = lag(d_sum3) < d_sum3
)
# A tibble: 10 × 3
   value  sum3 increased
   <int> <int> <lgl>    
 1    10    NA NA       
 2     9    27 NA       
 3     8    19 FALSE    
 4     2    13 FALSE    
 5     3    12 FALSE    
 6     7    11 FALSE    
 7     1    13 TRUE     
 8     5    12 FALSE    
 9     6    15 TRUE     
10     4    NA NA       

Now sum the number of increases in the day 1 data:

day1_sum3 <- lag(day1) + day1 + lead(day1)
sum(day1_sum3 > lag(day1_sum3), na.rm = TRUE)
[1] 1761

In Python, the np.convolve function allows computation in sliding windows:

np.convolve(r.day1, np.ones(3, dtype = int))
array([  191,   376,   564, ..., 31566, 21051, 10526])

Above, we provided the np.ones(3, dtype = int) array which is simply [1, 1, 1] and works as the convolution operator that slides along the r.day1 array. Note that the first two elements are not correct, however, because the boundaries (with fewer than 3 values) were returned. Fix this with the mode argument:

np.convolve(r.day1, np.ones(3, dtype = int), mode = 'valid')
array([  564,   562,   581, ..., 31536, 31551, 31566])
(np.diff(np.convolve(r.day1, np.ones(3, dtype = int), mode = 'valid')) > 0) \
  .sum()
1761

Day 2: Dive!

Part 1

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

  • forward 5 adds 5 to your horizontal position, a total of 5.
  • down 5 adds 5 to your depth, resulting in a value of 5.
  • forward 8 adds 8 to your horizontal position, a total of 13.
  • up 3 decreases your depth by 3, resulting in a value of 2.
  • down 8 adds 8 to your depth, resulting in a value of 10.
  • forward 2 adds 2 to your horizontal position, a total of 15.

Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

Import the steps:

day2 <- read_lines("day02-input.txt")
head(day2)
[1] "forward 7" "down 2"    "forward 7" "down 6"    "forward 1" "forward 7"

Put it in a tibble, and tidyr::separate the instruction and the amount:

d_day2 <- tibble(step = day2) %>%
  separate(step, into = c("instruction", "amount"), sep = " ", convert = TRUE)
head(d_day2)
# A tibble: 6 × 2
  instruction amount
  <chr>        <int>
1 forward          7
2 down             2
3 forward          7
4 down             6
5 forward          1
6 forward          7

Then summarize the horizontal position and depth, and multiply the result:

d_day2 %>%
  summarise(
    horizontal_position = sum(amount[instruction == "forward"]),
    # Depth is inverse, so down - up
    depth = sum(amount[instruction == "down"]) -
      sum(amount[instruction == "up"]),
    .groups = "drop"
  ) %>%
  mutate(product = horizontal_position * depth)
# A tibble: 1 × 3
  horizontal_position depth product
                <int> <int>   <int>
1                1990  1000 1990000

For the Python solution, I’ll use pandas:

import pandas as pd

day2_df = pd.DataFrame(r.day2, dtype = str, columns = ['step']) \
  .step.str.split(' ', expand = True) \
  .rename(columns = {0: 'instruction', 1: 'amount'}) \
  .astype({'amount': 'int32'})
day2_df
    instruction  amount
0       forward       7
1          down       2
2       forward       7
3          down       6
4       forward       1
..          ...     ...
995     forward       9
996        down       3
997        down       7
998        down       5
999     forward       7

[1000 rows x 2 columns]

Then it is easy enough to sum up the different columns:

day2_df[day2_df.instruction == 'forward'].amount.sum()
1990
day2_df[day2_df.instruction == 'down'].amount.sum() - \
  day2_df[day2_df.instruction == 'up'].amount.sum()
1000

Here is another way with the groupby and aggregate functions:

day2_df_sum = day2_df \
  .groupby('instruction', as_index = True) \
  .aggregate('sum')
  
day2_df_sum.loc['forward'].amount
day2_df_sum.loc['down'].amount - day2_df_sum.loc['up'].amount

Part 2

In addition to horizontal position and depth, you’ll also need to track a third value, aim, which also starts at 0. The commands also mean something entirely different than you first thought:

  • down X increases your aim by X units.
  • up X decreases your aim by X units.
  • forward X does two things:
  • It increases your horizontal position by X units.
  • It increases your depth by your aim multiplied by X.

Using this new interpretation of the commands, calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

First, I’ll use cumsum() to add a running total of the aim variable from the “down” and “up” instructions:

d_day2 <- d_day2 %>%
  mutate(
    # Have to use a placeholder variable so it has the same length as the
    #  "aim" variable below
    aim_placeholder = case_when(
      instruction == "down" ~ amount,
      instruction == "up" ~ -amount,
      TRUE ~ 0L
    ),
    aim = cumsum(aim_placeholder)
  ) %>%
  select(-aim_placeholder)
head(d_day2, 9)
# A tibble: 9 × 3
  instruction amount   aim
  <chr>        <int> <int>
1 forward          7     0
2 down             2     2
3 forward          7     2
4 down             6     8
5 forward          1     8
6 forward          7     8
7 down             3    11
8 up               5     6
9 forward          7     6

Now with the running total of aim, I can compute horizontal position and depth:

d_day2 %>%
  summarise(
    horizontal_position = sum(amount[instruction == "forward"]),
    depth = sum(
      # Depth is aim multiplied by forward amount
      aim[instruction == "forward"] * amount[instruction == "forward"]
    ),
    .groups = "drop"
  ) %>%
  mutate(product = horizontal_position * depth)
# A tibble: 1 × 3
  horizontal_position  depth    product
                <int>  <int>      <int>
1                1990 992674 1975421260

In Python, I will assign a new aim column, and use the np.select() function to conditionally sum the values:

day2_df = day2_df \
  .assign(
    aim = np.select(
      [day2_df.instruction == 'down',
       day2_df.instruction == 'up',
       day2_df.instruction == 'forward'],
      [day2_df.amount, -day2_df.amount, 0]
    )
  )
day2_df.aim = day2_df.aim.cumsum()

The aggregate function can only operate on single columns, so need to make a new depth column first by multiplying aim with amount (for instruction = ‘forward’):

day2_df = day2_df \
  .assign(
    depth = np.where(
      day2_df.instruction == 'forward', day2_df.aim * day2_df.amount, 0
    ),
    horizontal_position = np.where(
      day2_df.instruction == 'forward', day2_df.amount, 0
    )
  )
day2_df 
    instruction  amount   aim  depth  horizontal_position
0       forward       7     0      0                    7
1          down       2     2      0                    0
2       forward       7     2     14                    7
3          down       6     8      0                    0
4       forward       1     8      8                    1
..          ...     ...   ...    ...                  ...
995     forward       9   985   8865                    9
996        down       3   988      0                    0
997        down       7   995      0                    0
998        down       5  1000      0                    0
999     forward       7  1000   7000                    7

[1000 rows x 5 columns]

I’ve also added the horizontal_position variable, so that I can compute the sums with a simple aggregate:

day2_df[['depth', 'horizontal_position']].aggregate('sum')
depth                  992674
horizontal_position      1990
dtype: int64

Day 3: Binary Diagnostic

Part 1

The diagnostic report (your puzzle input) consists of a list of binary numbers which, when decoded properly, can tell you many useful things about the conditions of the submarine. The first parameter to check is the power consumption.

You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.

Each bit in the gamma rate can be determined by finding the most common bit in the corresponding position of all numbers in the diagnostic report.

The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used.

Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. What is the power consumption of the submarine?

Import the binary numbers:

day3 <- read_lines("day03-input.txt")
head(day3)
[1] "001000010101" "010010111110" "001010110111" "001001011101" "001001010011"
[6] "001111100111"

Each bit needs to be considered separately, so use strsplit like this:

strsplit(day3[1:2], split = "")
[[1]]
 [1] "0" "0" "1" "0" "0" "0" "0" "1" "0" "1" "0" "1"

[[2]]
 [1] "0" "1" "0" "0" "1" "0" "1" "1" "1" "1" "1" "0"

Split every binary number and put it into a tibble of integers:

day3_split <- strsplit(day3, split = "")
day3_df <- matrix(unlist(day3_split), ncol = 12, byrow = TRUE) %>%
  as_tibble(.name_repair = "unique") %>%
  mutate(across(everything(), as.integer))

head(day3_df)
# A tibble: 6 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     0     0     1     0     0     0     0     1     0     1     0     1
2     0     1     0     0     1     0     1     1     1     1     1     0
3     0     0     1     0     1     0     1     1     0     1     1     1
4     0     0     1     0     0     1     0     1     1     1     0     1
5     0     0     1     0     0     1     0     1     0     0     1     1
6     0     0     1     1     1     1     1     0     0     1     1     1

Before computing the solution, and there any bits with an equal number of 0s and 1s?

day3_df %>%
  summarise(across(everything(), mean)) %>%
  pivot_longer(everything(), names_to = "bit", values_to = "prop")
# A tibble: 12 × 2
   bit    prop
   <chr> <dbl>
 1 ...1  0.489
 2 ...2  0.509
 3 ...3  0.487
 4 ...4  0.504
 5 ...5  0.498
 6 ...6  0.498
 7 ...7  0.493
 8 ...8  0.52 
 9 ...9  0.493
10 ...10 0.534
11 ...11 0.486
12 ...12 0.493

No, none the bits have prop = 0.500. Compute the gamma and epsilon rates:

# There isn't a function available in base R to compute the mode of a vector,
#  so define one here that takes the most frequent by default (freq_rank = 1)
vector_mode <- function(x, freq_rank = 1) {
  # Frequency table
  table(x) %>%
    # Sort it by count
    sort(decreasing = TRUE) %>%
    # Get the labels of the counts
    names() %>%
    pluck(freq_rank)
}

day3_rates <- day3_df %>%
  pivot_longer(everything(), names_to = "bit", values_to = "value") %>%
  mutate(bit = as.integer(str_remove(bit, "..."))) %>%
  group_by(bit) %>%
  summarise(
    gamma = vector_mode(value, freq_rank = 1),
    epsilon = vector_mode(value, freq_rank = 2),
    .groups = "drop"
  ) %>%
  summarise(
    # Collapse the most/least frequent values into a single string
    across(c(gamma, epsilon), str_c, collapse = "")
  )
day3_rates
# A tibble: 1 × 2
  gamma        epsilon     
  <chr>        <chr>       
1 010100010100 101011101011

We now have the binary representations, which we convert using strtoi:

day3_rates %>%
  mutate(across(c(gamma, epsilon), strtoi, base = 2),
         prod = gamma * epsilon)
# A tibble: 1 × 3
  gamma epsilon    prod
  <int>   <int>   <int>
1  1300    2795 3633500

To put this into a pandas DataFrame, use list comprehension to split the strings into characters:

day3_df = pd.DataFrame([list(number) for number in r.day3]).astype('int32')
day3_df
     0   1   2   3   4   5   6   7   8   9   10  11
0     0   0   1   0   0   0   0   1   0   1   0   1
1     0   1   0   0   1   0   1   1   1   1   1   0
2     0   0   1   0   1   0   1   1   0   1   1   1
3     0   0   1   0   0   1   0   1   1   1   0   1
4     0   0   1   0   0   1   0   1   0   0   1   1
..   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
995   0   0   1   0   1   1   1   1   1   0   1   0
996   1   0   0   1   1   0   0   1   0   0   1   1
997   1   0   0   1   1   1   1   1   1   0   1   1
998   0   0   0   0   1   0   0   1   0   0   1   1
999   1   0   0   1   1   1   1   0   1   1   0   0

[1000 rows x 12 columns]

Then get gamma and epsilon rates:

gamma = day3_df.aggregate('mode')
# For epsilon rate, just swap the numbers
epsilon = gamma.replace([0, 1], [1, 0])

# Concatenate the bits into a single string
gamma = gamma.apply(lambda row: ''.join(row.values.astype(str)), axis = 1)[0]
epsilon = epsilon.apply(lambda row: ''.join(row.values.astype(str)), axis = 1)[0]
gamma; epsilon
'010100010100'
'101011101011'

Finally, use int() with base = 2 to convert to decimal:

int(gamma, 2) * int(epsilon, 2)
3633500

Part 2

Next, you should verify the life support rating, which can be determined by multiplying the oxygen generator rating by the CO2 scrubber rating.

Both the oxygen generator rating and the CO2 scrubber rating are values that can be found in your diagnostic report - finding them is the tricky part. Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:

  • Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.
  • If you only have one number left, stop; this is the rating value for which you are searching.
  • Otherwise, repeat the process, considering the next bit to the right.

The bit criteria depends on which type of rating value you want to find:

  • To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.
  • To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.

Before doing anything, I need to alter my vector_mode function to deal with ties:

vector_mode_part2 <- function(x, freq_rank = 1) {
  freq_table <- table(x) %>%
    sort(decreasing = TRUE)
  
  # If there is a tie
  if (freq_table["0"] == freq_table["1"]) {
    # And we're looking for the most frequent (oxygen rating)
    if (freq_rank == 1) {
      # Then return 1
      return(1) 
    } else {
      # Otherwise return 0 (CO2 rating)
      return(0)
    }
  # Otherwise, return the value from the table as usual
  } else {
    freq_table %>%
      names() %>%
      pluck(freq_rank) %>%
      as.integer()
  }
}

This definitely isn’t the most efficient way to implement the bit criteria, but an easy solution is to just filter bit-by-bit.

oxygen_rating <- day3_df
for (bit in names(day3_df)) {
  # If 1 number (row) remains, we have found the single oxygen rating
  if (nrow(oxygen_rating) == 1) break
  
  most_freq <- vector_mode_part2(oxygen_rating[[bit]])
  oxygen_rating <- oxygen_rating %>%
    filter(!!sym(bit) == most_freq)
}
oxygen_rating
# A tibble: 1 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     0     1     0     1     0     0     1     0     1     1     1     1
co2_rating <- day3_df
for (bit in names(day3_df)) {
  if (nrow(co2_rating) == 1) break
  
  least_freq <- vector_mode_part2(co2_rating[[bit]], 2)
  co2_rating <- co2_rating %>%
    filter(!!sym(bit) == least_freq)
}
co2_rating
# A tibble: 1 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     1     0     1     0     1     1     0     0     1     0     1

Convert the binary representations and compute the product:

tibble(
  oxygen_rating = oxygen_rating %>% str_c(collapse = ""),
  co2_rating = co2_rating %>% str_c(collapse = "")
) %>%
  mutate(across(c(oxygen_rating, co2_rating), strtoi, base = 2),
         prod = oxygen_rating * co2_rating)
# A tibble: 1 × 3
  oxygen_rating co2_rating    prod
          <int>      <int>   <int>
1          1327       3429 4550283

It is simple enough to reproduce those loops in Python:

oxygen_rating = day3_df
for bit in day3_df:
  if len(oxygen_rating) == 1:
    break
  
  bit_value_counts = oxygen_rating[bit].value_counts()
  if bit_value_counts[1] >= bit_value_counts[0]:
    oxygen_rating = oxygen_rating[oxygen_rating[bit] == 1]
  else:
    oxygen_rating = oxygen_rating[oxygen_rating[bit] == 0]
    
co2_rating = day3_df
for bit in day3_df:
  if len(co2_rating) == 1:
    break
  
  bit_value_counts = co2_rating[bit].value_counts()
  # In cases where there are no 0s or no 1s, need to fill with 0 
  bit_value_counts = bit_value_counts.reindex([0, 1], fill_value = 0)
  
  if bit_value_counts[0] <= bit_value_counts[1]:
    co2_rating = co2_rating[co2_rating[bit] == 0]
  else:
    co2_rating = co2_rating[co2_rating[bit] == 1]
   
# In part 1, I used apply, here I'll use aggregate along axis = 1
co2_rating = co2_rating.astype(str).aggregate(''.join, axis = 1).values[0]
oxygen_rating = oxygen_rating.astype(str).aggregate(''.join, axis = 1).values[0]

int(co2_rating, 2) * int(oxygen_rating, 2)
4550283

Day 4: Giant Squid

Part 1

Bingo is played on a set of boards each consisting of a 5x5 grid of numbers. Numbers are chosen at random, and the chosen number is marked on all boards on which it appears. (Numbers may not appear on all boards.) If all numbers in any row or any column of a board are marked, that board wins. (Diagonals don’t count.)

The submarine has a bingo subsystem to help passengers (currently, you and the giant squid) pass the time. It automatically generates a random order in which to draw numbers and a random set of boards (your puzzle input).

The score of the winning board can now be calculated. Start by finding the sum of all unmarked numbers on that board; in this case, the sum is 188. Then, multiply that sum by the number that was just called when the board won, 24, to get the final score, 188 * 24 = 4512.

To guarantee victory against the giant squid, figure out which board will win first. What will your final score be if you choose that board?

Import the bingo input:

day4 <- read_lines("day04-input.txt")
print(str_trunc(head(day4, 8), 50))
[1] "94,21,58,16,4,1,44,6,17,48,20,92,55,36,40,63,62..."
[2] ""                                                  
[3] "49 74 83 34 40"                                    
[4] "87 16 57 75  3"                                    
[5] "68 94 77 78 89"                                    
[6] "56 38 29 26 60"                                    
[7] "41 42 45 19  1"                                    
[8] ""                                                  

The data needs to be split up into the called numbers (at the top) and the boards. To do this, I’ll use this trick with cumsum that I found on Stack Overflow:

day4_split <- split(
  day4[day4 != ""],
  cumsum(day4 == "")[day4 != ""]
)

called_numbers <- day4_split[[1]]
bingo_boards <- day4_split[2:length(day4_split)]
str_trunc(called_numbers, 50); bingo_boards[1]
[1] "94,21,58,16,4,1,44,6,17,48,20,92,55,36,40,63,62..."
$`1`
[1] "49 74 83 34 40" "87 16 57 75  3" "68 94 77 78 89" "56 38 29 26 60"
[5] "41 42 45 19  1"

Now I need to convert the called_numbers to a numeric vector, and the bingo_boards to numeric matrices:

called_numbers <- strsplit(called_numbers, ",")[[1]] %>% as.integer()

bingo_boards <- bingo_boards %>%
  map(
    ~ {
      .x %>%
        # str_squish replaces the double spaces before single digits numbers
        #  with single spaces, so that we can properly strsplit by " "
        str_squish() %>%
        str_split(" ") %>%
        map(as.integer) %>%
        unlist() %>%
        matrix(nrow = 5, byrow = TRUE)
    }
  )

head(called_numbers); bingo_boards[1]
[1] 94 21 58 16  4  1
$`1`
     [,1] [,2] [,3] [,4] [,5]
[1,]   49   74   83   34   40
[2,]   87   16   57   75    3
[3,]   68   94   77   78   89
[4,]   56   38   29   26   60
[5,]   41   42   45   19    1

Here is my iteration strategy for identifying and marking called numbers (not evaluated, just a demonstration with one board and one number):

bingo_board1 <- bingo_boards[[1]]
called_number1 <- 49 # suppose 49 was called
# Replace any 49s with -1
bingo_board1[bingo_board1 == called_number1] <- -1

# Look for and row or column sums that = -5 (all values = -1)
row_sums1 <- rowSums(bingo_board1)
col_sums1 <- colSums(bingo_board1)

# If we have bingo
if (-5 %in% c(row_sums1, col_sums1)) {
  # Compute the sum of the uncalled (non-negative) numbers
  uncalled_sum <- sum(bingo_board1[bingo_board1 > 0])
  # Return the product as the answer to the puzzle
  called_number1 * uncalled_sum
}

Now put it into a loop over all numbers and boards:

bingo_boards_part1 <- bingo_boards

for (called_number in called_numbers) {
  bingo_boards_part1 <- map(
    bingo_boards_part1,
    ~{
      .x[.x == called_number] <- -1
      .x
    }
  )
  
  # Find any winning boards
  bingo_board_winner <- map_lgl(
    bingo_boards_part1,
    ~{-5 %in% c(rowSums(.x), colSums(.x))}
  )
  
  if (sum(bingo_board_winner) > 0) {
    bingo_board_final <- bingo_boards_part1[bingo_board_winner] 
    break
  }
}
bingo_board_final; called_number
$`19`
     [,1] [,2] [,3] [,4] [,5]
[1,]   93   -1   26   35   39
[2,]   91   -1   85   69   -1
[3,]   -1   -1   27   57   10
[4,]   -1   -1   30   73   22
[5,]   -1   -1   -1   -1    9
[1] 7

Then the solution is:

sum(bingo_board_final[[1]][bingo_board_final[[1]] > 0]) * called_number
[1] 4662

For the Python solution, I’ll practice my list comprehension to compile the bingo boards:

called_numbers = [int(s) for s in r.day4[0].split(',')]

# Find the indices of the '' characters separating the bingo boards
bingo_boards_sep = [i for i,j in enumerate(r.day4) if j == '']
# Compile a list of bingo boards
bingo_boards = [r.day4[(i+1):(i+6)] for i in bingo_boards_sep]
# For each row of each board, split the string into multiple values
bingo_boards = [[board_row.split() for board_row in board] for board in bingo_boards]

That last line is a bit of a mess – it is a nested list comprehension loop which iterates over boards and then iterates over rows of each board to split the string into single values – but converting it all to numeric arrays is now simple:

bingo_boards = [np.array(board).astype(int) for board in bingo_boards]
bingo_boards[0]
array([[49, 74, 83, 34, 40],
       [87, 16, 57, 75,  3],
       [68, 94, 77, 78, 89],
       [56, 38, 29, 26, 60],
       [41, 42, 45, 19,  1]])

Now I can re-create the same loop from the R solution:

# In Python, you use deepcopy() to make copies of nested structures like this
import copy
bingo_boards_part1 = copy.deepcopy(bingo_boards)

for called_number in called_numbers:
  # For each board, mark the called numbers as -1
  for i,b in enumerate(bingo_boards_part1):
    bingo_boards_part1[i][bingo_boards_part1[i] == called_number] = -1
  
  # Find winning boards
  winners = [-5 in np.concatenate([board.sum(axis = 0), board.sum(axis = 1)]) \
             for board in bingo_boards_part1]
  
  if True in winners:
    bingo_board_final = bingo_boards_part1[winners.index(True)]
    break

bingo_board_final[bingo_board_final > 0].sum() * called_number
4662

Part 2

On the other hand, it might be wise to try a different strategy: let the giant squid win.

You aren’t sure how many bingo boards a giant squid could play at once, so rather than waste time counting its arms, the safe thing to do is to figure out which board will win last and choose that one. That way, no matter which boards it picks, it will win for sure.

Figure out which board will win last. Once it wins, what would its final score be?

Simple enough to alter the loop to iteratively remove winning boards until one remains:

bingo_boards_part2 <- bingo_boards

for (called_number in called_numbers) {
  bingo_boards_part2 <- map(
    bingo_boards_part2,
    ~{
      .x[.x == called_number] <- -1
      .x
    }
  )
  
  # Find any winning boards
  bingo_board_winner <- map_lgl(
    bingo_boards_part2,
    ~{-5 %in% c(rowSums(.x), colSums(.x))}
  )
  
  # If more than one board remains, remove winners
  if (length(bingo_boards_part2) > 1) {
    bingo_boards_part2 <- bingo_boards_part2[!bingo_board_winner]
  } else {
    # Otherwise, continue until the last board wins
    if (sum(bingo_board_winner) > 0) {
      bingo_board_final <- bingo_boards_part2[bingo_board_winner] 
      break
    }
  }
}
bingo_board_final; called_number
$`14`
     [,1] [,2] [,3] [,4] [,5]
[1,]   -1   -1   -1   -1   -1
[2,]   -1   -1   -1   -1    0
[3,]   10   38   -1   -1   25
[4,]   -1   11   -1   -1   -1
[5,]   -1   -1   67   -1   -1
[1] 80

And the product:

sum(bingo_board_final[[1]][bingo_board_final[[1]] > 0]) * called_number
[1] 12080

Python:

bingo_boards_part2 = copy.deepcopy(bingo_boards)

for called_number in called_numbers:
  for i,b in enumerate(bingo_boards_part2):
    bingo_boards_part2[i][bingo_boards_part2[i] == called_number] = -1
  
  winners = [-5 in np.concatenate([board.sum(axis = 0), board.sum(axis = 1)]) \
             for board in bingo_boards_part2]
  
  # If more than one board remains, remove winners
  if len(bingo_boards_part2) > 1:
    bingo_boards_part2 = [b for i,b in \
                          enumerate(bingo_boards_part2) if not winners[i]]
  else:
    if True in winners:
      bingo_board_final = bingo_boards_part2[winners.index(True)]
      break

bingo_board_final[bingo_board_final > 0].sum() * called_number
12080

Day 5: Hydrothermal Venture

Part 1

You come across a field of hydrothermal vents on the ocean floor! These vents constantly produce large, opaque clouds, so it would be best to avoid them if possible. They tend to form in lines; the submarine helpfully produces a list of nearby lines of vents (your puzzle input) for you to review.

Each line of vents is given as a line segment in the format x1,y1 -> x2,y2 where x1,y1 are the coordinates of one end the line segment and x2,y2 are the coordinates of the other end. These line segments include the points at both ends. For now, only consider horizontal and vertical lines: lines where either x1 = x2 or y1 = y2.

To avoid the most dangerous areas, you need to determine the number of points where at least two lines overlap. Consider only horizontal and vertical lines. At how many points do at least two lines overlap?

Import the lines:

day5 <- read_lines("day05-input.txt")
head(day5)
[1] "223,805 -> 223,548" "609,164 -> 609,503" "461,552 -> 796,552"
[4] "207,361 -> 207,34"  "503,879 -> 503,946" "937,52 -> 937,268" 

I’ll use a series of separates to get the numeric coordinates

day5_df <- tibble(x = day5) %>%
  separate(x, into = c("x1_y1", "x2_y2"), sep = " -> ") %>%
  separate(x1_y1, into = c("x1", "y1"), sep = ",", convert = TRUE) %>%
  separate(x2_y2, into = c("x2", "y2"), sep = ",", convert = TRUE)
head(day5_df)
# A tibble: 6 × 4
     x1    y1    x2    y2
  <int> <int> <int> <int>
1   223   805   223   548
2   609   164   609   503
3   461   552   796   552
4   207   361   207    34
5   503   879   503   946
6   937    52   937   268

Get the straight lines by looking for x1 == x2 or y1 == y2, then use crossing to get all the points touched by each line:

# Only straight lines
day5_straight <- day5_df %>%
  filter((x1 == x2) | (y1 == y2)) %>%
  rowwise() %>%
  mutate(xy = list(crossing(x = x1:x2, y = y1:y2))) %>%
  ungroup()
# As an example, show the first few points crossed by the first line
day5_straight %>%
  slice(1) %>%
  unnest(xy) %>%
  head()
# A tibble: 6 × 6
     x1    y1    x2    y2     x     y
  <int> <int> <int> <int> <int> <int>
1   223   805   223   548   223   548
2   223   805   223   548   223   549
3   223   805   223   548   223   550
4   223   805   223   548   223   551
5   223   805   223   548   223   552
6   223   805   223   548   223   553

Now to find the dangerous points, just need to look for any combinations of x and y that occur more than once:

day5_straight %>%
  unnest(xy) %>%
  count(x, y) %>%
  summarise(dangerous_points = sum(n > 1))
# A tibble: 1 × 1
  dangerous_points
             <int>
1             7142

Create the same data frame in a pandas DataFrame:

day5_df = [[coord.split(',') for coord in line.split(' -> ')] \
            for line in r.day5]
# "Flatten" the lists so that each element has the four coordinates
day5_df = [xy[0] + xy[1] for xy in day5_df]

day5_df = pd.DataFrame(day5_df, columns = ['x1', 'y1', 'x2', 'y2']).astype(int)
day5_df.head()
    x1   y1   x2   y2
0  223  805  223  548
1  609  164  609  503
2  461  552  796  552
3  207  361  207   34
4  503  879  503  946

Find the straight lines with query:

day5_straight = day5_df.query('(x1 == x2) | (y1 == y2)')

I’m going to brute force a solution here with a for loop and a grid of values:

ocean_floor = np.zeros((1000, 1000))

for index, row in day5_straight.iterrows():
  # Need to fix the range() step if going "backwards"
  x_step = 1 if row.x1 <= row.x2 else -1
  y_step = 1 if row.y1 <= row.y2 else -1
  
  for x in range(row.x1, row.x2 + x_step, x_step):
    for y in range(row.y1, row.y2 + y_step, y_step):
      ocean_floor[x, y] += 1
      
np.count_nonzero(ocean_floor > 1)
7142

Part 2

Unfortunately, considering only horizontal and vertical lines doesn’t give you the full picture; you need to also consider diagonal lines. Because of the limits of the hydrothermal vent mapping system, the lines in your list will only ever be horizontal, vertical, or a diagonal line at exactly 45 degrees. Consider all of the lines. At how many points do at least two lines overlap?

Find the diagonal line points:

day5_diag <- day5_df %>%
  filter(x1 != x2, y1 != y2) %>%
  rowwise() %>%
  mutate(x = list(x1:x2), y = list(y1:y2)) %>%
  ungroup()

Combine the straight and diagonal lines and add up the points:

bind_rows(
  day5_straight %>%  unnest(xy),
  day5_diag %>% unnest(c(x, y))
) %>%
  count(x, y) %>%
  summarise(dangerous_points = sum(n > 1))
# A tibble: 1 × 1
  dangerous_points
             <int>
1            20012

For the Python brute force solution, I can continue adding to the existing ocean_floor grid from part 1:

day5_diag = day5_df.query('(x1 != x2) & (y1 != y2)')

for index, row in day5_diag.iterrows():
  x_step = 1 if row.x1 <= row.x2 else -1
  y_step = 1 if row.y1 <= row.y2 else -1
  
  for x, y in zip(range(row.x1, row.x2 + x_step, x_step),
                  range(row.y1, row.y2 + y_step, y_step)):
    ocean_floor[x, y] += 1
      
np.count_nonzero(ocean_floor > 1)
20012

Stats

Here are my personal stats so far:

tibble::tribble(
  ~Part, ~Day, ~Time, ~Rank, ~Score,
  1, 5, "00:50:24", 6542, 0,
  2, 5, "00:57:17", 4865, 0,
  1, 4, "10:52:11", 33771, 0,
  2, 4, "11:07:58", 30829, 0,
  1, 3, "10:43:14", 64952, 0,
  2, 3, "11:43:13", 45788, 0,
  1, 2, "11:48:16", 74444, 0,
  2, 2, "12:21:09", 72356, 0,
  1, 1, "13:02:23", 72332, 0,
  2, 1, "13:23:44", 63804, 0
) %>%
  pivot_wider(names_from = Part, values_from = c(Time, Rank, Score),
              names_glue = "Part {Part}_{.value}") %>%
  mutate(
    `Time between parts` = as.numeric(hms(`Part 2_Time`) - hms(`Part 1_Time`),
                                      "minutes") %>% round(1)
  ) %>%
  gt() %>%
  tab_spanner_delim(delim = "_", split = "first")
Day Time Rank Score Time between parts
Part 1 Part 2 Part 1 Part 2 Part 1 Part 2
5 00:50:24 00:57:17 6542 4865 0 0 6.9
4 10:52:11 11:07:58 33771 30829 0 0 15.8
3 10:43:14 11:43:13 64952 45788 0 0 60.0
2 11:48:16 12:21:09 74444 72356 0 0 32.9
1 13:02:23 13:23:44 72332 63804 0 0 21.4

Except for day 5 (when I stayed up late because it was the weekend), I’ve been completing the puzzles around lunch time on my break from work.

These 0 scores come from the global leaderboard, which only gives points to the first 100 users to finish, which I definitely won’t be doing. A better benchmark is the private leaderboard:

library(httr)

leaderboard <- httr::GET(
  url = "https://adventofcode.com/2021/leaderboard/private/view/1032765.json",
  httr::set_cookies(session = Sys.getenv("AOC_COOKIE"))
) %>%
  content() %>%
  as_tibble() %>%
  unnest_wider(members) %>%
  arrange(desc(local_score)) %>%
  transmute(
    Rank = 1:n(), Name = name, Score = local_score, Stars = stars
  )
leaderboard %>%
  gt() %>%
  text_transform(
    locations = cells_body(columns = Stars),
    fn = function(stars_col) {
      map_chr(stars_col,
              ~ html(rep(fontawesome::fa('star', fill = 'gold'),
                         times = as.integer(.x))))
    }
  ) %>%
  cols_align("left") %>%
  tab_style(
    style = list(cell_text(weight = "bold")),
    locations = cells_body(
      rows = (Name == "taylordunn")
    )
  ) %>%
  tab_options(container.height = 500)
Rank Name Score Stars
1 Emil Hvitfeldt 1191
2 Colin Rundel 1183
3 trang1618 1170
4 @_TanHo 1154
5 mkiang 1126
6 gpecci 1112
7 David Robinson 1112
8 Jarosław Nirski 1110
9 dhimmel 1105
10 hrushikeshrv 1032
11 pritikadasgupta 1028
12 Ildikó Czeller 1026
13 Sherry Zhang 1023
14 john-b-edwards 1014
15 @_mnar99 1012
16 Melinda Tang 984
17 Josh Gray 952
18 Jonathan Spring 952
19 Derek Holliday 946
20 Jacqueline Nolis 905
21 Anna Fergusson 901
22 Andrew Argeros 894
23 Zach Bogart 💙 891
24 exunckly 876
25 Riinu Pius 869
26 HannesOberreiter 863
27 Tokhir Dadaev 856
28 ashbaldry 835
29 cramosu 831
30 Tom Jemmett 795
31 fabio machado 791
32 @Mid1995Sed 766
33 karawoo 761
34 Flavien Petit 756
35 Farhan Reynaldo 751
36 patelis 746
37 Arun Chavan 746
38 @woodspock 740
39 delabj 737
40 MetaMoraleMundo 715
41 Calum You 712
42 Daniel Coulton 708
43 mbjoseph 684
44 Alex N 681
45 KT421 680
46 jordi figueras puig 680
47 pi55p00r 671
48 scalgary 659
49 Erez Shomron 654
50 Jim Leach 649
51 rywhale 641
52 Matt Onimus 635
53 collinberke 632
54 Ghislain Nono Gueye 628
55 ldnam 612
56 taylordunn 606
57 jaapwalhout 603
58 Jeffrey Brabec 589
59 dirkschumacher 574
60 AlbertRapp 561
61 Nathan Moore 556
62 mfiorina 553
63 Miha Gazvoda 553
64 Nerwosolek 545
65 TylerGrantSmith 542
66 Darrin Speegle 540
67 duju211 525
68 blongworth 514
69 Sydney 510
70 Kelly N. Bodwin 508
71 David Schoch 503
72 long39ng 489
73 CarlssonLeo 486
74 A-Farina 468
75 cathblatter 463
76 Scott-Gee 445
77 Julian Tagell 434
78 Josiah Parry 419
79 thedivtagguy 400
80 jwinget 397
81 andrew-tungate-cms 385
82 @mfarkhann 382
83 @Maatspencer 372
84 @KentWeyrauch 368
85 Andrew Tungate 365
86 Emryn Hofmann 359
87 columbaspexit 346
88 ALBERT 343
89 Maya Gans 338
90 Alan Feder 314
91 Jenna Jordan 306
92 Kevin Kent 305
93 olmgeorg 296
94 Wendy Christensen 291
95 Eric Ekholm 287
96 Daniel Gemara 276
97 AmitLevinson 268
98 quickcoffee 258
99 cynthiahqy 235
100 Andrew Fraser 226
101 jennifer-furman 223
102 soto solo 222
103 antdurrant 211
104 Adrian Perez 196
105 Billy Fryer 186
106 April 181
107 Lukas Gröninger 156
108 Jose Pliego San Martin 113
109 aleighbrown 106
110 Kyle Ligon 105
111 Bruno Mioto 85
112 Duncan Gates 68
113 @jdknguyen 30
114 Matthew Wankiewicz 17
115 chapmandu2 0
116 NA 0
117 Wiktor Jacaszek 0
118 jacquietran 0
119 Tony ElHabr 0
120 Rizky Luthfianto 0
121 CaioBrighenti 0

Currently at rank 56, so about middle of the pack.

Reproducibility

Session info
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
python:         C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Scripts/python.exe
libpython:      C:/Users/tdunn/AppData/Local/r-reticulate/r-reticulate/pyenv/pyenv-win/versions/3.9.13/python39.dll
pythonhome:     C:/Users/tdunn/Documents/.virtualenvs/r-reticulate
version:        3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.23.3

NOTE: Python version was forced by use_python function
Git repository
Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

Reuse

Citation

BibTeX citation:
@online{dunn2021,
  author = {Dunn, Taylor},
  title = {Advent of {Code} 2021: {Days} 1-5},
  date = {2021-12-01},
  url = {https://tdunn.ca/posts/2021-12-01-advent-of-code-2021-days-1-5},
  langid = {en}
}
For attribution, please cite this work as:
Dunn, Taylor. 2021. “Advent of Code 2021: Days 1-5.” December 1, 2021. https://tdunn.ca/posts/2021-12-01-advent-of-code-2021-days-1-5.