tdunn - Advent of Code 2021: Days 1-5

R setup

library(tidyverse)
library(lubridate)
library(gt)

The Advent of Code has begun for 2021, and I decided to participate this year to work on my programming and problem-solving skills in R. When I have the time, I’ll also try to translate the solutions to Python. To do that, load the reticulate package and activate my Python virtual environment:

library(reticulate)
use_virtualenv("r-reticulate")

I’ll also be “competing” in a private leaderboard started by Tan Ho. I don’t expect to rank highly here because the puzzles are released at 1AM my time (and scores are based on time from release) but it’ll be a good source of motivation throughout the month. There are 25 days of challenges, so my current plan is to split up the posts into 5-day chunks.

Day 1: Sonar Sweep

Part 1

The first order of business is to figure out how quickly the depth increases, just so you know what you’re dealing with - you never know if the keys will get carried into deeper water by an ocean current or a fish or something. To do this, count the number of times a depth measurement increases from the previous measurement. (There is no measurement before the first measurement.)

Import the measurements:

day1 <- read_lines("day01-input.txt") %>%
  as.integer()
head(day1)

[1] 191 185 188 189 204 213

A tidyverse solution to this problem is to use dplyr::lag() and dplyr::lead(), which shift a vector so that each element can be compared with the previous or next value. For example, for the values 1-10 in random order, I can flag the cases where the value has increased like this:

d <- sample(1:10)
bind_cols(
  value = d,
  increased = d > dplyr::lag(d)
)

# A tibble: 10 × 2
   value increased
   <int> <lgl>    
 1    10 NA       
 2     9 FALSE    
 3     8 FALSE    
 4     2 FALSE    
 5     3 TRUE     
 6     7 TRUE     
 7     1 FALSE    
 8     5 TRUE     
 9     6 TRUE     
10     4 FALSE

Excluding the NA value (the first element has no previous value to compare with), use sum() to count the increases:

sum(lag(day1) < day1, na.rm = TRUE)

[1] 1709

For the Python solution, I will use the numpy.diff function to calculate the difference between consecutive values:

import numpy as np

# Reference an object from the R session with r.obj
(np.diff(r.day1) > 0)

array([False,  True,  True, ...,  True,  True,  True])

Then chain the .sum() method to count the True values:

(np.diff(r.day1) > 0).sum()

Note that this method is also possible in base R, and is a bit simpler than the tidyverse solution:

sum(diff(day1) > 0)

[1] 1709
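For comparison, here is a minimal pure-Python sketch of the same count without NumPy. It pairs each measurement with the next one via zip(). The function name count_increases is my own, and the example values are the shuffled 1-10 vector from the lag() table above.

```python
def count_increases(depths):
    """Count how many measurements are larger than the one before them."""
    # zip(depths, depths[1:]) yields (previous, current) pairs
    return sum(current > previous for previous, current in zip(depths, depths[1:]))


# The shuffled 1-10 example from the lag() table above has four TRUE rows
print(count_increases([10, 9, 8, 2, 3, 7, 1, 5, 6, 4]))  # 4
```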

Part 2

Your goal now is to count the number of times the sum of measurements in this sliding window increases from the previous sum. So, compare A with B, then compare B with C, then C with D, and so on. Stop when there aren’t enough measurements left to create a new three-measurement sum.

Here, I will use both lag and lead to compute the sum of the window:

d_sum3 <- lag(d) + d + lead(d)
bind_cols(
  value = d,
  sum3 = d_sum3,
  increased = lag(d_sum3) < d_sum3
)

# A tibble: 10 × 3
   value  sum3 increased
   <int> <int> <lgl>    
 1    10    NA NA       
 2     9    27 NA       
 3     8    19 FALSE    
 4     2    13 FALSE    
 5     3    12 FALSE    
 6     7    11 FALSE    
 7     1    13 TRUE     
 8     5    12 FALSE    
 9     6    15 TRUE     
10     4    NA NA

Now count the number of increases in the day 1 data:

day1_sum3 <- lag(day1) + day1 + lead(day1)
sum(day1_sum3 > lag(day1_sum3), na.rm = TRUE)

[1] 1761

In Python, the np.convolve function allows computation in sliding windows:

np.convolve(r.day1, np.ones(3, dtype = int))

array([  191,   376,   564, ..., 31566, 21051, 10526])

Above, the np.ones(3, dtype = int) array is simply [1, 1, 1]. It acts as the kernel that slides along the r.day1 array, so each output element is the sum of three consecutive measurements. However, the first two and last two elements are not valid window sums. The default mode = 'full' also returns the boundary positions where fewer than 3 values overlap. Fix this with the mode argument:

np.convolve(r.day1, np.ones(3, dtype = int), mode = 'valid')

array([  564,   562,   581, ..., 31536, 31551, 31566])

(np.diff(np.convolve(r.day1, np.ones(3, dtype = int), mode = 'valid')) > 0) \
  .sum()
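None of the solutions above use the following shortcut. Consecutive windows share two of their three values, so (A + B + C) < (B + C + D) reduces to A < D. The window comparison can therefore skip the sums entirely and compare each measurement with the one three positions later. Here is a minimal sketch of that idea (the function name is hypothetical):

```python
def count_window_increases(depths, window=3):
    """Count increases between consecutive sliding-window sums.

    Adjacent windows share window - 1 values, so comparing their sums is the
    same as comparing the value leaving the window with the value entering it.
    """
    return sum(entering > leaving for leaving, entering in zip(depths, depths[window:]))


# Same shuffled 1-10 vector as the lag()/lead() table above: two TRUE rows
print(count_window_increases([10, 9, 8, 2, 3, 7, 1, 5, 6, 4]))  # 2
```

With window = 1 this reduces to the Part 1 comparison of each value with its predecessor.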

Day 2: Dive!

Part 1

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

forward 5 adds 5 to your horizontal position, a total of 5.

down 5 adds 5 to your depth, resulting in a value of 5.

forward 8 adds 8 to your horizontal position, a total of 13.

up 3 decreases your depth by 3, resulting in a value of 2.

down 8 adds 8 to your depth, resulting in a value of 10.

forward 2 adds 2 to your horizontal position, a total of 15.

Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

Import the steps:

day2 <- read_lines("day02-input.txt")
head(day2)

[1] "forward 7" "down 2"    "forward 7" "down 6"    "forward 1" "forward 7"

Put the steps in a tibble, and use tidyr::separate() to split them into the instruction and the amount:

d_day2 <- tibble(step = day2) %>%
  separate(step, into = c("instruction", "amount"), sep = " ", convert = TRUE)
head(d_day2)

# A tibble: 6 × 2
  instruction amount
  <chr>        <int>
1 forward          7
2 down             2
3 forward          7
4 down             6
5 forward          1
6 forward          7

Then summarize the horizontal position and depth, and multiply the result:

d_day2 %>%
  summarise(
    horizontal_position = sum(amount[instruction == "forward"]),
    # Going down increases depth, so depth = total down - total up
    depth = sum(amount[instruction == "down"]) -
      sum(amount[instruction == "up"]),
    .groups = "drop"
  ) %>%
  mutate(product = horizontal_position * depth)

# A tibble: 1 × 3
  horizontal_position depth product
                <int> <int>   <int>
1                1990  1000 1990000

For the Python solution, I’ll use pandas:

import pandas as pd

day2_df = pd.DataFrame(r.day2, dtype = str, columns = ['step']) \
  .step.str.split(' ', expand = True) \
  .rename(columns = {0: 'instruction', 1: 'amount'}) \
  .astype({'amount': 'int32'})
day2_df

    instruction  amount
0       forward       7
1          down       2
2       forward       7
3          down       6
4       forward       1
..          ...     ...
995     forward       9
996        down       3
997        down       7
998        down       5
999     forward       7

[1000 rows x 2 columns]

Then it is easy enough to filter by instruction and sum the amounts:

day2_df[day2_df.instruction == 'forward'].amount.sum()

day2_df[day2_df.instruction == 'down'].amount.sum() - \
  day2_df[day2_df.instruction == 'up'].amount.sum()

Here is another way with the groupby and aggregate functions:

day2_df_sum = day2_df \
  .groupby('instruction', as_index = True) \
  .aggregate('sum')
  
day2_df_sum.loc['forward'].amount
day2_df_sum.loc['down'].amount - day2_df_sum.loc['up'].amount

Part 2

In addition to horizontal position and depth, you’ll also need to track a third value, aim, which also starts at 0. The commands also mean something entirely different than you first thought:

down X increases your aim by X units.

up X decreases your aim by X units.

forward X does two things:

It increases your horizontal position by X units.

It increases your depth by your aim multiplied by X.

Using this new interpretation of the commands, calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?

First, I’ll use cumsum() to add a running total of the aim variable from the “down” and “up” instructions:

d_day2 <- d_day2 %>%
  mutate(
    # Intermediate variable: the signed change in aim at each step
    #  (+amount for down, -amount for up, 0 for forward)
    aim_placeholder = case_when(
      instruction == "down" ~ amount,
      instruction == "up" ~ -amount,
      TRUE ~ 0L
    ),
    aim = cumsum(aim_placeholder)
  ) %>%
  select(-aim_placeholder)
head(d_day2, 9)

# A tibble: 9 × 3
  instruction amount   aim
  <chr>        <int> <int>
1 forward          7     0
2 down             2     2
3 forward          7     2
4 down             6     8
5 forward          1     8
6 forward          7     8
7 down             3    11
8 up               5     6
9 forward          7     6

Now with the running total of aim, I can compute horizontal position and depth:

d_day2 %>%
  summarise(
    horizontal_position = sum(amount[instruction == "forward"]),
    depth = sum(
      # Depth is aim multiplied by forward amount
      aim[instruction == "forward"] * amount[instruction == "forward"]
    ),
    .groups = "drop"
  ) %>%
  mutate(product = horizontal_position * depth)

# A tibble: 1 × 3
  horizontal_position  depth    product
                <int>  <int>      <int>
1                1990 992674 1975421260

In Python, I will assign a new aim column. The np.select() function picks the signed change in aim for each instruction (+amount for down, -amount for up, 0 for forward), and then I take its cumulative sum:

day2_df = day2_df \
  .assign(
    aim = np.select(
      [day2_df.instruction == 'down',
       day2_df.instruction == 'up',
       day2_df.instruction == 'forward'],
      [day2_df.amount, -day2_df.amount, 0]
    )
  )
day2_df['aim'] = day2_df['aim'].cumsum()

The aggregate() function summarises each column separately, so it can’t multiply two columns together. Instead, I first make a new depth column by multiplying aim by amount (for instruction = ‘forward’ only):

day2_df = day2_df \
  .assign(
    depth = np.where(
      day2_df.instruction == 'forward', day2_df.aim * day2_df.amount, 0
    ),
    horizontal_position = np.where(
      day2_df.instruction == 'forward', day2_df.amount, 0
    )
  )
day2_df

    instruction  amount   aim  depth  horizontal_position
0       forward       7     0      0                    7
1          down       2     2      0                    0
2       forward       7     2     14                    7
3          down       6     8      0                    0
4       forward       1     8      8                    1
..          ...     ...   ...    ...                  ...
995     forward       9   985   8865                    9
996        down       3   988      0                    0
997        down       7   995      0                    0
998        down       5  1000      0                    0
999     forward       7  1000   7000                    7

[1000 rows x 5 columns]

I’ve also added the horizontal_position variable, so that I can compute the sums with a simple aggregate:

day2_df[['depth', 'horizontal_position']].aggregate('sum')

depth                  992674
horizontal_position      1990
dtype: int64
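One observation ties the two parts together. The up and down commands change Part 2’s aim exactly the way they changed Part 1’s depth. In my data, the final aim of 1000 matches the Part 1 depth of 1000. A single pass over the commands can therefore answer both parts. Below is a minimal pure-Python sketch. It assumes each command is a string like 'forward 7', and the function name dive is hypothetical. The example uses the six steps from the Part 1 description.

```python
def dive(commands):
    """Return (part1_product, part2_product) for a list of 'direction amount' strings."""
    horizontal = 0
    aim = 0          # Part 2 aim, which is also Part 1's depth
    depth_part2 = 0
    for command in commands:
        direction, amount = command.split()
        amount = int(amount)
        if direction == "forward":
            horizontal += amount
            depth_part2 += aim * amount
        elif direction == "down":
            aim += amount
        elif direction == "up":
            aim -= amount
    return horizontal * aim, horizontal * depth_part2


steps = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
print(dive(steps))  # (150, 900)
```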

Day 3: Binary Diagnostic

Part 1

The diagnostic report (your puzzle input) consists of a list of binary numbers which, when decoded properly, can tell you many useful things about the conditions of the submarine. The first parameter to check is the power consumption.

You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.

Each bit in the gamma rate can be determined by finding the most common bit in the corresponding position of all numbers in the diagnostic report.

The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used.

Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. What is the power consumption of the submarine?

Import the binary numbers:

day3 <- read_lines("day03-input.txt")
head(day3)

[1] "001000010101" "010010111110" "001010110111" "001001011101" "001001010011"
[6] "001111100111"

Each bit needs to be considered separately, so use strsplit like this:

strsplit(day3[1:2], split = "")

[[1]]
 [1] "0" "0" "1" "0" "0" "0" "0" "1" "0" "1" "0" "1"

[[2]]
 [1] "0" "1" "0" "0" "1" "0" "1" "1" "1" "1" "1" "0"

Split every binary number and put it into a tibble of integers:

day3_split <- strsplit(day3, split = "")
day3_df <- matrix(unlist(day3_split), ncol = 12, byrow = TRUE) %>%
  as_tibble(.name_repair = "unique") %>%
  mutate(across(everything(), as.integer))

head(day3_df)

# A tibble: 6 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     0     0     1     0     0     0     0     1     0     1     0     1
2     0     1     0     0     1     0     1     1     1     1     1     0
3     0     0     1     0     1     0     1     1     0     1     1     1
4     0     0     1     0     0     1     0     1     1     1     0     1
5     0     0     1     0     0     1     0     1     0     0     1     1
6     0     0     1     1     1     1     1     0     0     1     1     1

Before computing the solution, are there any bits with an equal number of 0s and 1s?

day3_df %>%
  summarise(across(everything(), mean)) %>%
  pivot_longer(everything(), names_to = "bit", values_to = "prop")

# A tibble: 12 × 2
   bit    prop
   <chr> <dbl>
 1 ...1  0.489
 2 ...2  0.509
 3 ...3  0.487
 4 ...4  0.504
 5 ...5  0.498
 6 ...6  0.498
 7 ...7  0.493
 8 ...8  0.52 
 9 ...9  0.493
10 ...10 0.534
11 ...11 0.486
12 ...12 0.493

No, none of the bits has a proportion of exactly 0.5, so there are no ties. Compute the gamma and epsilon rates:

# There isn't a function available in base R to compute the mode of a vector,
#  so define one here that takes the most frequent by default (freq_rank = 1)
vector_mode <- function(x, freq_rank = 1) {
  # Frequency table
  table(x) %>%
    # Sort it by count
    sort(decreasing = TRUE) %>%
    # Get the labels of the counts
    names() %>%
    pluck(freq_rank)
}

day3_rates <- day3_df %>%
  pivot_longer(everything(), names_to = "bit", values_to = "value") %>%
  # fixed() matches the literal "..." prefix (as a regex, "." matches any character)
  mutate(bit = as.integer(str_remove(bit, fixed("...")))) %>%
  group_by(bit) %>%
  summarise(
    gamma = vector_mode(value, freq_rank = 1),
    epsilon = vector_mode(value, freq_rank = 2),
    .groups = "drop"
  ) %>%
  summarise(
    # Collapse the most/least frequent values into a single string
    across(c(gamma, epsilon), ~ str_c(.x, collapse = ""))
  )
day3_rates

# A tibble: 1 × 2
  gamma        epsilon     
  <chr>        <chr>       
1 010100010100 101011101011

We now have the binary representations, which we convert to decimal using strtoi():

day3_rates %>%
  mutate(across(c(gamma, epsilon), ~ strtoi(.x, base = 2)),
         prod = gamma * epsilon)

# A tibble: 1 × 3
  gamma epsilon    prod
  <int>   <int>   <int>
1  1300    2795 3633500

To put this into a pandas DataFrame, use a list comprehension to split each string into its characters:

day3_df = pd.DataFrame([list(number) for number in r.day3]).astype('int32')
day3_df

     0   1   2   3   4   5   6   7   8   9   10  11
0     0   0   1   0   0   0   0   1   0   1   0   1
1     0   1   0   0   1   0   1   1   1   1   1   0
2     0   0   1   0   1   0   1   1   0   1   1   1
3     0   0   1   0   0   1   0   1   1   1   0   1
4     0   0   1   0   0   1   0   1   0   0   1   1
..   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
995   0   0   1   0   1   1   1   1   1   0   1   0
996   1   0   0   1   1   0   0   1   0   0   1   1
997   1   0   0   1   1   1   1   1   1   0   1   1
998   0   0   0   0   1   0   0   1   0   0   1   1
999   1   0   0   1   1   1   1   0   1   1   0   0

[1000 rows x 12 columns]

Then get gamma and epsilon rates:

gamma = day3_df.aggregate('mode')
# For epsilon rate, just swap the numbers
epsilon = gamma.replace([0, 1], [1, 0])

# Concatenate the bits into a single string
gamma = gamma.apply(lambda row: ''.join(row.values.astype(str)), axis = 1)[0]
epsilon = epsilon.apply(lambda row: ''.join(row.values.astype(str)), axis = 1)[0]
gamma; epsilon

'010100010100'
'101011101011'

Finally, use int() with base = 2 to convert to decimal:

int(gamma, 2) * int(epsilon, 2)
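Epsilon uses the least common bit wherever gamma uses the most common one, so epsilon is just gamma with every bit flipped. For 12-bit numbers that is gamma XOR 0b111111111111, i.e. 4095 - gamma. This matches 4095 - 1300 = 2795 above. Below is a minimal sketch without pandas. It assumes equal-length binary strings and no tied positions (as checked above for my input). The function name is hypothetical.

```python
def power_consumption(numbers):
    """Return (gamma, epsilon, gamma * epsilon) for equal-length binary strings.

    Assumes no bit position has an equal number of 0s and 1s.
    """
    width = len(numbers[0])
    gamma_bits = []
    for pos in range(width):
        ones = sum(n[pos] == "1" for n in numbers)
        gamma_bits.append("1" if ones > len(numbers) - ones else "0")
    gamma = int("".join(gamma_bits), 2)
    # Flipping every bit of gamma gives epsilon: XOR with a mask of `width` ones
    epsilon = gamma ^ ((1 << width) - 1)
    return gamma, epsilon, gamma * epsilon


print(power_consumption(["110", "100", "011"]))  # (6, 1, 6)
```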

Part 2

Next, you should verify the life support rating, which can be determined by multiplying the oxygen generator rating by the CO2 scrubber rating.

Both the oxygen generator rating and the CO2 scrubber rating are values that can be found in your diagnostic report - finding them is the tricky part. Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:

Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.

If you only have one number left, stop; this is the rating value for which you are searching.

Otherwise, repeat the process, considering the next bit to the right.

The bit criteria depends on which type of rating value you want to find:

To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.

To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.

Before doing anything, I need to alter my vector_mode function to deal with ties:

vector_mode_part2 <- function(x, freq_rank = 1) {
  freq_table <- table(x) %>%
    sort(decreasing = TRUE)
  
  # If there is a tie
  if (freq_table["0"] == freq_table["1"]) {
    # And we're looking for the most frequent (oxygen rating)
    if (freq_rank == 1) {
      # Then return 1
      return(1) 
    } else {
      # Otherwise return 0 (CO2 rating)
      return(0)
    }
  # Otherwise, return the value from the table as usual
  } else {
    freq_table %>%
      names() %>%
      pluck(freq_rank) %>%
      as.integer()
  }
}

This definitely isn’t the most efficient way to implement the bit criteria, but an easy solution is to just filter bit-by-bit.

oxygen_rating <- day3_df
for (bit in names(day3_df)) {
  # If 1 number (row) remains, we have found the single oxygen rating
  if (nrow(oxygen_rating) == 1) break
  
  most_freq <- vector_mode_part2(oxygen_rating[[bit]])
  oxygen_rating <- oxygen_rating %>%
    filter(!!sym(bit) == most_freq)
}
oxygen_rating

# A tibble: 1 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     0     1     0     1     0     0     1     0     1     1     1     1

co2_rating <- day3_df
for (bit in names(day3_df)) {
  if (nrow(co2_rating) == 1) break
  
  least_freq <- vector_mode_part2(co2_rating[[bit]], 2)
  co2_rating <- co2_rating %>%
    filter(!!sym(bit) == least_freq)
}
co2_rating

# A tibble: 1 × 12
   ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     1     0     1     0     1     1     0     0     1     0     1

Convert the binary representations and compute the product:

tibble(
  # unlist() turns each one-row tibble into a vector of bits before collapsing
  oxygen_rating = str_c(unlist(oxygen_rating), collapse = ""),
  co2_rating = str_c(unlist(co2_rating), collapse = "")
) %>%
  mutate(across(c(oxygen_rating, co2_rating), ~ strtoi(.x, base = 2)),
         prod = oxygen_rating * co2_rating)

# A tibble: 1 × 3
  oxygen_rating co2_rating    prod
          <int>      <int>   <int>
1          1327       3429 4550283

It is simple enough to reproduce those loops in Python:

oxygen_rating = day3_df
for bit in day3_df:
  if len(oxygen_rating) == 1:
    break
  
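  # reindex() fills in a count of 0 when one of the bit values is absent
  bit_value_counts = oxygen_rating[bit].value_counts().reindex([0, 1], fill_value = 0)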
  if bit_value_counts[1] >= bit_value_counts[0]:
    oxygen_rating = oxygen_rating[oxygen_rating[bit] == 1]
  else:
    oxygen_rating = oxygen_rating[oxygen_rating[bit] == 0]
    
co2_rating = day3_df
for bit in day3_df:
  if len(co2_rating) == 1:
    break
  
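  bit_value_counts = co2_rating[bit].value_counts()
  # In cases where there are no 0s or no 1s, need to fill with 0 
  bit_value_counts = bit_value_counts.reindex([0, 1], fill_value = 0)
  # If every remaining number shares this bit, filtering on the least common
  #  bit would discard them all, so move on to the next bit
  if (bit_value_counts == 0).any():
    continue
  
  if bit_value_counts[0] <= bit_value_counts[1]:
    co2_rating = co2_rating[co2_rating[bit] == 0]
  else:
    co2_rating = co2_rating[co2_rating[bit] == 1]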
   
# In part 1, I used apply, here I'll use aggregate along axis = 1
co2_rating = co2_rating.astype(str).aggregate(''.join, axis = 1).values[0]
oxygen_rating = oxygen_rating.astype(str).aggregate(''.join, axis = 1).values[0]

int(co2_rating, 2) * int(oxygen_rating, 2)
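Both loops above follow the same bit criteria, so they can be folded into one function with a flag for most or least common. This is a minimal sketch, and the function names and the keep_most_common parameter are hypothetical. It also makes one assumption of my own: if every remaining number has the same bit at a position, that position is skipped. Otherwise, filtering on the absent (least common) bit would discard every candidate.

```python
def rating(numbers, keep_most_common):
    """Filter binary strings bit by bit until one remains; return it as an int."""
    candidates = list(numbers)
    for pos in range(len(candidates[0])):
        if len(candidates) == 1:
            break
        ones = sum(n[pos] == "1" for n in candidates)
        zeros = len(candidates) - ones
        if ones == 0 or zeros == 0:
            # Every candidate shares this bit, so there is nothing to filter on
            continue
        if keep_most_common:
            keep = "1" if ones >= zeros else "0"   # ties keep 1 (oxygen rating)
        else:
            keep = "0" if zeros <= ones else "1"   # ties keep 0 (CO2 rating)
        candidates = [n for n in candidates if n[pos] == keep]
    return int(candidates[0], 2)


def life_support(numbers):
    """Oxygen generator rating multiplied by CO2 scrubber rating."""
    return rating(numbers, keep_most_common=True) * rating(numbers, keep_most_common=False)
```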

Day 4: Giant Squid

Part 1

Bingo is played on a set of boards each consisting of a 5x5 grid of numbers. Numbers are chosen at random, and the chosen number is marked on all boards on which it appears. (Numbers may not appear on all boards.) If all numbers in any row or any column of a board are marked, that board wins. (Diagonals don’t count.)

The submarine has a bingo subsystem to help passengers (currently, you and the giant squid) pass the time. It automatically generates a random order in which to draw numbers and a random set of boards (your puzzle input).

The score of the winning board can now be calculated. Start by finding the sum of all unmarked numbers on that board; in this case, the sum is 188. Then, multiply that sum by the number that was just called when the board won, 24, to get the final score, 188 * 24 = 4512.

To guarantee victory against the giant squid, figure out which board will win first. What will your final score be if you choose that board?

Import the bingo input:

day4 <- read_lines("day04-input.txt")
print(str_trunc(head(day4, 8), 50))

[1] "94,21,58,16,4,1,44,6,17,48,20,92,55,36,40,63,62..."
[2] ""                                                  
[3] "49 74 83 34 40"                                    
[4] "87 16 57 75  3"                                    
[5] "68 94 77 78 89"                                    
[6] "56 38 29 26 60"                                    
[7] "41 42 45 19  1"                                    
[8] ""

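The data needs to be split up into the called numbers (at the top) and the boards. To do this, I’ll use a trick with cumsum that I found on Stack Overflow. cumsum(day4 == "") goes up by one at every blank line, so it labels each line with the index of the block it belongs to. split() then groups the non-blank lines by that label: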

day4_split <- split(
  day4[day4 != ""],
  cumsum(day4 == "")[day4 != ""]
)

called_numbers <- day4_split[[1]]
bingo_boards <- day4_split[2:length(day4_split)]
str_trunc(called_numbers, 50); bingo_boards[1]

[1] "94,21,58,16,4,1,44,6,17,48,20,92,55,36,40,63,62..."

$`1`
[1] "49 74 83 34 40" "87 16 57 75  3" "68 94 77 78 89" "56 38 29 26 60"
[5] "41 42 45 19  1"
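The same grouping idea translates directly to Python. A running count of blank lines (here via itertools.accumulate) gives every line a block label, and the non-blank lines are then collected by label. This is a minimal sketch with a hypothetical function name. The Python solution below takes a different route and indexes from the blank lines instead.

```python
from itertools import accumulate


def split_on_blank_lines(lines):
    """Group consecutive non-blank lines, mirroring the cumsum(x == "") trick."""
    # Running count of blank lines seen so far = block label for each line
    labels = accumulate(1 if line == "" else 0 for line in lines)
    blocks = {}
    for label, line in zip(labels, lines):
        if line != "":
            blocks.setdefault(label, []).append(line)
    # dicts keep insertion order, so blocks come back in file order
    return list(blocks.values())


example = ["94,21,58", "", "49 74 83", "87 16 57", "", "22 13 17"]
print(split_on_blank_lines(example))
# [['94,21,58'], ['49 74 83', '87 16 57'], ['22 13 17']]
```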

Now I need to convert the called_numbers to a numeric vector, and the bingo_boards to numeric matrices:

called_numbers <- strsplit(called_numbers, ",")[[1]] %>% as.integer()

bingo_boards <- bingo_boards %>%
  map(
    ~ {
      .x %>%
        # str_squish removes leading whitespace and replaces the double spaces
        #  before single-digit numbers with single spaces, so that we can
        #  properly split by " "
        str_squish() %>%
        str_split(" ") %>%
        map(as.integer) %>%
        unlist() %>%
        matrix(nrow = 5, byrow = TRUE)
    }
  )

head(called_numbers); bingo_boards[1]

[1] 94 21 58 16  4  1

$`1`
     [,1] [,2] [,3] [,4] [,5]
[1,]   49   74   83   34   40
[2,]   87   16   57   75    3
[3,]   68   94   77   78   89
[4,]   56   38   29   26   60
[5,]   41   42   45   19    1

Here is my iteration strategy for identifying and marking called numbers (not evaluated, just a demonstration with one board and one number):

bingo_board1 <- bingo_boards[[1]]
called_number1 <- 49 # suppose 49 was called
# Replace any 49s with -1
bingo_board1[bingo_board1 == called_number1] <- -1

# Look for any row or column sums equal to -5 (all five values = -1)
row_sums1 <- rowSums(bingo_board1)
col_sums1 <- colSums(bingo_board1)

# If we have bingo
if (-5 %in% c(row_sums1, col_sums1)) {
  # Compute the sum of the uncalled (non-negative) numbers
  uncalled_sum <- sum(bingo_board1[bingo_board1 > 0])
  # Return the product as the answer to the puzzle
  called_number1 * uncalled_sum
}
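Here is a variation on this strategy, not the one used in this post. The marks can be kept in a separate boolean array instead of overwriting called numbers with -1. That avoids relying on a sentinel value and on row/column sums, and it turns the win check into all() along rows and columns. Below is a minimal NumPy sketch, with hypothetical function names and a made-up board.

```python
import numpy as np


def mark(board, marked, number):
    """Mark `number` on the board in place; return True if a row or column is complete."""
    marked |= board == number
    return bool(marked.all(axis=0).any() or marked.all(axis=1).any())


def score(board, marked, number):
    """Sum of unmarked numbers multiplied by the number just called."""
    return int(board[~marked].sum()) * number


board = np.arange(25).reshape(5, 5)      # made-up board: rows 0-4, 5-9, ...
marked = np.zeros(board.shape, dtype=bool)
for number in [0, 1, 2, 3, 4]:
    if mark(board, marked, number):
        print(score(board, marked, number))  # 1160
```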

Now put it into a loop over all numbers and boards:

bingo_boards_part1 <- bingo_boards

for (called_number in called_numbers) {
  bingo_boards_part1 <- map(
    bingo_boards_part1,
    ~{
      .x[.x == called_number] <- -1
      .x
    }
  )
  
  # Find any winning boards
  bingo_board_winner <- map_lgl(
    bingo_boards_part1,
    ~{-5 %in% c(rowSums(.x), colSums(.x))}
  )
  
  if (sum(bingo_board_winner) > 0) {
    bingo_board_final <- bingo_boards_part1[bingo_board_winner] 
    break
  }
}
bingo_board_final; called_number

$`19`
     [,1] [,2] [,3] [,4] [,5]
[1,]   93   -1   26   35   39
[2,]   91   -1   85   69   -1
[3,]   -1   -1   27   57   10
[4,]   -1   -1   30   73   22
[5,]   -1   -1   -1   -1    9

[1] 7

Then the solution is:

sum(bingo_board_final[[1]][bingo_board_final[[1]] > 0]) * called_number

[1] 4662

For the Python solution, I’ll practice my list comprehension to compile the bingo boards:

called_numbers = [int(s) for s in r.day4[0].split(',')]

# Find the indices of the '' characters separating the bingo boards
bingo_boards_sep = [i for i,j in enumerate(r.day4) if j == '']
# Compile a list of bingo boards
bingo_boards = [r.day4[(i+1):(i+6)] for i in bingo_boards_sep]
# For each row of each board, split the string into multiple values
bingo_boards = [[board_row.split() for board_row in board] for board in bingo_boards]

That last line is a bit of a mess (it is a nested list comprehension that iterates over boards, and then over the rows of each board, to split each row string into single values), but converting it all to numeric arrays is now simple:

bingo_boards = [np.array(board).astype(int) for board in bingo_boards]
bingo_boards[0]

array([[49, 74, 83, 34, 40],
       [87, 16, 57, 75,  3],
       [68, 94, 77, 78, 89],
       [56, 38, 29, 26, 60],
       [41, 42, 45, 19,  1]])

Now I can re-create the same loop from the R solution:

# In Python, you use deepcopy() to make copies of nested structures like this
import copy
bingo_boards_part1 = copy.deepcopy(bingo_boards)

for called_number in called_numbers:
  # For each board, mark the called numbers as -1
  for i,b in enumerate(bingo_boards_part1):
    bingo_boards_part1[i][bingo_boards_part1[i] == called_number] = -1
  
  # Find winning boards
  winners = [-5 in np.concatenate([board.sum(axis = 0), board.sum(axis = 1)]) \
             for board in bingo_boards_part1]
  
  if True in winners:
    bingo_board_final = bingo_boards_part1[winners.index(True)]
    break

bingo_board_final[bingo_board_final > 0].sum() * called_number

Part 2

On the other hand, it might be wise to try a different strategy: let the giant squid win.

You aren’t sure how many bingo boards a giant squid could play at once, so rather than waste time counting its arms, the safe thing to do is to figure out which board will win last and choose that one. That way, no matter which boards it picks, it will win for sure.

Figure out which board will win last. Once it wins, what would its final score be?

It is simple enough to alter the loop to iteratively remove winning boards until one remains:

bingo_boards_part2 <- bingo_boards

for (called_number in called_numbers) {
  bingo_boards_part2 <- map(
    bingo_boards_part2,
    ~{
      .x[.x == called_number] <- -1
      .x
    }
  )
  
  # Find any winning boards
  bingo_board_winner <- map_lgl(
    bingo_boards_part2,
    ~{-5 %in% c(rowSums(.x), colSums(.x))}
  )
  
  # If more than one board remains, remove winners
  if (length(bingo_boards_part2) > 1) {
    bingo_boards_part2 <- bingo_boards_part2[!bingo_board_winner]
  } else {
    # Otherwise, continue until the last board wins
    if (sum(bingo_board_winner) > 0) {
      bingo_board_final <- bingo_boards_part2[bingo_board_winner] 
      break
    }
  }
}
bingo_board_final; called_number

$`14`
     [,1] [,2] [,3] [,4] [,5]
[1,]   -1   -1   -1   -1   -1
[2,]   -1   -1   -1   -1    0
[3,]   10   38   -1   -1   25
[4,]   -1   11   -1   -1   -1
[5,]   -1   -1   67   -1   -1

[1] 80

And the product:

sum(bingo_board_final[[1]][bingo_board_final[[1]] > 0]) * called_number

[1] 12080

Python:

bingo_boards_part2 = copy.deepcopy(bingo_boards)

for called_number in called_numbers:
  for i,b in enumerate(bingo_boards_part2):
    bingo_boards_part2[i][bingo_boards_part2[i] == called_number] = -1
  
  winners = [-5 in np.concatenate([board.sum(axis = 0), board.sum(axis = 1)]) \
             for board in bingo_boards_part2]
  
  # If more than one board remains, remove winners
  if len(bingo_boards_part2) > 1:
    bingo_boards_part2 = [b for i,b in \
                          enumerate(bingo_boards_part2) if not winners[i]]
  else:
    if True in winners:
      bingo_board_final = bingo_boards_part2[winners.index(True)]
      break

bingo_board_final[bingo_board_final > 0].sum() * called_number
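Here is another way to frame both parts, again not the approach taken above. For every board, compute the turn on which it wins and its score at that moment. Part 1 is then the board with the earliest winning turn, and Part 2 is the one with the latest. No boards are removed along the way, which also sidesteps the case where the last few boards all win on the same number. Below is a minimal sketch with hypothetical function names and made-up boards.

```python
import numpy as np


def win_turn_and_score(board, called):
    """Return (turn index, score) for the call that completes a row/column, or None."""
    marked = np.zeros(board.shape, dtype=bool)
    for turn, number in enumerate(called):
        marked |= board == number
        if marked.all(axis=0).any() or marked.all(axis=1).any():
            return turn, int(board[~marked].sum()) * number
    return None  # this board never wins


def first_and_last_scores(boards, called):
    """Scores of the first board to win (Part 1) and the last board to win (Part 2)."""
    results = [win_turn_and_score(b, called) for b in boards]
    results = [r for r in results if r is not None]
    first = min(results, key=lambda r: r[0])
    last = max(results, key=lambda r: r[0])
    return first[1], last[1]


board_a = np.arange(25).reshape(5, 5)
board_b = board_a + 100
called = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
print(first_and_last_scores([board_a, board_b], called))  # (1160, 238160)
```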

Day 5: Hydrothermal Venture

Part 1

You come across a field of hydrothermal vents on the ocean floor! These vents constantly produce large, opaque clouds, so it would be best to avoid them if possible. They tend to form in lines; the submarine helpfully produces a list of nearby lines of vents (your puzzle input) for you to review.

Each line of vents is given as a line segment in the format x1,y1 -> x2,y2 where x1,y1 are the coordinates of one end the line segment and x2,y2 are the coordinates of the other end. These line segments include the points at both ends. For now, only consider horizontal and vertical lines: lines where either x1 = x2 or y1 = y2.

To avoid the most dangerous areas, you need to determine the number of points where at least two lines overlap. Consider only horizontal and vertical lines. At how many points do at least two lines overlap?

Import the lines:

day5 <- read_lines("day05-input.txt")
head(day5)

[1] "223,805 -> 223,548" "609,164 -> 609,503" "461,552 -> 796,552"
[4] "207,361 -> 207,34"  "503,879 -> 503,946" "937,52 -> 937,268"

I’ll use a series of separate() calls to get the numeric coordinates:

day5_df <- tibble(x = day5) %>%
  separate(x, into = c("x1_y1", "x2_y2"), sep = " -> ") %>%
  separate(x1_y1, into = c("x1", "y1"), sep = ",", convert = TRUE) %>%
  separate(x2_y2, into = c("x2", "y2"), sep = ",", convert = TRUE)
head(day5_df)

# A tibble: 6 × 4
     x1    y1    x2    y2
  <int> <int> <int> <int>
1   223   805   223   548
2   609   164   609   503
3   461   552   796   552
4   207   361   207    34
5   503   879   503   946
6   937    52   937   268

Get the straight lines by looking for x1 == x2 or y1 == y2, then use crossing to get all the points touched by each line:

# Only straight lines
day5_straight <- day5_df %>%
  filter((x1 == x2) | (y1 == y2)) %>%
  rowwise() %>%
  mutate(xy = list(crossing(x = x1:x2, y = y1:y2))) %>%
  ungroup()
# As an example, show the first few points crossed by the first line
day5_straight %>%
  slice(1) %>%
  unnest(xy) %>%
  head()

# A tibble: 6 × 6
     x1    y1    x2    y2     x     y
  <int> <int> <int> <int> <int> <int>
1   223   805   223   548   223   548
2   223   805   223   548   223   549
3   223   805   223   548   223   550
4   223   805   223   548   223   551
5   223   805   223   548   223   552
6   223   805   223   548   223   553

Now, to find the dangerous points, I just need to count the combinations of x and y that occur more than once:

day5_straight %>%
  unnest(xy) %>%
  count(x, y) %>%
  summarise(dangerous_points = sum(n > 1))

# A tibble: 1 × 1
  dangerous_points
             <int>
1             7142

Create the same data frame in pandas:

day5_df = [[coord.split(',') for coord in line.split(' -> ')] \
            for line in r.day5]
# "Flatten" the lists so that each element has the four coordinates
day5_df = [xy[0] + xy[1] for xy in day5_df]

day5_df = pd.DataFrame(day5_df, columns = ['x1', 'y1', 'x2', 'y2']).astype(int)
day5_df.head()

    x1   y1   x2   y2
0  223  805  223  548
1  609  164  609  503
2  461  552  796  552
3  207  361  207   34
4  503  879  503  946

Find the straight lines with query:

day5_straight = day5_df.query('(x1 == x2) | (y1 == y2)')

I’m going to brute-force a solution here with nested for loops over a 1000 x 1000 grid of counts:

ocean_floor = np.zeros((1000, 1000))

for index, row in day5_straight.iterrows():
  # Need to fix the range() step if going "backwards"
  x_step = 1 if row.x1 <= row.x2 else -1
  y_step = 1 if row.y1 <= row.y2 else -1
  
  for x in range(row.x1, row.x2 + x_step, x_step):
    for y in range(row.y1, row.y2 + y_step, y_step):
      ocean_floor[x, y] += 1
      
np.count_nonzero(ocean_floor > 1)
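The x_step/y_step trick is needed because range() excludes its stop value and only counts down with a negative step. Isolated as a small helper (the name `inclusive_range` is hypothetical), it looks like this:

```python
def inclusive_range(start, stop):
    """Integers from start to stop, both endpoints included, in either direction."""
    step = 1 if start <= stop else -1
    # Adding step to stop makes range() include the end point
    return range(start, stop + step, step)


print(list(inclusive_range(2, 5)))  # [2, 3, 4, 5]
print(list(inclusive_range(5, 2)))  # [5, 4, 3, 2]
print(list(inclusive_range(3, 3)))  # [3]
```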

Part 2

Unfortunately, considering only horizontal and vertical lines doesn’t give you the full picture; you need to also consider diagonal lines. Because of the limits of the hydrothermal vent mapping system, the lines in your list will only ever be horizontal, vertical, or a diagonal line at exactly 45 degrees. Consider all of the lines. At how many points do at least two lines overlap?

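Find the diagonal line points. Because every diagonal is at exactly 45 degrees, x1:x2 and y1:y2 always have the same length, so I keep them as two list-columns that can be unnested in parallel (pairing values element-wise) instead of crossed: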

day5_diag <- day5_df %>%
  filter(x1 != x2, y1 != y2) %>%
  rowwise() %>%
  mutate(x = list(x1:x2), y = list(y1:y2)) %>%
  ungroup()
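A Python sketch of the same element-wise pairing with zip(). For a 45-degree line, |x2 - x1| equals |y2 - y1|, so the two ranges line up one-to-one; the function name `diagonal_points` is hypothetical.

```python
def diagonal_points(x1, y1, x2, y2):
    """Points on a 45-degree segment, endpoints included."""
    # For a 45-degree line both ranges have the same length
    if abs(x2 - x1) != abs(y2 - y1):
        raise ValueError("not a 45-degree line")
    x_step = 1 if x1 <= x2 else -1
    y_step = 1 if y1 <= y2 else -1
    xs = range(x1, x2 + x_step, x_step)
    ys = range(y1, y2 + y_step, y_step)
    # Pair element-wise (like unnest(c(x, y))), not a Cartesian product
    return list(zip(xs, ys))


print(diagonal_points(5, 5, 8, 2))  # [(5, 5), (6, 4), (7, 3), (8, 2)]
```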

Combine the straight and diagonal lines and count the points covered by at least two lines:

bind_rows(
  day5_straight %>% unnest(xy),
  day5_diag %>% unnest(c(x, y))
) %>%
  count(x, y) %>%
  summarise(dangerous_points = sum(n > 1))

# A tibble: 1 × 1
  dangerous_points
             <int>
1            20012
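To sanity-check the overall logic, here is a compact Python sketch that uses a Counter instead of a data frame and runs on the small example from the puzzle description, where the expected answers are 5 (part 1) and 12 (part 2). Rather than building separate straight and diagonal tables, it steps along each line with a unit direction of -1, 0 or +1 per axis; the function name `dangerous_points` is hypothetical.

```python
import re
from collections import Counter

# Example input from the Day 5 puzzle description
EXAMPLE = """\
0,9 -> 5,9
8,0 -> 0,8
9,4 -> 3,4
2,2 -> 2,1
7,0 -> 7,4
6,4 -> 2,0
0,9 -> 2,9
3,4 -> 1,4
0,0 -> 8,8
5,5 -> 8,2"""


def dangerous_points(lines, include_diagonals):
    counts = Counter()
    for line in lines:
        x1, y1, x2, y2 = map(int, re.findall(r"\d+", line))
        is_diagonal = x1 != x2 and y1 != y2
        if is_diagonal and not include_diagonals:
            continue
        # Direction of travel along each axis: -1, 0 or +1
        dx = (x2 > x1) - (x2 < x1)
        dy = (y2 > y1) - (y2 < y1)
        length = max(abs(x2 - x1), abs(y2 - y1))
        for i in range(length + 1):
            counts[(x1 + i * dx, y1 + i * dy)] += 1
    return sum(n > 1 for n in counts.values())


lines = EXAMPLE.splitlines()
print(dangerous_points(lines, include_diagonals=False))  # 5
print(dangerous_points(lines, include_diagonals=True))   # 12
```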

For the Python brute force solution, I can keep adding to the ocean_floor grid from part 1, which already holds the counts from the straight lines:

day5_diag = day5_df.query('(x1 != x2) & (y1 != y2)')

for index, row in day5_diag.iterrows():
  x_step = 1 if row.x1 <= row.x2 else -1
  y_step = 1 if row.y1 <= row.y2 else -1
  
  for x, y in zip(range(row.x1, row.x2 + x_step, x_step),
                  range(row.y1, row.y2 + y_step, y_step)):
    ocean_floor[x, y] += 1
      
np.count_nonzero(ocean_floor > 1)
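Both loops increment the grid one point at a time. If the points of all lines were first collected into coordinate arrays, the grid could be filled with NumPy indexing instead, with one catch: `grid[xs, ys] += 1` applies a repeated index only once, whereas `np.add.at` counts every occurrence. A sketch with hypothetical coordinates:

```python
import numpy as np

# Hypothetical points: (1, 2) appears twice
xs = np.array([1, 1, 3])
ys = np.array([2, 2, 0])

buffered = np.zeros((4, 4), dtype=int)
buffered[xs, ys] += 1                # repeated index (1, 2) is incremented only once

unbuffered = np.zeros((4, 4), dtype=int)
np.add.at(unbuffered, (xs, ys), 1)   # every occurrence is counted

print(buffered[1, 2], unbuffered[1, 2])   # 1 2
print(np.count_nonzero(unbuffered > 1))   # 1
```

For counting overlaps, only the unbuffered version gives the right answer.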

Stats

Here are my personal stats so far:

tibble::tribble(
  ~Part, ~Day, ~Time, ~Rank, ~Score,
  1, 5, "00:50:24", 6542, 0,
  2, 5, "00:57:17", 4865, 0,
  1, 4, "10:52:11", 33771, 0,
  2, 4, "11:07:58", 30829, 0,
  1, 3, "10:43:14", 64952, 0,
  2, 3, "11:43:13", 45788, 0,
  1, 2, "11:48:16", 74444, 0,
  2, 2, "12:21:09", 72356, 0,
  1, 1, "13:02:23", 72332, 0,
  2, 1, "13:23:44", 63804, 0
) %>%
  pivot_wider(names_from = Part, values_from = c(Time, Rank, Score),
              names_glue = "Part {Part}_{.value}") %>%
  mutate(
    `Time between parts` = as.numeric(hms(`Part 2_Time`) - hms(`Part 1_Time`),
                                      "minutes") %>% round(1)
  ) %>%
  gt() %>%
  tab_spanner_delim(delim = "_", split = "first")

Day	Part 1 time	Part 2 time	Part 1 rank	Part 2 rank	Part 1 score	Part 2 score	Time between parts (min)
5	00:50:24	00:57:17	6542	4865	0	0	6.9
4	10:52:11	11:07:58	33771	30829	0	0	15.8
3	10:43:14	11:43:13	64952	45788	0	0	60.0
2	11:48:16	12:21:09	74444	72356	0	0	32.9
1	13:02:23	13:23:44	72332	63804	0	0	21.4
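The "Time between parts" column is just the difference of two HH:MM:SS durations in minutes. A minimal Python equivalent of the hms() subtraction, using datetime.timedelta (helper names are hypothetical):

```python
from datetime import timedelta


def to_timedelta(hms):
    """Convert an 'HH:MM:SS' string to a timedelta."""
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def minutes_between(part1, part2):
    """Minutes from part 1 to part 2, rounded to one decimal place."""
    delta = to_timedelta(part2) - to_timedelta(part1)
    return round(delta.total_seconds() / 60, 1)


print(minutes_between("00:50:24", "00:57:17"))  # 6.9
print(minutes_between("10:43:14", "11:43:13"))  # 60.0
```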

Except for day 5 (when I stayed up late because it was the weekend), I’ve been completing the puzzles around lunchtime, on my break from work.

The scores of 0 come from the global leaderboard, which only awards points to the first 100 users to finish each puzzle, something I definitely won’t be doing. A better benchmark is the private leaderboard:

library(httr)

leaderboard <- httr::GET(
  url = "https://adventofcode.com/2021/leaderboard/private/view/1032765.json",
  httr::set_cookies(session = Sys.getenv("AOC_COOKIE"))
) %>%
  content() %>%
  as_tibble() %>%
  unnest_wider(members) %>%
  arrange(desc(local_score)) %>%
  transmute(
    Rank = 1:n(), Name = name, Score = local_score, Stars = stars
  )

leaderboard %>%
  gt() %>%
  text_transform(
    locations = cells_body(columns = Stars),
    fn = function(stars_col) {
      map_chr(stars_col,
              ~ html(rep(fontawesome::fa('star', fill = 'gold'),
                         times = as.integer(.x))))
    }
  ) %>%
  cols_align("left") %>%
  tab_style(
    style = list(cell_text(weight = "bold")),
    locations = cells_body(
      rows = (Name == "taylordunn")
    )
  ) %>%
  tab_options(container.height = 500)

Rank	Name	Score
1	Emil Hvitfeldt	1191
2	Colin Rundel	1183
3	trang1618	1170
4	@_TanHo	1154
5	mkiang	1126
6	gpecci	1112
7	David Robinson	1112
8	Jarosław Nirski	1110
9	dhimmel	1105
10	hrushikeshrv	1032
11	pritikadasgupta	1028
12	Ildikó Czeller	1026
13	Sherry Zhang	1023
14	john-b-edwards	1014
15	@_mnar99	1012
16	Melinda Tang	984
17	Josh Gray	952
18	Jonathan Spring	952
19	Derek Holliday	946
20	Jacqueline Nolis	905
21	Anna Fergusson	901
22	Andrew Argeros	894
23	Zach Bogart 💙	891
24	exunckly	876
25	Riinu Pius	869
26	HannesOberreiter	863
27	Tokhir Dadaev	856
28	ashbaldry	835
29	cramosu	831
30	Tom Jemmett	795
31	fabio machado	791
32	@Mid1995Sed	766
33	karawoo	761
34	Flavien Petit	756
35	Farhan Reynaldo	751
36	patelis	746
37	Arun Chavan	746
38	@woodspock	740
39	delabj	737
40	MetaMoraleMundo	715
41	Calum You	712
42	Daniel Coulton	708
43	mbjoseph	684
44	Alex N	681
45	KT421	680
46	jordi figueras puig	680
47	pi55p00r	671
48	scalgary	659
49	Erez Shomron	654
50	Jim Leach	649
51	rywhale	641
52	Matt Onimus	635
53	collinberke	632
54	Ghislain Nono Gueye	628
55	ldnam	612
56	taylordunn	606
57	jaapwalhout	603
58	Jeffrey Brabec	589
59	dirkschumacher	574
60	AlbertRapp	561
61	Nathan Moore	556
62	mfiorina	553
63	Miha Gazvoda	553
64	Nerwosolek	545
65	TylerGrantSmith	542
66	Darrin Speegle	540
67	duju211	525
68	blongworth	514
69	Sydney	510
70	Kelly N. Bodwin	508
71	David Schoch	503
72	long39ng	489
73	CarlssonLeo	486
74	A-Farina	468
75	cathblatter	463
76	Scott-Gee	445
77	Julian Tagell	434
78	Josiah Parry	419
79	thedivtagguy	400
80	jwinget	397
81	andrew-tungate-cms	385
82	@mfarkhann	382
83	@Maatspencer	372
84	@KentWeyrauch	368
85	Andrew Tungate	365
86	Emryn Hofmann	359
87	columbaspexit	346
88	ALBERT	343
89	Maya Gans	338
90	Alan Feder	314
91	Jenna Jordan	306
92	Kevin Kent	305
93	olmgeorg	296
94	Wendy Christensen	291
95	Eric Ekholm	287
96	Daniel Gemara	276
97	AmitLevinson	268
98	quickcoffee	258
99	cynthiahqy	235
100	Andrew Fraser	226
101	jennifer-furman	223
102	soto solo	222
103	antdurrant	211
104	Adrian Perez	196
105	Billy Fryer	186
106	April	181
107	Lukas Gröninger	156
108	Jose Pliego San Martin	113
109	aleighbrown	106
110	Kyle Ligon	105
111	Bruno Mioto	85
112	Duncan Gates	68
113	@jdknguyen	30
114	Matthew Wankiewicz	17
115	chapmandu2	0
116	NA	0
117	Wiktor Jacaszek	0
118	jacquietran	0
119	Tony ElHabr	0
120	Rizky Luthfianto	0
121	CaioBrighenti	0

I’m currently at rank 56 of 121, so about the middle of the pack.
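The ranking step in the httr pipeline above amounts to sorting the leaderboard members by local_score. A rough Python sketch of that step, assuming (as the R code suggests) that the JSON has a `members` object whose entries carry `name`, `local_score` and `stars`; the payload below is a made-up, heavily trimmed stand-in, not real leaderboard data. A null name, presumably from an anonymous user, would explain the NA row in the table.

```python
import json

# Hypothetical, trimmed payload: only the fields used in the R code
payload = json.loads("""
{"members": {
  "1": {"name": "alice", "local_score": 120, "stars": 10},
  "2": {"name": null,    "local_score": 0,   "stars": 0},
  "3": {"name": "bob",   "local_score": 95,  "stars": 9}
}}
""")

# Sort members by local score, highest first (like arrange(desc(local_score)))
members = sorted(payload["members"].values(),
                 key=lambda m: m["local_score"], reverse=True)

# Number the rows and keep the same columns as the transmute() call
table = [
    {"Rank": i, "Name": m["name"], "Score": m["local_score"], "Stars": m["stars"]}
    for i, m in enumerate(members, start=1)
]
for row in table:
    print(row)
```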

Reproducibility

Session info

 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Curacao
 date     2022-10-27
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)


	package	ondiskversion	loadedversion
assertthat	assertthat	0.2.1	0.2.1
backports	backports	1.4.1	1.4.1
bit	bit	4.0.4	4.0.4
bit64	bit64	4.0.5	4.0.5
broom	broom	1.0.0	1.0.0
cachem	cachem	1.0.6	1.0.6
callr	callr	3.7.1	3.7.1
cellranger	cellranger	1.1.0	1.1.0
checkmate	checkmate	2.1.0	2.1.0
cli	cli	3.3.0	3.4.1

python:         C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Scripts/python.exe
libpython:      C:/Users/tdunn/AppData/Local/r-reticulate/r-reticulate/pyenv/pyenv-win/versions/3.9.13/python39.dll
pythonhome:     C:/Users/tdunn/Documents/.virtualenvs/r-reticulate
version:        3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/tdunn/Documents/.virtualenvs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.23.3

NOTE: Python version was forced by use_python function


package	version	requirement
contourpy	1.0.5	contourpy==1.0.5
cycler	0.11.0	cycler==0.11.0
fonttools	4.37.2	fonttools==4.37.2
kiwisolver	1.4.4	kiwisolver==1.4.4
matplotlib	3.6.0	matplotlib==3.6.0
numpy	1.23.3	numpy==1.23.3
packaging	21.3	packaging==21.3
pandas	1.4.4	pandas==1.4.4
Pillow	9.2.0	Pillow==9.2.0
pyparsing	3.0.9	pyparsing==3.0.9

Git repository

Local:    main C:/Users/tdunn/Documents/tdunn-quarto
Remote:   main @ origin (https://github.com/taylordunn/tdunn-quarto.git)
Head:     [4eb5bf2] 2022-10-26: Added font import to style sheet

Source code, R environment

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dunn2021,
  author = {Dunn, Taylor},
  title = {Advent of {Code} 2021: {Days} 1-5},
  date = {2021-12-01},
  url = {https://tdunn.ca/posts/2021-12-01-advent-of-code-2021-days-1-5},
  langid = {en}
}

For attribution, please cite this work as:

Dunn, Taylor. 2021. “Advent of Code 2021: Days 1-5.” December 1, 2021. https://tdunn.ca/posts/2021-12-01-advent-of-code-2021-days-1-5.