mebeim / aoc

🎄 My solutions and walkthroughs for Advent of Code and more related stuff.

Speedup of day 16

marhoy opened this issue

Thanks for a nice explanation of the reverse cumsum approach for part 2!

You complain about the speed, but it can in fact be sped up even further:
Since we know we are only going to look at a small part of the whole list, we don't need to copy the input 10_000 times. With my input, I only had to copy it 806 times.
This function solves my input in 7s with standard Python:

def part2(input_string, phases=100):

    # Find the start index and convert input to list of ints
    start_index = int(input_string[:7])
    numbers = list(map(int, input_string))

    # Make sure that start index is indeed in the second half,
    # otherwise the trick won't work
    assert start_index > len(input_string)*10_000 / 2
    
    # We are only going to compute the numbers from just before start_index
    # to the end (in reverse). So we don't need to copy the input 10_000 times.
    n_repeats = (len(input_string)*10_000 - start_index ) // len(input_string) + 1
    
    # Compute new start index for this shorter list:
    start_index -= (10_000 - n_repeats)*len(input_string)
    
    numbers = numbers*n_repeats
    for _ in range(phases):
        cumsum = 0
        for i in range(len(numbers) - 1, start_index - 1, -1):
            cumsum += numbers[i]
            numbers[i] = cumsum % 10
            
    return "".join(map(str, numbers[start_index:start_index + 8]))
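To make the repeat-count arithmetic above easier to check by hand, here is a minimal sketch that isolates it. The helper name `tail_repeats` and the numbers in the usage line are hypothetical, chosen only so that the result comes out to 806 copies; they are not taken from any real puzzle input.

```python
def tail_repeats(signal_len, offset, total_repeats=10_000):
    """Return (n_repeats, new_offset) for the shortened list.

    Hypothetical helper: it mirrors the arithmetic in part2() above,
    nothing more.
    """
    total_len = signal_len * total_repeats

    # The reverse cumulative sum trick only works in the second half.
    assert offset > total_len / 2

    # Enough whole copies of the signal to cover [offset, total_len).
    n_repeats = (total_len - offset) // signal_len + 1

    # Shift the offset by the length of the copies that were dropped.
    new_offset = offset - (total_repeats - n_repeats) * signal_len
    return n_repeats, new_offset


# Made-up example: a 650-digit signal with an offset of 5_976_500.
n_repeats, new_offset = tail_repeats(650, 5_976_500)
print(n_repeats, new_offset)  # 806 400
```

The shortened list then has `n_repeats * signal_len - new_offset` digits after the new offset, which equals the number of digits after the original offset in the full list.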

Hey, thank you very much for reading my walkthrough and for the time you took to open an Issue here. I already replied to your Reddit comment, but I'm answering here too.

The behavior you're observing is actually a false positive. Even though working with a smaller list is better, it does not give a noticeable performance boost. The speedup of your second part over mine only comes from the fact that you enclosed it inside a function. If I enclose my second part in a function, the speed is the same as yours. Multiplying the list 10k times is not a big deal.

This happens because inside functions local variables are accessed with the LOAD_FAST Python opcode, which is much faster than the dictionary-based lookups (LOAD_GLOBAL, or LOAD_NAME in module-level code) used for global variables and therefore all over the place in the main body of the script.
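One way to see the difference is to disassemble the same loop twice with the standard `dis` module, once inside a function and once compiled as top-level code. This is only a sketch: the names `local_sum`, `MODULE_SOURCE` and `opnames` are made up for the illustration, and the exact opcode names vary between CPython versions (newer releases fuse or rename the fast-local opcodes, so the check below matches on the `LOAD_FAST` prefix).

```python
import dis


def local_sum(values):
    # Inside a function, `total`, `v` and `values` are fast locals.
    total = 0
    for v in values:
        total += v
    return total


# The same loop as it would appear in the main body of a script.
# It is only compiled here, never executed, so `values` need not exist.
MODULE_SOURCE = """
total = 0
for v in values:
    total += v
"""
module_code = compile(MODULE_SOURCE, "<module-level>", "exec")


def opnames(code):
    """Return the set of opcode names used by a function or code object."""
    return {ins.opname for ins in dis.get_instructions(code)}


def variable_ops(code):
    """Keep only the opcodes that load or store variables."""
    return sorted(n for n in opnames(code) if n.startswith(("LOAD_", "STORE_")))


print("function:    ", variable_ops(local_sum))
print("module level:", variable_ops(module_code))
```

On CPython the function variant should list only the `*_FAST` family for its variables, while the module-level variant falls back to name-based lookups.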

That said, thanks for reminding me of this. I added the explanation to my walkthrough 👍

Fixed in b32c06a by moving the day 16 part 2 code inside a function.