viralinstruction

Advent of Code 2021, day 1

Written 2021-12-01

This year's Advent of Code has begun, meaning that ♪'tis the season to be coding♪. AoC is a great opportunity to learn a new programming language or new set of programming tools. For example, you could use this year's AoC to learn... oh, I don't know, Julia?

Great choice! You've come to the right place. In this post, I will be showing you how I've solved the day 1 puzzle in Julia.

A fair warning though: The day 1 puzzle is not very challenging. But because this post is really about how best to get started coding a project using Julia, with unit tests and benchmarks and the whole shebang, the solution will be wildly over-engineered. Don't let the takeaway be that causal programming in Julia requires all this setup.

You can find the code here (click).

Setting up your project

You'll want to install Julia first (of course), and some kind of editor. I recommend VS Code, because it currently has the best Julia IDE, but editors can be an, uh, touchy subject, so you just pick your favourite. If you choose VS Code you'll also want to install the julia extension - just get the extension called "julia".

To solve the 24 Advent of Code puzzles, We could just make a series of scripts, but let's do it the right way and instead create a new Julia project for this year's AoC. We can then have one source file per day. To do this, launch Julia, then type ] to enter Pkg mode, then type:

(@v1.7) pkg> generate AoC2021
  Generating  project AoC2021:
    AoC2021/Project.toml
    AoC2021/src/AoC2021.jl

This is the part where some people grimace in disgust: I'm using the REPL to initialize a project from within Julia? Why not simply do it from the shell?

Right, about that... When developing Julia, it is possible to run all your commands from the shell, but I strongly recommend you don't do that, and work from the REPL instead. For some people, this may be a jarring shift in your normal development workflow, I know. But when in Rome, do as the Romans - Julia's development experience really does work better when you interact with it from the REPL. If it makes it easier, you can use the Julia REPL in the VSCode terminal.

With your open REPL, navigate to the AoC2021 directory you just created (type ; to enter shell mode, navigate to the path and hit backspace to exit to REPL mode), enter package mode and execute activate . (note the trailing dot for "this directory"!) to have the package manager activate the project. If it works, you should see the prompt look like

(AoC2021) pkg>

You can then use backspace to get back to the julia> REPL mode. Okay, so currently our directory looks like:

$ exa -T
.
├── Project.toml
└── src
   └── AoC2021.jl

Let's add a data directory and add today's input file in that. We also add a new day01.jl file in the src directory for today's code, and make the toplevel project file AoC2021.jl file include the new file. The directory structure is now:

$ exa -T
.
├── data
│  └── day01.txt
├── Project.toml
└── src
   ├── AoC2021.jl
   └── day01.jl

And the main file AoC2021.jl contains:

module AoC2021

include("day01.jl")

end # module

Let's also make the directory a git repository - that's always a good idea when developing:

$ git init

and add the following to .gitignore:

/data

We don't want to waste kilobytes by tracking the data directory (and in any case, different people get different challenge inputs). Normally, when developing Julia packages, we would also ignore the Manifest.toml file which we will soon create with the Julia package manager. See where the Project.toml file is used the specify dependencies and compatibility bounds for a package, Manifest.toml contains the resolved dependency graph of the project. If you are familiar with Rust, Project.toml and Manifest.toml correspond to Cargo.toml and Cargo.lock, respectively. The Manifest.toml file can be used if the user needs to reconstruct an environment precisely, for example when doing reproducible science. When developing packages, you usually don't want to completely specify an environment. Instead you want your package to be usable in a broad range of environments. And so, normally, the manifest is ignored with version control.

However, in our case, an Advent of Code challenge is one of those cases where you might want to have the full environment, so here, we keep the manifest.

Now, we can begin to look at today's challenge. The structure of the file day01.jl will be like this:

module day01

function solve(io::IO)
    # code goes here...
end

end # module

Such that the function can accept any IO-like argument - this could be a string, or a file-like object. To create more generic code, I could also just have left off the type annotation ::IO altogether, but we really do only want to call it on IO inputs, and the type annotation can serve as self-documenting code.

If you are using VSCode with the julia extension, it will automatically load a bunch of Julia packages related to development (incidentally, this is why the editor is somewhat slow to start). For example, it loads Revise.jl, a near-obligatory package which tracks changes to your code and reloads any changed code automatically. If you are not using VSCode, I highly highly recommend installing Revise. You don't want to install Revise into your current project, making it a dependency. Instead, enter your default environment, install it there, and then return to the AoC2021 environment.

(AoC2021) pkg> activate
  Activating environment at `~/.julia/environments/v1.7/Project.toml`
  
(@v1.7) pkg> add Revise
 [ OUTPUT ELIDED ]

(@v1.7) pkg> activate .
  Activating project at `~/code/AoC2021`

(AoC2021) pkg>

Then, to start developing, open the REPL and type

julia> using Revise

julia> using AoC2021

The code itself

So, for part 1, I have a list of numbers that looks like this:

199
200
208
210
200
207
240
269
260
263

And I need to count how many numbers are larger than the previous number - in this case 7. Let's have this list of numbers in the source code for day 1, then we can also use it as a test case:

const TEST_STRING = """199
200
208
210
200
207
240
269
260
263"""

To solve any problem, it's a good idea to break it down. First, I need to parse it from a text file into a list of numbers. To find suitable functions in Base Julia, you can search the online documentation (which also exists locally in your Julia install), or you can use the function apropos in the REPL to search through docstrings.

In this case, I need parse and eachline:

function solve(io::IO)
    v = [parse(Int, line) for line in eachline(io)]
end

When I have some approach I think may work, I test it out in the REPL:

julia> AoC2021.day01.solve(IOBuffer(AoC2021.day01.TEST_STRING))
10-element Vector{Int64}:
 199
 200
 208
 210
 200
 207
 240
 269
 260
 263

That looks right.

Okay, now to answer the question by comparing consecutive numbers. What immediately springs to mind is the old Python idiom for getting consecutive pairs in a list:

>>> zip(v, v[1:])

We can translate that to Julia, and use sum with a generator expression to get the result, just like I would in Python:

function solve(io::IO)
    v = [parse(Int, line) for line in eachline(io)]
    sum(next > prev for (prev, next) in zip(v, v[2:end]))
end

Testing it in the REPL returns 7, which is correct. I can now test it for the input:

julia> open(AoC2021.day01.solve, "data/day01.txt")
1390

Which happens to be the correct answer for me!

Part 2

As always there is a twist: The puzzle now asks to count the number of sliding windows of size 3 for which the sum is larger than the previous sliding window!

Instead of messing around with the actual windows, I notice that when the first window (containing the numbers (v[1], v[2], v[3])) slides a slot to the window at indices 2, 3, and 4, the sum decreases by v[1] which is no longer part of the window, and increases by v[4]. So, the sum is larger iff v[4] > v[1]. In other words, part 2 can be viewed just like part 1, except that we consider pairs of numbers with a distance of 4 instead of 1.

So the code just needs a slight modificiation from part 1:

function solve(io::IO)
    v = [parse(Int, line) for line in eachline(io)]
    part1 = sum(next > prev for (prev, next) in zip(v, v[2:end]))
    part2 = sum(next > prev for (prev, next) in zip(v, v[4:end]))
    (part1, part2)
end

And it works! :)

Adding tests

Adding test-specific dependencies

Of course, no project is complete without tests. We could use Julia's built-in testing package, but that is quite bare-bones. Instead, let us use ReTest, which allows us to write tests next to the source code the tests work on (and much more).

We want ReTest as a dependency when testing, but for just running the code we don't need all the functionality of ReTest. We just need a package that allows tests to be written in-line. The smaller, lightweight package InlineTest will allow us to do that. So, we need test-specific dependencies: ReTest when testing, and InlineTest when running. How to do that?

First, add both packages as dependencies using the package manager:

(AoC2021) pkg> add ReTest@0.3, InlineTest@0.2

I specify the versions explicitly to make sure the code in this post will remain functional in the future. Feel free to install the latest version.

Now, manually modify the Project.toml to specify test-specific dependencies. Currently, your file looks something like this:

name = "AoC2021"
uuid = "10e8a67a-2997-442f-ba23-999a08daf998"
authors = ["Your Name <youremail@domain.com>"]
version = "0.1.0"

[deps]
InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"
ReTest = "e0db7c4e-2690-44b9-bad6-7687da720f89"

Change it so it looks like this:

name = "AoC2021"
uuid = "10e8a67a-2997-442f-ba23-999a08daf998"
authors = ["Your Name <youremail@domain.com>"]
version = "0.1.0"

[deps]
InlineTest = "bd334432-b1e7-49c7-a2dc-dd9149e4ebd6"

[compat]
julia = "1.6"
InlineTest = "0.2"
ReTest = "0.3"

[extras]
ReTest = "e0db7c4e-2690-44b9-bad6-7687da720f89"

[targets]
test = ["ReTest"]

The notable differences are:

Unfortunately, as of Julia 1.7, you have to specify these compat bounds manually. It would be nicer if the Julia package manager automatically added SemVer-compatible compat bounds. Maybe someday, it will!

Now to actually write the test code

We import the package in the top-level file AoC2021.jl:

module AoC2021

using InlineTest

include("day01.jl")

end # module

and also import it at the top of the day01 file using a relative import (so it imports it from the top-level module):

module day01

using ..InlineTest

[ rest of file elided ]

We can now write a test set using the @testset macro, that makes sure our code works for the test data at least:

@testset "day01" begin
    @test solve(IOBuffer(TEST_STRING)) == (7, 5)
end

To test it, must have imported our module AoC2021. We also need to import ReTest (install it in the default environment like you did Revise.jl, not in the current environment!) and run AoC2021.runtests():

julia> using ReTest

julia> AoC2021.runtests()
                   Pass  
AoC2021.day01:
  day01        |      1

Adding static analysis

After having tested, we can be fairly sure it works as we expect. However, writing a comprehensive test suite that covers all the edge cases is difficult. To be a little more confident our program is well-behaved, we can analyze the behaviour statically. Of course for projects as tiny as Advent of Code day 1, there really isn't much point to static analysis, but let me show you anyway:

For this, I use the package JET.jl. Just like Revise, better not install this into the AoC2021 project, but instead into the default environment.

An interesting wrinkle when doing static analysis of Julia is that the program's behaviour is essentially un-analyzable until it is actually compiled, and it is not compiled until we run it - or at least give concrete input types to its functions. Therefore, unlike for static languages, it is mostly meaningless to try to analyze the source code of our file - instead, we have to analyze specific uses of the code.

Here, for example, I analyze on the type-level how the main function behaves when called with an IOBuffer:

julia> using JET

julia> @report_call AoC2021.day01.solve(IOBuffer())
No errors !
Union{Nothing, Int64}

No errors! Whew!

If you're using VSCode, JET.jl integrates with the editor and displays any errors in the problems pane (or in this case, does nothing as there are no errors.)

It would be possible to add JET.jl static analysis to the test suite. However, with the current state of static analysis in Julia, this is not advisable. Unlike static languages, the presence of a potential error - for example an unresolvable function call - may not be an actual problem in Julia code. For example, suppose we had a few dependencies, and the dependencies got updated. It would be completely fair game for these dependencies to now include code that is not statically inferrable, but which behaves correctly when running. It is therefore best to not include static analysis in automatic tests. Static analysis may be improved in future versions of Julia, but for now, we'll leave it as it is.

Benchmarking

Wise programmers say that, most of the time, slow code is fast enough, and developers should optimize their code for maintainability instead of speed. That may be true, but that has never stopped me from over engineering otherwise uncomplicated software until it goes brrrrrr.

For benchmarking, we'll use the tested and tried BenchmarkTools package. Again, this shouldn't be a dependency of our actual project, so let's install it in our home environment:

(AoC2021) pkg> activate
  Activating environment at `~/.julia/environments/v1.7/Project.toml`

(@v1.7) pkg> add BenchmarkTools
  [ output elided ]

(@v1.7) pkg> activate .
  Activating environment at `~/code/AoC2021/Project.toml`

Now we can benchmark a function call like so:

julia> @benchmark open(AoC2021.day01.solve, "data/day01.txt")
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  264.768 μs …   3.268 ms  ┊ GC (min … max): 0.00% … 83.96%
 Time  (median):     283.136 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   296.689 μs ± 174.874 μs  ┊ GC (mean ± σ):  3.38% ±  5.22%

         ▁▅█▄
  ▂▂▃▃▅▆▇████▇▆▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▂▁▁▂▁▂▁▂▂▂▂▂▂▂▂▂▂ ▃
  265 μs           Histogram: frequency by time          384 μs <

 Memory estimate: 179.92 KiB, allocs estimate: 4021.

(or get just the minimum time using @btime, where the output is less unwieldy). Those times include reading and parsing the input file. Not too unreasonable!

Let's see where we can improve. The first thing that strikes me is that we allocate new vectors when we index into them using ranges. Let's change that to views.

Second, the parse function tries to autodetect the base, such that parsing of e.g. base 16 text works automatically. That's nice, but the check is a little slow, so let's specify the base is 10. This is also more correct, so that's nice.

Another issue is fold-based iterators like sum. Their behaviour is undefined for empty collections (what's the sum of nothing?), so they throw an error in Julia. The potential to throw is both an edge case I ought to cover, and slows the code down a little. This can be prevented by setting the init keyword, giving it an initial value to start from.

Lastly, for some bizzare reason, iterating a view is inefficient in Julia compared to indexing. That really really ought to be fixed, but for now, let's change the zip iterator to a solution that uses indexing:

Another issue is fold-based iterators like sum. Their behaviour is undefined for empty collections (what's the sum of nothing?), so they throw an error in Julia. This can be prevented by setting the init keyword, giving it an initial value to start from:

function solve(io::IO)
    zipdiff(a, b) = sum((@inbounds b[i] > a[i] for i in eachindex(b)), init=0)
    v = [parse(Int, line, base=10) for line in eachline(io)]
    return (zipdiff(v, @view v[2:end]), zipdiff(v, @view v[4:end]))
end

And then

julia> @btime open(AoC2021.day01.solve, "data/day01.txt");
  249.112 μs (4021 allocations: 179.92 KiB)

That's, uh... basically no difference. A bit more benchmarking confirms that the gain is insignificant compared to the total runtime. Unfortunately, IO is the bane of Julia's speed, since IO operations are much less optimized than other types of operations in Julia. For example, every line in the eachline iterator needlessly allocates a string instead of mutating the same string or returning an AbstractString view into a buffer. And there is little to no good abstractions for tweaking file buffering in Julia.

I factor out the parsing from the core algorithm to more accurately asses the optimizations:

parse(io::IO) = [Base.parse(Int, line, base=10) for line in eachline(io)]
solve(io::IO) = solve(parse(io))

function solve(v::AbstractVector{<:Integer})
    zipdiff(a, b) = sum((@inbounds b[i] > a[i] for i in eachindex(b)), init=0)
    return (zipdiff(v, @view v[2:end]), zipdiff(v, @view v[4:end]))
end

Benchmarking only the solve(::AbstractVector) method by itself reveals that the optimization changed the minimum run time from 4.68 μs to 409 ns. A great improvement on the numerical part, but insignificant compared to the parse cost. In any case, I leave in the optimizations.

The easy way

Setting up the whole environment and tests and such was a lot of work. Does it really have to be that complicated to code in Julia?

No! A more typical Julia solution for day one could simply be a script with:

function main(path)
    v = [parse(Int, line) for line in eachline(path)]
    println(sum(next > prev for (prev, next) in zip(v, v[2:end])))
    println(sum(next > prev for (prev, next) in zip(v, v[4:end])))
end

main(ARGS[1])

Which runs in less than 1 second including startup time.