This post builds on my last post, which covered how to develop your own custom NixOS test for custom packages and services.
If you want to contribute to or work on upstream nixpkgs NixOS module tests, here are a few pointers I picked up while experimenting with a flake-based trial-and-error workflow.
Run the test with: nix run '.#nixosTests.$TEST_NAME.driver'
For example, to run the nginx NixOS tests via flakes, run:
$ nix run '.#nixosTests.nginx.driver'
You can also run the tests with an interactive shell via:
$ nix run '.#nixosTests.nginx.driverInteractive'
Nix package derivations often link a related test suite via a passthru.tests attribute, so you can execute affected tests when
you update or change a package. For example, the gotosocial package derivation links the tests like this:
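The original listing isn't reproduced here, but the nixpkgs hookup looks roughly like this sketch (attribute names from memory; check the current gotosocial derivation for the exact form):

```nix
{ lib, buildGoModule, fetchFromGitHub, nixosTests, ... }:

buildGoModule {
  pname = "gotosocial";
  # ...version, src, vendorHash, and the rest of the build elided...

  # Wire the NixOS VM test into the package's passthru, so tooling
  # (and humans) can find and run the tests affected by a change.
  passthru.tests = {
    inherit (nixosTests) gotosocial;
  };
}
```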
So another way to run the tests is via the linked attribute in the package derivation like so:
$ nix run '.#gotosocial.tests.gotosocial.driver'
Just like in my previous post, a QEMU VM will boot and execute the test suite and print its results to your terminal window.
I’ve been using Nix / NixOS for a couple years now, but have never really bothered to learn the NixOS testing framework. After playing with it a bit, I really like it!
It’s like all the best properties of classic Vagrant and Chef Test Kitchen, but with the power and reproducibility of Nix.
Andreas Fuchs blogged about how NixOS tests helped them catch a regression in their code, so I felt inspired to give it a try.
Since I use nix flakes for all my personal development and machine setup, and the Nix documentation doesn’t go into detail about
how to use the NixOS test harness with flakes, I wrote up my initial investigation here.
This setup will allow us to:
Build a custom package with nix flakes.
Run that package as a NixOS systemd service via a NixOS module.
Boot a QEMU machine and execute automated tests against the package / module configuration.
Because it’s Nix, the entire process of building the package, assembling it into a systemd service, booting a QEMU VM, and executing tests all happens
with a single command with a declarative build definition.
First, set up a bare-bones flake with a custom package. For this example, we’ll just make a simple netcat echo server that listens on port 3000.
flake.nix
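The original listing didn't survive here, so the following is a minimal sketch of what such a flake could look like; the package name helloNixosTests matches the one used throughout this post, but the exact nc invocation depends on which netcat variant your nixpkgs provides:

```nix
{
  description = "A tiny netcat echo server for exploring NixOS tests";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      packages.${system} = rec {
        # A trivial "server": echo a greeting to anyone connecting on 3000,
        # then loop so the next connection is served too.
        helloNixosTests = pkgs.writeShellScriptBin "hello" ''
          while true; do
            echo hello | ${pkgs.netcat}/bin/nc -l 3000
          done
        '';
        default = helloNixosTests;
      };
    };
}
```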
Once you save your flake.nix file, you should be able to build the package from within the directory:
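For example (assuming the package attribute is helloNixosTests, the name used throughout this post):

```shell
$ nix build '.#helloNixosTests'

# or run it directly:
$ nix run '.#helloNixosTests'
```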
This just runs a simple nc server that listens on port 3000.
Next, let’s create a NixOS module that will make a systemd service for our little nc server. This is just a bog-standard
NixOS service module that we’re going to eventually write a test for.
hello-module.nix
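The module listing is missing from this copy of the post; here is a sketch of what hello-module.nix plausibly looks like, assuming the package is exposed as pkgs.helloNixosTests via an overlay (the option and service names follow the rest of the post):

```nix
{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.helloNixosTests;
in {
  options.services.helloNixosTests = {
    enable = mkEnableOption "the hello netcat echo server";
  };

  config = mkIf cfg.enable {
    systemd.services.helloNixosTests = {
      description = "hello netcat echo server";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        # Assumes pkgs.helloNixosTests is provided by our overlay.
        ExecStart = "${pkgs.helloNixosTests}/bin/hello";
        Restart = "always";
      };
    };
  };
}
```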
If you’ve written NixOS modules before, this should look completely familiar to you.
To make this module work, we have to do a few things:
Create a pkgs overlay that includes our new pkgs.helloNixosTests package.
Create a top-level nixosModules entry in our flake.nix file for the module.
Let’s do that now, and extend our flake file some more.
flake.nix
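Again the listing itself is absent here, so this is a sketch of the extended flake under the same assumptions as before (overlay adds pkgs.helloNixosTests; the module lives in ./hello-module.nix):

```nix
{
  description = "A tiny netcat echo server for exploring NixOS tests";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";

      # Make our package available as pkgs.helloNixosTests everywhere.
      overlay = final: prev: {
        helloNixosTests = final.writeShellScriptBin "hello" ''
          while true; do
            echo hello | ${final.netcat}/bin/nc -l 3000
          done
        '';
      };

      pkgs = import nixpkgs {
        inherit system;
        overlays = [ overlay ];
      };
    in {
      packages.${system}.helloNixosTests = pkgs.helloNixosTests;

      # Expose the service module so machines (and tests) can import it.
      nixosModules.helloNixosModule = import ./hello-module.nix;
    };
}
```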
Notice we extended pkgs with our overlay definition, which will extend the base nixpkgs with our custom package. We also
created a nixosModules.helloNixosModule attribute that imports the module we just wrote.
We’ve made a package, and a module, now let’s write the NixOS test to make sure they both work!
hello-boots.nix
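The test file isn't included in this copy, so here is a sketch of a hello-boots.nix consistent with the test output shown later in the post (the function arguments are an assumption about how the flake passes things in):

```nix
{ pkgs, helloNixosModule, ... }:

pkgs.nixosTest {
  name = "hello-boots";

  nodes = {
    # A single test machine running our service module.
    machine = { config, pkgs, ... }: {
      imports = [ helloNixosModule ];
      services.helloNixosTests.enable = true;
      # nc for poking the port from inside the VM.
      environment.systemPackages = [ pkgs.netcat ];
    };
  };

  # Python test script, executed against the running VM(s).
  testScript = ''
    start_all()
    machine.wait_for_unit("helloNixosTests.service")
    machine.wait_for_open_port(3000)
    machine.succeed("nc -z -v localhost 3000")
  '';
}
```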
This file is a normal Nix derivation like any other, but it builds a complete QEMU-based test harness to exercise our package and module.
A few things to notice:
The top-level nodes attribute set lets us define complete node / machine definitions. In this example, we just have one machine, called machine.
We configure that machine to run our module. The imports list is critical: it brings our service module definition into scope to run on the machine.
The testScript key is where we run our tests. The NixOS test framework provides a python library to exercise the environment. You can see a list of available commands on the NixOS wiki.
Now, let’s hook in the test to our top-level flake.nix file, so we can run the test. This is the full, finished flake.nix file:
flake.nix
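The finished listing is missing here too; under the same assumptions as the earlier sketches, the complete flake.nix would look roughly like this:

```nix
{
  description = "A tiny netcat echo server for exploring NixOS tests";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";

      overlay = final: prev: {
        helloNixosTests = final.writeShellScriptBin "hello" ''
          while true; do
            echo hello | ${final.netcat}/bin/nc -l 3000
          done
        '';
      };

      pkgs = import nixpkgs {
        inherit system;
        overlays = [ overlay ];
      };
    in {
      packages.${system}.helloNixosTests = pkgs.helloNixosTests;

      nixosModules.helloNixosModule = import ./hello-module.nix;

      # `nix flake check` builds and runs everything under checks.
      checks.${system}.helloNixosTest = import ./hello-boots.nix {
        inherit pkgs;
        helloNixosModule = self.nixosModules.helloNixosModule;
      };
    };
}
```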
We added a checks output attribute to the flake, which accepts any normal Nix derivation. After hooking it into our flake, we
should be able to run:
$ nix flake check
This will build and run our test script inside a complete QEMU VM, and if all goes well, our script will return success. If things don’t
go as expected, you can boot an interactive Python shell to debug the tests like so:
[blake@blake-framework:~/code/nixos-test]$ nix run '.#checks.x86_64-linux.helloNixosTest.driverInteractive'
Machine state will be reset. To keep it, pass --keep-vm-state
start all VLans
start vlan
running vlan (pid 267780; ctl /tmp/vde1.ctl)
(finished: start all VLans, in 0.00 seconds)
additionally exposed symbols:
machine,
vlan1,
start_all, test_script, machines, vlans, driver, log, os, create_machine, subtest, run_tests, join_all, retry, serial_stdout_off, serial_stdout_on, polling_condition, Machine
>>>
You can run the tests interactively: boot all the test machines, then run the tests:
>>> start_all()
>>> run_tests()
Test will time out and terminate in 3600 seconds
run the VM test script
additionally exposed symbols:
machine,
vlan1,
start_all, test_script, machines, vlans, driver, log, os, create_machine, subtest, run_tests, join_all, retry, serial_stdout_off, serial_stdout_on, polling_condition, Machine
machine: waiting for unit helloNixosTests.service
machine: waiting for the VM to finish booting
machine: Guest shell says: b'Spawning backdoor root shell...\n'
machine: connected to guest root shell
machine: (connecting took 0.00 seconds)
(finished: waiting for the VM to finish booting, in 0.00 seconds)
(finished: waiting for unit helloNixosTests.service, in 0.04 seconds)
machine: waiting for TCP port 3000 on localhost
machine # Connection to localhost (127.0.0.1) 3000 port [tcp/hbci] succeeded!
machine # [ 30.490067] systemd[1]: helloNixosTests.service: Deactivated successfully.
machine # [ 30.491654] systemd[1]: helloNixosTests.service: Consumed 3ms CPU time, no IO, received 216B IP traffic, sent 112B IP traffic.
(finished: waiting for TCP port 3000 on localhost, in 0.02 seconds)
(finished: run the VM test script, in 0.06 seconds)
If you are setting up an ActivityPub (Mastodon or GotoSocial) server, want your handle to live at the apex of your domain, and that domain already hosts an existing website, you’ll need to do some additional setup. GotoSocial calls this a “Split Domain deployment”.
In my case, I wanted my ActivityPub user to be @blake@blakesmith.me, but my website is already hosted at blakesmith.me, meaning my ActivityPub server (GotoSocial in this case) needs a little extra configuration.
The setup is:
Host domain: social.blakesmith.me
Account domain: blakesmith.me
Here’s my NixOS module that configures GotoSocial (a minimal ActivityPub server, great for small and single user instances) with the split domain on my server. Notice the host and account-domain configuration:
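The module itself is missing from this copy of the post; the split-domain pieces would look roughly like the sketch below. Option names follow the NixOS services.gotosocial module and GotoSocial's own config keys, but the proxy port and ACME details are illustrative assumptions:

```nix
{ config, pkgs, ... }:

{
  services.gotosocial = {
    enable = true;
    settings = {
      host = "social.blakesmith.me";     # where the daemon is served
      account-domain = "blakesmith.me";  # what usernames look like
      bind-address = "localhost";
      port = 8080;                       # assumed local port
    };
  };

  # nginx terminates TLS in front of GotoSocial.
  services.nginx = {
    enable = true;
    virtualHosts."social.blakesmith.me" = {
      enableACME = true;   # Let's Encrypt cert, required for ActivityPub
      forceSSL = true;
      locations."/".proxyPass = "http://localhost:8080";
    };
  };

  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@blakesmith.me";  # hypothetical contact address
  };
}
```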
We put nginx in front of GotoSocial, and set up a Let’s Encrypt TLS cert as well (required for ActivityPub). This should all work great once DNS for social.blakesmith.me is pointed at the NixOS server. The GotoSocial daemon is reachable at social.blakesmith.me, but is configured to issue usernames at blakesmith.me.
My website, blakesmith.me, is hosted in an S3 bucket with CloudFront in front of it. We need to configure the S3 bucket for blakesmith.me to redirect all requests under the /.well-known route prefix to social.blakesmith.me. Here’s the necessary Terraform:
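The actual listing isn't included in this copy; a sketch using the classic inline routing_rules attribute (resource names are illustrative) would be:

```terraform
resource "aws_s3_bucket" "blakesmith_me" {
  bucket = "blakesmith.me"

  website {
    index_document = "index.html"

    # Redirect anything under .well-known/ to the ActivityPub host domain.
    routing_rules = <<EOF
[{
  "Condition": { "KeyPrefixEquals": ".well-known/" },
  "Redirect": {
    "HostName": "social.blakesmith.me",
    "Protocol": "https",
    "HttpRedirectCode": "302"
  }
}]
EOF
  }
}
```

Newer AWS provider versions split this into a separate aws_s3_bucket_website_configuration resource with routing_rule blocks, but the redirect semantics are the same.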
The routing_rules block is the most important part: any route to the bucket that is prefixed with .well-known will be redirected to social.blakesmith.me. ActivityPub / Mastodon uses the WebFinger protocol to resolve profile and server information for a given Mastodon handle.
It’s working if you get a valid redirect to your host domain server! If we follow redirects, we’ll hit the ActivityPub server hosted at social.blakesmith.me:
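The transcript didn't survive in this copy; you can check the redirect with something like the following (output omitted, since it depends on your setup):

```shell
# Hit the account domain; you should get a redirect to the host domain.
$ curl -sI 'https://blakesmith.me/.well-known/webfinger?resource=acct:blake@blakesmith.me'

# Follow the redirect to reach the GotoSocial server itself.
$ curl -sL 'https://blakesmith.me/.well-known/webfinger?resource=acct:blake@blakesmith.me'
```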
If all is well, other users should be able to follow you using the account domain as your username (In my case: @blake@blakesmith.me).
One pitfall that tripped me up for a while: if you have CloudFront in front of your S3 bucket, like I do, you have to configure your CloudFront origin using a custom origin and the S3 website endpoint, NOT the bucket endpoint. Otherwise, you’ll get “The specified key does not exist” errors when trying to access the WebFinger resource at your account domain.
The relevant Terraform setup for the CloudFront distribution looks like this (notice how the bucket origin is configured with a custom_origin_config):
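This listing is also missing here; a sketch of the origin portion (resource and origin IDs are illustrative, and the rest of the distribution config is elided) would be:

```terraform
resource "aws_cloudfront_distribution" "blakesmith_me" {
  origin {
    # The S3 *website* endpoint, NOT the REST bucket endpoint, so that
    # the bucket's routing_rules redirects are honored.
    domain_name = aws_s3_bucket.blakesmith_me.website_endpoint
    origin_id   = "blakesmith-me-website"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      # S3 website endpoints only speak plain HTTP.
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  # ...aliases, default_cache_behavior, viewer_certificate, and the
  # rest of the distribution configuration elided...
}
```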