Blake Smith

create. code. learn.

»

How to Run NixOS Tests (Flake Edition)

This post builds on my last post, which covered how to develop your own custom NixOS test for custom packages and services.

If you want to contribute to or work on upstream nixpkgs NixOS module tests, here are a few pointers I picked up while experimenting with a flake-based trial-and-error workflow.

To run an existing test:

  1. Find a test in nixpkgs/nixos/tests/all-tests.nix.
  2. cd to your local nixpkgs git checkout.
  3. Run the test with: nix run '.#nixosTests.$TEST_NAME.driver'

For example, to run the nginx NixOS tests via flakes, run:

$ nix run '.#nixosTests.nginx.driver'

You can also run the tests with an interactive shell via:

$ nix run '.#nixosTests.nginx.driverInteractive'

Nix package derivations often link a related test suite via a passthru.tests attribute, so you can execute the affected tests when you update or change a package. For example, the gotosocial package derivation links its tests like this:

buildGoModule rec {
  ...
  passthru.tests.gotosocial = nixosTests.gotosocial;
  ...
}

So another way to run the tests is via the linked attribute in the package derivation like so:

$ nix run '.#gotosocial.tests.gotosocial.driver'

Just like in my previous post, a QEMU VM will boot, execute the test suite, and print its results to your terminal window.


»

Running NixOS Tests with Nix Flakes

I’ve been using Nix / NixOS for a couple of years now, but had never really bothered to learn the NixOS testing framework. After playing with it a bit, I really like it! It combines all the best properties of classic Vagrant and Chef Test Kitchen with the power and reproducibility of Nix.

Andreas Fuchs blogged about how NixOS tests helped them catch a regression in their code, so I felt inspired to give it a try.

Since I use nix flakes for all my personal development and machine setup, and the Nix documentation doesn’t go into detail about how to use the NixOS test harness with flakes, I wrote up my initial investigation here.

This setup will allow us to:

  1. Build a custom package with nix flakes.
  2. Run that package as a NixOS systemd service via a NixOS module.
  3. Boot a QEMU machine and execute automated tests against the package / module configuration.

Because it’s Nix, the entire process of building the package, assembling it into a systemd service, booting a QEMU VM, and executing tests happens with a single command, driven by a declarative build definition.

First, set up a bare-bones flake with a custom package. For this example, we’ll just make a simple netcat server that listens on port 3000.

flake.nix

{
  description = "NixOS tests example";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-23.11";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
      in
        {
          packages = {
            helloNixosTests = pkgs.writeScriptBin "hello-nixos-tests" ''
            ${pkgs.netcat}/bin/nc -l 3000
            '';
          };
        }
    );
}

Once you save your flake.nix file, you should be able to build the package from within the directory:

$ nix build '.#helloNixosTests'
$ ./result/bin/hello-nixos-tests

This just runs a simple nc server that listens on port 3000.
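
If you want to sanity check it, connect from a second terminal while the script is running (this assumes a netcat binary is available on your host); anything you type is printed by the listening process in the first terminal:

$ nc localhost 3000
hello there

Here “hello there” is simply whatever you type into the connection.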

Next, let’s create a NixOS module that will make a systemd service for our little nc server. This is just a bog-standard NixOS service module that we’re going to eventually write a test for.

hello-module.nix

{ config, lib, pkgs, ... }:

with lib;
let
  cfg = config.services.helloNixosTests;
in {
  options = {
    services.helloNixosTests = {
      enable = mkEnableOption "helloNixosTests";
    };
  };

  #### Implementation

  config = mkIf cfg.enable {
    users.users.hello = {
      createHome = true;
      description = "helloNixosTests user";
      isSystemUser = true;
      group = "hello";
      home = "/srv/helloNixosTests";
    };

    users.groups.hello.gid = 1000;

    systemd.services.helloNixosTests = {
      description = "helloNixosTests server";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];
      script = ''
      exec ${pkgs.helloNixosTests}/bin/hello-nixos-tests
      '';

      serviceConfig = {
        Type = "simple";
        User = "hello";
        Group = "hello";
        Restart = "on-failure";
        RestartSec = "30s";
      };
    };
  };
}

If you’ve written NixOS modules before, this should look completely familiar to you.

To make this module work, we have to do a few things:

  1. Create a pkgs overlay that includes our new pkgs.helloNixosTests package.
  2. Create a top-level nixosModules entry in our flake.nix file for the module.

Let’s do that now, and extend our flake file some more.

flake.nix

{
  description = "NixOS tests example";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-23.11";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    {
      nixosModules = {
        helloNixosModule = import ./hello-module.nix;
      };
    } // flake-utils.lib.eachDefaultSystem (system:
      let
        overlay = final: prev: {
          helloNixosTests = self.packages.${system}.helloNixosTests;
        };
        pkgs = nixpkgs.legacyPackages.${system}.extend overlay;
      in
        {
          packages = {
            helloNixosTests = pkgs.writeScriptBin "hello-nixos-tests" ''
            ${pkgs.netcat}/bin/nc -l 3000
            '';
          };
        }
    );
}

Notice we extended pkgs with our overlay definition, which adds our custom package on top of the base nixpkgs. We also created a nixosModules.helloNixosModule attribute that imports the module we just wrote.

We’ve made a package and a module; now let’s write the NixOS test to make sure they both work!

hello-boots.nix

{ self, pkgs }:

pkgs.nixosTest {
  name = "hello-boots";
  nodes.machine = { config, pkgs, ... }: {
    imports = [
      self.nixosModules.helloNixosModule
    ];
    services.helloNixosTests = {
      enable = true;
    };

    system.stateVersion = "23.11";
  };

  testScript = ''
    machine.wait_for_unit("helloNixosTests.service")
    machine.wait_for_open_port(3000)
  '';
}

This file is a normal Nix derivation like any other, but it builds a complete QEMU-based test harness to exercise our package and module.

A few things to notice:

  1. The top-level nodes attribute set allows us to define complete node / machine definitions. In this example, we just have one machine called machine.
  2. We configure that machine to run our module. The imports block is critical: it brings our service module definition into scope on the machine.
  3. The testScript key is where we run our tests. The NixOS test framework provides a Python library to exercise the environment; a slightly larger example is sketched below, and you can see the full list of available commands on the NixOS wiki.
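
For illustration, here’s a sketch of a slightly extended testScript for the same service (the extra assertions are mine, not part of the original example; machine.succeed runs a shell command on the node and fails the test if it exits non-zero):

  testScript = ''
    machine.wait_for_unit("helloNixosTests.service")
    machine.wait_for_open_port(3000)
    # Fail the test if the unit isn't active.
    machine.succeed("systemctl is-active helloNixosTests.service")
    # Capture command output on the node for assertions.
    out = machine.succeed("ss -tlnp")
    assert "3000" in out, "expected something listening on port 3000"
  '';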

Now, let’s hook in the test to our top-level flake.nix file, so we can run the test. This is the full, finished flake.nix file:

flake.nix

{
  description = "NixOS tests example";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-23.11";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    {
      nixosModules = {
        helloNixosModule = import ./hello-module.nix;
      };
    } // flake-utils.lib.eachDefaultSystem (system:
      let
        overlay = final: prev: {
          helloNixosTests = self.packages.${system}.helloNixosTests;
        };
        pkgs = nixpkgs.legacyPackages.${system}.extend overlay;
      in
        {
          checks = {
            helloNixosTest = pkgs.callPackage ./hello-boots.nix { inherit self; };
          };
          packages = {
            helloNixosTests = pkgs.writeScriptBin "hello-nixos-tests" ''
            ${pkgs.netcat}/bin/nc -l 3000
            '';
          };
        }
    );
}

We added a checks output attribute to the flake, which accepts any normal Nix derivation. After hooking it in, we should be able to run:

$ nix flake check
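
nix flake check builds every check defined for your current system. If you only want to build and run this one test, you can also build the check derivation directly (shown here for an x86_64-linux machine):

$ nix build '.#checks.x86_64-linux.helloNixosTest'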

nix flake check will build and run our test script inside a complete QEMU VM, and if all goes well, the script will return success. If things don’t go as expected, you can boot an interactive Python shell to debug the tests like so:

[blake@blake-framework:~/code/nixos-test]$ nix run '.#checks.x86_64-linux.helloNixosTest.driverInteractive'
Machine state will be reset. To keep it, pass --keep-vm-state
start all VLans
start vlan
running vlan (pid 267780; ctl /tmp/vde1.ctl)
(finished: start all VLans, in 0.00 seconds)
additionally exposed symbols:
    machine,
    vlan1,
    start_all, test_script, machines, vlans, driver, log, os, create_machine, subtest, run_tests, join_all, retry, serial_stdout_off, serial_stdout_on, polling_condition, Machine
>>> 

From this shell you can run the tests interactively. Boot all the test machines and run the tests:

>>> start_all()
>>> run_tests()
Test will time out and terminate in 3600 seconds
run the VM test script
additionally exposed symbols:
    machine,
    vlan1,
    start_all, test_script, machines, vlans, driver, log, os, create_machine, subtest, run_tests, join_all, retry, serial_stdout_off, serial_stdout_on, polling_condition, Machine
machine: waiting for unit helloNixosTests.service
machine: waiting for the VM to finish booting
machine: Guest shell says: b'Spawning backdoor root shell...\n'
machine: connected to guest root shell
machine: (connecting took 0.00 seconds)
(finished: waiting for the VM to finish booting, in 0.00 seconds)
(finished: waiting for unit helloNixosTests.service, in 0.04 seconds)
machine: waiting for TCP port 3000 on localhost
machine # Connection to localhost (127.0.0.1) 3000 port [tcp/hbci] succeeded!
machine # [   30.490067] systemd[1]: helloNixosTests.service: Deactivated successfully.
machine # [   30.491654] systemd[1]: helloNixosTests.service: Consumed 3ms CPU time, no IO, received 216B IP traffic, sent 112B IP traffic.
(finished: waiting for TCP port 3000 on localhost, in 0.02 seconds)
(finished: run the VM test script, in 0.06 seconds)

Shut down the machine and quit manually via:

>>> machine.shutdown()
>>> join_all()
>>> quit()



»

Apex Domains for ActivityPub and Mastodon / GotoSocial with S3 and Cloudfront

If you are setting up an ActivityPub (Mastodon or GotoSocial) server, want your account to sit at the “apex” of your domain, and that domain is already hosting an existing website, you’ll need to do some additional setup. GotoSocial calls this a “Split Domain deployment”.

In my case, I wanted my ActivityPub user to be @blake@blakesmith.me, but my website is already hosted at blakesmith.me, meaning my ActivityPub server (GotoSocial in this case) needs a little extra configuration.

The setup is:

  • Host domain: social.blakesmith.me
  • Account domain: blakesmith.me

Here’s my NixOS module that configures GotoSocial (a minimal ActivityPub server, great for small and single-user instances) with the split domain on my server. Notice the host and account-domain configuration:

{ pkgs, ... }:

{
  services.gotosocial = {
    enable = true;
    settings = {
      host = "social.blakesmith.me";
      account-domain = "blakesmith.me";
    };
  };

  services.nginx = {
    virtualHosts."social.blakesmith.me" = {
      forceSSL = true;
      enableACME = true;
      locations."/" = {
        proxyPass = "http://127.0.0.1:8080";
        extraConfig =
          "proxy_ssl_server_name on;" +
          "proxy_pass_header Authorization;" +
          "client_max_body_size 0;" +
          "proxy_http_version 1.1;" +
          "proxy_request_buffering off;" +
          "proxy_set_header Host $host;" +
          "proxy_set_header Upgrade $http_upgrade;" +
          "proxy_set_header Connection \"upgrade\";" +
          "proxy_set_header X-Forwarded-For $remote_addr;" +
          "proxy_set_header X-Forwarded-Proto $scheme;"
        ;
      };
    };
  };

  security.acme = {
    certs."social.blakesmith.me".email = "blakesmith0@gmail.com";
  };
}

We put nginx in front of GotoSocial and set up a Let’s Encrypt TLS cert as well (required for ActivityPub). This should all work once DNS for social.blakesmith.me is pointed at the NixOS server. The GotoSocial daemon is reachable at social.blakesmith.me, but is configured to use blakesmith.me for usernames.
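
For reference, “pointing DNS at the server” is just an ordinary record for the host domain, something like this zone-file entry (203.0.113.10 is a placeholder address, not my real server’s IP):

social.blakesmith.me.    300    IN    A    203.0.113.10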

My website, blakesmith.me, is hosted via an S3 bucket with CloudFront in front of it. We need to configure the S3 bucket for blakesmith.me to redirect all requests under the /.well-known route prefix to social.blakesmith.me. Here’s the necessary Terraform:

resource "aws_s3_bucket" "origin_blakesmith_me" {
  bucket = "origin.blakesmith.me"
  # ...SNIP...
  website {
    index_document = "index.html"
    routing_rules = <<EOF
[{
    "Condition": {
        "KeyPrefixEquals": ".well-known"
    },
    "Redirect": {
        "HostName": "social.blakesmith.me",
        "HttpRedirectCode": "301",
        "Protocol": "https"
    }
}]
EOF
  }
}

The routing_rules block is the most important part: any request to the bucket with a key prefixed by .well-known will be redirected to social.blakesmith.me. ActivityPub / Mastodon uses the WebFinger protocol to resolve profile and server information for a given handle.

Let’s test it:

$ curl -v "https://blakesmith.me/.well-known/webfinger?resource=acct:blake@blakesmith.me"
> GET /.well-known/webfinger?resource=acct:blake@blakesmith.me HTTP/2
> Host: blakesmith.me
> User-Agent: curl/8.4.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 301
< content-length: 0
< location: https://social.blakesmith.me/.well-known/webfinger?resource=acct:blake@blakesmith.me
< date: Sat, 17 Feb 2024 18:02:11 GMT
< server: AmazonS3
< x-cache: Hit from cloudfront
< via: 1.1 4076c139caa3b19374b9d2d1784ca5e0.cloudfront.net (CloudFront)
< x-amz-cf-pop: ORD56-P4
< x-amz-cf-id: XN5GXTM2fhACc4ZWmRbs2PMKmxEMeI-3yj0GZJ_orlLAp655feXVZQ==
< age: 3
<
* Connection #0 to host blakesmith.me left intact

It’s working if you get a valid redirect to your host domain server! If we follow redirects, we’ll hit the ActivityPub server hosted at social.blakesmith.me:

$ curl -L "https://blakesmith.me/.well-known/webfinger?resource=acct:blake@blakesmith.me" | jq .
{
  "subject": "acct:blake@blakesmith.me",
  "aliases": [
    "https://social.blakesmith.me/users/blake",
    "https://social.blakesmith.me/@blake"
  ],
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://social.blakesmith.me/@blake"
    },
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://social.blakesmith.me/users/blake"
    }
  ]
}

If all is well, other users should be able to follow you using the account domain as your username (in my case: @blake@blakesmith.me).

One pitfall that tripped me up for a while: if you have CloudFront in front of your S3 bucket, like I do, you have to configure your CloudFront origin as a custom origin pointing at the S3 website endpoint, NOT the bucket endpoint. Otherwise, you’ll get “The specified key does not exist” errors when trying to access the WebFinger resource at your account domain.

The relevant Terraform setup for the CloudFront distribution looks like this (notice how the bucket origin is configured with a custom_origin_config):

resource "aws_cloudfront_distribution" "blakesmith_distribution" {
  origin {
    domain_name = aws_s3_bucket.origin_blakesmith_me.website_endpoint
    origin_id = "blakesmith_origin"
    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols = [ "TLSv1.1", "TLSv1.2", "SSLv3" ]
    }
  }

  enabled = true
  is_ipv6_enabled = true
  default_root_object = "index.html"
  aliases = ["blakesmith.me", "www.blakesmith.me"]
  price_class = "PriceClass_200"
  retain_on_delete = true
  default_cache_behavior {
    allowed_methods = [ "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT" ]
    cached_methods = [ "GET", "HEAD" ]
    target_origin_id = "blakesmith_origin"
    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "redirect-to-https"
    min_ttl = 0
    default_ttl = 3600
    max_ttl = 86400
  }
  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate_validation.blakesmith_me.certificate_arn
    ssl_support_method = "sni-only"
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
}

Feel free to follow me at @blake@blakesmith.me on ActivityPub!

