»

What is 'Placement New' in Rust?

Placement new is a feature currently being discussed for the Rust programming language. It gives the programmer control over memory allocation and memory placement, whereas the current allocation implementation is hidden behind compiler internals via the Box::new interface. Controlling memory allocation is useful in many different applications; a few that come to mind:

  1. Arena allocators: Pre-allocating big chunks of memory from the operating system up front, and arbitrating memory ownership directly in your application. This helps avoid context switches into the kernel for certain memory use-cases.
  2. Embedded allocators: Allocating chunks of memory from well-defined memory locations. Useful when you need memory to come from very specific hardware addresses.

This is Rust’s answer to C++ placement new, allowing one to control not only when and how memory is freed, but also where it is allocated and freed from (thanks to Holy_City for the clarification here!).

How heap allocation in Rust works now

Heap allocation is hidden behind the Box type in stable Rust. When you instantiate data within a Box, you’re placing the data onto the heap, and the Box stores an internal pointer to that heap-allocated data.

// Count will be placed on the heap
let heap_data = Box::new(Count { num: 1 });

If you dig around in the source of the Box type, you can see some hints at why ‘placement new’ might be useful.

Let’s see how Box::new is implemented:

#[stable(feature = "rust1", since = "1.0.0")]
#[inline(always)]
pub fn new(x: T) -> Box<T> {
    box x
}

It looks like memory allocation, and lifting the value into the Box type, is hidden behind this box keyword. Searching for the box keyword turns up the Rust documentation on unstable Box Syntax and Patterns. This unstable feature allows you to use the box keyword to allocate Boxes directly on the global heap.
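
On a nightly compiler, a minimal example of that unstable syntax looks something like this (assuming the box_syntax feature gate of that era):

#![feature(box_syntax)]

fn main() {
    // Equivalent to Box::new(5), but expressed with the unstable keyword.
    let boxed: Box<i32> = box 5;
    println!("{}", boxed);
}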

What does the Drop implementation for Box look like?

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized> Drop for Box<T> {
    fn drop(&mut self) {
        // FIXME: Do nothing, drop is currently
        // performed by compiler.
    }
}

I was expecting to see some unsafe dealloc calls here, but it looks like this work is being done somewhere in compiler internals.
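
Conceptually, the compiler-generated drop glue does something like the following sketch. This is not the real implementation, just an illustration; the Heap and Layout names follow the era’s unstable (nightly-only) allocator API, which also appears later in this post:

use std::heap::{Alloc, Heap, Layout}; // unstable allocator API, feature(allocator_api)
use std::ptr;

unsafe fn drop_box<T>(ptr: *mut T) {
    // Run T's destructor in place...
    ptr::drop_in_place(ptr);
    // ...then hand the memory back to the global heap.
    let layout = Layout::new::<T>();
    if layout.size() != 0 {
        Heap.dealloc(ptr as *mut u8, layout);
    }
}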

How Placement New Fits

If you need to customize how heap allocation works, ideally you’d be able to hook into the box keyword or use similar syntax (and avoid C++-esque library solutions). This is exactly what Rust’s ‘placement new’ feature gives us: a way to use that fancy box syntax (or something similar) while implementing our own memory placement strategies.

The work on ‘placement new’ is broken down into a few different efforts that need to come together to make all of this work:

  • A syntax extension that allows the programmer to specify where they’d like the memory placed (RFC#1228).
  • A Placer trait that allows custom allocation / placement implementations, returning a Place type from the Placer’s required make_place function (sketched after this list).
  • Desugaring logic that transforms the syntax into straightforward Rust code that calls the correct Placer.
  • Implementations of existing implicit allocation / placement strategies, including a BoxPlace for the default Box heap allocation strategy.
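
To make these pieces concrete, here’s roughly what the protocol traits that briefly landed in nightly looked like (a simplified sketch; see RFC#1228 for the real details):

trait Placer<Data: ?Sized> {
    type Place: InPlace<Data>;
    fn make_place(self) -> Self::Place;
}

trait Place<Data: ?Sized> {
    fn pointer(&mut self) -> *mut Data;
}

trait InPlace<Data: ?Sized>: Place<Data> {
    type Owner;
    unsafe fn finalize(self) -> Self::Owner;
}

// `PLACE_EXPR <- VALUE_EXPR` would then desugar to approximately:
//   let mut place = Placer::make_place(PLACE_EXPR);
//   let raw = Place::pointer(&mut place);
//   ... write VALUE_EXPR directly into `raw` ...
//   unsafe { InPlace::finalize(place) }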

Placement New Examples

There’s no clear consensus on how placement new syntax will work yet, but many options are being discussed in RFC#1228. Here are a few of them:

Overloaded ‘box’ syntax

We could overload the box syntax mentioned above and allow it to take a place expression:

let arena_ref = box arena 5;
let heap_ref = box HEAP 5;

Left Arrow Syntax

Left arrow syntax follows the form PLACE_EXPR <- VALUE_EXPR:

let arena_ref = arena <- MyStruct::new();
let heap_ref = HEAP <- MyStruct::new();

Placement ‘in’ syntax

Placement ‘in’ syntax uses the ‘in’ keyword to define the placement location.

let arena_ref = in arena { MyStruct::new() };
let heap_ref = in HEAP { MyStruct::new() };

Placer examples

The Placer implementation for ExchangeHeapSingleton (the default box heap allocation method) that was removed from Rust looked something like this:

fn make_place<T>() -> IntermediateBox<T> {
    let layout = Layout::new::<T>();

    let p = if layout.size() == 0 {
        mem::align_of::<T>() as *mut u8
    } else {
        unsafe {
            Heap.alloc(layout.clone()).unwrap_or_else(|err| {
                Heap.oom(err)
            })
        }
    };

    IntermediateBox {
        ptr: p,
        layout,
        marker: marker::PhantomData,
    }
}

impl<T> Placer<T> for ExchangeHeapSingleton {
    type Place = IntermediateBox<T>;

    fn make_place(self) -> IntermediateBox<T> {
        make_place()
    }
}

And the corresponding Drop implementation:

impl<T: ?Sized> Drop for IntermediateBox<T> {
    fn drop(&mut self) {
        if self.layout.size() > 0 {
            unsafe {
                Heap.dealloc(self.ptr, self.layout.clone())
            }
        }
    }
}

In this case, a Heap type handles all the unsafe alloc / dealloc, and the Place returned from the Placer is of type IntermediateBox.
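
For contrast, here’s a hypothetical sketch of how an arena allocator might implement the same protocol. The Arena and ArenaPlace types are my own invention, building on the trait definitions sketched earlier; the Place / InPlace impls (which would mirror IntermediateBox) are omitted:

use std::marker::PhantomData;
use std::mem;

struct Arena {
    buf: Vec<u8>, // one big chunk, pre-allocated up front
    next: usize,  // bump offset into buf
}

struct ArenaPlace<'a, T: 'a> {
    ptr: *mut T,
    _marker: PhantomData<&'a mut Arena>,
}

impl<'a, T> Placer<T> for &'a mut Arena {
    type Place = ArenaPlace<'a, T>;

    fn make_place(self) -> ArenaPlace<'a, T> {
        // Bump-allocate an aligned chunk out of the arena's buffer.
        let (size, align) = (mem::size_of::<T>(), mem::align_of::<T>());
        let start = (self.next + align - 1) & !(align - 1);
        assert!(start + size <= self.buf.len(), "arena exhausted");
        self.next = start + size;
        ArenaPlace {
            ptr: unsafe { self.buf.as_mut_ptr().offset(start as isize) as *mut T },
            _marker: PhantomData,
        }
    }
}

// With left-arrow syntax, placement into the arena might then read:
// let val = (&mut arena) <- MyStruct::new();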

Current feature status

It’s not clear yet when all these pieces will land, especially given the uncertainty around the placement syntax. Some of the initial work that was committed to Rust unstable (including syntax extensions and the Placer protocol traits) was subsequently removed pending further design discussion. Either way, I think placement new is an important feature for Rust. Adding explicit (but not required) control points to the internals of Rust will make it more appealing for certain use cases, including embedded and other applications that have special memory control requirements. I’m very much looking forward to this feature landing in Rust nightly.

»

Running Dropwizard as a Guava Service

There are many things to like about the Dropwizard Framework, but if you’re like me, you might want to “own your main” function. A normal Dropwizard application gives you a run hook as part of its Application parent class, but in the end your code is still subservient to the Dropwizard framework code.

One pattern that is very good for organizing small logical services in your application is to use Guava’s Services to break your application into service-level groupings (and avoid drinking the microservice kool-aid prematurely). Guava services give you a common operational interface to coordinate all your in-process logical components. In this model, the Dropwizard web server is no more important than my periodic polling service, my separated RPC service, and so on. I’d like my Dropwizard web stack to be a peer to the other services that I’m running inside my application. It only takes two steps to make this work.

Step 1: Create the Guava Service

Create a new class that extends AbstractIdleService. When we implement the startUp method in the service, we need to handle the key bootstrap setup that is normally performed inside the Dropwizard framework when you invoke the server command.

import com.codahale.metrics.MetricRegistry;
import com.google.common.util.concurrent.AbstractIdleService;
import com.google.inject.Inject;
import io.dropwizard.setup.Bootstrap;
import io.dropwizard.setup.Environment;
import org.eclipse.jetty.server.Server;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DashboardService extends AbstractIdleService {
    private static final Logger logger = LoggerFactory.getLogger(DashboardService.class);
    private final DashboardApplication application;
    private final DashboardConfiguration config;
    private final MetricRegistry metrics;
    private Server server;

    @Inject
    public DashboardService(final DashboardApplication application,
                            final DashboardConfiguration config,
                            final MetricRegistry metrics) {
        this.application = application;
        this.config = config;
        this.metrics = metrics;
    }

    @Override
    protected void startUp() throws Exception {
        logger.info("Starting DashboardService");
        final Bootstrap<DashboardConfiguration> bootstrap = new Bootstrap<>(application);
        application.initialize(bootstrap);
        final Environment environment = new Environment(bootstrap.getApplication().getName(),
                                                        bootstrap.getObjectMapper(),
                                                        bootstrap.getValidatorFactory().getValidator(),
                                                        metrics,
                                                        bootstrap.getClassLoader());
        bootstrap.run(config, environment);
        application.run(config, environment);
        this.server = config.getServerFactory()
            .build(environment);
        server.start();
    }

    @Override
    protected void shutDown() throws Exception {
        logger.info("Stopping DashboardService");
        server.stop();
    }
}

Step 2: Reinitialize logging inside the Dropwizard Application

import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.joran.JoranConfigurator;
import ch.qos.logback.core.joran.spi.JoranException;
import io.dropwizard.Application;
import io.dropwizard.setup.Environment;
import org.slf4j.LoggerFactory;

public class DashboardApplication extends Application<DashboardConfiguration> {
    @Override
    public void run(DashboardConfiguration config,
                    Environment environment) {
        reinitializeLogging(environment);
    }

    /**
     * Because Dropwizard clears our logback settings, reload them.
     */
    private void reinitializeLogging(Environment env) {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        try {
            JoranConfigurator configurator = new JoranConfigurator();
            configurator.setContext(context);
            context.reset();
            String logBackConfigPath = System.getProperty("logback.configurationFile");
            if (logBackConfigPath != null) {
                configurator.doConfigure(logBackConfigPath);
            }
        } catch (JoranException e) {
            throw new RuntimeException("Unable to initialize logging.", e);
        }
    }
}

Start your Guava Service

Now you have a nicely contained Guava service that you can manage alongside your other Guava services that aren’t necessarily Dropwizard-related. You can startAsync and stopAsync the service to start and stop the web server, even while other services are still running.
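
To tie it together, here’s a rough sketch of what “owning your main” might look like with Guava’s ServiceManager (PollingService and DashboardModule are hypothetical placeholders for your own wiring):

import com.google.common.collect.ImmutableList;
import com.google.common.util.concurrent.ServiceManager;
import com.google.inject.Guice;
import com.google.inject.Injector;

public class Main {
    public static void main(String[] args) {
        final Injector injector = Guice.createInjector(new DashboardModule()); // hypothetical module
        final ServiceManager manager = new ServiceManager(ImmutableList.of(
            injector.getInstance(DashboardService.class),
            injector.getInstance(PollingService.class))); // hypothetical peer service

        // Stop every service cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> manager.stopAsync().awaitStopped()));

        // Start all services and block until they're healthy.
        manager.startAsync().awaitHealthy();
    }
}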

»

Terraform AWS Static Site with CloudFront

UPDATE: This pull request has been merged into Terraform

A recent patch on the Terraform GitHub repository adds support for CloudFront distributions to the Terraform AWS Provider. The patch has not been merged into Terraform mainline yet, but I wanted to share my experience setting up an S3 static site, fronted with CloudFront and DNS routed with Route53. Until the CloudFront PR gets merged, you’ll have to build the branch from source in order to use the aws_cloudfront_distribution resource. The words you’re reading right now were served up from this very Terraform configuration via CloudFront, migrated from a simple nginx setup. If you haven’t used Terraform before, please review the Introduction and Getting Started Guide before proceeding.

Step 1: Set Up your S3 Static Site Bucket

The first thing you need to do is set up an S3 bucket to act as your ‘origin’. This is where all your static HTML files and assets will live. Here’s what the code looks like:

provider "aws" {
  alias = "prod"

  region = "us-east-1"
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
}

resource "aws_s3_bucket" "origin_blakesmith_me" {
  provider = "aws.prod"

  bucket = "origin.blakesmith.me"
  acl = "public-read"
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadForGetBucketObjects",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": ["arn:aws:s3:::origin.blakesmith.me/*"]
  }]
}
POLICY
  
  website {
    index_document = "index.html"
  }
}

After running terraform apply, you will have an S3 bucket that’s set up to serve HTTP traffic from the root of the bucket. Let’s examine some of the important parameters:

  • policy: Bucket policy that makes the bucket publicly readable.
  • website: Configure the S3 bucket to serve up a static website, in this case setting the default index_document to index.html.

You can find other configurations on the aws_s3_bucket resource page.

After uploading your static site to the S3 bucket, you should already be able to view the website at http://${bucketname}.s3-website-${aws_region}.amazonaws.com. As an example, my blog can be served up at http://origin.blakesmith.me.s3-website-us-east-1.amazonaws.com/.
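
One way to do that upload, assuming you have the AWS CLI installed and your generated site lives in a local public/ directory (a hypothetical path), is:

aws s3 sync public/ s3://origin.blakesmith.me --acl public-read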

Step 2: Add a Route53 Record for your Origin

We need a DNS entry for this origin. In this example, we’ll create one at http://origin.blakesmith.me. This will give us a helpful DNS record that will not route through CloudFront and can be used to access the S3 bucket static site directly with no caching or other CloudFront routing rules applied.

resource "aws_route53_zone" "blakesmith_me" {
  provider = "aws.prod"
  name = "blakesmith.me"
}

resource "aws_route53_record" "origin" {
  provider = "aws.prod"
  zone_id = "${aws_route53_zone.blakesmith_me.zone_id}"
  name = "origin.blakesmith.me"
  type = "A"

  alias {
    name = "${aws_s3_bucket.origin_blakesmith_me.website_domain}"
    zone_id = "${aws_s3_bucket.origin_blakesmith_me.hosted_zone_id}"
    evaluate_target_health = false
  }
}

First we set up our top-level zone, then create a record associated with that zone. The alias configuration targets the S3 bucket we created before. Here are the important parameters:

  • zone_id: A reference to our top level DNS zone id
  • alias#zone_id: A reference to the S3 bucket’s existing zone identifier

Once you terraform apply this, you should be able to access your origin at the above DNS record, and the behavior should be the same as when you accessed the S3 bucket’s static website hosting directly. Notice that http://origin.blakesmith.me has no behavior change from the simple S3 static bucket site we set up before.

Step 3: Set Up your CloudFront Distribution

We have all the basic pieces in place; now comes the meat: let’s set up a CloudFront distribution that uses the origin we just configured to serve up the website at the edge. If you’re not too familiar with CDNs, think of them as a “big distributed cache across the globe”. After we configure this distribution, our content will be served from the edge servers closest to each visitor’s location.

resource "aws_cloudfront_distribution" "blakesmith_distribution" {
  provider = "aws.prod"
  origin {
    domain_name = "origin.blakesmith.me.s3.amazonaws.com"
    origin_id = "blakesmith_origin"
    s3_origin_config {}
  }
  enabled = true
  default_root_object = "index.html"
  aliases = ["blakesmith.me", "www.blakesmith.me"]
  price_class = "PriceClass_200"
  retain_on_delete = true
  default_cache_behavior {
    allowed_methods = [ "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT" ]
    cached_methods = [ "GET", "HEAD" ]
    target_origin_id = "blakesmith_origin"
    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "allow-all"
    min_ttl = 0
    default_ttl = 3600
    max_ttl = 86400
  }
  viewer_certificate {
    cloudfront_default_certificate = true
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
}

There’s a lot going on here, so let’s break it down a bit. The most important part is our origin declaration:

  origin {
    domain_name = "origin.blakesmith.me.s3.amazonaws.com"
    origin_id = "blakesmith_origin"
    s3_origin_config {
    }
  }

  • domain_name: The domain CloudFront pulls from; here it’s the S3 endpoint of the origin bucket we created in step 1.
  • origin_id: A unique identifier for this origin configuration. Since you can set up multiple origins, this is used to link caching behaviors to origin configurations.
  • s3_origin_config: Extra S3 origin options; we leave this blank.

Next we have some other important top level declarations:

  enabled = true
  default_root_object = "index.html"
  aliases = ["blakesmith.me", "www.blakesmith.me"]
  price_class = "PriceClass_200"
  retain_on_delete = true

  • enabled: Enable our CloudFront distribution.
  • default_root_object: Use index.html as our root object.
  • aliases: HTTP hostnames you will be serving your site from. These must match your DNS records, or you will get 403 Forbidden errors.
  • price_class: How CloudFront will prioritize where traffic gets served from based on price. See: CloudFront Pricing.
  • retain_on_delete: Causes CloudFront deletions to simply disable your distribution. Useful since CloudFront distributions can take upwards of 15 minutes to propagate.

Then we setup our basic caching behavior:

  default_cache_behavior {
    allowed_methods = [ "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT" ]
    cached_methods = [ "GET", "HEAD" ]
    target_origin_id = "blakesmith_origin"
    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "allow-all"
    min_ttl = 0
    default_ttl = 3600
    max_ttl = 86400
  }

The most important part is that we reference our target_origin_id to link these two stanza configurations together.

  • allowed_methods: Which HTTP verbs we permit our distribution to serve
  • cached_methods: Which HTTP verbs CloudFront will cache responses for
  • target_origin_id: The name of the previous origin_id in our origin stanza
  • forwarded_values: Entities that will be passed from the edge to our origin.
  • viewer_protocol_policy: Which HTTP protocol policy to enforce. One of: allow-all, https-only, or redirect-to-https.
  • min_ttl: Minimum time (seconds) to live for objects in the distribution cache
  • max_ttl: Maximum time (seconds) objects can live in the distribution cache
  • default_ttl: The default time (seconds) objects will live in the distribution cache

Finally, allow CloudFront to use its default SSL cert and serve anywhere:

  viewer_certificate {
    cloudfront_default_certificate = true
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

Once you run terraform apply with your patched version of Terraform and wait the 10-15 minutes for AWS to asynchronously set up your distribution, you will have a CloudFront-provided domain name that you can validate your setup with. For example, this website is viewable via its CloudFront domain name at: d1u25xzl6dnmgy.cloudfront.net

Until the aws_cloudfront_distribution resource gets released, you’ll have to consult the documentation provided in the pull request if you need to deviate from my simple setup here. There are also other helpful examples in the integration tests if you want to see other settings in action.

Step 4: Add Root Route53 Records

The last step adds Route53 records to reference the CloudFront distribution we just set up.

resource "aws_route53_record" "root" {
  provider = "aws.prod"
  zone_id = "${aws_route53_zone.blakesmith_me.zone_id}"
  name = "blakesmith.me"
  type = "A"

  alias {
    name = "${aws_cloudfront_distribution.blakesmith_distribution.domain_name}"
    zone_id = "Z2FDTNDATAQYW2"
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www" {
  provider = "aws.prod"
  zone_id = "${aws_route53_zone.blakesmith_me.zone_id}"
  name = "www.blakesmith.me"
  type = "A"

  alias {
    name = "${aws_cloudfront_distribution.blakesmith_distribution.domain_name}"
    zone_id = "Z2FDTNDATAQYW2"
    evaluate_target_health = false
  }
}

Here we set up our apex and www records to point at our CloudFront distribution. The two critical new pieces you should observe are:

  • alias#name: This ALIAS name references our CloudFront distribution created in the previous step
  • zone_id: This is a fixed, hardcoded zone_id that is used for all CloudFront distributions

One final terraform apply and voilà! You have your final product: A static site being served from an S3 bucket and fronted by the AWS CloudFront distribution with Route53 knitting everything together.

You can verify everything is working by examining the HTTP response headers and looking for the CloudFront headers.
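
For example, fetching the site with curl in verbose mode (-v prints the request and response headers) produces an exchange like the one below:

curl -v -o /dev/null http://blakesmith.me/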

> GET / HTTP/1.1
> User-Agent: curl/7.37.1
> Host: blakesmith.me
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 36938
< Connection: keep-alive
< Date: Sat, 02 Apr 2016 17:48:17 GMT
< Last-Modified: Fri, 01 Apr 2016 11:44:21 GMT
< ETag: "1ae13b1b0471e67bad10eb95347f99da"
< Accept-Ranges: bytes
* Server AmazonS3 is not blacklisted
< Server: AmazonS3
< Age: 29
< X-Cache: Hit from cloudfront
< Via: 1.1 62e12fdf0f65bd8388f763f504606830.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: 4F85Bkl9_nSPQvqjDWAEMYkssuPA04gl8V5qLLIU3cPlS5E1Gtam7A==
<

Helpful headers:

  • Server: The origin server identifier, in our case AmazonS3
  • Age: The age (in seconds) of the object you just retrieved from the distribution cache
  • X-Cache: Whether the HTTP request was a cache hit or miss.

Here is the full code example if you’d like to see the complete setup that powers this site.

Happy Terraforming!
