Populating Chef Habitat Builder on-prem

Introduction

Setting up a Chef Habitat Builder on-prem environment (https://github.com/habitat-sh/on-prem-builder) requires an initial population of core packages, as well as regular updates to keep the set of packages synchronized with the public Builder.

The current bootstrapping mechanism has some limitations:

  • It starts the initial package setup by downloading a very large tar file (about 15GB), which requires not only a long download but also at least twice that amount of disk space
  • Errors during the bootstrapping procedure can require restarting the entire process, which can be time-consuming
  • The set of base packages included can be both too much and too little: it contains a large number of potentially unneeded packages, but not enough packages from origins other than core that might be needed
  • There is no way to select which specific packages from a given origin are included
  • The bootstrap/synchronization is not air-gap friendly, as it requires an external Internet connection

In order to start addressing some of these issues, Chef Habitat version 0.88.0 (https://www.habitat.sh/docs/install-habitat/) includes two new native commands (hab pkg download and hab pkg bulkupload) to ease the movement of packages between Builder environments.

These new commands simplify and replace the existing package population workflow for Builder on-prem.

Below are a range of scenarios that illustrate the use of the new commands with Builder on-prem.

Creating an initial bootstrap package set

Instead of downloading a large, predetermined bootstrap package, we use the hab pkg download command to populate a directory with the required packages (and the public package signing keys of their origins).

This command takes as input a set of packages to seed the directory with. This package set does not need to contain any of the package dependencies, as those are fetched automatically.

As a (contrived) example, let’s say we are interested in Rust development and need only the packages related to it. We create a file called my_package_set with the following contents:

core/rust
core/cargo-nightly

This file tells hab pkg download which packages to seed the download with, so we can issue the command as follows:

$ hab pkg download --target x86_64-linux --download-directory my_bootstrap_directory --file my_package_set

This will download the needed packages into the my_bootstrap_directory folder. Because only the listed packages and their dependencies are fetched, the download takes far less time than fetching the full bootstrap archive.
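The two steps above (writing the seed file, then running the download) can be combined into one small script. This is a minimal sketch that uses only the flags shown above; the line-format check and the guard around the hab call are illustrative additions, not something hab requires.

```shell
#!/bin/sh
# Sketch: write the seed list, then fetch it with the flags shown above.
seed_file=my_package_set
download_dir=my_bootstrap_directory

# One package identifier per line; dependencies are fetched automatically.
cat > "$seed_file" <<'EOF'
core/rust
core/cargo-nightly
EOF

# Illustrative sanity check (an assumption, not a hab rule):
# every line should look like origin/name with no spaces.
if grep -Eqv '^[^/ ]+/[^ ]+$' "$seed_file"; then
    echo "unexpected line in $seed_file" >&2
    exit 1
fi

# Only attempt the download when the hab CLI is actually installed.
if command -v hab >/dev/null 2>&1; then
    hab pkg download --target x86_64-linux \
        --download-directory "$download_dir" \
        --file "$seed_file"
else
    echo "hab not found; skipping download" >&2
fi
```

Keeping the seed list in a file of its own means it can be version-controlled and reused for later synchronization runs.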

If we check the contents of my_bootstrap_directory after the conclusion of the download command, we see all the needed packages and keys in one place:

~/my_bootstrap_directory $ tree
.
├── artifacts
│ ├── core-binutils-2.31.1-20190115003743-x86_64-linux.hart
│ ├── core-busybox-static-1.29.2-20190115014552-x86_64-linux.hart
│ ├── core-cacerts-2018.12.05-20190115014206-x86_64-linux.hart
│ ├── core-cargo-nightly-0.16.0-20190117180323-x86_64-linux.hart
│ ├── core-gcc-8.2.0-20190115004042-x86_64-linux.hart
│ ├── core-gcc-libs-8.2.0-20190115011926-x86_64-linux.hart
│ ├── core-glibc-2.27-20190115002733-x86_64-linux.hart
│ ├── core-gmp-6.1.2-20190115003943-x86_64-linux.hart
│ ├── core-libmpc-1.1.0-20190115004027-x86_64-linux.hart
│ ├── core-linux-headers-4.17.12-20190115002705-x86_64-linux.hart
│ ├── core-mpfr-4.0.1-20190115004008-x86_64-linux.hart
│ ├── core-rust-1.38.0-20190930155321-x86_64-linux.hart
│ └── core-zlib-1.2.11-20190115003728-x86_64-linux.hart
└── keys
  └── core-20180119235000.pub

2 directories, 14 files

Looking at the directory size, we see that it is about 1/30th the size of the full set of core packages.

~/my_bootstrap_directory $ du -h .
516M ./artifacts
8.0K ./keys

516M .

Now we can move this set of packages, by whatever means we like, to a location from which to upload it to a local on-premises Builder: tar-ing up the directory and copying it elsewhere, carrying it on a USB drive into an air-gapped environment, or even using a system like Artifactory to propagate it to the desired location.
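For the tar-and-copy route, a checksum recorded next to the archive makes it easy to confirm that nothing was damaged in transit. The following is a rough sketch only: pack_bootstrap and unpack_bootstrap are hypothetical helper names, and sha256sum is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Sketch: bundle a download directory for transfer and record a checksum.
# pack_bootstrap / unpack_bootstrap are hypothetical helpers, not part of hab.
pack_bootstrap() {
    dir=$1
    archive="$dir.tar"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Plain tar, no extra compression step (a choice made for this sketch).
    tar -cf "$archive" "$dir" || return 1
    # Store the checksum beside the archive so both travel together.
    sha256sum "$archive" > "$archive.sha256"
}

# On the receiving side: verify first, unpack only if the checksum matches.
unpack_bootstrap() {
    archive=$1
    sha256sum -c "$archive.sha256" || return 1
    tar -xf "$archive"
}
```

On the source machine this would be called as pack_bootstrap my_bootstrap_directory, and on the destination as unpack_bootstrap my_bootstrap_directory.tar, run from the directory that holds both the archive and its .sha256 file.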

Uploading a package set to an on-premises Builder

Now that we have a cleanly built directory with the packages and keys we want, we can use the hab pkg bulkupload command to perform the upload to a local Builder environment:

$ hab pkg bulkupload -u http://localhost -c stable my_bootstrap_directory

This command (given a valid Habitat auth token) will upload the contents of my_bootstrap_directory to the on-premises Builder located on localhost and promote the packages to the stable channel so that they can be consumed immediately.
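Since the upload depends on a token being present, it can be useful to check for it before calling hab at all. The sketch below assumes the token is supplied through the HAB_AUTH_TOKEN environment variable; upload_bootstrap is a hypothetical wrapper name, and the default URL is simply the one used in the command above.

```shell
#!/bin/sh
# Sketch: refuse to upload unless a token and the package directory are present.
# upload_bootstrap is a hypothetical wrapper, not a hab command.
upload_bootstrap() {
    dir=$1
    url=${2:-http://localhost}   # defaults to the URL used in the article
    # Assumption: the auth token is provided via HAB_AUTH_TOKEN.
    if [ -z "${HAB_AUTH_TOKEN:-}" ]; then
        echo "HAB_AUTH_TOKEN is not set" >&2
        return 1
    fi
    # The download step places packages under an artifacts subdirectory.
    if [ ! -d "$dir/artifacts" ]; then
        echo "$dir has no artifacts directory" >&2
        return 1
    fi
    hab pkg bulkupload -u "$url" -c stable "$dir"
}
```

A call such as upload_bootstrap my_bootstrap_directory then either performs the same upload as the command above or stops early with a message that says what is missing.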

Synchronization

Once the initial package bootstrap has been completed, there is also a need to keep the packages synchronized across environments - for example, from the public Builder SaaS to on-premises Builder.

Fortunately, the same download and bulk upload strategy can be used to keep the packages in sync. If the hab pkg download command is run a second time and pointed at the same my_bootstrap_directory, it will download and update only the packages that have newer versions.

Since it is now possible to specify a smaller, more targeted set of packages, the synchronization should be fairly fast (especially compared to the current on-prem-archive script).
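A synchronization pass is then just the download followed by the bulk upload, which lends itself to a small script that could be run on a schedule. This is a sketch under stated assumptions: sync_packages, run, and the DRY_RUN switch are hypothetical names introduced here, and the hab invocations are the two commands shown earlier.

```shell
#!/bin/sh
# Sketch: one sync pass = re-run the download, then re-run the bulk upload.
# run, sync_packages and DRY_RUN are hypothetical names for this illustration.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"      # dry run: print the command instead of executing it
    else
        "$@"
    fi
}

sync_packages() {
    seed_file=$1
    dir=$2
    url=$3
    # Stop before uploading if the download step fails.
    run hab pkg download --target x86_64-linux \
        --download-directory "$dir" --file "$seed_file" || return 1
    run hab pkg bulkupload -u "$url" -c stable "$dir"
}

# Print the two commands without contacting any Builder instance.
DRY_RUN=1
sync_packages my_package_set my_bootstrap_directory http://localhost
```

With DRY_RUN left unset, the same call runs both commands for real; the dry run is a cheap way to review what a scheduled job would do.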

Summary of new capabilities

With the new download and bulk upload Habitat commands, we now have the capability to:

  • Download a specific set of packages that are suited to use in specific scenarios
  • Specify multiple origins for downloaded packages
  • Download packages for specific platform targets
  • Download only packages that have newer versions
  • Upload packages in bulk from a folder
  • Handle errors more gracefully as an integrated part of the Habitat client

We will be tuning and updating these capabilities in future releases.

As always, we appreciate your feedback in our Habitat Forum, Slack Channel, or logged as issues on one of the Habitat repositories.