This should not be needed often. However, the worker VMs can occasionally end up in an unrecoverable or otherwise undesired state, and the best solution is then to rebuild them.
This document targets the Live environment; the steps for Acceptance are similar.
Pre-requisites
Set up the Terraform environment following the instructions here: https://github.com/habitat-sh/cloud-environments
This doc assumes the setup instructions above are up-to-date, and you have been able to successfully do a `make init`.
Steps
- Set up a maintenance window in status.io, since active builds may be impacted.
- Change directory to the `cloud-environments/builder-live` folder.
- In the `default.tf` file, change the `jobsrv_worker_count` value to 0:
  jobsrv_worker_count = 0
- Run a `terraform plan`. This should show that the workers (and related networks) will be deleted. Double and triple check that no other services are being deleted or changed (an exception is the `aws_s3_bucket`, which seems to always want to update itself).
- If all looks good, run a `terraform apply`.
- After all the instances are deleted, go back and change the `jobsrv_worker_count` back to 50 (or whatever the original value was).
- Repeat the `terraform plan` and `terraform apply` steps.
- Once all the worker instances are re-created, ssh into the `builder-datastore` node and re-run the `apply_config.sh` script (this ensures that all the key files are properly sent over to the workers).
- Update the maintenance window to complete.
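The count change and the plan/apply cycle in the steps above can be sketched as a pair of shell helpers. This is only a minimal sketch and is not part of the cloud-environments repo: the function names `set_worker_count` and `plan_and_apply` and the `workers.tfplan` file name are hypothetical. It assumes that `jobsrv_worker_count` is assigned on a single line in `default.tf` and that `terraform` is run directly from the `builder-live` folder.

```shell
# Hypothetical helper: rewrite the jobsrv_worker_count assignment in a Terraform file.
# Assumes the assignment sits on one line, e.g. "jobsrv_worker_count = 50".
set_worker_count() {
  local count="$1" file="${2:-default.tf}" tmp
  # Accept only non-negative integers.
  case "$count" in
    ''|*[!0-9]*) echo "worker count must be a non-negative integer" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Write to a temp file first, then copy back, so the original keeps its permissions.
  sed -E "s/^([[:space:]]*jobsrv_worker_count[[:space:]]*=[[:space:]]*)[0-9]+/\1${count}/" \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Hypothetical wrapper: save the plan, ask for confirmation, then apply exactly that plan.
plan_and_apply() {
  local answer
  terraform plan -out=workers.tfplan || return 1
  printf 'Does the plan touch only the workers and related networks? [y/N] '
  read -r answer
  [ "$answer" = "y" ] || { echo "Aborted."; return 1; }
  terraform apply workers.tfplan
}

# Possible usage, run from cloud-environments/builder-live:
#   set_worker_count 0  && plan_and_apply   # delete the workers
#   set_worker_count 50 && plan_and_apply   # re-create them (use the original value)
```

Applying a saved plan file means that what gets applied is the plan that was reviewed, rather than a fresh plan computed at apply time. The manual review of the plan output is still the important safeguard.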
NOTE:
If you see the following error during worker creation, you can ignore it. It comes from an attempt to clean up networks that don't actually exist.
module.builder_environment.module.builder.null_resource.worker_studio_network (remote-exec): error: Invalid value for '--ns-dir <NS_DIR>': directory '/hab/svc/builder-worker/data/network/airlock-ns' cannot be found, must exist
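If the apply output is saved to a file, the known benign message above can be filtered out so that any other errors stand out. The helper below is a hypothetical sketch: the function name `unexpected_errors` and the idea of saving the output to a log file are assumptions, not part of the documented procedure.

```shell
# Hypothetical helper: print error lines from a saved apply log,
# skipping the known benign airlock-ns message.
unexpected_errors() {
  local log="$1"
  grep -i 'error:' "$log" \
    | grep -v -F "airlock-ns' cannot be found" \
    || true   # no remaining matches is not a failure
}

# Possible usage:
#   terraform apply workers.tfplan 2>&1 | tee apply.log
#   unexpected_errors apply.log
```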