Chef Provisioning AWS without bootstrapping

Hello All,

We are using Chef Provisioning AWS to deploy EC2 instances. This relies on the machine/machine_batch resources, which create the EC2 instances and then bootstrap them. Bootstrapping requires a WinRM connection to the node on port 5985, which our security requirements don’t allow, so I am exploring whether the machine resource can skip bootstrapping and only create the EC2 instances.
If this is possible, we could install chef-client in the AMI (AWS image) and add logic to run chef-client when a new machine comes up. What else (client.pem, client.rb, etc.) would we need to set up on the node side for every new node in that case?

I am looking this up in the documentation and on Google as well, but thought it would be a good idea to ask here and collect any advice I can.

Thanks,
Rahul

I can’t say much about Windows, as I have not run it in years. But from a Chef perspective you will need a few things:

  • Chef validator
  • Chef server URL
  • Run list

There are several ways to do this, but an approach I like is to query the EC2 user data on startup, either to get those values directly or to get a location where they can be obtained. Depending on your setup, you may only need to pass an env and a role and can infer how to obtain the rest from those alone.

Once you have scripted how to obtain this information, you can build your client.rb automatically, and after that you should be able to auto-bootstrap quite easily just by running chef-client. If you like this approach, message me and I can provide further details on the actual implementation in my own environments.

FWIW, my actual method on vSphere is to have chef-client installed on the master image (the AMI on EC2) with a basic client.rb containing the server URL and the validator key.

When spinning the machine up, one of the tasks is to launch chef-client with its run list (this can be done in the user data on EC2).
The run list always includes some of the chef_client cookbook recipes, which set up a complete client.rb in line with our actual policy, set up the scheduled task to run Chef periodically, and clean up the validator.pem file on the node.

On the first run the node generates its own client.pem and registers with the server (using the validator key), so this works well for us.

I was thinking of using the user data to install and configure the Chef client on new machines. I can keep the items you describe in an S3 bucket, pull them down along with the Chef client installer, and do everything with a simple PowerShell script (I could also try the same script that Chef uses for bootstrapping). It would be great if you could share details; it would just be quicker for me.
I am able to use the machine resource with the Chef Provisioning AWS driver to create the EC2 instance and give it a run list and attributes. All I need to do now is bootstrap chef-client on the machine.

Well, add:

. { iwr -useb https://omnitruck.chef.io/install.ps1 } | iex; install

to your user data, after having copied client.rb and validation.pem to c:\chef.

The install command is taken from the Chef documentation; see it for the options the install script accepts.

Thanks for the info and reference. I will either use a proxy to download this or simply keep a local copy in an S3 bucket and download it, along with client.rb and validation.pem, inside the user data.

So I am still using the machine resource to create the EC2 instances. The installation and configuration of the Chef client is then done by a user_data script, which downloads validator.pem, client.rb, and the chef-client installer from our local file share (on S3) and installs chef-client.

[Issue 1]
At the end of the user_data script it tries to run the chef-client command, which fails. I think this is because the command runs inside the same user-data PowerShell script, and it could be fixed by running it in a new instance of PowerShell (maybe Invoke-Command powershell 'chef-client' would work instead of Invoke-Expression 'chef-client'). I will try Invoke-Command and see if that works.

[Issue 2]
Apart from the above issue, when I run chef-client manually it runs, registers with the Chef server, and starts running its run_list, but fails when it has to reboot the system inside a recipe. The debug logs say that the node does not have permission to update its own properties on the Chef server, and that one ACL is missing which should give the node read/write permissions on itself. How do I add this ACL when the node object is created by the machine resource?
This looks strange as it never happened when I was allowing the machine resource to do the bootstrapping. Am I missing something?

Here are some bash snippets:
Install a specific version of chef-client:
curl -L https://www.chef.io/chef/install.sh | sudo bash -s -- -v 12.8.1

Here is how to get ec2 user data:

# Fetch a path from the EC2 instance metadata service.
function getmeta() {
     wget -qO- "http://169.254.169.254/latest$1"
}

# Save IFS first so the restore after the loop does not clobber it.
oldifs="$IFS"

# User data is expected to hold whitespace-separated key=value pairs.
for datum in $(getmeta /user-data)
do
  case "$datum" in
    env=*) env=${datum#env=};;
    role=*) role=${datum#role=};;
    domain=*) domain=${datum#domain=};;
    org=*) org=${datum#org=};;
  esac
done
IFS="$oldifs"

hostname="$(getmeta /meta-data/local-hostname)"
chef_server_url="https://${domain}/organizations/${org}"
validator="${env}-validator"
# Write first-boot.json to be used by the chef-client command.
# This sets the run list (environment recipe and role) of the node.
echo -e "{\"run_list\": [\"recipe[myorg-env-${env}]\",\"role[$role]\"]}" > /etc/chef/first-boot.json

Then you can use those values for token replacement in your client.rb and your run list (see above), and use chef-client to bootstrap like this:
chef-client -j /etc/chef/first-boot.json -E "$env" -L /var/log/chef/bootstrap.log && touch /etc/chef/chef-bootstrap.done

After this you should be good to go.
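The token replacement into client.rb mentioned above might look roughly like the sketch below. It is a minimal example, not the script from my environment: the helper name render_client_rb, the /etc/chef/validation.pem key path, and the log path are assumptions made for illustration, while chef_server_url, validation_client_name, validation_key, node_name, and log_location are standard client.rb settings.

```shell
#!/usr/bin/env bash
# Sketch: print a client.rb built from the values parsed out of user data.
# render_client_rb is a hypothetical helper; the key and log paths below
# are assumptions and should be adjusted to your own image layout.
# Usage: render_client_rb SERVER_URL VALIDATOR_NAME NODE_NAME
render_client_rb() {
  local server_url="$1" validator_name="$2" node_name="$3"
  cat <<EOF
log_location            "/var/log/chef/client.log"
chef_server_url         "${server_url}"
validation_client_name  "${validator_name}"
validation_key          "/etc/chef/validation.pem"
node_name               "${node_name}"
EOF
}
```

With the variables from the snippet above, this would be called as render_client_rb "$chef_server_url" "$validator" "$hostname" > /etc/chef/client.rb before the chef-client run.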

So I took an approach that is a mix of using the machine resource and custom user-data scripts to bootstrap the machine. The machine resource sets the recipes and attributes when deploying the instance, and the user data does the bootstrapping.

I added action :allocate under machine_batch and converge false under machine so that it does not attempt to bootstrap the node. But now it also no longer checks for the 'running' status of the EC2 instance or waits for the 'Windows is ready' message from the instance.

Chef Documentation says that I could use action :ready with machine_batch but it still tries to connect to the node to bootstrap it.

:allocate
Use to create a machine, return its machine identifier, and then (depending on the provider) boot the machine to an image. This reserves the machine with the provider and subsequent :allocate actions against this machine no longer need to create the machine, just update it.

:ready
Use to create a machine, return its machine identifier, and then boot the machine to an image with the specified parameters and transport. This machine is in a ready state and may be connected to (via SSH or other transport).

Is there a way to avoid the bootstrapping attempt and still have Chef check the EC2 instance's running status and the 'Windows is ready' message when it is deployed?

Code snippet below:

machine_batch do
  action :allocate
  instances.each do |instance|
    machine environment_info[instance]['instance_name'] do
      converge false
      machine_options bootstrap_options: {
        ...
    end
  end
end
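If Chef itself cannot do the readiness check without connecting, one possible workaround outside Chef Provisioning is to poll for it from the provisioning host. The sketch below is only an illustration under stated assumptions: wait_for_output is a hypothetical helper, the AWS CLI is assumed to be installed and configured, $instance_id is assumed to hold the new instance's ID, and the exact console message text ("Windows is Ready to use") is an assumption that should be checked against what your AMI actually prints.

```shell
#!/usr/bin/env bash
# Sketch: poll a command's output until it contains a pattern.
# wait_for_output is a hypothetical helper, not part of Chef Provisioning.
# Usage: wait_for_output PATTERN MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
#
# Possible use with the AWS CLI (message text is an assumption):
#   aws ec2 wait instance-running --instance-ids "$instance_id"
#   wait_for_output "Windows is Ready to use" 60 30 \
#     aws ec2 get-console-output --instance-id "$instance_id" --output text
wait_for_output() {
  local pattern="$1" max_attempts="$2" delay="$3"
  shift 3
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the supplied command and look for the pattern in its output.
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  # Pattern never appeared within the allowed number of attempts.
  return 1
}
```

The function returns 0 as soon as the pattern is seen and 1 after the last attempt, so the calling script can decide whether to continue or abort.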

Try using action :setup instead of action :allocate in machine_batch.

:setup
Use to create a machine, return its machine identifier, boot the machine to an image with the specified parameters and transport, and then install the chef-client. This machine is in a ready state, has the chef-client installed, and all of the configuration data required to apply the run-list to the machine.

Won't this also try to connect to the machine? The definition says '...and then install chef-client', and for that it would have to connect to the server.

Yep, sorry, I didn’t read closely enough and assumed that avoiding bootstrapping only meant not running chef-client. I use :setup to first create all my instances without converging (converge false is set). Then I use a second machine resource to set the run list and attributes and converge. I’ve seen this pattern mentioned in other posts here. I’m not sure what you mean by ‘connect to machine’ or ‘connect to server’. The chef-client on the EC2 instance won’t connect to the Chef server or anything else if you set converge false.
Perhaps try using this pattern and see if the first machine resource completes?