Chef Provisioning and Orchestration

Ohai Chefs,

I’m starting to look at alternatives to our current deployment and orchestration methods here and was wondering about the capabilities of Chef Provisioning.

I’ve never used Chef Provisioning before, so a couple of basic questions first might help me on my way, and then I’ll ask a few orchestration-related questions.

So:-

  1. How does Chef Provisioning store state? If you have provisioning code that declares machine “foo” in AWS, how does Chef map the created instance ID (or Chef node) back to that resource in the code for idempotency?

  2. Can recipes reference resources created in another recipe? Say one recipe creates a bunch of machines and attaches them to a load balancer; can another recipe then reference those machines and remove them from the load balancer?

‘What is he on about?’ you may ask. I know that Chef Provisioning can support a full tear-down and recreation of infrastructure. I can also see that you could probably tear down and remake an individual layer based on specific recipes, but what about something like blue-green for the web layer in a separate recipe (and rollback in another recipe)?

One of the problems I see is that Chef provisioning uses named resources.
For example, you have a recipe which stands up your stack, and your current web layer is created by a machine_batch resource called “web layer blue” which is attached to a load balancer. If you want to deploy a release then you need to stand up your “web layer green”, test that it provisioned okay, then add green to your load balancer and take blue off.
If all goes well and there’s no rollback, then green becomes blue for the next deploy so you can run the same code. How can this be handled in Chef Provisioning if the Chef code refers to the resource by a fixed name?
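
To make the problem concrete, here’s a rough sketch of the kind of recipe I mean (just a sketch: the driver, region, machine names and node count are all made up, and the load_balancer syntax may vary by driver):

require 'chef/provisioning/aws_driver'
with_driver 'aws::us-east-1'

# The "blue" web layer, with its name baked into the recipe
machine_batch 'web layer blue' do
  1.upto(3) do |i|
    machine "web-blue-#{i}" do
      recipe 'mywebapp'
    end
  end
end

# Attach the blue machines to the load balancer
load_balancer 'web-elb' do
  machines %w(web-blue-1 web-blue-2 web-blue-3)
end

Deploying a release would then mean standing up an equivalent “web layer green” block and repointing ‘web-elb’ at the green machines, which is where the fixed names start to hurt.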

How have people tackled this?
One way, I suppose, is to have green-to-blue and blue-to-green recipes depending on which is currently active, as well as rollbacks for both, plus the glue code to work out which one to run, etc.
Another option, I suppose, is to actually update the resource names in your Chef code based on release names as time goes on. For example, deploy.rb has machine_batch “1.0.0” and update.rb has machine_batch “1.1.0”, until it’s time for 1.2.0, when deploy.rb gets updated with machine_batch “1.1.0” and update.rb with machine_batch “1.2.0”. But this adds overhead to managing the code (and relies on Chef being aware of machine resources by name, which I’m not sure it does).

Or am I totally barking up the wrong tree, and Chef Provisioning shouldn’t be used for orchestration but rather as a more foundational layer for standing up stacks that are then managed by other means (i.e. glorified CloudFormation templates)? But even in that case there would be a need to stand up a new stack while keeping the existing one; is this possible?

I spent a little bit of time thinking about this when I created the chef provision command, but given the time I had available and the complexity of the problem, I decided to ignore it. Based on your own needs and constraints, it's likely you could figure out something that will work for you.

What I was thinking about doing with chef provision was to have a way of generating a "cluster ID", which would be exposed in the DSL and which you would use any place you needed per-cluster names, e.g., you would name machines something like "web-#{number}-#{cluster_id}". What stopped me from doing this is the amount of complexity required to make it work for a variety of use cases (e.g., you might have one or more throwaway stacks created by a single developer, a single dev stack that multiple developers are collaborating on, blue-green deployments and variations thereof, long-lived production stacks managed by a single management host, etc.). The tricky part is managing the cluster ID state so that you can destroy any of the stacks you've created later on, in a way that addresses a lot of use cases.

However, since you don't need to solve the problem in a completely general way, just in a way that fits your specific needs, it might be relatively simple for you to implement this concept on your own. If you use chef provision, you can pass arbitrary data from the command line using an argument like --opt cluster_name=name, which you access in your cookbook with:

context = ChefDK::ProvisioningData.context
cluster_name = context.opts.cluster_name

Then just use the cluster_name variable anywhere you have a name that needs to be unique to the cluster. Likewise, you could have an option that decides whether to make the given cluster active or inactive.
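
For example (just a sketch reusing the naming idea above; the machine count and recipe name are arbitrary):

# Sketch only: use cluster_name anywhere a name needs to be unique per cluster
machine_batch "web-#{cluster_name}" do
  1.upto(3) do |i|
    machine "web-#{i}-#{cluster_name}" do
      recipe 'mywebapp'
    end
  end
end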

HTH,

Dan

Thanks Dan,

That seems kind of hacky, but I guess it shows the power of being able to use pure Ruby in Chef. Either way, it looks like it would work.

Do you know if orchestration is something that is planned for the future of Chef Provisioning or another Chef product? It’s something that competitors (e.g. Ansible) tout themselves as the answer to.

What do you actually need for orchestration? If you're looking to do a multi-step deployment with different steps executed on different hosts, something like Rundeck would be worth looking into. But that approach can eventually run into problems if you're not managing system state in a convergent way: hosts can be down or unreachable when the orchestration tool is trying to run commands on them, and you need a way to get them back into the correct state later.

Okay I found part of the answer to the first question here: https://github.com/chef/chef-provisioning/blob/master/docs/faq.md

So the information for a machine is stored directly in its Chef node. But what about all the other concepts you can create with Chef Provisioning, i.e. load balancers, autoscaling groups in AWS and so on; how is their state stored?

I think I need to know a little more about how Chef Provisioning works before I can add anything more useful, but I would hope that easy code-based orchestration is a target for Chef Provisioning (or another Chef tool?) and is something in the pipeline.

In terms of keeping state, I think Chef is well placed to do that itself: the cookbook code (both provisioning and regular cookbooks) is a versioned artifact, and Chef has the ability to record that version against known objects, which could be used to determine whether a machine was available during an update or not.

For anyone wondering the same and coming across this thread: the resource-to-object mapping is done via data bags (on the Chef server if you’re using one, or in the data_bags directory of the local repository if you’re using local mode).
The data bag for machines is ‘machines’, but most are driver- and resource-specific (each resource type for each driver gets a data bag with a particular name; at least this is the case for the AWS driver). The names of the data bags don’t really matter unless you have a clash, but essentially state is maintained between runs based on resource names, so ‘yes’ is essentially the answer to both of the questions I asked.
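
To see that concretely: in local mode the state ends up as plain JSON data bag items in the repository, roughly like this (the machine name is just the ‘mymachine’ example I use below; the driver-specific bag names vary, so I won’t guess at them here):

data_bags/
  machines/
    mymachine.json    # state for machine 'mymachine'
  ...                 # plus driver/resource-specific bags (e.g. created by the AWS driver)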

In terms of getting around the static resource names for objects, I’ve not tried it, but I think leveraging variables could work, i.e. instead of
machine 'mymachine'
you could run
machine current_stack_version
and then spool up
machine new_stack_version
as a replacement. So long as current_stack_version and new_stack_version are defined somewhere statically, such as variables passed to chef provision as Dan mentioned, you can refer to both and then update them as necessary.
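
Putting that together with Dan’s chef provision suggestion, a rough sketch (untested, and all the option keys, names and counts here are made up) might look something like:

# Untested sketch: drive the resource names from `chef provision --opt` values
# instead of hard-coding them (option keys and names below are hypothetical)
require 'chef/provisioning/aws_driver'
with_driver 'aws::us-east-1'

context = ChefDK::ProvisioningData.context
current_stack = context.opts.current_stack_version   # e.g. "1.0.0" (kept for rollback)
new_stack     = context.opts.new_stack_version       # e.g. "1.1.0"

# Stand up the replacement web layer alongside the current one
machine_batch "web-#{new_stack}" do
  1.upto(3) do |i|
    machine "web-#{i}-#{new_stack}" do
      recipe 'mywebapp'
    end
  end
end

# Once the new machines check out, repoint the load balancer at them;
# the "web-#{current_stack}" machines stay up in case a rollback is needed
load_balancer 'web-elb' do
  machines((1..3).map { |i| "web-#{i}-#{new_stack}" })
end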

Having built-in deploy strategies and keeping track of current and replacement instances of a resource is not something Chef Provisioning seems to do right now, but it would be pretty cool to add in the future if you ask me.