Sorry for the really long post.
Here is our use case:
I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the “Deploy Code” step. Actually, we plan to use it to deploy configuration
and all configuration dependencies, while Control Tier deploys just the code.
(We don’t have any Chef implemented, so these are currently only plans. We
do have Control Tier running and have been using it for over a year.)
In our case the thought process is that Control Tier would dispatch a
"chef-client -j " or some such thing to the node that is being
acted upon. We want that runlist to have only what is important to that
activity. For a code deployment the runlist would deploy application code.
For system updates the runlist would update system things. Any runlist that
runs on the node is going to need some shared set of attributes on the node.
We need a whole lifecycle of keeping the node attributes up to date so that
all the new configuration for the upcoming deployment is loaded prior to the
Answering your second question here: before we knew all the details about
Chef, we had the concept of an “attribute runlist” and an "action runlist"
where the attribute runlist would be one runlist used to manage all node
attributes and would not have any recipes that would actually perform work
on the node. Then, we would maintain a collection of activity runlists that
perform sets of system actions relying on the existing attributes on the
Now, we plan on one more variation. I’ll preface it with a disclaimer that
we are an “old-school” shop learning new tricks. We have a 10+ year old code
base and 10 years of process built around the caution that comes from
countless painful deployments. We don’t have the luxury of wiping the slate
clean so we have to make incremental improvements and build on each success.
That being said, we plan to “pre-deploy” most of our changes. So, the day
before the scheduled deployment we plan to lay down all the code and
configuration that is needed for the deployment in a location near the
running code. Then, the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would pre-deploy
configurations and a separate one that would activate the configurations.
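
A minimal sketch of the "flip links" step, assuming each pre-deployed release sits in its own directory and a symlink points at the live one. The function name `flip_link` and the `.new` suffix are hypothetical and used only for illustration; nothing here comes from Control Tier or Chef.

```python
import os


def flip_link(link_path: str, new_target: str) -> None:
    """Repoint the symlink at link_path to new_target in one rename."""
    tmp_link = link_path + ".new"  # hypothetical temporary name
    # Clear a leftover temporary link from an earlier failed attempt.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link beside the old one, then rename it over the old
    # link. On POSIX the rename replaces the link itself in a single step,
    # so readers see either the old target or the new one.
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)
```

With this shape, the pre-deploy runlist would only lay down the new release directory, and the activation runlist would do little more than call something like `flip_link` between the stop and the start.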
Let me know if I am unaware of a feature here: Expanding on the notion of an
"attribute runlist", node attributes should be a persistent feature of a node.
If I set an attribute that says my administrator’s email address is
firstname.lastname@example.org, then I shouldn’t have to have a role in every runlist to
assure that my admin email is always set. “chef-client -j” is destructive in
that it only maintains attributes in the runlist that it ran. This does
create the problem that if you have persistent attributes you need a method
of removing them. It is a challenging problem when specifying your
attributes to make a process to be able to remove ones you no longer need.
Chef will provide the way to delete, the user must figure out what to
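
To illustrate the removal problem, here is a rough sketch that compares what is persisted on a node with what is currently declared and lists the leftovers. It works on plain nested dictionaries, not on Chef's node object, and the function name `stale_attribute_paths` is hypothetical.

```python
from typing import Any, Dict, List


def stale_attribute_paths(
    persisted: Dict[str, Any], declared: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """List dotted paths that exist in `persisted` but not in `declared`."""
    stale: List[str] = []
    for key, value in persisted.items():
        path = f"{prefix}{key}"
        if key not in declared:
            # Nothing declares this attribute any more: removal candidate.
            stale.append(path)
        elif isinstance(value, dict) and isinstance(declared[key], dict):
            # Both sides are nested, so compare one level down.
            stale.extend(stale_attribute_paths(value, declared[key], path + "."))
    return stale
```

This only finds the candidates. Deciding whether each one is really safe to delete is still a judgment call.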
Now, to answer your first questions: I do not think that maintaining one
node object per activity set would be practical in the long run.
On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy email@example.com wrote:
This speaks more to orchestration than to one-off run lists, but let me
The most interesting workflow I’ve been interested in modeling is along the
lines of the following:
"If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged as
having at least one recipe in the pending-downtime list:
- remove this node from the load balancer’s pool
- wait for all requests to drain
- run all recipes in the pending-downtime list, removing each from said
list after successful completion
- when pending-downtime list is empty, put this server back into the pool"
…where several different recipes have the ability to add their own
entries to the pending-downtime list (which could be anything from a
firewall reconfiguration to an application restart to a full-system reboot)
Of course, the "no more than 1/5 of all app servers are out of the pool"
requirement calls for some care to avoid race conditions.
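
As a rough sketch, the gating condition above could be written as a single predicate. The `AppServer` type and its field names are hypothetical, and I am assuming the 1/5 limit is checked as if this node had already left the pool. The sketch does nothing about the race itself: the check and the removal from the pool would still have to happen under a lock or some other atomic step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppServer:
    name: str
    load: float  # e.g. a recent load average
    in_pool: bool = True
    pending_downtime: List[str] = field(default_factory=list)


def may_take_downtime(node: AppServer, servers: List[AppServer]) -> bool:
    """Check the three conditions for `node`, which is expected to be in `servers`."""
    if not node.pending_downtime or not servers:
        return False
    avg_load = sum(s.load for s in servers) / len(servers)
    if avg_load >= 1.0:
        return False
    out_now = sum(1 for s in servers if not s.in_pool)
    # Assumption: count this node as already removed, so that taking it
    # out cannot push the out-of-pool share above 1/5.
    out_after = out_now + (1 if node.in_pool else 0)
    return out_after * 5 <= len(servers)
```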
If y’all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.
On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters firstname.lastname@example.org wrote:
Absolutely. One-off run lists are one of the most requested features. They
also fit into some of the preliminary discussions we’ve had about
orchestration models. We plan to get a design together for one-off run lists
in the next few weeks to share with the community for feedback.
If you’re willing to comment on your use case more, here are a few
questions that I have.
For your use case, does the multi-node solution with a shared base run
list work, or do you actually need to have only one node object for the
purpose of searching?
Should run lists be first-class objects instead of just properties on
nodes and roles? Should they be able to contain not only roles and recipes
but run list-containing entities (nodes and other dis-embodied run lists),
If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.
Thank you for your input.
On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec email@example.com wrote:
So, the obligatory next question is:
“Is this anywhere on the roadmap?”
Thanks for the suggestion about multiple nodes. We’ll play with that and
see if it may be a workable, if not ideal, solution.
On Wed, Jan 26, 2011 at 5:46 PM, Chris Walters firstname.lastname@example.org wrote:
There isn’t currently a way that I can think of to run one run list
after another except to package up the main run list into a role and prepend
that role to the one-off run list’s items.
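
A small sketch of that workaround: generate the JSON file for the one-off run so that its run list always starts with the role that wraps the main run list. The role name `base` and the recipe `app::deploy` are placeholders, not names from your setup.

```python
import json
from typing import List


def one_off_json(base_role: str, one_off_items: List[str]) -> str:
    """Return JSON text whose run_list begins with the base role."""
    run_list = [f"role[{base_role}]"] + list(one_off_items)
    return json.dumps({"run_list": run_list}, indent=2)


# Placeholder role and recipe names, for illustration only.
print(one_off_json("base", ["recipe[app::deploy]"]))
```

The printed text can be saved to a file and passed to chef-client with -j. The base role still runs every time, so this only helps if the base role is safe to re-run.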
As for one-off run lists, there isn’t currently a built-in solution.
Since a single server can be managed by many chef nodes, one way to do it is
to have different JSON files like you do, but run them as different nodes.
infrastructure maintenance runs:
“chef-client -j infra-maint.json -N node-XYZ-infra-maint”
deployment team runs:
“chef-client -j deployment.json -N node-XYZ-deployment”
Does that help?
On Wed, Jan 26, 2011 at 1:55 PM, email@example.com wrote:
We have run into an interesting problem. We want to segregate runlists by
activity (e.g., infrastructure maintenance, deployment, one-off, etc…). We
want all the runlists to share some common role information about a node. We
have a node that has some roles (datacenter, servergroup, tier) that are
important identifiers and drive selection of certain attributes. We want
different groups to be able to do maintenance on their parts at
without impacting others. So if a sysadmin wants to update /etc/hosts
shouldn’t have to worry if the application team has put in a new
for a deployment later. The sysadmin can run a runlist that only
parts of the system he is responsible for without worrying that an
deployment recipe will run. Conversely in a software deployment the
team should be able to update the applications without updating the
system (given the os changes are not part of the software deployment).
I thought “chef-client -j” would do this, but it didn’t. This is what I
did: I created a node and bootstrapped it with a runlist of its
I then made a json file with a runlist for a set of activity and ran
runlist via “chef-client -j ”. The problem is that the
for the node that existed before chef-client gets wiped out and only
runlist in the json file gets run thus wiping out its “identity” and
breaking the one-off runlist because certain attributes no longer
I’d like to be able to append a runlist on the fly to an existing
the node where the new runlist exists on the node only for the duration
chef-client run. The node has a “base” runlist that should always be
but I want to run some other recipes and roles one at a time while
“base” runlist. I do not want to have to copy the base runlist into the
json file of the one-shot runlist that I am running as I’m trying to
“activity” runlists environment independent.
Is there a way to run a one-off runlist on a node that is effectively
to the runlist that is already on the node and is removed after the