Sorry for the really long post.
Here is our use case:
I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the "Deploy Code" step. More precisely, we plan to use Chef to deploy
configuration and all configuration dependencies, while Control Tier deploys
just the code.
(We don't have any Chef implemented, so these are currently only plans. We
do have Control Tier running and have been using it for over a year
orchestrating deployments).
In our case the thought process is that Control Tier would dispatch a
"chef-client -j " or some such thing to the node that is being
acted upon. We want that runlist to have only what is important to that
activity. For a code deployment the runlist would deploy application code.
For system updates the runlist would update system things. Any runlist that
runs on the node is going to need some shared set of attributes on the node.
We need a whole lifecycle of keeping the node attributes up to date so that
all the new configuration for the upcoming deployment is loaded prior to the
deployment.
To answer your second question: before we knew all the details about
Chef, we had the concept of an "attribute runlist" and an "action runlist"
where the attribute runlist would be one runlist used to manage all node
attributes and would not have any recipes that would actually perform work
on the node. Then, we would maintain a collection of activity runlists that
perform sets of system actions relying on the existing attributes on the
node.
Now, we plan on one more variation. I'll preface it with a disclaimer that
we are an "old-school" shop learning new tricks. We have a 10+ year old code
base and 10 years of process built around the caution that comes from
countless painful deployments. We don't have the luxury of wiping the slate
clean so we have to make incremental improvements and build on each success.
That being said, we plan to "pre-deploy" most of our changes. So, the day
before the scheduled deployment we plan to lay down all the code and
configuration that is needed for the deployment in a location near the
running code. Then, the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would pre-deploy
configurations and a separate one that would activate the configurations.
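The "flip links" step of such an activation could be sketched as below. This is only an illustration under assumptions: the pre-deployed release sits in its own directory next to the running code, the live code is reached through a symlink, and the function name flip_link and the paths in the usage note are hypothetical.

```python
import os


def flip_link(link_path: str, new_target: str) -> None:
    """Point the symlink at link_path to new_target in one rename step."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Build the new symlink under a temporary name in the same directory,
    # so that the final rename stays on one filesystem.
    tmp_link = os.path.join(link_dir, ".flip-%d.tmp" % os.getpid())
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # On POSIX, renaming over an existing symlink replaces it atomically,
    # so readers see either the old target or the new one.
    os.replace(tmp_link, link_path)
```

For example, flip_link("/opt/app/current", "/opt/app/releases/next") would switch a hypothetical "current" link to the pre-deployed release; the stop, database update, and start steps are left to the orchestration tool.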
Let me know if I am unaware of a feature here: Expanding on the notion of an
"attribute runlist", node attributes should be a persistent feature of a node.
If I set an attribute that says my administrators email address is
admin@example.com, then I shouldn't have to have a role in every runlist to
ensure that my admin email is always set. "chef-client -j" is destructive in
that it only maintains attributes from the runlist that it ran. Persistence
does create its own problem: if you have persistent attributes, you need a
method of removing them. It is challenging to build a process that identifies
and removes the attributes you no longer need. Chef will provide the way to
delete; the user must figure out what to delete.
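One way to work out what to delete is to compare the attributes currently stored on the node with the attributes you still declare. The sketch below assumes both are available as plain nested dictionaries; the helper name stale_keys and the sample attribute names are hypothetical, and it does not talk to Chef at all.

```python
def stale_keys(current: dict, desired: dict, prefix: str = "") -> list:
    """Return dotted paths that exist in `current` but not in `desired`."""
    stale = []
    for key, value in current.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in desired:
            # The whole subtree is no longer declared anywhere.
            stale.append(path)
        elif isinstance(value, dict) and isinstance(desired[key], dict):
            # Both sides are nested: look for stale keys further down.
            stale.extend(stale_keys(value, desired[key], path))
    return sorted(stale)


current = {
    "admin": {"email": "admin@example.com", "pager": "555-0100"},
    "legacy_app": {"port": 8080},
}
desired = {"admin": {"email": "admin@example.com"}}
print(stale_keys(current, desired))  # ['admin.pager', 'legacy_app']
```

The resulting list is only a set of candidates for removal; deciding whether each one is really obsolete remains a human or process decision.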
Now, to answer your first question: I do not think that maintaining one
node object per activity set would be practical in the long run.
Dan
On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy charles@dyfis.net wrote:
This speaks more to orchestration than to one-off run lists, but let me
comment --
The most interesting workflow I've been interested in modeling is along the
lines of the following:
"If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged as
having at least one recipe in the pending-downtime list:
- remove this node from the load balancer's pool
- wait for all requests to drain
- run all recipes in the pending-downtime list, removing each from said
list after successful completion
- when pending-downtime list is empty, put this server back into the pool"
...where several different recipes have the ability to add their own
entries to the pending-downtime list (which could be anything from a
firewall reconfiguration to an application restart to a full-system reboot).
Of course, the "no more than 1/5 of all app servers are out of the pool"
requirement calls for some care to avoid race conditions.
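The precondition in that workflow can be written down as a simple predicate. This is a sketch only: the AppServer type and the function name are hypothetical, the 1/5 limit is read as "still no more than 1/5 out of the pool once this node is removed", and the check is not atomic, so it does nothing about the race condition mentioned above (that would need a lock or some other coordination between nodes).

```python
from dataclasses import dataclass, field


@dataclass
class AppServer:
    name: str
    load: float
    in_pool: bool = True
    pending_downtime: list = field(default_factory=list)


def may_take_downtime(node: AppServer, servers: list) -> bool:
    """Decide whether `node` may leave the pool to run its pending recipes."""
    if not node.pending_downtime:
        return False  # nothing is waiting for downtime on this node
    average_load = sum(s.load for s in servers) / len(servers)
    if average_load >= 1.0:
        return False  # the cluster is too busy right now
    out_now = sum(1 for s in servers if not s.in_pool)
    out_after = out_now + (1 if node.in_pool else 0)
    # Integer form of "out_after / total <= 1/5" to avoid float rounding.
    return out_after * 5 <= len(servers)
```

With five lightly loaded servers all in the pool, a node with one pending recipe would be allowed out; once one server is already out, the next request would be refused.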
If y'all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.
On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters cw@opscode.com wrote:
Dan,
Absolutely. One-off run lists are one of the most requested features. They
also fit into some of the preliminary discussions we've had about
orchestration models. We plan to get a design together for one-off run lists
in the next few weeks to share with the community for feedback.
If you're willing to comment on your use case more, here are a few
questions that I have.
For your use case, does the multi-node solution with a shared base run
list work, or do you actually need to have only one node object for the
purpose of searching?
Should run lists be first-class objects instead of just properties on
nodes and roles? Should they be able to contain not only roles and recipes
but run list-containing entities (nodes and other dis-embodied run lists),
as well?
If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.
Thank you for your input.
-chris
On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec dan@nemecfamily.com wrote:
So, the obligatory next question is:
"Is this anywhere on the roadmap?"
Thanks for the suggestion about multiple nodes. We'll play with that and
see whether it is a workable, if not ideal, solution.
Dan
On Wed, Jan 26, 2011 at 5:46 PM, Chris Walters cw@opscode.com wrote:
Hi Dan,
There isn't currently a way that I can think of to run one run list
after another except to package up the main run list into a role and prepend
that role to the one-off run list's items.
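As a rough sketch of that workaround, the JSON file handed to "chef-client -j" could be generated with the wrapping role placed first. The role name node_identity and the recipe name in the usage note are hypothetical, and the helper functions are illustrative, not part of Chef; only the "run_list" key and the role[...] / recipe[...] item syntax come from Chef itself.

```python
import json


def one_off_attributes(base_role: str, activity_items: list) -> dict:
    """Build JSON attributes whose run list starts with the base role."""
    run_list = [f"role[{base_role}]"]
    # Append the one-off items after the base role, skipping duplicates.
    for item in activity_items:
        if item not in run_list:
            run_list.append(item)
    return {"run_list": run_list}


def write_json_attributes(path: str, base_role: str, activity_items: list) -> None:
    """Write the attributes to a file suitable for chef-client -j."""
    with open(path, "w") as handle:
        json.dump(one_off_attributes(base_role, activity_items), handle, indent=2)
        handle.write("\n")
```

For instance, write_json_attributes("deployment.json", "node_identity", ["recipe[app::deploy]"]) would produce a file whose run list is role[node_identity] followed by recipe[app::deploy]. The base role still has to be named in each generated file, which is the coupling the original question hoped to avoid.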
As for one-off run lists, there isn't currently a built-in solution.
Since a single server can be managed by many chef nodes, one way to do it is
to have different JSON files like you do, but run them as different nodes.
Something like:
infrastructure maintenance runs:
"chef-client -j infra-maint.json -N node-XYZ-infra-maint"
deployment team runs:
"chef-client -j deployment.json -N node-XYZ-deployment"
etc.
Does that help?
-chris
On Wed, Jan 26, 2011 at 1:55 PM, dan@nemecfamily.com wrote:
We have run into an interesting problem. We want to segregate runlists by
activity (e.g. infrastructure maintenance, deployment, one-off, etc.). But we
want all the runlists to share some common role information about a node. We
have a node that has some roles (datacenter, servergroup, tier) that are
important identifiers and drive selection of certain attributes. We want
different groups to be able to do maintenance on their parts at different
times without impacting others. So if a sysadmin wants to update /etc/hosts,
he shouldn’t have to worry if the application team has put in a new attribute
for a deployment later. The sysadmin can run a runlist that only affects the
parts of the system he is responsible for without worrying that an
application deployment recipe will run. Conversely, in a software deployment
the deployment team should be able to update the applications without
updating the operating system (given the OS changes are not part of the
software deployment).
I thought “chef-client -j” would do this, but it didn’t. This is what I did:
I created a node and bootstrapped it with a runlist of its identity roles. I
then made a json file with a runlist for a set of activity and ran the
runlist via “chef-client -j ”. The problem is that the runlist for the node
that existed before chef-client gets wiped out and only the runlist in the
json file gets run thus wiping out its “identity” and breaking the one-off
runlist because certain attributes no longer exist.
I’d like to be able to append a runlist on the fly to an existing runlist on
the node where the new runlist exists on the node only for the duration of
the chef-client run. The node has a “base” runlist that should always be run,
but I want to run some other recipes and roles one at a time while keeping
the “base” runlist. I do not want to have to copy the base runlist into the
json file of the one-shot runlist that I am running as I’m trying to keep the
“activity” runlists environment independent.
Is there a way to run a one-off runlist on a node that is effectively
appended to the runlist that is already on the node and is removed after the
run?
Thanks,
Dan