One-Shot runlists with inheritance


#1

We have run into an interesting problem. We want to segregate runlists by
activity (e.g., infrastructure maintenance, deployment, one-off, etc.). But we
want all the runlists to share some common role information about a node. We
have a node with some roles (datacenter, servergroup, tier) that are
important identifiers and drive the selection of certain attributes. We want
different groups to be able to do maintenance on their parts at different times
without impacting others. So if a sysadmin wants to update /etc/hosts, he
shouldn’t have to worry about whether the application team has put in a new
attribute for a later deployment. The sysadmin can run a runlist that only
affects the parts of the system he is responsible for, without worrying that an
application deployment recipe will run. Conversely, in a software deployment the
deployment team should be able to update the applications without updating the
operating system (given that the OS changes are not part of the software
deployment).

I thought “chef-client -j” would do this, but it didn’t. This is what I
did: I created a node and bootstrapped it with a runlist of its identity roles.
I then made a JSON file with a runlist for a set of activities and ran that
runlist by passing the file to “chef-client -j”. The problem is that the runlist
that existed on the node before the chef-client run gets wiped out and only the
runlist in the JSON file gets run, thus wiping out the node’s “identity” and
breaking the one-off runlist because certain attributes no longer exist.

I’d like to be able to append a runlist on the fly to the existing runlist on
the node, so that the new runlist exists on the node only for the duration of
the chef-client run. The node has a “base” runlist that should always be run,
but I want to run some other recipes and roles one at a time while keeping the
“base” runlist. I do not want to have to copy the base runlist into the
JSON file of the one-shot runlist that I am running, as I’m trying to keep the
“activity” runlists environment-independent.
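
The behaviour asked for here can be modelled in a few lines of plain Ruby. This is only a sketch of the desired semantics, not of how chef-client works: the method name `effective_run_list` and the recipe name are hypothetical, and the saved runlist is deliberately left untouched so that the one-off items disappear once the run is over.

```ruby
# Sketch only: models the requested "append for one run" semantics.
# This is NOT chef-client behaviour; all names here are hypothetical.

# Returns the run list for a single run: the node's saved base run list
# followed by the one-off items, skipping items that are already present.
def effective_run_list(base, one_off)
  (base + one_off).uniq
end

# Identity roles saved on the node (role names taken from the post).
saved_run_list = ["role[datacenter]", "role[servergroup]", "role[tier]"].freeze

# Activity-specific items supplied only for this run (hypothetical recipe).
one_off = ["recipe[hosts_file]"]

this_run = effective_run_list(saved_run_list, one_off)
puts this_run.inspect

# The saved run list is frozen and never modified, so nothing persists
# on the node after the run.
puts saved_run_list.inspect
```

Keeping the merge as a pure function of the two lists is what lets the “activity” runlist stay free of any knowledge of the base runlist.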

Is there a way to run a one-off runlist on a node that is effectively appended
to the runlist that is already on the node and is removed after the run?

Thanks,

Dan


#2

Hi Dan,

There isn’t currently a way that I can think of to run one run list after
another except to package up the main run list into a role and prepend that
role to the one-off run list’s items.

As for one-off run lists, there isn’t currently a built-in solution. Since a
single server can be managed by many chef nodes, one way to do it is to have
different JSON files like you do, but run them as different nodes. Something
like:

infrastructure maintenance runs:
“chef-client -j infra-maint.json -N node-XYZ-infra-maint”

deployment team runs:
“chef-client -j deployment.json -N node-XYZ-deployment”

etc.
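
A minimal sketch of how the per-activity JSON files could be generated with the shared role prepended, assuming the identity roles have been wrapped in a role called `base` (a hypothetical name). The `run_list` key is the usual top-level key in a JSON attributes file; the recipe names are placeholders.

```ruby
require "json"

# Sketch only: "base" and the recipe names are hypothetical placeholders.
BASE_ROLE = "role[base]"

# Builds the JSON document for one activity, with the shared base role
# prepended so the node keeps its identity attributes during the run.
def activity_json(one_off_items)
  JSON.pretty_generate({ "run_list" => [BASE_ROLE] + one_off_items })
end

# One file body per activity, matching the file names used above.
files = {
  "infra-maint.json" => activity_json(["recipe[hosts_file]"]),
  "deployment.json"  => activity_json(["recipe[app_deploy]"])
}

files.each do |name, body|
  puts "# #{name}"
  puts body
end
```

Generating the files from one place means the base role is named once rather than copied by hand into every activity file.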

Does that help?

-chris

(In reply to the message from dan@nemecfamily.com of Wed, Jan 26, 2011 at 1:55 PM.)


#3

So, the obligatory next question is:

“Is this anywhere on the roadmap?”

Thanks for the suggestion about multiple nodes. We’ll play with that and see
whether it is a workable, if not ideal, solution.

Dan

(In reply to the message from Chris Walters cw@opscode.com of Wed, Jan 26, 2011 at 5:46 PM.)


#4

Dan,

Absolutely. One-off run lists are one of the most requested features. They
also fit into some of the preliminary discussions we’ve had about
orchestration models. We plan to get a design together for one-off run lists
in the next few weeks to share with the community for feedback.

If you’re willing to comment on your use case more, here are a few questions
that I have.

For your use case, does the multi-node solution with a shared base run list
work, or do you actually need to have only one node object for the purpose
of searching?

Should run lists be first-class objects instead of just properties on nodes
and roles? Should they be able to contain not only roles and recipes but also
entities that themselves contain run lists (nodes and other disembodied run
lists)?

If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.

Thank you for your input.

-chris

(In reply to the message from Dan Nemec dan@nemecfamily.com of Fri, Jan 28, 2011 at 7:31 AM.)


#5

Chris wrote: “If anyone else has opinions on any aspect of one-off run lists,
please respond, as well.”

I don’t know if this is classified as a “one-off run list” but it’s
certainly in the camp of orchestration models. We have a need to deploy to
a cluster of nodes behind a load balancer. We want to pull one out of
rotation, deploy code and configs, smoke test it, and if it passes, put it
back into rotation then serially move on to the next one, etc. I can
provide more details offline if this use case would be helpful to you,
Chris.
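
The serial workflow described here can be sketched as a small Ruby loop. The four callables are hypothetical hooks standing in for whatever load balancer, deployment, and smoke-test tooling is actually in use; nothing here is a Chef feature.

```ruby
# Sketch only: the callables are hypothetical hooks for real load balancer,
# deployment, and smoke-test tooling.

# Deploys to nodes one at a time. A node returns to rotation only if its
# smoke test passes; the rollout stops at the first failure.
# Returns the list of nodes that were deployed and restored.
def rolling_deploy(nodes, remove:, deploy:, smoke_test:, restore:)
  done = []
  nodes.each do |node|
    remove.call(node)
    deploy.call(node)
    break unless smoke_test.call(node) # leave the failed node out of rotation
    restore.call(node)
    done << node
  end
  done
end

log = []
result = rolling_deploy(
  %w[web1 web2 web3],
  remove:     ->(n) { log << "remove #{n}" },
  deploy:     ->(n) { log << "deploy #{n}" },
  smoke_test: ->(n) { n != "web2" }, # simulate a failing smoke test on web2
  restore:    ->(n) { log << "restore #{n}" }
)

puts result.inspect
puts log
```

Stopping at the first failed smoke test keeps a bad build from reaching the rest of the cluster, at the cost of leaving one node out of rotation until someone intervenes.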

Regards.

- Rob

(In reply to the message from Chris Walters cw@opscode.com of Fri, Jan 28, 2011 at 2:12 PM.)


#6

This speaks more to orchestration than to one-off run lists, but let me
comment –

The most interesting workflow I’ve been interested in modeling is along the
lines of the following:

"If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged as
having at least one recipe in the pending-downtime list:

  • remove this node from the load balancer’s pool
  • wait for all requests to drain
  • run all recipes in the pending-downtime list, removing each from said
    list after successful completion
  • when pending-downtime list is empty, put this server back into the pool"

…where several different recipes have the ability to add their own entries
to the pending-downtime list (which could be anything from a firewall
reconfiguration to an application restart to a full-system reboot).

Of course, the "no more than 1/5 of all app servers are out of the pool"
requirement calls for some care to avoid race conditions.
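
The conditions and steps above can be sketched in plain Ruby as follows. The data layout and method names are assumptions made for illustration, and the quota check counts the current node as well, which is one possible reading of the 1/5 rule. The sketch does nothing about the race condition: in practice the check and the removal from the pool would need to happen atomically, for example under a shared lock.

```ruby
# Sketch only: method names and data layout are assumptions for illustration.
# The check below is not atomic, so it does not solve the race condition.

# True when this node may leave the pool to run its pending-downtime recipes.
def may_take_downtime?(loads:, out_of_pool:, pool_size:, pending:)
  avg_load = loads.sum / loads.size.to_f
  # Count this node too, so taking it out never exceeds 1/5 of the pool.
  within_quota = (out_of_pool + 1) * 5 <= pool_size
  avg_load < 1.0 && within_quota && !pending.empty?
end

# Runs each pending recipe via the given block, removing it from the list
# only after it succeeds. Stops at the first failure so the remaining
# entries stay pending. Returns true when the list has been emptied, which
# is the signal to put the server back into the pool.
def drain_pending!(pending)
  until pending.empty?
    break unless yield(pending.first)
    pending.shift
  end
  pending.empty?
end

pending = ["recipe[firewall]", "recipe[app_restart]"]
if may_take_downtime?(loads: [0.4, 0.6, 0.8], out_of_pool: 0, pool_size: 10, pending: pending)
  finished = drain_pending!(pending) { |recipe| puts "running #{recipe}"; true }
  puts "back in pool: #{finished}"
end
```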

If y’all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.

(In reply to the message from Chris Walters cw@opscode.com of Fri, Jan 28, 2011 at 1:12 PM.)


#7

Sorry for the really long post.

Here is our use case:

I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the “Deploy Code” step. Actually, we plan to use it to deploy configuration
and all configuration dependencies where Control Tier deploys just the code.
(We don’t have any Chef implemented, so these are currently only plans. We
do have Control Tier running and have been using it for over a year
orchestrating deployments).

In our case the thought process is that Control Tier would dispatch a
“chef-client -j” run, or some such thing, with an activity-specific runlist to
the node that is being acted upon. We want that runlist to have only what is
important to that activity. For a code deployment the runlist would deploy
application code. For system updates the runlist would update system things.
Any runlist that runs on the node is going to need some shared set of
attributes on the node. We need a whole lifecycle for keeping the node
attributes up to date, so that all the new configuration for the upcoming
deployment is loaded prior to the deployment.

Answering your second question here: before we knew all the details about
Chef, we had the concept of an “attribute runlist” and an “action runlist”,
where the attribute runlist would be the one runlist used to manage all node
attributes and would not have any recipes that actually perform work
on the node. Then we would maintain a collection of activity runlists that
perform sets of system actions, relying on the existing attributes on the
node.

Now, we plan on one more variation. I’ll preface it with a disclaimer that
we are an “old-school” shop learning new tricks. We have a 10+ year old code
base and 10 years of process built around the caution that comes from
countless painful deployments. We don’t have the luxury of wiping the slate
clean so we have to make incremental improvements and build on each success.
That being said, we plan to “pre-deploy” most of our changes. So, the day
before the scheduled deployment we plan to lay down all the code and
configuration that is needed for the deployment in a location near the
running code. Then, the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would pre-deploy
configurations and a separate one that would activate the configurations.
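
The pre-deploy and activate split can be sketched in plain Ruby. The directory layout (a `releases/<version>` directory plus a `current` symlink) is an assumption chosen for illustration and is not described in the thread; the sketch also assumes a POSIX filesystem with symlink support.

```ruby
require "fileutils"
require "tmpdir"

# Sketch only: the "releases/<version>" layout and the "current" symlink
# are assumptions, not something specified in the thread.

# Step 1, the day before: lay the new release down next to the running code.
def pre_deploy(root, version, files)
  release = File.join(root, "releases", version)
  FileUtils.mkdir_p(release)
  files.each { |name, body| File.write(File.join(release, name), body) }
  release
end

# Step 2, at deployment time: flip the "current" link to the staged release.
# Removing and recreating the link is simple but not atomic.
def activate(root, version)
  release = File.join(root, "releases", version)
  raise ArgumentError, "release #{version} was not pre-deployed" unless Dir.exist?(release)
  link = File.join(root, "current")
  FileUtils.rm_f(link)
  FileUtils.ln_s(release, link)
  link
end

Dir.mktmpdir do |root|
  pre_deploy(root, "1.0", { "app.conf" => "version=1.0\n" })
  activate(root, "1.0")
  pre_deploy(root, "1.1", { "app.conf" => "version=1.1\n" }) # staged, not live
  activate(root, "1.1")                                      # the actual flip
  puts File.read(File.join(root, "current", "app.conf"))
end
```

Because `activate` refuses to run for a version that was never staged, the two runlists can be kept separate without the second one silently doing the first one’s work.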

Let me know if I am unaware of a feature here. Expanding on the notion of an
“attribute runlist”: node attributes should be a persistent feature of a node.
If I set an attribute that says my administrator’s email address is
admin@example.com, then I shouldn’t have to have a role in every runlist to
ensure that my admin email is always set. “chef-client -j” is destructive in
that it only maintains the attributes in the runlist that it ran. This does
create the problem that if you have persistent attributes, you need a method
of removing them. When specifying your attributes, it is a challenging problem
to build a process for removing the ones you no longer need. Chef will provide
the way to delete; the user must figure out what to delete.
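
The “figure out what to delete” part can be illustrated with plain Ruby hashes standing in for node attributes. This is a sketch of the bookkeeping only; it does not use the Chef attribute API, and the `legacy` attribute is a made-up example.

```ruby
# Sketch only: plain hashes stand in for node attributes; this is not the
# Chef attribute API. The "legacy" attribute is a made-up example.

# Flattens nested attributes into dotted paths,
# e.g. {"a" => {"b" => 1}} becomes {"a.b" => 1}.
def flatten_attrs(attrs, prefix = nil)
  attrs.each_with_object({}) do |(key, value), out|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      out.merge!(flatten_attrs(value, path))
    else
      out[path] = value
    end
  end
end

# Paths that are persisted on the node but no longer declared anywhere.
# These are the candidates for deletion that the user has to identify.
def stale_attribute_paths(persisted, declared)
  flatten_attrs(persisted).keys - flatten_attrs(declared).keys
end

persisted = {
  "admin"  => { "email" => "admin@example.com" },
  "legacy" => { "port" => 8080 }
}
declared = { "admin" => { "email" => "admin@example.com" } }

puts stale_attribute_paths(persisted, declared).inspect
```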

Now, to answer your first question: I do not think that maintaining one
node object per activity set would be practical in the long run.

Dan

(In reply to the message from Charles Duffy charles@dyfis.net of Fri, Jan 28, 2011 at 2:31 PM.)


#8

Thinking about this problem, I’ve written a “one-shot” cookbook that
may be used to solve simple cases of this problem.

https://github.com/mattray/cookbooks/tree/master/one-shot

This cookbook provides a framework for making single-use, one-shot
recipes. When the “one-shot” recipe is included in the node’s run_list, the
contents of the “one-shot::one-shot” recipe will be run on the next
chef-client run. This is parameterized as an attribute, so you
can swap recipes by setting the ["one_shot"]["recipe"] attribute to include
different recipes (and uploading dependencies if necessary). The file
roles/one-shot.rb is included so you can simply change the role
instead of changing the source directly.
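
As a rough illustration of the mechanism described (not the cookbook’s actual code), the attribute-driven dispatch can be modelled in plain Ruby. The recipe table and the `one-shot::rotate_logs` name are hypothetical; in a real Chef recipe the lookup would presumably be done with `include_recipe` rather than a hash of lambdas.

```ruby
# Sketch only: a plain-Ruby model of attribute-driven dispatch. It does not
# reproduce the cookbook's code; the recipe table is hypothetical.

RECIPES = {
  "one-shot::one-shot"    => -> { "default one-shot ran" },
  "one-shot::rotate_logs" => -> { "log rotation ran" }
}.freeze

# Looks up the recipe named by the ["one_shot"]["recipe"] attribute and
# runs it, failing loudly if the name is unknown.
def run_one_shot(node)
  name = node.fetch("one_shot").fetch("recipe")
  recipe = RECIPES.fetch(name) { raise ArgumentError, "unknown recipe #{name}" }
  recipe.call
end

node = { "one_shot" => { "recipe" => "one-shot::one-shot" } }
puts run_one_shot(node)

# Changing the attribute swaps the recipe without touching the run list.
node["one_shot"]["recipe"] = "one-shot::rotate_logs"
puts run_one_shot(node)
```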

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray


#9

Matt,

That’s a great idea. I looked it over and I think it does solve the problem
I’m thinking of with a one-shot runlist.

I’m relatively new to Chef so I don’t know everything that is possible and
have a few questions.

  1. Your attribute that contains a list of recipes, can it contain roles with
    runlists as well?

I’m still left with the problem that I require a “base” runlist as well. The
way I see your one-shot runlist is that if I want to use it with
"chef-client -j" then I still need to include all of the roles and recipes I
consider “base” in the list of the -j option. Your solution just provides me
the mechanism to attach some recipes (maybe roles) to an existing runlist
where it will be removed after the run. That is exactly what I need for the
second half of my problem.

[I’ll interject here that one of my design goals is to keep
environment-specific configuration in as few places as possible. Commands,
especially, cannot be environment dependent. I just want to run “doit”, not
“doit.prod”.]

  2. Suppose I have a databag holding a runlist that can contain roles or
    recipes (this databag is where my node-specific information lives, and
    nowhere else). Is it possible to have a recipe that will read from the
    databag, then construct a runlist on the fly and run that runlist?

That way I can say something like:

chef-client -j infrastructure.json

where infrastructure.json looks like

{ "run_list": [ "recipe[node-manager::base-runlist]",
"recipe[one-shot::infrastructure]" ] }

Then when that runs, it concatenates the runlist from the node databag
attribute and the runlist from the infrastructure attributes.
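
The concatenation I have in mind could be sketched in plain Ruby like this. It is not Chef code: the hash stands in for a data bag item, combined_run_list is a hypothetical helper, and "node-XYZ" and "recipe[hosts]" are made-up values. It only shows the ordering and de-duplication I would expect, with base entries first and activity entries after.

```ruby
# Hypothetical data: in Chef the base list would come from a data bag item
# and the activity list from the JSON file passed to chef-client -j.
def combined_run_list(base_items, activity_items)
  # Base entries first, activity entries after; drop repeats, keep order.
  (base_items + activity_items).uniq
end

node_data_bag_item = {
  "id" => "node-XYZ",
  "run_list" => ["role[datacenter]", "role[servergroup]", "role[tier]"]
}
infrastructure = ["recipe[hosts]", "role[tier]"]

puts combined_run_list(node_data_bag_item["run_list"], infrastructure).inspect
```

With this shape the activity file never has to repeat the identity roles; they live only in the per-node data.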

Let me know if you think I’m getting too far out in my quest for base
runlists and one-shot runlists. I can’t run a monolithic runlist that does
everything.

Dan


#10

Chef (and puppet and cfengine and bcfg2 and…) is really meant to
converge an entire system against a (composed) state description, not
to do “one-off” operations.

Keeping that in mind, if you’re absolutely sure that the “one-off” run
list won’t step on the toes of the “normal” runlist, you could try:

Add an additional Client (SSL user) to your platform account, and
call chef-client -c /path/to/an/alternate/configfile.rb, where the
config file specifies a special separate node name.

For example: foonode-oneoff.yourdomain.here.

Add your runlist to that node object, then call chef-client against
the special node.
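
As a rough sketch of what that alternate config file might contain, the Ruby below just builds the file text. node_name, chef_server_url and client_key are ordinary client config settings, but the server URL and key path here are placeholders I made up, and oneoff_config is a hypothetical helper.

```ruby
# Builds the text of an alternate chef-client config for a one-off node.
# The server URL and key path are placeholders, not values from this thread.
def oneoff_config(node_name, server_url, key_path)
  <<~CONFIG
    node_name       "#{node_name}"
    chef_server_url "#{server_url}"
    client_key      "#{key_path}"
  CONFIG
end

config = oneoff_config(
  "foonode-oneoff.yourdomain.here",
  "https://chef.example.com",
  "/etc/chef/oneoff-client.pem"
)
puts config
puts "chef-client -c /path/to/an/alternate/configfile.rb"
```

Keeping the one-off identity in its own file means the normal config and the normal node object stay untouched.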

YMMV

-s


#11

To clarify, chef-client in this scenario would be run in the context
of a normal POSIX user.

The deployment team would log into the machine with their individual
accounts, then invoke chef-client. chef-client’s runlist would do its
thing, writing to directories that the user had group write access to.

If a recipe attempted to modify /etc/hosts or /etc/shadow, it would
fail, throwing a permissions error.

-s

On Tue, Feb 1, 2011 at 5:14 PM, Sean OMeara someara@gmail.com wrote:

Chef (and puppet and cfengine and bcfg2 and…) are really meant to
converge an entire system against a (composed) state description, not
to perform “one off” operations.

Keeping that in mind, if you’re absolutely sure that the “one-off” run
list won’t step on the toes of the “normal” runlist, you could try:

Adding an additional Client (SSL user) to your platform account, and
calling chef-client -c /path/to/an/alternate/configfile.rb, that
specifies a special separate node name.

For example: foonode-oneoff.yourdomain.here.

Add your runlist to that node object, then call chef-client against
the special node.

YMMV

-s

On Tue, Feb 1, 2011 at 5:03 PM, Dan Nemec dan@nemecfamily.com wrote:

Matt,
That’s a great idea. I looked it over and I think it does solve the problem
I’m thinking of with a one-shot runlist.
I’m relatively new to Chef, so I don’t know everything that is possible, and
I have a few questions.

  1. Can your attribute that contains a list of recipes contain roles with
    runlists as well?
    I’m still left with the problem that I require a “base” runlist as well.
    The way I see your one-shot runlist, if I want to use it with
    "chef-client -j" then I still need to include all of the roles and recipes
    I consider “base” in the list passed to the -j option. Your solution just
    provides me the mechanism to attach some recipes (maybe roles) to an
    existing runlist and have them removed after the run. That is exactly
    what I need for the second half of my problem.
    [I’ll interject here that one of my design goals is to have
    environment-specific configuration in as few places as possible. Commands,
    especially, cannot be environment dependent. I just want to run “doit”,
    not "doit.prod".]
  2. Say I have a databag with some configuration that is a runlist that
    can contain roles or recipes (this databag holds my node-specific
    information, and nothing else does). Is it possible to have a recipe that
    will read from the databag, then construct a runlist on the fly and run
    that runlist?
    That way I can say something like:
    chef-client -j infrastructure.json
    where infrastructure.json looks like
    { "run_list": [ "recipe[node-manager::base-runlist]",
    "recipe[one-shot::infrastructure]" ] }
    Then when that runs, it concatenates the runlist from the node databag
    attribute and the runlist from the infrastructure attributes.

Let me know if you think I’m getting too far out in my quest for base
runlists and one-shot runlists. I can’t run a monolithic runlist that does
everything.
Dan
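
The concatenation asked about in question 2 can be sketched in plain Ruby. This only models the list arithmetic: the data bag is a plain Hash standing in for a real data bag item, the node and role names are invented, and whether Chef allows a recipe to change the run list partway through a run is not settled anywhere in this thread.

```ruby
# Stand-in for a data bag keyed by node name. In a real recipe this would
# come from Chef's data bag API; here it is a plain Hash (an assumption).
NODE_BAG = {
  "node-XYZ" => { "run_list" => ["role[datacenter-east]", "role[tier-web]"] }
}.freeze

# Activity run list, as it might be listed for an infrastructure run.
ACTIVITY = ["recipe[hosts]", "role[tier-web]"].freeze

# Base items first, then activity items, dropping repeats while keeping
# the order of first appearance.
def combined_run_list(bag, node_name, activity)
  base = bag.fetch(node_name).fetch("run_list")
  (base + activity).uniq
end

puts combined_run_list(NODE_BAG, "node-XYZ", ACTIVITY).inspect
```

Keeping the base items first matters because the activity items rely on the attributes that the identity roles set.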

On Tue, Feb 1, 2011 at 2:43 PM, Matt Ray matt@opscode.com wrote:

Thinking about this problem, I’ve written a “one-shot” cookbook that
may be used to solve simple cases of this problem.

https://github.com/mattray/cookbooks/tree/master/one-shot

This cookbook provides a framework for making single-use, one-shot
recipes. By including the “one-shot” recipe in the node’s run_list, on
the next chef-client run the contents of the "one-shot::one-shot"
recipe will be called. This is parametrized as an attribute, so you
can change these out by setting the [“one_shot”][“recipe”] to include
different recipes (and uploading dependencies if necessary). The file
roles/one-shot.rb is included so you can simply change the role
instead of changing the source directly.

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray
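
To illustrate the one-shot mechanism described above, here is a plain-Ruby simulation. It is not the code of the linked cookbook; it assumes the behaviour is "run the recipe named by the attribute once, then drop the one-shot entry from the saved run list". The `Node` struct and `converge` method are hypothetical stand-ins, not Chef classes.

```ruby
# Plain-Ruby model of a node; not Chef's Node class.
Node = Struct.new(:run_list, :attributes)

# Simulate one chef-client run. Every entry "runs"; when the one-shot
# entry is reached, the recipe named in the attribute runs instead and
# the entry is removed so it does not fire on the next run.
def converge(node)
  executed = []
  node.run_list.dup.each do |entry|
    if entry == "recipe[one-shot]"
      executed << node.attributes["one_shot"]["recipe"]
      node.run_list.delete(entry)
    else
      executed << entry
    end
  end
  executed
end

node = Node.new(["role[base]", "recipe[one-shot]"],
                { "one_shot" => { "recipe" => "one-shot::one-shot" } })

first_run  = converge(node)
second_run = converge(node)
puts first_run.inspect
puts second_run.inspect
```

In this model the base role runs on both passes, while the one-shot recipe runs only on the first.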

On Fri, Jan 28, 2011 at 2:18 PM, Dan Nemec dan@nemecfamily.com wrote:

Sorry for the really long post.
Here is our use case:

I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the “Deploy Code” step. Actually, we plan to use it to deploy configuration
and all configuration dependencies, while Control Tier deploys just the
code. (We don’t have any Chef implemented, so these are currently only
plans. We do have Control Tier running and have been using it for over a
year to orchestrate deployments.)

In our case the thought process is that Control Tier would dispatch a
"chef-client -j " or some such thing to the node that is being acted upon.
We want that runlist to have only what is important to that activity. For a
code deployment the runlist would deploy application code. For system
updates the runlist would update system things. Any runlist that runs on
the node is going to need some shared set of attributes on the node. We
need a whole lifecycle of keeping the node attributes up to date so that
all the new configuration for the upcoming deployment is loaded prior to
the deployment.

Answering your second question here: before we knew all the details about
Chef, we had the concept of an “attribute runlist” and an “action runlist”,
where the attribute runlist would be one runlist used to manage all node
attributes and would not have any recipes that actually perform work on the
node. Then we would maintain a collection of activity runlists that perform
sets of system actions relying on the existing attributes on the node.

Now, we plan on one more variation. I’ll preface it with a disclaimer that
we are an “old-school” shop learning new tricks. We have a 10+ year old
code base and 10 years of process built around the caution that comes from
countless painful deployments. We don’t have the luxury of wiping the slate
clean, so we have to make incremental improvements and build on each
success. That being said, we plan to “pre-deploy” most of our changes. So,
the day before the scheduled deployment we plan to lay down all the code
and configuration that is needed for the deployment in a location near the
running code. Then the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would
pre-deploy configurations and a separate one that would activate the
configurations.

Let me know if I am unaware of a feature here. Expanding on the notion of
an “attribute runlist”, node attributes should be a persistent feature of a
node. If I set an attribute that says my administrator’s email address is
admin@example.com, then I shouldn’t have to have a role in every runlist to
ensure that my admin email is always set. “chef-client -j” is destructive
in that it only maintains the attributes from the runlist that it ran.
Persistence does create the problem that if you have persistent attributes
you need a method of removing them. It is a challenging problem, when
specifying your attributes, to make a process that can remove the ones you
no longer need. Chef will provide the way to delete; the user must figure
out what to delete.

Now, to answer your first question: I do not think that maintaining one
node object per activity set would be practical in the long run.
Dan

On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy charles@dyfis.net wrote:

This speaks more to orchestration than to one-off run lists, but let me
comment:
The most interesting workflow I’ve been interested in modeling is along
the lines of the following:
"If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged
as having at least one recipe in the pending-downtime list:

  • remove this node from the load balancer’s pool
  • wait for all requests to drain
  • run all recipes in the pending-downtime list, removing each from said
    list after successful completion
  • when pending-downtime list is empty, put this server back into the
    pool"

…where several different recipes have the ability to add their own
entries to the pending-downtime list (which could be anything from a
firewall reconfiguration to an application restart to a full-system
reboot).
Of course, the "no more than 1/5 of all app servers are out of the pool"
requirement calls for some care to avoid race conditions.
If y’all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.
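
The precondition in the workflow above can be written as a small predicate. This is a sketch under stated assumptions: the method name and its inputs (a list of per-server load averages, a count of servers already out of the pool, and this node's pending-downtime list) are hypothetical, and the check is evaluated once, so it does nothing about the race condition mentioned above.

```ruby
# loads:       load average of every app server, in or out of the pool
# out_of_pool: number of app servers currently removed from the pool
# pending:     this node's pending-downtime list
def downtime_allowed?(loads, out_of_pool, pending)
  return false if loads.empty? || pending.empty?

  average = loads.sum / loads.size.to_f
  # "No more than 1/5 out of the pool", using integer arithmetic.
  average < 1.0 && out_of_pool * 5 <= loads.size
end

puts downtime_allowed?([0.5, 0.7, 0.9, 0.4, 0.6], 1, ["recipe[reboot]"])
```

Two nodes evaluating this at the same moment could both pass and together exceed the limit, so a real implementation would need some form of locking or a single coordinator.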

On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters cw@opscode.com wrote:

Dan,
Absolutely. One-off run lists are one of the most requested features.
They also fit into some of the preliminary discussions we’ve had about
orchestration models. We plan to get a design together for one-off run
lists in the next few weeks to share with the community for feedback.
If you’re willing to comment on your use case more, here are a few
questions that I have.
For your use case, does the multi-node solution with a shared base run
list work, or do you actually need to have only one node object for the
purpose of searching?
Should run lists be first-class objects instead of just properties on
nodes and roles? Should they be able to contain not only roles and recipes
but also run list-containing entities (nodes and other disembodied run
lists)?
If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.
Thank you for your input.
-chris

On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec dan@nemecfamily.com
wrote:

So, the obligatory next question is:
"Is this anywhere on the roadmap?"
Thanks for the suggestion about multiple nodes. We’ll play with that and
see if it may be a workable, but not ideal, solution.
Dan



#12

The attribute only adds a single recipe to include; you could modify it to
handle an array if you really wanted.
https://github.com/mattray/cookbooks/blob/master/one-shot/recipes/default.rb#L23

If you really want a substantial one-shot run_list, you could modify
the cookbook to remove a role instead, and include the modified
one-shot cookbook in your role to be removed.
https://github.com/mattray/cookbooks/blob/master/one-shot/recipes/default.rb#L28

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray
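
One way the "handle an array" change might look, as a plain-Ruby sketch rather than the cookbook's actual code: normalise the attribute so that a single name and a list of names are treated the same. The method name is hypothetical; in a recipe, each resulting name would then be passed to Chef's `include_recipe`.

```ruby
# Accept nil, a single recipe name, or a list of names, and always
# return a list of non-empty strings.
def one_shot_recipes(value)
  Array(value).map(&:to_s).reject(&:empty?)
end

puts one_shot_recipes("one-shot::one-shot").inspect
puts one_shot_recipes(["one-shot::infrastructure", "one-shot::deploy"]).inspect
puts one_shot_recipes(nil).inspect
```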


#13

Sean,

You bring up a good point. As a new Chef user in a 10-year-old environment,
it’s hard to change the way you think and go from a "discrete unit of work"
mentality to a “state” mentality. Our whole operations staff (myself
included) has always been in the world of “I file a change request for a
discrete piece of work, then perform that one piece of work”, where a piece
of work is something like updating the configurations of one application.
We’ve always thought of it that way and have a hard time with the
possibility that “bringing the system to a new state” might
have unforeseen consequences. The risk in the latter case is that if you
want to make one change, you must make sure that no one else has changed any
other part of the configuration, or you might accidentally make a change you
didn’t intend. In the first case I know only one thing can change; in the
latter I have to do some extra work to make sure only one thing is going to
change.

Instead of calling it a one-shot runlist, it is more a discrete-work
runlist. I know Chef wasn’t designed for it. I’m wondering if it should
support it better or if I should change the way I think about configuration
management.

Am I being too paranoid? Who is using Chef with a team of 10+ admins with
long runlists run for every change? Do you get to the point where your code
and your processes are good enough that you can rely on knowing that only
intended changes are applied only at the time you intend them? How would an
ITIL-type shop use Chef with their change control? Are there any case
studies published?

Thanks,

Dan



#14

Dan,
Have you looked into RunDeck [0]? The description from the project site
says it all (it even uses the word ad-hoc!):

“RunDeck is cross-platform open source software that helps you automate
ad-hoc and routine procedures in data center or cloud environments. RunDeck
allows you to run tasks on any number of nodes from a web-based or
command-line interface. RunDeck also includes other features that make it
easy to scale up your scripting efforts including: access control, workflow
building, scheduling, logging, and integration with external sources for
node and option data.”

RunDeck is built on the ControlTier core and is better suited for
complicated orchestration use cases such as yours. Adam has already written
a lightweight service [1] that exposes Chef server node data in a format the
RunDeck server can consume. Basically, this means you can write RunDeck jobs
that are Chef-aware and are executed across a Chef-run infrastructure. Given
your company’s experience with ControlTier, the transition to RunDeck
should be easy.

One thing to keep in mind is Chef aims to provide you with a set of
primitives to model your infrastructure…it doesn’t try to model the world.
We are working on orchestration right now (see the email to the community
list from Chris Walters), but still believe integrating with products that
are built to do orchestration might be a good solution for many people.

Seth


Opscode, Inc.
Seth Chisamore, Technical Evangelist
T: (404) 348-0505 E: schisamo@opscode.com
Twitter, IRC, Github: schisamo

[0] http://rundeck.org/
[1] https://github.com/opscode/chef-rundeck

On Tue, Feb 1, 2011 at 5:14 PM, Sean OMeara someara@gmail.com wrote:

Chef (and puppet and cfengine and bcfg2 and…) are really meant to
converge an entire system against a (composed) state description, not
to do “one off” operations.

Keeping that in mind, if you’re absolutely sure that the “one-off” run
list won’t step on the toes of the “normal” runlist, you could try:

Adding an additional Client (SSL user) to your platform account, and
calling chef-client -c /path/to/an/alternate/configfile.rb, that
specifies a special separate node name.

For example: foonode-oneoff.yourdomain.here.

Add your runlist to that node object, then call chef-client against
the special node.

YMMV

-s

On Tue, Feb 1, 2011 at 5:03 PM, Dan Nemec dan@nemecfamily.com wrote:

Matt,

That’s a great idea. I looked it over and I think it does solve the problem
I’m thinking of with a one-shot runlist.

I’m relatively new to Chef so I don’t know everything that is possible and
have a few questions.

  1. Your attribute that contains a list of recipes, can it contain roles
     with runlists as well?

I’m still left with the problem that I require a “base” runlist as well. The
way I see your one-shot runlist is that if I want to use it with
"chef-client -j" then I still need to include all of the roles and recipes I
consider “base” in the list of the -j option. Your solution just provides me
the mechanism to attach some recipes (maybe roles) to an existing runlist
where it will be removed after the run. That is exactly what I need for the
second half of my problem.

[I’ll interject here that one of my design goals is that I have
environment-specific configuration in as few places as possible. Commands,
especially, cannot be environment dependent. I just want to run "doit" not
“doit.prod”.]

  2. Given I have a databag with some configuration that is a runlist that
     can contain roles or recipes (here in this databag is my node-specific
     information, and nowhere else). Is it possible to have a recipe that will
     read from the databag, then construct a runlist on the fly and run that
     runlist?

That way I can say something like:

    chef-client -j infrastructure.json

where infrastructure.json looks like

    { "run_list": [ "recipe[node-manager::base-runlist]",
                    "recipe[one-shot::infrastructure]" ] }

Then when that runs, it concatenates the runlist from the node databag
attribute and the runlist from the infrastructure attributes.
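
As a rough illustration of the concatenation being asked about, here is a
minimal Python sketch. It shows only the list arithmetic (base runlist first,
activity runlist second, duplicates dropped), not how Chef itself expands
roles or reads data bags. The function name and the sample activity list are
made up for the example; the three role names are the identity roles
mentioned at the start of the thread.

```python
import json


def merge_run_lists(base, one_shot):
    """Concatenate two run lists, keeping the first occurrence of each item."""
    merged = []
    seen = set()
    for item in list(base) + list(one_shot):
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


# Hypothetical stand-ins: in the scenario described, the base list would come
# from a per-node data bag and the activity list from the one-shot attributes.
base_run_list = ["role[datacenter]", "role[servergroup]", "role[tier]"]
activity_run_list = ["role[tier]", "recipe[one-shot::infrastructure]"]

combined = merge_run_lists(base_run_list, activity_run_list)

# Same {"run_list": [...]} shape as the JSON file passed to chef-client -j.
print(json.dumps({"run_list": combined}, indent=2))
```

Keeping the base entries first means the identity roles are evaluated before
anything activity-specific, which is the ordering the question assumes.
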
Let me know if you think I’m getting too far out in my quest for base
runlists and one-shot runlists. I can’t run a monolithic runlist that does
everything.

Dan

On Tue, Feb 1, 2011 at 2:43 PM, Matt Ray matt@opscode.com wrote:

Thinking about this problem, I’ve written a “one-shot” cookbook that
may be used to solve simple cases of this problem.

https://github.com/mattray/cookbooks/tree/master/one-shot

This cookbook provides a framework for making single-use, one-shot
recipes. By including the “one-shot” recipe in the node’s run_list, on
the next chef-client run the contents of the "one-shot::one-shot"
recipe will be called. This is parametrized as an attribute, so you
can change these out by setting the [“one_shot”][“recipe”] to include
different recipes (and uploading dependencies if necessary). The file
roles/one-shot.rb is included so you can simply change the role
instead of changing the source directly.

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray

On Fri, Jan 28, 2011 at 2:18 PM, Dan Nemec dan@nemecfamily.com
wrote:

Sorry for the really long post.

Here is our use case:

I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the “Deploy Code” step. Actually, we plan to use it to deploy configuration
and all configuration dependencies where Control Tier deploys just the code.
(We don’t have any Chef implemented, so these are currently only plans. We
do have Control Tier running and have been using it for over a year
orchestrating deployments).

In our case the thought process is that Control Tier would dispatch a
"chef-client -j " or some such thing to the node that is being
acted upon. We want that runlist to have only what is important to that
activity. For a code deployment the runlist would deploy application code.
For system updates the runlist would update system things. Any runlist that
runs on the node is going to need some shared set of attributes on the node.
We need a whole lifecycle of keeping the node attributes up to date so that
all the new configuration for the upcoming deployment is loaded prior to the
deployment.

Answering your second question here: before we knew all the details about
Chef, we had the concept of an “attribute runlist” and an "action runlist",
where the attribute runlist would be one runlist used to manage all node
attributes and would not have any recipes that would actually perform work
on the node. Then, we would maintain a collection of activity runlists that
perform sets of system actions relying on the existing attributes on the
node.

Now, we plan on one more variation. I’ll preface it with a disclaimer that
we are an “old-school” shop learning new tricks. We have a 10+ year old code
base and 10 years of process built around the caution that comes from
countless painful deployments. We don’t have the luxury of wiping the slate
clean so we have to make incremental improvements and build on each success.
That being said, we plan to “pre-deploy” most of our changes. So, the day
before the scheduled deployment we plan to lay down all the code and
configuration that is needed for the deployment in a location near the
running code. Then, the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would pre-deploy
configurations and a separate one that would activate the configurations.
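
To make the "flip links" step concrete, here is a minimal Python sketch of
one common way to do it on a POSIX system: create a temporary symlink and
rename it over the live one, so readers see either the old target or the new
one. This is an illustration under assumptions, not something taken from our
plans or from Chef; the directory names (release-1, release-2, current) and
the helper name are invented for the example.

```python
import os
import tempfile


def flip_link(link_path, new_target):
    """Point link_path at new_target by renaming a temporary symlink over it."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(link_dir, ".flip-%d.tmp" % os.getpid())
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # On POSIX the rename is atomic, so the link never dangles mid-flip.
    os.replace(tmp_link, link_path)


# Hypothetical layout: two pre-deployed releases next to a "current" link.
root = tempfile.mkdtemp()
for release in ("release-1", "release-2"):
    os.mkdir(os.path.join(root, release))
current = os.path.join(root, "current")

flip_link(current, os.path.join(root, "release-1"))  # initial activation
flip_link(current, os.path.join(root, "release-2"))  # the "flip links" step
print(os.path.basename(os.readlink(current)))
```

With this shape, the pre-deploy runlist would only create the new release
directory, and the activation runlist would only perform the flip.
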

Let me know if I am unaware of a feature here: Expanding on the notion of an
"attribute runlist", node attributes should be a persistent feature of a node.
If I set an attribute that says my administrator’s email address is
admin@example.com, then I shouldn’t have to have a role in every runlist to
assure that my admin email is always set. “chef-client -j” is destructive in
that it only maintains attributes in the runlist that it ran. This does
create the problem that if you have persistent attributes you need a method
of removing them. It is a challenging problem when specifying your
attributes to make a process to be able to remove ones you no longer need.
Chef will provide the way to delete; the user must figure out what to
delete.

Now, to answer your first question: I do not think that maintaining one
node object per activity set would be practical in the long run.

Dan

On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy charles@dyfis.net
wrote:

This speaks more to orchestration than to one-off run lists, but let me
comment:

The most interesting workflow I’ve been interested in modeling is along
the lines of the following:

"If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged as
having at least one recipe in the pending-downtime list:

  • remove this node from the load balancer’s pool
  • wait for all requests to drain
  • run all recipes in the pending-downtime list, removing each from said
    list after successful completion
  • when pending-downtime list is empty, put this server back into the
    pool"

…where several different recipes have the ability to add their own
entries to the pending-downtime list (which could be anything from a
firewall reconfiguration to an application restart to a full-system reboot).

Of course, the "no more than 1/5 of all app servers are out of the pool"
requirement calls for some care to avoid race conditions.

If y’all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.
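
The workflow above can be sketched in a few lines of Python. This is only an
illustration: the function names, the sample loads, and the recipe names are
invented, the pool and drain steps are left as comments, and the check is not
atomic, so it does nothing about the race condition just mentioned. It also
assumes the out-of-pool count is taken before this node is removed.

```python
def may_start_downtime(loads, out_of_pool, total_servers, pending):
    """Return True when the three conditions from the workflow all hold."""
    average_load = sum(loads) / len(loads)
    return (
        average_load < 1.0
        and out_of_pool * 5 <= total_servers  # no more than 1/5 out of the pool
        and len(pending) > 0
    )


def run_pending(pending, run_recipe):
    """Run each pending recipe in order, removing it only after it succeeds."""
    while pending:
        run_recipe(pending[0])  # an exception here leaves the entry in the list
        pending.pop(0)


# Hypothetical values for illustration only.
pending_downtime = ["recipe[firewall]", "recipe[app::restart]"]
completed = []

if may_start_downtime([0.4, 0.7, 0.9, 0.5, 0.6], 1, 5, pending_downtime):
    # Removing the node from the pool and draining requests would happen here.
    run_pending(pending_downtime, completed.append)
    # Returning the node to the pool would happen here.

print(completed, pending_downtime)
```

Because an entry is removed only after its recipe returns, a failed run
leaves the remaining work in the list for the next attempt.
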
On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters cw@opscode.com
wrote:

Dan,

Absolutely. One-off run lists are one of the most requested features.
They also fit into some of the preliminary discussions we’ve had about
orchestration models. We plan to get a design together for one-off run lists
in the next few weeks to share with the community for feedback.

If you’re willing to comment on your use case more, here are a few
questions that I have.

For your use case, does the multi-node solution with a shared base run
list work, or do you actually need to have only one node object for the
purpose of searching?

Should run lists be first-class objects instead of just properties on
nodes and roles? Should they be able to contain not only roles and recipes
but run list-containing entities (nodes and other dis-embodied run lists),
as well?

If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.

Thank you for your input.
-chris

On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec dan@nemecfamily.com
wrote:

So, the obligatory next question is:
"Is this anywhere on the roadmap?"
Thanks for the suggestion about multiple nodes. We’ll play with that and
see if it may be a workable, but not ideal, solution.
Dan

On Wed, Jan 26, 2011 at 5:46 PM, Chris Walters cw@opscode.com
wrote:

Hi Dan,

There isn’t currently a way that I can think of to run one run list
after another except to package up the main run list into a role and prepend
that role to the one-off run list’s items.

As for one-off run lists, there isn’t currently a built-in solution.
Since a single server can be managed by many chef nodes, one way to do it is
to have different JSON files like you do, but run them as different nodes.
Something like:

infrastructure maintenance runs:
"chef-client -j infra-maint.json -n node-XYZ-infra-maint"
deployment team runs:
"chef-client -j deployment.json -n node-XYZ-deployment"

etc.

Does that help?
-chris


#15

Howdy, Dan –

Before my present venture I was part of Dell’s Software-as-a-Service group.
We were using Puppet rather than Chef (an experience which led to my status
as an avowed Chef user today), but we did (eventually) get to where, in
response to a ticket, we would update the puppet manifest, commit that
update to version control tagged by the change request, and apply it
globally. And yes, our operations group was well over 10 people (though
devops was much smaller), and the corpus of code we were running was huge
(we were the impetus for a number of refinements in topological sort used to
order operations, and one of the first users I’m aware of to restructure
their Puppet deployment to prevent compilation from occurring on the
puppetmaster while still reporting centrally).

That said, we had a lengthy transition period in which Puppet was, by
default, configured to run in no-op mode (only reporting the changes it
would make but not actually impacting servers’ configuration), allowing
the generated report to be inspected for unforeseen consequences. We also
had systems running Puppet in no-op mode in the background and alerting
when a nonzero number of changes would be made if the
desired configuration were applied. Frankly, once we stopped making one-off
changes through means other than Puppet, doing runs at will stopped being
problematic, except for situations such as downed nodes not having been
present for a change (but genuinely needing it), which is exactly the kind
of case where Puppet or Chef shines.
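
The background alerting amounted to "flag any node whose dry run reports a
nonzero number of changes". Here is a minimal Python sketch of that check.
The report structure (node name mapped to a list of resources that would
change) is invented for illustration; it is not Puppet's real report format,
and collecting the dry-run results is left out entirely.

```python
def nodes_with_pending_changes(reports):
    """Return {node: count} for every node whose dry run reported changes."""
    return {node: len(changes) for node, changes in reports.items() if changes}


# Hypothetical dry-run results: node name -> resources that would change.
dry_run_reports = {
    "app01": [],
    "app02": ["file[/etc/hosts]"],
    "db01": ["service[ntpd]", "package[openssl]"],
}

drifted = nodes_with_pending_changes(dry_run_reports)
for node, count in sorted(drifted.items()):
    print("ALERT: %s would receive %d change(s)" % (node, count))
```

A node that shows up here either drifted from the desired configuration or
missed a change that was applied while it was down.
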

I haven’t looked into whether Chef has such a no-op mode, but in our case
Puppet’s certainly did a fair bit to ease the minds of folks uncomfortable
with breaking from tradition.

By the way – I don’t know if we have anything published, but the lead of
our devops team, Deepak Giridharagopal, presented at PuppetCamp on several
occasions; if you can find those sessions recorded, they might be of
interest.



#16

Dan,

I think many of your concerns will go away when environment support
comes in but we had the same problem with Puppet over at TSYS when I
was there.
I think the key focus shift is understanding that the CM tool didn’t
cause the breakage. If someone wants to set a connection pool size to
2000 and the box can’t support it, the problem would have happened
whether it had been done discretely or not. Subverting your CM tool in
one-off scenarios only increases the risk.

I really think the mindset change needs to happen. Why are changes
sitting in the repo uncommitted or unapplied? If the change requires
some sort of downtime, it’s understandable that you wouldn’t want to
run it RIGHT then, but if the change doesn’t require a restart, why not
apply it? You don’t have to run it across all servers of a certain
role. You can tag a given server as a guinea pig and conditionally run
it on that one as your test scenario. But really you had it right when
you said “state”.
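
A minimal Python sketch of the guinea-pig idea: pick the nodes that hold a
role, and optionally narrow that to the ones carrying a trial tag. The
inventory, the tag name, and the function are all hypothetical; in practice
the node list would come from the Chef server rather than a literal dict.

```python
def select_targets(nodes, role, trial_tag=None):
    """Pick nodes with the role; when trial_tag is set, only tagged ones."""
    targets = []
    for name, info in sorted(nodes.items()):
        if role not in info["roles"]:
            continue
        if trial_tag is not None and trial_tag not in info["tags"]:
            continue
        targets.append(name)
    return targets


# Hypothetical inventory for illustration only.
inventory = {
    "web01": {"roles": ["web"], "tags": ["guinea_pig"]},
    "web02": {"roles": ["web"], "tags": []},
    "db01": {"roles": ["db"], "tags": ["guinea_pig"]},
}

trial = select_targets(inventory, "web", trial_tag="guinea_pig")
everyone = select_targets(inventory, "web")
print(trial, everyone)
```

The change goes to the tagged node first; once it looks good, the tag
condition is dropped and the same change converges on the whole role.
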

With the modern CM tools, you can revert an entire state very easily.
So you made a change that didn’t work out well? Update the
databag/node and rerun. Chef will do the right thing.

On Wed, Feb 2, 2011 at 9:08 AM, Dan Nemec dan@nemecfamily.com wrote:

Sean,
You bring up a good point. As a new Chef user in a 10-year-old environment,
it’s hard to change the way you think and go from a "discrete unit of work"
mentality to a “state” mentality. Our whole operations staff (myself
included) has always been in the world of “I file a change request for a
discrete piece of work, then perform that one piece of work”, where a piece
of work is something like updating the configuration of one application.
We’ve always thought of it that way and have a hard time with the
possibility that “bringing the system to a new state” might
have unforeseen consequences. The risk in the latter case is that if you
want to make one change, you must make sure that no one else has changed any
other part of the configuration, or you might accidentally make a change you
didn’t intend. In the first case I know only one thing can change; in the
latter I have to do some extra work to make sure only one thing is going to
change.
Rather than a one-shot runlist, it is more of a discrete-work
runlist. I know Chef wasn’t designed for it. I’m wondering if it should
support it better or if I should change the way I think about configuration
management.
Am I being too paranoid? Who is using Chef with a team of 10+ admins with
long runlists run for every change? Do you get to the point where your code
and your processes are good enough that you can rely on knowing that only
intended changes are applied, and only at the time you intend them? How would an
ITIL-type shop use Chef with their change control? Are there any case
studies published?
Thanks,
Dan

On Tue, Feb 1, 2011 at 5:25 PM, Sean OMeara someara@gmail.com wrote:

To clarify, chef-client in this scenario would be run in the context
of a normal posix user.

The deployment team would log into the machine with their individual
accounts, then invoke chef-client. chef-client’s runlist would do its
thing, writing to directories that the user had group write access to.

If a recipe attempted to modify /etc/hosts or /etc/shadow, it would
fail, throwing a permissions error.

-s

On Tue, Feb 1, 2011 at 5:14 PM, Sean OMeara someara@gmail.com wrote:

Chef (and puppet and cfengine and bcfg2 and…) are really meant to
converge an entire system against a (composed) state description, not
to perform “one off” operations.

Keeping that in mind, if you’re absolutely sure that the “one-off” run
list won’t step on the toes of the “normal” runlist, you could try:

Adding an additional Client (SSL user) to your platform account, and
calling chef-client -c /path/to/an/alternate/configfile.rb, that
specifies a special separate node name.

For example: foonode-oneoff.yourdomain.here.

Add your runlist to that node object, then call chef-client against
the special node.

YMMV

-s
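
As a rough illustration of the alternate-config idea above, the sketch below renders the text of a second client config file that gives the same machine a separate "one-off" node name. The settings `node_name`, `chef_server_url` and `client_key` are standard chef-client config settings; the helper names, the server URL and the key path are assumptions made up for this example, not part of Sean's suggestion.

```ruby
# Derive a separate node name for one-off runs, e.g.
# "foonode.yourdomain.here" becomes "foonode-oneoff.yourdomain.here".
# (Hypothetical helper; the "-oneoff" suffix follows the example in the post.)
def oneoff_node_name(fqdn, suffix = "oneoff")
  host, domain = fqdn.split(".", 2)
  domain ? "#{host}-#{suffix}.#{domain}" : "#{host}-#{suffix}"
end

# Render the body of an alternate config file to pass to `chef-client -c`.
# server_url and key_path are placeholders for your own values.
def oneoff_client_config(fqdn, server_url:, key_path:)
  <<~CONFIG
    node_name       "#{oneoff_node_name(fqdn)}"
    chef_server_url "#{server_url}"
    client_key      "#{key_path}"
  CONFIG
end

puts oneoff_client_config(
  "foonode.yourdomain.here",
  server_url: "https://chef.example.com",   # hypothetical URL
  key_path: "/etc/chef/oneoff-client.pem"   # hypothetical path
)
```

The output would be saved to a file and handed to `chef-client -c`, with the one-off run list attached to the separate node object as described above.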

On Tue, Feb 1, 2011 at 5:03 PM, Dan Nemec dan@nemecfamily.com wrote:

Matt,
That’s a great idea. I looked it over and I think it does solve the problem
I’m thinking of with a one-shot runlist.
I’m relatively new to Chef so I don’t know everything that is possible and
have a few questions.

  1. Your attribute that contains a list of recipes, can it contain roles
    with runlists as well?
    I’m still left with the problem that I require a “base” runlist as well.
    The way I see your one-shot runlist is that if I want to use it with
    “chef-client -j” then I still need to include all of the roles and
    recipes I consider “base” in the list of the -j option. Your solution just
    provides me the mechanism to attach some recipes (maybe roles) to an
    existing runlist where it will be removed after the run. That is exactly
    what I need for the second half of my problem.
    [I’ll interject here that one of my design goals is that I have
    environment-specific configuration in as few places as possible. Commands,
    especially, cannot be environment dependent. I just want to run “doit”,
    not “doit.prod”.]
  2. Suppose I have a databag with some configuration that is a runlist that
    can contain roles or recipes (here in this databag is my node-specific
    information, and nowhere else). Is it possible to have a recipe that will
    read from the databag, then construct a runlist on the fly and run that
    runlist?
    That way I can say something like:
    chef-client -j infrastructure.json
    where infrastructure.json looks like
    { "run_list": [ "recipe[node-manager::base-runlist]",
    "recipe[one-shot::infrastructure]" ] }
    Then when that runs, it concatenates the runlist from the node databag
    attribute and the runlist from the infrastructure attributes.

Let me know if you think I’m getting too far out in my quest for base
runlists and one-shot runlists. I can’t run a monolithic runlist that does
everything.
Dan
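
The concatenation asked about in question 2 can be modelled in a few lines of plain Ruby. This is only a sketch of the list arithmetic, not a Chef recipe and not a claim about how Chef would execute the result; the helper name and the example role and recipe names are hypothetical.

```ruby
# Hypothetical helper: put the node's base run list first, then append the
# activity items, dropping anything the base list already contains.
def combine_run_lists(base, activity)
  (base + activity).uniq   # Array#uniq keeps the first occurrence, so base order wins
end

# Example data only: the base list would come from the node's data bag,
# the activity list from the one-shot attributes.
base           = ["role[datacenter]", "role[servergroup]", "role[tier]"]
infrastructure = ["recipe[hosts]", "role[tier]"]

puts combine_run_lists(base, infrastructure).inspect
```

Keeping the base items first means the identity roles are expanded before any activity recipe that depends on their attributes.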

On Tue, Feb 1, 2011 at 2:43 PM, Matt Ray matt@opscode.com wrote:

Thinking about this problem, I’ve written a “one-shot” cookbook that
may be used to solve simple cases of this problem.

https://github.com/mattray/cookbooks/tree/master/one-shot

This cookbook provides a framework for making single-use, one-shot
recipes. By including the “one-shot” recipe in the node’s run_list, on
the next chef-client run the contents of the "one-shot::one-shot"
recipe will be called. This is parametrized as an attribute, so you
can change these out by setting the [“one_shot”][“recipe”] to include
different recipes (and uploading dependencies if necessary). The file
roles/one-shot.rb is included so you can simply change the role
instead of changing the source directly.

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray

On Fri, Jan 28, 2011 at 2:18 PM, Dan Nemec dan@nemecfamily.com
wrote:

Sorry for the really long post.
Here is our use case:
I agree that one-off runlists are a component of overall orchestration.
Right now we use Control Tier for orchestration. It can handle the workflow
[take server out of load, wait for connections to drain, deploy code to
server, run smoke test, put server back in load]. We want to use Chef for
the “Deploy Code” step. Actually, we plan to use it to deploy configuration
and all configuration dependencies where Control Tier deploys just the code.
(We don’t have any Chef implemented, so these are currently only plans. We
do have Control Tier running and have been using it for over a year
orchestrating deployments.)
In our case the thought process is that Control Tier would dispatch a
"chef-client -j " or some such thing to the node that is being
acted upon. We want that runlist to have only what is important to that
activity. For a code deployment the runlist would deploy application code.
For system updates the runlist would update system things. Any runlist that
runs on the node is going to need some shared set of attributes on the node.
We need a whole lifecycle of keeping the node attributes up to date so that
all the new configuration for the upcoming deployment is loaded prior to the
deployment.
Answering your second question here: before we knew all the details about
Chef, we had the concept of an “attribute runlist” and an “action runlist”
where the attribute runlist would be one runlist used to manage all node
attributes and would not have any recipes that would actually perform work
on the node. Then, we would maintain a collection of activity runlists that
perform sets of system actions relying on the existing attributes on the
node.
Now, we plan on one more variation. I’ll preface it with a disclaimer that
we are an “old-school” shop learning new tricks. We have a 10+ year old code
base and 10 years of process built around the caution that comes from
countless painful deployments. We don’t have the luxury of wiping the slate
clean so we have to make incremental improvements and build on each success.
That being said, we plan to “pre-deploy” most of our changes. So, the day
before the scheduled deployment we plan to lay down all the code and
configuration that is needed for the deployment in a location near the
running code. Then, the deployment becomes more of [Stop, flip links, update
database, Start]. In this case we would have a runlist that would pre-deploy
configurations and a separate one that would activate the configurations.
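
One common way to implement a “flip links” step like the one described above is to create the new symlink under a temporary name and rename it over the old one. The sketch below shows that technique in plain Ruby on a POSIX system; it is an assumption about how such a step could work, not a description of Dan's actual tooling, and the `releases`/`current` layout is invented for the example.

```ruby
require "fileutils"
require "tmpdir"

# Point link_path at new_target by creating a temporary symlink and renaming
# it over the existing one, so the link is never missing during the flip.
def flip_link(link_path, new_target)
  tmp = "#{link_path}.tmp-#{Process.pid}"
  File.symlink(new_target, tmp)
  File.rename(tmp, link_path)   # replaces the old symlink itself, not its target
end

# Demo in a throwaway directory (hypothetical layout).
Dir.mktmpdir do |root|
  old_release = File.join(root, "releases", "old")
  new_release = File.join(root, "releases", "new")
  FileUtils.mkdir_p([old_release, new_release])
  current = File.join(root, "current")

  flip_link(current, old_release)   # state before the deployment
  flip_link(current, new_release)   # activation step: switch to pre-deployed code
  puts File.readlink(current)
end
```

With this split, the pre-deploy runlist would only populate the new release directory, and the activation runlist would only perform the flip.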
Let me know if I am unaware of a feature here: Expanding on the notion of an
“attribute runlist”, node attributes should be a persistent feature of a node.
If I set an attribute that says my administrator’s email address is
admin@example.com, then I shouldn’t have to have a role in every runlist to
assure that my admin email is always set. “chef-client -j” is destructive in
that it only maintains attributes in the runlist that it ran. This does
create the problem that if you have persistent attributes you need a method
of removing them. It is a challenging problem when specifying your
attributes to make a process to be able to remove ones you no longer need.
Chef will provide the way to delete; the user must figure out what to delete.
Now, to answer your first question: I do not think that maintaining one
node object per activity set would be practical in the long run.
Dan

On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy charles@dyfis.net
wrote:

This speaks more to orchestration than to one-off run lists, but let me
comment –
The most interesting workflow I’ve been interested in modeling is along
the lines of the following:
“If average load across all application servers is less than 1.0, no more
than 1/5 of all app servers are out of the pool, and this node is flagged as
having at least one recipe in the pending-downtime list:

  • remove this node from the load balancer’s pool
  • wait for all requests to drain
  • run all recipes in the pending-downtime list, removing each from said
    list after successful completion
  • when pending-downtime list is empty, put this server back into the
    pool”

…where several different recipes have the ability to add their own
entries to the pending-downtime list (which could be anything from a
firewall reconfiguration to an application restart to a full-system reboot).
Of course, the “no more than 1/5 of all app servers are out of the pool”
requirement calls for some care to avoid race conditions.
If y’all are working on an orchestration solution, I would be very
interested to hear how it addresses this kind of use case.
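
The precondition and the “remove each recipe after successful completion” rule from the workflow above can be written down as a small plain-Ruby model. All method and parameter names are hypothetical. The sketch does not address the race condition mentioned above: two nodes could pass the check at the same moment unless the pool count is claimed atomically somewhere.

```ruby
# Precondition from the workflow: average load below 1.0, no more than 1/5 of
# the app servers already out of the pool, and at least one pending recipe.
# This is the literal reading; a stricter variant would count this node too.
def may_leave_pool?(loads:, total_servers:, out_of_pool:, pending_downtime:)
  return false if pending_downtime.empty? || loads.empty?
  average_load = loads.sum / loads.size.to_f
  average_load < 1.0 && out_of_pool * 5 <= total_servers
end

# Run each pending entry through the caller's block. An entry is removed only
# after the block returns, so a raised error leaves it (and the rest) queued.
def run_pending!(pending_downtime)
  until pending_downtime.empty?
    yield pending_downtime.first
    pending_downtime.shift
  end
end

pending = ["firewall-reconfig", "app-restart"]   # example entries
if may_leave_pool?(loads: [0.4, 0.7, 0.9], total_servers: 10,
                   out_of_pool: 1, pending_downtime: pending)
  run_pending!(pending) { |entry| puts "running #{entry}" }
end
```

The node would go back into the pool only once `pending` is empty, matching the last step of the list.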

On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters cw@opscode.com wrote:

Dan,
Absolutely. One-off run lists are one of the most requested features.
They also fit into some of the preliminary discussions we’ve had about
orchestration models. We plan to get a design together for one-off run lists
in the next few weeks to share with the community for feedback.
If you’re willing to comment on your use case more, here are a few
questions that I have.
For your use case, does the multi-node solution with a shared base run
list work, or do you actually need to have only one node object for the
purpose of searching?
Should run lists be first-class objects instead of just properties on
nodes and roles? Should they be able to contain not only roles and recipes
but run list-containing entities (nodes and other disembodied run lists),
as well?
If anyone else has opinions on any aspect of one-off run lists, please
respond, as well.
Thank you for your input.
-chris

On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec dan@nemecfamily.com
wrote:

So, the obligatory next question is:
“Is this anywhere on the roadmap?”
Thanks for the suggestion about multiple nodes. We’ll play with that and
see if it may be a workable, but not ideal, solution.
Dan

On Wed, Jan 26, 2011 at 5:46 PM, Chris Walters cw@opscode.com
wrote:

Hi Dan,
There isn’t currently a way that I can think of to run one run list
after another except to package up the main run list into a role and prepend
that role to the one-off run list’s items.
As for one-off run lists, there isn’t currently a built-in solution.
Since a single server can be managed by many chef nodes, one way to do it is
to have different JSON files like you do, but run them as different nodes.
Something like:

infrastructure maintenance runs:
"chef-client -j infra-maint.json -n node-XYZ-infra-maint"
deployment team runs:
"chef-client -j deployment.json -n node-XYZ-deployment"

etc.

Does that help?
-chris
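
A minimal sketch of the “package the main run list into a role and prepend it” workaround: the script below builds the JSON document that would be passed to `chef-client -j`. The `run_list` key is the one used in the JSON files in this thread; the role name `role[base]`, the helper name and the example recipe are assumptions for illustration.

```ruby
require "json"

# Build the JSON body for a one-off run: the base role first, then the
# activity-specific items. "role[base]" is a hypothetical role that would
# wrap the node's main run list.
def one_off_json(activity_items, base_role: "role[base]")
  JSON.pretty_generate("run_list" => [base_role] + activity_items)
end

# Example: an infrastructure-maintenance run with one hypothetical recipe.
puts one_off_json(["recipe[hosts]"])
```

Because the JSON names the base role rather than copying its contents, changes to the main run list only need to be made in the role.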
On Wed, Jan 26, 2011 at 1:55 PM, dan@nemecfamily.com wrote:

We have run into an interesting problem. We want to segregate runlists by
activity (e.g infrastructure maintenance, deployment, one-off, etc…). But we
want all the runlists to share some common role information about a node. We
have a node that has some roles (datacenter, servergroup, tier) that are
important identifiers and drive selection of certain attributes. We want
different groups to be able to do maintenance on their parts at different times
without impacting others. So if a sysadmin wants to update /etc/hosts he
shouldn’t have to worry if the application team has put in a new attribute
for a deployment later. The sysadmin can run a runlist that only affects the
parts of the system he is responsible for without worrying that an application
deployment recipe will run. Conversely in a software deployment the deployment
team should be able to update the applications without updating the operating
system (given the os changes are not part of the software deployment).

I thought “chef-client -j” would do this, but it didn’t. This is what I
did: I created a node and bootstrapped it with a runlist of its identity roles.
I then made a json file with a runlist for a set of activity and ran the
runlist via “chef-client -j ”. The problem is that the runlist
for the node that existed before chef-client gets wiped out and only the
runlist in the json file gets run thus wiping out its “identity” and
breaking the one-off runlist because certain attributes no longer exist.

I’d like to be able to append a runlist on the fly to an existing runlist on
the node where the new runlist exists on the node only for the duration of the
chef-client run. The node has a “base” runlist that should always be run,
but I want to run some other recipes and roles one at a time while keeping the
“base” runlist. I do not want to have to copy the base runlist into the
json file of the one-shot runlist that I am running as I’m trying to keep the
“activity” runlists environment independent.

Is there a way to run a one-off runlist on a node that is effectively appended
to the runlist that is already on the node and is removed after the run?

Thanks,

Dan