I’m currently thinking about abusing Chef’s start handlers to get more
stability in a compute cluster.
We are running a compute cluster for scientific analysis with a resource
management system where you have a master node responsible for scheduling
analysis jobs to execution nodes. For each of the execution nodes I would
like to do the following:
- When they start a chef run, they send a “Don’t send any new workloads to
me” message to the master node.
- When the chef run was successful, they send an “Ok. I’m up for some more
jobs now.” message to the master node.
My goal is to have nodes on which a chef run fails disabled in the
resource management system, because a failed chef run usually means that
the node is in a bad/undefined state, so chances are that new jobs
scheduled there will fail as well.
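The two messages could be sketched as a start handler plus a report handler along these lines. Everything about the transport is an assumption here: the `drain_node`/`resume_node` calls and the in-memory master are placeholders for however the resource manager is actually told, and in a real cookbook the classes would inherit from `Chef::Handler` (stubbed below so the sketch runs standalone):

```ruby
# Stand-in for Chef::Handler so the sketch runs outside a chef-client run.
class HandlerBase
  attr_reader :run_status
  def run_report_safely(run_status)
    @run_status = run_status
    report
  end
end

# Fired at the start of the run: "Don't send any new workloads to me."
class DrainHandler < HandlerBase
  def initialize(master)
    @master = master
  end

  def report
    @master.drain_node
  end
end

# Fired at the end of the run: re-enable the node only on success, so a
# failed run leaves it drained in the resource management system.
class ResumeHandler < HandlerBase
  def initialize(master)
    @master = master
  end

  def report
    @master.resume_node if run_status.success?
  end
end

# Tiny fakes standing in for the master node and Chef's run status object.
Master = Struct.new(:messages) do
  def drain_node;  messages << :drain;  end
  def resume_node; messages << :resume; end
end
RunStatus = Struct.new(:ok) do
  def success?; ok; end
end

master = Master.new([])
DrainHandler.new(master).run_report_safely(RunStatus.new(nil))
ResumeHandler.new(master).run_report_safely(RunStatus.new(true))
p master.messages   # => [:drain, :resume]
```

On a failed run the report handler simply never re-enables the node, which is the behaviour I want.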
I could obviously have the node disabled with an execute resource in the
very first recipe, but what if one of the ohai plugins gets stuck and chef
never makes it to that first recipe?
Therefore I would like to disable the execution node at the very beginning
of the chef run. Is this possible with the start handlers?
As I understand it, the start handlers are run after ohai. Wouldn’t it
be reasonable to run them before ohai starts, thus making the start
handlers the very first thing to run on a node?
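For concreteness, this is roughly how I imagine wiring such handlers up in client.rb (the file path, class names, and master hostname are made up for the sketch):

```ruby
# /etc/chef/client.rb (sketch; the handler classes would live in the
# required file and inherit from Chef::Handler)
require "/etc/chef/handlers/cluster_handlers"

start_handlers  << DrainHandler.new("master.example.org")
report_handlers << ResumeHandler.new("master.example.org")
# A failed run triggers exception handlers instead of report handlers,
# so the node simply stays drained in the resource management system.
```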
You have highlighted multiple things here. If you plan to orchestrate
across nodes, where the status of one chef run should decide whether
another chef run is triggered or not, you can try MCollective.
Also, for a custom solution: node.ohai_time stores the time at which the
last successful chef run occurred. If your chef runs are scheduled
periodically (or with a fixed splay time), you can exploit this attribute
in a Chef start handler to decide whether a node has converged
successfully or not. (You can also use the RabbitMQ instance that ships
with the Chef server to model pub/sub-like workflows.)
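The ohai_time idea can be sketched like this (the interval and splay numbers are illustrative, not Chef defaults):

```ruby
# node['ohai_time'] is updated when a successful run saves the node back
# to the server, so if much more than interval + splay has passed since
# then, the last scheduled run most likely failed to converge.
def converged_recently?(last_save, now, interval, splay)
  (now - last_save) <= (interval + splay)
end

interval = 30 * 60   # chef-client runs every 30 minutes
splay    = 5 * 60    # plus up to 5 minutes of splay
now      = Time.now

puts converged_recently?(now - 20 * 60, now, interval, splay)  # => true
puts converged_recently?(now - 70 * 60, now, interval, splay)  # => false
```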