Managing by Cluster Instead of Machine

Hi there!

I have been using Chef for almost a year now to manage our infrastructure, and I have been loving it. However, I was wondering whether there is a way to perform an action once per cluster, as opposed to once per machine. My particular example is managing Elasticsearch: we have a number of master, slave and client nodes. Occasionally, when we update a template file, an API call needs to be made to the cluster in order to use it. However, that call only needs to be made once per CLUSTER, not once per machine. Building the resource into the Elasticsearch cookbook would therefore be redundant, because it would run once per NODE every time chef-client runs. Is there a more logical home for actions that are cluster-based instead of node-based?

Thanks!!

This really depends on your requirements.

The easiest options are:

  • make one node “special” (for example, by assigning it a normal attribute) so that it is the designated node for running these kinds of tasks. The downside is that you have to ensure there is always exactly one such node.
  • code the task so that it only runs once, even though every node will try. You can mostly get there with a not_if or only_if guard, but there is a window of time between when the guard command runs and when the resource actually executes. Also, if the change itself takes some time to complete, other nodes need to be able to detect that it is in progress and skip it. How safe this is for your use case depends on how the Elasticsearch APIs work.
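For the first option, a common variation (an assumption here, not something the answer prescribes) is to have every node compute the same choice instead of flagging one node by hand, for example by picking the smallest name among the cluster members. The plain-Ruby sketch below uses hypothetical helper names `designated_node` and `run_cluster_task?`. In a recipe, the member list might come from a Chef search such as `search(:node, 'role:elasticsearch').map(&:name)`, where the role name is also hypothetical.

```ruby
# Hypothetical helpers: every node computes the same "designated" node from
# the same list of cluster members, so no hand-set attribute is needed.

# Pick the lexicographically smallest member name (nil for an empty list).
def designated_node(cluster_members)
  cluster_members.min
end

# True only on the node that should run cluster-wide tasks.
def run_cluster_task?(node_name, cluster_members)
  designated_node(cluster_members) == node_name
end
```

A recipe could then wrap the cluster-wide resource in `if run_cluster_task?(node.name, members)`. This avoids maintaining the attribute by hand, but it only yields exactly one runner while every node sees the same member list. A stale search index can cause a duplicated run, and if the designated node is down, the task is simply skipped.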
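The second option (every node tries, guarded so the work happens once) can be sketched in plain Ruby against a hypothetical in-memory stand-in for the cluster. `FakeCluster`, `maybe_apply_template` and the integer template version are assumptions for illustration only. Real code would make HTTP calls to Elasticsearch, and in a recipe the checks would live in the resource's `only_if`/`not_if` guard. The comment marks the window between the check and the action.

```ruby
# Hypothetical in-memory stand-in for the cluster; real code would talk to
# the Elasticsearch HTTP API instead.
class FakeCluster
  attr_reader :template_version, :apply_count

  def initialize(template_version)
    @template_version = template_version
    @in_progress = false
    @apply_count = 0
  end

  def in_progress?
    @in_progress
  end

  def start_update!
    @in_progress = true
  end

  def finish_update!(version)
    @template_version = version
    @in_progress = false
    @apply_count += 1
  end
end

# What each node would run: skip if the cluster is already up to date or
# another node is mid-update (the equivalent of a not_if guard).
def maybe_apply_template(cluster, desired_version)
  return :up_to_date if cluster.template_version == desired_version
  return :in_progress if cluster.in_progress?

  # RACE WINDOW: another node can pass the same checks before this line runs,
  # so this is only safe if applying the change twice is harmless.
  cluster.start_update!
  cluster.finish_update!(desired_version)
  :applied
end
```

The "in progress" marker addresses the slow-change case, but it does not close the race window. That is why this approach depends on the underlying API call being safe to repeat.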

The fully general solution is to add some kind of distributed coordination or leader-election system (such as etcd) into the mix, and use that system's Ruby client to integrate it with Chef. But then you also have to manage whatever system you end up using.
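Leader election of this kind usually comes down to a lease. A node holds a time-limited lock and renews it on each chef-client run, and only the current holder performs cluster-wide tasks. If the holder dies, the lease expires and another node takes over. The sketch below models that logic with a hypothetical in-memory `LeaseStore` and `run_if_leader`, passing the current time in explicitly so the behavior is easy to test. It is not the etcd client API. etcd does offer leases and atomic transactions that can fill this role, but a store living inside one Ruby process cannot coordinate separate machines.

```ruby
# Hypothetical in-memory lease store standing in for a shared coordination
# service (e.g. etcd); it only models the decision logic.
class LeaseStore
  attr_reader :holder

  def initialize
    @mutex = Mutex.new
    @holder = nil
    @expires_at = nil
  end

  # Atomically grant or renew the lease for +owner+ until now + ttl.
  # Succeeds if the lease is free, expired, or already held by +owner+.
  def acquire(owner, ttl, now)
    @mutex.synchronize do
      free = @holder.nil? || now >= @expires_at
      return false unless free || @holder == owner

      @holder = owner
      @expires_at = now + ttl
      true
    end
  end
end

# Each chef-client run would call this; only the lease holder runs the block.
def run_if_leader(store, node_name, ttl, now)
  return false unless store.acquire(node_name, ttl, now)

  yield
  true
end
```

The lease only decides who runs the task. Pairing it with an idempotent guard, as in the second option above, keeps a leadership change in the middle of an update from blindly reapplying the change.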