Monitoring chef runs


#21

On Friday, September 7, 2012 at 3:22 PM, KC Braunschweig wrote:

On Fri, Sep 7, 2012 at 3:11 PM, Daniel DeLeo <dan@kallistec.com> wrote:

There’s a patch where delayed notifications are always run in master. This
is a pretty significant behavior change so we’re waiting until Chef 11 to
ship it.

Interesting. I suspect that would be good much of the time, and more
obvious, but sorta goes against the normal chef behavior that if
something bad happens we bail out immediately. Thanks,

KC

It was an easy patch, but not an easy decision.

The basic argument in favor is that delayed notifications are generally used to make configuration changes for resources where Chef cannot verify the state. For example, Chef usually cannot tell whether a service resource is running the correct version of the application and config, so it has no way to enforce a policy (service “foo” should be running with the app version and config on disk) in an idempotent way. For simple, single-process services, workarounds are possible (e.g., using something like ps -o etime and a disk-based notification queue), but there is no general solution that works for all cases.

The argument against is that some resource may be partially or incorrectly configured, and running the delayed notification could therefore cause an outage.

After much discussion, we decided that two things outweighed the concerns about (possibly, temporarily) leaving a resource in an incorrect state: the idea that Chef made a promise (of sorts) to run an action on a resource, and the benefit of ending up with a system that is correctly configured according to your policy once you re-run Chef. Furthermore, running an incorrect version of a service, or an incorrectly configured one, can also lead to severe problems, so most of the “win” ends up on the side of always running the notifications.
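
To make the trade-off concrete, here is a toy Ruby model of the two behaviors. It is not Chef's actual implementation: the ToyRun class, the always_run_delayed flag, and the resource names are made up for illustration. It only shows what happens to a queued delayed notification when a later resource fails.

```ruby
# Toy model of a Chef-style run. NOT Chef's real code; every name here is
# hypothetical and exists only to illustrate the two policies discussed above.
class ToyRun
  attr_reader :log

  def initialize(always_run_delayed:)
    @always_run_delayed = always_run_delayed
    @delayed = [] # queued delayed notifications, e.g. "restart service[foo]"
    @log = []
  end

  # Each step is [resource_name, notification_or_nil, fails?].
  def converge(steps)
    steps.each do |name, notification, fails|
      raise "#{name} failed" if fails

      @log << "converged #{name}"
      # Queue each delayed notification at most once.
      @delayed << notification if notification && !@delayed.include?(notification)
    end
    run_delayed
    :ok
  rescue RuntimeError => e
    @log << "error: #{e.message}"
    # Behavior at the time of this thread: bail out and drop the queue.
    # Patched behavior: still honor the queued notifications.
    run_delayed if @always_run_delayed
    :failed
  end

  private

  def run_delayed
    @delayed.each { |n| @log << "ran #{n}" }
    @delayed.clear
  end
end

steps = [
  ["template[/etc/foo.conf]", "restart service[foo]", false],
  ["package[bar]", nil, true] # fails mid-run
]

old_run = ToyRun.new(always_run_delayed: false)
old_run.converge(steps)

new_run = ToyRun.new(always_run_delayed: true)
new_run.converge(steps)

puts old_run.log
puts new_run.log
```

In the first run the config file has changed on disk but the restart never happens, which is the "Chef cannot verify the state" problem. In the second run the restart still fires after the failure, which is the case the argument against worries about.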


Daniel DeLeo


#22

On Fri, Sep 7, 2012 at 6:28 PM, Daniel DeLeo <dan@kallistec.com> wrote:

After much discussion, we decided that the idea that chef made a promise (of
sorts) to run an action on a resource and the benefit of ending up with a
correctly configured system according to your policy by re-running chef
outweighed the concerns about (possibly, temporarily) leaving a resource in
an incorrect state. Furthermore, running an incorrect version of, or
incorrectly configured service can also lead to severe problems, so most of
the “win” ends up on the side of always running the notifications.

FWIW I agree. Though it seems like all could be accommodated if the new
behavior became default and you could pass another setting to the
notification (rather than :delayed or :immediately) to trigger the
current behavior instead.
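
A rough sketch of what that might look like, in plain Ruby rather than real Chef code. The :delayed_unless_failed timing, the Notification struct, and notifications_to_run are all hypothetical names invented here; only :delayed and :immediately come from Chef.

```ruby
# Hypothetical sketch: a third notification timing that opts back in to the
# current "skip delayed notifications if the run failed" behavior.
# Only :delayed and :immediately are real; :delayed_unless_failed is made up.
Notification = Struct.new(:action, :timing)

# Decide which queued notifications fire at the end of a run.
def notifications_to_run(queue, run_failed:)
  queue.select do |n|
    case n.timing
    when :delayed then true                       # proposed default: always run
    when :delayed_unless_failed then !run_failed  # opt-in to the current behavior
    else false # :immediately fires when notified, so it is never queued here
    end
  end
end

queue = [
  Notification.new("restart service[foo]", :delayed),
  Notification.new("restart service[db]", :delayed_unless_failed)
]

after_failure = notifications_to_run(queue, run_failed: true).map(&:action)
after_success = notifications_to_run(queue, run_failed: false).map(&:action)

puts after_failure.inspect
puts after_success.inspect
```

With this shape, a cookbook author who considers a restart too risky after a partial run could mark just that one notification, and everything else would get the new default.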

KC