I monitor the chef-client log via Nagios NRPE (an active check) for two
purposes: (1) to check the log’s last-modified time, which tells me whether
the service is running correctly, and (2) to report multi-line log errors.
The problem with monitoring chef-client (or any process) only via a
check_procs command (or comparable) is that the process could be running but
wedged. Ensuring that the process is continually writing to its log, while
not perfect, goes a bit further toward making sure it is not wedged.
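
To illustrate the last-modified check described above, here is a minimal
Python sketch of a Nagios-style plugin. It is not the check actually used
here; the function names and the thresholds are assumptions, and the default
path is simply the client.log location mentioned later in this thread. It
relies only on the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN).

```python
import os
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log_age(path, warn_secs, crit_secs, now=None):
    """Return (exit_code, message) based on how long ago `path` was written."""
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except OSError as exc:
        # A missing or unreadable log is reported as UNKNOWN, not as healthy.
        return UNKNOWN, "UNKNOWN: cannot stat %s: %s" % (path, exc)
    if age >= crit_secs:
        return CRITICAL, "CRITICAL: %s last written %d s ago" % (path, age)
    if age >= warn_secs:
        return WARNING, "WARNING: %s last written %d s ago" % (path, age)
    return OK, "OK: %s last written %d s ago" % (path, age)


def main(argv):
    """Hypothetical entry point: print the status line, return the exit code."""
    # Default path and thresholds are placeholders; tune them to your
    # chef-client run interval.
    path = argv[1] if len(argv) > 1 else "/var/log/chef/client.log"
    code, message = check_log_age(path, warn_secs=2700, crit_secs=5400)
    print(message)
    return code
```

To run it under NRPE, the script would end with `sys.exit(main(sys.argv))`
so that the exit code reaches Nagios. A stale log only suggests a wedged
client; a healthy client that logs nothing between runs would also trip it
if the thresholds are too tight.
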
Getting NRPE to report multiple lines takes a bit of work, as that’s not
standard, but it is possible (whereas, AFAIK, it’s not possible at all via
NSCA passive checks). I can dig up my notes on how I did this if people are
On Thu, Jun 30, 2011 at 1:44 AM, Ranjib Dey email@example.com:
We use Nagios as our alerting/monitoring solution, and I have written an
NSCA-based Chef report handler that submits the chef-client run status via
the send_nsca command to an NSCA server (which in turn submits a passive
check on the nagios.cmd pipe). It assumes that nsca-client is installed on
the Chef client nodes and that the NSCA daemon is running (and configured)
on the Nagios box.
You can get the script here: https://github.com/ranjibd/nsca_handler
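
For readers unfamiliar with send_nsca, the following Python sketch shows
roughly what submitting a passive service check involves. It is not the code
from the handler linked above; the function names, the example host and
service names, and the config path are all assumptions. send_nsca reads one
tab-delimited line per check on standard input: host name, service
description, return code, plugin output.

```python
import subprocess


def format_nsca_line(host, service, return_code, output):
    """Build one tab-delimited passive service check line for send_nsca."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1, 2 or 3 (UNKNOWN)")
    # Collapse tabs and newlines: a tab would break the field delimiter,
    # and each check result has to fit on a single line.
    summary = " ".join(str(output).split())
    return "%s\t%s\t%d\t%s\n" % (host, service, return_code, summary)


def submit(line, nsca_host, config="/etc/send_nsca.cfg"):
    """Pipe the line to send_nsca; the config path is an assumed default."""
    result = subprocess.run(
        ["send_nsca", "-H", nsca_host, "-c", config],
        input=line,
        text=True,
    )
    return result.returncode


# Hypothetical usage after a failed chef-client run:
# submit(format_nsca_line("web01", "chef-client", 2, "run failed"), "nagios.example.com")
```

The service description has to match a service defined for that host on the
Nagios side with passive checks enabled, otherwise the result is dropped.
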
You can also use NRPE-based active checks. I guess that will introduce
some delay, but it also helps you detect the case where the client crashes
(which I’m experiencing with Ruby 1.9.2).
On Thu, Jun 30, 2011 at 4:54 AM, Robert Keng firstname.lastname@example.org:
Just wondering if anyone knows of a way to monitor chef-client
to make sure it isn’t stuck in retries because of an error in some recipe?
Does the client actually report back to the chef-server when it’s able to
apply all recipes and things are happy?
I apologize if this is a really simple question; I’m fairly new to
Chef. I’d just rather not have to parse /var/log/chef/client.log for this,