Best practice for measuring and monitoring chef-client runs?

Augie_Schwer · September 17, 2014, 12:33am

What are people using to monitor and measure their chef-client runs?

I would like to monitor for when chef-client runs fail on a node.

It would be nice to measure chef-client run times.

Is it safe to assume people are using handlers for both of these? What are
some popular ways to accomplish these goals? Thanks!

–
Augie Schwer - Augie@Schwer.us - http://schwer.us

jeffbyrnes · September 17, 2014, 12:38am

We use an email handler to report runs; primarily filtered for failed runs. Crude, but it works.
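
For anyone wanting to try the same thing, here is a rough sketch of the "only report failed runs" filter, not the handler described above. `FakeRunStatus`, `failure_email`, and the `ops@example.com` address are made up for illustration. In a real handler this logic would sit in the `report` method of a `Chef::Handler` subclass and read from `run_status` (with the node name coming from `run_status.node.name`), and actual mail delivery is left out.

```ruby
# Hypothetical stand-in for Chef's run status object so the sketch runs
# without Chef installed. Only the fields used below are modelled.
FakeRunStatus = Struct.new(:node_name, :failed, :elapsed_time, :formatted_exception) do
  def failed?
    failed
  end
end

# Builds the message for a failed run. Returns nil for successful runs,
# so only failures would ever generate mail.
def failure_email(run_status, to: "ops@example.com")
  return nil unless run_status.failed?

  {
    to: to,
    subject: "chef-client run FAILED on #{run_status.node_name}",
    body: "Run failed after #{run_status.elapsed_time.round(1)}s\n" \
          "#{run_status.formatted_exception}"
  }
end
```

Returning `nil` for successful runs keeps the filtering decision in one place, so the delivery code only has to check whether there is anything to send.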

Mike · September 17, 2014, 2:10am

https://supermarket.getchef.com/tools/chef-handler-datadog

adamhjk · September 17, 2014, 2:12am

In addition, you can use the management console and analytics to get this
data. Free under 25 nodes.

Morgan_Blackthorne · September 17, 2014, 2:48am

We use the Airbrake handler to send errors to Hoptoad (which aggregates and
emails them). Haven't had time to dig into run-time analysis, and I'm not sure
it matters to us at this point: runs that are complicated by our standards
usually finish in about 1-2 minutes, which is plenty fine by me.

--
~~ StormeRider ~~

ppalanisamy · September 17, 2014, 2:50am

We use both an email handler and the Datadog handler. We were hit by a memory leak (with chef-client in daemon mode) that left the handler without enough memory, so it failed as well. We ended up fixing the memory leak and changing chef-client to run as a task instead of a service.

Thanks,
Prakash

https://supermarket.getchef.com/tools/chef-handler-datadog

Steffen_Gebert1 · September 17, 2014, 6:03am

Jumping into the "we do.." postings: We send chef-client statistics to
Zabbix using a report handler:

success
elapsed_time
start_time
end_time
all_resources_num
updated_resources_num

I think that should be pretty easy to adapt to whatever monitoring
system you use.
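
As a rough illustration of how the values listed above could be collected, here is a minimal sketch. It is not the code from the linked cookbook. `RunStatusStub` and `run_metrics` are hypothetical names. The accessors (`success?`, `elapsed_time`, `start_time`, `end_time`, `all_resources`, `updated_resources`) mirror what Chef's run status object offers to a report handler. The key names loosely follow the list, with the last one spelled `updated_resources_num` as an assumption.

```ruby
# Hypothetical stand-in for Chef's run status object so the sketch runs
# without Chef installed.
RunStatusStub = Struct.new(:success, :start_time, :end_time,
                           :all_resources, :updated_resources) do
  def success?
    success
  end

  def elapsed_time
    end_time - start_time
  end
end

# Flattens a run status into simple key/value pairs that could be pushed
# to a monitoring system by whatever sender it provides.
def run_metrics(run_status)
  {
    "success" => run_status.success? ? 1 : 0,
    "elapsed_time" => run_status.elapsed_time,
    "start_time" => run_status.start_time.to_i,
    "end_time" => run_status.end_time.to_i,
    # Array() guards against nil when a run fails before resources are compiled.
    "all_resources_num" => Array(run_status.all_resources).length,
    "updated_resources_num" => Array(run_status.updated_resources).length
  }
end
```

Keeping the metric extraction separate from the sending step is what makes this easy to point at a different monitoring system: only the code that ships the hash needs to change.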

Yours
Steffen

Links:

https://github.com/TYPO3-cookbooks/zabbix-custom-checks/blob/master/recipes/chef-client.rb

https://github.com/TYPO3-cookbooks/zabbix-custom-checks/blob/master/templates/default/chef-client/chef-client-handler.rb

Mark_Mzyk_OLD · September 17, 2014, 4:37pm

The report handler that supplies the data from the client run to the
Chef server reporting add-on is open source, so you could use it or
build on it if you don't want to use the pre-built Chef add-ons.

It's here in the client:

https://github.com/chef/chef/blob/main/lib/chef/resource_reporter.rb

Mark Mzyk

DV1 · September 17, 2014, 6:35pm

We have a custom Rails app that acts as a handler for chef-client. Here's
what the dashboard looks like: http://i.imgur.com/sR4UCWC.png

We also have an automated task that runs "knife status" and reports on any
hosts that haven't checked in for a while.
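
The "haven't checked in for a while" part can be sketched roughly as follows. This is not the task described above. It assumes you already have each node's last check-in time (the `ohai_time` attribute, in epoch seconds) in a hash. `stale_nodes` and the sample data are hypothetical, and fetching the data from the Chef server is left out.

```ruby
# Returns the names of nodes whose last check-in is older than
# max_age_seconds, sorted alphabetically.
#
# last_checkins: hash of node name => ohai_time (epoch seconds)
def stale_nodes(last_checkins, max_age_seconds:, now: Time.now)
  cutoff = now.to_f - max_age_seconds
  last_checkins.select { |_name, ohai_time| ohai_time < cutoff }.keys.sort
end

# Example with made-up data: anything older than one hour is reported.
checkins = { "web1" => 9_000.0, "db1" => 1_000.0, "app1" => 2_000.0 }
overdue = stale_nodes(checkins, max_age_seconds: 3600, now: Time.at(10_000))
puts "Nodes overdue: #{overdue.join(', ')}" unless overdue.empty?
```

Passing `now` in explicitly makes the check easy to test against fixed timestamps.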

--
Best regards, Dmitriy V.

Tiago_Cruz · September 17, 2014, 7:04pm

I have a simple Nagios check like this:

...
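HOURS=2
# Shell arithmetic avoids the unescaped * in expr; MAX_AGE replaces
# SECONDS, which is a special variable in bash.
MAX_AGE=$((HOURS * 60 * 60))
OHAI_TIME=$(( $(date +%s) - MAX_AGE ))

# Nodes in production whose last check-in is older than the cutoff
SEARCH="$(knife search node "ohai_time:[* TO $OHAI_TIME] AND chef_environment:production")"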
...
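
The elided parts of the check are not shown, so the following is only a sketch of how the pieces might fit together, written in Ruby rather than shell. `stale_node_query` and `nagios_result` are hypothetical helpers. The query string has the same shape as the one in the snippet, and the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL). Running the search itself is left out.

```ruby
# Builds the search string for nodes whose ohai_time is older than
# `hours` ago, limited to one environment.
def stale_node_query(hours:, now: Time.now, environment: "production")
  cutoff = now.to_i - hours * 60 * 60
  "ohai_time:[* TO #{cutoff}] AND chef_environment:#{environment}"
end

# Maps the list of stale node names to a Nagios exit code and status line.
def nagios_result(stale_names)
  if stale_names.empty?
    [0, "OK - all nodes checked in recently"]
  else
    [2, "CRITICAL - #{stale_names.length} stale node(s): #{stale_names.join(', ')}"]
  end
end
```

A check script would print the message and exit with the returned code, which is all Nagios needs to raise an alert.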

--
-- Tiago Cruz

Augie_Schwer · September 17, 2014, 11:52pm

Thanks everyone, that is all very helpful.

--
Augie Schwer - Augie@Schwer.us - http://schwer.us
