Best practice for measuring and monitoring chef-client runs?

What are people using to monitor and measure their chef-client runs?

I would like to monitor for when chef-client runs fail on a node.

It would be nice to measure chef-client run times.

Is it safe to assume people are using handlers for both of these? What are
some popular ways to accomplish these goals? Thanks!


Augie Schwer - Augie@Schwer.us - http://schwer.us

We use an email handler to report runs; primarily filtered for failed runs. Crude, but it works.
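
For readers who want to try the same thing, here is a minimal Ruby sketch of the "failed runs only" filter. It is not Jeff's handler: `RunResult`, `failure_mail`, and the addresses are hypothetical names made up for illustration. The sketch only builds the message text and leaves delivery to whatever mailer you already use. In a real setup the same method could be called from the `report` method of a `Chef::Handler` subclass, passing the handler's `run_status` and `node.name`.

```ruby
# Hypothetical stand-in for the run data a Chef handler sees; a real handler
# would pass its own run_status, which answers the same three calls.
RunResult = Struct.new(:success, :exception, :elapsed_time, keyword_init: true) do
  def success?
    success
  end
end

# Builds a plain-text failure mail, or returns nil for a successful run so
# that the caller sends nothing.
def failure_mail(node_name, status, from:, to:)
  return nil if status.success?

  error = status.exception
  reason = error ? "#{error.class}: #{error.message}" : "no exception recorded"

  <<~MAIL
    From: #{from}
    To: #{to}
    Subject: chef-client run failed on #{node_name}

    Run failed after #{status.elapsed_time.to_f.round(1)} seconds.
    #{reason}
  MAIL
end

# Example with made-up values.
failed = RunResult.new(
  success: false,
  exception: RuntimeError.new("package install failed"),
  elapsed_time: 42.3
)
puts failure_mail("web01.example.com", failed,
                  from: "chef@example.com", to: "ops@example.com")
```

Returning nil for successful runs keeps the filtering decision in one place, so the delivery code never has to look at the run itself.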

In addition, you can use the management console and analytics to get this
data. Both are free under 25 nodes.

We use the Airbrake handler to send errors to Hoptoad (which aggregates and
emails them). I haven't had time to dig into run-time analysis, and I'm not
sure it matters to us at this point: runs that are complicated for us usually
finish in about 1-2 minutes, so that's plenty fine by me.

--
~~ StormeRider ~~

We use both an email handler and the Datadog handler. We were hit by a memory leak (with chef-client in daemon mode), which also caused the handler to fail because there was not enough memory left. We ended up fixing the memory leak and changing chef-client to run as a task instead of a service.

Thanks,
Prakash

From: Mike [mailto:miketheman@gmail.com]
Sent: Wednesday, September 17, 2014 4:10 AM
To: chef@lists.opscode.com
Subject: [chef] Re: Re: Best practice for measuring and monitoring chef-client runs?

https://supermarket.getchef.com/tools/chef-handler-datadog

Jumping into the "we do..." postings: we send chef-client statistics to
Zabbix using a report handler:

  • success
  • elapsed_time
  • start_time
  • end_time
  • all_resources_num
  • updated_resource_num

I think that should be pretty easy to adapt to whatever monitoring
system you use.

Yours
Steffen

Links:
https://github.com/TYPO3-cookbooks/zabbix-custom-checks/blob/master/recipes/chef-client.rb
https://github.com/TYPO3-cookbooks/zabbix-custom-checks/blob/master/templates/default/chef-client/chef-client-handler.rb
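
As a rough illustration of how such a report handler can collect these values, here is a minimal Ruby sketch. It is not the TYPO3 handler linked above: `RunSnapshot`, `run_metrics`, and `RunMetricsHandler` are hypothetical names, the key names simply mirror the list in this message, and the sketch logs the values instead of sending them to Zabbix. It assumes only what `Chef::Handler` exposes for a run (`success?`, `elapsed_time`, `start_time`, `end_time`, `all_resources`, `updated_resources`).

```ruby
# Hypothetical stand-in for a finished run, so the sketch runs without the
# chef gem installed. A real handler passes its run_status instead.
RunSnapshot = Struct.new(:success, :elapsed_time, :start_time, :end_time,
                         :all_resources, :updated_resources,
                         keyword_init: true) do
  def success?
    success
  end
end

# Turns a run into a flat hash of numbers that most monitoring systems accept.
def run_metrics(status)
  {
    "success"              => status.success? ? 1 : 0,
    "elapsed_time"         => status.elapsed_time.to_f.round(2),
    "start_time"           => status.start_time.to_i,
    "end_time"             => status.end_time.to_i,
    # Array() guards against nil resource lists, e.g. after an early failure.
    "all_resources_num"    => Array(status.all_resources).size,
    "updated_resource_num" => Array(status.updated_resources).size
  }
end

# Only define the handler when Chef is actually loaded.
if defined?(Chef::Handler)
  class RunMetricsHandler < Chef::Handler
    def report
      run_metrics(run_status).each do |key, value|
        Chef::Log.info("chef-client metric #{key}=#{value}")
      end
    end
  end
end

# Example with made-up values.
snapshot = RunSnapshot.new(
  success: true, elapsed_time: 83.5,
  start_time: Time.at(1_410_912_780), end_time: Time.at(1_410_912_863),
  all_resources: %w[package service template], updated_resources: %w[template]
)
run_metrics(snapshot).each { |key, value| puts "#{key}=#{value}" }
```

Keeping the metric extraction in a plain method makes it easy to swap the logging call for whatever sender your monitoring system needs.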

The report handler that supplies the data from the client run to the
Chef server reporting add-on is open source, so it could be used and/or
built upon if you didn't want to use the pre-built Chef add-ons.

It's here in the client:
https://github.com/opscode/chef/blob/master/lib/chef/resource_reporter.rb

- Mark Mzyk

We have a custom Rails app that acts as a handler for chef-client. Here's
what the dashboard looks like: http://i.imgur.com/sR4UCWC.png

We also have an automated task that runs "knife status" and reports on any
hosts that haven't checked in for a while.

--
Best regards, Dmitriy V.

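I have a simple check on Nagios like this:

...
HOURS=2
# Use shell arithmetic: an unquoted * passed to expr is glob-expanded by the shell.
# MAX_AGE avoids clobbering bash's special SECONDS variable.
MAX_AGE=$((HOURS * 60 * 60))
OHAI_TIME=$(( $(date +%s) - MAX_AGE ))

# Production nodes whose last check-in (ohai_time) is older than the cutoff
SEARCH="$(knife search node "ohai_time:[* TO $OHAI_TIME] AND chef_environment:production")"
...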

--
-- Tiago Cruz
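
The cutoff logic in the Nagios check above (and the "haven't checked in for a while" report mentioned earlier in the thread) can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `stale_nodes` is a hypothetical helper, and it expects that you have already fetched each node's `ohai_time` (epoch seconds) into a hash, for example from a knife search; the fetching itself is left out.

```ruby
# Returns the sorted names of nodes whose last check-in is at or before the
# cutoff. Nodes without any recorded ohai_time are treated as stale too.
def stale_nodes(ohai_times, max_age_hours: 2, now: Time.now)
  cutoff = now.to_i - (max_age_hours * 60 * 60)
  ohai_times
    .select { |_name, checked_in| checked_in.nil? || checked_in.to_f <= cutoff }
    .keys
    .sort
end

# Example with made-up node names and timestamps.
now = Time.at(1_410_970_000)
last_seen = {
  "web01" => 1_410_969_000.5, # checked in about 17 minutes before `now`
  "db01"  => 1_410_950_000.0, # checked in more than 5 hours before `now`
  "new01" => nil              # never completed a run
}
puts stale_nodes(last_seen, now: now).inspect
# => ["db01", "new01"]
```

Passing `now` in as a parameter keeps the function easy to test; a monitoring check would call it with the default and alert when the returned list is not empty.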

Thanks everyone, that is all very helpful.

--
Augie Schwer - Augie@Schwer.us - http://schwer.us