Community input needed on the Nagios cookbook


#1

I’d love to get feedback from the community on the current state of the Nagios cookbook. Automated monitoring of systems is, in my opinion, probably the coolest feature of a configuration management system, and a large number of people are using Nagios. The Nagios cookbook needed (and still needs) some love, though. I’ve been doing a lot of work on the cookbook lately, since it was lacking functionality I needed and was outright broken in a lot of ways. I know there’s a lot left to do, and hopefully you guys can provide some input on what’s necessary.

If you used the cookbook and it didn’t work for you, what problems did you run into?
If you had to modify the cookbook with hard-coded values (my biggest issue) for your environment, what was lacking that forced you to do this?
In general, what would you like to see that isn’t there yet?

What I’ve added that hopefully should get merged in the next release cycle:

  • Defining Event handlers from data bags
  • Optional support for monitoring hosts in multiple environments
  • Support for environments with Windows hosts without applying Linux checks that will always fail
  • Support for defining nagios host groups from chef search queries stored in data bags
  • Install Nagios 3.4.1 not 3.2.3 when installing from source
  • Support for installing the server from source, while installing the client from package (if you just want a newer server than Ubuntu packages)
  • Fixed source installs of the server to actually work

What I still need to get pushed upstream

  • Support for installing NRPE via EPEL packages on RHEL distros instead of compiling from source
  • Ability to define blacklisted chef environments (preprod/dev) where host/service alerting is disabled
  • The Nagios user needs passwordless sudo out of the box to perform most NRPE checks

So if you have some time, let me know your experiences.

Tim Smith

Operations Engineer, SaaS Operations

M: +1 707.738.8132

TW: @tas50

webtrends http://www.webtrends.com/

Real-Time Relevance. Remarkable ROI.™

London | Portland | San Francisco | Melbourne | Tokyo


#2

On 13 August 2012 00:48, Tim Smith Tim.Smith@webtrends.com wrote:

I’d love to get feedback from the community on the current state of the
Nagios cookbook. Automated monitoring of systems is in my opinion probably
the coolest feature of a configuration management systems, and a large
number of people are using Nagios. The Nagios cookbook needed / needs some
love though. I’ve been doing a lot of work on the cookbook lately since it
was lacking functionality I needed and was outright broken in a lot of
ways. I know there’s a lot left to do and hopefully you guys can provide
some input on what’s necessary.

If you used the cookbook and it didn’t work for you what problems did
you run into?
If you had to modify the cookbook with hard coded values (my biggest
issue) for your environment what was lacking that forced you to do this?
In general what would you like to see that isn’t there yet?

The main problem I had with the nagios cookbook was monitoring systems not
covered by chef (firewalls, routers, switches, printers, etc.) without hard
coding them into recipes or templates. I ended up hacking the code to
support a data_bag, “clientless”, in which I can list such items and the
nagios checks that apply to them.

What I’ve added that hopefully should get merged in the next release
cycle:

  • Defining Event handlers from data bags
  • Optional support for monitoring hosts in multiple environments
  • Support for environments with Windows hosts without applying Linux
    checks that will always fail
  • Support for defining nagios host groups from chef search queries
    stored in data bags
  • Install Nagios 3.4.1 not 3.2.3 when installing from source
  • Support for installing the server from source, while installing the
    client from package (if you just want a newer server than Ubuntu packages)
  • Fixed source installs of the server to actually work

Looking forward to this; a lot of these are why I am still forced to run my
pre-chef nagios solution as well to cover them.

What I still need to get pushed upstream

  • Support for installing NRPE via EPEL packages on RHEL distros
    instead of compiling from source
  • Ability to define blacklisted chef environments (preprod/dev) where
    host/service alerting is disabled
  • The Nagios user needs passwordless sudo out of the box to perform
    most NRPE checks

So if you have some time let me know your experiences



#3

Any chance you could share that clientless code? I was about to whip something up to handle that exact scenario. I have a lot of old systems that aren’t managed with Chef and probably never will be that I still need to monitor in some fashion.



#4

On 13 August 2012 15:57, Tim Smith Tim.Smith@webtrends.com wrote:

Any chance you could share that clientless code? I was about to whip
something up to handle that exact scenario. I have a lot of old systems
that aren’t managed with Chef and probably never will be that I still need
to monitor in some fashion.

I think this is all the additions I made to cover it.

templates/default/hosts.cfg.erb
#############
# clientless nodes from databag
<% @clientless.each do |c| -%>
define host {
  use        server
  address    <%= c['ipaddress'] %>
  host_name  <%= c['id'] %>
  hostgroups <%= c['hostgroup_name'] %>
}
<% end -%>
#############

templates/default/hostgroups.cfg.erb
#############
# from databag clientless in chef
# emit each hostgroup name once
<% @hostgroup.map { |h| h['hostgroup_name'] }.uniq.each do |q| -%>
define hostgroup {
  hostgroup_name <%= q %>
  alias          <%= q %>
}

<% end -%>
#############

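recipes/server.rb
#############
# Assumption: the clientless items come from a search of the same data bag.
clientless = search(:clientless, '*:*')
hostgroup = search(:clientless, 'hostgroup_name:*')

if hostgroup.empty?
  Chef::Log.info("No hostgroups in databag for clientless nodes returned from search, using this node so hostgroups.cfg has data")
  # Assumption: the fallback entry is this node, as the log message says.
  hostgroup = Array.new
  hostgroup << node
end

if clientless.empty?
  Chef::Log.info("No clientless nodes returned from search, using this node so hosts.cfg has data")
  # Assumption: the fallback entry is this node, as the log message says.
  clientless = Array.new
  clientless << node
end
#############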

The data bag named clientless contains items with ipaddress, id, and
hostgroup_name. Limitation: an item can only be in one hostgroup.
Services are defined against that hostgroup; defaulting to a basic
host-alive ping check might be a good standard.
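
For anyone who wants to try the idea outside of a chef run, here is a minimal, self-contained Ruby sketch of the clientless approach described above. Plain hashes stand in for the data bag items, and the item values and the names CLIENTLESS_ITEMS, unique_hostgroups and render_hosts are made up for the example; it is an illustration under those assumptions, not the cookbook’s code.

```ruby
require 'erb'

# Hypothetical stand-ins for items from the "clientless" data bag.
# Each item carries the three keys named in the post: id, ipaddress, hostgroup_name.
CLIENTLESS_ITEMS = [
  { 'id' => 'core-switch',   'ipaddress' => '192.0.2.10', 'hostgroup_name' => 'network' },
  { 'id' => 'edge-router',   'ipaddress' => '192.0.2.1',  'hostgroup_name' => 'network' },
  { 'id' => 'lobby-printer', 'ipaddress' => '192.0.2.50', 'hostgroup_name' => 'printers' }
].freeze

# Collect each hostgroup name once, keeping first-seen order.
def unique_hostgroups(items)
  items.map { |item| item['hostgroup_name'] }.uniq
end

# Same shape as the hosts.cfg.erb fragment, but reading a local named items.
HOSTS_TEMPLATE = <<~TEMPLATE
  <% items.each do |c| -%>
  define host {
    use        server
    address    <%= c['ipaddress'] %>
    host_name  <%= c['id'] %>
    hostgroups <%= c['hostgroup_name'] %>
  }
  <% end -%>
TEMPLATE

# Render one "define host" block per item.
def render_hosts(items)
  ERB.new(HOSTS_TEMPLATE, trim_mode: '-').result_with_hash(items: items)
end

puts unique_hostgroups(CLIENTLESS_ITEMS).inspect
puts render_hosts(CLIENTLESS_ITEMS)
```

Because every item names exactly one hostgroup, the unique list is all the hostgroups template needs, which matches the one-hostgroup-per-item limitation mentioned above.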



#5

Sure. I will work up a pull request, but if you want something sooner:

in server.rb:

# Additional nodes to be monitored which aren't managed by chef can be
# added through nagios.additional_hosts attributes
add_nodes = Array.new
add_nodes = node['nagios']['additional_hosts'] if node['nagios'].attribute?('additional_hosts')

# Iterate over the hosts defined in the environment and add them to the
# hostgroups and nodes
if add_nodes && !add_nodes.empty?
  add_nodes.each do |add_node|
    service_hosts[add_node['hostgroups']] = add_node['host_name']
    role_list << add_node['hostgroups'] unless role_list.include?(add_node['hostgroups'])
  end
end

In the environments where I have additional servers/devices that I want
monitored, I add an array of hashes that contain “address”, “host_name”, and
“hostgroups” values. It looks like this:

"additional_hosts": [
  {
    "address": "192.168.30.72",
    "host_name": "acient",
    "hostgroups": "pillar_app"
  },
  {
    "address": "192.168.30.50",
    "host_name": "elderly",
    "hostgroups": "foundation"
  }
]

In the hosts.cfg.erb template I’ve added:

###################################################
# Additional Servers

<% @add_nodes.each do |n| -%>
define host {
  use server
<% n.each do |k,v| %>
  <%= k + " " + v %>
<% end -%>
}
<% end -%>

I’m sure there is some other change that I’m forgetting, but hope that
helps.
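
A rough way to check the merge step without a chef run is the plain-Ruby sketch below. The constant ADDITIONAL_HOSTS and the helper merge_additional_hosts are hypothetical names for this example, and unlike the recipe lines above it collects host names in an array per hostgroup, which is an assumption about what the templates would want when two extra hosts share a hostgroup.

```ruby
# Hypothetical stand-in for node['nagios']['additional_hosts'].
ADDITIONAL_HOSTS = [
  { 'address' => '192.168.30.72', 'host_name' => 'acient',  'hostgroups' => 'pillar_app' },
  { 'address' => '192.168.30.50', 'host_name' => 'elderly', 'hostgroups' => 'foundation' }
].freeze

# Add each extra host's hostgroup to role_list (once) and record the host
# name under that hostgroup. A nil add_nodes is treated as "no extra hosts".
# Assumption: service_hosts maps a hostgroup name to an array of host names.
def merge_additional_hosts(add_nodes, role_list, service_hosts)
  (add_nodes || []).each do |add_node|
    group = add_node['hostgroups']
    role_list << group unless role_list.include?(group)
    (service_hosts[group] ||= []) << add_node['host_name']
  end
  [role_list, service_hosts]
end

roles, hosts = merge_additional_hosts(ADDITIONAL_HOSTS, ['foundation'], {})
puts roles.inspect
puts hosts.inspect
```

Starting role_list with 'foundation' already present shows that an existing hostgroup is not added a second time.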

Thanks,
Jake.



#6

On Sun, Aug 12, 2012 at 7:48 PM, Tim Smith Tim.Smith@webtrends.com wrote:

I’d love to get feedback from the community on the current state of the
Nagios cookbook.

http://tickets.opscode.com/browse/COOK-1209

Does anyone have an nrpe install where the daemon is named
‘nagios-nrpe-server’?

Bryan


#7

The EPEL packages will install it as NRPE. Perhaps that name comes from the source install?

From: Bryan McLellan <btm@loftninjas.org>
Reply-To: "chef@lists.opscode.com" <chef@lists.opscode.com>
Date: Monday, August 13, 2012 11:46 AM
To: "chef@lists.opscode.com" <chef@lists.opscode.com>
Subject: [chef] Re: Community input needed on the Nagios cookbook

On Sun, Aug 12, 2012 at 7:48 PM, Tim Smith <Tim.Smith@webtrends.com> wrote:
I’d love to get feedback from the community on the current state of the Nagios cookbook.

http://tickets.opscode.com/browse/COOK-1209

Does anyone have an nrpe install where the daemon is named ‘nagios-nrpe-server’?

Bryan


#8

That’s the service name on our Ubuntu servers. It may or may not be the
Debian name as well, I haven’t checked.

On Mon, Aug 13, 2012 at 1:46 PM, Bryan McLellan btm@loftninjas.org wrote:

On Sun, Aug 12, 2012 at 7:48 PM, Tim Smith Tim.Smith@webtrends.com wrote:

I’d love to get feedback from the community on the current state of the
Nagios cookbook.

http://tickets.opscode.com/browse/COOK-1209

Does anyone have an nrpe install where the daemon is named
‘nagios-nrpe-server’?

Bryan


#9

Yup, it is the same in Debian (nagios-nrpe-server).
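
To summarize the service names reported in this thread, here is a small Ruby sketch. The table NRPE_SERVICE_NAMES and the helper nrpe_service_name are hypothetical, and falling back to nrpe for every platform not listed is an assumption based only on the EPEL report above, not something the cookbook is known to do.

```ruby
# Service names as reported in this thread: Ubuntu and Debian packages
# register the daemon as nagios-nrpe-server.
NRPE_SERVICE_NAMES = {
  'ubuntu' => 'nagios-nrpe-server',
  'debian' => 'nagios-nrpe-server'
}.freeze

# Return the NRPE service name for a platform string such as 'ubuntu'.
# Assumption: anything not listed (for example EPEL packages on RHEL-like
# distros) uses plain 'nrpe'.
def nrpe_service_name(platform)
  NRPE_SERVICE_NAMES.fetch(platform.to_s.downcase, 'nrpe')
end

%w[ubuntu debian centos].each do |platform|
  puts "#{platform}: #{nrpe_service_name(platform)}"
end
```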

Jorge Espada

On Mon, Aug 13, 2012 at 6:39 PM, Michael Cumings
mcumings@narrativescience.com wrote:

That’s the service name on our Ubuntu servers. It may or may not be the
Debian name as well, I haven’t checked.
