Troubleshooting report/error handlers not appearing to send reports?


#1

I’ve enabled the gelf report handler per the instructions in the
chef_handler cookbook and the chef-gelf cookbook.

I’ve got an odd situation. I’m putting in our syslog server
explicitly, so a host is set. The run starts like this:

ubuntu@domU-12-31-39-00-F1-86:~$ sudo chef-client --once
[Thu, 26 Jan 2012 23:02:46 +0000] INFO: *** Chef 0.10.6 ***
[Thu, 26 Jan 2012 23:02:49 +0000] INFO: Run List is [recipe[tuning], role[knewton_base_u1110], recipe[grover], role[Grover_cluster], role[Grover_service]]
[Thu, 26 Jan 2012 23:02:49 +0000] INFO: Run List expands to [tuning, chef_handler::gelf, metachef, ubuntu, apt, ntp, knewton_aaa, rsyslog::client, knewton_package, knewton_package::python, knewton_package::emacs, knewton_package::isaak_environment, build-essential, ganglia::knewton_ganglia_monitor, grover]
[Thu, 26 Jan 2012 23:02:49 +0000] INFO: Starting Chef Run for Grover-service-0
[Thu, 26 Jan 2012 23:02:49 +0000] INFO: Running start handlers
[Thu, 26 Jan 2012 23:02:49 +0000] INFO: Start handlers complete.
[Thu, 26 Jan 2012 23:02:50 +0000] INFO: Loading cookbooks [apt, build-essential, chef_handler, client_authenticators, cron, ganglia, grover, java, knewton_aaa, knewton_defines, knewton_lwrp, knewton_package, metachef, ntp, rsyslog, runit, tuning, ubuntu]
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/resources/default.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/providers/default.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/recipes/default.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/recipes/gelf.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/recipes/json_file.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/attributes/default.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/metadata.rb in the cache.
[Thu, 26 Jan 2012 23:02:51 +0000] INFO: Storing updated cookbooks/chef_handler/README.md in the cache.
"log_server is"
ec2-50-17-46-130.compute-1.amazonaws.com

It runs, succeeds, and ends like this:

[Thu, 26 Jan 2012 23:03:01 +0000] INFO: Processing service[grover] action nothing (grover::default line 142)
[Thu, 26 Jan 2012 23:03:02 +0000] INFO: Chef Run complete in 12.951394309 seconds
[Thu, 26 Jan 2012 23:03:02 +0000] INFO: Running report handlers
[Thu, 26 Jan 2012 23:03:02 +0000] INFO: Report handlers complete

and if I run this on the graylog2 host, the gelf message makes it through.

However, if I run it on any other host, with the recipe hard-coded to
point to the same graylog2 host in the same region, I get nada. I have
run tcpdump, and the curious part is that no UDP traffic seems to be
sent on port 12201 when I try to report to the remote host. I have
tested that I can send messages with alternate GELF libraries (e.g. the
PyLog module can send messages to my graylog2 server via the GELF port),
and those messages appear in the web UI, so AWS/EC2 permissions don’t
appear to be at play.
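For comparison with the tcpdump capture, here is a minimal sketch, in Python since that is what the working test client used, of what a single GELF datagram on port 12201 looks like: a zlib-compressed JSON document sent over UDP. The field names follow my reading of the GELF 1.1 format; the function names are hypothetical and this is not the chef-gelf gem’s code. It also does no chunking, so it assumes small messages.

```python
import json
import socket
import time
import zlib

# Hypothetical helpers (not chef-gelf): one GELF message = one UDP datagram
# carrying zlib-compressed JSON. No chunking, so keep messages small.

def build_gelf_payload(short_message, host=None, level=6, **extra):
    """Return a zlib-compressed GELF 1.1 message as bytes."""
    message = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 = informational
    }
    # Additional GELF fields are prefixed with an underscore.
    for key, value in extra.items():
        message["_" + key] = value
    return zlib.compress(json.dumps(message).encode("utf-8"))

def send_gelf(server, short_message, port=12201, **extra):
    """Send one GELF message over UDP and return the number of bytes sent."""
    payload = build_gelf_payload(short_message, **extra)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (server, port))
```

Because UDP gives no delivery confirmation, `send_gelf("ec2-50-17-46-130.compute-1.amazonaws.com", "chef run finished", run_status="success")` typically returns a byte count whether or not anything is listening, so a sender-side capture (e.g. `sudo tcpdump -n -i eth0 udp port 12201`, interface name assumed) remains the real test of whether the datagram left the box.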

Can anyone shed some light on this?

Thanks,

-Peter


#2

On Thu, Jan 26, 2012 at 10:50 PM, Peter Norton pn+chef-list@knewton.com wrote:

I’ve enabled the gelf report handler per the instructions in the
chef_handler cookbook and the chef-gelf cookbook.

I’ve got an odd situation. I’m putting in our syslog server
explicitly, so a host is set. The run starts like this:

If you’re absolutely sure there are no firewall rules on AWS blocking traffic (remember, GELF is UDP), you can try an alternate debugging method on another host (or on each instance, if it matters).

There’s a gem I wrote (originally for logstash) called gelfd
(https://github.com/lusis/gelfd)

It comes with a bin script that starts a basic UDP server to catch and
decode all gelf messages. It’s easier to move around than your
logstash server and easier than tcpdump machinations (imho).
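The idea behind such a catcher can be roughed out in a few lines of Python: bind a UDP socket and decode whatever arrives. This is not gelfd (the repository above is the real thing); it is a sketch that assumes unchunked messages and tells gzip, zlib and plain JSON payloads apart by their leading bytes.

```python
import gzip
import json
import socket
import zlib

# Rough stand-in for a gelfd-style catcher; NOT gelfd itself.
# Assumption: messages arrive unchunked. Chunked GELF (magic 0x1e 0x0f)
# is reported but not reassembled.
GZIP_MAGIC = b"\x1f\x8b"
CHUNK_MAGIC = b"\x1e\x0f"

def decode_gelf(datagram):
    """Decode one GELF datagram (gzip, zlib or plain JSON) into a dict."""
    if datagram[:2] == CHUNK_MAGIC:
        raise ValueError("chunked GELF message; reassembly not implemented")
    if datagram[:2] == GZIP_MAGIC:
        datagram = gzip.decompress(datagram)
    elif datagram[:1] == b"\x78":  # usual first byte of a zlib stream
        datagram = zlib.decompress(datagram)
    return json.loads(datagram.decode("utf-8"))

def serve(bind_host="0.0.0.0", port=12201):
    """Print every GELF message received on the given UDP port (runs forever)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_host, port))
        print("listening on udp://%s:%d" % (bind_host, port))
        while True:
            datagram, sender = sock.recvfrom(65535)
            try:
                print(sender, decode_gelf(datagram))
            except (ValueError, OSError, EOFError, zlib.error) as exc:
                print(sender, "undecodable datagram:", exc)
```

Running `serve()` on a spare instance (with UDP 12201 allowed inbound in its security group) and pointing the handler at that instance separates "the handler never sends" from "the graylog2 side never shows it".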

Again, I’d quadruple-check your EC2 rules, and make sure that
split-horizon DNS is working properly on EC2. Either way, you can use
gelfd to help debug.
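One way to check the split-horizon point is to compare what the graylog2 hostname resolves to on the client with what it resolves to from outside EC2; from inside EC2, a public EC2 hostname normally resolves to the instance’s private address. The sketch below uses hypothetical helper names and only the Python standard library to list each IPv4 answer and label it loopback, private or public. If the name came back as a loopback address, datagrams would never reach the external interface, which would also match an empty tcpdump capture there.

```python
import ipaddress
import socket

def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})

def describe(address):
    """Label an address as loopback, private or public."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:  # checked first: 127.0.0.0/8 also counts as private
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"
```

Usage: `for addr in resolve_ipv4("ec2-50-17-46-130.compute-1.amazonaws.com"): print(addr, describe(addr))`, run once on the client and once on the graylog2 host.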

Thanks,

-Peter


#3

Thanks, I’ll try that. I’ve confirmed that I can use nc to send UDP
between the source and destination (AWS only implements inbound
filters). As I said, the weirdest part was that I didn’t see any UDP
traffic on port 12201 leaving the system after the Chef runs. For that
part, I guess I’ll start adding prints to the installed chef-gelf gem.
I’ll use your server, too.
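Before adding prints to the gem, a cheaper check is to ask the kernel where a datagram to the configured host would go. Calling `connect()` on a UDP socket sends no packets; it only fixes the peer and selects the route and source address, which `getsockname()` then reports. `traced_sendto` is a hypothetical helper, not part of chef-gelf; it mirrors the kind of print one might add around the gem’s send call.

```python
import socket

def traced_sendto(payload, host, port=12201):
    """Send one UDP datagram and print the source and destination actually used."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP connect() transmits nothing; it just fixes peer, route and source.
        sock.connect((host, port))
        local, peer = sock.getsockname(), sock.getpeername()
        sent = sock.send(payload)
    print("sent %d bytes from %s:%d to %s:%d"
          % (sent, local[0], local[1], peer[0], peer[1]))
    return local, peer, sent
```

A source address of 127.0.0.1 would mean the datagram is going over loopback rather than out through the instance’s network interface.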

Thanks again,

-Peter

On Fri, Jan 27, 2012 at 12:36 AM, John E. Vincent (lusis)
lusis.org+chef-list@gmail.com wrote:



#4

I have confirmed that if I use graypy I can send messages from the
client to the graylog2 server, so I’ve eliminated the network as being
the problem.

I’m now down to the question of why the same role/config that sends a
GELF message locally will not send it out to a remote server (tcpdump
still shows no outbound datagram).

Any help is appreciated. I’ll be looking at this a bit more today.

-Peter

On Fri, Jan 27, 2012 at 12:44 AM, Peter Norton pn+chef-list@knewton.com wrote:
