LWRPs always reporting as updated


#1

Hey all,

In the process of migrating to continuous run of the chef-client on a
few servers, we started shipping resource change counts over to
graphite. The idea was that we would get an alert if something changed
(temporary solution) as well as tracking drift over time.

What I discovered is that a few LWRPs are causing runs to always show
updated resources even though they “technically” didn’t change
anything. Before I work on pull requests (because this most def is
unexpected behavior to me), I’d like to get some ideas on possible
fixes.

  1. nagios_nrpecheck resource
    My best guess is that the move from a template to a file resource
    with a content attribute is the cause:

https://github.com/opscode-cookbooks/nagios/commit/e9b117f179c30472e181032d4f8ac7f93cf3c404

I haven’t dug too deeply, but I’m guessing that the file resource’s
handling of content is the issue. For cases where content is written
to the file directly by the resource, would it not make sense to build
the file in the cache path first, checksum it against the destination
(if it exists), and then move the file into place at the end (as
opposed to writing the contents in place)? From my first glance at the
code, it appears that writing in place is what happens now.
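
As a rough illustration of the stage-then-move idea above, here is a minimal plain-Ruby sketch. It is not Chef's implementation: the stage_and_replace helper, its staging_dir argument (standing in for the cache path), and the choice of SHA-256 are all assumptions made for the example.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper (not Chef's code): write the content to a staging
# file, compare checksums with the destination, and move the staged copy
# into place only when they differ. Returns true when dest was changed.
def stage_and_replace(dest, content, staging_dir)
  staged = File.join(staging_dir, "#{File.basename(dest)}.staged")
  File.write(staged, content)

  if File.exist?(dest) &&
     Digest::SHA256.file(dest).hexdigest == Digest::SHA256.file(staged).hexdigest
    File.delete(staged)   # already converged; discard the staged copy
    return false
  end

  FileUtils.mv(staged, dest)  # replace the destination with the staged copy
  true
end
```

The boolean return value plays the role that updated_by_last_action plays for a resource: it is true only when the destination actually changed.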

  2. chef_handler resource
    This one is new territory for me, as I haven’t used handlers before
    now. I read over the comments here:

https://github.com/opscode-cookbooks/chef_handler/blob/master/providers/default.rb#L35-43

Is a guard of some kind enough here to short-circuit
updated_by_last_action? Again, I’m willing to dig in but I never run
in daemon mode. I’m not sure of the state of the exception_ and
report_ handlers at that point. The explicit delete_if + push leads me
to believe that it HAS to be run every time right now.

What’s the source of the issue here and maybe I can tackle it? Is
there any way of knowing if chef-client is running daemonized or not?

Let me know any thoughts you all have.

Thanks!
John E. Vincent (@lusis)


#2

On Saturday, June 9, 2012 at 1:08 PM, John E. Vincent (lusis) wrote:

> Hey all,
>
> In the process of migrating to continuous run of the chef-client on a
> few servers, we started shipping resource change counts over to
> graphite. The idea was that we would get an alert if something changed
> (temporary solution) as well as tracking drift over time.
>
> What I discovered is that a few LWRPs are causing runs to always show
> updated resources even though they “technically” didn’t change
> anything. Before I work on pull requests (because this most def is
> unexpected behavior to me), I’d like to get some ideas on possible
> fixes.
>
>   1. nagios_nrpecheck resource
>     My best guess is that the move from a template to a file resource
>     with a content attribute is the cause:
>
> https://github.com/opscode-cookbooks/nagios/commit/e9b117f179c30472e181032d4f8ac7f93cf3c404
>
> I haven’t dug too deeply, but I’m guessing that the file resource’s
> handling of content is the issue. For cases where content is written
> to the file directly by the resource, would it not make sense to build
> the file in the cache path first, checksum it against the destination
> (if it exists), and then move the file into place at the end (as
> opposed to writing the contents in place)? From my first glance at the
> code, it appears that writing in place is what happens now.

The file resource checksums the in-memory content field and compares it to the on-disk version, so it’s idempotent. This behavior differs from cookbook_file and template, which has some interesting implications for SELinux and Windows, but it shouldn’t affect notifications.

There was a bug with all file-based resources sometimes not triggering notifications in Chef 0.10.10, but that’s fixed in the 10.12 RC. And if you find that using the file resource as designed is causing it to trigger notifications incorrectly, that would of course be a bug, too.
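
To make the checksum comparison described above concrete, here is a minimal plain-Ruby sketch. It is not the file provider's actual code: the write_if_changed helper and the use of SHA-256 are assumptions for illustration only.

```ruby
require "digest"

# Hypothetical helper (not Chef's provider code): checksum the desired
# content held in memory, compare it with the file on disk, write only on
# a mismatch, and report whether anything changed.
def write_if_changed(path, content)
  wanted  = Digest::SHA256.hexdigest(content)
  current = File.exist?(path) ? Digest::SHA256.file(path).hexdigest : nil
  return false if wanted == current   # already converged: not "updated"

  File.write(path, content)           # written in place, only when needed
  true
end
```

With a check like this, a second call with the same content returns false, which is the behavior you would expect to see reflected in the updated resource count.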

>   2. chef_handler resource
>     This one is new territory for me, as I haven’t used handlers before
>     now. I read over the comments here:
>
> https://github.com/opscode-cookbooks/chef_handler/blob/master/providers/default.rb#L35-43
>
> Is a guard of some kind enough here to short-circuit
> updated_by_last_action? Again, I’m willing to dig in but I never run
> in daemon mode. I’m not sure of the state of the exception_ and
> report_ handlers at that point. The explicit delete_if + push leads me
> to believe that it HAS to be run every time right now.

What that code is doing is implementing code reloading similar to what Rails’ dev mode would do. I haven’t used the cookbook personally, but you could imagine that this action could get triggered by a notification from a cookbook_file, in which case you’d only reload when the code changes. I’m not sure what that implies as far as using the handler cookbook.
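
For discussion, here is a minimal plain-Ruby sketch of the delete_if + push pattern with one possible guard added. It is not the cookbook's provider: the DemoHandler class, the enable_handler helper, and the choice of "was a handler of this class already registered?" as the guard condition are all hypothetical.

```ruby
# Stand-in for a handler class; the name is hypothetical.
class DemoHandler; end

# Hypothetical guard: always swap in the fresh instance (so reloaded code
# takes effect), but report "updated" only when no handler of that class
# was registered beforehand.
def enable_handler(handlers, handler)
  name = handler.class.to_s
  already_registered = handlers.any? { |h| h.class.to_s.include?(name) }
  handlers.delete_if { |h| h.class.to_s.include?(name) }
  handlers << handler
  !already_registered
end

report_handlers = []
first  = enable_handler(report_handlers, DemoHandler.new)  # newly registered
second = enable_handler(report_handlers, DemoHandler.new)  # replaced in place
```

Whether a guard like this is enough is a separate question. A process that starts with no handler of that class registered would still report one update on its first enable, so this would likely only quiet repeated runs within a single long-lived process.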

> What’s the source of the issue here and maybe I can tackle it? Is
> there any way of knowing if chef-client is running daemonized or not?
>
> Let me know any thoughts you all have.
>
> Thanks!
> John E. Vincent (@lusis)


Dan DeLeo