Check for changes in a data bag


#1

I have a recipe which looks in a data bag item to get a list of yum
groups and individual rpms to install at a node's first boot, like so:

# recipes/default.rb

# Install packages assigned to this node/role.

manifest = data_bag_item("manifests", node[:yum][:manifest])

# Build a yum compatible list of groups.

grplist = ""
manifest["groups"].each do |g|
  grplist = grplist + " \"#{g}\""
end

# Install Groups

execute "groupinstall" do
  command "yum -y groupinstall #{grplist} > /tmp/chef-yum.log 2>&1 && touch /tmp/chef-yum-groupinstall.done"
  creates "/tmp/chef-yum-groupinstall.done"
  action :run
end

# Build a yum compatible list of packages.

pkglist = ""
manifest["packages"].each do |p|
  pkglist = pkglist + " \"#{p}\""
end

# Install Packages.

execute "packageinstall" do
  command "yum -y -t install #{pkglist} >> /tmp/chef-yum.log 2>&1 && touch /tmp/chef-yum-packageinstall.done"
  creates "/tmp/chef-yum-packageinstall.done"
  action :run
end

# Clean up everything and make sure everything is up to date.

execute "yum-clean-all" do
  command "yum clean all && yum -y update >> /tmp/chef-yum.log 2>&1"
  action :run
end

This works great: I get a yum run at a node's first boot with all
packages the node needs from the specified data bag. What I want now
is to have each subsequent chef run look at the data bag, see if
there have been any changes since the last run and, if so, rerun yum to
pick up those changes. Is there an easy way to do this, perhaps using
the data bag’s revision number, or do I need to somehow checksum the
data bag contents and store the checksum? Should I just let yum run
each time? Rerunning it tends to add a couple of minutes or more to
the length of the chef-client run but is otherwise harmless.

jbh


#2

Hi John,

On 7/7/10 7:58 AM, John Hanks wrote:

> I have a recipe which looks in a data bag item to get a list of yum
> groups and individual rpms to install at a node's first boot, like so:
>
> [snip]
>
> This works great: I get a yum run at a node's first boot with all
> packages the node needs from the specified data bag. What I want now
> is to have each subsequent chef run look at the data bag, see if
> there have been any changes since the last run and, if so, rerun yum to
> pick up those changes. Is there an easy way to do this, perhaps using
> the data bag’s revision number, or do I need to somehow checksum the
> data bag contents and store the checksum? Should I just let yum run
> each time? Rerunning it tends to add a couple of minutes or more to
> the length of the chef-client run but is otherwise harmless.

A couple of thoughts…

  1. There is a package resource that will use an appropriate package
    provider based on the OS (yum is the default for Red Hat and CentOS). If
    you haven’t tried it, you might try using that instead of executing yum
    directly. One advantage is to make your recipes easier to port to other
    Linux flavors.

  2. A common pattern is to simply rerun and let the resources figure out
    if work needs to be done (in this case package installs). And the
    package resource should do a reasonable job of only doing the
    installation work if the requested package is not already installed.
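
As a rough sketch of the two points above, the package list from the data bag could drive one package resource per name, leaving it to the provider to skip anything already installed. The helper name `manifest_packages` is made up for illustration, the data bag layout is assumed to match the recipe earlier in the thread, and the `defined?(Chef)` guard is only there so the file also loads in plain Ruby. Yum groups are not covered by this.

```ruby
# Hypothetical helper: pull a clean, de-duplicated package list out of the
# data bag item (any hash-like object with a "packages" key).
def manifest_packages(manifest)
  Array(manifest["packages"]).map { |pkg| pkg.to_s.strip }.reject(&:empty?).uniq
end

# Inside a Chef recipe the list feeds one package resource per name.
# Outside Chef (plain Ruby) this branch is skipped.
if defined?(Chef)
  manifest = data_bag_item("manifests", node[:yum][:manifest])

  manifest_packages(manifest).each do |pkg|
    package pkg do
      action :install
    end
  end
end
```

Because each package is its own resource, a package added to the data bag later is picked up on the next chef-client run without any extra bookkeeping.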

Finally, you could modify your recipe to use some not_if checks to skip
the execution if the packages are already installed. Along these lines,
you could include a version number or timestamp in your data bag and
write it out on your nodes as part of the recipe. Then you can check
whether the timestamp matches during a chef run. I think you already
had that idea. I’m not aware of easier ways…
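
One way the stamp idea might look, using a digest of the manifest contents instead of a hand-maintained version number. This is only a sketch: the helper names and the stamp file location are hypothetical, and the manifest is assumed to have the "groups" and "packages" keys used in the original recipe.

```ruby
require "digest"
require "json"

# Hypothetical helper: a stable SHA-256 over the parts of the manifest that
# matter to yum. Sorting makes the digest independent of list order.
def manifest_digest(manifest)
  relevant = {
    "groups"   => Array(manifest["groups"]).map(&:to_s).sort,
    "packages" => Array(manifest["packages"]).map(&:to_s).sort,
  }
  Digest::SHA256.hexdigest(JSON.generate(relevant))
end

# True when no stamp exists yet or the stored digest differs from the
# digest of the current manifest.
def manifest_changed?(manifest, stamp_path)
  return true unless File.exist?(stamp_path)
  File.read(stamp_path).strip != manifest_digest(manifest)
end

# Record the digest; meant to be called only after yum has succeeded.
def record_manifest(manifest, stamp_path)
  File.write(stamp_path, manifest_digest(manifest) + "\n")
end
```

In a recipe, the result of `manifest_changed?` could gate the execute resources through an `only_if` block, with the stamp written by a later resource once the yum commands have finished. Writing the stamp last means a failed yum run is retried on the next chef-client run.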

  • seth

#3

On Thu, Jul 8, 2010 at 10:31 AM, Seth Falcon <seth@userprimary.net> wrote:

>   1. There is a package resource that will use an appropriate package
>     provider based on the OS (yum is the default for Red Hat and CentOS). If
>     you haven’t tried it, you might try using that instead of executing yum
>     directly. One advantage is to make your recipes easier to port to other
>     Linux flavors.

This would be ideal, except that the package resource only supports
rpms, not ‘yum groupinstall’. I get away with a much shorter list of
individual rpms to maintain by using groups. Ideally I’d add
groupinstall support to the yum provider, but I’m not that chef-smart
yet.
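
As long as the groups go through an execute resource, the group list can at least be built with proper shell quoting, since names such as "Development Tools" contain spaces. A minimal sketch using Ruby's standard Shellwords module; the helper name `yum_groupinstall_command` is hypothetical, and returning nil for an empty list is just one possible convention for letting the recipe skip the resource.

```ruby
require "shellwords"

# Hypothetical helper: build a "yum -y groupinstall ..." command line with
# every group name escaped for the shell. Returns nil when there is nothing
# to install so the caller can skip the execute resource entirely.
def yum_groupinstall_command(groups)
  names = Array(groups).map(&:to_s).reject(&:empty?)
  return nil if names.empty?
  "yum -y groupinstall #{Shellwords.join(names)}"
end
```

The returned string could be used as the `command` of the existing groupinstall resource, with the log redirection appended as before.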

>   2. A common pattern is to simply rerun and let the resources figure out
>     if work needs to be done (in this case package installs). And the
>     package resource should do a reasonable job of only doing the
>     installation work if the requested package is not already installed.

I think this is common most likely because it makes sense. For now I’m
going to just let yum rerun for each chef-client run. If I decide to
try to make it fancy then I’ll tackle adding groupinstall support to
the yum package resource.

jbh