Apt-get update strategy

A recent failure in the Apache Cassandra apt repo got me thinking
about the way I’ve assembled my cookbooks. I have the apt cookbook
first in my run and it does an update. The repo being down gave me the
classic “apt-get update returned 100” error within Chef.

The failure meant that none of my recipes ran against the node (so
our application WAR file was not updated), despite the fact that the
node had previously been converged and all the required apt packages
were already installed.

My Chef run needs to be resilient to these kinds of failures: once a
node has been converged and all its apt packages are installed, apt
doesn’t need to do an update (I don’t do package :upgrade at the
moment).

So what I’d ideally like is to be able to trigger an apt-get update on
the first package which requires installing. If no packages require
installing, no apt-get update is performed. The fact an update has
been performed needs to be recorded as we don’t want to do it for
every package that’s installed as it will kill performance. Once is
enough per chef run unless we add/remove a sources.list.d entry (which
I already handle using :notifies).
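
The behaviour described above can be sketched in plain Ruby, outside Chef. This is only an illustration of the decision logic: plan_apt_commands is a hypothetical helper (not a Chef or apt API), and the package names are made up.

```ruby
# Hypothetical helper, not part of Chef: given the packages a run wants and
# the packages already installed, return the shell commands the run would issue.
def plan_apt_commands(wanted, installed)
  missing = wanted.reject { |pkg| installed.include?(pkg) }

  # Fully converged node: no update and no installs, so a dead repo cannot break the run.
  return [] if missing.empty?

  # The first missing package triggers a single update, followed by the installs.
  ["apt-get update"] + missing.map { |pkg| "apt-get install -y #{pkg}" }
end

converged = plan_apt_commands(%w[curl git], %w[curl git vim])
partial   = plan_apt_commands(%w[curl git], %w[curl])

puts converged.inspect
puts partial.inspect
```

The point of the sketch is that the update is a consequence of a missing package rather than a fixed first step of the run.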

Opinions welcome: am I looking at an LWRP?

Maybe a feature could be added to the package resource/providers to
handle updating the package caches once per run?

Michael, my current workaround is to effectively run "apt-get update" as a
cron job once per day and also upon a host (re)boot. Our longer-term
workaround idea is to mirror the apt repos locally which would solve this
and other apt problems (e.g., updates or dropped support of package versions
we depend upon).

These workarounds are not ideal and may not be appropriate for everyone.

- Rob

On Fri, Mar 25, 2011 at 9:18 AM, Michael Hale mikehale@gmail.com wrote:

Maybe a feature could be added to the package resource/providers to
handle updating the package caches once per run?

While this doesn't solve the original problem exactly, you may want to
take a look at the apt cookbook and the recent apt-cacher updates.
It's super simple to set up: have one server apply apt::cacher and
have every node (including that server) use apt::cacher-client; no
tweaking is needed. This will start proxying and caching your apt
downloads and minimize your exposure to remote repos being
unavailable, especially if you modify the expiration times on the
server.

Thanks,
Matt Ray
Technical Evangelist | Opscode, Inc
E: matt@opscode.com T: (512) 731-2218
Twitter, Github: mattray

On Fri, Mar 25, 2011 at 8:35 AM, Rob Guttman robguttman@gmail.com wrote:

Michael, my current workaround is to effectively run "apt-get update" as a
cron job once per day and also upon a host (re)boot. Our longer-term
workaround idea is to mirror the apt repos locally which would solve this
and other apt problems (e.g., updates or dropped support of package versions
we depend upon).

These workarounds are not ideal and may not be appropriate for everyone.

- Rob

On Fri, Mar 25, 2011 at 2:45 AM, Luke Biddell luke.biddell@gmail.com wrote:

So what I'd ideally like is to be able to trigger an apt-get update on
the first package which requires installing. If no packages require
installing, no apt-get update is performed. The fact an update has
been performed needs to be recorded as we don't want to do it for
every package that's installed as it will kill performance. Once is
enough per chef run unless we add/remove a sources.list.d entry (which
I already handle using :notifies).

There are a number of strategies. Here's another I used to do.

Only trigger an apt-get update when a repo or key is added, otherwise
rely on Ubuntu to run a daily apt-get update but run it ourselves if
we need to. Note that I was silently rescuing failures as well.

# Run apt-get update to create the stamp file
execute "apt-get-update" do
  ignore_failure true
  epic_fail true # Chef's alias for ignore_failure
  command "apt-get update"
  not_if { File.exist?('/var/lib/apt/periodic/update-success-stamp') }
end

# provides /var/lib/apt/periodic/update-success-stamp on apt-get update
package "update-notifier-common" do
  ignore_failure true
  notifies :run, resources(:execute => "apt-get-update"), :immediately
end

# Run the update ourselves if the last successful one is more than a day old
execute "apt-get-update-periodic" do
  ignore_failure true
  epic_fail true
  command "apt-get update"
  only_if do
    File.exist?('/var/lib/apt/periodic/update-success-stamp') &&
      File.mtime('/var/lib/apt/periodic/update-success-stamp') < Time.now - 86400
  end
end
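
The two guards in the recipe above boil down to two questions about the stamp file, which can be tried out in plain Ruby without Chef. This is a rough sketch: apt_cache_never_updated? and apt_cache_stale? are hypothetical helper names, and the demo uses a temporary file in place of the real /var/lib/apt/periodic/update-success-stamp.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper mirroring the not_if guard: no successful update recorded yet.
def apt_cache_never_updated?(stamp_path)
  !File.exist?(stamp_path)
end

# Hypothetical helper mirroring the only_if guard: the stamp exists but is
# older than max_age seconds (one day by default).
def apt_cache_stale?(stamp_path, max_age = 86_400, now = Time.now)
  File.exist?(stamp_path) && File.mtime(stamp_path) < now - max_age
end

Dir.mktmpdir do |dir|
  stamp = File.join(dir, "update-success-stamp")
  puts apt_cache_never_updated?(stamp)   # no stamp file yet

  FileUtils.touch(stamp)
  puts apt_cache_stale?(stamp)           # stamp was just written

  two_days_ago = Time.now - 2 * 86_400
  File.utime(two_days_ago, two_days_ago, stamp)
  puts apt_cache_stale?(stamp)           # stamp backdated by two days
end
```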

Thanks for all the suggestions chaps. I'll certainly check them all out.

I had a scribble on Friday (that's what Fridays are for?) and hacked
up my own provider.

I've added it to our existing apt cookbook. It's intended as a drop-in
replacement for package. If you do an install, it only does an update
if the package isn't installed, i.e. the first package to be installed
will trigger the only update for that run. If you do a package upgrade,
it makes sure an update is done once only. If all packages are
converged, no update is done.

The flag to indicate if an update has been done is stored on the node
and has to be reset at the start of each run. Is there a better way of
setting transient attributes for the run?

I've posted the code here in case it's any use to anyone else. I'm
going to commit it in dev here and see what comes out in the wash. I'm
sure I've broken/abused something.


# apt/resources/pkg.rb

actions :install, :upgrade, :remove, :purge
attribute :name, :kind_of => String


# apt/providers/pkg.rb

action :install do
  # dpkg-query also exits 0 for removed-but-not-purged packages, so check
  # the status text rather than relying on the exit code alone.
  status = `dpkg-query -W -f='${Status}' #{@new_resource.name} 2>/dev/null`
  process_package :install unless status.include?("ok installed")
end

action :upgrade do
  process_package :upgrade
end

action :remove do
  package @new_resource.name do
    action :remove
  end
end

action :purge do
  package @new_resource.name do
    action :purge
  end
end

private

# Run apt-get update at most once per run, then hand off to the stock package resource.
def process_package(mode)
  Chef::Log.debug("#{mode} of package #{@new_resource.name} requested")
  if !node[:apt_update_performed_this_chef_run]
    Chef::Log.debug("apt-get update required for this run, performing")
    execute "apt-get update"
    node[:apt_update_performed_this_chef_run] = true
  else
    Chef::Log.debug("apt-get update already performed for this run")
  end
  package @new_resource.name do
    action mode
  end
  @new_resource.updated_by_last_action(true)
end

On 27 March 2011 14:46, Bryan McLellan btm@loftninjas.org wrote:

There are a number of strategies. Here's another I used to do.

Only trigger an apt-get update when a repo or key is added, otherwise
rely on Ubuntu to run a daily apt-get update but run it ourselves if
we need to. Note that I was silently rescuing failures as well.

On Mar 28, 2011, at 5:54 AM, Luke Biddell luke.biddell@gmail.com wrote:

The flag to indicate if an update has been done is stored on the node
and has to be reset at the start of each run. Is there a better way of
setting transient attributes for the run?

Take a look at node.run_state for transient storage. I stow my searches there for reuse across a few recipes.
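
A minimal sketch of how the once-per-run flag could look with that kind of transient storage. To keep it runnable without Chef, a plain Hash stands in for node.run_state (which is itself a Hash that is not saved with the node); apt_update_once and the :apt_update_performed key are hypothetical names.

```ruby
# Hypothetical helper: runs the given block (the real "apt-get update") only
# if the flag is not yet set in run_state. Returns true when the block ran.
def apt_update_once(run_state)
  return false if run_state[:apt_update_performed]

  yield
  run_state[:apt_update_performed] = true
end

# A fresh Hash models the start of a run, so nothing has to be reset by hand.
run_state = {}
updates = 0
3.times { apt_update_once(run_state) { updates += 1 } }
puts updates
```

Because the flag is only set after the block returns, a failed update leaves it unset and the next package would try again.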

-- Mason Turner (mobile)

Perfect, just what I was looking for. Thanks.

On 28 March 2011 12:17, Mason Turner opsmason@gmail.com wrote:

Take a look at node.run_state for transient storage. I stow my searches
there for reuse across a few recipes.

-- Mason Turner (mobile)

Luke,

I'm thinking you would want some way to apt-get update again if you add
or remove a repository, regardless of whether or not you have already
updated in a given chef run.

On Mon, Mar 28, 2011 at 8:43 AM, Luke Biddell luke.biddell@gmail.com wrote:

Perfect, just what I was looking for. Thanks.

Maybe you could add something like this to your provider:

update_success_stamp_mtime = File.mtime("/var/lib/apt/periodic/update-success-stamp")
Dir["/etc/apt/**/*.list"].any? do |list_file|
  File.mtime(list_file) > update_success_stamp_mtime
end
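
As a rough, self-contained version of that check, the sketch below wraps it in a hypothetical helper, apt_sources_changed?, and exercises it against a temporary directory instead of /etc/apt. Treating a missing stamp file as "changed" is an assumption added here; the snippet above would raise in that case.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: true when any *.list file under apt_dir is newer than
# the stamp file, or when no stamp file exists yet (assumption, see lead-in).
def apt_sources_changed?(apt_dir, stamp_path)
  return true unless File.exist?(stamp_path)

  stamp_mtime = File.mtime(stamp_path)
  Dir[File.join(apt_dir, "**", "*.list")].any? { |list| File.mtime(list) > stamp_mtime }
end

Dir.mktmpdir do |dir|
  stamp = File.join(dir, "update-success-stamp")
  list  = File.join(dir, "sources.list.d", "cassandra.list")
  FileUtils.mkdir_p(File.dirname(list))
  FileUtils.touch([stamp, list])

  # Backdate the stamp so the list file is clearly newer.
  hour_ago = Time.now - 3600
  File.utime(hour_ago, hour_ago, stamp)
  puts apt_sources_changed?(dir, stamp)
end
```

One limitation worth keeping in mind: comparing file mtimes only notices added or edited .list files. Deleting one leaves the remaining files untouched, so a removed repository would not be detected this way.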

Absolutely, the apt-get update execute resource is not guarded in any
way. Whenever we modify the content of /etc/apt/sources.list.d/ we use
a notifies to trigger an apt-get update.

We only guard on package :install (within my custom LWRP) so on a
fully converged node we don't apt-get update at all.

On 28 March 2011 14:29, Michael Hale mikehale@gmail.com wrote:

Luke,

I'm thinking you would want some way to apt-get update again if you add
or remove a repository, regardless of whether or not you have already
updated in a given chef run.
