Strategies to prevent chef-client bomb out due to yum error?

hi. this keeps happening to me lately. in various places in my cookbooks,
i’ll yum install an rpm. sometimes the process bombs out because it gets
a 503 response from http://mirrors.fedoraproject.org/. contacting that
server only happens for the epel repo.

is this happening to anyone else? how do you solve it?

one solution might be to mirror that repo locally, but i really don’t
want to.

maybe another solution would be to disable this line in epel.repo:

mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-debug-5&arch=$basearch

and enable this line?

baseurl=http://download.fedoraproject.org/pub/epel/5/$basearch/debug
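A minimal sketch of that mirrorlist-to-baseurl swap, in plain Ruby rather than as a Chef resource. The helper name `prefer_baseurl` and the sample repo text are made up for illustration, and it assumes the stock file ships the baseurl line commented out with a leading `#`:

```ruby
# Hypothetical helper: rewrite the text of a yum .repo file so that it uses
# baseurl instead of mirrorlist. Active "mirrorlist=" lines are commented
# out and commented "#baseurl=" lines are uncommented.
def prefer_baseurl(repo_text)
  repo_text.lines.map do |line|
    case line
    when /\A\s*mirrorlist=/
      "#" + line                     # disable the mirror rotator
    when /\A\s*#\s*baseurl=/
      line.sub(/\A\s*#\s*/, "")      # enable the fixed URL
    else
      line                           # leave everything else untouched
    end
  end.join
end

# Illustrative sample only, built from the two lines quoted above.
sample = <<~'REPO'
  [epel-debuginfo]
  #baseurl=http://download.fedoraproject.org/pub/epel/5/$basearch/debug
  mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-debug-5&arch=$basearch
REPO

puts prefer_baseurl(sample)
```

In a cookbook the rewritten text could be written back with a file or template resource. Whether the baseurl host is any more reliable than the mirrorlist host is not something this thread establishes.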

is it a good idea for the package or yum_package provider to help with
this? then again, maybe not because the point of cfg mgmt here is for
a node to be in a known and intended state. and if it can’t get there,
then it’s a “bad” node, and the chef-client run should fail…

and here’s how it happens:

package "perl-YAML" do
  action :install
end

[Wed, 21 Mar 2012 21:19:57 +0000] WARN: Problem parsing line 'Could not retrieve mirrorlist http://mirrors.fedoraproject.org/mirrorlist?repo=epel-5&arch=x86_64 error was' from yum-dump.py! Please check your yum configuration.
[Wed, 21 Mar 2012 21:19:57 +0000] WARN: Problem parsing line '[Errno 14] HTTP Error 503: SERVICE UNAVAILABLE' from yum-dump.py! Please check your yum configuration.
[Wed, 21 Mar 2012 21:19:57 +0000] DEBUG: Re-raising exception: Chef::Exceptions::Package - package[perl-YAML] (toolbin::default line 12) had an error: Yum failed - #<Process::Status: pid=1992,exited(1)> - returns: yum-dump Repository Error: Cannot find a valid baseurl for repo: epel
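One way to tell this kind of failure apart from a real packaging problem is to look for the mirror errors in yum's output. The sketch below is plain Ruby, not part of Chef. The constant and method names are hypothetical, and the patterns are taken only from the log lines above, so they are probably incomplete:

```ruby
# Hypothetical patterns for "try again later" failures, copied from the
# log output quoted in this message. Not an exhaustive list.
TRANSIENT_YUM_PATTERNS = [
  /HTTP Error 503/,
  /Could not retrieve mirrorlist/
].freeze

# Returns true when the given yum output matches one of the patterns.
def transient_yum_error?(output)
  TRANSIENT_YUM_PATTERNS.any? { |pattern| output.match?(pattern) }
end

puts transient_yum_error?("[Errno 14] HTTP Error 503: SERVICE UNAVAILABLE")
# The second string is an illustrative non-mirror failure, not from the thread.
puts transient_yum_error?("No package perl-YAML available.")
```

The first call prints true and the second prints false.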

thanks!
kallen

On Mar 21, 2012, at 9:35 PM, kallen@groknaut.net wrote:

hi. this keeps happening to me lately. in various places in my cookbooks,
i'll yum install an rpm. sometimes the process bombs out because it gets
a 503 response from http://mirrors.fedoraproject.org/. contacting that
server only happens for the epel repo.

is this happening to anyone else? how do you solve it?

Yes, this happened to us all the time. I ultimately decided to ship the entire epel-release-5-4.noarch.rpm file via cookbook_file so that I could push it directly to my nodes and they could themselves keep track of which mirrors are working well/fast, and wouldn't constantly trip themselves up by trying to hit the whacked-up mirror rotator.

I believe I heard jtimberman also making some noises about this problem on Chef Infra (archive).

one solution might be to mirror that repo locally, but i really don't
want to.

We tried to set up a mirror for that repo locally, but because of the way the EPEL mirroring system works, that's actually kind of a pain. Either you hard-code everything to your local URL that is the EPEL mirror and you live with the problems of what happens when your local mirror is down, or you officially register yourself as a mirror of EPEL so that you can get into the mirrormanager system and therefore catch all the "local" IP addresses that are looking for a mirror and get them redirected to you. But if you officially register yourself as a mirror of EPEL, then you've got a huge mass of additional traffic that you might have to deal with.

Blech and double blech.

maybe another solution would be to disable this line in epel.repo:

mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=epel-debug-5&arch=$basearch

and enable this line?

baseurl=http://download.fedoraproject.org/pub/epel/5/$basearch/debug

Here's our solution:

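# epel-release version-release string; set elsewhere in our recipe,
# shown here as "5-4" to match the RPM file named above
epel = "5-4"

# Ship the RPM from the cookbook's files directory into Chef's cache
cookbook_file "#{Chef::Config[:file_cache_path]}/epel-release-#{epel}.noarch.rpm" do
  # Note: this log call runs at compile time, regardless of the not_if guard
  Chef::Log.info("Providing epel-release-#{epel}.noarch.rpm via cookbook_file")
  source "epel-release-#{epel}.noarch.rpm"
  not_if "rpm -qa | grep -qx 'epel-release-#{epel}'"
end

# Install from the local copy
rpm_package "epel-release" do
  source "#{Chef::Config[:file_cache_path]}/epel-release-#{epel}.noarch.rpm"
end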

is it a good idea for the package or yum_package provider to help with
this? then again, maybe not because the point of cfg mgmt here is for
a node to be in a known and intended state. and if it can't get there,
then it's a "bad" node, and the chef-client run should fail.

Last I saw, there was still a bug in the yum provider that could cause it to fail an entire chef run because yum was returning a non-zero exit code, even in cases where the output from yum indicates that this might be just a temporary error and you should retry at a later time. I know some fixes have been applied to the yum provider on this issue, but I don't think the bug has been fully squashed.
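As a rough illustration of "retry at a later time", here is a generic retry wrapper in plain Ruby. It is not the yum provider's code. `TransientError` and `with_retries` are hypothetical names, and the block below only simulates a mirror that fails twice before answering:

```ruby
# Hypothetical error class standing in for a temporary mirror failure.
class TransientError < StandardError; end

# Run the block, retrying up to `attempts` times when it raises
# TransientError, sleeping `delay` seconds between tries.
# Any other exception, or the final failure, propagates to the caller.
def with_retries(attempts: 3, delay: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue TransientError
    raise if tries >= attempts
    sleep delay
    retry
  end
end

# Simulated install: the first two attempts hit a 503, the third succeeds.
result = with_retries(attempts: 3) do |try|
  raise TransientError, "503 from mirror" if try < 3
  "installed on attempt #{try}"
end
puts result
```

This prints "installed on attempt 3". Chef resources also accept retries and retry_delay attributes; whether those cover this particular yum-dump failure is not something this thread confirms.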

Meanwhile, code your CentOS systems knowing that yum does crazy whacky stuff when it hits mirror sites that are run in crazy whacky ways, and that this combination frequently occurs when you mix yum and EPEL.

--
Brad Knowles bknowles@ihiji.com
SAGE Level IV, Chef Level 0.0.1

AWESOME! thanks so much for the response.

kallen
