Chef-client (still) randomly failing

I’m still failing to get chef-client to run on it’s own without failing due
to timeouts. I have 4 CentOS 6.4 nodes in QA, overnight usually 2 will fail
and show as ‘red’ the next day - it runs hourly, so they’ll be down for 10+
hours. I login to the node, stop chef-client, run it manually, and it
works.

  1. Why is it timing out on NTP, and even failing as it retries 4 times in a
    row?
  2. Why doesn’t it fix itself on the next run? It seems to fail, and then
    never recover.

Full logs here:

On Tuesday, November 19, 2013 at 10:13 AM, Phil Cryer wrote:

I'm still failing to get chef-client to run on it's own without failing due to timeouts. I have 4 CentOS 6.4 nodes in QA, overnight usually 2 will fail and show as 'red' the next day - it runs hourly, so they'll be down for 10+ hours. I login to the node, stop chef-client, run it manually, and it works.

  1. Why is it timing out on NTP, and even failing as it retries 4 times in a row?
  2. Why doesn't it fix itself on the next run? It seems to fail, and then never recover.

Full logs here:chef-client keeps randomly failing. I login to the node, stop chef-client, run it manually, and it works. 1) Why is it timing out on NTP, 4 times in a row? 2) Why doesn't it fix itself on the next run? It seems to fail, and then never recover. · GitHub

Can you check lsof and ps to see if there is a child process of chef that does not get killed and keeps the yum lock?

You may be hitting this issue: https://tickets.opscode.com/browse/MIXLIB-16

That’s been fixed in master and should get released with the next Chef 11.x release. You could try building the gem manually and installing it on one of your systems to see if it fixes the problem which would be greatly appreciated.

--
Daniel DeLeo

Can do Daniel, can you tell me how I build the gem manually? I've checked
out from GitHub - chef-boneyard/chef-client: Development repository for Chef Client cookbook and
GitHub - chef-boneyard/chef: DEPRECATED: Use chef-client or chef-server - have run Bundle install
successfully, but don't see any info about how to build the gem. And, can I
just build the chef-client 11 gem, and put it on an existing node that's
still talking to chef-server 10.x? Thanks

On Tue, Nov 19, 2013 at 12:20 PM, Daniel DeLeo dan@kallistec.com wrote:

On Tuesday, November 19, 2013 at 10:13 AM, Phil Cryer wrote:

I'm still failing to get chef-client to run on it's own without failing
due to timeouts. I have 4 CentOS 6.4 nodes in QA, overnight usually 2 will
fail and show as 'red' the next day - it runs hourly, so they'll be down
for 10+ hours. I login to the node, stop chef-client, run it manually,
and it works.

  1. Why is it timing out on NTP, and even failing as it retries 4 times in
    a row?
  2. Why doesn't it fix itself on the next run? It seems to fail, and then
    never recover.

Full logs here:
chef-client keeps randomly failing. I login to the node, stop chef-client, run it manually, and it works. 1) Why is it timing out on NTP, 4 times in a row? 2) Why doesn't it fix itself on the next run? It seems to fail, and then never recover. · GitHub

Can you check lsof and ps to see if there is a child process of chef that
does not get killed and keeps the yum lock?

You may be hitting this issue:
https://tickets.opscode.com/browse/MIXLIB-16

That’s been fixed in master and should get released with the next Chef
11.x release. You could try building the gem manually and installing it on
one of your systems to see if it fixes the problem which would be greatly
appreciated.

--
Daniel DeLeo

--
http://philcryer.com/

On Tuesday, November 19, 2013 at 11:54 AM, Phil Cryer wrote:

Can do Daniel, can you tell me how I build the gem manually? I've checked out from GitHub - chef-boneyard/chef-client: Development repository for Chef Client cookbook and GitHub - chef-boneyard/chef: DEPRECATED: Use chef-client or chef-server - have run Bundle install successfully, but don't see any info about how to build the gem. And, can I just build the chef-client 11 gem, and put it on an existing node that's still talking to chef-server 10.x? Thanks
You need to build the mixlib-shellout gem from master to get the fix: GitHub - chef/mixlib-shellout: mixin library for subprocess management, output collection

The process should be:

git clone
bundle install
rake gem

and then you have a gem package under pkg.

I may push a prerelease version a little later today (chatting with the team about this now). If that happens you would just do /opt/chef/embedded/bin/gem install mixlib-shellout —pre (assuming omnibus). I’ll let you know what we decide.

--
Daniel DeLeo

Ok, I've got that in place on my qa01 - I did a chef-client stop, then a ps
to make sure all chef processes were gone, so we'll see how this one goes.
If you do a prerelease I can install it on another node, just let me know -
and thanks.

On Tue, Nov 19, 2013 at 2:06 PM, Daniel DeLeo dan@kallistec.com wrote:

On Tuesday, November 19, 2013 at 11:54 AM, Phil Cryer wrote:

Can do Daniel, can you tell me how I build the gem manually? I've checked
out from GitHub - chef-boneyard/chef-client: Development repository for Chef Client cookbook and
GitHub - chef-boneyard/chef: DEPRECATED: Use chef-client or chef-server - have run Bundle install
successfully, but don't see any info about how to build the gem. And, can I
just build the chef-client 11 gem, and put it on an existing node that's
still talking to chef-server 10.x? Thanks

You need to build the mixlib-shellout gem from master to get the fix:
GitHub - chef/mixlib-shellout: mixin library for subprocess management, output collection

The process should be:

git clone
bundle install
rake gem

and then you have a gem package under pkg.

I may push a prerelease version a little later today (chatting with the
team about this now). If that happens you would just do
/opt/chef/embedded/bin/gem install mixlib-shellout —pre (assuming omnibus).
I’ll let you know what we decide.

--
Daniel DeLeo

--
http://philcryer.com/

On Tuesday, November 19, 2013 at 12:30 PM, Phil Cryer wrote:

Ok, I've got that in place on my qa01 - I did a chef-client stop, then a ps to make sure all chef processes were gone, so we'll see how this one goes. If you do a prerelease I can install it on another node, just let me know - and thanks.

We decided to do the prerelease, so you can install that with:

Omnibus: /opt/chef/embedded/bin/gem install mixlib-shellout —pre
Otherwise: gem install mixlib-shellout —pre

Thanks for testing this.

--
Daniel DeLeo

Installed, thank you, will report back tomorrow.

[root@qa02 ~]# /opt/chef/embedded/bin/gem install mixlib-shellout --pre
Fetching: mixlib-shellout-1.3.0.rc.0.gem (100%)
Successfully installed mixlib-shellout-1.3.0.rc.0
1 gem installed
Installing ri documentation for mixlib-shellout-1.3.0.rc.0...
Installing RDoc documentation for mixlib-shellout-1.3.0.rc.0...

On Tue, Nov 19, 2013 at 2:32 PM, Daniel DeLeo dan@kallistec.com wrote:

On Tuesday, November 19, 2013 at 12:30 PM, Phil Cryer wrote:

Ok, I've got that in place on my qa01 - I did a chef-client stop, then a
ps to make sure all chef processes were gone, so we'll see how this one
goes. If you do a prerelease I can install it on another node, just let me
know - and thanks.

We decided to do the prerelease, so you can install that with:

Omnibus: /opt/chef/embedded/bin/gem install mixlib-shellout —pre
Otherwise: gem install mixlib-shellout —pre

Thanks for testing this.

--
Daniel DeLeo

--
http://philcryer.com/

Daniel, sorry to say this still fails with the new mixlib-shellout. It
fails, then fails every other time chef-client runs and never works. I
think it may still be an issue with the yum.pid getting in the way. To fix,
I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice
it continued to try every hour...

=====
[2013-11-21T11:48:37-06:00] WARN: Current
template[/opt/splunkforwarder/etc/system/local/outputs.conf]:
/var/chef/cache/cookbooks/logging/recipes/default.rb:51:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for file[/etc/httpd/conf.d/ssl.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous file[/etc/httpd/conf.d/ssl.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:84:in block in
from_file'
[2013-11-21T11:48:37-06:00] WARN: Current
file[/etc/httpd/conf.d/ssl.conf]:
/var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:28:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/httpd/ports.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/httpd/ports.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:185:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/httpd/ports.conf]:
/var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:34:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/varnish/default.vcl] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/varnish/default.vcl]: /var/chef/cache/cookbooks/chef-varnish/recipes/default.rb:60:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current
template[/etc/varnish/default.vcl]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:133:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for service[moxi-server] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous service[moxi-server]: /var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:42:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current service[moxi-server]:
/var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:56:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[postfix] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[postfix]: /var/chef/cache/cookbooks/base/recipes/default.rb:82:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[postfix]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:186:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[rsyslog] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[rsyslog]: /var/chef/cache/cookbooks/base/recipes/default.rb:90:in block in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[rsyslog]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:189:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/php.ini] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/php.ini]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:96:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/php.ini]:
/var/chef/cache/cookbooks/php/recipes/package.rb:27:in `from_file'
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.profile]
action create (base::default line 14)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.curlrc] action
create (base::default line 20)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.bash_profile]
action create (base::default line 26)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action
disable (base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action stop
(base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing
template[/etc/profile.d/proxy.sh] action create (base::default line 35)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/selinux/config] action create (base::default line 41)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/yum/pluginconf.d/fastestmirror.conf] action create
(base::default line 57)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6] action create
(base::default line 63)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_key[RPM-GPG-KEY-EPEL-6]
action add (yum::epel line 22)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_repository[epel] action
add (yum::epel line 27)
[2013-11-21T11:48:37-06:00] INFO: Processing package[ntp] action install
(base::default line 77)
[2013-11-21T12:03:52-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:03:52-06:00] INFO: Retrying execution of package[ntp], 3
attempt(s) left
[2013-11-21T12:19:21-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:19:21-06:00] INFO: Retrying execution of package[ntp], 2
attempt(s) left
[2013-11-21T12:34:50-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:34:50-06:00] INFO: Retrying execution of package[ntp], 1
attempt(s) left
[2013-11-21T12:50:19-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:50:19-06:00] INFO: Retrying execution of package[ntp], 0
attempt(s) left
[2013-11-21T13:05:48-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T13:05:48-06:00] INFO: Running queued delayed notifications
before re-raising exception
[2013-11-21T13:05:48-06:00] ERROR: Running exception handlers
[2013-11-21T13:05:48-06:00] ERROR: Exception handlers complete
[2013-11-21T13:05:48-06:00] FATAL: Stacktrace dumped to
/var/chef/cache/chef-stacktrace.out
[2013-11-21T13:05:48-06:00] ERROR: Chef::Exceptions::ChildConvergeError:
Chef run process exited unsuccessfully (exit code 1)
[2013-11-21T13:05:48-06:00] ERROR: Sleeping for 1800 seconds before trying
again

Thanks

On Tue, Nov 19, 2013 at 2:57 PM, Phil Cryer phil@philcryer.com wrote:

Installed, thank you, will report back tomorrow.

[root@qa02 ~]# /opt/chef/embedded/bin/gem install mixlib-shellout --pre
Fetching: mixlib-shellout-1.3.0.rc.0.gem (100%)
Successfully installed mixlib-shellout-1.3.0.rc.0
1 gem installed
Installing ri documentation for mixlib-shellout-1.3.0.rc.0...
Installing RDoc documentation for mixlib-shellout-1.3.0.rc.0...

On Tue, Nov 19, 2013 at 2:32 PM, Daniel DeLeo dan@kallistec.com wrote:

On Tuesday, November 19, 2013 at 12:30 PM, Phil Cryer wrote:

Ok, I've got that in place on my qa01 - I did a chef-client stop, then a
ps to make sure all chef processes were gone, so we'll see how this one
goes. If you do a prerelease I can install it on another node, just let me
know - and thanks.

We decided to do the prerelease, so you can install that with:

Omnibus: /opt/chef/embedded/bin/gem install mixlib-shellout —pre
Otherwise: gem install mixlib-shellout —pre

Thanks for testing this.

--
Daniel DeLeo

--
http://philcryer.com/

--
http://philcryer.com/

On Thursday, November 21, 2013 at 11:35 AM, Phil Cryer wrote:

Daniel, sorry to say this still fails with the new mixlib-shellout. It fails, then fails every other time chef-client runs and never works. I think it may still be an issue with the yum.pid getting in the way. To fix, I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice it continued to try every hour...

=====
[2013-11-21T11:48:37-06:00] WARN: Current template[/opt/splunkforwarder/etc/system/local/outputs.conf]: /var/chef/cache/cookbooks/logging/recipes/default.rb:51:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for file[/etc/httpd/conf.d/ssl.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous file[/etc/httpd/conf.d/ssl.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:84:in block in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current file[/etc/httpd/conf.d/ssl.conf]: /var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:28:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/httpd/ports.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/httpd/ports.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:185:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/httpd/ports.conf]: /var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:34:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/varnish/default.vcl] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/varnish/default.vcl]: /var/chef/cache/cookbooks/chef-varnish/recipes/default.rb:60:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/varnish/default.vcl]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:133:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for service[moxi-server] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous service[moxi-server]: /var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:42:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current service[moxi-server]: /var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:56:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[postfix] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[postfix]: /var/chef/cache/cookbooks/base/recipes/default.rb:82:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[postfix]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:186:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[rsyslog] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[rsyslog]: /var/chef/cache/cookbooks/base/recipes/default.rb:90:in block in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[rsyslog]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:189:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/php.ini] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/php.ini]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:96:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/php.ini]: /var/chef/cache/cookbooks/php/recipes/package.rb:27:in `from_file'
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.profile] action create (base::default line 14)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.curlrc] action create (base::default line 20)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.bash_profile] action create (base::default line 26)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action disable (base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action stop (base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/etc/profile.d/proxy.sh] action create (base::default line 35)
[2013-11-21T11:48:37-06:00] INFO: Processing cookbook_file[/etc/selinux/config] action create (base::default line 41)
[2013-11-21T11:48:37-06:00] INFO: Processing cookbook_file[/etc/yum/pluginconf.d/fastestmirror.conf] action create (base::default line 57)
[2013-11-21T11:48:37-06:00] INFO: Processing cookbook_file[/etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6] action create (base::default line 63)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_key[RPM-GPG-KEY-EPEL-6] action add (yum::epel line 22)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_repository[epel] action add (yum::epel line 27)
[2013-11-21T11:48:37-06:00] INFO: Processing package[ntp] action install (base::default line 77)
[2013-11-21T12:03:52-06:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py exceeded timeout 900
[2013-11-21T12:03:52-06:00] INFO: Retrying execution of package[ntp], 3 attempt(s) left
[2013-11-21T12:19:21-06:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py exceeded timeout 900
[2013-11-21T12:19:21-06:00] INFO: Retrying execution of package[ntp], 2 attempt(s) left
[2013-11-21T12:34:50-06:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py exceeded timeout 900
[2013-11-21T12:34:50-06:00] INFO: Retrying execution of package[ntp], 1 attempt(s) left
[2013-11-21T12:50:19-06:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py exceeded timeout 900
[2013-11-21T12:50:19-06:00] INFO: Retrying execution of package[ntp], 0 attempt(s) left
[2013-11-21T13:05:48-06:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py exceeded timeout 900
[2013-11-21T13:05:48-06:00] INFO: Running queued delayed notifications before re-raising exception
[2013-11-21T13:05:48-06:00] ERROR: Running exception handlers
[2013-11-21T13:05:48-06:00] ERROR: Exception handlers complete
[2013-11-21T13:05:48-06:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2013-11-21T13:05:48-06:00] ERROR: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
[2013-11-21T13:05:48-06:00] ERROR: Sleeping for 1800 seconds before trying again

=====

Thanks
I would expect these error messages to look different with the new version of mixlib-shellout. Can you run this recipe:

require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

With chef-apply: chef-apply shellout-version.rb

the output should look like:

Recipe: (chef-apply cookbook)::(chef-apply recipe)

  • log[the shellout version is 1.3.0.rc.0] action write[2013-11-21T11:49:45-08:00] WARN: the shellout version is 1.3.0.rc.0

--
Daniel DeLeo

-> I threw this in the end of one of my recipes:
require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

-> ran it, and the log says:

  • log[the shellout version is 1.3.0.rc.0] action
    write[2013-11-21T19:42:11-06:00] WARN: the shellout version is 1.3.0.rc.0

happy to test it again, or another gem, whatever - just let me know. Thanks

On Thu, Nov 21, 2013 at 1:51 PM, Daniel DeLeo dan@kallistec.com wrote:

On Thursday, November 21, 2013 at 11:35 AM, Phil Cryer wrote:

Daniel, sorry to say this still fails with the new mixlib-shellout. It
fails, then fails every other time chef-client runs and never works. I
think it may still be an issue with the yum.pid getting in the way. To fix,
I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice
it continued to try every hour...

=====
[2013-11-21T11:48:37-06:00] WARN: Current
template[/opt/splunkforwarder/etc/system/local/outputs.conf]:
/var/chef/cache/cookbooks/logging/recipes/default.rb:51:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for file[/etc/httpd/conf.d/ssl.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous file[/etc/httpd/conf.d/ssl.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:84:in block in
from_file'
[2013-11-21T11:48:37-06:00] WARN: Current
file[/etc/httpd/conf.d/ssl.conf]:
/var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:28:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/httpd/ports.conf] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/httpd/ports.conf]: /var/chef/cache/cookbooks/apache2/recipes/default.rb:185:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current
template[/etc/httpd/ports.conf]:
/var/chef/cache/cookbooks/apache2/recipes/mod_ssl.rb:34:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/varnish/default.vcl] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/varnish/default.vcl]: /var/chef/cache/cookbooks/chef-varnish/recipes/default.rb:60:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current
template[/etc/varnish/default.vcl]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:133:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for service[moxi-server] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous service[moxi-server]: /var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:42:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current service[moxi-server]:
/var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:56:in from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[postfix] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[postfix]: /var/chef/cache/cookbooks/base/recipes/default.rb:82:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[postfix]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:186:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for package[rsyslog] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous package[rsyslog]: /var/chef/cache/cookbooks/base/recipes/default.rb:90:in block in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current package[rsyslog]:
/var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:189:in
from_file' [2013-11-21T11:48:37-06:00] WARN: Cloning resource attributes for template[/etc/php.ini] from prior resource (CHEF-3694) [2013-11-21T11:48:37-06:00] WARN: Previous template[/etc/php.ini]: /var/chef/cache/cookbooks/app-savviscom/recipes/default.rb:96:in from_file'
[2013-11-21T11:48:37-06:00] WARN: Current template[/etc/php.ini]:
/var/chef/cache/cookbooks/php/recipes/package.rb:27:in `from_file'
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.profile]
action create (base::default line 14)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.curlrc]
action create (base::default line 20)
[2013-11-21T11:48:37-06:00] INFO: Processing template[/root/.bash_profile]
action create (base::default line 26)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action
disable (base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing service[iptables] action stop
(base::default line 32)
[2013-11-21T11:48:37-06:00] INFO: Processing
template[/etc/profile.d/proxy.sh] action create (base::default line 35)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/selinux/config] action create (base::default line 41)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/yum/pluginconf.d/fastestmirror.conf] action create
(base::default line 57)
[2013-11-21T11:48:37-06:00] INFO: Processing
cookbook_file[/etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6] action create
(base::default line 63)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_key[RPM-GPG-KEY-EPEL-6]
action add (yum::epel line 22)
[2013-11-21T11:48:37-06:00] INFO: Processing yum_repository[epel] action
add (yum::epel line 27)
[2013-11-21T11:48:37-06:00] INFO: Processing package[ntp] action install
(base::default line 77)
[2013-11-21T12:03:52-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:03:52-06:00] INFO: Retrying execution of package[ntp], 3
attempt(s) left
[2013-11-21T12:19:21-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:19:21-06:00] INFO: Retrying execution of package[ntp], 2
attempt(s) left
[2013-11-21T12:34:50-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:34:50-06:00] INFO: Retrying execution of package[ntp], 1
attempt(s) left
[2013-11-21T12:50:19-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T12:50:19-06:00] INFO: Retrying execution of package[ntp], 0
attempt(s) left
[2013-11-21T13:05:48-06:00] ERROR:
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
exceeded timeout 900
[2013-11-21T13:05:48-06:00] INFO: Running queued delayed notifications
before re-raising exception
[2013-11-21T13:05:48-06:00] ERROR: Running exception handlers
[2013-11-21T13:05:48-06:00] ERROR: Exception handlers complete
[2013-11-21T13:05:48-06:00] FATAL: Stacktrace dumped to
/var/chef/cache/chef-stacktrace.out
[2013-11-21T13:05:48-06:00] ERROR: Chef::Exceptions::ChildConvergeError:
Chef run process exited unsuccessfully (exit code 1)
[2013-11-21T13:05:48-06:00] ERROR: Sleeping for 1800 seconds before trying
again

Thanks

I would expect these error messages to look different with the new version
of mixlib-shellout. Can you run this recipe:

require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

With chef-apply: chef-apply shellout-version.rb

the output should look like:

Recipe: (chef-apply cookbook)::(chef-apply recipe)

  • log[the shellout version is 1.3.0.rc.0] action
    write[2013-11-21T11:49:45-08:00] WARN: the shellout version is 1.3.0.rc.0

--
Daniel DeLeo

--
http://philcryer.com/

On Thursday, November 21, 2013 at 5:47 PM, Phil Cryer wrote:

-> I threw this in the end of one of my recipes:
require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

-> ran it, and the log says:

  • log[the shellout version is 1.3.0.rc.0] action write[2013-11-21T19:42:11-06:00] WARN: the shellout version is 1.3.0.rc.0

happy to test it again, or another gem, whatever - just let me know. Thanks

On Thu, Nov 21, 2013 at 1:51 PM, Daniel DeLeo <dan@kallistec.com (mailto:dan@kallistec.com)> wrote:

On Thursday, November 21, 2013 at 11:35 AM, Phil Cryer wrote:

Daniel, sorry to say this still fails with the new mixlib-shellout. It fails, then fails every other time chef-client runs and never works. I think it may still be an issue with the yum.pid getting in the way. To fix, I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice it continued to try every hour...

So is there a live process that has the yum lock? Or it’s truly stale and the pid in the file is a dead process? Also is this machine tight on RAM, and is there anything in the OOM killer log? At first glance I can’t think of what chef could do automatically for that case. Perhaps it could read the pid file and check if the process is alive, but that seems fraught with peril and prone to race conditions.

In any case if there are not any stale yum-dump.py processes hanging around then this behavior isn’t caused by MIXLIB-16, so I might have sent you on a wild goose chase.

so nothing I can see is locking yum, assuming it's something chef is
kicking off. What I do (when things aren't working) is login, stop
chef-client, kill any chef processes hanging around and make sure there's
no /var/run/yum.pid. Then I manually run chef-client, it works fine. I do
it again to be sure it can still work, that's fine, then I loggout. hours
later, it's stuck, in this condition again. This is across 4 VMware VMs,
each are just running apache and php, each has 2Gig RAM. They're CentOS
6.4. I want to get it fixed because I want chef-client to check in
regularly so I have all green when I run knife status. Anything else you
can think of to debug what's happening to make it fail (seemingly)
randomly? Thanks

On Thu, Nov 21, 2013 at 8:02 PM, Daniel DeLeo dan@kallistec.com wrote:

On Thursday, November 21, 2013 at 5:47 PM, Phil Cryer wrote:

-> I threw this in the end of one of my recipes:
require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

-> ran it, and the log says:

  • log[the shellout version is 1.3.0.rc.0] action
    write[2013-11-21T19:42:11-06:00] WARN: the shellout version is 1.3.0.rc.0

happy to test it again, or another gem, whatever - just let me know. Thanks

On Thu, Nov 21, 2013 at 1:51 PM, Daniel DeLeo dan@kallistec.com wrote:

On Thursday, November 21, 2013 at 11:35 AM, Phil Cryer wrote:

Daniel, sorry to say this still fails with the new mixlib-shellout. It
fails, then fails every other time chef-client runs and never works. I
think it may still be an issue with the yum.pid getting in the way. To fix,
I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice
it continued to try every hour...

So is there a live process that has the yum lock? Or it’s truly stale and
the pid in the file is a dead process? Also is this machine tight on RAM,
and is there anything in the OOM killer log? At first glance I can’t think
of what chef could do automatically for that case. Perhaps it could read
the pid file and check if the process is alive, but that seems fraught with
peril and prone to race conditions.

In any case if there are not any stale yum-dump.py processes hanging
around then this behavior isn’t caused by MIXLIB-16, so I might have sent
you on a wild goose chase.

--
http://philcryer.com/

On Friday, November 22, 2013 at 10:09 AM, Phil Cryer wrote:

so nothing I can see is locking yum, assuming it's something chef is kicking off. What I do (when things aren't working) is login, stop chef-client, kill any chef processes hanging around and make sure there's no /var/run/yum.pid. Then I manually run chef-client, it works fine. I do it again to be sure it can still work, that's fine, then I loggout. hours later, it's stuck, in this condition again. This is across 4 VMware VMs, each are just running apache and php, each has 2Gig RAM. They're CentOS 6.4. I want to get it fixed because I want chef-client to check in regularly so I have all green when I run knife status. Anything else you can think of to debug what's happening to make it fail (seemingly) randomly? Thanks

But the constant here is that there’s always a /var/run/yum.pid and there is no process matching that pid (or it’s been recycled)? If that’s the case, the next step is to figure out what process that is, and why it’s dying without cleaning up after itself.

--
Daniel DeLeo

On Thu, Nov 21, 2013 at 8:02 PM, Daniel DeLeo <dan@kallistec.com (mailto:dan@kallistec.com)> wrote:

On Thursday, November 21, 2013 at 5:47 PM, Phil Cryer wrote:

-> I threw this in the end of one of my recipes:
require 'mixlib/shellout/version'
log "the shellout version is #{Mixlib::ShellOut::VERSION}" do
level :warn
end

-> ran it, and the log says:

  • log[the shellout version is 1.3.0.rc.0] action write[2013-11-21T19:42:11-06:00] WARN: the shellout version is 1.3.0.rc.0

happy to test it again, or another gem, whatever - just let me know. Thanks

On Thu, Nov 21, 2013 at 1:51 PM, Daniel DeLeo <dan@kallistec.com (mailto:dan@kallistec.com)> wrote:

On Thursday, November 21, 2013 at 11:35 AM, Phil Cryer wrote:

Daniel, sorry to say this still fails with the new mixlib-shellout. It fails, then fails every other time chef-client runs and never works. I think it may still be an issue with the yum.pid getting in the way. To fix, I ssh in and do:

-> stop chef-client

/etc/init.d/chef-client stop

Stopping chef-client: [ OK ]

-> make sure all chef process are done

ps -fe|grep chef

root 14303 14279 0 13:29 pts/0 00:00:00 grep chef

-> remove the stale yum pid

rm /var/run/yum.pid

-> run chef-client

chef-client

And it just works. So, here are the logs from when it was failing, notice it continued to try every hour...

So is there a live process that has the yum lock? Or it’s truly stale and the pid in the file is a dead process? Also is this machine tight on RAM, and is there anything in the OOM killer log? At first glance I can’t think of what chef could do automatically for that case. Perhaps it could read the pid file and check if the process is alive, but that seems fraught with peril and prone to race conditions.

In any case if there are not any stale yum-dump.py processes hanging around then this behavior isn’t caused by MIXLIB-16, so I might have sent you on a wild goose chase.

--
http://philcryer.com/