76 running chef-client processes exhausting system memory


#1

ISSUE

76 running chef-client processes are exhausting system memory, causing
other apps to crash (in this case zabbix). chef-client is set to run
once an hour, but ps currently shows 76 running processes.

[root@dc2mgmtsavpd01 ~]# ps -fe | grep chef-client | wc -l
76

Also, whenever chef-client runs fail like this, I always find the same
thing in the ps output: an old (hung?) chef-client worker process and a
yum-dump.py process, both of which have to be killed before a
chef-client run will succeed.

ps -fe|grep chef

root 7082 13613 23 09:12 ? 00:00:00 chef-client worker:
ppid=13613;start=09:12:03;
root 7221 7082 26 09:12 ? 00:00:00 /usr/bin/python
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum-dump.py
--options --installed-provides

After killing the 76 processes, and a kill -9 on the chef-client worker
and yum-dump.py, a chef-client run will be successful, as will others
that are run automatically after that - it’s just at some point down
the line one will get hung up and cause this cascade of processes.
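
The manual ps check above could be scripted. Here is a minimal sketch, assuming `ps -fe` output in its default column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function name `find_chef_pids` and the `SAMPLE` text are illustrative only and are not part of Chef:

```python
def find_chef_pids(ps_output):
    """Return (worker_pids, yum_dump_pids) parsed from `ps -fe` text.

    Assumes the default `ps -fe` columns: UID PID PPID C STIME TTY TIME CMD.
    """
    workers, yum_dumps = [], []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # keep CMD (8th column) in one piece
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, cmd = int(fields[1]), fields[7]
        if "chef-client worker" in cmd:
            workers.append(pid)
        elif "yum-dump.py" in cmd:
            yum_dumps.append(pid)
    return workers, yum_dumps


# Sample lines taken from the ps output above, unwrapped onto single lines.
SAMPLE = (
    "root 7082 13613 23 09:12 ? 00:00:00 chef-client worker: "
    "ppid=13613;start=09:12:03;\n"
    "root 7221 7082 26 09:12 ? 00:00:00 /usr/bin/python "
    "/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/"
    "provider/package/yum-dump.py --options --installed-provides\n"
)

if __name__ == "__main__":
    print(find_chef_pids(SAMPLE))
```

The returned PIDs are only candidates. Whether a given worker is actually hung still has to be judged from its start time before anything is killed.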

SPECS

chef-client -v

Chef: 11.6.0

cat /etc/issue.net

CentOS release 6.4 (Final)
Kernel \r on an \m

VMware instance with one core and 3 GB of RAM

LOGS

[…]
[2013-10-03T09:06:37-05:00] INFO: Forking chef instance to converge…
[2013-10-03T09:06:37-05:00] INFO: *** Chef 11.6.0 ***
[2013-10-03T09:06:37-05:00] WARN: unable to detect ipaddress
[2013-10-03T09:06:37-05:00] WARN: unable to detect macaddress
[2013-10-03T09:06:37-05:00] WARN: unable to detect ip6address
[2013-10-03T09:06:37-05:00] INFO: Run List is [recipe[base],
recipe[pd-web], recipe[diptables]]
[2013-10-03T09:06:37-05:00] INFO: Run List expands to [base, pd-web, diptables]
[2013-10-03T09:06:37-05:00] INFO: Starting Chef Run for
dc2mgmtsavpd01.mgmt.sdirect
[2013-10-03T09:06:37-05:00] INFO: Running start handlers
[2013-10-03T09:06:37-05:00] INFO: Start handlers complete.
[2013-10-03T09:06:38-05:00] INFO: Loading cookbooks [apache2, apt,
aws, base, build-essential, chef-client, chef-varnish, chef_handler,
couchbase, cron, database, diptables, dmg, drush, firewall, git, line,
mysql, nfs, ntp, openssl, php, postfix, postgresql, rsyslog, runit,
savviscom-pd-web, splunk_handler, timezone-ii, ufw, windows, xfs, xml,
yum, zabbix]
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
template[/etc/php.ini] from prior resource (CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous template[/etc/php.ini]:
/var/chef/cache/cookbooks/pd-web/recipes/default.rb:4:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current template[/etc/php.ini]:
/var/chef/cache/cookbooks/php/recipes/package.rb:27:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
cookbook_file[/etc/httpd/conf/httpd.conf] from prior resource
(CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous
cookbook_file[/etc/httpd/conf/httpd.conf]:
/var/chef/cache/cookbooks/savviscom-pd-web/recipes/default.rb:55:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current
cookbook_file[/etc/httpd/conf/httpd.conf]:
/var/chef/cache/cookbooks/apache2/recipes/default.rb:221:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
service[apache2] from prior resource (CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous service[apache2]:
/var/chef/cache/cookbooks/apache2/recipes/default.rb:24:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current service[apache2]:
/var/chef/cache/cookbooks/apache2/recipes/default.rb:228:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
service[moxi-server] from prior resource (CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous service[moxi-server]:
/var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:42:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current service[moxi-server]:
/var/chef/cache/cookbooks/couchbase/recipes/moxi.rb:56:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
template[/etc/varnish/default.vcl] from prior resource (CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous
template[/etc/varnish/default.vcl]:
/var/chef/cache/cookbooks/chef-varnish/recipes/default.rb:56:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current
template[/etc/varnish/default.vcl]:
/var/chef/cache/cookbooks/savviscom-pd-web/recipes/default.rb:113:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Cloning resource attributes for
service[varnishlog] from prior resource (CHEF-3694)
[2013-10-03T09:06:39-05:00] WARN: Previous service[varnishlog]:
/var/chef/cache/cookbooks/chef-varnish/recipes/default.rb:81:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Current service[varnishlog]:
/var/chef/cache/cookbooks/savviscom-pd-web/recipes/default.rb:120:in `from_file'
[2013-10-03T09:06:39-05:00] WARN: Don’t know how to set up automatic
iptables on your distribution, sorry. Please submit a bug ticket at
https://github.com/wk8/cookbook-iptables/issues
[2013-10-03T09:06:39-05:00] INFO: Processing template[/root/.profile]
action create (base::default line 1)
[2013-10-03T09:06:39-05:00] INFO: Processing
template[/root/.bash_profile] action create (base::default line 8)
[2013-10-03T09:06:39-05:00] INFO: Processing template[/root/.curlrc]
action create (base::default line 15)
[2013-10-03T09:06:39-05:00] INFO: Processing
yum_key[RPM-GPG-KEY-EPEL-6] action add (yum::epel line 22)
[2013-10-03T09:06:39-05:00] INFO: Processing yum_repository[epel]
action add (yum::epel line 27)
[2013-10-03T09:06:39-05:00] INFO: Processing package[tzdata] action
install (timezone-ii::default line 16)
[2013-10-03T09:06:39-05:00] INFO: Running queued delayed notifications
before re-raising exception
[2013-10-03T09:06:39-05:00] ERROR: Running exception handlers
[2013-10-03T09:06:39-05:00] ERROR: Exception handlers complete
[2013-10-03T09:06:39-05:00] FATAL: Stacktrace dumped to
/var/chef/cache/chef-stacktrace.out
[2013-10-03T09:06:39-05:00] ERROR:
Chef::Exceptions::ChildConvergeError: Chef run process exited
unsuccessfully (exit code 1)
[2013-10-03T09:06:39-05:00] ERROR: Sleeping for 3600 seconds before trying again

STACKTRACE

cat /var/chef/cache/chef-stacktrace.out

Generated at 2013-10-03 09:06:39 -0500
Errno::ENOMEM: package[tzdata] (timezone-ii::default line 16) had an
error: Errno::ENOMEM: Cannot allocate memory - fork(2)
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.2.0/lib/mixlib/shellout/unix.rb:256:in `fork'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.2.0/lib/mixlib/shellout/unix.rb:256:in `fork_subprocess'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.2.0/lib/mixlib/shellout/unix.rb:40:in `run_command'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.2.0/lib/mixlib/shellout.rb:225:in `run_command'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/mixin/shell_out.rb:30:in `shell_out'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/mixin/shell_out.rb:35:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum.rb:714:in `refresh'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum.rb:806:in `package_available?'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider/package/yum.rb:1055:in `load_current_resource'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/provider.rb:97:in `run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource.rb:625:in `run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/runner.rb:49:in `run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/runner.rb:81:in `block (2 levels) in converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/runner.rb:81:in `each'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/runner.rb:81:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection.rb:98:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/resource_collection.rb:96:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/runner.rb:80:in `converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/client.rb:429:in `converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/client.rb:494:in `do_run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/client.rb:199:in `block in run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/client.rb:193:in `fork'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/client.rb:193:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/application.rb:183:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/application/client.rb:302:in `block in run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/application/client.rb:294:in `loop'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/application/client.rb:294:in `run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.6.0/bin/chef-client:26:in `<top (required)>'
/usr/bin/chef-client:23:in `load'
/usr/bin/chef-client:23:in

https://gist.github.com/philcryer/6810475

http://philcryer.com


#2

Additionally, I’m currently running 6 nodes, and the 4 that this
always happens to are the web servers (which have 15+ recipes/packages
assigned to them). The 2 db servers (which only have 2 recipes
assigned) never have this issue, which makes me think something is
happening with yum to gum things up.


#3

Are you running chef out of cron?


Daniel DeLeo


#4

On Thu, Oct 3, 2013 at 10:41 AM, Daniel DeLeo dan@kallistec.com wrote:

Are you running chef out of cron?

No, it’s installed as a service, so it’s ‘running’ all the time. Here
you can see it’s set to run only once an hour:

root 1663 1 0 00:15 ? 00:00:00
/opt/chef/embedded/bin/ruby /usr/bin/chef-client -d -c
/etc/chef/client.rb -L /var/log/chef/client.log -P
/var/run/chef/client.pid -i 3600 -s 300
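
For reference, `-i` is the interval and `-s` the splay. A rough sketch of how the two combine, assuming the documented behaviour that a random number of seconds between zero and the splay is added to the interval before each run. `parse_interval_splay` and `next_delay` are made-up helper names for illustration, not Chef code:

```python
import random
import shlex


def parse_interval_splay(cmdline):
    """Extract the -i (interval) and -s (splay) values, in seconds."""
    args = shlex.split(cmdline)
    values = {"-i": 0, "-s": 0}
    # Walk adjacent (flag, value) pairs and pick out the two flags we want.
    for flag, value in zip(args, args[1:]):
        if flag in values:
            values[flag] = int(value)
    return values["-i"], values["-s"]


def next_delay(interval, splay, rng=random):
    """Seconds to wait before the next run: interval plus 0..splay."""
    return interval + rng.randint(0, splay)


# Command line copied from the ps output above, joined onto one line.
CMDLINE = (
    "/opt/chef/embedded/bin/ruby /usr/bin/chef-client -d -c "
    "/etc/chef/client.rb -L /var/log/chef/client.log -P "
    "/var/run/chef/client.pid -i 3600 -s 300"
)

if __name__ == "__main__":
    interval, splay = parse_interval_splay(CMDLINE)
    print(interval, splay, next_delay(interval, splay))
```

With these settings a single healthy daemon should start a run roughly every 60 to 65 minutes, so dozens of simultaneous chef-client processes point at something other than the schedule.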

Also, while scrolling through the logs I did see this error, which I
hadn’t caught before. It seems to be related to the yum-dump.py script,
and it looks like I have something wrong with my yum config: “Problem
parsing line ‘Freeing read locks for locker 0xca9:”. I will look for
that error online.

  • package[tzdata] action install[2013-10-03T10:00:39-05:00] WARN:
    Problem parsing line ‘Freeing read locks for locker 0xca7:
    32758/140342062917376’ from yum-dump.py! Please check your yum
    configuration.
    [2013-10-03T10:00:39-05:00] WARN: Problem parsing line ‘Freeing read
    locks for locker 0xca9: 32758/140342062917376’ from yum-dump.py!
    Please check your yum configuration.
    (up to date)
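
To see how often these warnings occur, the log could be scanned for them. A small sketch, assuming the warning text has the shape shown above. The regular expression and the name `yum_dump_warnings` are my own, and the log path is whatever `-L` points at on your system:

```python
import re

# Matches the WARN message shown above. Accepts straight or curly quotes and
# tolerates the message being wrapped across several lines.
PARSE_WARNING = re.compile(
    r"WARN:\s+Problem parsing line\s+['\u2018](?P<line>.*?)['\u2019]\s+"
    r"from yum-dump\.py",
    re.DOTALL,
)


def yum_dump_warnings(log_text):
    """Return the offending yum-dump.py output lines found in log_text."""
    return [
        " ".join(match.group("line").split())  # collapse wrapped whitespace
        for match in PARSE_WARNING.finditer(log_text)
    ]


# Sample built from the excerpt above, with straight quotes.
SAMPLE_LOG = (
    "[2013-10-03T10:00:39-05:00] WARN: Problem parsing line 'Freeing read\n"
    "locks for locker 0xca9: 32758/140342062917376' from yum-dump.py!\n"
    "Please check your yum configuration.\n"
)

if __name__ == "__main__":
    for offending in yum_dump_warnings(SAMPLE_LOG):
        print(offending)
```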

FWIW, I use Debian almost exclusively (sometimes Ubuntu) so this
CentOS stuff is new to me :)


#5

Seems like you have 2 problems here. One looks a lot like https://tickets.opscode.com/browse/CHEF-4556

This issue applies when:

  • You’re running chef daemonized
  • You’re using the chef-client cookbook to manage the daemon
  • You’re using a System V R4 type init system (i.e., plain-ol-init scripts and not daemontools/runit/upstart/systemd/etc.)
  • You get into a state where the pid file used by chef’s daemonization code is corrupt or stale.

You can generally rectify this by killing all chef processes, removing the pid file, and then starting chef via the init script. We narrowed the range of conditions in which this problem can occur in Chef 11.6, and are going to rewrite that portion of the daemonization code to fix it entirely in 11.8
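
Before removing the pid file, it can help to confirm that it really is stale or corrupt. A minimal sketch, assuming a Linux host and the pid file path from the `-P` option shown earlier in the thread. `pid_file_status` is a hypothetical helper, not something Chef provides:

```python
import errno
import os


def pid_file_status(path):
    """Classify a pid file as 'missing', 'corrupt', 'stale' or 'running'."""
    try:
        with open(path) as handle:
            raw = handle.read().strip()
    except FileNotFoundError:
        return "missing"
    if not raw.isdigit() or int(raw) <= 0:
        return "corrupt"  # empty file or anything that is not a positive PID
    try:
        os.kill(int(raw), 0)  # signal 0 only checks existence, sends nothing
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return "stale"  # no process with that PID exists
        # EPERM and similar: a process exists but we may not signal it
    # Note: "running" only means some process has this PID; after PID reuse
    # it is not guaranteed to be chef-client.
    return "running"


if __name__ == "__main__":
    print(pid_file_status("/var/run/chef/client.pid"))
```

A result of "stale" or "corrupt" matches the condition described in the last bullet above. "running" still needs a look at ps to confirm the PID belongs to the chef-client daemon.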

The yum-dump.py bug is over my head, not being too familiar with yum/CentOS myself. Maybe take a look on the bug tracker to see if there’s anything relevant, and if not, create a new issue?


Daniel DeLeo


#6

Phil Cryer wrote:

Also, while scrolling the logs I did see this error, that I hadn’t
caught before - seem to be related to the yum-dump.py script, looks
like I have something wrong with my yum config - “Problem parsing line
’Freeing read locks for locker 0xca9:” will look for that error online

if you haven’t found it already:

https://bugzilla.redhat.com/show_bug.cgi?id=918184

as an ex-redhat/centos sysadmin for probably 15-ish years, i’d like to
welcome you to the wonderful world of rpm database lock issues =)

you’ll probably want to bookmark this page as well:

http://www.oldrpm.org/hintskinks/repairdb/