Investigating slow chef-solo startup times

Hi –

We’re noticing that chef solo is taking very long (9-11 seconds) to run for
what seem like a small set of recipes that create a few files (we do not do
any network IO as part of that chef run)

Looking at the logs, it seems most of the time is spent during startup. Are
there any tips and tricks to improve startup time?

If it helps, we’re running the latest (v11.12.8) installed via omnibus on a
centos 6.5 OpenVZ guest.

  • Ketan

You can use the even handler system to pin pointvthe exact step thats
consuning time (just inherit the base dispatcher and print current time ).
From there you can use standard rbtrace or other profiler to check gc and
memory/cpu related stats as well.
It may be some resource related ..like file checksum etc or ruby gc related
(unlikely in your case).. but the first logical step will be to figure out
the exact stage. There arr resource and recipe wise time profiler also,
distributed as a chef handler

Also, whats the output of --force-logger. That should indicate the time
consuming stage as well.
On Jun 28, 2014 4:36 AM, "Ketan Padegaonkar" ketanpadegaonkar@gmail.com
wrote:

Hi –

We're noticing that chef solo is taking very long (9-11 seconds) to run
for what seem like a small set of recipes that create a few files (we do
not do any network IO as part of that chef run)

Looking at the logs, it seems most of the time is spent during startup.
Are there any tips and tricks to improve startup time?

If it helps, we're running the latest (v11.12.8) installed via omnibus on
a centos 6.5 OpenVZ guest.

  • Ketan

Are you connected using LDAP? 9/10 times, it's LDAP loading users/groups.

Thanks,
Seth

On Jun 28, 2014, at 7:36 AM, Ketan Padegaonkar ketanpadegaonkar@gmail.com wrote:

Hi –

We're noticing that chef solo is taking very long (9-11 seconds) to run for what seem like a small set of recipes that create a few files (we do not do any network IO as part of that chef run)

Looking at the logs, it seems most of the time is spent during startup. Are there any tips and tricks to improve startup time?

If it helps, we're running the latest (v11.12.8) installed via omnibus on a centos 6.5 OpenVZ guest.

  • Ketan

We are not doing ldap. Further investigation revealed that the ohai cloud
/filesystem/network plugins were taking most of this time. Disabling these
shaved about 4 seconds off the startup time.

I could provide the the timings for these plugins if anyone is interested
about it a bit more.
On Jun 30, 2014 4:23 PM, "Seth Vargo" sethvargo@getchef.com wrote:

Are you connected using LDAP? 9/10 times, it's LDAP loading users/groups.

Thanks,
Seth

On Jun 28, 2014, at 7:36 AM, Ketan Padegaonkar <
ketanpadegaonkar@gmail.com> wrote:

Hi –

We're noticing that chef solo is taking very long (9-11 seconds) to run
for what seem like a small set of recipes that create a few files (we do
not do any network IO as part of that chef run)

Looking at the logs, it seems most of the time is spent during startup.
Are there any tips and tricks to improve startup time?

If it helps, we're running the latest (v11.12.8) installed via omnibus
on a centos 6.5 OpenVZ guest.

  • Ketan

Here's an anecdote: one time we have chef-client being stuck due to
sysadmin taking a filer offline without un-mounting volume from client
hosts first (df command would get stuck as well). So I tried to work around
the issue temporarily by disabling ohai's "linux/filesystem" plugin, but it
didn't have any effect whatsoever.

Ultimately we had to reboot every host with the issue to get it back to
normal.

Here's the output from chef-client upon issuing Ctrl+C from it being stuck
trying to list a hung network mount. We did try ohai 7.0.2 and 6.20.0 with
the same result.

SystemExit: exit
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/mixin/command.rb:282:in
select' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/mixin/command.rb:282:in popen4'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/plugins/linux/
filesystem.rb:24:in from_file' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/mixin/from_file.rb:29:in instance_eval'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/mixin/from_file.rb:29:in
from_file' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:216:in block in require_plugin'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:211:in each' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:211:in require_plugin'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:140:in
block (2 levels) in all_plugins' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:133:in each'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:133:in
block in all_plugins' /usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:131:in each'
/usr/lib64/ruby/gems/1.9.1/gems/ohai-6.20.0/lib/ohai/system.rb:131:in
all_plugins' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/client.rb:267:in run_ohai'
/usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/client.rb:484:in
do_run' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/client.rb:199:in block in run'
/usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/client.rb:193:in fork' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/client.rb:193:in run'
/usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/application.rb:208:in
run_chef_client' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/application/client.rb:312:in block in run_application'
/usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/application/client.rb:304:in
loop' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/application/client.rb:304:in run_application'
/usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/lib/chef/application.rb:66:in
run' /usr/lib64/ruby/gems/1.9.1/gems/chef-11.8.2/bin/chef-client:26:in <top
(required)>'
/usr/bin/chef-client:23:in load' /usr/bin/chef-client:23:in '

On Thu, Jul 10, 2014 at 3:57 AM, Ketan Padegaonkar <
ketanpadegaonkar@gmail.com> wrote:

We are not doing ldap. Further investigation revealed that the ohai cloud
/filesystem/network plugins were taking most of this time. Disabling these
shaved about 4 seconds off the startup time.

I could provide the the timings for these plugins if anyone is interested
about it a bit more.
On Jun 30, 2014 4:23 PM, "Seth Vargo" sethvargo@getchef.com wrote:

Are you connected using LDAP? 9/10 times, it's LDAP loading users/groups.

Thanks,
Seth

On Jun 28, 2014, at 7:36 AM, Ketan Padegaonkar <
ketanpadegaonkar@gmail.com> wrote:

Hi –

We're noticing that chef solo is taking very long (9-11 seconds) to run
for what seem like a small set of recipes that create a few files (we do
not do any network IO as part of that chef run)

Looking at the logs, it seems most of the time is spent during startup.
Are there any tips and tricks to improve startup time?

If it helps, we're running the latest (v11.12.8) installed via omnibus
on a centos 6.5 OpenVZ guest.

  • Ketan

--
Best regards, Dmitriy V.