Initial chef-client run won't exit


#1

hi y’all,

my client is Centos 5.7, chef 0.10.4-1. the way i intialize chef on the
client is by running:

chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

but that first chef-client run never exits, it hangs onto the tty (if
there is one). or when it’s run at first boot via an init script, i see
the same hang.

some of the recipes in the run_list are from the chef-client cookbook.
the upshot being, the chef-client launches from there as a daemon.

what’s going on? how to fix?

lots of details below. thanks!
kallen

[root@sous-chef3 ~]# cat /etc/chef/first-boot.json
{
“run_list”: [ “role[base]”, “role[mail-client]”, “role[syslog-ng-client]”, “recipe[ec2-ephraid]”, “recipe[ec2-ebs]”, “recipe[cleanup]” ]
}

the base role includes the chef-client recipe. so that’s where the first chef-client run fires off the chef-client daemon.

  "recipe[chef-client::delete_validation]",
  "recipe[chef-client::config]",
  "recipe[chef-client::service]",
  "role[dns-client]",
  "recipe[users::ops]",
  "recipe[users::eng]",
  "recipe[snmp]",
  "recipe[toolbin]",
  "recipe[yum]"

[root@sous-chef3 ~]# cat /etc/chef/client.rb
log_level :info
log_location STDOUT
interval 21600
chef_server_url "https://10.xx.yy.hi"
validation_client_name “chef-validator”

Using default node name

file_backup_path "/var/lib/chef"
file_cache_path "/var/cache/chef"
pid_file “/var/run/chef/client.pid”

the moment in processlist where the first chef-client run “spawns” chef-client in daemon mode:

root 1859 6.6 2.5 48724 45040 pts/1 S+ 18:31 0:11 | _ /usr/bin/ruby /usr/bin/chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once
root 4310 0.0 0.0 2516 1148 pts/1 S+ 18:34 0:00 | _ /bin/sh /sbin/service chef-client restart
root 4315 0.0 0.0 5740 1228 pts/1 R+ 18:34 0:00 | _ /bin/bash /etc/init.d/chef-client restart
root 4316 0.0 0.0 0 0 pts/1 Z+ 18:34 0:00 | _ [chef-client]

and after the first chef-client run finishes, it doesn’t exit. it sits around:

root 1859 4.0 2.5 48716 45040 pts/1 S+ 18:31 0:11 | _ /usr/bin/ruby /usr/bin/chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

some debug output from that first chef-client run:

[Fri, 02 Dec 2011 18:34:28 +0000] DEBUG: Sending HTTP Request via PUT to 10.64.94.113:443/nodes/sous-chef3.dev.solsticeiscoming.com
[Fri, 02 Dec 2011 18:34:28 +0000] INFO: Chef Run complete in 172.367392 seconds
[Fri, 02 Dec 2011 18:34:28 +0000] DEBUG: Cleaning the checksum cache
[Fri, 02 Dec 2011 18:34:28 +0000] INFO: Running report handlers
[Fri, 02 Dec 2011 18:34:28 +0000] INFO: Report handlers complete
[Fri, 02 Dec 2011 18:34:28 +0000] DEBUG: Sleeping for 21600 seconds
… and it hangs onto my tty

and yet the chef-client daemon is also running:

root 4326 1.9 2.2 44392 39912 ? S 18:34 0:02 /usr/bin/ruby /usr/bin/chef-client -d -c /etc/chef/client.rb -L /var/log/chef/client.log -P /var/run/chef/client.pid -i 21600 -s 20

[root@sous-chef3 ~]# strace -p 1859
Process 1859 attached - interrupt to quit
select(5, [4], NULL, NULL, {21267, 776000} <unfinished …>
Process 1859 detached
[root@sous-chef3 ~]# lsof -p 1859 | tail
chef-clie 1859 root mem REG 8,1 1081647 /lib/i686/nosegneg/libm-2.5.so (path inode=1081642)
chef-clie 1859 root mem REG 8,1 41984 577025 /usr/lib/ruby/1.8/i386-linux/socket.so
chef-clie 1859 root mem REG 8,1 1081648 /lib/i686/nosegneg/libpthread-2.5.so (path inode=1084237)
chef-clie 1859 root mem REG 8,1 245256 577041 /usr/lib/ruby/1.8/i386-linux/nkf.so
chef-clie 1859 root 0u CHR 136,1 3 /dev/pts/1
chef-clie 1859 root 1u CHR 136,1 3 /dev/pts/1
chef-clie 1859 root 2u CHR 136,1 3 /dev/pts/1
chef-clie 1859 root 3w REG 8,1 815020 886152 /var/log/chef/client.log
chef-clie 1859 root 4r FIFO 0,6 37225 pipe
chef-clie 1859 root 5w FIFO 0,6 37225 pipe

and so i have to ctrl-c that first hanging chef run:
^C
[Fri, 02 Dec 2011 18:46:04 +0000] FATAL: SIGINT received, stopping
[root@sous-chef3 ~]# echo $?
2

and finally, fwiw, the bit in debug log where the first chef-client run runs the chef-client cookbook:

[Fri, 02 Dec 2011 18:34:22 +0000] INFO: template[/etc/chef/client.rb] sending create action to ruby_block[reload_client_config] (delayed)
[Fri, 02 Dec 2011 18:34:22 +0000] INFO: Processing ruby_block[reload_client_config] action create (chef-client::config line 48)
[Fri, 02 Dec 2011 18:34:22 +0000] INFO: ruby_block[reload_client_config] called
[Fri, 02 Dec 2011 18:34:22 +0000] INFO: template[/etc/init.d/chef-client] sending restart action to service[chef-client] (delayed)
[Fri, 02 Dec 2011 18:34:22 +0000] INFO: Processing service[chef-client] action restart (chef-client::service line 73)
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: service[chef-client] supports status, running
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: Executing /sbin/service chef-client status
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: ---- Begin output of /sbin/service chef-client status ----
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: STDOUT: chef-client is stopped
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: STDERR:
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: ---- End output of /sbin/service chef-client status ----
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: Ran /sbin/service chef-client status returned 3
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: sh(/sbin/chkconfig --list chef-client)
[Fri, 02 Dec 2011 18:34:22 +0000] DEBUG: Executing /sbin/service chef-client restart
[Fri, 02 Dec 2011 18:34:26 +0000] INFO: Daemonizing…
[Fri, 02 Dec 2011 18:34:26 +0000] INFO: Forked, in 4326. Priveleges: 0 0
[Fri, 02 Dec 2011 18:34:26 +0000] DEBUG: ---- Begin output of /sbin/service chef-client restart ----
[Fri, 02 Dec 2011 18:34:26 +0000] DEBUG: STDOUT: Stopping chef-client: [FAILED]
Starting chef-client: [ OK ]
[Fri, 02 Dec 2011 18:34:26 +0000] DEBUG: STDERR:
[Fri, 02 Dec 2011 18:34:26 +0000] DEBUG: ---- End output of /sbin/service chef-client restart ----
[Fri, 02 Dec 2011 18:34:26 +0000] DEBUG: Ran /sbin/service chef-client restart returned 0
[Fri, 02 Dec 2011 18:34:26 +0000] INFO: service[chef-client] restarted


#2

On Dec 2, 2011, at 1:08 PM, kallen@groknaut.net wrote:

my client is Centos 5.7, chef 0.10.4-1. the way i intialize chef on the
client is by running:

chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

but that first chef-client run never exits, it hangs onto the tty (if
there is one). or when it’s run at first boot via an init script, i see
the same hang.

The last time I saw anything like that, it was because someone had changed their knife.rb to default chef to staying resident and not exiting, which apparently does not get over-ridden by command-line options like “–once”.

[root@sous-chef3 ~]# cat /etc/chef/client.rb
log_level :info
log_location STDOUT
interval 21600

Oh, sorry – it must have been this option in the client.rb, not in knife.rb. But notice that the chef process below is running with a “-i 21600” argument, which is why all the chef-client runs will sleep at the end of the run instead of terminating.

Try removing this option.


Brad Knowles bknowles@ihiji.com
SAGE Level IV, Chef Level 0.0.1


#3

On Fri, 02 Dec 2011, Brad Knowles wrote:

On Dec 2, 2011, at 1:08 PM, kallen@groknaut.net wrote:

my client is Centos 5.7, chef 0.10.4-1. the way i intialize chef on the
client is by running:

chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

but that first chef-client run never exits, it hangs onto the tty (if
there is one). or when it’s run at first boot via an init script, i see
the same hang.

The last time I saw anything like that, it was because someone had changed their knife.rb to default chef to staying resident and not exiting, which apparently does not get over-ridden by command-line options like “–once”.

(yeah that was me back on Nov 2nd)

[root@sous-chef3 ~]# cat /etc/chef/client.rb
log_level :info
log_location STDOUT
interval 21600

Oh, sorry – it must have been this option in the client.rb, not in knife.rb. But notice that the chef process below is running with a “-i 21600” argument, which is why all the chef-client runs will sleep at the end of the run instead of terminating.

Try removing this option.

but that option is actually not present for the very first chef-client
run. (sorry that wasn’t clear from my initial post). the first run to
bootstrap the system runs with a client.rb in place like this:

[root@sous-chef3 ~]# cat /etc/chef/client.rb-first-boot-here-y0h
log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "https://10.doop.burf.113"
file_cache_path "/var/cache/chef"
file_backup_path "/var/lib/chef/backup"
pid_file "/var/run/chef/client.pid"
cache_options({ :path => “/var/cache/chef/checksums”, :skip_expires => true})
signing_ca_user "chef"
Mixlib::Log::Formatter.show_time = true
environment “dev”

there is no interval stated there. the client.rb that shows the interval
is a result of the chef-client recipe running during that first
chef-client run.

and even if the first chef-client run used a client.rb with an interval set, shouldn’t using --once override the looping? that’s what i understood from Dan DeLeo’s response here: http://lists.opscode.com/sympa/arc/chef/2011-11/msg00025.html

i’m running this again on a fresh system from the top:

[root@sous-chef3 ~]# chef-client -c /etc/chef/client.rb-first-boot-here-y0h -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

and the result is the same behavior as before.

kallen


#4

hi… any response to below? (bottom-reply…)

to summarize

  • the initial run didn’t use a client config that set the interval
  • doesn’t --once override any interval stated in the client config?
    (my examples shows it didn’t)

kallen

On Fri, 02 Dec 2011, kallen@groknaut.net wrote:

On Fri, 02 Dec 2011, Brad Knowles wrote:

On Dec 2, 2011, at 1:08 PM, kallen@groknaut.net wrote:

my client is Centos 5.7, chef 0.10.4-1. the way i intialize chef on the
client is by running:

chef-client -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

but that first chef-client run never exits, it hangs onto the tty (if
there is one). or when it’s run at first boot via an init script, i see
the same hang.

The last time I saw anything like that, it was because someone had changed their knife.rb to default chef to staying resident and not exiting, which apparently does not get over-ridden by command-line options like “–once”.

(yeah that was me back on Nov 2nd)

[root@sous-chef3 ~]# cat /etc/chef/client.rb
log_level :info
log_location STDOUT
interval 21600

Oh, sorry – it must have been this option in the client.rb, not in knife.rb. But notice that the chef process below is running with a “-i 21600” argument, which is why all the chef-client runs will sleep at the end of the run instead of terminating.

Try removing this option.

but that option is actually not present for the very first chef-client
run. (sorry that wasn’t clear from my initial post). the first run to
bootstrap the system runs with a client.rb in place like this:

[root@sous-chef3 ~]# cat /etc/chef/client.rb-first-boot-here-y0h
log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "https://10.doop.burf.113"
file_cache_path "/var/cache/chef"
file_backup_path "/var/lib/chef/backup"
pid_file "/var/run/chef/client.pid"
cache_options({ :path => “/var/cache/chef/checksums”, :skip_expires => true})
signing_ca_user "chef"
Mixlib::Log::Formatter.show_time = true
environment “dev”

there is no interval stated there. the client.rb that shows the interval
is a result of the chef-client recipe running during that first
chef-client run.

and even if the first chef-client run used a client.rb with an interval set, shouldn’t using --once override the looping? that’s what i understood from Dan DeLeo’s response here: http://lists.opscode.com/sympa/arc/chef/2011-11/msg00025.html

i’m running this again on a fresh system from the top:

[root@sous-chef3 ~]# chef-client -c /etc/chef/client.rb-first-boot-here-y0h -j /etc/chef/first-boot.json -L /var/log/chef/client.log -l debug --once

and the result is the same behavior as before.

kallen