We are in the process of moving several nodes from older hosted Chef to Enterprise Chef servers. I’ve moved several alpha (test) nodes, but they no longer converge. Here are the steps I followed:
1. Noted the IP address of the node.
2. Deleted the node from the old Chef server.
3. Visited the node via ssh and cleaned out the /etc/chef directory.
4. Ran knife bootstrap.
5. Visited the node and confirmed via ps -ef | grep chef-client that chef-client was running with the proper interval and splay.
After waiting the 15 minutes for the interval, the node fails to converge with these messages in /var/log/chef/client.log:
[2014-07-16T13:38:50-05:00] INFO: *** Chef 11.12.2 ***
[2014-07-16T13:38:50-05:00] INFO: Chef-client pid: 7550
[2014-07-16T13:38:50-05:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2014-07-16T13:38:50-05:00] ERROR: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
[2014-07-16T13:38:50-05:00] ERROR: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
[2014-07-16T13:38:50-05:00] ERROR: Sleeping for 900 seconds before trying again
And these messages in /var/chef/cache/chef-stacktrace.out
Generated at 2014-07-16 13:38:50 -0500
Chef::Exceptions::CannotDetermineNodeName: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:299:in `node_name'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:313:in `register'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:416:in `do_run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:213:in `block in run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:207:in `fork'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:207:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application.rb:217:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:328:in `block in run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:317:in `loop'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:317:in `run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application.rb:67:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/bin/chef-client:26:in `<top (required)>'
/usr/bin/chef-client:23:in `load'
/usr/bin/chef-client:23:in `<main>'
/var/chef/cache/chef-stacktrace.out (END)
I’ve checked hostname -f and it produces the FQDN. Running chef-client manually on the node succeeds; it’s only the background process that’s failing. For what it’s worth, here is the output from ps -ef | grep chef-client
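For context on the error itself: the stack trace points at the node_name method in chef/client.rb. As far as I recall, Chef 11 takes the configured node_name first and otherwise falls back to what ohai reports for fqdn, machinename, and hostname, raising when all of them are missing. The sketch below is only an approximation of that lookup order in plain Ruby, not Chef's actual source; resolve_node_name and the sample hostnames are hypothetical names used for illustration.

```ruby
# Approximate the node name lookup order (assumed, not copied from Chef):
# explicit config first, then ohai's fqdn, machinename, and hostname.
def resolve_node_name(configured_name, ohai)
  name = configured_name || ohai[:fqdn] || ohai[:machinename] || ohai[:hostname]
  if name.nil? || name.to_s.strip.empty?
    raise ArgumentError,
          "Unable to determine node name: configure node_name or " \
          "configure the system's hostname and fqdn"
  end
  name
end

# An interactive shell where ohai can see the FQDN resolves normally:
#   resolve_node_name(nil, { fqdn: "node1.example.com", hostname: "node1" })
# A daemon whose ohai run returns nothing for all three attributes raises:
#   resolve_node_name(nil, {})
```

If this is roughly what happens, the daemon's ohai run is coming back without any of those three attributes, even though hostname -f works from a login shell.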
Can you increase the log level to debug in client.rb? That should show the failing ohai output. You can also try running ohai manually.
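A minimal sketch of what that could look like in /etc/chef/client.rb, assuming the log path already mentioned in this thread. The commented-out node_name line is the workaround the error message itself suggests; the hostname shown is a hypothetical placeholder, not a value from this thread.

```ruby
# /etc/chef/client.rb (fragment, not a complete config)

# Log at debug level so the ohai output from the failing run is visible.
log_level :debug
log_location "/var/log/chef/client.log"

# Optional workaround: set the node name explicitly so chef-client does not
# depend on ohai detecting it. Placeholder value, replace with the real name.
# node_name "alpha-node.example.com"
```

The daemon most likely needs a restart to pick up the change. Running ohai fqdn and ohai hostname by hand on the node shows what ohai detects from an interactive shell, which may differ from what the daemon sees.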
On Wednesday, July 16, 2014 at 11:53 AM, Mark Nichols wrote:
[original message quoted in full; snipped]
Very strange that a manual chef-client run succeeds but the chef-client daemon does not.
This is silly, but did you kill the old chef-client process before bootstrapping? Also, have you tried rebooting the host?
I did kill the old chef-client process prior to the new bootstrap command. I haven’t tried rebooting the host. That would be, ah, complicated. These are legacy (i.e., pre-chef) servers that aren’t easily rebooted.
I’ll explore more with tcpdump and debug turned on for the chef client.
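Since the manual run succeeds and only the daemon fails, one more thing that might be worth comparing is the daemon's environment against the login shell's. This is a rough sketch, assuming a Linux node where /proc/<pid>/environ is readable (usually root or the process owner); parse_environ and env_diff are hypothetical helper names, not part of Chef or ohai.

```ruby
# Parse the NUL-separated KEY=VALUE pairs found in /proc/<pid>/environ.
def parse_environ(raw)
  raw.split("\0").each_with_object({}) do |pair, env|
    key, value = pair.split("=", 2)
    env[key] = value.to_s unless key.nil? || key.empty?
  end
end

# Return the sorted keys whose values differ between the two environments,
# including keys present in only one of them.
def env_diff(daemon_env, shell_env)
  (daemon_env.keys | shell_env.keys).sort.reject do |key|
    daemon_env[key] == shell_env[key]
  end
end

# Usage sketch, run as root from the same shell where the manual
# chef-client run succeeds (7550 is the pid from the log above):
#   daemon_env = parse_environ(File.binread("/proc/7550/environ"))
#   env_diff(daemon_env, ENV.to_h).each do |key|
#     puts "#{key}: daemon=#{daemon_env[key].inspect} shell=#{ENV[key].inspect}"
#   end
```

A missing or shorter PATH in the daemon's environment would be one plausible reason for ohai failing to find the hostname command, though that is a guess rather than a confirmed cause.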