Moved node to new chef server, no longer converges

Mark_Nichols · July 16, 2014, 6:53pm

Ohai Chefs,

We are in the process of moving several nodes from older hosted Chef to Enterprise Chef servers. I’ve moved several alpha (test) nodes but they no longer converge. Here are the steps I followed.

1. Noted the IP address of the node
2. Deleted the node from the old chef server
3. Via ssh, visited the node and cleaned out the /etc/chef directory
4. Ran knife bootstrap
5. Visited the node and confirmed via ps -ef | grep chef-client that chef-client was running with the proper interval and splay.

After waiting the 15 minutes for the interval, the node fails to converge with these messages in /var/log/chef/client.log:

[2014-07-16T13:38:50-05:00] INFO: *** Chef 11.12.2 ***
[2014-07-16T13:38:50-05:00] INFO: Chef-client pid: 7550
[2014-07-16T13:38:50-05:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2014-07-16T13:38:50-05:00] ERROR: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
[2014-07-16T13:38:50-05:00] ERROR: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
[2014-07-16T13:38:50-05:00] ERROR: Sleeping for 900 seconds before trying again

And these messages in /var/chef/cache/chef-stacktrace.out

Generated at 2014-07-16 13:38:50 -0500
Chef::Exceptions::CannotDetermineNodeName: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:299:in `node_name'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:313:in `register'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:416:in `do_run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:213:in `block in run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:207:in `fork'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/client.rb:207:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application.rb:217:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:328:in `block in run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:317:in `loop'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/client.rb:317:in `run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application.rb:67:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/bin/chef-client:26:in `<top (required)>'
/usr/bin/chef-client:23:in `load'
/usr/bin/chef-client:23:in `<main>'
/var/chef/cache/chef-stacktrace.out (END)

I’ve checked hostname -f and it produces the FQDN. Running chef-client manually on the node succeeds; only the background (daemon) process fails. For what it’s worth, here is the output from ps -ef | grep chef-client:

root 2662 1 0 Jul11 ? 00:00:13 /opt/chef/embedded/bin/ruby /usr/bin/chef-client -d -c /etc/chef/client.rb -L /var/log/chef/client.log -P /var/run/chef/client.pid -i 900 -s 180

I’m stumped.

What am I not seeing?

Thanks,
Mark
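
For anyone reading along: the error message in the log names the inputs involved (an explicit node_name, or the system's hostname and fqdn). Here is a minimal Ruby sketch of that kind of fallback. It is a simplified model, not Chef's actual code; the resolve_node_name helper, the exception class defined here, and the hash keys are all illustrative assumptions.

```ruby
# Simplified model of a node-name fallback. NOT Chef's real implementation:
# resolve_node_name and the hash layouts are hypothetical, for illustration.
class CannotDetermineNodeName < StandardError; end

# config: settings such as those read from client.rb, e.g. { node_name: "web01" }
# system: values detected from the host, e.g. { fqdn: "...", hostname: "..." }
def resolve_node_name(config, system)
  candidates = [config[:node_name], system[:fqdn], system[:hostname]]
  # Take the first candidate that is neither nil nor blank.
  name = candidates.find { |c| c && !c.to_s.strip.empty? }
  raise CannotDetermineNodeName, "configure node_name or the system's hostname and fqdn" unless name
  name
end

# An explicit node_name wins even when host detection returns nothing.
explicit = resolve_node_name({ node_name: "alpha01" }, {})

# Without node_name, the detected FQDN is used.
detected = resolve_node_name({}, { fqdn: "alpha01.example.com", hostname: "alpha01" })

# With nothing usable, the lookup fails, as in the log above.
failed = begin
  resolve_node_name({}, { fqdn: nil, hostname: "" })
  false
rescue CannotDetermineNodeName
  true
end
```

If the real lookup behaves anything like this, the daemon failing while a manual run succeeds would point at host detection returning nothing in the daemon's context, which is worth checking before anything else.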

dcondomitti · July 16, 2014, 7:12pm

Can you increase the log level to debug in client.rb? That should show the failing ohai output. You can also try running ohai manually.
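
A rough sketch of what that could look like in /etc/chef/client.rb. The log_level and node_name settings are standard client.rb options; the node name value is a placeholder, and pinning node_name is an assumption about a possible workaround based on the error text, not something confirmed in this thread.

```ruby
# /etc/chef/client.rb (fragment; other settings left as they are)

# Verbose logging, so the daemon's log shows more about what failed.
log_level :debug

# Assumed workaround: set the name explicitly instead of relying on
# hostname/FQDN detection. "alpha01.example.com" is a placeholder.
node_name "alpha01.example.com"
```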

DV1 · July 17, 2014, 12:21am

Very strange that manual chef-client run succeeds but chef-client daemon
does not.

This is silly, but did you kill the old chef-client process before
bootstrapping? Also, have you tried rebooting the host?

--
Best regards, Dmitriy V.
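
One common reason a daemon behaves differently from an interactive run is a different process environment (PATH, for example). Nothing in this thread confirms that is the cause here; the sketch below only shows one way to check, assuming a Linux host where /proc/<pid>/environ is readable. The helper names are illustrative.

```ruby
# Sketch: compare a daemon's environment with the current shell's.
# Assumes Linux (/proc/<pid>/environ); helper names are illustrative.

# /proc/<pid>/environ holds NUL-separated KEY=VALUE entries.
def parse_environ(raw)
  raw.split("\0").each_with_object({}) do |entry, env|
    key, value = entry.split("=", 2)
    env[key] = value.to_s unless key.nil? || key.empty?
  end
end

# Return { key => [daemon_value, shell_value] } for every variable that differs.
def env_diff(daemon_env, shell_env)
  (daemon_env.keys | shell_env.keys).sort.each_with_object({}) do |key, diff|
    a = daemon_env[key]
    b = shell_env[key]
    diff[key] = [a, b] unless a == b
  end
end

# Usage on a real host (as root), with the daemon PID taken from ps:
#   daemon_env = parse_environ(File.binread("/proc/2662/environ"))
#   env_diff(daemon_env, ENV.to_h).each do |k, (d, s)|
#     puts "#{k}: daemon=#{d.inspect} shell=#{s.inspect}"
#   end

# Self-contained demonstration with made-up data:
sample_daemon = parse_environ("PATH=/usr/bin\0LANG=C\0")
sample_shell  = { "PATH" => "/usr/local/bin:/usr/bin", "LANG" => "C", "HOME" => "/root" }
sample_diff   = env_diff(sample_daemon, sample_shell)
```

Variables that appear only on one side show up with nil on the other, which makes a missing PATH entry or HOME easy to spot.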

Mark_Nichols · July 17, 2014, 2:48am

I did kill the old chef-client process prior to the new bootstrap command. I haven’t tried rebooting the host. That would be, ah, complicated. These are legacy (i.e., pre-chef) servers that aren’t easily rebooted.

I’ll explore more with tcpdump and debug turned on for the chef client.

— Mark
