Authentication failure due to wrong initial network config



I have some machines with Chef client installed. All of these machines
are configured to receive their IP via DHCP with a fixed MAC -> IP
mapping, and their hostname via reverse DNS lookup (via dhcp-exit-hook).
So even across reboots, my machines keep the same IP/hostname.
However, I have the impression that there are timing issues between
the initialization of the networking layer and Chef client starting.

After a machine reboots, its Chef client continuously reports
authentication failures. My assumption is that Chef client reads the
hostname once at start time, when it is still wrong, and never
refreshes it, even though the IP/DNS information is correctly set a
few seconds later. The result is that authentication fails because of
this wrong hostname. If I restart Chef client after the network
configuration becomes stable, everything works as expected. It is
annoying that I have to manually log in to every system that
accidentally reboots to get the client running again…
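
One way to test the timing hypothesis above is to delay the Chef client start until the system reports the expected hostname. The following is only a minimal sketch under assumptions: `wait_for_hostname` is a hypothetical helper (not part of Chef), `node1.example.com` is a placeholder name, and the script assumes a Ruby version with keyword arguments (2.0 or later), not the Ruby 1.8 shown in the log. It polls `Socket.gethostname` from the Ruby standard library until the name matches or a timeout expires.

```ruby
require 'socket'

# Hypothetical helper, not part of Chef: poll the system hostname until it
# equals `expected`, or give up once `timeout` seconds have been waited.
# `lookup` and `sleeper` are injectable so the loop can be exercised
# without a real network or real sleeping.
def wait_for_hostname(expected, timeout: 60, interval: 2,
                      lookup: lambda { Socket.gethostname },
                      sleeper: lambda { |seconds| sleep(seconds) })
  waited = 0
  loop do
    return true if lookup.call == expected   # hostname has settled
    return false if waited >= timeout        # gave up waiting
    sleeper.call(interval)
    waited += interval
  end
end

# Example use from a wrapper script (the hostname is a placeholder):
#   exit 1 unless wait_for_hostname("node1.example.com", timeout: 120)
#   # ...then start chef-client as usual
```

If the client authenticates correctly when started this way after a reboot, that would support the assumption that the hostname is read too early. It would not by itself prove that Chef never refreshes the value.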

Here is a snippet of the client.log, first showing such an
authentication failure, followed by a chef-client restart and a
successful run.

[Thu, 26 Aug 2010 10:36:53 +0200] FATAL: Sleeping for 1800 seconds before trying again
[Thu, 26 Aug 2010 11:06:57 +0200] WARN: HTTP Request Returned 401 Unauthorized: Failed to authenticate!
[Thu, 26 Aug 2010 11:06:57 +0200] ERROR: Net::HTTPServerException
[Thu, 26 Aug 2010 11:06:57 +0200] FATAL: 401 "Unauthorized"
/usr/lib/ruby/1.8/net/http.rb:2101:in `error!'
/usr/lib/ruby/1.8/chef/rest.rb:229:in `api_request'
/usr/lib/ruby/1.8/chef/rest.rb:280:in `retriable_rest_request'
/usr/lib/ruby/1.8/chef/rest.rb:210:in `api_request'
/usr/lib/ruby/1.8/chef/rest.rb:110:in `get_rest'
/usr/lib/ruby/1.8/chef/node.rb:479:in `load'
/usr/lib/ruby/1.8/chef/node.rb:464:in `find_or_create'
/usr/lib/ruby/1.8/chef/client.rb:171:in `build_node'
/usr/lib/ruby/1.8/chef/client.rb:75:in `run'
/usr/lib/ruby/1.8/chef/application/client.rb:212:in `run_application'
/usr/lib/ruby/1.8/chef/application/client.rb:202:in `loop'
/usr/lib/ruby/1.8/chef/application/client.rb:202:in `run_application'
/usr/lib/ruby/1.8/chef/application.rb:62:in `run'
[Thu, 26 Aug 2010 11:06:57 +0200] FATAL: Sleeping for 1800 seconds before trying again
[Thu, 26 Aug 2010 11:10:21 +0200] FATAL: SIGTERM received, stopping
[Thu, 26 Aug 2010 11:10:33 +0200] INFO: Daemonizing…
[Thu, 26 Aug 2010 11:10:33 +0200] INFO: Forked, in 21114. Priveleges: 0 0
[Thu, 26 Aug 2010 11:10:47 +0200] INFO: Starting Chef Run (Version 0.9.8)
[Thu, 26 Aug 2010 11:10:48 +0200] INFO: Storing updated cookbooks/dss/recipes/pylabs.rb in the cache.
[Thu, 26 Aug 2010 11:10:51 +0200] INFO: Ran execute[apt-get update] successfully
[Thu, 26 Aug 2010 11:11:02 +0200] INFO: Chef Run complete in 14.720712 seconds
[Thu, 26 Aug 2010 11:11:02 +0200] INFO: Running report handlers
[Thu, 26 Aug 2010 11:11:02 +0200] INFO: Report handlers complete

Can a Chef developer confirm my assumption?



Can you try increasing the log level to debug (set 'log_level :debug'
in the client config)? That will give you the full output from ohai and
allow you to verify that it is in fact getting the wrong hostname.
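
For reference, the change suggested above would look roughly like this in the client config. The file path in the comment is an assumption (a common default), and the commented-out `node_name` line uses a placeholder hostname. `node_name` is a client.rb setting, but whether pinning it avoids this particular failure is untested here.

```ruby
# /etc/chef/client.rb  (path is an assumption; edit your actual client config)

# Log at debug level so the ohai output, including the detected hostname,
# shows up in client.log.
log_level :debug

# Optional and untested for this problem: pin the node name so it does not
# depend on the hostname detected at startup. Placeholder value.
# node_name "node1.example.com"
```

After restarting chef-client, the hostname and fqdn values that ohai reports in the debug output can be compared against what the machine should be called.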