Chef-2224


#1

I just filed a blocker against 0.10 - the issue is that we no longer
save the node after the node object is created and the attribute files
are applied, but before the application of the resource collection.
This causes a pretty major behavior change - namely that you can no
longer easily inspect what attributes were applied to a node if the
first run fails, and if it fails during a bootstrap, you’ll need to
pass -j /etc/chef/first-boot.json when you re-try (neither of which
you had to do with the 0.9 behavior).
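
A minimal sketch of the ordering described above, as a toy model rather than Chef's actual client code. FakeServer, run_client, and the save_before_converge flag are hypothetical names made up for illustration; the only assumption is that a failed converge prevents the end-of-run save.

```ruby
# Toy model of a first chef-client run. None of these names come from Chef itself.
class FakeServer
  attr_reader :nodes

  def initialize
    @nodes = {}
  end

  # Store a copy of the node's attributes under its name.
  def save(name, attributes)
    @nodes[name] = attributes.dup
  end
end

# Simulates one run: build attributes, optionally save early, then converge.
# The block stands in for applying the resource collection and may raise.
def run_client(server, name, json_attribs, save_before_converge:)
  attributes = { "run_list" => [] }.merge(json_attribs) # attributes from -j applied
  server.save(name, attributes) if save_before_converge # the 0.9-style early save
  yield attributes                                      # converge; may fail
  server.save(name, attributes)                         # end-of-run save
  :ok
rescue RuntimeError
  :failed
end

first_boot = { "run_list" => ["role[web]"] }

early = FakeServer.new
run_client(early, "web1", first_boot, save_before_converge: true) { raise "converge failed" }

late = FakeServer.new
run_client(late, "web1", first_boot, save_before_converge: false) { raise "converge failed" }

puts early.nodes.key?("web1") # the node and its run list are on the server
puts late.nodes.key?("web1")  # nothing was saved, so the retry needs -j again
```

In the early-save case a retry can pick the run list up from the server; in the other case the only copy is still the JSON file on disk.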

Adam


Opscode, Inc.
Adam Jacob, Chief Product Officer
T: (206) 619-7151 E: adam@opscode.com


#2

On Thursday, April 14, 2011 at 10:16 AM, Adam Jacob wrote:
> I just filed a blocker against 0.10 - the issue is that we no longer
> save the node after the node object is created and the attribute files
> are applied, but before the application of the resource collection.
> This causes a pretty major behavior change - namely that you can no
> longer easily inspect what attributes were applied to a node if the
> first run fails, and if it fails during a bootstrap, you’ll need to
> pass -j /etc/chef/first-boot.json when you re-try (neither of which
> you had to do with the 0.9 behavior).

But is it a bad behavior change?

For example, if you got the server URL wrong or had a random error while registering in 0.9 and lower, you never would have created the client or node, and therefore you’d have to run with -j again. The new behavior has fewer decision points–if the first run fails, you re-run with -j for all cases.

As for inspecting the attributes on a node after a failed run, I’m not sure what the value is here, where by “I’m not sure,” I mean, I actually don’t know. I’ve certainly never debugged a failed chef run this way. My gut feeling is that the state during the Chef run could be quite different from the initial save state, so you’re better off checking the logs.


Dan DeLeo


#3

Yo,

On 15 April 2011 09:55, Daniel DeLeo dan@kallistec.com wrote:

> On Thursday, April 14, 2011 at 10:16 AM, Adam Jacob wrote:
>
>> I just filed a blocker against 0.10 - the issue is that we no longer
>> save the node after the node object is created and the attribute files
>> are applied, but before the application of the resource collection.
>> This causes a pretty major behavior change - namely that you can no
>> longer easily inspect what attributes were applied to a node if the
>> first run fails, and if it fails during a bootstrap, you’ll need to
>> pass -j /etc/chef/first-boot.json when you re-try (neither of which
>> you had to do with the 0.9 behavior).
>
> But is it a bad behavior change?
> For example, if you got the server URL wrong or had a random error while
> registering in 0.9 and lower, you never would have created the client or
> node, and therefore you’d have to run with -j again. The new behavior has
> fewer decision points–if the first run fails, you re-run with -j for all
> cases.

We’ve had failed first-run (but attributes-saved) nodes show up in
monitoring (via search), so I’m -1 on this being a bad behavior
change; it certainly is a change of behavior, but with some testing,
I’d totally support it.

I think this could be a regression too, cause I seem to recall a time
where the attributes weren’t saved to the node prior to the
application of the resource collection.

The fewer decision points diagnosing node-bootstrap-failure rings true.

Regards,

AJ



#4

On Thu, Apr 14, 2011 at 5:09 PM, AJ Christensen aj@junglist.gen.nz wrote:

> We’ve had failed first-run (but attributes-saved) nodes show up in
> monitoring (via search), so I’m -1 on this being a bad behavior
> change; it certainly is a change of behavior, but with some testing,
> I’d totally support it.

I would argue this is exactly what you want - those nodes in fact
should be triggering alarms - the services they were supposed to be
running (given that your intent at bootstrap time was to have a
working system with that run list, not a non-existent one) - and they
now fail. You don’t want the situation where the now-stranded systems
are not included in your monitoring because of a failed bootstrap, do
you?
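
A rough illustration of that point, assuming monitoring builds its host list from a search over saved node records. SAVED_NODES, the "converged" field, nodes_with_role, and alarms are all hypothetical stand-ins, not Chef's search API.

```ruby
# Hypothetical stand-in for the node records saved on the server.
SAVED_NODES = [
  { "name" => "web1", "run_list" => ["role[web]"], "converged" => true },
  { "name" => "web2", "run_list" => ["role[web]"], "converged" => false }, # bootstrap failed after the early save
]

# Stand-in for a search by role: pick nodes whose run list includes the role.
def nodes_with_role(nodes, role)
  nodes.select { |n| n["run_list"].include?("role[#{role}]") }
end

# Monitoring checks every node the search returns; unconverged ones raise an alarm.
def alarms(nodes, role)
  nodes_with_role(nodes, role).reject { |n| n["converged"] }.map { |n| n["name"] }
end

puts alarms(SAVED_NODES, "web").inspect
```

If web2 had never been saved, the search would not return it and no alarm could fire for it.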

> I think this could be a regression too, cause I seem to recall a time
> where the attributes weren’t saved to the node prior to the
> application of the resource collection.

It is a regression - there was a time when we didn’t do this, and we
put this behavior in specifically for cases like the above.

In addition, this is a common early work pattern - you’re tweaking
recipes, you’re testing, and then you’re building new systems from
scratch. The change away from storing the data early makes that loop
less intuitive (I’ve had 3 different people today comment on it.)

> The fewer decision points diagnosing node-bootstrap-failure rings true.

I feel like this is a red herring - if it brings you joy to include -j
/etc/chef/first-boot.json every time, go for it. :)

Adam

