Empty run list after failed run (Windows)

I’m using knife-rackspace to create servers. Whenever I experience an error, the next time I run chef, my node’s run list is empty.

  1. Is this desired behavior?
  2. Am I missing something here?

Thanks!

Brian

On Friday, February 20, 2015 at 2:10 PM, Brian Begy wrote:

> I’m using knife-rackspace to create servers. Whenever I experience an error, the next time I run chef, my node’s run list is empty.

Chef avoids saving node data until the end of the chef-client run, because you might rely on attributes set at some point during the run being available for search. Because of the way the bootstrap process works (it puts some files on disk for chef-client to read so it can create the node itself), a run that fails before that final save never gets the run list onto the server.

You can work around this by running chef-client -j /etc/chef/first-boot.json when you run chef-client the next time.
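To make the failure mode and the workaround concrete, here is a toy model in Python. It is not Chef's code: `FakeServer`, `client_run`, and the `recipe[web]` run list item are invented for illustration. The only behavior taken from the explanation above is that the run list arrives through a JSON file and the node is saved only at the end of a successful run.

```python
import json


class FakeServer:
    """Hypothetical stand-in for the Chef server's node storage."""

    def __init__(self):
        self.nodes = {}

    def get_or_create(self, name):
        # A node created by the client starts with an empty run list.
        return dict(self.nodes.setdefault(name, {"run_list": []}))

    def save(self, name, node):
        self.nodes[name] = dict(node)


def client_run(server, name, json_attribs=None, fail=False):
    """Toy chef-client run; returns the run list it used."""
    node = server.get_or_create(name)
    if json_attribs is not None:
        # Models `chef-client -j FILE`: the file supplies the run list.
        node["run_list"] = json.loads(json_attribs)["run_list"]
    if fail:
        raise RuntimeError("converge failed; node was never saved")
    server.save(name, node)  # only reached when the run succeeds
    return node["run_list"]


first_boot = json.dumps({"run_list": ["recipe[web]"]})
server = FakeServer()
try:
    client_run(server, "node1", json_attribs=first_boot, fail=True)
except RuntimeError:
    pass
print(client_run(server, "node1"))                           # []
print(client_run(server, "node1", json_attribs=first_boot))  # ['recipe[web]']
print(client_run(server, "node1"))                           # ['recipe[web]']
```

Once a run with `-j` succeeds, the run list is stored on the server and later runs no longer need the file. On Windows the bootstrap files typically live under C:\chef rather than /etc/chef, so check where first-boot.json actually landed before running the command.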

> Is this desired behavior?

This is pretty annoying, and there are a few ways it can be fixed. On a fundamental level the problem is that the node object mixes desired state (instructions you give chef-client that tell it how to configure a node to your liking) with last observed state (attributes from ohai and cookbooks that are set based on what chef-client actually did the last time it ran). I’m working on a feature called policyfiles that fixes a lot of this by making a separate object on the server own the run list (there’s a lot more to that, see here: https://github.com/chef/chef-dk/blob/master/POLICYFILE_README.md ).

A more targeted fix would be for chef-client to check for /etc/chef/first-boot.json and use it if it’s around, but then move it out of the way on the first successful run. Overall, that’s not too hard to do, but it would have to be implemented in a way that wouldn’t break anyone who had a stale first-boot.json lying around (probably could be a config option to enable this and then the bootstrap process could set this to enabled by default).
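The targeted fix described above could look roughly like the following Python sketch. It illustrates the idea only and is not how chef-client is implemented: the `use_first_boot` option, the function names, and the `.done` suffix are all hypothetical.

```python
import os


def pick_json_attribs(chef_dir, use_first_boot=False):
    """Return the first-boot.json path if the option is on and the file exists."""
    path = os.path.join(chef_dir, "first-boot.json")
    if use_first_boot and os.path.isfile(path):
        return path
    return None


def retire_first_boot(path):
    """Move the file out of the way so later runs do not pick it up again."""
    done = path + ".done"
    os.replace(path, done)
    return done


def run_once(chef_dir, converge, use_first_boot=False):
    """Run `converge(path)`; retire first-boot.json only if it succeeds."""
    path = pick_json_attribs(chef_dir, use_first_boot)
    converge(path)  # raises on failure, leaving first-boot.json in place
    if path is not None:
        retire_first_boot(path)
    return path
```

Keeping the option off by default means a stale first-boot.json is ignored unless someone opts in, which is the compatibility concern raised above. A failed run leaves the file where it is, so the next run picks up the same run list.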

Alternatively, the bootstrap process could create the node object ahead of time with the desired run list already set. This gets a bit tricky with search because if you configure your load balancer by searching for nodes with a particular run list item, your load balancer might get configured to treat an incomplete node as an upstream server. That could be mitigated by using node tags or some other attribute to find upstreams, but it would be a cookbook code change for many people, so that would have to be an opt-in behavior at first.
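The search problem and the tag-based mitigation can be sketched in a few lines of Python. The node dictionaries, the `recipe[app]` run list item, and the `in-service` tag are made up for the example; a real cookbook would use Chef's search rather than a list comprehension.

```python
def upstreams_by_run_list(nodes, item):
    """Select every node whose run list contains `item`."""
    return sorted(n["name"] for n in nodes if item in n["run_list"])


def upstreams_by_tag(nodes, tag):
    """Select only nodes that carry `tag`."""
    return sorted(n["name"] for n in nodes if tag in n.get("tags", []))


nodes = [
    # Converged node: tagged at the end of a successful run.
    {"name": "app1", "run_list": ["recipe[app]"], "tags": ["in-service"]},
    # Node created ahead of time by bootstrap: run list set, nothing installed yet.
    {"name": "app2", "run_list": ["recipe[app]"], "tags": []},
]

print(upstreams_by_run_list(nodes, "recipe[app]"))  # ['app1', 'app2']
print(upstreams_by_tag(nodes, "in-service"))        # ['app1']
```

Searching by run list item picks up the pre-created node before it can serve traffic, while searching by a tag that is only set after a successful run does not.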

> Am I missing something here?
>
> Thanks!
>
> Brian

HTH,

--
Daniel DeLeo

Note that chef-provisioning actually implements Dan's early-node-creation
plan and thus doesn't have this problem.

On Fri, Feb 20, 2015 at 2:42 PM, Daniel DeLeo dan@kallistec.com wrote:

> Alternatively, the bootstrap process could create the node object ahead of
> time with the desired run list already set.

The validatorless bootstraps in Chef 12.1.0 will also do
early-node-creation.

On 2/23/15 2:56 PM, John Keiser wrote:

> Note that chef-provisioning actually implements Dan's
> early-node-creation plan and thus doesn't have this problem.
