At work we run Chef as a cron job with something like "chef-client -o
role[default]", where role[default] contains a bunch of baseline stuff, and
the full run list does deployments and other things we want to treat more
Every so often we see nodes get their run_list reset to just
"role[default]", and today it finally dawned on me why:
* Chef run starts.
* If there’s an overridden run_list, we stash the original (
* Recipes can call node.save at any time. This save doesn’t protect the run
  ( https://github.com/opscode/chef/blob/master/lib/chef/node.rb#L482 )
* At the end, on success, the original run list is put back and we save
  ( https://github.com/opscode/chef/blob/master/lib/chef/client.rb#L258 )
** But if we don’t have a successful run, the node.saves that occur
   during the run will clobber the run list.
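
To make the sequence above concrete, here is a toy model of it. This is not
Chef's code: ToyNode, SERVER and toy_run are hypothetical stand-ins I made up
for illustration, and the real client does a lot more than this.

```ruby
# Toy model only: ToyNode, SERVER and toy_run are hypothetical, not Chef classes.
SERVER = {} # pretend Chef server: node name => saved run_list

class ToyNode
  attr_accessor :name, :run_list

  def initialize(name, run_list)
    @name = name
    @run_list = run_list
  end

  # Like a mid-run node.save: writes whatever run_list is currently active.
  def save
    SERVER[name] = run_list.dup
  end
end

# Simulates a run with an override; fail_midway makes a "recipe" raise
# after it has already saved the node.
def toy_run(node, override, fail_midway:)
  original = node.run_list   # stash the original
  node.run_list = override   # run with the override
  node.save                  # a recipe calls node.save mid-run
  raise "recipe failed" if fail_midway
  node.run_list = original   # success path: restore, then save
  node.save
rescue RuntimeError
  # Failed run: nothing puts the original run_list back on the server.
end

node = ToyNode.new("web1", ["role[default]", "recipe[deploy]"])
node.save
toy_run(node, ["role[default]"], fail_midway: true)
p SERVER["web1"] # prints ["role[default]"]
```

With fail_midway: false the restore-and-save at the end runs and the full
run list survives, which matches why we only see this every so often.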
It’s potentially a little awkward to fix this without changing the API too
much. What I’m thinking is to stash the original_run_list on the
node_object, and then rearrange things so that #save itself, rather than
#save_updated_node, puts back the original run_list.
It means you won’t be able to change the run list during a run where you’re
using an overridden run list, but that seems like a Bad Idea anyway.
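
A rough sketch of the idea, again as a toy rather than a patch against Chef:
GuardedNode, STORE and override_run_list are hypothetical names, and I'm
assuming the stashed list can simply live in an attribute on the node.

```ruby
# Hypothetical sketch of the proposal; GuardedNode and STORE are not Chef classes.
STORE = {} # pretend Chef server: node name => saved run_list

class GuardedNode
  attr_accessor :name, :run_list, :original_run_list

  def initialize(name, run_list)
    @name = name
    @run_list = run_list
    @original_run_list = nil
  end

  # Switch to an override for this run, stashing the original on the node.
  def override_run_list(override)
    @original_run_list = run_list
    @run_list = override
  end

  # Every save persists the stashed original while an override is active,
  # so a mid-run save can no longer clobber the server-side run_list.
  def save
    STORE[name] = (original_run_list || run_list).dup
  end
end

node = GuardedNode.new("web1", ["role[default]", "recipe[deploy]"])
node.override_run_list(["role[default]"])
node.save # mid-run save, as a recipe might do
p STORE["web1"] # prints ["role[default]", "recipe[deploy]"]
```

The in-memory run_list stays overridden for the rest of the run; only what
gets written out changes. It also shows the trade-off: any change a recipe
makes to run_list during an overridden run is never saved.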
Does this seem reasonable or have I missed something glaringly obvious?