Ohai, Master Chefs!
I am trying to understand an aspect of Chef’s behavior that is not fully explained in "Anatomy of a Chef Run" [1]: the precedence order of attribute values in nested roles. I beg your indulgence for an explanation of my use case and a simple example, which I believe will apply to many of those reading this list.
Like many others who have shared their tips here, I use Chef’s roles to model information about my deployments at several different levels. There are “simple” roles, which contain just a few recipes and some default attributes, and more complex roles, which include one or more of these simpler roles in their run_list, perhaps with different values for the same attributes. By necessity, the simple roles are applied more widely than the complex roles (since the latter include the former), so under this use case, the attribute values in the more complex, “compound” role should take precedence over the values in the simpler, more generic role.
For example, a simple role for a worker node might define an attribute to hold the address of its “queue server”, for picking up new jobs:
name "simple_role"
default_attributes "myrecipe" => {
  "queue_server" => 'workqueue.mydomain.com'
}
A more complex role might define a “standalone” server, which runs worker threads and also acts as its own queue server. So it would include all the functionality of the worker role, but specify a different queue_server:
name "compound_role"
run_list "role[simple_role]", "recipe[…]", …
default_attributes "myrecipe" => {
  "queue_server" => 'localhost'
}
For the above use case, the desired value of node[:myrecipe][:queue_server] for all systems running the “compound_role” is “localhost”. Unfortunately, Chef 0.8.16 does not behave this way…
When nested roles are applied during a Chef run, the values in the included role are preferred over those in the including role. So applying the above “compound_role” to a new node results in a value of ‘workqueue.mydomain.com’ for the ‘queue_server’ attribute, which is not the desired behavior for the described use case.
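To make the observed behavior concrete, here is a toy Ruby model of it. This is only a sketch, not Chef's actual implementation: ROLES, deep_merge, and expand_observed are names I made up for illustration, and the merge order (including role first, included roles merged on top) is an assumption chosen because it reproduces the result I see.

```ruby
# Toy model only: this is NOT Chef's implementation, just one way to
# picture the behavior described above.

# Recursively merge two attribute hashes; on conflict, `later` wins.
def deep_merge(earlier, later)
  earlier.merge(later) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

# Hypothetical in-memory stand-ins for the two role files.
ROLES = {
  "simple_role" => {
    run_list: [],
    default_attributes: { "myrecipe" => { "queue_server" => "workqueue.mydomain.com" } }
  },
  "compound_role" => {
    run_list: ["role[simple_role]"],
    default_attributes: { "myrecipe" => { "queue_server" => "localhost" } }
  }
}.freeze

# Assumed order: apply the including role's attributes first, then merge
# each included role on top, so the included role's values win.
def expand_observed(role_name, attrs = {})
  role = ROLES.fetch(role_name)
  attrs = deep_merge(attrs, role[:default_attributes])
  role[:run_list].each do |item|
    nested = item[/\Arole\[(.+)\]\z/, 1] # role name, or nil for recipes
    attrs = expand_observed(nested, attrs) if nested
  end
  attrs
end

puts expand_observed("compound_role")["myrecipe"]["queue_server"]
# prints workqueue.mydomain.com
```

Under this assumed order, whichever role is merged last wins, which is why the generic value from “simple_role” replaces the more specific one from “compound_role”.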
Note that the distinction between default_attributes and override_attributes is not directly relevant here – override_attributes behave the same way with respect to the precedence order of values set from nested roles. I have not yet tested the result of setting the same attribute value from more than one included role, or from a role nested more than “once-deep” – does the behavior depend on the include order?
Now, it is true that for this particular example, one could just use override_attributes in the compound role to get the desired result, but only because the example is oversimplified. Roles can be nested indefinitely (as far as I know), so what if you later want to build an even more complex role, one that includes “compound_role”, while still preserving the ability to run nodes with only “compound_role”? You can’t, because once you set override_attributes from within a role, those values will be preferred over any of the same values you might attempt to set from an including role.
As a separate but related issue, any override_attributes defined in your roles will also be preferred over any attribute values you might try to assign via knife, via the Chef web GUI, or by passing a run-time JSON file to chef-client. (This took me a while to get used to in my first few weeks with Chef – I kept setting attribute values for running nodes, only to see them overridden on the next client run.) Using override_attributes removes your ability to tweak the attribute values of a specific node without disabling chef-client altogether. For this reason, I now only use override_attributes to make a deployment-wide, run-time change (that is, one affecting all servers running a given role) – and afterwards, I remove the override_attributes immediately, making sure the defaults are correct.
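The effect can be pictured with another toy Ruby sketch. It assumes three precedence levels, lowest to highest: default, normal, override, and that values set on an individual node land at the “normal” level. The attribute hash is flattened for brevity, and effective_value and the host name testqueue.mydomain.com are made up for illustration; none of this is Chef's own code.

```ruby
# Toy model only, not Chef's code. Assumes three precedence levels,
# lowest to highest: default < normal < override, and that values set on
# an individual node (knife, web GUI, run-time JSON) are "normal".
PRECEDENCE = [:default, :normal, :override].freeze

# Return the value from the highest-precedence level that defines `key`.
def effective_value(levels, key)
  PRECEDENCE.reverse_each do |level|
    attrs = levels.fetch(level, {})
    return attrs[key] if attrs.key?(key)
  end
  nil
end

# Hypothetical node-specific setting, for illustration only.
node_json = { "queue_server" => "testqueue.mydomain.com" }

with_role_default = {
  default: { "queue_server" => "localhost" }, # set by the role
  normal: node_json                           # set on the node
}
with_role_override = {
  override: { "queue_server" => "localhost" }, # set by the role
  normal: node_json                            # set on the node
}

puts effective_value(with_role_default, "queue_server")
# prints testqueue.mydomain.com (the node-level value wins)
puts effective_value(with_role_override, "queue_server")
# prints localhost (the role override wins, the node-level value is ignored)
```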
With all that said, I have the following questions for any and all readers:
- Is there a workaround I am unaware of? Here is the challenge. Given the above example, can you produce a Chef configuration with the following requirements:
  a) by default, nodes with “simple_role” have queue_server = "workqueue.mydomain.com"
  b) by default, nodes with “compound_role” have queue_server = "localhost"
  c) Individual nodes (of either type) can be created with JSON to override the queue_server
- Does the current inheritance behavior of attribute values work well for you as a user? Do you rely on this behavior, or code around it, as I must?
- What are the objections to reversing this behavior? What if attribute values for the including role were preferred over (is that the same as “interpreted after”?) the values in the included role? Again, this is based on the assumption that the including role is more specific, and the included role applied more generally. If this change were made, the above example configuration would work as expected.
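To make the proposed ordering concrete, here is a toy Ruby sketch of what I am asking for. It is not Chef code, and it does not describe how any Chef release behaves: ROLE_DEFS, deep_merge, expand_proposed, node_attributes, and the host name testqueue.mydomain.com are all made up for illustration. It merges included roles first and the including role on top, then layers node-level JSON over the result.

```ruby
# Toy model of the PROPOSED order only; not Chef's implementation.

# Recursively merge two attribute hashes; on conflict, `later` wins.
def deep_merge(earlier, later)
  earlier.merge(later) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

# Hypothetical in-memory stand-ins for the two role files.
ROLE_DEFS = {
  "simple_role" => {
    run_list: [],
    default_attributes: { "myrecipe" => { "queue_server" => "workqueue.mydomain.com" } }
  },
  "compound_role" => {
    run_list: ["role[simple_role]"],
    default_attributes: { "myrecipe" => { "queue_server" => "localhost" } }
  }
}.freeze

# Proposed order: merge the included roles first, then the including
# role on top, so the more specific role wins.
def expand_proposed(role_name)
  role = ROLE_DEFS.fetch(role_name)
  attrs = {}
  role[:run_list].each do |item|
    nested = item[/\Arole\[(.+)\]\z/, 1] # role name, or nil for recipes
    attrs = deep_merge(attrs, expand_proposed(nested)) if nested
  end
  deep_merge(attrs, role[:default_attributes])
end

# Node-level JSON (requirement c) is layered over the role defaults.
def node_attributes(role_name, node_json = {})
  deep_merge(expand_proposed(role_name), node_json)
end

custom = { "myrecipe" => { "queue_server" => "testqueue.mydomain.com" } }
puts node_attributes("simple_role")["myrecipe"]["queue_server"]
# prints workqueue.mydomain.com (requirement a)
puts node_attributes("compound_role")["myrecipe"]["queue_server"]
# prints localhost (requirement b)
puts node_attributes("compound_role", custom)["myrecipe"]["queue_server"]
# prints testqueue.mydomain.com (requirement c)
```

With this ordering, all three requirements are met using default_attributes alone, and a role that later includes “compound_role” could still supply its own value.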
Thanks for your input,
- Ruby Newbie