Configuration location philosophies - what's yours?

As I set up an environment from scratch, I’ve had to think about where to put configuration within Chef. Configuration can go in many places: in the recipes themselves (e.g. a recipe with a list of “user” resources that you add to whenever you have new users), in attribute files within cookbooks, in databags, in roles, or on the nodes themselves (either set on the fly or from some JSON blobs you have lying around). (That’s about it, right?)

It feels proper to make a decision as to where you intend to put most of your configuration. I can imagine someone using databags almost exclusively, or relying exclusively on hardcoded data in recipes, etc. What I’m wondering is what the preferences of the people on this list are? What’s ‘normal’ for chef?

Here’s what I’ve been trying (I’d love comments on this). Note that I just started with Chef in 0.9.6, so I don’t know what it’s like to not have databags, roles, attribute hierarchies, etc.
My instinct is to define the behavior (cookbooks) generically and keep the actual configuration separate. To that end, I’ve been writing cookbooks driven entirely by attributes. I define default attributes in the cookbook with the intent of overriding most of them. In my model, all of my configuration goes into roles. For any one-off configuration, I plan to override further at the node level (and expect this to be rare). I have multiple server types and multiple environments, and so have several types of roles, some of which override one another (order matters; this is where it breaks down, since there’s no way to set an explicit role dependency).
So to illustrate, examples of 3 main types or levels of Roles:
ServerRole - base role that every server gets
WebappRole - a specific application category role; shouldn’t conflict with another app role (like WebserverRole)
StagingRole - an environment role; overrides only those attributes within the previous roles that differ between environments
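
To make the layering concrete, here is a minimal sketch in plain Ruby of how the three role levels could stack. It is a simplified model, not Chef's actual merge code: the attribute names and values are made up, and `deep_merge` and `resolve` are hypothetical helpers that assume a later role in the run list wins on conflict.

```ruby
# Hypothetical attribute sets for the three role levels. The keys and values
# are illustrative only and do not come from any real role file.
SERVER_ROLE  = { "syslog" => { "host" => "localhost" } }
WEBAPP_ROLE  = { "webapp" => { "port" => 8080, "workers" => 2 } }
STAGING_ROLE = {
  "syslog" => { "host" => "syslog.staging.example.com" },
  "webapp" => { "workers" => 8 }
}

# Recursively merge overrides into base. On a conflict between two hashes we
# recurse; otherwise the later (overriding) value wins.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Apply role attribute hashes in run-list order, first to last.
def resolve(*layers)
  layers.reduce({}) { |merged, layer| deep_merge(merged, layer) }
end

staging = resolve(SERVER_ROLE, WEBAPP_ROLE, STAGING_ROLE)
dev     = resolve(SERVER_ROLE, WEBAPP_ROLE) # dev has no environment role
```

In this model the staging box keeps the webapp port from WebappRole and only picks up the values StagingRole changes. Putting StagingRole first would let the other roles overwrite it, which is the ordering problem described above.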

Any server in my environment that actually does anything will need one of each of these roles, in this order (except dev, which uses the defaults and doesn’t need an env role).
So far I’ve only used databags for things truly global to the environment, like a list of hostnames and IPs used to generate identical /etc/hosts files on each box.
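
As a rough illustration of the /etc/hosts case, the sketch below renders the file body from a global host list in plain Ruby. The `HOSTS` array stands in for whatever the databag would hold, and both its layout and the `render_hosts` helper are assumptions for the example, not anything Chef provides.

```ruby
# Hypothetical host list, standing in for the contents of a databag.
HOSTS = [
  { "ip" => "10.0.0.10", "hostname" => "web1.example.com", "aliases" => ["web1"] },
  { "ip" => "10.0.0.20", "hostname" => "db1.example.com",  "aliases" => [] }
]

# Build the text of an /etc/hosts file. Entries are sorted by hostname so
# every box renders the same file regardless of the order of the input.
def render_hosts(entries)
  lines = ["127.0.0.1 localhost"]
  entries.sort_by { |entry| entry["hostname"] }.each do |entry|
    fields = [entry["ip"], entry["hostname"]] + entry.fetch("aliases", [])
    lines << fields.join(" ")
  end
  lines.join("\n") + "\n"
end

puts render_hosts(HOSTS)
```

In a real cookbook the same data would more likely be handed to a template resource; the point here is only that one global list drives an identical file everywhere.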

Comments? Is this way off from what other people do, or does this fit with what you’ve seen before?

Your thinking pretty much aligns with where I'm at as well. When I
started with Chef it was before data bags, and I wanted to be able to
back the "state" of the system with git. My decision then was to use
roles to hold static data and use attribute overrides to drive
cookbook behavior. This allowed the "data" to stay out of the cookbook
and the state of Chef to be checked in, with the exception of the node
run-lists. Cookbooks then become heavily data driven. This was the
way I wrapped my head around Chef as a migration from Puppet, where
there was almost exclusively static data and a requirement was to
control the change going into Puppet.

The stumbling block, in my mind, comes with the update process: how are
the workflows with Chef going to play out? There are 2 people in my
organization who are writing cookbooks, and about 5 total who are
modifying attributes. When someone wants to change a single host, they
need to create a new role or add attribute sets or case statements in
the cookbook, and this was not acceptable to me. Ideally, people who
need to be twiddling attributes do not have to dive into cookbooks.
This way the logic in the cookbook can be heavily vetted and is not such
a moving target once deployed to live systems.

Where I am at now is that data bags hold the static data (network,
users), I still use role defaults for sane roles, and I let
attribute data be twiddled via knife/API by the 5 end users. My only
issue with this approach so far is that there is no audit trail on
those updates, and I can't easily store state or state revisions
external to Chef. Having a hook on API calls for attribute updates could
help with this. There are probably other ways I'm ignorant of too, but at
this point, with Couch being distributed across multiple nodes, I am not
stressing the recovery issue, just the audit issue (who changed this
attribute that broke X on host Y, and what was it set to before the
change, so a working state can be restored ASAP).
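
To show the kind of record such a hook might produce, here is a small plain-Ruby sketch. Nothing in it is a Chef API: `attribute_changes` and `audit_record` are hypothetical helpers, and the assumption is that the hook can see a node's attributes before and after an update.

```ruby
require "time"

# Walk two nested attribute hashes and return one entry per changed leaf,
# recording the dotted path, the old value and the new value.
def attribute_changes(before, after, path = [])
  (before.keys | after.keys).flat_map do |key|
    old_value = before[key]
    new_value = after[key]
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      attribute_changes(old_value, new_value, path + [key])
    elsif old_value == new_value
      []
    else
      [{ "path" => (path + [key]).join("."), "from" => old_value, "to" => new_value }]
    end
  end
end

# Bundle the changes with who made them, on which node, and when.
def audit_record(user, node_name, before, after, at: Time.now.utc)
  {
    "user"    => user,
    "node"    => node_name,
    "at"      => at.iso8601,
    "changes" => attribute_changes(before, after)
  }
end
```

Appending one such record per update to a file kept in git would answer both questions: who changed the attribute, and what it was set to before.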

My $0.02; apologies in advance for being long-winded.

On Fri, Jul 30, 2010 at 7:41 PM, Leinartas, Michael
MICHAEL.LEINARTAS@orbitz.com wrote:


On Fri, 2010-07-30 at 21:41 -0500, Leinartas, Michael wrote:

My instinct is to define the behavior (cookbooks) generically and keep
the actual configuration separate. In that, I've been writing
cookbooks driven entirely from attributes. I define default attributes
in the cookbook with the intent of overriding most of them. In my
model, all of my configuration goes into Roles. In the case of any
one-off configuration, I plan to further override at the node level
(and expect this to be rare). I have multiple server types and
multiple environments and so have several types of roles, some of
which override one another (order matters - this is where it breaks
down since there's no way to set explicit role dependency).
So to illustrate, examples of 3 main types or levels of Roles:
ServerRole - base role that every server gets
WebappRole - a specific application category role - shouldn't conflict
with another app role (like WebserverRole)
StagingRole - an environment role. overrides only those attributes
within the previous roles that differ between environments

Sorry for chiming in rather late but your workflow is pretty much what
I've been using here successfully.

Quick rundown:

  • Generic cookbooks filled with attributes that closely match the
    package defaults or are otherwise unset.
  • Templates that closely match the upstream versions.
  • Roles for everything, even single servers, so everything is tracked in
    source control. We use development/staging/production roles for many
    groups of servers. The roles are layered (baseline, roaming-profiles,
    web-production) and we have a top level role for each system that
    includes them all, so the run_list for each system is represented in the
    UI by only one role.
  • Rake helper task to speed up the sync of cookbooks/roles to the server
    after every change (update_cache.rake · GitHub)
  • Development chef server/client pair for each developer for testing.
    Changes are passed to me in branches for review/merge/deploy.
  • Client runs are triggered manually - though this may change shortly if
    I can get the reporting hooks going.

If I can ever find the time I'd love to put together a functioning demo
repo and doc for this workflow.

--
Matthew Kent \ SA \ bravenet.com

Thanks Matthew,

I really like your layering of the roles. I'm wondering if you follow any naming convention for your roles?

Alex

On Aug 4, 2010, at 1:17 PM, Matthew Kent wrote:


On Wed, 2010-08-04 at 14:02 -0700, Alex Soto wrote:

Thanks Matthew,

I really like your layering of the roles. I'm wondering if you follow any naming convention for your roles?

Yes! We do employ a naming convention, usually

--

and I named the final layered roles as "-combined"

eg: thirdparty-web-server-roundcube-staging-combined.rb

name "thirdparty-web-server-roundcube-staging-combined"
description "thirdparty web webmail staging"
run_list "role[baseline]",
         "role[roaming-profiles]",
         "role[webapp]",
         "role[thirdparty-web-server-production]",
         "role[thirdparty-web-server-staging]",
         "role[thirdparty-web-server-roundcube-production]",
         "role[thirdparty-web-server-roundcube-staging]",
         "role[baseline-runlast-cleanup]"

This list could be cut down as well with some roles including each other
but I ran into an issue documented in

http://tickets.opscode.com/browse/CHEF-1508

forcing me to lay them all out in one file.

--
Matthew Kent \ SA \ bravenet.com