Best practices: Search dynamically using templates in attributes

Hi chefs,

For our infrastructure automation, we are currently investigating various
Chef design patterns to properly (and maybe dynamically) manage multiple
"clusters" (like zookeeper, hadoop, rsyslog…). We are wondering how to
define a pattern to search other nodes.

In the cookbooks we have encountered (from Opscode and others, such as
OpenStack), the current state of things is:

  1. Specify a role name to search on and optionally set an attribute to
    search within an environment.
  2. Sometimes (for example rsyslog) you can directly specify a search query
    in the node attributes. This is slightly more flexible but the search is
    still defined by a static attribute.

All this works fine, but sometimes you want to allocate clusters dynamically
using various logic (from ohai data such as the datacenter, from the
environment, or from a combination of attributes). The classic example is
having N datacenters and not wanting to create a role for each one.

One solution would be to create a “site-specific” (or “glue”) cookbook that
dynamically defines the search string at compile time. This is perfectly
OK, but it adds cookbooks.
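For illustration, here is roughly what such a glue cookbook would compute, written as plain Ruby so it runs outside Chef. This is a minimal sketch under assumptions: the node is simulated with a Hash, `build_nodes_search` is a hypothetical helper name, and `location`/`datacenter` stands for a custom ohai attribute. In a real recipe you would read `node.chef_environment` and pass the result to `search(:node, query)`.

```ruby
# Hypothetical glue-cookbook helper: derives the search query for my_app
# from node data at compile time. The node is simulated with a Hash here.
def build_nodes_search(node, role = "my_app")
  clauses = ["role:#{role}", "chef_environment:#{node[:chef_environment]}"]

  # Assumed custom ohai attribute; only scope by datacenter when it is known.
  datacenter = node.dig(:location, :datacenter)
  clauses << "location_datacenter:#{datacenter}" if datacenter

  clauses.join(" AND ")
end

node = { chef_environment: "prod", location: { datacenter: "par1" } }
puts build_nodes_search(node)
# => role:my_app AND chef_environment:prod AND location_datacenter:par1
```

The "dynamic decision" lives in ordinary Ruby, so adding a datacenter needs no new role or environment.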

Another solution would be to use some sort of template for the search
string (Ruby string interpolation, ERB, or whatever). It would work like this:

In attributes/default.rb:

node.default[:my_app][:nodes_search] = 'role:my_app AND ' \
  'location_datacenter:#{node[:location][:datacenter]} AND ' \
  'chef_environment:#{node.chef_environment}'

In recipes/default.rb:

nodes = search(:node, eval("\"#{node[:my_app][:nodes_search]}\""))
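The eval trick works because the attribute is stored with single quotes, so the `#{...}` sequences stay literal; the recipe then wraps the string in double quotes and `eval`s it, which performs the interpolation with `node` in scope. A minimal runnable sketch of just that mechanism, outside Chef; `FakeNode` and its attribute values are hypothetical stand-ins for a real node.

```ruby
# Hypothetical stand-in for a Chef node: attribute lookup via [] plus a
# chef_environment method, which is all the template below needs.
FakeNode = Struct.new(:attrs, :chef_environment) do
  def [](key)
    attrs[key]
  end
end

node = FakeNode.new({ location: { datacenter: "par1" } }, "prod")

# Single quotes: the #{...} sequences are stored literally, not interpolated.
template = 'role:my_app AND ' \
           'location_datacenter:#{node[:location][:datacenter]} AND ' \
           'chef_environment:#{node.chef_environment}'

# Wrapping in double quotes and eval-ing performs the interpolation now.
# Note that eval runs any Ruby found in the template against the current scope.
query = eval("\"#{template}\"")
puts query
# => role:my_app AND location_datacenter:par1 AND chef_environment:prod
```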

Is this pattern reasonable? Or should we stick to the “glue” cookbook way?
Or is there anything else we can use?

Thanks,
Maxime

I prefer convention in this case, i.e. all nodes in an environment, with
environments being datacenter-specific. That way all I have to do is search for
"role:my_app AND chef_environment:#{node[:chef_environment]}". I don't like
the idea of putting search strings in node attributes; it just smells bad to
me. Alternatively, I prefer to write helper libraries that resolve where
things are. In fact, there is an excellent cookbook/library by the
folks @heavywater just for this that I recommend you check out:
GitHub - hw-cookbooks/discovery: Discovery cookbook for search, implements Discovery#search environment and non-environment aware search for roles with a few extra checks

I understand that it looks ugly to put the search string in node
attributes. However, I dislike having a separate environment per DC: we have
many DCs and I don't want to manage them separately. Prod, preprod, test
and dev are logical environments, not related to the physical location
(which is a simple ohai attribute to me). I don't want to add 3 or 4
environments every other month when we open a new DC, or update a dozen
environments just because we change a value in a logical environment.

I'll check out the discovery cookbook; it looks like a great addition, but
it requires every cookbook we use to depend on discovery, and we use a lot
of Opscode and other third-party cookbooks that we don't plan to fork at the
moment. This is why I am trying to settle on best practices that people
would use when they share cookbooks. This would help a lot and maybe, if
widely used and agreed upon, could get integrated into Chef itself.

In addition, the discovery cookbook is just a DSL over the search
functionality and does not permit any "dynamic" decision. Even if every
cookbook I use relied on discovery search, I would have to define a role
per DC or something like that.

Right now the only solutions I have are making a search template or adding
a site-specific cookbook that creates all the search strings.

On Thu, Jan 31, 2013 at 11:10 AM, Jesse Nelson spheromak@gmail.com wrote:

I prefer convention in this case. I.e. all nodes in an environment, and
environments are datacenter specific. So all I have to do is search for
"role:my_app AND chef_environment:#{node[:chef_environment]}" I don't like
the idea of putting search strings in node attributes. Just smells bad to
me. Alternatively I prefer to write helper libraries for the resolution of
where things are. In fact there is an excellent cookbook/library by the
folks @heavywater just for this that I recommend you check out:
GitHub - hw-cookbooks/discovery: Discovery cookbook for search, implements Discovery#search environment and non-environment aware search for roles with a few extra checks

Hi,

I am going to be really annoying and say "it depends". If the cookbook is
only ever likely to be used by your company, then make it as simple and
specific as possible. If, however, you plan on making it usable by others,
or at least by other projects within your company, then make it configurable ...

On Thu, Jan 31, 2013 at 8:26 PM, Maxime Brugidou
<maxime.brugidou@gmail.com> wrote:

The current state of things in the cookbooks we encountered (from Opscode
and others, like OpenStack), is:

  1. Specify a role name to search on and optionally set an attribute to
    search within an environment.

I dislike this. It rarely works for our infrastructure as we tend to use
the "role cookbook" [1] approach and thus rarely have the same types of
roles as the original implementers intended.

  2. Sometimes (for example rsyslog) you can directly specify a search query
    in the node attributes. This is slightly more flexible but the search is
    still defined by a static attribute.

I think that defining a search query as an attribute is the right approach.
Many people want to limit the search to nodes in a particular environment,
in a particular state, in a particular datacenter, etc.
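A sketch of how such a query can be narrowed in a structured way: keep the filters in a Hash and join them into a `field:value AND ...` query string. The `build_query` helper and the `state` field are assumptions for illustration; which fields are searchable depends on what your nodes actually store.

```ruby
# Hypothetical helper: turn a Hash of field => value filters into a
# query string of the form "field:value AND field:value".
# Filters whose value is nil are skipped so callers can disable them.
def build_query(filters)
  filters.reject { |_field, value| value.nil? }
         .map { |field, value| "#{field}:#{value}" }
         .join(" AND ")
end

filters = {
  "role"                => "my_app",
  "chef_environment"    => "prod",
  "state"               => "active", # assumed custom attribute
  "location_datacenter" => nil       # disabled: search every datacenter
}

puts build_query(filters)
# => role:my_app AND chef_environment:prod AND state:active
```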

All this works fine but sometimes you want to dynamically allocate
clusters using various logic (from ohai data like datacenter, or by
environment or a combination of attributes). The classic example is if you
have N datacenters and don't want to create a role for each datacenter.

One solution would be to create a "site-specific" (or "glue") cookbook
that will define dynamically the search string at compile time. This is
perfectly OK but adds cookbooks.

This is exactly what we do: a so-called "wrapper" cookbook [1]. However, our
wrapper cookbook tends to define the search string and then set the node
attribute before calling the main cookbook that uses the search string.
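The ordering described here can be modelled in plain Ruby: the wrapper computes the query and stores it in the node attributes, and only afterwards does the reused cookbook read that attribute, falling back to its own default. This is a simulation under assumptions: the Hash-based node, both recipe methods, and the attribute names are hypothetical; in Chef the wrapper would set the attribute and then call `include_recipe`.

```ruby
# Simulated node attributes shared by both "cookbooks".
node = { "my_app" => {} }

# Reused (library) cookbook: reads the query, falling back to a generic default.
def library_recipe(node)
  query = node["my_app"]["nodes_search"] || "role:my_app"
  # In Chef this is where search(:node, query) would run.
  query
end

# Wrapper cookbook: site-specific logic decides the query first,
# then hands control to the library cookbook (include_recipe in Chef).
def wrapper_recipe(node, datacenter)
  node["my_app"]["nodes_search"] = "role:my_app AND location_datacenter:#{datacenter}"
  library_recipe(node)
end

puts library_recipe({ "my_app" => {} })
# => role:my_app
puts wrapper_recipe(node, "par1")
# => role:my_app AND location_datacenter:par1
```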

nodes = search(:node, eval("\"#{node[:my_app][:nodes_search]}\""))

Is this pattern reasonable? Or should we stick to the "glue" cookbook way?
Or is there anything else we can use?

I think that using eval may lead to some issues and adds a little more
complexity. I would tend to go with the "wrapper" cookbook (a.k.a. "glue"
or sometimes "application" cookbook).
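If a template in an attribute is still wanted, one eval-free alternative (not something proposed in this thread) is Ruby's `format` with named `%{...}` references. The template can then only pull in values that are passed explicitly, rather than running arbitrary Ruby. A minimal sketch; the placeholder names and values are hypothetical.

```ruby
# Template as it might be stored in an attribute: plain text with named
# placeholders. Unlike the eval approach, nothing in here is executed as Ruby.
TEMPLATE = "role:my_app AND location_datacenter:%{datacenter} " \
           "AND chef_environment:%{environment}"

# Only the values listed here can be substituted into the template.
values = { datacenter: "par1", environment: "prod" }

query = format(TEMPLATE, values)
puts query
# => role:my_app AND location_datacenter:par1 AND chef_environment:prod

# A placeholder with no matching value raises KeyError instead of
# silently producing a malformed query.
begin
  format("role:my_app AND rack:%{rack}", values)
rescue KeyError => e
  puts "missing value: #{e.message}"
end
```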

[1] http://realityforge.org/code/2012/11/19/role-cookbooks-and-wrapper-cookbooks.html

--
Cheers,

Peter Donald

Oh - and an example of this is in the "Using a search_driven recipe"
section at [1]

[1] http://realityforge.org/code/2012/11/12/reusable-cookbooks-revisited.html


Cheers,

Peter Donald

Very nice, so I guess we should stick to wrapper cookbooks that define
search queries.
Thanks for the blog links.

On Thu, Jan 31, 2013 at 1:06 PM, Peter Donald <peter@realityforge.org> wrote:

Oh - and an example of this is in the "Using a search_driven recipe"
section at [1]

[1]
Reusable Cookbooks Revisited

--
Cheers,

Peter Donald

On 1/31/13 2:31 AM, Maxime Brugidou wrote:

I understand that it looks ugly to put the search string in node
attributes. However I dislike having a separate environment per DC, we
have many DCs and I don't want to manage them separately. Prod,
preprod, test and dev are logical environments and not related to the
physical location (which is a simple ohai attribute to me), and I
don't want to add 3 or 4 environments every other month when we open a
new DC or update a dozen environments just because we change a value
in a logical environment.

Additionally, if you're searching for something like DNS or NTP servers,
then you really don't want to have those defined per software
environment. If you have complicated software staging in a large
enterprise, you might have something like:

dev -> test -> int -> [ load, partner-int ] -> partner-load -> prod

And you may have all those environments in one datacenter. Operationally,
for systems-level services like LDAP, DNS, NTP, etc., I'm probably going
to make a cut somewhere between internal services and
external-SLA-affected services, and I'll only have two clustered
instances of those services for the datacenters (probably the primary one
for the enterprise) that have all of these environments. I may also
want to put all the stuff that doesn't map onto the software development
life cycle into a completely separate environment, so that when SDEs play
with version pinning in their environments, they never stomp on the DNS
servers.

Having an instance of everything in every software app stage sounds
slick (and would probably work if LDAP, DNS and NTP were the extent of
it all), but as the enterprise scales, that SDE-centric view of
enterprise IT will completely fail. When you're small enough to load
it all onto a single small EC2 instance and the tax is low, it works,
but when you start dealing with 20+ virts and a bunch of repo mirrors
that define the 'base' of a single DC, the tax for doing things
this way will be prohibitive.

I like the idea of pushing search queries into attributes and letting
people override them in their roles or in their role cookbooks.
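Overriding works because of Chef's attribute precedence: the reusable cookbook ships a default query and a role sets a higher-precedence value for the same key. The sketch below only models that layering with a recursive Hash merge; it is a simplification, not Chef's actual merge implementation, and the attribute values are hypothetical.

```ruby
# Recursively merge `higher` over `lower`, as a simplified model of a
# higher-precedence attribute level winning over a lower one.
def deep_merge(lower, higher)
  lower.merge(higher) do |_key, low, high|
    low.is_a?(Hash) && high.is_a?(Hash) ? deep_merge(low, high) : high
  end
end

# Default shipped by the reusable cookbook's attributes file.
cookbook_default = {
  "my_app" => { "nodes_search" => "role:my_app", "port" => 2181 }
}

# Value set in a role (or role cookbook) for one site; hypothetical query.
role_override = {
  "my_app" => { "nodes_search" => "role:my_app AND location_datacenter:par1" }
}

effective = deep_merge(cookbook_default, role_override)
puts effective["my_app"]["nodes_search"]
# => role:my_app AND location_datacenter:par1
puts effective["my_app"]["port"]
# => 2181
```

Only the search key changes per site; every other cookbook default passes through untouched.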