How do you manage multiple data centers

Sascha_Bates · May 7, 2013, 9:15pm

I’m about to embark on the multiple part of my multi-datacenter
automation project and am wondering how people are accomplishing this now?

I currently have environments that mimic code promotion: dev/test/prod
whatever.

My production environment is only used to pin cookbook versions and
that’s how I ensure promotion is controlled, by bumping the version in
my environment.

Now that I’m approaching a situation where I need to make some design
decisions, I’m wondering how people are managing data that vary by
location and how they feel about what they’re doing. I’m aware of most
of the ways to do something like this: environments, roles, data bags,
whatevs, so I’m not in need of suggestions on what I should do, but am
looking for how you like what YOU’RE doing.

Thanks,
Sascha

geoffrey_papilion · May 7, 2013, 9:31pm

We use a role per datacenter with attributes specific to the
datacenter (proxy servers, NFS hosts, etc.). It works decently
for us, and we still have a single prod environment.

//geoff

Jay_Pipes · May 7, 2013, 9:40pm

This is exactly what we do as well.

-jay

Elvin_Abordo · May 7, 2013, 9:46pm

I have multiple Chef servers for different datacenters; at least, that's the
plan. I stopped at 2 because I haven't found an elegant solution to keep
them in sync yet. I have 2 more datacenters.

The data that differs between my datacenters is the infrastructure
stuff: DNS, NTP, SMTP relays, etc. Those get set in a datacenter role
(zips up flame suit). Essentially, these are items that should never change
without a good reason. It's a repeatable pattern across each individual
datacenter. The stuff that needs to be accessed by ALL datacenters is in
a data bag, which is kept in sync with some janky shell scripts. I need to
find a more elegant solution to handle this.

One OSC (open source Chef) server is handling the production datacenters and
another is handling DR/Development.

It helps limit the blast radius, and I can sleep at night knowing that if
something gets pushed to DR/Development it's not going to impact revenue in
production. Although "dev is production", dev doesn't bring home the
bacon. I'll just have angry people at me.

The syncing part is tough, but overall this setup has put nervous people
a bit more at ease.

--
Elvin Abordo
Mobile: (845) 475-8744

Elvin_Abordo · May 7, 2013, 10:02pm

Oh, and each "datacenter role" includes another "default" role that has a
master run list containing organization-wide stuff like the apt,
resolver, ntp, postfix, etc. cookbooks: the cookbooks that generally
shouldn't be modified.

We follow the attribute precedence rules pretty heavily. This allows me to
control everything in one shot, like, say, when I want to remove a user
everywhere. But if there are pretty little snowflakes, it still gives me
the final say in the chef_environment.

Tim_Smith1 · May 7, 2013, 10:26pm

The simplest thing to do is to create a role per datacenter. You can have a generic "base" role that contains your various base recipes and the attributes that are not specific to any one datacenter. Each datacenter role then contains just the attributes that are specific to that datacenter, such as DNS, SMTP, etc. You can still have a single Chef server or Hosted Chef to wrap it all up in a single easy-to-manage interface.
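
As a rough illustration of the base-role-plus-datacenter-role layout described above, the sketch below builds both role documents as plain Ruby hashes and prints the datacenter one in the JSON shape that role files use. The role names (`base`, `dc_sfo`), recipe names, hostnames, and attribute keys are all made up for illustration; nobody in this thread posted their actual roles.

```ruby
require 'json'

# Build a Chef role document as a plain hash (the JSON form of a role).
def role(name, description, run_list, default_attributes = {})
  {
    'name' => name,
    'description' => description,
    'json_class' => 'Chef::Role',
    'chef_type' => 'role',
    'default_attributes' => default_attributes,
    'override_attributes' => {},
    'run_list' => run_list
  }
end

# Generic role: recipes shared by every datacenter (recipe names are hypothetical).
base = role('base', 'Organization-wide baseline',
            ['recipe[ntp]', 'recipe[resolver]', 'recipe[postfix]'])

# Datacenter role: includes the base role and carries only the values
# that differ per site (all values here are hypothetical).
dc_sfo = role('dc_sfo', 'Settings specific to the SFO datacenter',
              ['role[base]'],
              { 'ntp' => { 'servers' => ['ntp1.sfo.example.com'] },
                'postfix' => { 'relayhost' => 'smtp.sfo.example.com' } })

puts JSON.pretty_generate(dc_sfo)
```

A node in that site would then carry `role[dc_sfo]` in its run list and pick up `base` through it, which matches the "datacenter role includes a default role" arrangement mentioned earlier in the thread.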

Tim Smith - Systems Engineer
m: +1 707.738.8132

Brian_Akins · May 7, 2013, 10:53pm

We use a chef-server per datacenter (well, 2 chef servers:
active/passive). Nodes figure out which datacenter they are in inside our
base recipe. We use the same cookbooks across all datacenters and use some
scripts and Berkshelf to keep all the orgs up to date. No roles or
environments, but if we did use them, we’d do something similar. We have
some search wrapper magic that can ping search on other chef servers; this
is still a work in progress and is used sparingly (database replication,
etc.).

Brian_Akins · May 7, 2013, 10:54pm

FWIW, if I didn’t stumble too much, I discussed this in my ChefConf talk.

realityforge · May 7, 2013, 10:59pm

Hi,

We currently only have 3 data centers so what we are doing may not
scale but ... what we essentially do is mark each node with a
'datacenter' attribute.

In some cases we explicitly set the datacenter attribute as part of
the environment, sometimes we discover it via ohai, and sometimes we
derive it from the FQDN of the node. In all cases we have a recipe in
our base cookbook that verifies that the derived datacenter attribute
matches the explicitly set datacenter attribute. (This allows us to do
things like ensure that only nodes with a chef_environment of
'production' appear in the 'BWD' datacenter.)
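
One way to picture that consistency check is a small helper a base recipe could call. This is only a sketch under assumptions: the function name, the `ALLOWED_ENVIRONMENTS` table, and the environments listed for '8NS' are invented; only the "BWD hosts production nodes only" rule comes from the example above.

```ruby
# Which chef_environments may appear in which datacenter.
# 'BWD' => production follows the example in the post; the '8NS' entry is made up.
ALLOWED_ENVIRONMENTS = {
  'BWD' => ['production'],
  '8NS' => %w[production staging development]
}.freeze

# Returns the datacenter when everything agrees; raises otherwise so that
# a chef run calling this would fail early.
def verify_datacenter!(explicit_dc, derived_dc, chef_environment)
  if explicit_dc && derived_dc && explicit_dc != derived_dc
    raise ArgumentError,
          "datacenter mismatch: set to #{explicit_dc}, derived #{derived_dc}"
  end

  dc = explicit_dc || derived_dc
  raise ArgumentError, 'datacenter could not be determined' if dc.nil?

  allowed = ALLOWED_ENVIRONMENTS.fetch(dc) do
    raise ArgumentError, "unknown datacenter #{dc}"
  end
  unless allowed.include?(chef_environment)
    raise ArgumentError, "#{chef_environment} nodes may not live in #{dc}"
  end

  dc
end
```

In a recipe the three arguments would presumably come from the explicitly set attribute, the discovered or FQDN-derived value, and `node.chef_environment`.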

Each datacenter has chef-managed local services and some have
externally managed services (sometimes coloured by environment as
well). For the first set of services we use search against nodes to
discover them and for the second set of services we either use search
against a data bag or we explicitly hardcode them in a recipe. In some
cases it is as simple as:

    if 'BWD' == node['datacenter']
      node.override['nameservers'] = ['1.1.1.1', '2.2.2.2']
    elsif '8NS' == node['datacenter']
      node.override['nameservers'] = ['3.3.3.3']
    elsif ...

Deciding on databag vs code - It mostly comes down to a pragmatic
decision on how often the data changes, how complex the derivation
rules are and who manages the change.

When we want pairs across datacenters then we typically also use
search with a "NOT datacenter:#{node['datacenter']}" clause.
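
For illustration, the cross-datacenter query could be assembled like this. It is a minimal sketch assuming every node carries a top-level `datacenter` attribute as described above; the helper name and the `database` role are hypothetical.

```ruby
# Build a Chef search query matching nodes with the given role in every
# datacenter except the local one.
def peers_in_other_datacenters_query(role, local_datacenter)
  "role:#{role} AND NOT datacenter:#{local_datacenter}"
end

puts peers_in_other_datacenters_query('database', 'BWD')
# prints: role:database AND NOT datacenter:BWD
```

Inside a recipe the resulting string would be passed to `search(:node, ...)`, with `node['datacenter']` supplying the local value.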

So far, so good. About the only negative is that differences
across datacenters sometimes get scattered through multiple cookbooks. I have
been thinking of adding a foodcritic rule that flags these so they
can be centralized in one place, but have yet to do so.

--
Cheers,

Peter Donald

Jesse_Nelson · May 7, 2013, 11:40pm

I’ve used internal DNS to denote locality. We use a contrived 4-letter TLD
and then use the datacenter as the first dot, e.g. 356sf.myorg,
365ny.myorg. I know it violates the cardinal "no metadata in names" rule that
I try to abide by, but this makes it pretty easy to handle datacenter-specific
needs via node.domain, without the need for locality-specific roles.
A single cook/role (or role cook) denotes all the specifics for
each datacenter, and individual cooks can easily override if needed.
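
A sketch of how a domain-keyed lookup like this might look, assuming the datacenter is the first label of the node's domain as in the examples above. The settings table, attribute names, and hostnames are invented for illustration.

```ruby
# Organization-wide defaults and per-datacenter overrides, keyed by the
# first label of the node's DNS domain. All values are hypothetical.
DEFAULTS = {
  'ntp_servers' => ['ntp.myorg'],
  'package_mirror' => 'http://mirror.myorg'
}.freeze

BY_DATACENTER = {
  '356sf' => { 'ntp_servers' => ['ntp.356sf.myorg'] },
  '365ny' => { 'package_mirror' => 'http://mirror.365ny.myorg' }
}.freeze

# '356sf.myorg' -> '356sf'
def datacenter_from_domain(domain)
  domain.to_s.split('.').first
end

# Defaults, overlaid with whatever the node's datacenter overrides.
# Unknown datacenters simply get the defaults.
def settings_for(domain)
  DEFAULTS.merge(BY_DATACENTER.fetch(datacenter_from_domain(domain), {}))
end
```

In a recipe this would presumably be called as `settings_for(node['domain'])`, so no role or environment has to say where the node lives.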

Torben_Knerr · May 8, 2013, 5:19am

Hey guys,

this is theoretical, I'm not through this in practice yet:

From a conceptual point of view, I'd argue to definitely use environments
rather than roles for keeping the datacenter (=a different environment)
specific attributes.

My gut feeling would have been to start with something like 'prod_dc1' and
'prod_dc2' environments, etc.

Did I get it conceptually wrong or are there other practical reasons why
you are all using a single 'prod' env and managing the dc specific stuff in
roles instead?

Cheers, Torben

Sascha_Bates · May 8, 2013, 5:48am

Well, my gut feeling on this is that, when I promote a
cookbook, I want all my production servers to get that promotion. I
don't want to have to promote for each datacenter. I am specifically
using my environments to control cookbook code promotion and, if I had a
need, to key off the environment name for organization-wide settings, like an
integration URL that differs across environments.

For the record, when I say "promote a cookbook": I have 4 main cookbooks
that represent server apps or profiles. Inside each of those cookbooks,
the metadata points to the versions of the supporting cookbooks it
requires. So I pin versions of my 4 main profiles in the production
env. I don't "promote" supporting cookbooks, but instead bump version
numbers whenever I make a change so that the metadata version
dependencies for each profile cookbook are very specific.
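
To make that promotion scheme concrete, here is a sketch of a production environment document that pins only the top-level profile cookbooks. The cookbook names and version numbers are hypothetical, and the hash simply mirrors the JSON form of a Chef environment.

```ruby
require 'json'

# JSON form of a Chef environment that pins only the top-level "profile"
# cookbooks. `pins` maps cookbook name => exact version.
def production_environment(pins)
  {
    'name' => 'production',
    'description' => 'Pins top-level profile cookbooks only',
    'json_class' => 'Chef::Environment',
    'chef_type' => 'environment',
    'cookbook_versions' => pins.transform_values { |version| "= #{version}" },
    'default_attributes' => {},
    'override_attributes' => {}
  }
end

# Supporting cookbooks are not listed here; each profile cookbook pins them
# in its own metadata.rb, e.g.:  depends 'ntp', '= 1.3.2'
# Promotion is then a one-line version bump in this hash (names are made up).
env = production_environment({ 'web_profile' => '2.4.0', 'db_profile' => '1.9.1' })
puts JSON.pretty_generate(env)
```

Because the environment says nothing about location, one bump promotes the cookbook to production nodes in every datacenter at once.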

Whatever secondary data container I settle on for dc-specific settings
should be fairly static - local DNS, local package repos, whatever,
requiring many fewer changes.

I'm afraid this might have rambled a bit. I probably shouldn't write
technical emails when I'm falling asleep.

For the record, I appreciate everyone's input and have marked some of
the Chef talks to watch as well. I've made this decision before, and I've
seen it made by others and disagreed with some of the implementations. My goal
is to make the decision and not regret it in a few months.

Sascha

Torben_Knerr · May 8, 2013, 7:03am

Hi Sascha,

That all makes sense.

I like the way you handle cookbook promotion. If I understood it right you
pin only the top-level / main / application cookbooks in the environment,
while the supporting / library cookbook versions are pinned in the metadata
of the top-level cookbooks, right?

As for the dc-specific stuff: agreed, you've got me convinced that multiple prod
environments per dc are a bad idea.

Another way to handle this apart from roles could be:

* define overrides for dc-specific node attributes in a data bag
* use a cookbook at the beginning of the run list that reads this data bag
  and calls node.override accordingly (wasn't there a hw cookbook exactly for
  this purpose?)
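
The two steps above could be sketched roughly as follows. Plain hashes stand in for the node attributes and the data bag item so the example runs without Chef; the item layout (an `overrides` key) and all values are assumptions, not an established convention.

```ruby
# Recursively overlay `overrides` onto `attributes`, returning a new hash.
# Nested hashes are merged; any other value is replaced outright.
def deep_overlay(attributes, overrides)
  attributes.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_overlay(old, new) : new
  end
end

# Stand-in for the data bag item of this node's datacenter (hypothetical layout).
dc_item = {
  'id' => 'dc1',
  'overrides' => { 'resolver' => { 'nameservers' => ['10.1.0.2'] } }
}

# Stand-in for the attributes the cookbooks would otherwise use.
attributes = {
  'resolver' => { 'nameservers' => ['8.8.8.8'], 'search' => 'example.com' }
}

effective = deep_overlay(attributes, dc_item['overrides'])
p effective
```

In an actual recipe the item would come from `data_bag_item` and each top-level key would be applied through `node.override[key] = value`, leaving Chef's own attribute precedence to do the merging.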

Just an idea though...

Cheers, Torben

Maxime_Brugidou · May 8, 2013, 7:53am

We run chef on 7 datacenters and get the node's datacenter from an ohai
plugin.

This is actually the cleanest thing to do: you don't have to manually set
the node's DC anywhere since it's auto-discovered. When we need specific
attributes per datacenter, we use "wrapper" cookbooks that dynamically
define attributes according to the DC.

Jesse_Nelson · May 8, 2013, 8:26am

Maxime, I agree: the fact that a node resides in a certain location
shouldn't be prescribed, it should be discovered. What does your ohai plugin
do to discover the datacenter the node is in?

Maxime_Brugidou · May 8, 2013, 11:01am

Currently the plugin is purely based on the network subnet, since we have
clearly separated subnets for each DC.
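
The plugin itself was not posted, so the following is only a guess at the core lookup such a plugin would need: mapping an IP address to a datacenter by subnet, using Ruby's standard `ipaddr` library. The CIDR ranges and datacenter names are invented, and the Ohai plugin wiring around it is left out.

```ruby
require 'ipaddr'

# Subnet-to-datacenter table. Both the ranges and the names are hypothetical.
DATACENTER_SUBNETS = {
  'dc1' => IPAddr.new('10.1.0.0/16'),
  'dc2' => IPAddr.new('10.2.0.0/16')
}.freeze

# Return the name of the datacenter whose subnet contains `ip`,
# or nil when no subnet matches.
def datacenter_for(ip)
  address = IPAddr.new(ip)
  match = DATACENTER_SUBNETS.find { |_name, subnet| subnet.include?(address) }
  match&.first
end

puts datacenter_for('10.2.14.7') # prints: dc2
```

A plugin built around this would presumably feed it the node's primary IP address and publish the result as the `datacenter` attribute.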

We are adding additional location info like room/rack/plane using LLDP
(based on the physical network topology which matches the physical
location).

Michael_Herman · May 8, 2013, 11:31am

We take a slightly different tack: we create an infrastructure environment
and place data centre specific services in that environment, such as DNS,
mail gateways and local repositories.

Each application environment (dev/qa's/production) then includes a
reference naming its infrastructure environment; any service that
relies on an infrastructure service just performs a search against that
infrastructure environment.

This allows new application environments to be configured without having to
populate site-specific data or update multiple entries in each
application environment. It is similar to data bags containing site-specific
data, but the services are also managed via chef. There are times when I think
it can be cumbersome, but it generally works well for us.

Rgds,

mgh

Justin_Witrick · May 8, 2013, 1:26pm

In my case we already know which datacenter the servers are in, so I use a combination of attributes and data bags.

If the data does not change between datacenters, then I just use attributes.

I create data bags for each datacenter that hold the specific values I need that cannot be default attributes.

Then during the chef run each node knows which specific data bag it needs to look at.

So, for example, I might have a data bag named ‘production’ and within it an item named ‘datacenter1’, etc.

Justin

Steffen_Gebert1 · May 10, 2013, 1:47pm

Hi Maxime,

> Currently the plugin is purely based on the network subnet since we have
> clear separated subnets for each DC.

Would you mind sharing this plugin? I'm not too deep into Ruby, so
coding it on my own would cost me some effort, but I'd like to have
such functionality, too.

Yours
Steffen

On 5/8/13 1:01 PM, Maxime Brugidou wrote:

Currently the plugin is purely based on the network subnet since we have
clear separated subnets for each DC.

We are adding additional location info like room/rack/plane using LLDP
(based on the physical network topology which matches the physical
location).
On May 8, 2013 10:26 AM, "Jesse Nelson" spheromak@gmail.com wrote:

Maxime, I agree: the fact that a node resides in a certain location
shouldn't be prescribed, it should be discovered. What does your ohai plugin
do to discover the datacenter the node is in?

On Wed, May 8, 2013 at 12:53 AM, Maxime Brugidou <maxime.brugidou@gmail.com> wrote:

We run chef on 7 datacenters and get the node's datacenter from an ohai
plugin.

This is actually the cleanest thing to do, you don't have to manually set
the node's DC anywhere since it's auto-discovered. When you need specific
attributes per data center we use "wrapper" cookbooks that dynamically
define attributes according to the DC.
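
A minimal sketch of how such a wrapper could pick attributes by datacenter, assuming the discovered value is available as a string. The tables, keys, hostnames and the method name `attributes_for` are hypothetical and not taken from the setup described above.

```ruby
# Values shared by every datacenter (hypothetical).
DEFAULT_ATTRIBUTES = {
  'proxy' => nil,
  'ntp_servers' => ['pool.ntp.org']
}.freeze

# Per-datacenter overrides (hypothetical names and hosts).
DC_ATTRIBUTES = {
  'dc1' => { 'proxy' => 'proxy.dc1.example.com' },
  'dc2' => { 'proxy' => 'proxy.dc2.example.com',
             'ntp_servers' => ['ntp.dc2.example.com'] }
}.freeze

# Merges the overrides for the given datacenter over the defaults.
# An unknown datacenter simply gets the defaults.
def attributes_for(datacenter)
  DEFAULT_ATTRIBUTES.merge(DC_ATTRIBUTES.fetch(datacenter, {}))
end

puts attributes_for('dc2').inspect
```

In a wrapper cookbook's attributes file, the resulting hash could then be copied onto the wrapped cookbook's `default[...]` attributes, keyed by whatever the ohai plugin reports.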
On May 8, 2013 7:19 AM, "Torben Knerr" ukio@gmx.de wrote:

Hey guys,

this is theoretical, I'm not through this in practice yet:

From a conceptual point of view, I'd argue to definitely use
environments rather than roles for keeping the datacenter (=a different
environment) specific attributes.

From the gut feeling I would have started with sth like 'prod_dc1' and
'prod_dc2' environments etc..

Did I get it conceptually wrong or are there other practical reasons why
you are all using a single 'prod' env and managing the dc specific stuff in
roles instead?

Cheers, Torben
On May 8, 2013 1:40 AM, "Jesse Nelson" spheromak@gmail.com wrote:

I've used internal DNS to denote locality. We use a contrived 4-letter
TLD and then use the datacenter as the first dot, e.g. 356sf.myorg,
365ny.myorg. I know it violates the cardinal "no metadata in the name" rule that
I try to abide by, but this makes it pretty easy to handle datacenter-specific
needs via node.domain, without the need for locality-specific roles.
A single cook/role (or role cook) denotes all the specifics for
each datacenter, and individual cooks can easily override if needed.
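
A small sketch of the domain-based approach, assuming the datacenter is always the first label of the node's domain as in the examples above. The method name `datacenter_from_domain` is made up for illustration; in a recipe the input would come from `node['domain']`.

```ruby
# Extracts the datacenter label from a domain such as "365ny.myorg".
# Returns nil when the domain is missing or empty.
def datacenter_from_domain(domain)
  return nil if domain.nil? || domain.empty?

  domain.split('.').first
end

puts datacenter_from_domain('365ny.myorg').inspect
```

The trade-off is the one already mentioned: the location is encoded in the name, so renaming or re-homing a node changes what this returns.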

scottmlikens · May 10, 2013, 5:16pm

Hi,

Have you ever considered an EC2-like metadata endpoint plus an ohai plugin
that reports the datacenter and other useful metadata? It might seem extreme,
but if you're dealing with multiple datacenters it should help reduce some bad
patterns, such as roles per datacenter or having to manually set the
datacenter attribute.

Scott

P.S. I know of no open source solutions that accomplish this; if there are
any, please link them.


