Best practices for multiple data centers


#1

I’m wondering how others use Chef for multiple data centers and if they
use multiple Chef servers.
In my case, we have multiple data centers around the world and each
data center has a private network that all of the servers are on. I
currently have a single Chef server at one of the data centers that
every client connects to. This works fine, but it does have
disadvantages.

  1. chef-client runs slowly at data centers located on the opposite side
    of the world because of latency and bandwidth.
  2. While the bandwidth usage right now is not much, I’m worried that
    it will be significant as our usage of Chef for various things
    increases.

One obvious solution is to have a separate Chef server at each data
center; however, I am then managing multiple Chef installations that
use 99.9% of the same code. The primary disadvantage of this is that
I then have my nodes on different Chef servers and no way to search
for all of the nodes. Specifically, my monitoring is automated and
needs to be able to get a list of all servers, roles, etc. Having
separate Chef servers really ends up creating a large barrier to
managing all of my nodes.
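
For reference, this is roughly the kind of lookup the monitoring automation depends on; Chef search only sees nodes registered with the server being queried, which is why splitting onto per-DC servers breaks it. A minimal sketch (the role name is hypothetical):

```ruby
# In a recipe (or via knife): Chef search only returns nodes registered
# with the server you query, so with one server per data center this
# would miss every other DC. "monitored" is a hypothetical role name.
search(:node, "role:monitored").each do |n|
  puts n["fqdn"]
end
```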

I’m sure others have run into something similar, so I’m wondering what
others are doing.

One thought was that it would be great if there were some sort of Chef
proxy server that I could have at each location that cached the files
and node data. That way my nodes at a DC could contact the proxy for
all of their needs and the proxy would sync up with the main Chef
server every so often. Anyone working on anything like that? :)


John Alberts


#2

I’d be interested to find out what others are doing in this area.
Right now we host our Chef server out of Seattle and have our DCs in
Boise and NYC use the Seattle Chef server. Is anyone using Couch
replication in a ring to accomplish distribution?
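
As a rough sketch of what a ring would involve (hostnames are placeholders, and this assumes CouchDB's standard `/_replicate` endpoint, not anything Chef-specific): each server's `chef` database pushes continuously to the next server in the list, with the last wrapping back to the first.

```ruby
require "json"

# Build one continuous-replication document per hop in the ring. Each
# document would be POSTed to the source server's /_replicate endpoint
# (standard CouchDB API). Hostnames here are placeholders.
def ring_replication_docs(hosts, db = "chef")
  hosts.each_with_index.map do |host, i|
    target = hosts[(i + 1) % hosts.size]
    { "source" => "http://#{host}:5984/#{db}",
      "target" => "http://#{target}:5984/#{db}",
      "continuous" => true }
  end
end

docs = ring_replication_docs(%w[seattle boise nyc])
puts docs.map(&:to_json)
```

Whether Chef's indexer copes well with documents arriving via replication rather than through the API is exactly the open question here.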

-J

On Fri, Jun 24, 2011 at 12:30 PM, John Alberts john.m.alberts@gmail.com wrote:

I’m wondering how others use Chef for multiple data centers and if they
use multiple Chef servers.


#3

We have two local data centers and are in the midst of provisioning a 3rd
VMware-only data center. We have Chef servers in each environment in each
location. This was dictated in some ways by ops because the management team
for stage and production is a 3rd party company and they allow almost zero
access to their systems. We keep everything for Chef in Subversion in a
cloud environment, and every Chef server has one firewall rule allowing it
to connect to an Apache proxy, through which we pull Chef code from svn and
application code from Artifactory. We also version our data bags,
although we are discussing the wisdom of versioned data bags, as it has
resulted in too much config data being kept in data bags instead of
attribute files.

What we don’t have is a common repository for 3rd party packages and
tarballs. Our Chef servers do triple duty as Chef, package repos, and kickstart
servers. Generally we put stuff into dev when requested and sync the
repository to other servers as needed. This setup requires us to maintain 7
different Chef servers across 3 different data centers, and we have only been
so-so at automating our Chef servers themselves. I could probably do a
presentation on the challenges of implementing Chef in a large firewalled
environment with a very protective client. Security is not part of the org
we’re in so we tend to struggle a lot with what we need and what corporate
security will allow. None of our compromises are pretty, but all were
necessary and none were close to what we actually wanted.

We have a long list of to-dos that revolve around automation and
sustainability of the environment. We are a small team of 2 full-timers and 3
part-timers supporting new development initiatives and new provisioning
initiatives while simultaneously trying to migrate over an enormous aged
infrastructure. It’s been a real challenge and I don’t know if I would
describe our current condition as the outcome of “best practices” but more
of the best possible outcome under current constraints. We are constantly
refactoring and looking for ways to simplify and automate but it’s slow
going sometimes.

Sascha

On Fri, Jun 24, 2011 at 1:34 PM, Jason J. W. Williams jasonjwwilliams@gmail.com wrote:

I’d be interested to find out what others are doing in this area. Is
anyone using Couch replication in a ring to accomplish distribution?


#4

Actually, one thing I think I will do to reduce the bandwidth and therefore
reduce chef-client execution time is use rsync to sync a directory of
tarballs that I use in various recipes. I already have a role for each data
center, and I can put an attribute in each data center role pointing to the
local rsync mirror.
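
As a rough sketch of that wiring (the role, attribute, and mirror names are all invented for illustration), the per-DC role carries the mirror address and any recipe can consume it:

```ruby
# roles/dc_chicago.rb -- hypothetical per-data-center role carrying the
# address of the DC-local rsync mirror (names are illustrative)
name "dc_chicago"
description "Chicago data center settings"
default_attributes(
  "tarball_mirror" => "rsync://mirror.chi.example.com/tarballs"
)

# ...and in a recipe, pull tarballs from whichever mirror the node's
# data center role set:
execute "sync-tarballs" do
  command "rsync -a #{node['tarball_mirror']}/ /opt/tarballs/"
end
```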

Couch replication sounds interesting, but I wonder how well that would work.
Has anyone tried something like that?

John

On Fri, Jun 24, 2011 at 2:01 PM, Sascha Bates sascha.bates@gmail.com wrote:

We have two local data centers and are in the midst of provisioning a 3rd
VMware-only data center. We have Chef servers in each environment in each
location.


#5

On a similar note, a few weeks ago when we started branching into new data
centers, I got sick of having to hand-add roles for locations and
environments (also our dev/qa environments have swimlane designations and a
few other things). Our hostnames are encoded with location, environment,
application, etc. I wrote an ohai plugin that parses the name in
conjunction with a hash of information and returns a couple of top level
node attributes and some lower level info. If you guys encode your host
names, it might be useful. I was talked into refactoring it into an LWRP
because someone wants to use the logic for some other stuff, but I was also
planning to post the basic plugin to GitHub soon. It’ll need reworking with
your own host info, but it shouldn’t be tough.
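
The parsing core of such a plugin might look like the following; the hostname scheme, delimiter, lookup hash, and attribute names here are all invented for illustration (the actual encoding isn't shown in the thread). In a real Ohai plugin, the returned hash would be exposed as node attributes via the plugin's `provides` machinery.

```ruby
# Hypothetical hostname scheme: <location>-<env>-<app>-<NN>, e.g.
# "chi-prd-web-01". This parses a hostname into the attributes an Ohai
# plugin could expose; all field names and layout are assumptions.
LOCATIONS = { "chi" => "Chicago", "sea" => "Seattle" } # illustrative lookup hash

def parse_hostname(hostname)
  location, environment, application, index = hostname.split("-")
  {
    "location"      => location,
    "location_long" => LOCATIONS.fetch(location, "unknown"),
    "environment"   => environment,
    "application"   => application,
    "index"         => index.to_i
  }
end

puts parse_hostname("chi-prd-web-01").inspect
```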

On Fri, Jun 24, 2011 at 2:13 PM, John Alberts john.m.alberts@gmail.com wrote:

Actually, one thing I think I will do to reduce the bandwidth and therefore
reduce chef-client execution time is use rsync to sync a directory of
tarballs that I use in various recipes.


#6

Please let me know when this presentation is scheduled; I will definitely
attend! :) I suspect others looking to use Chef in enterprise environments
would be interested too.

On Fri, Jun 24, 2011 at 2:01 PM, Sascha Bates sascha.bates@gmail.com wrote:

I could probably do a presentation on the challenges of implementing Chef
in a large firewalled environment with a very protective client.


#7

I’ve actually done the same thing using an Ohai plugin. I’m able to parse
data center, server type, and customer region out of our hostnames; however,
I still use data center roles to store other DC-specific attributes.

On Fri, Jun 24, 2011 at 2:22 PM, Sascha Bates sascha.bates@gmail.com wrote:

Our hostnames are encoded with location, environment, application, etc. I
wrote an ohai plugin that parses the name in conjunction with a hash of
information and returns a couple of top level node attributes and some
lower level info.


#8

Holy crap. Now I need to write a presentation :)

On Fri, Jun 24, 2011 at 2:29 PM, Jeffrey Sussna jes@ingineering.it wrote:

Please let me know when this presentation is scheduled; I will definitely
attend! :slight_smile: I suspect others looking to use Chef in enterprise environments
would be interested too.

On Fri, Jun 24, 2011 at 2:01 PM, Sascha Bates sascha.bates@gmail.com wrote:

I could probably do a presentation on the challenges of implementing Chef
in a large firewalled environment with a very protective client.