Reconfiguring server vs continuous upgrades


#1

ohai chefs,

We manage around 80 servers using chef-server that we host internally and have been doing so for about 18 months. Until a couple months ago, our base image was Ubuntu 11.04, but, in an effort to stay up to date and have access to the latest packages and security features, I moved that up to 12.04. Starting this week, I’ve begun to upgrade our servers using do-release-upgrade and after getting through 4 nodes, I started thinking…

Is there a better way to do this? Right now, I’ve got to ssh into each node, run do-release-upgrade (11.04 -> 11.10), answer some questions, let it run, tell it to replace some files, tell it to not replace some files, let it run some more… let it finish, reboot, run it again (11.10 -> 12.04) and do the same thing again.

Would it be better to just trash the node and configure a new one in place from the image using our chef recipes?

If I did the latter, I’m not entirely sure what the workflow would be. The node would have to have the client.pem file from the old node, but would that Just Work™?

The advantage of configuring from our base image is that it ensures our cookbooks are all up to date. Many of these nodes were configured over a year ago using completely different cookbooks, and updates have been run on top of already configured nodes. The only reason I’m confident that this will work is that I’ve recently configured nodes from most of the roles on the new Ubuntu.

So, what do you guys do in cases like this? What’s the workflow look like? Is it as easy as just saving the client.pem, zapping the node, cloning from the base image template, putting client.pem back and running chef-client? Are there any gotchas I should be worried about?

Or is there a better way?

Thanks!

…spike


#2

Hey Spike,

Our workflow for replacing nodes is to completely trash them and start from
scratch. New node, new client, fresh bootstrapping.

A few things that make this possible for us:

Base Images: We start with an Amazon AMI that has our validation key. We
bootstrap the node, get the client key, and remove the master key.

No unique node data: No node.set for Normal attributes, everything is set
before the Chef run starts. There is no data in Chef that depends on the
"state" of the node.

I would recommend discarding the Client information on the old nodes as you
bring up their replacements. IMO you should not view nodes/clients as
anything more than disposable identifiers/authentication information. The
bonus of this approach is that if you have the spare capacity, you can
bring up the new nodes before you remove the old ones, allowing you to
guard against failures on any particularly fragile machines.

Andrew


#3

Trash the node, and everything with it. You really want to pretend it’s
a new system.

Upgrades typically contain nasty surprises, and finding the edge cases
for them can be pretty difficult. Also, this method gets rid of any
admin cruft (e.g. a random apt-get install) that might be there.

I don’t bother trying to save the pem files, since it’s easy enough to
blow away the client records and start over.
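
A rough sketch of that "blow away the client records and start over" flow, written as a small Python helper that only builds the knife command lines and does not run them. The node name, FQDN, run list and environment below are made-up placeholders. `knife node delete`, `knife client delete` and `knife bootstrap` are standard knife subcommands, but check the flags against the knife version you have installed.

```python
def replacement_commands(node_name, fqdn, run_list, environment):
    """Build the knife commands for replacing a node from scratch.

    Nothing is executed here; the caller decides how (and whether) to run them.
    """
    return [
        # Drop the old node object and its API client, so the old
        # client.pem no longer matters.
        ["knife", "node", "delete", node_name, "--yes"],
        ["knife", "client", "delete", node_name, "--yes"],
        # Bootstrap the freshly imaged machine under the same name.
        ["knife", "bootstrap", fqdn,
         "--node-name", node_name,
         "--run-list", run_list,
         "--environment", environment],
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for cmd in replacement_commands(
            "app001", "app001.example.com", "role[app]", "staging"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to eyeball them for one node before looping over all 80.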

//geoff


#4

If you are on some cloud provider, +1 for trashing the node. It not only
ensures that you have a clean system to begin with, it also ensures you
have the ability to spawn instances whenever required.


#5

Hey Andrew (and everyone else replying as I compose this),

Thanks for the info. A lot of solid points.

I have a lot of data stored in the node’s attributes (and edited via knife node edit <nodename>). I guess this is where it would be better to use data bags? We have things like Scout (scoutapp.com) API keys in there. I guess this could also be solved by hitting the API and pulling down the key at configure time.

So this opens up some more questions… There are cases where I’ll configure a node with a postgres role. I then use the node’s attributes to configure whether it’s a master or a slave, and in either case, which node it will replicate from/to. If I’m reconfiguring one of those but want to retain that configuration, what would be the best way to do that? Specific roles for each of those specific cases with the required attributes? Or some data bag trick?

I’ve got some other details I need to work out now, too, but I should be able to work those out on my own. Namely, how to handle our internal DNS changes. I have straight-up File resources for the BIND configs that I modify when I add new nodes, and we name the nodes serially based on role (e.g. app001, app002, resque001, db001, db002, etc.), so I’ll have to figure out if that was a solid choice and if there’s a better way to do that.

Unfortunately we’re not on a true cloud provider, so it looks like there’s going to be some amount of manual work no matter what. But I can just drop nodes and bring up new ones, so it’s not a huge deal.

…spike


#6

My personal preference is to avoid node attributes for everything
unless they are populated by ohai, because I don’t feel like it’s
automated if I have to type “knife node edit” to have a fully working
node.

We use roles to apply some attributes for things like master/slave
relations in our DBs. This has some downsides, but it lets us manage
these relations more easily.
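
To make the role approach concrete, here is a minimal sketch of what such a role could look like, generated from Python as the JSON form Chef accepts for roles. The role name, the `postgresql` attribute keys and the `recipe[postgresql::server]` run list entry are illustrative assumptions, not anything taken from an actual setup in this thread.

```python
import json


def postgres_role(name, replication_role, peer):
    """Return a Chef role (as a dict) carrying the replication settings."""
    return {
        "name": name,
        "chef_type": "role",
        "json_class": "Chef::Role",
        "description": "PostgreSQL %s" % replication_role,
        # Hypothetical attribute names; use whatever your cookbook reads.
        "default_attributes": {
            "postgresql": {
                "replication_role": replication_role,
                "replication_peer": peer,
            }
        },
        # Placeholder run list entry.
        "run_list": ["recipe[postgresql::server]"],
    }


if __name__ == "__main__":
    # Write this out to roles/db_master.json and upload it with
    # `knife role from file`.
    print(json.dumps(postgres_role("db_master", "master", "db002"), indent=2))
```

Because the replication settings live in the role and not on the node object, a rebuilt node picks them up again as soon as the role is in its run list.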

We assign hostnames before chef as part of the install process using FAI.

//geoff


#7

What’s FAI?

Last year, when I noticed I was typing too much, I built a little utility where, once I added a DNS entry for a node, I could configure any node just by telling it its name. Basically I would call it with app001.staging and it would know that it should use the app role and be in the staging environment. From there, it would connect to the node, do a DNS lookup for that hostname and then configure the hostname and network interfaces based on that information, reboot, then call knife bootstrap with the appropriate arguments. I’m still not sure if there’s a better way to do that, but that’s probably a conversation for a different thread.
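
For anyone curious, the name-parsing part of a utility like that could be sketched roughly as below. This is a reconstruction from the description above, not the actual tool: the regular expression and the returned field names are assumptions, and the DNS lookup, network setup and `knife bootstrap` steps are left out.

```python
import re

# Serial names such as "app001" or "resque001": letters, then digits,
# followed by ".<environment>".
NODE_PATTERN = re.compile(
    r"^(?P<role>[a-z]+)(?P<serial>\d+)\.(?P<environment>[a-z0-9-]+)$")


def parse_node_spec(spec):
    """Split 'app001.staging' into role, serial number and environment."""
    match = NODE_PATTERN.match(spec)
    if match is None:
        raise ValueError("unrecognised node spec: %r" % spec)
    return {
        "hostname": spec.split(".", 1)[0],
        "role": match.group("role"),
        "serial": int(match.group("serial")),
        "environment": match.group("environment"),
    }


if __name__ == "__main__":
    print(parse_node_spec("app001.staging"))
```

The role and environment it returns map directly onto the run list and environment arguments of `knife bootstrap`.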

…spike


#8

  1. There is more than one knife plugin that lets you back up the nodes,
    clients and almost all the other data from chef-server (Chef 11 comes
    with knife-essentials; knife server also lets you do this).
  2. I’ll always prefer modeling the customizable parts using attributes, as
    they let me reuse the recipes without tampering with them.
  3. For scenarios like postgres master/slave (or, in general, a role/recipe
    that requires attributes from other nodes) I tend to use the search
    feature, and I scope the searches to the current node’s Chef environment.
  4. I tend to use data bags only when the data I’m dealing with is global
    (not specific to the node) or needs to be encrypted. Remember that every
    data bag call is a network call, so they are much slower than accessing
    the node’s attributes.

Some components might surface chicken-and-egg scenarios. It’s best not to
automate those upfront. I’ve tried to contain them to a handful, and then
slowly automate them part by part once the rest of the dependent system is
automated.
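
As an illustration of point 3, an environment-scoped search boils down to a query string like the one built below. This is only a sketch: `postgres_master` is a hypothetical role name, while `role:` and `chef_environment:` are the standard Chef search keys.

```python
def scoped_search_query(role, environment):
    """Build a Chef search query for nodes with `role` in `environment`."""
    return "role:%s AND chef_environment:%s" % (role, environment)


if __name__ == "__main__":
    # The same string works for `knife search node "<query>"` or, inside
    # a recipe, for search(:node, "<query>").
    print(scoped_search_query("postgres_master", "staging"))
```

Scoping by environment keeps a staging slave from accidentally finding the production master.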


#9

On Friday, April 12, 2013 at 12:31 PM, Spike Grobstein wrote:

Hey Andrew (and everyone else replying as I compose this),

Thanks for the info. A lot of solid points.

I have a lot of data stored in the node’s attributes (and edited via knife node edit <nodename>). I guess this is where it would be better to use databags? We have things like scout (scoutapp.com) api keys in there. I guess this could also be solved by hitting the API and pulling down the key at configure time.

So this opens up some more questions… There are cases that I’ll configure a node with a postgres role. I then use the node’s attributes to configure whether it’s a master or a slave, and if it is either, which node it will replicate from/to. In the case where I’d be reconfiguring one of those, but I want to retain that configuration, what would be the best way to do that? Specific roles for each of those specific cases with the required attributes? Or some databag trick?

I’ve got some other details I need to work out now, too, but I should be able to work that out on my own. Namely how to handle our internal DNS changes. I have straight-up File resources for the BIND configs that I modify when I add new nodes and we name the nodes serially based on role (eg: app001, app002, resque001, db001, db002, etc), so I’ll have to figure out if that was a solid choice and if there’s a better way to do that.

We generate hostnames from $PRIMARY_ROLE-$SLUG.$DOMAIN where:

$PRIMARY_ROLE comes from a case statement in a recipe. It looks at node[:roles] and picks the “most important” role. This is just for convenience so you see what kind of machine you’re on from the hostname in the prompt.

$SLUG is the cloud instance id when using a cloud, or picked from a generated UUID

The only tricky bit is that you need to configure chef with a static node_name setting instead of using FQDN, and if you want this to match the hostname, then you need to have a script generate the slug before Chef runs.

In any case, I find sequential integers in hostnames to be a PITA in an automated environment so I’d recommend migrating to a different scheme.
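
A minimal sketch of the naming scheme described above, in plain Ruby so it can run before Chef does. Only the $PRIMARY_ROLE-$SLUG.$DOMAIN shape comes from the post; the role names, their order of importance, the `node` fallback, the `example.internal` domain and the method names are all made up for illustration.

```ruby
require 'securerandom'

# Pick the "most important" role with a case statement.
# The roles and their ordering here are hypothetical.
def primary_role(roles)
  case
  when roles.include?('db')     then 'db'
  when roles.include?('app')    then 'app'
  when roles.include?('resque') then 'resque'
  else 'node'
  end
end

# Use the cloud instance id when one is given, otherwise the first
# segment of a freshly generated UUID.
def slug(instance_id = nil)
  return instance_id unless instance_id.nil? || instance_id.empty?
  SecureRandom.uuid.split('-').first
end

# Assemble $PRIMARY_ROLE-$SLUG.$DOMAIN.
def generated_hostname(roles, domain, instance_id = nil)
  "#{primary_role(roles)}-#{slug(instance_id)}.#{domain}"
end

puts generated_hostname(%w[base app], 'example.internal', 'i-0abc1234')
```

The generated name would then have to be stored somewhere stable (for example as the static node_name in the client configuration) before the first Chef run, since a UUID-based slug changes every time it is regenerated.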


Daniel DeLeo


#10

Hey Spike,

How we get around using custom node information:

  1. Chef Search: Our Redis slaves are launched with a ‘redis-slave’ role
    that does nothing different from the regular redis role. It is only there
    so we can search for nodes with that role and apply config changes
    appropriately.

  2. Databags: This is where we store API keys etc. Bonus points for using
    encrypted versions.
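
A rough sketch of how the two approaches might combine, written as plain Ruby so it runs without a Chef server. In a real recipe the inputs would come from Chef's `search` and `data_bag_item` calls, shown in the comments; the bag name, item name, attribute keys and helper names below are assumptions for illustration only.

```ruby
# In a recipe the inputs would come from Chef itself, for example:
#   slaves = search(:node, 'role:redis-slave')
#   keys   = data_bag_item('api_keys', 'scout')   # hypothetical bag and item
# Here they are plain hashes standing in for those results.

# Collect the address of every node the search returned, sorted so the
# rendered config does not change from run to run.
def slave_addresses(nodes)
  nodes.map { |n| n['ipaddress'] }.compact.sort
end

# Build the settings a template would render, with the API key taken
# from the data bag item instead of from node attributes.
def rendered_settings(slaves, key_item)
  { 'redis_slaves' => slave_addresses(slaves), 'scout_key' => key_item['key'] }
end

slaves = [{ 'name' => 'redis-b', 'ipaddress' => '10.0.0.12' },
          { 'name' => 'redis-a', 'ipaddress' => '10.0.0.11' }]
keys = { 'id' => 'scout', 'key' => 'not-a-real-key' }

p rendered_settings(slaves, keys)
```

Because nothing here is stored on the node object, a rebuilt node picks up the same configuration as soon as it has the right role in its run list.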

On Fri, Apr 12, 2013 at 3:54 PM, Daniel DeLeo dan@kallistec.com wrote:

On Friday, April 12, 2013 at 12:31 PM, Spike Grobstein wrote:

Hey Andrew (and everyone else replying as I compose this),

Thanks for the info. A lot of solid points.

I have a lot of data stored in the node’s attributes (and edited via
knife node edit <nodename>). I guess this is where it would be better to
use databags? We have things like scout (scoutapp.com) api keys in there.
I guess this could also be solved by hitting the API and pulling down the
key at configure time.

So this opens up some more questions… There are cases that I’ll
configure a node with a postgres role. I then use the node’s attributes to
configure whether it’s a master or a slave, and if it is either, which node
it will replicate from/to. In the case where I’d be reconfiguring one of
those, but I want to retain that configuration, what would be the best way
to do that? Specific roles for each of those specific cases with the
required attributes? Or some databag trick?

I’ve got some other details I need to work out now, too, but I should be
able to work that out on my own. Namely how to handle our internal DNS
changes. I have straight-up File resources for the BIND configs that I
modify when I add new nodes and we name the nodes serially based on role
(eg: app001, app002, resque001, db001, db002, etc), so I’ll have to figure
out if that was a solid choice and if there’s a better way to do that.

We generate hostnames from $PRIMARY_ROLE-$SLUG.$DOMAIN where:

$PRIMARY_ROLE comes from a case statement in a recipe. It looks at
node[:roles] and picks the “most important” role. This is just for
convenience so you see what kind of machine you’re on from the hostname in
the prompt.

$SLUG is the cloud instance id when using a cloud, or picked from a
generated UUID

The only tricky bit is that you need to configure chef with a static
node_name setting instead of using FQDN, and if you want this to match the
hostname, then you need to have a script generate the slug before Chef runs.

In any case, I find sequential integers in hostnames to be a PITA in an
automated environment so I’d recommend migrating to a different scheme.


Daniel DeLeo


#11

On Fri, Apr 12, 2013 at 9:46 PM, Geoff Papilion geoffp@wikia-inc.com wrote:

My personal preference is to avoid node attributes for everything
unless they are populated by ohai, because I don’t feel like it’s
automated if I have to type “knife node edit” to have a fully working
node.

You could also store the node attributes in a json file, e.g. initially via
knife node show <nodename> -Fjson > nodes/<nodename>.json and keep this
under version control. Now whenever you change node attributes, you modify
the local json file, then use knife node from file nodes/<nodename>.json
to bulk update the node attributes in an automation-friendly way.
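
As a sketch of the "modify the local json file" step, the following Ruby helper changes one attribute in a saved node file so the edit is scriptable and shows up as a clean diff. It assumes the file has (or may gain) a top-level "normal" key holding the node's own attributes; the helper name and the postgres attribute names in the usage comment are hypothetical.

```ruby
require 'json'

# Set a nested attribute under "normal" in a node JSON file, then
# rewrite the file pretty-printed so version control diffs stay small.
def set_node_attribute(path, keys, value)
  node = JSON.parse(File.read(path))
  normal = (node['normal'] ||= {})
  *parents, leaf = keys
  # Walk down to the parent hash, creating intermediate hashes as needed.
  target = parents.reduce(normal) { |hash, key| hash[key] ||= {} }
  target[leaf] = value
  File.write(path, JSON.pretty_generate(node) + "\n")
  node
end

# Example (not executed here):
#   set_node_attribute('nodes/db001.json', %w[postgres role], 'master')
```

After committing the changed file, the knife node from file step above pushes it to the server.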

IMHO knife node edit is dangerous and should be avoided if you want
"infrastructure-as-code".

-Torben