Hey Andrew (and everyone else replying as I compose this),
Thanks for the info. A lot of solid points.
I have a lot of data stored in the node’s attributes (and edited via knife node edit <nodename>). I guess this is where it would be better to use data bags? We have things like scout (scoutapp.com) API keys in there. I guess this could also be solved by hitting the API and pulling down the key at configure time.
So this opens up some more questions… There are cases where I’ll configure a node with a postgres role. I then use the node’s attributes to configure whether it’s a master or a slave, and if it is either, which node it will replicate from/to. If I’m rebuilding one of those nodes but want to retain that configuration, what would be the best way to do that? Specific roles for each of those specific cases with the required attributes? Or some data bag trick?
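One way to picture the data bag option: keep the replication topology keyed by node name outside the node object, so it survives deleting and re-creating the node. The sketch below is plain Ruby with a hash standing in for a data bag item. The item layout, the node names, and the `replication_for` helper are all made up for illustration; in a real recipe the hash would come from something like `data_bag_item` rather than being defined inline.

```ruby
# Stand-in for a data bag item (hypothetical layout). In a recipe this hash
# would be loaded from the Chef server instead of being defined inline.
POSTGRES_TOPOLOGY = {
  'db001' => { 'mode' => 'master', 'replicates_to' => ['db002'] },
  'db002' => { 'mode' => 'slave',  'replicates_from' => 'db001' }
}.freeze

# Return the replication settings for a node name, or a standalone default
# when the node is not listed. Hypothetical helper, not part of Chef.
def replication_for(node_name, topology = POSTGRES_TOPOLOGY)
  topology.fetch(node_name) { { 'mode' => 'standalone' } }
end

puts replication_for('db002').inspect
```

Because the lookup is keyed by name rather than stored on the node, a rebuilt db002 would pick up the same settings as long as it comes back with the same name.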
I’ve got some other details I need to work out now, too, but I should be able to work those out on my own. Namely, how to handle our internal DNS changes. I have straight-up File resources for the BIND configs that I modify when I add new nodes, and we name the nodes serially based on role (e.g. app001, app002, resque001, db001, db002, etc.), so I’ll have to figure out if that was a solid choice and if there’s a better way to do that.
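For the serial naming and the hand-edited BIND files, a rough sketch of what generating both could look like is below. This is plain Ruby, not Chef DSL; `next_node_name`, `zone_a_records`, and the IP addresses are hypothetical, and in a cookbook the host list would more likely come from a node search feeding a template.

```ruby
# Given existing node names, return the next serial name for a role,
# e.g. ["app001", "app002"] and "app" gives "app003".
def next_node_name(existing_names, role)
  pattern = /\A#{Regexp.escape(role)}(\d{3})\z/
  serials = existing_names.map { |name| name[pattern, 1] }.compact.map(&:to_i)
  format('%s%03d', role, (serials.max || 0) + 1)
end

# Render A records for a BIND zone file from a name => IP hash,
# sorted by name so the generated file is stable between runs.
def zone_a_records(hosts)
  hosts.sort.map { |name, ip| format('%-12s IN A %s', name, ip) }.join("\n")
end

# Hypothetical inventory; the addresses are placeholders.
hosts = { 'app002' => '10.0.0.12', 'app001' => '10.0.0.11', 'db001' => '10.0.0.21' }
puts next_node_name(hosts.keys, 'app')
puts zone_a_records(hosts)
```

Deriving the next name from the highest serial in use means a replacement node gets a fresh name instead of reusing the old one, which fits with treating nodes as disposable.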
Unfortunately we’re not on a true cloud provider, so it looks like there’s going to be some amount of manual work no matter what. But I can just drop nodes and bring up new ones, so it’s not a huge deal.
On Apr 12, 2013, at 3:13 PM, Andrew Gross wrote:
Our workflow for replacing nodes is to completely trash them and start from scratch. New node, new client, fresh bootstrapping.
A few things that make this possible for us:
Base Images: We start with an Amazon AMI that has our validation key. We bootstrap the node, get the client key, and remove the master key.
No unique node data: No node.set for Normal attributes, everything is set before the Chef run starts. There is no data in Chef that depends on the “state” of the node.
I would recommend discarding the Client information on the old nodes as you bring up their replacements. IMO you should not view nodes/clients as anything more than disposable identifiers/authentication information. The bonus of this approach is that if you have the spare capacity, you can bring up the new nodes before you remove the old ones, allowing you to guard for failures for any particularly fragile machines.
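The replace-rather-than-upgrade workflow described above can be sketched as a short list of knife commands. The Ruby below only builds and prints the commands as a dry run; nothing is executed. The node names, the hostname, and the `role[app]` run list are placeholders, and it assumes the replacement gets a new name so it can be bootstrapped before the old node and client are deleted.

```ruby
# Build the knife commands for replacing old_name with a freshly
# bootstrapped new_name. Returns argv arrays; nothing is executed.
def replacement_commands(old_name:, new_name:, new_host:, run_list:)
  [
    # Bring the new node up first, registering a new client for it.
    ['knife', 'bootstrap', new_host, '-N', new_name, '-r', run_list],
    # Then discard the old node object and its client key on the server.
    ['knife', 'node', 'delete', old_name, '-y'],
    ['knife', 'client', 'delete', old_name, '-y']
  ]
end

replacement_commands(
  old_name: 'app001', new_name: 'app003',
  new_host: 'app003.example.internal', run_list: 'role[app]'
).each { |argv| puts argv.join(' ') }
```

If the replacement has to reuse the old name, the two delete commands would need to run before the bootstrap, since the old client would otherwise still hold that name on the server.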
On Fri, Apr 12, 2013 at 3:02 PM, Spike Grobstein firstname.lastname@example.org wrote:
We manage around 80 servers using chef-server that we host internally and have been doing so for about 18 months. Until a couple months ago, our base image was Ubuntu 11.04, but, in an effort to stay up to date and have access to the latest packages and security features, I moved that up to 12.04. Starting this week, I’ve begun to upgrade our servers using do-release-upgrade and after getting through 4 nodes, I started thinking…
Is there a better way to do this? Right now, I’ve got to ssh into each node, run do-release-upgrade (11.04 -> 11.10), answer some questions, let it run, tell it to replace some files, tell it to not replace some files, let it run some more… let it finish, reboot, run it again (11.10 -> 12.04) and do the same thing again.
Would it be better to just trash the node and configure a new one in place from the image using our chef recipes?
If I did the latter, I’m not entirely sure what the workflow would be. The node would have to have the client.pem file from the old node, but would that Just Work™?
The advantage of configuring from our base image is that it ensures our cookbooks are all up to date. Many of these nodes were configured over a year ago using completely different cookbooks, and updates have been run on top of already configured nodes. The only reason I’m confident that this will work is that I’ve recently configured nodes from most of the roles on the new Ubuntu.
So, what do you guys do in cases like this? What’s the workflow look like? Is it as easy as just saving the client.pem, zapping the node, cloning from the base image template, putting client.pem back and running chef-client? Are there any gotchas I should be worried about?
Or is there a better way?