Removing autoscaled-down nodes from chef server?

Denis_Haskin1 · October 18, 2012, 4:16pm

Looking for input on how to handle this: we use AWS’s autoscaling to
dynamically manage the number of front-end nodes against traffic, but we
configure and provision the nodes with chef, so they do get registered with
our chef server.

However, when AWS autoscaling takes nodes down as traffic declines,
they don’t get de-registered from the Chef server, so we’re accumulating
lots of stale entries there. That makes the Chef search facility much less
useful: e.g. if you run a knife ssh command against a set of nodes
identified by search, many of them no longer exist, so the command takes a
long time while the ssh requests time out.

Suggestions on how to handle this? I believe (and am checking) that nodes
get an orderly shutdown when autoscaling decides they’re not needed any
more, so I guess I could hook into the shutdown sequence (these are Ubuntu).
Is that the best approach?

Thanks,

–
Denis Haskin

Brian_Hatfield · October 18, 2012, 4:26pm

The resource that has guided me to the solution that's working for us is
here:
http://www.nuvoleconsulting.com/2012/07/02/chef-node-de-registration-for-autoscaling-groups/

I used that as a base for writing this, which has some extra features
useful for my deployment:
GitHub - bmhatfield/chef-deregistration-manager: Queue Based Chef Client Deregistration for the Cloud

nukemberg · October 19, 2012, 10:41pm

We use a slightly different approach: have a daemon (or better yet, Nagios)
make API calls to EC2 and fetch each node's status. We then fill
node['ec2']['status'] with that status. This way we can deal with nodes
that fail Amazon's health checks (we filter searches on
node['ec2']['status']) in addition to removing autoscaling nodes. It also
lets us deal with non-autoscaling nodes.
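
A rough sketch of the status-tagging idea described in this reply, not the poster's actual daemon. It assumes the AWS CLI and knife are installed and configured on the machine doing the polling, and that you already know which instance ID belongs to which Chef node name. The function names (ec2_state, set_status, sync_status) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: copy an instance's EC2 state into node['ec2']['status'].
# Assumes a configured AWS CLI and knife; both commands can be overridden.

AWS="${AWS:-aws}"
KNIFE="${KNIFE:-knife}"

# Print the EC2 state name (running, stopped, terminated, ...) for an
# instance ID, or "missing" if the lookup fails (e.g. EC2 no longer
# knows the instance).
ec2_state() {
    "$AWS" ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' \
        --output text 2>/dev/null || echo missing
}

# Save the state on the Chef node as a normal attribute.
set_status() {
    "$KNIFE" exec -E "n = Chef::Node.load('$1'); n.normal['ec2']['status'] = '$2'; n.save"
}

# Usage: sync_status NODE_NAME INSTANCE_ID
sync_status() {
    set_status "$1" "$(ec2_state "$2")"
}
```

Searches could then be narrowed with something like knife ssh "role:frontend AND ec2_status:running" uptime, where role:frontend is a placeholder for whatever query you already use and ec2_status is how the nested attribute should appear in the search index.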

johnmartinez · October 19, 2012, 10:46pm

We have a shutdown init script that does two basic things:

knife node delete
knife client delete

Works great.

Our AWS environments are very ephemeral, so we don’t really care to worry
about reboots and those kinds of things.

-john
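
For anyone wanting a starting point, a shutdown script along these lines might look roughly like the sketch below. It is not the poster's actual script. It assumes knife is installed on the node, that /etc/chef/knife.rb (a made-up path) holds credentials allowed to delete the node and client, and that both are named after the machine's hostname.

```shell
#!/bin/sh
# Sketch only: deregister this machine from the Chef server at shutdown.
# The config path and the hostname-based node name are assumptions.

KNIFE="${KNIFE:-knife}"
KNIFE_CONFIG="${KNIFE_CONFIG:-/etc/chef/knife.rb}"
NODE_NAME="${NODE_NAME:-$(hostname -f 2>/dev/null || hostname)}"

# Delete the node object and its API client.
# -y skips knife's interactive confirmation prompt.
deregister() {
    "$KNIFE" node delete "$1" -y -c "$KNIFE_CONFIG"
    "$KNIFE" client delete "$1" -y -c "$KNIFE_CONFIG"
}

# Only act when the init system calls the script with "stop".
case "${1:-}" in
    stop) deregister "$NODE_NAME" ;;
esac
```

The script still has to be wired into the shutdown runlevels, and it only helps if the instance really does get an orderly shutdown before it is terminated.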
