Clients able to delete themselves / auto-scaling


#1

We’re running some tests with EC2 auto scaling. It’s working great when the
nodes come up, and we have them deleting their node in the chef server when
they get scaled down, but we have lots of leftover clients, since a client
can’t delete itself.

How would folks feel about a patch to allow a client to delete itself even
if it’s not an admin?

Does anyone have alternate suggestions that we might have missed?

Thanks.


#2

]] Michael Ivey

Does anyone have alternate suggestions that we might have missed?

Just have a cron job that looks for nodes that haven’t checked in within
$interval and removes them?
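
A minimal sketch of the selection step for such a cron job, in Python. It assumes you have already fetched each node's last check-in time as a Unix timestamp (Chef records one in the `ohai_time` automatic attribute). The `stale_nodes` function and the sample data are hypothetical, and the actual deletion call is deliberately left out.

```python
import time


def stale_nodes(last_checkin, interval_seconds, now=None):
    """Return names of nodes whose last check-in is older than the interval.

    last_checkin maps node name -> Unix timestamp of the last chef-client run.
    """
    if now is None:
        now = time.time()
    cutoff = now - interval_seconds
    return sorted(name for name, ts in last_checkin.items() if ts < cutoff)


# Hypothetical data; "now" is fixed so the example is reproducible.
now = 1_000_000
checkins = {
    "i-0aaa1111": now - 300,     # checked in 5 minutes ago
    "i-0bbb2222": now - 7200,    # 2 hours ago
    "i-0ccc3333": now - 90000,   # more than a day ago
}
print(stale_nodes(checkins, interval_seconds=3600, now=now))
# prints ['i-0bbb2222', 'i-0ccc3333']
```

Pick an interval comfortably longer than the chef-client run interval, so a node that merely missed one run is not removed.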


Tollef Fog Heen
UNIX is user friendly, it’s just picky about who its friends are


#3

You can also write a knife plugin that compares the list of nodes known to
the chef server with the output of ec2-describe-instances. Anything that’s
in chef but not in the ec2-describe-instances output can be queued for
node/client deletion. You can run this as a cron job on your chef server,
or on some other management node if you are using hosted chef.

Be sure that this process is robust and runs on a tight enough loop. If you
are using FQDNs for node names you can get bitten when EC2 reuses them.
We’ve started using the EC2 instance ID as the node name to help here, but
if you don’t delete old instances in a timely manner, they can still show
up in a search.
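
The comparison itself is a set difference. The sketch below is in Python rather than a knife plugin, and it assumes node names are EC2 instance IDs, as described above. The two input lists stand in for the output of `knife node list` and `ec2-describe-instances`; `orphaned_nodes` and the sample names are hypothetical.

```python
def orphaned_nodes(chef_node_names, live_instance_ids):
    """Return Chef node names that look like EC2 instance IDs but are gone from EC2.

    Names that do not start with "i-" are skipped, so nodes that were never
    EC2 instances are not queued for deletion by mistake.
    """
    live = set(live_instance_ids)
    return sorted(
        name
        for name in chef_node_names
        if name.startswith("i-") and name not in live
    )


# Hypothetical sample data.
chef_nodes = ["i-0aaa1111", "i-0bbb2222", "i-0ccc3333", "build-server"]
ec2_instances = ["i-0aaa1111", "i-0ccc3333", "i-0ddd4444"]
print(orphaned_nodes(chef_nodes, ec2_instances))
# prints ['i-0bbb2222']
```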

chris

On Tue, May 15, 2012 at 1:35 PM, Tollef Fog Heen tfheen@err.no wrote:

]] Michael Ivey

Does anyone have alternate suggestions that we might have missed?

Just have a cron job that looks for nodes that haven’t check in in
$interval and remove those?


Tollef Fog Heen
UNIX is user friendly, it’s just picky about who its friends are


#4

Here’s a rake task I use.

YMMV

Alex



#5

On Tue, May 15, 2012 at 1:18 PM, Michael Ivey ivey@gweezlebur.com wrote:

We’re running some tests with EC2 auto scaling. It’s working great when the
nodes come up, and we have them deleting their node in the chef server when
they get scaled down, but we have lots of leftover clients, since a client
can’t delete itself.

There’s some past discussion on this topic here:
http://tickets.opscode.com/browse/CHEF-1867

How would folks feel about a patch to allow a client to delete itself even
if it’s not an admin?

I believe that permission would be okay.

Bryan


#6

Oftentimes you can build it into your monitoring framework as well. For
example, we use Sensu at work and have a handler for keepalive failures
that calls out to EC2 and, if the instance has been terminated, calls the
Chef API to delete the node and client.
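
A rough Python sketch of that decision logic, not of a real Sensu handler. The EC2 lookup and the Chef deletions are passed in as plain callables so the example runs without either service. `handle_keepalive_failure` and its parameters are hypothetical names, and it assumes the Sensu client name is the EC2 instance ID.

```python
def handle_keepalive_failure(instance_id, get_instance_state, delete_node, delete_client):
    """Delete the Chef node and client only if EC2 reports the instance terminated.

    Returns True when a deletion was issued, False otherwise.
    """
    if get_instance_state(instance_id) != "terminated":
        # Could be a network blip or a stopped instance; leave it alone.
        return False
    delete_node(instance_id)
    delete_client(instance_id)
    return True


# Stand-ins for the real EC2 and Chef calls, so the sketch runs on its own.
states = {"i-0aaa1111": "terminated", "i-0bbb2222": "running"}
deleted = []
for instance_id in states:
    handle_keepalive_failure(
        instance_id,
        get_instance_state=states.get,
        delete_node=lambda name: deleted.append(("node", name)),
        delete_client=lambda name: deleted.append(("client", name)),
    )
print(deleted)
# prints [('node', 'i-0aaa1111'), ('client', 'i-0aaa1111')]
```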

On Tue, May 15, 2012 at 2:08 PM, Bryan McLellan btm@loftninjas.org wrote:

On Tue, May 15, 2012 at 1:18 PM, Michael Ivey ivey@gweezlebur.com wrote:

We’re running some tests with EC2 auto scaling. It’s working great when the
nodes come up, and we have them deleting their node in the chef server when
they get scaled down, but we have lots of leftover clients, since a client
can’t delete itself.

There’s some past discussion on this topic here:
http://tickets.opscode.com/browse/CHEF-1867

How would folks feel about a patch to allow a client to delete itself even
if it’s not an admin?

I believe that permission would be okay.

Bryan


#7

We use a custom daemon that polls the EC2 API and updates chef-server and
Nagios (passive check) with the status of nodes. If a node’s status changes
to “terminated”, it is cleaned up after a grace period.
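
The grace period can be sketched in Python as follows. This is not the daemon described above, only an illustration of the timing check. It assumes the poller records when it first saw each instance as terminated; `due_for_cleanup` and the sample values are hypothetical.

```python
def due_for_cleanup(terminated_at, grace_seconds, now):
    """Return node names first seen as terminated at least grace_seconds ago.

    terminated_at maps node name -> Unix timestamp at which the poller first
    saw the instance in the "terminated" state.
    """
    return sorted(
        name for name, seen in terminated_at.items() if now - seen >= grace_seconds
    )


# Hypothetical sample data with a fixed "now" for reproducibility.
now = 10_000
terminated_at = {
    "i-0aaa1111": now - 1800,  # terminated 30 minutes ago
    "i-0bbb2222": now - 60,    # terminated 1 minute ago
}
print(due_for_cleanup(terminated_at, grace_seconds=900, now=now))
# prints ['i-0aaa1111']
```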

Regards,
Avishai



#8

On May 15, 2012, at 1:00 PM, Chris Chalfant wrote:

You can also write a knife plugin that compares the list of nodes known to the chef server and compare that list to the output of ec2-describe-instances. Anything that’s in chef but not in ec2-desc can be queued for node/client deletion. You can run this as a cron job on your chef server or some other management node if you are using hosted chef.

For my previous employer, I wrote a shell script that did basically the same thing. However, I kept running into edge cases where the code needed to be modified so that it didn’t accidentally blow away clients of one sort or another. Examples were the $COMPANY-validator client, and the -dev clients that we spun up with Chef and then ran “knife node delete” on, so that we couldn’t accidentally re-run chef-client and wipe out development work that had been done.

Be careful when developing tools that automatically delete stuff from your infrastructure.

Trust me, you REALLY don’t want to delete the wrong clients. #BTDT
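
One way to guard against this is to run every deletion candidate through a protect list before anything is removed. The Python sketch below is hypothetical and is not the shell script mentioned above. The patterns are only examples modelled on the validator and -dev clients described here.

```python
from fnmatch import fnmatchcase

# Example patterns only; adjust to your own naming scheme.
PROTECTED_PATTERNS = ["*-validator", "*-dev"]


def safe_to_delete(candidates, protected_patterns=PROTECTED_PATTERNS):
    """Drop any candidate whose name matches a protected glob pattern."""
    return [
        name
        for name in candidates
        if not any(fnmatchcase(name, pattern) for pattern in protected_patterns)
    ]


# Hypothetical sample data.
candidates = ["acme-validator", "web01-dev", "i-0aaa1111"]
print(safe_to_delete(candidates))
# prints ['i-0aaa1111']
```

Printing the final list and requiring an explicit flag before deleting anything (a dry-run default) is a cheap extra safeguard.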


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#9

If you already have queuing in place, you could have the deletion
process publish to a queue that is subscribed to by a deleter. Lots of
lightweight messaging options out there these days.
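
As an illustration of the shape of this, here is a Python sketch that uses the standard library's in-process `queue.Queue` as a stand-in for a real message broker. `run_deleter` is a hypothetical name, and the `delete` callable is a placeholder for the Chef API call.

```python
import queue
import threading


def run_deleter(requests, delete):
    """Consume node names from the queue until a None sentinel arrives."""
    while True:
        name = requests.get()
        if name is None:
            break
        delete(name)


requests = queue.Queue()
deleted = []  # stand-in for the real deletion side effect

worker = threading.Thread(target=run_deleter, args=(requests, deleted.append))
worker.start()

# The scale-down process only publishes names; it needs no admin rights itself.
for name in ["i-0aaa1111", "i-0bbb2222"]:
    requests.put(name)
requests.put(None)  # tell the deleter to stop
worker.join()

print(deleted)
# prints ['i-0aaa1111', 'i-0bbb2222']
```

The point of the split is that only the deleter needs admin credentials on the chef server.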

On 5/15/12 6:07 PM, Brad Knowles wrote:

On May 15, 2012, at 1:00 PM, Chris Chalfant wrote:

You can also write a knife plugin that compares the list of nodes known to the chef server and compare that list to the output of ec2-describe-instances. Anything that’s in chef but not in ec2-desc can be queued for node/client deletion. You can run this as a cron job on your chef server or some other management node if you are using hosted chef.
For my previous employer, I wrote a shell script that basically did the same sort of thing. However, I kept running into edge cases where the code needed to be modified so that it didn’t accidentally blow away clients of one sort or another – like the $COMPANY-validator client, the -dev clients that we spun up with Chef and then did a “knife node delete” so that we couldn’t accidentally re-run chef-client and wipe out development work that had been done, etc….

Be careful when developing tools that automatically delete stuff from your infrastructure.

Trust me, you REALLY don’t want to delete the wrong clients. #BTDT


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#10

On May 15, 2012, at 5:09 PM, Sascha Bates wrote:

If you already have queuing in place, you could have the deletion process publish to a queue that is subscribed to by a deleter. Lots of lightweight messaging options out there these days.

We were using RabbitMQ internally for some other things, but we didn’t want to overload the RabbitMQ instance for a secondary purpose on top of the primary function. And we didn’t want a second installation of RabbitMQ (or anything comparable) just for this one other task.

This is the same reason why we didn’t look at graylog – we were already using MongoDB for something else internally.

As has been said before, YMMV.


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#11

Checking nodes instead of clients ensures you do not delete the wrong
clients. Furthermore, we use an attribute (set by a role) to mark nodes
as deletable.
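
A small Python sketch of that opt-in check. It assumes node attributes have already been fetched into plain dictionaries; the attribute name `deletable` and the sample nodes are hypothetical, since the actual attribute is not named here.

```python
def deletable_nodes(nodes):
    """Return names of nodes explicitly marked as deletable.

    nodes maps node name -> dict of attributes. A node with the attribute
    missing or set to anything other than True is never selected.
    """
    return sorted(
        name for name, attrs in nodes.items() if attrs.get("deletable") is True
    )


# Hypothetical sample data.
nodes = {
    "i-0aaa1111": {"deletable": True},   # auto-scaled web node
    "i-0bbb2222": {"deletable": False},
    "db-master": {},                     # attribute never set by any role
}
print(deletable_nodes(nodes))
# prints ['i-0aaa1111']
```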

Regards,
Avishai
