I manage over 800 nodes with Chef. When I push out large updates, I need to avoid running them on every node at once. My idea is a loop that processes only 10 nodes at a time and, when that batch finishes, continues with the next 10 until all nodes are done. The point is to prevent a DoS. Does anyone have an idea how to write this loop?
Short answer: you don't!
The client already has an option for this, called splay. As an example, in client.rb:
interval 1800
splay 300
means "run Chef every 1800 seconds (30 minutes), plus a random delay of up to 300 seconds".
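To get a feel for how a splay spreads the load, here is a rough simulation in plain Ruby. It is not Chef code: the helper names `splay_delays` and `starts_per_bucket` are made up for illustration, and it simply assumes each node picks an independent random delay between 0 and the splay value.

```ruby
# Sketch: how a random splay spreads start times across a fleet.
# NODE_COUNT and SPLAY_SECONDS mirror numbers from this thread; the
# simulation itself is illustrative and is not part of Chef.
NODE_COUNT = 800
SPLAY_SECONDS = 300

def splay_delays(node_count, splay, rng: Random.new)
  # Each node independently picks a delay in 0..splay seconds (inclusive).
  Array.new(node_count) { rng.rand(0..splay) }
end

def starts_per_bucket(delays, bucket_seconds)
  # Count how many nodes start within each bucket_seconds-wide window.
  delays.group_by { |d| d / bucket_seconds }
        .transform_values(&:size)
        .sort
        .to_h
end

delays = splay_delays(NODE_COUNT, SPLAY_SECONDS, rng: Random.new(42))
starts_per_bucket(delays, 30).each do |bucket, count|
  puts format('%3d-%3ds: %d nodes', bucket * 30, bucket * 30 + 29, count)
end
```

With 800 nodes and a 300 second splay, each 30 second window should see very roughly 80 nodes start, rather than all 800 at the same moment. A larger splay flattens that further.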
The splay option works well for me. I’m also interested in not creating a DoS. In my environment I run once per hour, with a 1 hr splay, so all my nodes are spread across the hour.
I only want to run the update once on all 800 nodes, not at recurring intervals, just in increments that space out the processing. I don't really care how long it takes as long as it doesn't affect our network. When I add -i 3600 and -s 3600, it just keeps patching over and over again every hour. That isn't the goal.
To solve your immediate problem, you can run knife ssh QUERY COMMAND --concurrency NUM, which will only run the command on NUM nodes at once. But Chef is designed to be idempotent, so running once an hour should definitely not be a problem if the recipe is designed properly. You might need an only_if guard on your patching resource.
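If you still want to see what the hand-written loop from the question could look like, here is a minimal sketch in plain Ruby. It is not how knife implements its concurrency limit; `run_in_batches` and the block passed to it are hypothetical, and in practice the block would do something like SSH to the node and run chef-client.

```ruby
# Sketch: process nodes in fixed-size batches, waiting for each batch
# to finish before starting the next. The block is a hypothetical
# stand-in for the real per-node work (e.g. running chef-client over ssh).
def run_in_batches(nodes, batch_size:, &run_on_node)
  results = {}
  nodes.each_slice(batch_size) do |batch|
    # One thread per node in the batch, so at most batch_size run at once.
    threads = batch.map do |node|
      Thread.new { [node, run_on_node.call(node)] }
    end
    # Thread#value waits for the thread and returns the block's result.
    threads.each do |t|
      node, result = t.value
      results[node] = result
    end
  end
  results
end

# Example with placeholder node names and a stand-in for the real work.
nodes = (1..25).map { |i| format('node%03d', i) }
results = run_in_batches(nodes, batch_size: 10) { |node| "#{node}: ok" }
puts results.size
```

Note that a whole batch has to finish before the next one starts, so one slow node holds up the other nine slots. That is one reason to prefer the built-in options over a custom loop.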
Worth noting that splay can also work without an interval: knife ssh <some query> 'chef-client --splay 600' will run once, and each node will wait a random time between 0 and 600 seconds.
As ccrebolder pointed out, though, it's not really the intended model.