Let’s imagine we have 50 servers running a distributed web application, and
we need to upgrade software on all of them. The new software requires downtime.
If the upgrade is performed simultaneously on all servers, the application will
be completely unavailable for that time. How can we avoid this?
My suggestion is to use one of the following approaches:
- Manually run chef-client with knife on batches of 10 servers, one batch
after another.
- Write a more complex cookbook that searches for how many servers are being
upgraded at the moment and skips the software upgrade recipe if there are
already 10 servers in the upgrade queue (tracked in a Chef attribute). At the
end of the upgrade recipe, the node checks whether it was the last node of its
batch to finish (by searching on node[:upgrade][:in_process]) and, if so,
triggers a chef-client run on the next 10 nodes. In this case there is no
manual work: just change the role version and run chef-client on all 50
nodes; all the upgrade logic lives in the recipes.
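
To make the two ideas concrete, here is a minimal plain-Ruby sketch of the logic behind each one. It does not call any Chef or knife API. The names rolling_upgrade and may_start_upgrade? are hypothetical, and the block passed to rolling_upgrade stands in for whatever actually triggers chef-client on a node and waits for it to finish.

```ruby
# Hypothetical helper for the first approach: split the node list into
# fixed-size batches and process them strictly one batch after another.
# The block is assumed to return only once the node is back in service.
def rolling_upgrade(nodes, batch_size: 10, &upgrade)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  nodes.each_slice(batch_size).map do |batch|
    batch.each { |node| upgrade.call(node) }
    batch # return the batch so the caller can see how nodes were grouped
  end
end

# Hypothetical gate for the second approach: a node may start its upgrade
# only while fewer than `limit` nodes are currently marked as in process.
# In a recipe, the count would come from a search on the in_process attribute.
def may_start_upgrade?(in_process_count, limit: 10)
  in_process_count < limit
end

# Usage: 50 placeholder server names, upgraded 10 at a time.
servers = (1..50).map { |i| "web#{i}" }
upgraded = []
batches = rolling_upgrade(servers) { |name| upgraded << name }
```

With 50 servers and a batch size of 10, at most 10 nodes are ever down at once in either approach. The difference is where the limit is enforced: in an operator-driven loop for the first, or in a check each node makes for itself for the second.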
Which way is better? Maybe there are other good ways to perform partly