Refactoring Infrastructure Code

Hello all!

I am beginning work on my talk on Infrastructure Code Refactoring for DevOpsDays Chicago (hope to meet many of you there!). Although I can speak to my own experience, I want to make sure not to limit it to my own POV. I would LOVE to hear other people’s experiences: how have you found refactoring infrastructure code (whether it’s Chef, Terraform, etc.) to be different from refactoring application code? How do you deal with the risk of making changes on live infrastructure with a refactor? What do you do when a refactor goes wrong?

Feel free to reply here or email me directly at

Thanks so much!

Every time I do it, whether the refactoring is large or small, it's fairly similar to refactoring any other application code. I use three sets of tests:

  1. ChefSpec-based unit tests, which I use to ensure the resulting resources are the same even when I change the abstractions, conditionals, etc. They also ensure that the code compiles (valid Ruby), that attributes are in place (valid resources and methods), and that attribute values are propagated as intended. This is fast, and part of the red-green TDD workflow.

  2. LXC-based smoke tests, which check the basic integrity of the code without mocking anything. They can be as simple as lxc-attach -n chef12.5 -- chef-client -o recipe[zookeeper]. They take more time, but they give you confidence that the code converges, that URLs and other network-dependent pieces (SHAs for remote files) are good, and that services start fine (port-based checks, etc.).

  3. Deploy to a staging or UAT server, or do a canary deployment to a handful of servers in prod. To control the rollout I use either cookbook versions or some kind of attribute gating.
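
To make the "attribute gating" in point 3 a bit more concrete, here is a rough sketch in plain Ruby of how such a gate could be decided. Everything in it is an assumption for illustration: the helper name use_refactored_path? and the 'refactor' attribute keys ('enabled', 'canaries', 'percent') are made up, not something Chef or any cookbook provides.

```ruby
require 'zlib'

# Hypothetical attribute layout (illustrative only, not a Chef convention):
#   attrs['refactor']['enabled']  -> master switch for the refactored code path
#   attrs['refactor']['canaries'] -> explicit list of node names to opt in
#   attrs['refactor']['percent']  -> 0..100 share of the remaining nodes
def use_refactored_path?(attrs, node_name)
  gate = attrs.fetch('refactor', {})
  return false unless gate['enabled']
  return true if Array(gate['canaries']).include?(node_name)

  # CRC32 of the node name gives a stable bucket in 0..99, so a node does
  # not flip between the old and new code paths from one run to the next.
  bucket = Zlib.crc32(node_name) % 100
  bucket < gate.fetch('percent', 0).to_i
end
```

In a recipe this could wrap the old and new implementations in an if/else, passing in something like the cookbook's own attribute hash and the node name. Rolling back is then just flipping 'enabled' to false or dropping 'percent' to 0, with no cookbook version change needed.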