How can I handle failure gracefully in a Chef library?

saintaardvark · May 4, 2016, 11:40pm

Hi all – I’m in the process of building a library, and I’m trying to figure out the best way to handle failure without excessive boilerplate.

The library is a wrapper around calls to Conjur, a network service which is meant to manage secrets. (We already use Conjur extensively in other contexts, so encrypted data bags are not an option for us.) The code looks like this:

# libraries/conjurize_variable.rb
require 'mixlib/shellout'

class Conjurize
  class Variable
    # Returns the variable's value, or nil when the Conjur CLI is not installed.
    def get(var)
      if File.exist?('/opt/conjur/bin/conjur')
        cmd = "/opt/conjur/bin/conjur variable value #{var}"
        conjurize = Mixlib::ShellOut.new(cmd)
        conjurize.run_command
        conjurize.stdout.chomp
      end
    end
  end
end

and I can use it like so in a recipe:

# recipes/default.rb
conjur = Conjurize::Variable.new
node.override['app']['secret_goes_here'] = conjur.get('prod/app-1.0/secret')
include_recipe 'app::default'

(I’ve chosen to do this in a library, after trying it as an LWRP and as an HWRP, as it seems the most succinct way to abstract these calls. If I’m going about this wrong, or if there’s a better way to handle it, let me know.)

What I’m trying to figure out is how to handle failure within the library, so that people who use it don’t have to worry about it. Failure can happen for a couple of reasons:

- the server can be down or unavailable
- we don’t yet have the Conjur CLI tool available (this is installed as part of another cookbook)

As written above, if we can’t retrieve the secret then a blank string is used in the recipe. That is obviously a Bad Thing™. I don’t want to abort the whole Chef run – this might be the Chef run that is installing the Conjur CLI tool, or we might lose other valuable Chef bits unrelated to Conjur (installing packages, installing services, etc) – so adding a raise within Conjurize::Variable.get() seems like a bad idea. However, I also want to avoid having to wrap every call to conjur.get in checks, as I’d like to avoid imposing that burden on us and others in our org (“Here, use this handy abstraction, but be sure to surround it with exception handling.”) Handlers seem to be just a way to notify the right people about an aborted run, rather than a way to skip the failing part and keep going.

Is there a good way to somehow signal failure, such that (say) the failure is restricted to a particular recipe or cookbook, without having to be diligent about putting rescue clauses in each recipe we call this from?

Thanks in advance for any help, and let me know if you need any further info.

coderanger · May 5, 2016, 12:27am

About the only option you’ve not ruled out is that a return from the “root level code” of a recipe will silently skip the rest of that recipe. That said, pretty much every part of your design shown above should probably be reconsidered.

If a bit of recipe code wants to rely on the Conjur CLI tool being present, you should include_recipe it beforehand, so that the command failing is a fatal error (notably if you use run_command! it will do this for you). You’ll probably need to make sure you use lazy evaluation in the right places though.

Also worth noting that setting secrets into node attributes like that is hugely unsafe. Sometimes you have to, because you really don’t want to fix an upstream cookbook and they offer no other API, but it should be done with roughly the same level of care as disarming a bomb.

Also^2 worth noting that Conjur’s API is HTTP/REST based, so you can use Chef::HTTP directly and skip running the subprocess, which fixes a lot of sequencing issues and probably improves performance a good bit too.
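
For illustration, here is a minimal sketch of the "make failure fatal" reading of this advice. It is not the poster's code. Ruby's standard-library Open3 stands in for Mixlib::ShellOut so the sketch runs outside a Chef run, and the FetchError class and the cli_path constructor argument are hypothetical names introduced only for this example. The idea is that get either returns a non-empty secret or raises, so a blank value can never leak into a recipe.

```ruby
require 'open3'

class Conjurize
  # Hypothetical error class: lets callers tell a Conjur failure apart from other errors.
  class FetchError < StandardError; end

  class Variable
    # cli_path is a hypothetical parameter; the default is the path from the original post.
    def initialize(cli_path = '/opt/conjur/bin/conjur')
      @cli_path = cli_path
    end

    # Returns the secret as a non-empty string, or raises FetchError.
    def get(var)
      unless File.file?(@cli_path) && File.executable?(@cli_path)
        raise FetchError, "Conjur CLI not found at #{@cli_path}"
      end

      # Passing the arguments separately keeps var away from shell interpolation.
      stdout, stderr, status = Open3.capture3(@cli_path, 'variable', 'value', var)
      unless status.success?
        raise FetchError, "conjur exited #{status.exitstatus}: #{stderr.strip}"
      end

      value = stdout.chomp
      raise FetchError, "Conjur returned an empty value for #{var}" if value.empty?

      value
    end
  end
end
```

Inside a cookbook, calling error! on the Mixlib::ShellOut object after run_command gives similar raise-on-nonzero-exit behaviour. If the call sits after the include_recipe that installs the CLI, or inside a lazy block so that it runs at converge time, an uncaught FetchError aborts the run instead of passing an empty secret along.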

saintaardvark · May 5, 2016, 5:21pm

Hi coderanger – thanks kindly for your response. Chef newb here, so I’d completely forgotten about persistence/searchability of node attributes. In this case we’re working with an upstream cookbook, so I think we’re going to fork or rewrite to avoid this problem. The Chef::HTTP approach is definitely good, so I think we’re going to look at that as well.

We’re still on the fence about whether we want to abort the Chef run entirely if Conjur fails in some way, but we’ll punt on that for now.

Thanks again!
