Keeping DNS configuration errors from breaking a working server with bind cookbooks


#1

I’m looking at the bind and bind9 cookbooks, and realizing that they don’t check for certain types of invalid configurations that would cause the BIND daemon to fail to restart. In particular, for bind9, multiple data bags that have the same ‘domain’ setup are accepted without error, but break /etc/named.conf.local.

The BIND-supplied command ‘named-checkconf’ can be used to check the configuration before the recipe completes and prevent BIND from being restarted in a broken state. But I’ve been asked to let those BIND-related cookbooks report errors, and not actually cause the rest of the Chef run to fail.

So I’ve tried various approaches: tying “rescue” operations to the necessary shell command, simply wrapping the whole recipe in a “rescue”, and others. I seem unable to get the right balance of running the check, allowing the cookbook to succeed, and getting a visible error report.

Has anyone in the community done this, or some other “run this shell script, end this recipe, report the error, and allow other cookbooks to continue” operation?


Nico Kadel-Garcia
Senior Systems Consultant
Email: nkadelgarcia-consultant@scholastic.com
Cell Phone: +1.339.368.2428


#2

Can you just use “ignore_failure true” on the resources you don’t care about?

  • Julian


[ Julian C. Dunn jdunn@aquezada.com * Sorry, I’m ]
[ WWW: http://www.aquezada.com/staff/julian * only Web 1.0 ]
[ gopher://sdf.org/1/users/keymaker/ * compliant! ]
[ PGP: 91B3 7A9D 683C 7C16 715F 442C 6065 D533 FDC2 05B9 ]


#3

From: Julian C. Dunn [mailto:jdunn@aquezada.com]
Sent: Monday, June 23, 2014 11:20 AM
To: chef@lists.opscode.com
Subject: [chef] Re: Keeping DNS configuration eerrors from breaking working server with bind cookboks

Can you just use “ignore_failure true” on the resources you don’t care about?

  • Julian

Not as things stand, no. For example, the old bind9 cookbook doesn’t even support DNS slaves, only forwarding, so it has no way to configure a failover server for when the upstream Chef-managed DNS server has an issue. And various classes of errors, such as typos in the data bags or accidentally having two distinct data bags for the same DNS domain, pass any reasonable JSON verification tool and still get loaded into the DNS server’s configuration.

That kills the BIND DNS server, and services that rely on it, quite dead. So running configuration verification as a separate step seems, to me at least, quite mandatory before trying to restart a core daemon. I do seem to have a handle on the problem: I’m defining a “bash” operation with “action :nothing”, then summoning it with a rescue-wrapped operation before the daemon is restarted.


#4

It sounds like you have enough esoteric failure conditions that a set
of helper methods to validate things before proceeding (e.g. run in a
ruby_block or something) would be handy.
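
One possible shape for such a helper, as a sketch only: it shells out to an existing checker rather than reimplementing any validation, and it reports a failure instead of raising. The method name `config_valid?` and its `log:` parameter are made up for illustration, and the `named-checkconf` arguments and path are assumptions. Inside a recipe you would more likely log through `Chef::Log.error` than to standard error.

```ruby
require 'open3'

# Hypothetical helper (not part of Chef or of either cookbook).
# Runs an external config checker and reports a failure instead of raising.
# Returns true when the checker exits 0, false otherwise.
def config_valid?(*command, log: $stderr)
  output, status = Open3.capture2e(*command)
  return true if status.success?

  log.puts "ERROR: #{command.join(' ')} exited #{status.exitstatus}"
  log.puts output
  false
rescue SystemCallError => e
  # The checker could not be started at all (for example, not installed).
  log.puts "ERROR: could not run #{command.first}: #{e.message}"
  false
end

# Example call; in a recipe this would sit inside a ruby_block's block.
if config_valid?('named-checkconf', '-z', '/etc/named.conf')
  puts 'configuration looks valid'
else
  puts 'configuration check failed; leaving the running daemon alone'
end
```

Because the helper returns a boolean instead of raising, the caller decides whether a bad configuration should stop the run or only be reported.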

  • Julian


#5

If you have some ideas on making the bind cookbook better, feel free to
open an issue on GitHub
(https://github.com/atomic-penguin/cookbook-bind/issues) for
discussion/implementation.

Eric G. Wolfe
email: eric.wolfe@cyclecomputing.com
cell: 304.942.3970
twitter: @atomic_penguin

Cycle Computing
Leader in Utility HPC Software


twitter: @cyclecomputing

Don’t be overly suspicious where it’s not warranted.


#6

The extent to which I do not want to re-invent RFC compliant BIND verification, in ruby, from scratch, cannot be overstated. “named-checkconf” and “named-checkzone” do a pretty good job.


Nico Kadel-Garcia


#7

On Wednesday, June 25, 2014 at 11:21 AM, Kadel-Garcia, Nico wrote:

The extent to which I do not want to re-invent RFC compliant BIND verification, in ruby, from scratch, cannot be overstated. “named-checkconf” and “named-checkzone” do a pretty good job.

No one was suggesting you do that.

Maybe you could set up your file/template resources to notify an execute resource which will run the config checker, which in turn will fail if the config is invalid. That could be wrapped up in a LWRP to make it easy to reuse.
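
A minimal sketch of that wiring, written as a recipe fragment (Chef DSL, so it only runs inside a Chef client run). The service name `named`, the template source name, and the file paths are assumptions for illustration, not taken from either cookbook.

```ruby
# Recipe fragment; names and paths below are assumed, not from the cookbooks.

service 'named' do
  supports reload: true, status: true
  action [:enable, :start]
end

# Does nothing on its own; it runs only when something notifies it.
execute 'check named config' do
  command 'named-checkconf -z /etc/named.conf'
  action :nothing
  # Reached only if the check exits 0; a non-zero exit fails this resource.
  notifies :reload, 'service[named]', :delayed
end

template '/etc/named.conf.local' do
  source 'named.conf.local.erb'
  # Run the checker as soon as this file changes.
  notifies :run, 'execute[check named config]', :immediately
end
```

With this layout the service is reloaded only through the checker, so an invalid file stops the run before the reload is queued.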


Daniel DeLeo


#8

The “execute” part is easy, I’ve done some tests with that. I can even wrap the service commands in an “if” statement to use different start_command and restart_command when the verification tool is available, and when it’s not.

The difficulty I’m having is getting it to run immediately before the delayed-notification-based service restarts, reloads, or starts, without also running before all of the modified configuration files have been deployed. I’d originally just stuck it into the start_command, reload_command, etc. as part of the execution string, but that seems less than graceful.
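
For reference, the “part of the execution string” variant can be sketched roughly like this. The helper name `guarded` is hypothetical, and the checker arguments and the `service named restart` command are assumptions about the platform.

```ruby
# Hypothetical helper: returns a shell command line that runs `action`
# only if `check` exits 0 (plain `&&` chaining).
def guarded(check, action)
  "#{check} && #{action}"
end

restart = guarded('named-checkconf -z /etc/named.conf', 'service named restart')
puts restart

# In a recipe the string would be handed to the service resource, e.g.
#   service 'named' do
#     restart_command restart
#   end
```

The check then runs at the moment the service action fires, after every file has been written, but a failed check surfaces as a failed restart command rather than as a separate, clearly labeled step.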

Once it’s working, then yes, an LWRP or enhancement to the basic “service” toolkit might be in order. I’d love to see it for other daemons, such as HTTPD, that have configuration testers.


Nico Kadel-Garcia


#9

I do something that sounds similar enough in one of my cookbooks, but with
nginx.

I have the following resource:

execute 'test nginx config' do
  command 'nginx -t'
  notifies :reload, 'service[nginx]', :delayed
end

When a template is modified, you can tell it to notify the execute resource
immediately to test the config (you can do this with action :nothing; I use
this as-is at the end of a recipe so a reload happens at the end of each
good run). At the end, nginx is then reloaded if all tests pass.
Otherwise, the Chef run will fail when the test fails. If you pair this
with “ignore_failure true” like Julian suggested, you could probably
achieve what you’re looking for. I don’t believe that the execute resource
will trigger the notification on a failure with “ignore_failure true”, but
I have not tried myself.

I hope that helps, even if just a little.

Thanks,

Ameir
