Data_bag_item call breaking dynamic resolv.conf update?

I’ve run into a strange situation where calling data_bag_item() prevents converge-time updates to the resolv.conf nameservers from being picked up dynamically. In short:

  1. We leverage the resolver cookbook in the run_list to update our nameserver for DNS resolution.
  2. Later in the run_list, we have a cookbook recipe with a remote_file resource which sources from an artifact repository whose hostname is only resolvable using the updated nameservers.

Normally, this works perfectly fine, as the updated nameservers are dynamically picked up (per the chef-client switch to using Ruby’s resolv-replace instead of glibc). However, as soon as I modify the recipe, or another recipe in the cookbook, or a different cookbook in the run_list to include a call to a data bag (encrypted in this case), the remote_file resource fails to resolve the hostname unless I converge a second time.

I did manage to work around this by wrapping the data_bag_item(...) call inside a lambda block, but further testing revealed that ALL data bag calls would need to be wrapped. Basically, the first recipe calling for data bag content and downloading a remote_file worked in isolation with the lambda function, but adding another recipe with a non-lambda data bag call to the run_list would again break the dynamic nameserver update.

Has anyone else run into this? It seems like this is a bug in how Chef is handling the recipe/resource when the data bag enters the picture, almost like resolv-replace is being overridden.

There is no functional dependency between our cookbook and the resolver cookbook, as the artifact repository is a configurable attribute that we do flip between public/private repositories for testing purposes – therefore, I don’t have a dependency called out in the metadata.

Yes, I could completely circumvent the issue by either putting in an IP instead of a hostname, or by having a 2-phase bootstrap converge process (resolver update, then primary run_list)… but it seems odd that this works as expected when there are no data bag calls.

The key here is understanding the full cycle of the Chef client run, as described at https://docs.chef.io/chef_client.html

The cookbooks are processed in two main phases, the compile phase and the converge phase. The compile phase “runs” all of the ruby code to determine how the resources used in the converge phase should act. The data_bag_item call happens in the compile phase, regardless of where the cookbook is in your run list. In the converge phase, the resolv.conf file is deployed, but your data_bag_item call happened before any change is actually effected on the system.
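
To make the ordering concrete, here is a toy model in plain Ruby. It is not Chef's implementation; ToyRun and its methods are hypothetical stand-ins, as are the data bag and resource names. It only shows that a plain method call runs as soon as the recipe is evaluated, while declared resources wait for the converge phase.

```ruby
# Toy model of a two-phase run. This is NOT Chef code; ToyRun and its
# methods are hypothetical stand-ins used only to show the ordering.
class ToyRun
  attr_reader :log

  def initialize
    @log = []
    @queued = []
  end

  # Stand-in for data_bag_item: plain Ruby, so it runs as soon as it is called.
  def data_bag_item(bag, item)
    @log << "compile: fetch #{bag}/#{item}"
    { 'id' => item }
  end

  # Stand-in for a resource declaration: only queues the work for later.
  def resource(name)
    @queued << name
  end

  # Stand-in for the converge phase: runs queued resources in declared order.
  def converge
    @queued.each { |name| @log << "converge: #{name}" }
  end
end

run = ToyRun.new
run.resource('template[/etc/resolv.conf]') # declared early in the run_list
run.data_bag_item('secrets', 'artifact')   # called in a later recipe
run.resource('remote_file[artifact]')
run.converge

puts run.log
```

Even though the resolv.conf resource is declared first, the data bag fetch is the first entry in the log, because nothing queued runs until converge.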

ruby_blocks are a special case: they are resources, so the code inside them is executed in the converge phase, not the compile phase.

For things like DNS resolution that you absolutely require to be up before other resources are called, you may want to look into https://docs.chef.io/resource_common.html#run-in-compile-phase but make sure you have a good understanding of the division between compile and converge first.
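
As a rough sketch of that approach, the fragment below forces a file resource to run during the compile phase with run_action. It is a recipe fragment, so it only works inside a Chef recipe, not as standalone Ruby. The attribute node['my_dns']['nameservers'] is a hypothetical name for illustration; if you use the resolver cookbook, you would need to apply the same idea to whatever resource it uses to write resolv.conf.

```ruby
# Recipe fragment: only meaningful inside a Chef recipe.
# node['my_dns']['nameservers'] is a hypothetical attribute holding a list of IPs.
file '/etc/resolv.conf' do
  content node['my_dns']['nameservers'].map { |ip| "nameserver #{ip}\n" }.join
  action :nothing          # take no action during the converge phase
end.run_action(:create)    # write the file immediately, during compile
```

Because the file is written while recipes are still being compiled, anything evaluated later in the compile phase should already see the new nameservers.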

–Jp

Thanks for the tips. I’ve always had a reasonable understanding of the 2-phase client cycle, but the fact that data bag calls only happen during the compile phase was new information to me.

I had been trying to defer the data bag call to the execution/converge phase, but you were correct that shifting the resolv.conf update to compile phase fixed the issue.

I do still find it a little bit disconcerting that simply including a data bag call in a recipe alters the DNS resolution behavior for all other resources in that cookbook and any other dependent cookbooks. In this case, the remote_file resource doesn’t actually have anything to do with the data bag contents; that configuration data is used in a template in the same recipe.