Where to store data bags

Hello,

I apologize if this has already been discussed on the list.

We are throwing around the idea of storing databags within the cookbook that needs those particular settings. We would have one databag per cookbook and one databag item per environment. This seems like a good idea because the data that a given cookbook needs would be versioned within the cookbook, and any change to it would have to go through the SDLC. Without this, someone could change data without using the SDLC.

Whether it is a library or an app cookbook, it would follow this same model.

What are your thoughts on this? And how are people solving this today?

Thanks,

Cliff

Hi Cliff,

Is it strictly necessary for you to use databags for your
environment-specific data? If you want versioning and the other benefits of
bundling structured data with your cookbook, you should almost definitely be
describing that data in cookbook attributes.

For dispatching on environments within attribute data, there are a few
models. The simplest is to define defaults and override them in all
non-conformant environments.
That can be taxing when there are a few internally consistent modes in
which the cookbook can operate (for example, "production" mode vs "testing"
mode), and in those cases you may want to set attribute values with a
recipe which looks at a controlling attribute and chooses what to do.
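
A minimal sketch of that last idea, in plain Ruby so it runs outside Chef. The `mode` attribute, the setting names, and their values are hypothetical; in a real recipe the lookup would read `node['my_app']['mode']` and write the results back as node attributes.

```ruby
# Each mode maps to one internally consistent set of values (all hypothetical).
MODE_SETTINGS = {
  'production' => { 'log_level' => 'warn',  'workers' => 8 },
  'testing'    => { 'log_level' => 'debug', 'workers' => 1 }
}.freeze

# Choose the whole set from a single controlling attribute,
# failing loudly on a mode nobody has defined.
def settings_for(mode)
  MODE_SETTINGS.fetch(mode) do
    raise ArgumentError, "unknown mode #{mode.inspect}"
  end
end

# Plain hash standing in for Chef's node object.
node = { 'my_app' => { 'mode' => 'testing' } }
node['my_app'].merge!(settings_for(node['my_app']['mode']))

puts node['my_app']['log_level']
```

Keeping the per-mode values in one table means an environment only has to set the controlling attribute, rather than override each value separately.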

I would strongly advise keeping your databags in your VCS, but keeping them
separate from cookbook data.
They are flat, organization-wide information that must be loosely coupled
with cookbook versions.
Arranging them in a directory tree so that they look like tightly coupled
data is deceptive and will probably lead to confusion down the line.

Cheers,
Stephen

Hi Cliff,
Just to throw something out there: instead of using data bags, why not just
use an attribute and store the data as a hash? Functionally, there's very
little difference, it should tie in well within your SDLC workflow, and it
removes a modicum of complexity from the setup.
Fabien
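
A rough sketch of what the hash-in-an-attribute approach might look like. The `default` variable here is only a stand-in for Chef's attribute DSL so that the snippet runs as plain Ruby, and the attribute names and URLs are made up for illustration.

```ruby
# Stand-in for Chef's `default` attribute DSL: a hash that creates
# nested hashes on demand, so the snippet runs outside Chef.
default = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

# Roughly what attributes/default.rb could contain (hypothetical names and URLs).
default['my_app']['endpoints'] = {
  'DEV' => { 'example_service' => 'http://dev.example.com/ExampleService.asmx' },
  'QA'  => { 'example_service' => 'http://qa.example.com/ExampleService.asmx' }
}

# In a recipe this value would come from node.chef_environment.
chef_environment = 'QA'
endpoint = default['my_app']['endpoints'].fetch(chef_environment).fetch('example_service')

puts endpoint
```

Because the hash lives in the cookbook, any change to it ships as a new cookbook version and goes through the same review as the code.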

To follow up on this idea: I personally use the attribute approach, with
common attributes in attributes/default.rb and environment-specific
attributes in one attribute file per environment. Each of those files starts
with a return at the top, so its attributes are skipped in other environments.

QA.rb:

return unless node.chef_environment == 'QA'

# rest of the attributes specific to the QA environment
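
To illustrate why the guard is needed: every attribute file in the cookbook is evaluated on every node, so each per-environment file has to opt out by itself. The sketch below imitates that in plain Ruby, with each lambda standing in for one attribute file. The file names other than QA.rb, the `log_level` attribute, and the load order are assumptions made for the example.

```ruby
# Each lambda stands in for one attribute file; all of them get evaluated.
ATTRIBUTE_FILES = {
  'default.rb' => ->(_env, attrs) do
    attrs['my_app']['log_level'] = 'info'     # common default
  end,
  'QA.rb' => ->(env, attrs) do
    return unless env == 'QA'                 # guard: skip in other environments
    attrs['my_app']['log_level'] = 'debug'
  end,
  'PROD.rb' => ->(env, attrs) do
    return unless env == 'PROD'
    attrs['my_app']['log_level'] = 'warn'
  end
}.freeze

# Evaluate every "file" for the given environment and return the result.
def load_attributes(environment)
  attrs = { 'my_app' => {} }
  ATTRIBUTE_FILES.each_value { |file| file.call(environment, attrs) }
  attrs
end

puts load_attributes('QA')['my_app']['log_level']   # debug
puts load_attributes('DEV')['my_app']['log_level']  # info
```

An environment without its own file (DEV here) simply keeps the defaults.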

Thanks for the feedback. So I think we all agree that:

Storing “private” configuration data in attributes makes sense.

Storing “public” configuration data, which we want to expose to other nodes, probably makes sense in attributes as well. However, if the configuration is not node-specific (like VIP/ELB endpoints), then using databags to store it probably makes sense. It’s almost like we are creating a service registry at this point. I’m just having a hard time seeing the downside of storing the databag within the cookbook for these types of endpoints.

As an example I have an app cookbook called my_app, which would publish the following endpoint and have a databag like so:

my_app/DEV.json (data bag item)

{
  "id": "DEV",
  "example_service": "http://somevipenpoint.example.com/Example/ExampleService.asmx"
}

Where DEV is the environment name in this case.

We would upload the cookbook along with the databag (which, by the way, is currently not supported by Berkshelf: http://github.com/berkshelf/berkshelf/issues/1222).

And as a consumer of the example_service, I would load the my_app databag and key off of the environment (DEV) data bag item to get the appropriate endpoint.
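
For illustration, the consumer side could look roughly like the sketch below. It is plain Ruby with an in-memory stand-in for the data bag so that it runs outside Chef; `DATA_BAGS` and `endpoint_for` are hypothetical names. In a real recipe the lookup would be something like `data_bag_item('my_app', node.chef_environment)['example_service']`.

```ruby
require 'json'

# Stand-in for the my_app/DEV.json data bag item.
DEV_ITEM = <<~ITEM
  {
    "id": "DEV",
    "example_service": "http://somevipenpoint.example.com/Example/ExampleService.asmx"
  }
ITEM

# Hypothetical in-memory data bag store: bag name => item id => parsed item.
DATA_BAGS = { 'my_app' => { 'DEV' => JSON.parse(DEV_ITEM) } }.freeze

# Hypothetical helper: find the item named after the environment,
# then read the requested service endpoint from it.
def endpoint_for(bag, environment, service)
  item = DATA_BAGS.fetch(bag).fetch(environment) do
    raise KeyError, "data bag #{bag} has no item for environment #{environment}"
  end
  item.fetch(service)
end

puts endpoint_for('my_app', 'DEV', 'example_service')
```

Failing on a missing environment item, rather than returning nil, surfaces a forgotten upload at converge time instead of in the consuming app.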

I’m trying to understand exactly why having a databag tightly coupled with the app is a bad idea in this case. Can you provide more insight into this?

Thanks,

Cliff

I'm trying to understand exactly why having a databag tightly coupled with the app is a bad idea in this case. Can you provide more insight into this?

I'll just focus on this part. The main problem is the lifecycle of your
databag: it is not coupled with the cookbook's. You'll have a bad time
somewhere if you update the databag in qualification and a patch is
needed in prod: you'll have to bump your cookbook (at least you should
tag each version) and merge the patch back into the previous version...

Rereading myself, that is unclear, so I'll try with an example:

Apps A and B in the same cookbook:
A v1.0 and B v1.0 => cookbook v1.0. All is OK.
A v2.0 goes to qualification, B is unchanged => cookbook v1.1. All is OK.

Here's the dragon: B should be patched but A v2.0 didn't pass
qualification, so you have to make a cookbook v1.2 with A in version 1.0
and B in version 1.1.
B v2.0 goes to qualification => cookbook v1.3, but with which version of
A? (first drawback)
A v2.0 is qualified and should go to prod. Taking the cookbook at v1.1,
there are two traps:

  • The cookbook version should go from 1.1 to 1.4
  • You have to remember to update B's version to 1.1

Even this one simple example already shows some problems. Now extend
it to 10 or 12 patches/corrections between the environments and you'll
have an idea of where you'll bang your head on the wall.

With the databag the case will be similar: in your version control
system, your tags will cross and you'll have to merge changes from
different unrelated branches. Somewhere in the future there will be a
change in your databag which is unrelated to the app (maybe the hostname,
or something else) and which will get you in trouble with this pattern.

If it can happen, it will one day :)

That's my point of view on why it's a bad idea and why you should keep
your databag as a separate project. If you think it's really unlikely to
happen, or that the burden of keeping them separate is higher than the
burden I just described, you're right to keep the databag in the
cookbook. At the end of the day, choose the path you feel more
comfortable with.

Thanks,

Cliff