Databags vs library cookbook

Mike_Thibodeau · December 20, 2014, 5:11pm

What compelling reason is there for an application cookbook to use a databag vs. a library cookbook or other artifact repo for that data?
I am hoping for actual use cases where a databag was required or significantly better suited.

Here are my thoughts. Please do correct me where I am wrong so I may learn. Or give me a pointer to the book I need to read to educate myself.

When I look at encrypted databags, there is still the issue of a secret written to the node that decrypts everything.

This feels like installing a combination lock on your door, and a lock box hanging off the knob which contains that combination, yet still placing the key to the box under the mat.
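To make the concern concrete, here is a minimal Ruby sketch of the shared-secret model. Every item is encrypted with a key derived from one secret that sits on the node, so whoever can read that secret can open every item. The SHA-256 key derivation, AES-256-GCM, and the item names are assumptions chosen for the sketch; this is not Chef's encrypted data bag implementation.

```ruby
require "digest"
require "json"
require "openssl"

# Hypothetical model of a shared-secret encrypted databag (a sketch, not
# Chef's code): every item is encrypted with a key derived from ONE secret.
def derive_key(secret)
  Digest::SHA256.digest(secret) # 32 bytes, the AES-256 key size
end

def encrypt_item(item, secret)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = derive_key(secret)
  iv = cipher.random_iv
  cipher.auth_data = ""
  data = cipher.update(JSON.generate(item)) + cipher.final
  { "iv" => iv, "data" => data, "tag" => cipher.auth_tag }
end

def decrypt_item(blob, secret)
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = derive_key(secret)
  decipher.iv = blob["iv"]
  decipher.auth_tag = blob["tag"]
  decipher.auth_data = ""
  # final raises OpenSSL::Cipher::CipherError if the secret is wrong.
  JSON.parse(decipher.update(blob["data"]) + decipher.final)
end

secret = "contents of the single secret file on the node"
bag = {
  "mysql" => encrypt_item({ "password" => "db-pass" }, secret),
  "smtp"  => encrypt_item({ "password" => "mail-pass" }, secret),
}

# Anyone who can read that one file can open every item, including ones
# this node's recipes never use.
opened = bag.transform_values { |blob| decrypt_item(blob, secret) }
```

The encryption itself is sound; the weak point is that a single file on disk unlocks the whole bag, which is exactly the "key under the mat" in the analogy.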

If we use an off-node key management service, there must be some other validation that authorizes handing over my key without one being stored on disk. With the key held in RAM, any other secret, like a MySQL password, can be delivered as a simple encrypted attribute. It can even be managed as a node attribute, stored after convergence as a string encrypted uniquely with that node's key from the key management system, which keeps it out of Chef policy files. (Looking in the archives, I see the RAM-only place for this is node.run_state.)
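A minimal Ruby sketch of that flow, under stated assumptions: `FakeNode` is a hypothetical stand-in for a Chef node (not Chef's real class), and the `kms` hash is a hypothetical key management service holding one RSA key pair per node. The one real-world fact it leans on is that, in Chef, node.run_state is a plain Hash that lives only for the duration of a run and is not saved to the Chef server, unlike node attributes.

```ruby
require "json"
require "openssl"

# Hypothetical stand-in for a Chef node, NOT Chef's Node class.
# `attributes` models what is saved back to the server after a run;
# `run_state` models node.run_state, which only lives in memory.
class FakeNode
  attr_reader :attributes, :run_state

  def initialize
    @attributes = {}
    @run_state = {}
  end

  # Only attributes are serialized, mimicking a node save.
  def saved_json
    JSON.generate(attributes)
  end
end

# Hypothetical key management service: one RSA key pair per node, kept off
# the node's disk. How the node authenticates to it is out of scope here.
kms = Hash.new { |store, name| store[name] = OpenSSL::PKey::RSA.new(2048) }
padding = OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING

node = FakeNode.new

# Operator side: encrypt the MySQL password for this one node and store
# only the Base64 ciphertext as a node attribute.
ciphertext = kms["web01"].public_key.public_encrypt("s3cret", padding)
node.attributes["mysql_password_enc"] = [ciphertext].pack("m0")

# During the run: the node key and the plaintext stay in run_state (RAM).
node.run_state["node_key"] = kms["web01"]
node.run_state["mysql_password"] = node.run_state["node_key"].private_decrypt(
  node.attributes["mysql_password_enc"].unpack1("m0"), padding
)
```

Only the ciphertext ever reaches `saved_json`; the key and the plaintext exist in memory for the run and are gone afterwards.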

The databag is not versioned. At convergence, a node uses whatever data is there, and if there is a problem, there is no way to mark that data as bad and use a different version for all future convergences. There is one current version and no other.

If a new cookbook version depends on the databag having different content, that databag must stay backward compatible with the old cookbook version already in play. A breaking change in a databag cannot be pinned to a new cookbook version, because there is only one “current” version.
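The versioning problem can be illustrated with a small hypothetical Ruby model. Two cookbook versions are in play: 1.0.0 reads a flat `db_host` key and 2.0.0 reads a nested one. The key names, versions, and host are invented for the sketch. It compares a breaking databag change, a backward-compatible databag item, and the library-cookbook alternative where the data ships inside each cookbook version.

```ruby
# Hypothetical readers: how each cookbook version expects the data to look.
READERS = {
  "1.0.0" => ->(item) { item.fetch("db_host") },
  "2.0.0" => ->(item) { item.fetch("database").fetch("host") },
}.freeze

# Returns the host a given cookbook version would use, or a failure marker.
def converge(version, item)
  READERS.fetch(version).call(item)
rescue KeyError => e
  "FAILED (#{e.message})"
end

# Breaking change: the single current databag item is rewritten for 2.0.0.
breaking_item = { "database" => { "host" => "db.example.com" } }

# Backward-compatible item: carries both shapes until 1.0.0 is retired.
compatible_item = { "db_host" => "db.example.com",
                    "database" => { "host" => "db.example.com" } }

# Library-cookbook alternative: each cookbook version carries its own data,
# so data is pinned and rolled back together with the code that reads it.
LIBRARY_DATA = {
  "1.0.0" => { "db_host" => "db.example.com" },
  "2.0.0" => { "database" => { "host" => "db.example.com" } },
}.freeze

results = READERS.keys.to_h do |version|
  [version, { breaking: converge(version, breaking_item),
              compatible: converge(version, compatible_item),
              library: converge(version, LIBRARY_DATA.fetch(version)) }]
end
```

With the single shared item, 1.0.0 fails as soon as the breaking change lands, so the only safe databag option is the compatible item carrying both shapes. With per-version data, each version reads exactly what it was released with.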

Chef is not a database. If we need to pass real data artifacts, like some MySQL table structure or other data, should they not be handled by our build process and placed in the artifact repo for consumption?

If we want to manage account data, would that not also need to be versioned outside Chef, ideally through an LDAP system? (Best case, there are no login accounts on your nodes beyond the initial baseline OS root. Everything else would be some sort of daemon role account, like smtp, oracle, and so on, using LDAP and sudo to enable acting manually. But who wants to act manually, or even log in to one node, never mind thousands of nodes?)

DV1 · December 21, 2014, 7:06am

These are all valid concerns with data bag usage in Chef.

I just wanted to point out the existence of this project: chef/chef-vault on GitHub ("Securely manage passwords, certs, and other secrets in Chef").

This project allows one to encrypt data bags using the public keys of the nodes that need the decrypted data.
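A simplified Ruby sketch of that idea, which is not chef-vault's code or API: the item is encrypted once with a random AES key, and that AES key is then encrypted ("wrapped") separately with each authorized node's RSA public key. The node names, key sizes, and cipher choices are assumptions for illustration.

```ruby
require "json"
require "openssl"

# Encrypt the item once, then wrap the random key for each authorized node.
def vault_encrypt(item, node_public_keys)
  padding = OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  shared_key = cipher.random_key
  iv = cipher.random_iv
  cipher.auth_data = ""
  data = cipher.update(JSON.generate(item)) + cipher.final
  wrapped = node_public_keys.transform_values { |pub| pub.public_encrypt(shared_key, padding) }
  { "iv" => iv, "data" => data, "tag" => cipher.auth_tag, "keys" => wrapped }
end

# A node unwraps the shared key with its own private key, then decrypts.
def vault_decrypt(vault, node_name, node_private_key)
  padding = OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING
  wrapped = vault["keys"].fetch(node_name) # KeyError if never authorized
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = node_private_key.private_decrypt(wrapped, padding)
  decipher.iv = vault["iv"]
  decipher.auth_tag = vault["tag"]
  decipher.auth_data = ""
  JSON.parse(decipher.update(vault["data"]) + decipher.final)
end

web01 = OpenSSL::PKey::RSA.new(2048)
web02 = OpenSSL::PKey::RSA.new(2048)

# Only web01's public key is used, so only web01 can read the item.
vault = vault_encrypt({ "mysql_password" => "s3cret" }, { "web01" => web01.public_key })

secret = vault_decrypt(vault, "web01", web01)
begin
  vault_decrypt(vault, "web02", web02)
rescue KeyError
  puts "web02 has no wrapped key until the vault is updated for it"
end
```

Unlike the single shared secret file, there is no one key on disk that opens every item; each node holds only its own private key, and the vault lists exactly which nodes can read it.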

While this may work for some companies, it's not perfect. For example, if you are using EC2 auto-scaling, where VMs are provisioned and terminated all the time, chef-vault would require extra work to maintain, because each new node has to be added to the vault before it can decrypt anything.
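The maintenance cost under auto-scaling comes down to set differences between the nodes a vault item was last encrypted for and the nodes actually running. A hypothetical Ruby sketch of that bookkeeping follows; the method name and the instance IDs are invented, and it does not call chef-vault or the EC2 API.

```ruby
require "set"

# Compare who can read the vault item with who is running right now.
def vault_drift(authorized_nodes, running_nodes)
  authorized = authorized_nodes.to_set
  running = running_nodes.to_set
  {
    # New instances that cannot decrypt yet: the key must be wrapped for them.
    needs_access: (running - authorized).sort,
    # Terminated instances that still hold a wrapped key and should be dropped.
    stale_access: (authorized - running).sort,
  }
end

authorized = %w[i-0a1 i-0b2 i-0c3]        # nodes the vault was last updated for
running    = %w[i-0b2 i-0c3 i-0d4 i-0e5]  # what the auto-scaling group has now
drift = vault_drift(authorized, running)
```

Every scale-out or scale-in event produces a fresh drift like this, and until someone re-encrypts the item, the new instances converge without the secret.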

--
Best regards, Dmitriy V.

William_Jimenez · December 22, 2014, 6:56pm

Great discussion. Something to add: Facebook doesn't use data bags in their Chef implementation:

github.com/facebook/chef-utils/blob/main/Philosophy.md

This file is intended to document the general principles behind how Facebook
thinks about and operates system configuration management. We think most of
these apply at any scale, but certainly make large scale easier.

# Guiding Principles

## Always keep basic scalable building blocks in mind

We try to always keep these basic scaling building blocks in mind:

* idempotent - it should be safe to run the system at any time and know
  it will only make the necessary changes
* distributed - the more work pushed to the clients, the better it scales.
* extensible - the easier it is to extend it for local requirements, the
  better it will work for any environment
* flexible - it needs to work with existing work flows, not dictate strict
  new ones

## Data-driven configuration

(This file has been truncated.)
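The "idempotent" building block in the excerpt can be shown with a small Ruby sketch: compare the desired state with the current state and change things only when they differ, so running it again is harmless. The `ensure_file` helper and the file path are hypothetical, not part of Chef or of Facebook's tooling.

```ruby
require "fileutils"
require "tmpdir"

# Write `content` to `path` only if the file is missing or different.
# Returns :changed or :unchanged so callers can report what happened.
def ensure_file(path, content)
  return :unchanged if File.exist?(path) && File.read(path) == content

  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, content)
  :changed
end

dir = Dir.mktmpdir
path = File.join(dir, "etc", "motd")

# Three runs: only the first one needs to touch the file.
runs = Array.new(3) { ensure_file(path, "managed by chef\n") }
FileUtils.remove_entry(dir)
```

Chef resources follow the same test-then-repair pattern, which is what makes it "safe to run the system at any time".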

-William
