Databags vs library cookbook


#1

What compelling reason is there for an application cookbook to use a databag vs. a library cookbook or other artifact repo for that data?
I am hoping for actual use cases where a databag was required or significantly better suited.

Here are my thoughts. Please do correct me where I am wrong so I may learn. Or give me a pointer to the book I need to read to educate myself.

When I look at encrypted databags, there is still the issue of a secret written to the node that decrypts everything.

This feels like installing a combination lock on your door, and a lock box hanging off the knob which contains that combination, yet still placing the key to the box under the mat.

If we use an off-node key management service, there must be some other validation that authorizes handing over my key without having one on disk. With the key in RAM, any other secret, like a MySQL password, can be delivered as a simple encrypted attribute. It can even be managed as a node attribute stored after convergence as a uniquely encrypted string, with the help of that node’s unique key (from the key management system). (Looking in the archives, I see the mechanism for this is node.run_state.) This keeps the secret out of Chef policy files.
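To make the encrypted-attribute idea concrete, here is a minimal sketch in plain Ruby, not Chef code. It assumes the node has already obtained a 32-byte key from the key management service and holds it only in memory. The helper names `encrypt_for_node` and `decrypt_on_node` are hypothetical, and AES-256-GCM is my choice of cipher for the illustration, not something Chef prescribes.

```ruby
require 'openssl'

# Encrypt a secret with a node-specific 32-byte key (assumed to come from
# an off-node key management service). Returns a Base64 string that is
# safe to store as an ordinary attribute value.
def encrypt_for_node(plaintext, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv                    # 12 bytes for GCM
  ciphertext = cipher.update(plaintext) + cipher.final
  tag = cipher.auth_tag                    # 16-byte authentication tag
  [iv + tag + ciphertext].pack('m0')       # strict Base64, no newlines
end

# Decrypt on the node with the key held in memory. Raises
# OpenSSL::Cipher::CipherError if the key is wrong or the data was altered.
def decrypt_on_node(encoded, key)
  raw = encoded.unpack1('m0')
  iv = raw.byteslice(0, 12)
  tag = raw.byteslice(12, 16)
  ciphertext = raw.byteslice(28, raw.bytesize - 28)

  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  (cipher.update(ciphertext) + cipher.final).force_encoding(Encoding::UTF_8)
end
```

In a recipe, the decrypted value could then be placed in node.run_state rather than in a normal attribute, since run_state only lives for the duration of the Chef run and is not saved back to the server.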

The databag is not versioned. At convergence a node uses the data, and if there is a problem there is no way to mark that data as bad and use a different version for all future convergences: there is one current version and no other.

If a new cookbook version depends on the databag having different content, that databag must remain backward compatible with the old cookbook version already in play. A breaking change in a databag cannot be pinned to a new cookbook version, as there is only one “current” version.
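As a rough illustration of what that constraint costs, here is a sketch in plain Ruby of the kind of defensive reading code a cookbook ends up carrying. Everything in it is an assumption made for the example: the `schema` key, the `db_host` and `db` field names, and the `db_settings` helper are hypothetical, and the item is modeled as a plain Hash rather than loaded from a Chef server.

```ruby
# Read database settings from a databag item that may be in either of two
# shapes, because old and new cookbook versions share one "current" item.
# The 'schema' marker is a hypothetical convention, not a Chef feature.
def db_settings(item)
  case item.fetch('schema', 1)
  when 1
    # Old flat layout: only a host, port implied.
    { host: item.fetch('db_host'), port: 3306 }
  when 2
    # New nested layout.
    db = item.fetch('db')
    { host: db.fetch('host'), port: db.fetch('port') }
  else
    raise ArgumentError, "unsupported databag schema: #{item['schema'].inspect}"
  end
end
```

With versioned data the old cookbook could simply stay pinned to the old item. Here every reader has to understand every layout still in use.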

Chef is not a database. If we need to pass real data artifacts, like some MySQL table structure or other data, should they not be handled by our build process and placed in the artifact repo for consumption?

If we want to manage account data, would that not also need to be versioned outside Chef, ideally through an LDAP system? (Best case, there are no login accounts on your nodes beyond the initial baseline OS root; everything else would be some sort of daemon role account like smtp, oracle, and so on, using LDAP and sudo to enable acting manually. But who wants to act manually, or even log in to a single node, never mind thousands of nodes?)


#2

These are all valid concerns with data bag usage in Chef.

I just wanted to point out the existence of this project:
https://github.com/Nordstrom/chef-vault

This project allows one to encrypt data bags using the public keys of the
nodes that need the decrypted data.

While this may work for some companies, it’s not perfect - for example, if
you are using EC2 auto-scaling where VMs are provisioned and terminated all
the time, chef-vault would require extra work to maintain.
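To show why auto-scaling is awkward here, this is a rough sketch of the general idea in plain Ruby. It is not chef-vault's actual code, and the helper names `wrap_for_nodes` and `unwrap` are hypothetical. It assumes a shared secret is encrypted once per node with that node's RSA public key, so a node that did not exist when the item was written has no copy it can open.

```ruby
require 'openssl'

OAEP = OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING

# Encrypt one shared secret separately for each known node.
# public_keys maps node name => OpenSSL::PKey::RSA public key.
def wrap_for_nodes(shared_secret, public_keys)
  public_keys.each_with_object({}) do |(name, pub), wrapped|
    wrapped[name] = pub.public_encrypt(shared_secret, OAEP)
  end
end

# A node recovers the secret from its own copy using its private key.
# Raises KeyError if no copy was ever made for that node.
def unwrap(wrapped, name, private_key)
  private_key.private_decrypt(wrapped.fetch(name), OAEP)
end
```

Every time a new VM comes up, someone or something with access to the secret has to run the wrapping step again to include it, which is the extra maintenance mentioned above.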

On Sat, Dec 20, 2014 at 9:11 AM, Mike Thibodeau miketlive@gmail.com wrote:

What compelling reason is there for an application cookbook to use a
databag vs. a library cookbook or other artifact repo for that data?
I am hoping for actual use cases where a databag was required or
significantly better suited.


Best regards, Dmitriy V.


#3

Great discussion. Something to add: Facebook doesn’t use data bags in their
Chef implementation:

-William

On Sat, Dec 20, 2014 at 11:06 PM, DV vindimy@gmail.com wrote:

These are all valid concerns with data bag usage in Chef.

I just wanted to point out existence of this project:
https://github.com/Nordstrom/chef-vault

This project allows one to encrypt data bags using public keys of nodes
which need the decrypted data.

While this may work for some companies, it’s not perfect - for example, if
you are using EC2 auto-scaling where VMs are provisioned and terminated all
the time, chef-vault would require extra work to maintain.


Best regards, Dmitriy V.