Efficiency of data bags?

I have a recipe that builds an rdist distfile by searching for nodes
that match a particular attribute. These queries are taking about 3
seconds, and I have fewer than 20 nodes in my Chef DB. I’m sure there
are some ways to optimize my Chef server, but I am also considering
pre-building the data in a data bag. Does chef-client use a “last
modified” field to prevent lookups from the data bag if nothing has
changed?

Hi,

On Wed, Dec 8, 2010 at 11:40 AM, Mason Turner opsmason@gmail.com wrote:

> I have a recipe that builds an rdist distfile by searching for nodes
> that match a particular attribute. These queries are taking about 3
> seconds, and I have fewer than 20 nodes in my Chef DB. I'm sure there
> are some ways to optimize my Chef server, but I am also considering
> pre-building the data in a data bag. Does chef-client use a "last
> modified" field to prevent lookups from the data bag if nothing has
> changed?

I don't think that chef-client currently does any conditional GETs.
And in fact, a common way of accessing data bags is via search :-\

My general advice is to stick with search and to investigate tuning if
the performance isn't satisfactory.

How many queries are you doing in your recipe? A three-second response
time isn't great, but in the context of a chef-client run it might not
be so bad unless you are running many such queries. Perhaps you can
pull across a larger query (all nodes, say, since you only have 20) and
do all of the filtering locally. Then you'd only need a single query.
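The "one broad query, filter locally" idea can be sketched in plain Ruby. This is only an illustration: the `all_nodes` array is hypothetical sample data standing in for whatever a single `search(:node, "*:*")` call would return inside a recipe, and `custom_attribute` is an assumed attribute name.

```ruby
# Hypothetical stand-in for the result of one broad query such as
# search(:node, "*:*"); in a real recipe these would be node objects.
all_nodes = [
  { "name" => "web1", "custom_attribute" => 7 },
  { "name" => "web2", "custom_attribute" => 7 },
  { "name" => "db1",  "custom_attribute" => 3 },
]

# Group once, then answer every later lookup from memory
# instead of going back to the server.
nodes_by_key = all_nodes.group_by { |n| n["custom_attribute"] }

unsigned_int_key = 7
matching = nodes_by_key.fetch(unsigned_int_key, [])
matching_names = matching.map { |n| n["name"] }.sort
```

With fewer than 20 nodes the whole result set is small, so trading several narrow queries for one broad one costs little memory.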

-- seth

Hi Mason!

On Dec 8, 2010, at 12:40 PM, Mason Turner wrote:

> I have a recipe that builds an rdist distfile by searching for nodes
> that match a particular attribute. These queries are taking about 3
> seconds, and I have fewer than 20 nodes in my Chef DB. I'm sure there
> are some ways to optimize my Chef server, but I am also considering
> pre-building the data in a data bag. Does chef-client use a "last
> modified" field to prevent lookups from the data bag if nothing has
> changed?

A couple of hints for data bag use:

  1. You can load a data bag directly rather than searching.

    mything = data_bag_item("bagname", "itemname")

Then you can access mything in your recipe.

  2. You can save the data bag item to the node's run_state. This is an internal holding area where Chef keeps track of the recipes and templates it has seen. It is available across all recipes in the node's run list and is not persisted as a node attribute at the end of the run.

In one recipe:

    node.run_state["mything"] = data_bag_item("bagname", "itemname")

In another:

    mything = node.run_state["mything"]

We do this in the application cookbook, for example, to store the application data for use across multiple recipes.

  3. (bonus) If your search query would be the same in multiple recipes (i.e., "*:*" or similar), you can store the results in the run_state, too.

    node.run_state["nodes"] = search("node", "*:*")
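To show why this saves round trips, here is a rough sketch that runs outside chef-client. A plain Hash stands in for `node.run_state`, and `fake_search` is a hypothetical stand-in for the real `search` call that just counts how often it is invoked.

```ruby
# Plain Hash standing in for node.run_state (an assumption of this sketch).
run_state = {}
search_calls = 0

# Hypothetical stand-in for an expensive search call against the server.
fake_search = lambda do
  search_calls += 1
  [{ "name" => "web1" }, { "name" => "web2" }]
end

# First "recipe": runs the query only if nothing is stored yet.
run_state["nodes"] ||= fake_search.call

# Second "recipe": finds the stored result, so the query is not repeated.
run_state["nodes"] ||= fake_search.call
nodes = run_state["nodes"]
```

Using `||=` means whichever recipe runs first pays for the query, so the recipes do not depend on their order in the run list. An empty result array is still truthy in Ruby, so it is cached as well.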


Opscode, Inc
Joshua Timberman, Technical Evangelist
IRC, Skype, Twitter, Github: jtimberman


(apologies for the top post)

Thanks. Right now I have my chef server running off a Xen VM with a single CPU and 4GB of RAM. I'm safely moving from "playing around with chef" to some serious production support, so I'll move everything to a beefier VM. Hopefully that'll speed up the queries.

My recipe makes two data_bag_item calls (against two different bags) and one search call. The search is very specific (node, "custom_attribute:#{unsigned_int_key}") and runs over about 12 nodes to return two. It doesn't seem like these calls should take that long. I'll benchmark before and after changing VM images.
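A minimal way to time an individual call is Ruby's standard Benchmark module. In this sketch `slow_lookup` is a hypothetical stand-in that sleeps to simulate server latency; in a recipe the block would wrap the real data_bag_item or search call instead.

```ruby
require "benchmark"

# Hypothetical stand-in for a data_bag_item or search call; the sleep
# simulates server latency so the sketch runs outside chef-client.
slow_lookup = lambda do
  sleep 0.05
  { "id" => "itemname" }
end

result = nil
# Benchmark.realtime returns the elapsed wall-clock time in seconds.
elapsed = Benchmark.realtime { result = slow_lookup.call }
puts format("lookup took %.3f seconds", elapsed)
```

Timing each of the three calls separately should show whether the search or the data bag loads account for most of the delay.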

The run_state looks very promising, though. Thanks for the tip.

-- Mason Turner (mobile)
