Recipes doing status check-ins?

Folks,

I’m curious to know what the current best practice is with regards to keeping your central chef server updated with current run state information as you go through a lengthy process? Do you have each recipe doing a node.set on an attribute and then a node.save? If so, how “heavy” is a node.save operation?
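For concreteness, something like this is roughly what I have in mind -- just a sketch, and the 'install_progress' attribute name is invented:

    # Sketch only: record a checkpoint in a normal attribute, then push the
    # whole node object back to the Chef server mid-run.
    ruby_block "checkpoint-database-install" do
      block do
        node.set['install_progress']['database'] = 'done'
        node.save   # one full PUT of the node object to the server
      end
    end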

I know that we need to avoid writing to data bags, because they don’t support multiple simultaneous writers. However, other than using attributes and doing a node.save, I can’t think of any obvious way to have my recipes storing centrally available information on their current runtime status.

I wouldn’t do this kind of thing for most normal recipes that are run on a daily basis, but for lengthy install processes that should be done fairly rarely, it would be nice to be able to have some sort of ongoing progress information that could be fed back in near-realtime.

Thanks!


--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu

I've done this with a data bag. Just write to a 'progress' data bag at each interval. As long as the data bag is only used for this purpose, there shouldn't be any concurrency issues.
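Something along these lines (untested sketch -- the bag name, item fields, and checkpoint name are all just for illustration, and it assumes the 'progress' data bag itself was already created, e.g. with knife data bag create progress):

    ruby_block "record-progress" do
      block do
        item = Chef::DataBagItem.new
        item.data_bag('progress')
        item.raw_data = {
          'id'   => node.name.gsub(/[^0-9A-Za-z_\-]/, '_'),  # normalize the id to be safe
          'step' => 'database_configured',
          'at'   => Time.now.utc.to_s
        }
        item.save
      end
    end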

Node.save would work, but there's a race condition with nodes where if you modify it during the chef run, you might overwrite changes that occurred externally (like someone running knife node run_list add).

Hope this helps.

Larry

On Aug 29, 2013, at 3:50 PM, Brad Knowles brad@shub-internet.org wrote:

> Folks,
>
> I'm curious to know what the current best practice is with regards to keeping your central chef server updated with current run state information as you go through a lengthy process? Do you have each recipe doing a node.set on an attribute and then a node.save? If so, how "heavy" is a node.save operation?
>
> I know that we need to avoid writing to data bags, because they don't support multiple simultaneous writers. However, other than using attributes and doing a node.save, I can't think of any obvious way to have my recipes storing centrally available information on their current runtime status.
>
> I wouldn't do this kind of thing for most normal recipes that are run on a daily basis, but for lengthy install processes that should be done fairly rarely, it would be nice to be able to have some sort of ongoing progress information that could be fed back in near-realtime.
>
> Thanks!
>
> --
> Brad Knowles brad@shub-internet.org
> LinkedIn Profile: http://tinyurl.com/y8kpxu

On Thursday, August 29, 2013 at 5:23 PM, Larry Wright wrote:

> I've done this with a data bag. Just write to a 'progress' data bag at each interval. As long as the data bag is only used for this purpose, there shouldn't be any concurrency issues.
> Hope this helps.
>
> Larry
>
> On Aug 29, 2013, at 3:50 PM, Brad Knowles <brad@shub-internet.org> wrote:
>>
>> Folks,
>>
>> I'm curious to know what the current best practice is with regards to keeping your central chef server updated with current run state information as you go through a lengthy process? Do you have each recipe doing a node.set on an attribute and then a node.save? If so, how "heavy" is a node.save operation?
>>
>> I know that we need to avoid writing to data bags, because they don't support multiple simultaneous writers. However, other than using attributes and doing a node.save, I can't think of any obvious way to have my recipes storing centrally available information on their current runtime status.
This depends on your design. If you design the item keys so that every node gets its own key, then you don't have to worry about races.

That said, neither data bags nor node data are designed for this, so you should either consider exporting the data to an external service, or be ready to do so when you outgrow this approach.
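For example (purely illustrative -- the endpoint, port, and payload are made up), a recipe could push each status update to a small HTTP service of your own instead of the Chef server:

    require 'net/http'
    require 'json'
    require 'uri'

    ruby_block "post-status" do
      block do
        uri  = URI('http://status.example.internal:8080/progress')  # hypothetical endpoint
        http = Net::HTTP.new(uri.host, uri.port)
        req  = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
        req.body = { 'node' => node.name, 'phase' => 'installing', 'at' => Time.now.utc.to_s }.to_json
        http.request(req)
      end
    end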

--
Daniel DeLeo

On Aug 29, 2013, at 7:23 PM, Larry Wright larrywright@gmail.com wrote:

> I've done this with a data bag. Just write to a 'progress' data bag at each interval. As long as the data bag is only used for this purpose, there shouldn't be any concurrency issues.

Since data bags can only be written to by one process at a time (and I don't think there is any locking used), you end up having to use separate data bags for each node, and then that starts to be a real pain.

> Node.save would work, but there's a race condition with nodes where if you modify it during the chef run, you might overwrite changes that occurred externally (like someone running knife node run_list add).

Hmm. That's interesting. I hadn't considered that issue. I'll have to give this one some more thought.

> Hope this helps.

Yup! Thanks!

--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu

On Aug 29, 2013, at 7:31 PM, Daniel DeLeo dan@kallistec.com wrote:

> This depends on your design. If you design the item keys so that every node gets its own key, then you don't have to worry about races.

I'm a little confused -- are you saying that you can have multiple nodes writing to the same data bag at the same time, so long as they are writing to node-specific parts?

> That said, neither data bags nor node data are designed for this, so you should either consider exporting the data to an external service, or be ready to do so when you outgrow this approach.

What we're working on is the installation of TopStack software on top of OpenStack or some other compatible private cloud solution, see http://topstack.org/. This is not the sort of thing that I expect to happen frequently at a given site, but it is lengthy, and I do want to support the possibility of multiple installations at the same site at more-or-less the same time -- think about having multiple separate dev clusters, vs. QA clusters, operational clusters, etc....

I don't think we have to worry too much about scalability in this particular instance. However, this is a good issue to keep in mind for future use of similar functionality.

Thanks!

--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu

On Friday, August 30, 2013 at 9:51 AM, Brad Knowles wrote:

> On Aug 29, 2013, at 7:31 PM, Daniel DeLeo <dan@kallistec.com> wrote:

>> This depends on your design. If you design the item keys so that every node gets its own key, then you don't have to worry about races.

> I'm a little confused -- are you saying that you can have multiple nodes writing to the same data bag at the same time, so long as they are writing to node-specific parts?

Yes. If you have a data bag called status, and then data bag items with the ID of the node (server1, server2, server3, etc), then each node can update its own data bag item without impacting anything else. Where you would run into issues is if you had something like a data bag called status, with a data bag item per software package that multiple nodes were all trying to write to with their individual statuses. There's no locking with the individual data bag items, so the last write will win (same thing with nodes, as I mentioned previously).

It's helpful (at least to me) to think of data bags as relational database tables and data bag items as individual records in that table. In fact, you can just think of data bags as a NoSQL database. That makes Chef webscale :)
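To make the read side concrete too (the bag and field names are invented), anything with API access can pull all of the per-node items back and summarize them:

    # Sketch: list every node's current status from the 'status' bag,
    # e.g. from a reporting recipe or a knife exec script.
    Chef::DataBag.load('status').keys.each do |item_id|
      item = Chef::DataBagItem.load('status', item_id)
      Chef::Log.info("#{item_id}: #{item['phase']}")
    end

From a workstation, knife data bag show status will list the item ids, and knife data bag show status server1 will show a single node's item.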

>> That said, neither data bags nor node data are designed for this, so you should either consider exporting the data to an external service, or be ready to do so when you outgrow this approach.

> What we're working on is the installation of TopStack software on top of OpenStack or some other compatible private cloud solution, see http://topstack.org/. This is not the sort of thing that I expect to happen frequently at a given site, but it is lengthy, and I do want to support the possibility of multiple installations at the same site at more-or-less the same time -- think about having multiple separate dev clusters, vs. QA clusters, operational clusters, etc....
>
> I don't think we have to worry too much about scalability in this particular instance. However, this is a good issue to keep in mind for future use of similar functionality.

I think as long as you don't have many nodes writing data very frequently, this should work fine.

> Thanks!
>
> --
> Brad Knowles <brad@shub-internet.org>
> LinkedIn Profile: http://tinyurl.com/y8kpxu