On Friday, August 30, 2013 at 9:51 AM, Brad Knowles wrote:
On Aug 29, 2013, at 7:31 PM, Daniel DeLeo <firstname.lastname@example.org> wrote:
This depends on your design. If you design the item keys so that every node gets its own key, then you don’t have to worry about races.
I’m a little confused – are you saying that you can have multiple nodes writing to the same data bag at the same time, so long as they are writing to node-specific parts?
Yes. If you have a data bag called status, and then data bag items with the ID of the node (server1, server2, server3, etc), then each node can update its own data bag item without impacting anything else. Where you would run into issues is if you had something like a data bag called status, with a data bag item per software package that multiple nodes were all trying to write to with their individual statuses. There’s no locking with the individual data bag items, so the last write will win (same thing with nodes, as I mentioned previously).
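To make the difference concrete, here is a minimal sketch of the two layouts. It does not use the Chef API; the DataBagStore class, the "mypackage" item ID, and the "installed" state are all made up for illustration, and the only behavior taken from the discussion above is that items are saved whole with no locking. Assuming every node fetches the shared item before any of them saves, the per-node layout keeps everything and the shared layout keeps only the last writer:

```python
import copy


class DataBagStore:
    """Toy in-memory stand-in (hypothetical, not the Chef API) for a server
    that stores whole data bag items and does no locking."""

    def __init__(self):
        self._bags = {}

    def load(self, bag, item_id):
        # Return a private copy, like a client fetching the item over the network.
        return copy.deepcopy(self._bags.get(bag, {}).get(item_id, {"id": item_id}))

    def save(self, bag, item):
        # Whole-item overwrite: whoever saves last wins.
        self._bags.setdefault(bag, {})[item["id"]] = copy.deepcopy(item)

    def items(self, bag):
        return copy.deepcopy(self._bags.get(bag, {}))


def per_node_design(store, nodes):
    # One item per node: each node only ever touches its own item.
    for node in nodes:
        item = store.load("status", node)
        item["state"] = "installed"
        store.save("status", item)


def shared_item_design(store, nodes):
    # One item per package: every node fetches it before anyone saves (the race).
    fetched = {node: store.load("status", "mypackage") for node in nodes}
    for node, item in fetched.items():
        item[node] = "installed"
        store.save("status", item)


nodes = ["server1", "server2", "server3"]

safe = DataBagStore()
per_node_design(safe, nodes)
safe_items = safe.items("status")

racy = DataBagStore()
shared_item_design(racy, nodes)
racy_items = racy.items("status")

print(sorted(safe_items))        # all three nodes are present
print(racy_items["mypackage"])   # only the last writer's entry survives
```

In the shared layout the statuses from server1 and server2 are silently overwritten, which is the "last write wins" behavior described above.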
It’s helpful (at least to me) to think of data bags as relational database tables and data bag items as individual records in those tables. In fact, you can just think of data bags as a NoSQL database. That makes Chef webscale.
That said, neither data bags nor node data are designed for this, so you should either consider exporting the data to an external service now or be ready to do so when you outgrow this approach.
What we’re working on is the installation of TopStack software on top of OpenStack or some other compatible private cloud solution (see http://topstack.org/). This is not the sort of thing that I expect to happen frequently at a given site, but it is lengthy, and I do want to support the possibility of multiple installations at the same site at more-or-less the same time – think of multiple separate dev clusters, QA clusters, operational clusters, etc.
I don’t think we have to worry too much about scalability in this particular instance. However, this is a good issue to keep in mind for future use of similar functionality.
I think as long as you don’t have many nodes writing data very frequently, this should work fine.
Brad Knowles <firstname.lastname@example.org>
LinkedIn Profile: http://tinyurl.com/y8kpxu