On Jun 24, 2013, at 3:20 PM, Peter Donald peter@realityforge.org wrote:
Just be warned that the lag can be much more significant. We run the
open source variant of Chef for now and have noticed lags of anywhere
from <1s to ~120s between the data being inserted and it being
accessible via search.
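One way to live with a lag like this is to poll until the data shows up in search, backing off between attempts. What follows is only a minimal sketch in plain Ruby, not anything Chef ships: `wait_for_search`, its keyword arguments, and the stubbed lookup are all hypothetical names chosen for illustration. In a recipe, the block would presumably wrap a `search(:node, ...)` call.

```ruby
# Hypothetical helper (not part of Chef): call the given block until it
# returns a non-empty result, sleeping with exponential backoff in between.
# The timeout is measured in requested sleep seconds, not wall-clock time,
# so time spent inside the block itself is not counted.
def wait_for_search(timeout: 120, interval: 1, max_interval: 15,
                    sleeper: Kernel.method(:sleep))
  waited = 0
  loop do
    results = yield
    return results unless results.nil? || results.empty?
    return nil if waited >= timeout # give up; caller decides what to do

    sleeper.call(interval)
    waited += interval
    interval = [interval * 2, max_interval].min
  end
end

# Usage with a stubbed lookup that only "sees" the data on the third try.
# The stub stands in for a real search call and is an assumption.
attempts = 0
found = wait_for_search(timeout: 1, interval: 0.01, max_interval: 0.05) do
  attempts += 1
  attempts < 3 ? [] : ["web01.example.com"]
end
puts "found #{found.inspect} after #{attempts} attempts"
```

The ~120s default for `timeout` simply mirrors the worst-case lag mentioned above; it is not a recommended value.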
What version of chef-server are you running? Is it Chef 11.x, 10.x, or something older?
We use search extensively and have a large number of Windows nodes,
which seems to place pressure on the indexing system.
Windows nodes put a heavier load on the Chef server, but I would think that chef-server 11.x should handle large numbers of clients (even large numbers of Windows nodes) much better than the older 10.x-based versions.
After all, my understanding is that Hosted Chef is basically the world's largest instance of Private Chef 11.x set up in a multi-tiered structure, and I believe that Private Chef has been proven by partners like Facebook, Etsy, Netflix, etc... to scale to at least tens of thousands of nodes on a single Private Chef 11.x cluster.
To reduce the
impact we have started to strip out lots of Windows ohai data and use
the partial search cookbook where possible. That combined with a bit more
memory for the chef box has reduced the lag a little.
I've always wondered why ohai generates such massive amounts of information per node (regardless of platform), and why all of this information is usually considered "important enough" to be saved and indexed after every single run. Windows nodes might be worse in this respect, but the problem isn't all that much better on most *nix nodes.
It seems to me that the Ohai data that gets saved should be minimized to start with, and then if there are extra bits of information you want/need to be available via search, you should be able to handle those appropriately. Or, at the very least, maybe give us levels of index priority: some Ohai data would be considered "high priority" and available via search at very low latency, while 90-99% of the rest would be considered "low priority".
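
To make the "minimize what gets saved" idea a bit more concrete, here is a rough sketch in plain Ruby of filtering a nested attribute hash down to a whitelist before it is stored. This is an illustration only: `filter_attributes`, the slash-separated path format, and the sample data are all made up by me, and none of it is an Ohai or Chef API.

```ruby
# Hypothetical helper (not part of Ohai or Chef): keep only the attribute
# paths named in the whitelist. Paths are slash-separated keys, for example
# "kernel/name". Paths that do not exist in the data are silently skipped.
def filter_attributes(data, whitelist)
  whitelist.each_with_object({}) do |path, kept|
    keys = path.split("/")

    # Walk down the nested hash; stop with nil if any key is missing.
    value = keys.reduce(data) do |node, key|
      break nil unless node.is_a?(Hash) && node.key?(key)
      node[key]
    end
    next if value.nil?

    # Rebuild just enough nesting in the output to hold the kept value.
    parent = keys[0..-2].reduce(kept) { |node, key| node[key] ||= {} }
    parent[keys.last] = value
  end
end

# Made-up sample standing in for a (much larger) Ohai data set.
ohai = {
  "fqdn" => "web01.example.com",
  "platform" => "windows",
  "kernel" => { "name" => "Windows", "modules" => { "big" => "blob" } },
  "network" => { "interfaces" => { "eth0" => {} } }
}

kept = filter_attributes(ohai, ["fqdn", "platform", "kernel/name", "missing/key"])
puts kept.inspect
```

The same whitelist could just as easily be split into two lists, one per index priority, if the server ever supported that; the sketch only covers the filtering half.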
--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu