We are running Chef 11 community edition at the moment, and some time ago search started to return weird results - more nodes are returned from search than are actually there.
Example:
knife search node "chef_environment:environment_foo AND roles:service_foo"
This search works OK for most environments, but for 2 envs it returns more nodes than expected. In one of these 2 envs it’s almost every node within the environment, and in the other it’s just a few extra nodes.
I wonder if there is a way to diagnose what’s wrong with search?
I’ve already tried regenerating the Solr index, and that did not fix the issue.
Does the same happen if you use AND between your search terms?
knife search node "chef_environment:environment_foo AND roles:service_foo"
Are all of these “extra” nodes alive? If not, it sounds like you need something to ensure that when nodes are decommissioned they are also removed from the Chef server. This is the concept: https://github.com/eheydrick/aws-cleaner. The implementation is specific to AWS, but the idea could be applied to any environment.
Thanks for the response - somehow the AND condition went missing from my original post. We do have that condition in the query; for some environments it works as expected, but for others it does not.
Also most of these “extra” nodes are alive.
And thanks for pointing out clean-up code for chef - that’s gold!
Glad that was helpful. I have not run Chef 11 in a while, and it’s possible that there is a bug that was fixed in 12. Is it possible to see if you can replicate this with Chef 12 in a test environment? Can you also try running the same search within a recipe or using chef-shell to see if it’s something specific to knife?
Well, we first encountered that from within a recipe - when suddenly search started to return a lot more nodes than it used to do.
So right now there is no difference between knife search node and recipe-based search.
Not sure if we have Chef 12 up and running, and it also seems that the problem is subtle - it did not happen for the first year or so, then it happened at a small scale, and now it’s bigger.
We are currently migrating to 12 since it has so much on offer, so we’ll see if that will be reproducible.
During chef-client converge search results don’t change unless there were changes to nodes (i.e. we added / removed nodes within affected environment).
I’ve also exported node objects from the Chef server into JSON files, and they did not contain any extra roles or other info that could explain the bad search results.
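For anyone wanting to repeat that check, here is a rough sketch of applying the search criteria locally to exported node JSON and listing the nodes that should not have matched. It assumes each node was exported to its own file (for example with `knife node show NAME -l -F json`), that the environment sits in a top-level "chef_environment" key, and that expanded roles sit under "automatic" => "roles". The helper names and the directory layout are hypothetical, so adjust them to whatever your export actually looks like.

```ruby
require "json"

# Loads every exported node file from a directory (hypothetical layout:
# one JSON document per node).
def load_nodes(dir)
  Dir.glob(File.join(dir, "*.json")).map { |path| JSON.parse(File.read(path)) }
end

# True when the exported node really satisfies
# chef_environment:<env> AND roles:<role>.
# Assumption: expanded roles are stored under "automatic" => "roles".
def matches_query?(node, env, role)
  roles = Array(node.dig("automatic", "roles"))
  node["chef_environment"] == env && roles.include?(role)
end

# Names of nodes that do NOT satisfy the query locally. Feed it the nodes
# the server returned; anything listed here is an "extra" result.
def unexpected_nodes(nodes, env, role)
  nodes.reject { |node| matches_query?(node, env, role) }
       .map { |node| node["name"] }
end
```

Calling `unexpected_nodes(load_nodes("nodes"), "environment_foo", "service_foo")` on the exported search results would then list only the nodes whose own data does not justify the match.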
Can you share a gist of an example extra node and an expected node (redact anything sensitive), along with the role you are searching on? It would also be helpful to see any roles that include that role in their run lists.
Hmm, that looks good to me. Can you post the contents of the role identify_server, just to be sure we didn’t miss anything? At this point (given your existing efforts and that Chef 11 is old) I would suggest either reaching out to Chef support for in-depth debugging or, even better, looking at upgrading to Chef 12. Given the problems you have, rather than migrating Chef 11 to 12 it might be best to set up Chef 12 from scratch, upload your Chef artifacts, and work out a client migration strategy.
Is it possible that there are attributes somewhere else in the node data that have the name chef_environment or roles? That is, if you have something like node["attr_a"]["attr_b"]["roles"], that could result in some false positives, because all “leaf nodes” are indexed and can thus conflict with top-level attributes.
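If it helps, here is a minimal sketch of how one could scan an exported attribute hash for that kind of clash. It assumes you pass in one attribute hash at a time (for example the contents of "normal" or "default" from an exported node), so that keys at the first level are the legitimate top-level attributes. The names RESERVED_KEYS and clashing_paths are made up for this example and are not part of Chef.

```ruby
# Search keys that would be ambiguous if they also appeared deeper in the tree.
RESERVED_KEYS = %w[chef_environment roles].freeze

# Recursively walks a nested attribute structure and returns the dotted
# paths of nested keys named like a reserved search key.
def clashing_paths(data, path = [], found = [])
  case data
  when Hash
    data.each do |key, value|
      current = path + [key.to_s]
      # Only nested occurrences count; first-level keys are the real ones.
      found << current.join(".") if RESERVED_KEYS.include?(key.to_s) && !path.empty?
      clashing_paths(value, current, found)
    end
  when Array
    # Hashes inside arrays are scanned too; the array index is not recorded.
    data.each { |item| clashing_paths(item, path, found) }
  end
  found
end
```

An empty result for every node in an affected environment would make the leaf-key explanation unlikely; any path that does come back is a candidate for the false positives.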
I looked into that hypothesis earlier - no, we don’t have an attribute name clash. Also, if we had one, I would expect search to be wrong consistently across all environments, but that’s not the case.