Using unary NOT in knife search


#1

I am looking for some clarification on the knife search syntax,
particularly related to unary NOT.

I understand that knife search uses a modified Lucene query. In Lucene,
unary NOT is invalid. With knife it seems to sometimes work.

For example, this works with both hosted Chef and knife localmode 12.0.3
(and something like it is in the docs):

knife search node "NOT name:appserver" -i

However, the following works with knife localmode 12.0.3 but does not work
with hosted Chef:

knife search node "(NOT name:appserver) AND name:dbserver" -i

so, is unary NOT actually valid in either of these cases?

I realize I can turn the second example into a valid Lucene query by
reversing it:

knife search node "name:dbserver (NOT name:appserver)" -i

But currently there’s a bug that means this isn’t working in knife local
mode:
https://github.com/chef/chef/issues/3073

and it would be great to know more about what “modified” actually means.

Thanks!
Christine


#2

On Tuesday, March 17, 2015 at 2:26 PM, Christine Draper wrote:

I am looking for some clarification on the knife search syntax, particularly related to unary NOT.

I understand that knife search uses a modified Lucene query. In Lucene, unary NOT is invalid. With knife it seems to sometimes work.

For example, this works with both hosted Chef and knife localmode 12.0.3 (and something like it is in the docs):

knife search node "NOT name:appserver" -i

However, the following works with knife localmode 12.0.3 but does not work with hosted Chef:

knife search node "(NOT name:appserver) AND name:dbserver" -i

so, is unary NOT actually valid in either of these cases?

I realize I can turn the second example into a valid Lucene query by reversing it:

knife search node "name:dbserver (NOT name:appserver)" -i

But currently there’s a bug that means this isn’t working in knife local mode:
https://github.com/chef/chef/issues/3073

and it would be great to know more about what “modified” actually means.

I’m not sure about the rest of your question, but I can answer this part. In the past, we converted documents for search such that every field we indexed was a top-level key in Lucene/Solr. This worked for a while, but it’s actually a really bad way to use Lucene, which is designed to be used with maybe 40 or so top-level keys, and we had millions in Hosted Chef. So we changed the documents we send to Solr so that all the data to be indexed is in a single key (a single field in the XML document). Since we previously sent users’ search queries directly to Solr (using the filter query to restrict the search to a single organization), we ended up writing a query transformer that rewrites your incoming query to match the new storage format. This allows things like unary NOT and queries beginning with wildcards to work (e.g., name:*foo isn’t valid in Lucene, but you can use it in Chef)…

As for your other questions, it could be a bug in the query transformer, or maybe it’s not possible to transform it into a valid query, I’m not sure.


Daniel DeLeo


#3

The more things I try, the more complex it gets. What I’m trying to do is
take a query expression and restrict it to either:

  1) nodes that don’t have a specific attribute, i.e. "NOT topo_name:*", or
  2) nodes that have a specific attribute, i.e. "AND topo_name:some_name"

Doing something like:

knife search "(QUERY) NOT topo_name:*"

works in many cases, but when QUERY consists of one or more NOTs it fails
as an invalid query; so in that case I have to do:

knife search "QUERY_WITH_NOTS NOT topo_name:*"

Appending an AND to QUERY_WITH_NOTS doesn’t work, so I have to prepend
instead:

knife search "topo_name:some_name AND (QUERY)"
knife search "topo_name:some_name AND QUERY_WITH_NOTS"

But this is all empirical, and gives some wrong answers with 12.0.3 due to
the known (and possibly other) bugs.
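
For illustration only, the composition rules described above can be written down as a small helper. This is a sketch that encodes nothing more than the empirical observations in this post; the function name `restrict` and the NOT-detection regex are hypothetical, and the resulting queries have not been checked against any Chef server or knife version.

```python
import re


def restrict(query, topo_name=None):
    """Restrict `query` by the topo_name attribute.

    topo_name=None -> nodes that do NOT have the attribute.
    topo_name="x"  -> nodes whose topo_name is "x".
    """
    # Treat any standalone NOT as the "QUERY_WITH_NOTS" case from the post.
    has_not = re.search(r"\bNOT\b", query) is not None
    # Observed rule: parenthesize the query only when it contains no NOT.
    inner = query if has_not else f"({query})"
    if topo_name is None:
        # Append the exclusion after the query.
        return f"{inner} NOT topo_name:*"
    # Prepend the attribute match, since appending AND was observed to fail.
    return f"topo_name:{topo_name} AND {inner}"


print(restrict("name:web*"))
print(restrict("NOT name:appserver"))
print(restrict("name:web*", "some_name"))
print(restrict("NOT name:appserver", "some_name"))
```

The returned string would then be passed as the query argument to knife search.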

Anyone know more about the query transformer, or can give me some pointers?

Regards,
Christine

On Wed, Mar 18, 2015 at 11:15 AM, Daniel DeLeo dan@kallistec.com wrote:

As for your other questions, it could be a bug in the query transformer,
or maybe it’s not possible to transform it into a valid query, I’m not sure.


Daniel DeLeo


#4

Hi there,

I did some of the work “way back when” on modifying how Chef handles
incoming data and indexes it in solr. As Dan explained, we condense
the contents of each Chef object into a single ‘content’ field. The
Chef JSON objects are transformed to key value pairs. The tricky bit
is dealing with the nested structures and arrays in JSON. Skipping
over those details of the nested structure, you can think of the
transform applied like this:

{ "name": "server1", "version": "1.2.3" }  --->
content: name__=__server1 version__=__1.2.3

So that’s what we put into solr. Then incoming queries are parsed and
re-mapped. So a query comes in as: "name:server*" and we remap that
to: "content:name__=__server*". This is why a leading wildcard search,
which is invalid lucene search syntax, works for Chef. A query like:
"name:*" is really sent to solr (and thus lucene) as:
"content:name__=__*".
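
The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chef's actual transformer: the names `flatten`, `rewrite_query` and `FIELD_TERM` are hypothetical, nested structures and arrays are skipped just as in the description, and the regex handles only simple field:term pairs.

```python
import re

SEP = "__=__"  # separator shown in the example above


def flatten(obj):
    """Turn a flat JSON object into the single 'content' string.

    Nested structures and arrays are deliberately not handled here.
    """
    return " ".join(f"{key}{SEP}{value}" for key, value in obj.items())


# Assumed pattern: a field name, a colon, then a term containing no
# whitespace or parentheses. Ranges, quoted phrases and escapes are ignored.
FIELD_TERM = re.compile(r"([A-Za-z0-9_]+):([^\s()]+)")


def rewrite_query(query):
    """Remap each field:term pair onto the single 'content' field."""
    return FIELD_TERM.sub(
        lambda m: f"content:{m.group(1)}{SEP}{m.group(2)}", query
    )


print(flatten({"name": "server1", "version": "1.2.3"}))
print(rewrite_query("name:server*"))
print(rewrite_query("name:*"))
```

After the rewrite, the wildcard no longer sits at the start of the term that lucene sees, which is why a query such as name:* becomes acceptable.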

Now to your question about NOT and AND. The thing about lucene search
is that it is built for natural language search. For natural language
search, term frequency and other scores tend to be much more valuable
for good results than a straight boolean "term present or not". So NOT in
lucene behaves as a filter but doesn’t itself return results (so a
bare NOT query is not valid because there is nothing to filter).
Similarly, AND can be confusing because you are scoring documents.
Just listing multiple terms will give “and”-ish behavior. Here are a
couple of links that give some detail and might explain this better
than I’ve been able to :)

https://lucidworks.com/blog/why-not-and-or-and-not/

https://stackoverflow.com/questions/17969461/not-operator-doesnt-work-in-query-lucene
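
To make the "NOT is a filter" point concrete, here is a toy model in Python. It is a deliberately simplified sketch: clauses are plain sets of matching document names, scoring is ignored, and the names `ALL_DOCS` and `evaluate` are made up for the example. It does not reproduce what Solr or Chef's query transformer do with such queries.

```python
# Toy model: each clause is the set of document names it matches.
ALL_DOCS = {"appserver", "dbserver", "webserver"}


def evaluate(positive, negative):
    """Combine clauses the way the paragraph above describes.

    Positive clauses select candidate documents; negative (NOT) clauses
    only remove documents from those candidates.
    """
    if not positive:
        # A bare NOT has no candidates to filter, so nothing comes back.
        return set()
    candidates = set(positive[0]).intersection(*positive[1:])
    for excluded in negative:
        candidates -= excluded
    return candidates


name_is_appserver = {"appserver"}

# "NOT name:appserver" on its own: nothing to filter.
print(evaluate([], [name_is_appserver]))
# "name:* NOT name:appserver": match everything first, then filter.
print(sorted(evaluate([ALL_DOCS], [name_is_appserver])))
```

In this model a query needs at least one positive clause before a NOT clause has anything to remove.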

Anyhow, hope this context is useful.

Best,

  • seth

#5

Seth,

Yes, thanks for the useful info and pointers.

I realize “NOT xxx” is invalid in Lucene. But given the knife search docs
https://docs.chef.io/knife_search.html include this example:

knife search sample "(NOT id:foo)"

I’d assumed this was supported by Chef. Is that not true? Or do I have to
do something weird like:

knife search sample "id:* NOT id:foo"

and doesn’t that have bad performance implications?

Regards,
Christine

On Fri, Mar 20, 2015 at 12:09 PM, Seth Falcon seth@chef.io wrote:

Here are a couple of links that give some detail and might explain this
better than I’ve been able to:

https://lucidworks.com/blog/why-not-and-or-and-not/

https://stackoverflow.com/questions/17969461/not-operator-doesnt-work-in-query-lucene

Anyhow, hope this context is useful.

Best,

  • seth