Data Bag Search Delay

Steven_Barre · October 2, 2013, 11:44pm

It takes 60 seconds from when I call “knife data bag from file somebag
path/to/some.json” until “knife search somebag” will return the answer.
Is there anything that can be done to make that faster?

http://community.opscode.com/questions/436 also describes the issue.

CentOS 6.4
chef-server-11.0.8-1.el6.x86_64

–

Steven Barre, RHCE, ZCE, MCP
steven@realestatewebmasters.com

Systems Administrator / Programmer
Real Estate Webmasters - 250-753-9893

coderanger · October 3, 2013, 12:07am

Get more CPU for Solr. There have been some experiments with replacing Solr with ElasticSearch which can have better insertion performance, so you could also look at working on that patch.

--Noah

sdelano · October 3, 2013, 1:09am

Solr 1.4, the Solr included with the chef server, is asynchronous in
"committing" saved objects to the index. The rate at which Solr commits is
tunable. The defaults are set to commit every 60 seconds or 1000 documents,
as seen here:

https://github.com/chef-boneyard/omnibus-chef-server/blob/master/files/chef-server-cookbooks/chef-server/attributes/default.rb#L74-L75

    default['chef_server']['chef-solr']['max_commit_docs'] = 1000
    default['chef_server']['chef-solr']['commit_interval'] = 60000 # in ms

You can tune these to your heart's content by editing the
/etc/chef-server/chef-server.rb file to override the default values, but
you should be aware of the tradeoffs that you're making by doing so.

Every time Solr commits to the index, it blocks all incoming updates. As
you shorten the interval between commits, the time that chef-expander has
available to send updates to Solr decreases, and under heavy load you may
find yourself in a state where your update rate outruns the rate at which
you can commit objects to the index. If you're going to be putting this
server under heavy load, proceed with caution.
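
To get a feel for that tradeoff, here is a rough back-of-the-envelope sketch in Ruby. It assumes a very simplified model in which every commit blocks updates for a fixed time; the 500 ms commit duration and the `update_window_fraction` helper are made up for illustration and are not measured from a real Chef server.

```ruby
# Simplified model: each commit blocks updates for `commit_ms` out of every
# `interval_ms`. The commit duration is a hypothetical input, not a measurement.
def update_window_fraction(interval_ms, commit_ms)
  raise ArgumentError, 'interval_ms must be positive' unless interval_ms.positive?

  blocked = [commit_ms, interval_ms].min
  (interval_ms - blocked).to_f / interval_ms
end

# Compare a few commit intervals against an assumed 500 ms commit.
[60_000, 10_000, 2_000].each do |interval|
  pct = (update_window_fraction(interval, 500) * 100).round(1)
  puts "commit every #{interval} ms: #{pct}% of the time open for updates"
end
```

The real cost of a commit depends on index size and hardware, so it is worth measuring on your own server before settling on a short interval.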

What you may want to ask instead, is what is it about your usage of
databags that necessitates real-time search?

-Stephen

--
Stephen Delano
Software Development Engineer
Opscode, Inc.
1008 Western Avenue
Suite 601
Seattle, WA 98104

Seth_Falcon_01 · October 3, 2013, 4:00am

steven@realestatewebmasters.com writes:

> It takes 60 seconds from when I call "knife data bag from file somebag
> path/to/some.json" until "knife search somebag" will return the answer.
> Is there anything that can be done to make that faster?

The default config for the solr commit interval is 60000 ms (1
minute). Depending on your load and hardware specs, you can likely
reduce this time to reduce the latency between data updates and
search-ability of the data.

The config would go into /etc/chef-server/chef-server.rb and look like:

chef_solr['commit_interval'] = 10000

seth

--
Seth Falcon | Development Lead | Opscode | @sfalcon

Greg_Zapp · October 3, 2013, 9:07am

Wow. This is going to throw a wrench in my grand orchestration plans, as
this is one of the primary methods for getting information into a Chef run.
Having to insert delays into some other system because data saved into the
Chef server isn't actually available is not ideal.
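
If an external system does have to wait, polling until the item shows up in search is usually gentler than a fixed sleep. Below is a minimal Ruby sketch of that idea. The `wait_until` helper is hypothetical, and the block is a stand-in for whatever search call the external system would make against the Chef server.

```ruby
# Polls the block until it returns a truthy value or the timeout elapses.
# Returns the block's value, or nil if the timeout was reached first.
def wait_until(timeout:, interval: 1.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Stand-in for a real search call: the item becomes "indexed" on the third try.
attempts = 0
found = wait_until(timeout: 5, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? { 'id' => 'example_com' } : nil
end
puts found.inspect
```

The worst-case wait is still bounded by the Solr commit interval, so this only removes the unnecessary part of the delay.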

zts · October 3, 2013, 12:06pm

On Thu, Oct 3, 2013 at 10:07 AM, Greg Zapp greg.zapp@gmail.com wrote:

> Wow.. This is going to throw a wrench in my grand orchestration plans.. As
> one of the primary methods for getting information into a chef run...
> Having to insert delays into some other system because data saved into
> Chef server isn't actually available is not ideal.

Note that the data is available, it's only the search index that isn't
synchronously updated.
After creating/updating a data bag item, you can immediately load that
specific item and see the latest data.
Similarly, you can iterate over the data bag and the new item will be
present.

Neither of which is any help at all if you absolutely have to use search to
find items of interest.
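
For small data bags, iterating and filtering in Ruby can stand in for search. The sketch below assumes a hypothetical `domains` data bag whose items carry a `nodes` field naming the host (or hosts) they belong on. The item data and the `domains_for` helper are invented for illustration. In a recipe the items would come from the `data_bag` and `data_bag_item` DSL methods, as shown in the comment.

```ruby
# Select the items whose 'nodes' field names the given host.
# 'nodes' may be a single hostname or a list of hostnames (an assumption).
def domains_for(hostname, items)
  items.select { |item| Array(item['nodes']).include?(hostname) }
end

# Inside a recipe, the items could be loaded without search, for example:
#   items = data_bag('domains').map { |id| data_bag_item('domains', id) }
#   domains_for(node['hostname'], items).each { |domain| ... }
items = [
  { 'id' => 'example_com', 'nodes' => 'web01' },
  { 'id' => 'example_org', 'nodes' => %w[web01 web02] },
  { 'id' => 'example_net', 'nodes' => 'web03' }
]
puts domains_for('web01', items).map { |d| d['id'] }.inspect
```

Loading every item costs one request per item, so this trades search latency for more API calls on each run.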

Zac

Steven_Barre · October 3, 2013, 6:47pm

Thanks for the info Stephen and Seth

I'm still new to this and only expect to have 200 nodes or so max. I've
only got 15 right now.

> What you may want to ask instead, is what is it about your usage of
> databags that necessitates real-time search?

Maybe you have a better idea of how I should be doing things? I'm
setting up web servers with shared hosting. I've got a data bag for all
the domains, each domain document has an attribute to say which node it
belongs on. Then the recipe does

search(:domains, "nodes:#{node['hostname']}") do |domain|

to find all the domains it needs and to configure them.

We then have a webui to allow people to create domains. And the plan is
to just add the document to the databag then call chef-client on the
node and then return a success message to the user. So having that webui
take a minute is a little undesirable.

What would be a better way to handle this?

Greg_Zapp · October 3, 2013, 9:41pm

Thanks too, Zac. I may be able to create more data bags than I originally
planned, to ease the pain of iterating. It would be nice if Chef used
something like Couchbase, or even upgraded to Solr 4 with its soft commits,
though.

I'm also setting up a "shared" hosting environment like Steven, and in load
balanced pools to boot. As I get into the devilish details, I find myself
debating more frequently whether I should bother implementing certain stuff
in Chef, or just do it through our agent (it can run jobs).

@Steven: I have a few ideas around this myself.

Create a data bag for each node and place the domains it needs in there.
Then you can iterate over all items efficiently. You can also set up the
domain on multiple hosts for migrating services easily enough.

Have nodes remove data items after working them. Implement a unique ID
and revision numbers. This will allow updates and give the node the
ability to detect which item is the latest. Store the info into node
attributes (or at least the revision number) and save before removing the
data item from the bag. This will keep the domain bag cheap to iterate and
create a job queue of sorts.
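
The unique ID plus revision number idea could look roughly like this in plain Ruby. The `uid` and `revision` field names, the `applied` hash (standing in for what a node would keep in its attributes), and both helper functions are assumptions for illustration only.

```ruby
# Keep only the highest revision of each item, keyed by its unique ID.
def latest_revisions(items)
  items.group_by { |item| item['uid'] }
       .map { |_uid, versions| versions.max_by { |v| v['revision'] } }
end

# Items whose latest revision is newer than what the node recorded as applied.
def pending(items, applied)
  latest_revisions(items).select do |item|
    item['revision'] > applied.fetch(item['uid'], 0)
  end
end

items = [
  { 'uid' => 'a', 'revision' => 1 },
  { 'uid' => 'a', 'revision' => 2 },
  { 'uid' => 'b', 'revision' => 1 }
]
applied = { 'a' => 2 } # revisions this node has already acted on
puts pending(items, applied).map { |item| item['uid'] }.inspect
```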

I have other ideas but can't recall them all just yet; it's too early.
Looking forward to what others suggest.

Jay_Feldblum1 · October 3, 2013, 9:51pm

Greg,

Node-specific data ("Create a data bag for each node") probably belongs on
the node directly, not in a data-bag item. You can modify the persistent
node data with knife node edit $NODE.

Write access from recipes to anything on the Chef-Server is probably a bad
idea (except for saving the node data at the end of a run).

Cheers,
Jay Feldblum

Greg_Zapp · October 3, 2013, 9:55pm

Hi Jay,

Because the last writer wins, I have explicit producer/consumer channels.
The node is responsible for writing node attributes, the API server will
write to certain data bags, etc.

coderanger · October 3, 2013, 9:57pm

Why not have your recipe read directly from the API service?
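
As a rough sketch of what that could look like, assuming a hypothetical JSON endpoint on your API service (the URL, the `/domains?node=` path, and the `fetch_domains` helper are all invented here), a recipe or library could do something like:

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetch the domain list for one host from a hypothetical JSON endpoint.
# `http_get` is injectable so the parsing can be exercised without a network.
def fetch_domains(base_url, hostname, http_get: Net::HTTP.method(:get))
  query = URI.encode_www_form_component(hostname)
  uri = URI.join(base_url, "/domains?node=#{query}")
  JSON.parse(http_get.call(uri))
end

# Fake transport standing in for the real API service.
fake_get = ->(_uri) { '[{"name": "example.com"}]' }
domains = fetch_domains('http://api.example.invalid', 'web01', http_get: fake_get)
puts domains.first['name']
```

This skips the search index entirely, at the cost of making the Chef run depend on the API service being reachable.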

--Noah

Greg_Zapp · October 3, 2013, 10:04pm

That's a good question and I don't have a good answer ATM. I'm glad you
asked it though because now I'm seriously weighing the options

BradKnowles · October 3, 2013, 10:09pm

On Oct 3, 2013, at 5:04 PM, Greg Zapp greg.zapp@gmail.com wrote:

> That's a good question and I don't have a good answer ATM. I'm glad you asked it though because now I'm seriously weighing the options

Data bags, and even writing node attributes, are not a particularly good replacement for a proper, reliable message queueing system.

--
Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu

Greg_Zapp · October 3, 2013, 11:16pm

I'm totally on board with it not being a replacement, but I feel it could
be a supplement. We're working with auto scaling on AWS, so nodes come and
go. New nodes will need a complete picture of what needs to be set up; Chef
could provide that picture, and it's designed to handle the load as well. Our
agent supports jobs and will be used to run chef-client on demand, and to do
more job-type stuff that doesn't fit neatly into the configuration domain.

I know I described a Chef server usage pattern that resembles a job queue,
however I'll be going the separate data bag route so new nodes can get all
the info for their pool when they come up. I will probably still use
revision numbers or GUID stamps to speed up runs and allow the node to
determine if it needs to act on a change to an item.

I've also hijacked this thread thoroughly, sorry!
