Chef server high CPU usage (not functional)

Hello,

I have a Chef server, version 11.04, with 820 nodes. From time to time the
server stops responding. I see high CPU usage (the machine has 16GB of RAM
and 4 CPUs); each CPU is at least 60% busy.

Running curl http://server/_status reports postgresql as failed, and
sometimes search as well.
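
To see which component is failing at a given moment, the _status response can be checked with a few lines of Ruby. This is only a minimal sketch: it assumes the body is JSON with an "upstreams" map whose values are "pong" when healthy (for example "chef_sql" and "chef_solr"), which is how I understand the endpoint on Chef 11 to behave. The helper name failing_upstreams and the sample body are made up for illustration.

```ruby
require 'json'

# Returns the names of upstream components that did not answer "pong".
# Assumption: the _status body looks like
#   {"status":"fail","upstreams":{"chef_solr":"pong","chef_sql":"fail"}}
def failing_upstreams(status_body)
  upstreams = JSON.parse(status_body).fetch('upstreams', {})
  upstreams.reject { |_name, state| state == 'pong' }.keys.sort
end

# Hypothetical sample body, not captured from a real server.
sample = '{"status":"fail","upstreams":{"chef_solr":"pong","chef_sql":"fail"}}'
puts failing_upstreams(sample).inspect # prints ["chef_sql"]
```

Against a live server, the body could be fetched with Net::HTTP.get(URI('http://server/_status')) from the standard net/http library and passed to the same helper.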

When I look into PostgreSQL, I don't see much happening:

31.464393+00 | 2014-01-08 07:50:31.464876+00 | f | idle | SELECT 'pong' as ping LIMIT 1
16384 | opscode_chef | 1165 | 16549 | opscode_chef | | 127.0.0.1 | | 50467 | 2014-01-08 07:41:24.756074+00 | | 2014-01-08 07:50:
31.552503+00 | 2014-01-08 07:50:31.552891+00 | f | idle | SELECT 'pong' as ping LIMIT 1
16384 | opscode_chef | 1442 | 10 | opscode-pgsql | psql | | | -1 | 2014-01-08 07:45:22.293381+00 | 2014-01-08 07:50:32.44201+00 | 2014-01-08 07:50:
....
....

I increased the following values in chef-server.rb:

postgresql['max_connections'] = 400
erchef['db_pool_size'] = 100
chef_expander['nodes'] = 4
chef_solr['heap_size'] = 8192

When the load goes to around 4-5, the server stops working.

Any suggestions are welcome.

thanks!

-silviu

Could it be this? https://tickets.opscode.com/browse/CHEF-3921

--
Daniel DeLeo

On Tuesday, January 7, 2014 at 11:54 PM, Silviu Dicu wrote:

[snip]

I can try to upgrade the server.
The ticket you sent goes back to June. Since then I see 11.10 is out; is this the version you recommend?

thanks!

-silviu

On Jan 9, 2014, at 12:42 PM, Daniel DeLeo dan@kallistec.com wrote:

[snip]

The updates to the server have only been small fixes, such as updating components (e.g., openssl, rails) for security fixes. The nightly builds have the updated dependency resolver component that fixes the problem; however, there is also a small schema change, and we don't have a proper upgrade system in place yet.

You can test a nightly build by following the instructions on the ticket. Simply upgrading should work, but be sure to back up data just in case.

If you’re not comfortable with that, then you could modify your workflow so that you always set exact equality pins for all cookbooks in each environment until the final release is ready.
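
As a rough sketch of what exact equality pins look like, the snippet below turns a list of cookbook versions into "= x.y.z" constraints and builds an environment document from them. The idea, as I understand it, is that an exact pin leaves the dependency resolver a single candidate version per cookbook. The helper names (exact_pins, pinned_environment), the environment name, and the cookbook names and versions are all hypothetical; the keys follow the environment JSON layout that knife reads.

```ruby
require 'json'

# Turns {"apache2" => "2.0.0"} into {"apache2" => "= 2.0.0"}.
def exact_pins(cookbook_versions)
  cookbook_versions.each_with_object({}) do |(name, version), pins|
    pins[name] = "= #{version}"
  end
end

# Builds an environment document with every cookbook pinned exactly.
def pinned_environment(name, cookbook_versions)
  {
    'name' => name,
    'json_class' => 'Chef::Environment',
    'chef_type' => 'environment',
    'cookbook_versions' => exact_pins(cookbook_versions)
  }
end

# Hypothetical cookbooks and versions, for illustration only.
production = pinned_environment('production', 'apache2' => '2.0.0', 'ntp' => '1.5.4')
puts JSON.pretty_generate(production)
```

The printed JSON can be saved to a file such as production.json and uploaded with knife environment from file production.json.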

--
Daniel DeLeo

On Thursday, January 9, 2014 at 9:52 AM, Silviu Dicu wrote:

[snip]