Hello,
I have a Chef server, version 11.04, with 820 nodes. From time to time the server stops responding. I see high CPU usage (the machine has 16 GB of RAM and 4 CPUs); each CPU is at least 60% busy.
Running curl http://server/_status reports postgresql as failed, and sometimes search as well.
When I look into PostgreSQL I don't see much happening:
31.464393+00 | 2014-01-08 07:50:31.464876+00 | f | idle | SELECT 'pong' as ping LIMIT 1
16384 | opscode_chef | 1165 | 16549 | opscode_chef | | 127.0.0.1 | | 50467 | 2014-01-08 07:41:24.756074+00 | | 2014-01-08 07:50:31.552503+00 | 2014-01-08 07:50:31.552891+00 | f | idle | SELECT 'pong' as ping LIMIT 1
16384 | opscode_chef | 1442 | 10 | opscode-pgsql | psql | | | -1 | 2014-01-08 07:45:22.293381+00 | 2014-01-08 07:50:32.44201+00 | 2014-01-08 07:50:
…
…
I increased the following values in chef-server.rb:
postgresql['max_connections'] = 400
erchef['db_pool_size'] = 100
chef_expander['nodes'] = 4
chef_solr['heap_size'] = 8192
When the load reaches around 4-5, the server stops working.
Any suggestions are welcome.
Thanks!
-silviu
Could it be this? https://tickets.opscode.com/browse/CHEF-3921
--
Daniel DeLeo
I can try to upgrade the server.
The ticket you sent goes back to June; since then I see 11.10 is out. Is this the version you recommend?
Thanks!
The updates to the server have only been small fixes, such as updating components (e.g., openssl, rails) for security reasons. The nightly builds include the updated dependency resolver component that fixes the problem; however, they also include a small schema change, and we don’t have a proper upgrade system in place yet.
You can test a nightly build by following the instructions on the ticket. Simply upgrading should work, but be sure to back up data just in case.
If you’re not comfortable with that, then you could modify your workflow so that you always set exact equality pins for all cookbooks in each environment until the final release is ready.
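As a rough illustration of the pinning workaround, the sketch below builds an environment document in which every cookbook carries an exact equality constraint ("= x.y.z"). The helper names `exact_pins` and `pinned_environment`, the environment name, and the cookbook names and versions are all hypothetical; only the `cookbook_versions` layout and the "= version" constraint syntax come from Chef's environment format.

```ruby
require "json"

# Turn a map of cookbook name => version into exact-equality constraints,
# e.g. "1.8.2" becomes "= 1.8.2". Anything that is not a plain x.y or x.y.z
# version is rejected, so a range constraint cannot slip through.
def exact_pins(versions)
  versions.each_with_object({}) do |(name, version), pins|
    unless version.to_s.match?(/\A\d+\.\d+(\.\d+)?\z/)
      raise ArgumentError, "#{name}: expected x.y or x.y.z, got #{version.inspect}"
    end
    pins[name.to_s] = "= #{version}"
  end
end

# Build an environment document in the JSON layout used for Chef environments.
def pinned_environment(name, versions)
  {
    "name" => name,
    "description" => "All cookbooks pinned to exact versions",
    "json_class" => "Chef::Environment",
    "chef_type" => "environment",
    "cookbook_versions" => exact_pins(versions)
  }
end

# Hypothetical cookbook names and versions, for illustration only.
env = pinned_environment("production", { "apache2" => "1.8.2", "ntp" => "1.5.0" })
puts JSON.pretty_generate(env)
```

The printed JSON could be saved to a file such as production.json and uploaded with `knife environment from file production.json`. Every cookbook in the run lists, including dependencies, would need an entry for the pins to cover the whole environment.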
--
Daniel DeLeo