Chef-server becoming nonresponsive

Our chef-server seems to hang multiple times a day. Clients time out
periodically, knife commands time out and the webui is inaccessible.
Sometimes it comes back on its own; sometimes I have to shut down and
restart all the chef services. I have been looking through the logs
and thus far haven’t been able to determine a cause.

I am compacting the database every day or so. (I have to shut down the
chef server to run the compaction; otherwise couchdb will sometimes
crash.)
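
For anyone following along, here is a minimal sketch of how a compaction like this can be triggered through CouchDB's HTTP API (a POST to /{db}/_compact). The database name `chef` and the address http://localhost:5984 are assumptions based on common defaults, not details from this setup, and the helper names are made up for illustration. The Content-Type header is needed on newer CouchDB releases and should be harmless on older ones.

```python
import json
import urllib.request

# Assumed defaults; adjust both for the actual installation.
COUCH_URL = "http://localhost:5984"
CHEF_DB = "chef"


def build_compaction_request(base_url=COUCH_URL, db=CHEF_DB):
    """Build (but do not send) a POST to CouchDB's /{db}/_compact endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compaction_running(base_url=COUCH_URL, db=CHEF_DB):
    """Read the database info document and report its compact_running flag."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return bool(json.load(resp).get("compact_running", False))


if __name__ == "__main__":
    # Sending the request only starts compaction; it runs in the background.
    with urllib.request.urlopen(build_compaction_request()) as resp:
        print(resp.status, resp.read().decode())
    print("compaction running:", compaction_running())
```

Polling `compaction_running()` until it returns False is one way to tell when it is safe to start the chef services again.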

Does anyone have any ideas about what or where I should be looking? We
only have about 100 nodes, 50+ cookbooks and 20 or so roles…

Our current server is running CentOS 5.7, CouchDB 0.11.2,
rabbitmq-server 2.2.0, Ruby 1.8.7, RubyGems 1.8.10 and chef-server
0.10.8. I have built a newer server, but haven’t had the time to
migrate.

Thanks
Randy

What OS are you running? We noticed problems with CouchDB 0.10.x that we’re still trying to understand. We have since installed Chef on Ubuntu 11.10, which includes CouchDB 1.0.1, and we haven’t seen the issue there. I don’t know if that helps.

Ian D. Rossi


From: Van Fossan,Randy [vanfossr@oclc.org]
Sent: Thursday, March 15, 2012 11:46 AM
To: chef@lists.opscode.com
Subject: [chef] chef-server becoming nonresponsive
