Chef-server 11 hung forever until chef-server-ctl restart


#1

Hi guys,

I’m using chef-server-11.0.8-1.el5 on CentOS 5. My chef-server hungs
forever after OS bootup.
Thanks in advance if any suggestion.

Below is my env:

tailf /etc/inittab :

SV:123456:respawn:/opt/chef-server/embedded/bin/runsvdir-start
SE:123456:once:/usr/sbin/mybootup.sh

/usr/sbin/mybootup.sh will run ‘chef-server-ctl reconfigure’ on every
system boot and upload some cookbooks and roles.

My issue sometimes the chef-server hungs after /usr/sbin/mybootup.sh
completes. Sometimes erchef hungs, sometimes bookshelf hungs. This occurs
seldom.

e.g. This command will hung forever:

wget http://127.0.0.1:8000/
–2013-11-18 12:41:53-- http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000… connected.
HTTP request sent, awaiting response…

After run’ chef-server-ctl restart’ , the chef-server works well.

The chef logs doesn’t show any error. BTW how to set log level of erchef
and bookshelf to debug ?

sudo chef-server-ctl tail
==> /var/log/chef-server/erchef/erchef.log.idx <==

==> /var/log/chef-server/erchef/erchef.log.siz <==

==> /var/log/chef-server/erchef/current <==
2013-11-18_08:57:40.35284 Loaded key default of type RSAPrivateKey from
file /etc/chef-server/chef-webui.pem
2013-11-18_08:57:40.37516
2013-11-18_08:57:40.37517 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37517 starting chef_index_sup
2013-11-18_08:57:40.37575
2013-11-18_08:57:40.37576 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37577 Connecting to Rabbit at
127.0.0.1:5672/chef(exchange: <<>>)
2013-11-18_08:57:40.50012
2013-11-18_08:57:40.50013 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.50014 Negotiated maximums: (Channel = 0, Frame =
131072, Heartbeat = 0)

==> /var/log/chef-server/erchef/erchef.log.1 <==
2013-11-18T08:59:28Z erchef@127.0.0.1 INFO req_id=urcqSgbispND1AVfpvbeUA==;
status=201; method=POST; path=/roles; user=myuser;
msg={created,<<“mapr_jobtracker”>>}; req_time=9; rdbms_time=4; rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO req_id=n6nXCO3iagR34yZ388D9TA==;
status=201; method=POST; path=/roles; user=myuser;
msg={created,<<“mapr_zookeeper”>>}; req_time=9; rdbms_time=3; rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO req_id=h2OZJpyk2I4UOwgfE2hweg==;
status=201; method=POST; path=/roles; user=myuser;
msg={created,<<“mapr_hbase_client”>>}; req_time=29; rdbms_time=5;
rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO req_id=W7tUM5z5HqUeD4RgphehVg==;
status=201; method=POST; path=/roles; user=myuser;
msg={created,<<“mapr_hbase_regionserver”>>}; req_time=8; rdbms_time=3;
rdbms_count=3

==> /var/log/chef-server/chef-expander/current <==
role[0000000000005295a31d2e5d5ee87dcc]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00047397613525390625,0.012217044830322266 |
2013-11-18_08:59:32.52864 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[000000000000ea2b2572dfa22ca1af0c]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.0004630088806152344,0.03265976905822754 |
2013-11-18_08:59:32.54012 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[0000000000006d8e8b23feacd9b38599]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00034499168395996094,0.017193078994750977 |
2013-11-18_08:59:32.91701 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[000000000000f9d444c1a03859122ec6]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00034499168395996094,0.00807332992553711 |

==> /var/log/chef-server/chef-solr/current <==
2013-11-18_08:59:35.58380 Nov 18, 2013 8:59:35 AM
org.apache.solr.core.QuerySenderListener newSearcher
2013-11-18_08:59:35.58381 INFO: QuerySenderListener done.
2013-11-18_08:59:35.58469 Nov 18, 2013 8:59:35 AM
org.apache.solr.core.SolrCore registerSearcher
2013-11-18_08:59:35.58470 INFO: [] Registered new searcher Searcher@5c53b56fmain
2013-11-18_08:59:35.58566 Nov 18, 2013 8:59:35 AM
org.apache.solr.search.SolrIndexSearcher close
2013-11-18_08:59:35.58567 INFO: Closing Searcher@23724a1a main
2013-11-18_08:59:35.58567
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}
2013-11-18_08:59:35.58569
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

==> /var/log/chef-server/rabbitmq/current <==
2013-11-18_08:56:49.33375 starting exchange, queue and binding
recovery …done
2013-11-18_08:56:49.34651 starting mirror queue slave
sup …done
2013-11-18_08:56:49.34700 starting adding mirrors to
queues …done
2013-11-18_08:56:49.34751 – message delivery logic ready
2013-11-18_08:56:49.34767 starting error log
relay …done
2013-11-18_08:56:49.34852 starting
networking …done
2013-11-18_08:56:49.37482 starting
direct_client …done
2013-11-18_08:56:49.37526 starting notify cluster
nodes …done
2013-11-18_08:56:49.37544
2013-11-18_08:56:49.37545 broker running

==> /var/log/chef-server/bookshelf/current <==
2013-11-18_09:54:37.84878 badarg}}
2013-11-18_10:23:23.94760
2013-11-18_10:23:23.94763 =ERROR REPORT==== 18-Nov-2013::10:23:23 ===
2013-11-18_10:23:23.94763 {error,{{mini_s3,make_signed_url_authorization},
2013-11-18_10:23:23.94764 [<<“SECRETKEY”>>,“get”,"/",undefined,
2013-11-18_10:23:23.94764 [{‘Accept’,"/"},
2013-11-18_10:23:23.94764 {‘Connection’,“Keep-Alive”},
2013-11-18_10:23:23.94765 {‘Host’,“127.0.0.1:4321”},
2013-11-18_10:23:23.94765 {‘User-Agent’,“Wget/1.11.4 Red Hat
modified”}]],
2013-11-18_10:23:23.94765 badarg}}

==> /var/log/chef-server/postgresql/current <==
2013-11-18_08:57:00.82999 LOG: database system was shut down at 2013-11-18
08:56:57 GMT
2013-11-18_08:57:00.84494 LOG: autovacuum launcher started
2013-11-18_08:57:00.84620 LOG: database system is ready to accept
connections

==> /var/log/chef-server/nginx/rewrite-port-9080.log <==

==> /var/log/chef-server/nginx/access.log <==
10.111.88.50 - - [18/Nov/2013:10:00:04 +0000] “GET
//cookbooks?num_versions=1 HTTP/1.1” 504 “300.013” 182 “-” “Chef
Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +
http://opscode.com)” “127.0.0.1:8000” “504” “300.006” “11.4.0"
“algorithm=sha1;version=1.0;” “myuser” “2013-11-18T09:55:04Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 973
10.111.88.50 - - [18/Nov/2013:10:00:15 +0000] “GET /roles HTTP/1.1” 504
"300.020” 182 “-” “Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0;
x86_64-linux; +http://opscode.com)” “127.0.0.1:8000” “504” “300.014"
“11.4.0” “algorithm=sha1;version=1.0;” “myuser” “2013-11-18T09:55:15Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 953
10.111.88.50 - - [18/Nov/2013:11:55:12 +0000] “GET /roles HTTP/1.1” 504
"300.007” 182 “-” “Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0;
x86_64-linux; +http://opscode.com)” “127.0.0.1:8000” “504” "300.002"
“11.4.0” “algorithm=sha1;version=1.0;” “myuser” “2013-11-18T11:50:12Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 953

==> /var/log/chef-server/nginx/current <==

==> /var/log/chef-server/nginx/error.log <==
2013/11/18 09:34:17 [error] 4557#0: *1041 upstream timed out (110:
Connection timed out) while reading response header from upstream, client:
10.111.88.50, server: , request: “GET
/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000
HTTP/1.1”, upstream: “
http://127.0.0.1:8000/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000”,
host: ":9443"
2013/11/18 09:39:14 [error] 4557#0: *1043 upstream timed out (110:
Connection timed out) while reading response header from upstream, client:
10.111.88.50, server: , request: “GET /roles HTTP/1.1”, upstream:
http://127.0.0.1:8000/roles”, host: “:9443”

Thanks
Jesse Hu


#2

the CPU and Mem looks normal on my chef-server

top - 13:08:29 up 4:12, 3 users, load average: 0.01, 0.02, 0.00
Tasks: 203 total, 1 running, 202 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 8174384k total, 3595892k used, 4578492k free, 64628k buffers
Swap: 2096472k total, 0k used, 2096472k free, 1435928k cached

The strange thing is all my cookbook and roles are uploaded successfully,
but later on the erchef hungs. Not sure erchef hungs before or after my
chef-client tries to synchronizing cookbook.

The rabbitmq (5672) and bookshelf (4321) is responding, but erchef (8000)
hungs. Sometimes bookshelf hung too. My issue should not be the same as
https://tickets.opscode.com/browse/CHEF-3921 , all ‘depend’ in metadata.rb
of my cookbook doesn’t have version constraints.

wget http://127.0.0.1:8000/
–2013-11-18 13:08:45-- http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000… connected.
HTTP request sent, awaiting response…

wget http://127.0.0.1:5672/
–2013-11-18 13:08:58-- http://127.0.0.1:5672/
Connecting to 127.0.0.1:5672… connected.
HTTP request sent, awaiting response… 200 No headers, assuming HTTP/0.9
Length: unspecified
Saving to: `index.html’

wget http://127.0.0.1:4321/bookshelf/
–2013-11-18 13:09:44-- http://127.0.0.1:4321/bookshelf/
Connecting to 127.0.0.1:4321… connected.
HTTP request sent, awaiting response… 403 Forbidden
2013-11-18 13:09:44 ERROR 403: Forbidden.

Thanks
Jesse Hu, Project Serengeti http://www.projectserengeti.org/

2013/11/18 Hui Hu huhui14@gmail.com

Hi guys,

I’m using chef-server-11.0.8-1.el5 on CentOS 5. My chef-server hungs
forever after OS bootup.
Thanks in advance if any suggestion.

Below is my env:

tailf /etc/inittab :

SV:123456:respawn:/opt/chef-server/embedded/bin/runsvdir-start
SE:123456:once:/usr/sbin/mybootup.sh

/usr/sbin/mybootup.sh will run ‘chef-server-ctl reconfigure’ on every
system boot and upload some cookbooks and roles.

My issue sometimes the chef-server hungs after /usr/sbin/mybootup.sh
completes. Sometimes erchef hungs, sometimes bookshelf hungs. This occurs
seldom.

e.g. This command will hung forever:

wget http://127.0.0.1:8000/
–2013-11-18 12:41:53-- http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000… connected.
HTTP request sent, awaiting response…

After run’ chef-server-ctl restart’ , the chef-server works well.

The chef logs doesn’t show any error. BTW how to set log level of erchef
and bookshelf to debug ?

sudo chef-server-ctl tail
==> /var/log/chef-server/erchef/erchef.log.idx <==

==> /var/log/chef-server/erchef/erchef.log.siz <==

==> /var/log/chef-server/erchef/current <==
2013-11-18_08:57:40.35284 Loaded key default of type RSAPrivateKey from
file /etc/chef-server/chef-webui.pem
2013-11-18_08:57:40.37516
2013-11-18_08:57:40.37517 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37517 starting chef_index_sup
2013-11-18_08:57:40.37575
2013-11-18_08:57:40.37576 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37577 Connecting to Rabbit at 127.0.0.1:5672/chef(exchange: <<>>)
2013-11-18_08:57:40.50012
2013-11-18_08:57:40.50013 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.50014 Negotiated maximums: (Channel = 0, Frame =
131072, Heartbeat = 0)

==> /var/log/chef-server/erchef/erchef.log.1 <==
2013-11-18T08:59:28Z erchef@127.0.0.1 INFO
req_id=urcqSgbispND1AVfpvbeUA==; status=201; method=POST; path=/roles;
user=myuser; msg={created,<<“mapr_jobtracker”>>}; req_time=9; rdbms_time=4;
rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO
req_id=n6nXCO3iagR34yZ388D9TA==; status=201; method=POST; path=/roles;
user=myuser; msg={created,<<“mapr_zookeeper”>>}; req_time=9; rdbms_time=3;
rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO
req_id=h2OZJpyk2I4UOwgfE2hweg==; status=201; method=POST; path=/roles;
user=myuser; msg={created,<<“mapr_hbase_client”>>}; req_time=29;
rdbms_time=5; rdbms_count=3
2013-11-18T08:59:32Z erchef@127.0.0.1 INFO
req_id=W7tUM5z5HqUeD4RgphehVg==; status=201; method=POST; path=/roles;
user=myuser; msg={created,<<“mapr_hbase_regionserver”>>}; req_time=8;
rdbms_time=3; rdbms_count=3

==> /var/log/chef-server/chef-expander/current <==
role[0000000000005295a31d2e5d5ee87dcc]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00047397613525390625,0.012217044830322266 |
2013-11-18_08:59:32.52864 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[000000000000ea2b2572dfa22ca1af0c]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.0004630088806152344,0.03265976905822754 |
2013-11-18_08:59:32.54012 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[0000000000006d8e8b23feacd9b38599]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00034499168395996094,0.017193078994750977 |
2013-11-18_08:59:32.91701 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed
role[000000000000f9d444c1a03859122ec6]
database[chef_00000000000000000000000000000000] transit,xml,solr-post |
0,0.00034499168395996094,0.00807332992553711 |

==> /var/log/chef-server/chef-solr/current <==
2013-11-18_08:59:35.58380 Nov 18, 2013 8:59:35 AM
org.apache.solr.core.QuerySenderListener newSearcher
2013-11-18_08:59:35.58381 INFO: QuerySenderListener done.
2013-11-18_08:59:35.58469 Nov 18, 2013 8:59:35 AM
org.apache.solr.core.SolrCore registerSearcher
2013-11-18_08:59:35.58470 INFO: [] Registered new searcher
Searcher@5c53b56f main
2013-11-18_08:59:35.58566 Nov 18, 2013 8:59:35 AM
org.apache.solr.search.SolrIndexSearcher close
2013-11-18_08:59:35.58567 INFO: Closing Searcher@23724a1a main
2013-11-18_08:59:35.58567
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}
2013-11-18_08:59:35.58569
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

==> /var/log/chef-server/rabbitmq/current <==
2013-11-18_08:56:49.33375 starting exchange, queue and binding
recovery …done
2013-11-18_08:56:49.34651 starting mirror queue slave
sup …done
2013-11-18_08:56:49.34700 starting adding mirrors to
queues …done
2013-11-18_08:56:49.34751 – message delivery logic ready
2013-11-18_08:56:49.34767 starting error log
relay …done
2013-11-18_08:56:49.34852 starting
networking …done
2013-11-18_08:56:49.37482 starting
direct_client …done
2013-11-18_08:56:49.37526 starting notify cluster
nodes …done
2013-11-18_08:56:49.37544
2013-11-18_08:56:49.37545 broker running

==> /var/log/chef-server/bookshelf/current <==
2013-11-18_09:54:37.84878 badarg}}
2013-11-18_10:23:23.94760
2013-11-18_10:23:23.94763 =ERROR REPORT==== 18-Nov-2013::10:23:23 ===
2013-11-18_10:23:23.94763 {error,{{mini_s3,make_signed_url_authorization},
2013-11-18_10:23:23.94764 [<<“SECRETKEY”>>,“get”,"/",undefined,
2013-11-18_10:23:23.94764 [{‘Accept’,"/"},
2013-11-18_10:23:23.94764 {‘Connection’,“Keep-Alive”},
2013-11-18_10:23:23.94765 {‘Host’,“127.0.0.1:4321”},
2013-11-18_10:23:23.94765 {‘User-Agent’,“Wget/1.11.4 Red Hat
modified”}]],
2013-11-18_10:23:23.94765 badarg}}

==> /var/log/chef-server/postgresql/current <==
2013-11-18_08:57:00.82999 LOG: database system was shut down at
2013-11-18 08:56:57 GMT
2013-11-18_08:57:00.84494 LOG: autovacuum launcher started
2013-11-18_08:57:00.84620 LOG: database system is ready to accept
connections

==> /var/log/chef-server/nginx/rewrite-port-9080.log <==

==> /var/log/chef-server/nginx/access.log <==
10.111.88.50 - - [18/Nov/2013:10:00:04 +0000] “GET
//cookbooks?num_versions=1 HTTP/1.1” 504 “300.013” 182 “-” “Chef
Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +
http://opscode.com)” “127.0.0.1:8000” “504” “300.006” “11.4.0"
“algorithm=sha1;version=1.0;” “myuser” “2013-11-18T09:55:04Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 973
10.111.88.50 - - [18/Nov/2013:10:00:15 +0000] “GET /roles HTTP/1.1” 504
"300.020” 182 “-” “Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0;
x86_64-linux; +http://opscode.com)” “127.0.0.1:8000” “504” “300.014"
“11.4.0” “algorithm=sha1;version=1.0;” “myuser” “2013-11-18T09:55:15Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 953
10.111.88.50 - - [18/Nov/2013:11:55:12 +0000] “GET /roles HTTP/1.1” 504
"300.007” 182 “-” “Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0;
x86_64-linux; +http://opscode.com)” “127.0.0.1:8000” “504” "300.002"
“11.4.0” “algorithm=sha1;version=1.0;” “myuser” “2013-11-18T11:50:12Z”
“2jmj7l5rSw0yVb/vlWAYkK/YBwk=” 953

==> /var/log/chef-server/nginx/current <==

==> /var/log/chef-server/nginx/error.log <==
2013/11/18 09:34:17 [error] 4557#0: *1041 upstream timed out (110:
Connection timed out) while reading response header from upstream, client:
10.111.88.50, server: , request: “GET
/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000
HTTP/1.1”, upstream: “
http://127.0.0.1:8000/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000”,
host: ":9443"
2013/11/18 09:39:14 [error] 4557#0: *1043 upstream timed out (110:
Connection timed out) while reading response header from upstream, client:
10.111.88.50, server: , request: “GET /roles HTTP/1.1”, upstream:
http://127.0.0.1:8000/roles”, host: “:9443”

Thanks
Jesse Hu


#3

Hi there,

huhui14@gmail.com writes:

the CPU and Mem looks normal on my chef-server

top - 13:08:29 up 4:12, 3 users, load average: 0.01, 0.02, 0.00
Tasks: 203 total, 1 running, 202 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 8174384k total, 3595892k used, 4578492k free, 64628k buffers
Swap: 2096472k total, 0k used, 2096472k free, 1435928k cached

The strange thing is all my cookbook and roles are uploaded successfully,
but later on the erchef hungs. Not sure erchef hungs before or after my
chef-client tries to synchronizing cookbook.

Does the server hang at start without receiving chef-client run traffic
or only after nodes check-in and (attempt to) run?

At present, this sounds most like the known issue with the dependency
solver – especially when you describe that updates to cookbooks and
roles work, but CCRs fail.

This would not explain bookshelf hanging, however.

We’ve replaced the dependency solver – restoring the gecode based
solver that is faster and more robust (at the cost of the dependency on
gecode). You can experiment with it by installing a nightly build of
Chef Server.

  • seth