Debugging Chef errors

Hi folks,

We recently upgraded to Chef Server 11.1.1 and are still occasionally
seeing erchef errors like:

Dreaded “no connections” error:

2014-06-11 17:22:03.351 [error] {<<"method=POST; path=/search/node; status=500; ">>,{error,{error,function_clause,[{chef_wm_search,'-make_bulk_get_fun/5-lc$^2/1-0-',[{error,no_connections},#Fun<chef_wm_routes.3.34923093>,[{<<"name">>,[<<"name">>]},{<<"hostname">>,[<<"hostname">>]},{<<"fqdn">>,[<<"fqdn">>]},{<<"ipaddress">>,[<<"ipaddress">>]},{<<"rsa">>,[<<"keys">>,<<"ssh">>,<<"host_rsa_public">>]},{<<"dsa">>,[<<"keys">>,<<"ssh">>,<<"host_dsa_public">>]}],node],[{file,"src/chef_wm_search.erl"},{line,336}]},{chef_wm_search,fetch_result_rows,4,[{file,"src/chef_wm_search.erl"},{line,447}]},{chef_wm_search,make_search_results,5,[{file,"src/chef_wm_search.erl"},{line,414}]},{chef_wm_search,to_json,2,[{file,"src/chef_wm_search.erl"},{line,131}]},{chef_wm_search,process_post,2,[{file,"src/chef_wm_search.erl"},{line,271}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}]}}}

“badrecord” error:

2014-06-11 17:21:00.154 [error] {<<"method=POST; path=/environments/production/cookbook_versions; status=500; ">>,{error,{error,{badrecord,chef_cookbook_version},[{chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',2,[{file,"src/chef_wm_depsolver.erl"},{line,291}]},{chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',2,[{file,"src/chef_wm_depsolver.erl"},{line,293}]},{chef_wm_depsolver,assemble_response,3,[{file,"src/chef_wm_depsolver.erl"},{line,291}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]},{webmachine_decision_core,decision,1,[{file,"src/webmachine_decision_core.erl"},{line,486}]},{webmachine_decision_core,handle_request,2,[{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}}

And a new one:

2014-06-11 17:30:47.316 [error] Supervisor pooler_chef_depsolver_member_sup
had child chef_depsolver_worker started with
{chef_depsolver_worker,start_link,undefined} at <0.19090.54> exit with
reason killed in context child_terminated

For the “no connections” error, we tried raising the DB client pool size to
400 and Postgres max connections to 500, without much success. (As a side
note, I’d really like to understand why erchef uses a DB connection pool at
all, unless Postgres connections are really that much more expensive to
establish than MySQL’s. The last time I remember connection pools being
useful was way back in my Oracle PL/SQL days.)

For /search queries, are we tuning the right thing? I would expect them to
be directed to Solr, not Postgres. Can someone clue me in? Is there a
different tunable we should be looking at?

Just a general comment about erchef logging: the log output is almost
inscrutable to the hapless administrator. Is there any work being done to
make the messages clearer?
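
In the meantime, here is a rough sketch of how the request entries above could be boiled down to something readable. It is not an erchef tool, just a few regular expressions written against the two request errors quoted in this message; the names (summarize, REQUEST_RE, REASON_RE, FRAME_RE) are made up for illustration, and it assumes the whole entry has been joined into a single string. It accepts both straight and typographic quotes, since mail clients tend to mangle them.

```python
import re

DQ = "\"“”"   # straight or typographic double quotes
SQ = "'‘’"    # straight or typographic single quotes

# Request summary at the front of the entry: method, path, status.
REQUEST_RE = re.compile(r"method=(\w+);\s*path=(\S+?);\s*status=(\d+);")
# Error reason: either a bare atom or a small tuple such as {badrecord,...}.
REASON_RE = re.compile(r"\{error,\{error,(\{[^{}]*\}|\w+),\[")
# One stack frame: {module,function,arity_or_args,[{file,"..."},{line,N}]}
FRAME_RE = re.compile(
    r"\{(\w+),"
    r"(?:[" + SQ + r"]([^" + SQ + r"]*)[" + SQ + r"]|(\w+)),"
    r"(?:\d+|\[.*?\]),"
    r"\[\{file,[" + DQ + r"]([^" + DQ + r"]*)[" + DQ + r"]\},\{line,(\d+)\}\]\}"
)


def summarize(entry):
    """Return method, path, status, error reason and stack frames of one entry."""
    request = REQUEST_RE.search(entry)
    reason = REASON_RE.search(entry)
    frames = []
    if reason:
        # Start scanning after the reason so the outer {error,...} wrapper
        # is not mistaken for a stack frame.
        for m in FRAME_RE.finditer(entry, reason.end()):
            function = m.group(2) if m.group(2) is not None else m.group(3)
            frames.append(f"{m.group(1)}:{function} ({m.group(4)}:{m.group(5)})")
    return {
        "method": request.group(1) if request else None,
        "path": request.group(2) if request else None,
        "status": int(request.group(3)) if request else None,
        "reason": reason.group(1) if reason else None,
        "frames": frames,
    }


# Shortened copy of the "badrecord" entry quoted above.
SAMPLE = (
    '2014-06-11 17:21:00.154 [error] {<<"method=POST; '
    'path=/environments/production/cookbook_versions; status=500; ">>,'
    "{error,{error,{badrecord,chef_cookbook_version},"
    "[{chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',2,"
    '[{file,"src/chef_wm_depsolver.erl"},{line,291}]},'
    "{chef_wm_depsolver,assemble_response,3,"
    '[{file,"src/chef_wm_depsolver.erl"},{line,291}]}]}}}'
)

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    print(summary["method"], summary["path"], summary["status"], summary["reason"])
    for frame in summary["frames"]:
        print("  at", frame)
```

The supervisor message about chef_depsolver_worker has a different shape and is not covered by these patterns; for entries like that the function simply returns empty fields.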

Thanks,

–Michael