We have a standalone Chef server (v12.1.2) installation that worked fine for many months. Recently, however, it started going down a few times a day. It would come back up after I ran chef-server-ctl restart. A few days ago it went down again, and this time the restart no longer brings it back.
I checked the opscode-erchef log (/var/log/opscode/opscode-erchef/current) and saw the following errors:
2017-07-31_19:10:11.28759 Exec: /opt/opscode/embedded/service/opscode-erchef/erts-6.4/bin/erlexec -noshell -noinput +Bd -boot /opt/opscode/embedded/service/opscode-erchef/releases/12.1.1/oc_erchef -mode embedded -config /opt/opscode/embedded/service/opscode-erchef/sys.config -args_file /opt/opscode/embedded/service/opscode-erchef/vm.args -- foreground
2017-07-31_19:10:11.28761 Root: /opt/opscode/embedded/service/opscode-erchef
2017-07-31_19:11:14.73679
2017-07-31_19:11:14.73681 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.73682 pool 'sqerl': exceeded timeout waiting for 20 members
2017-07-31_19:11:14.78063 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.78064 ** Generic server <0.1020.0> terminating
2017-07-31_19:11:14.78064 ** Last message in was {tcp_closed,#Port<0.5874>}
2017-07-31_19:11:14.78064 ** When Server state == {state,gen_tcp,#Port<0.5874>,<<>>,undefined,auth,
2017-07-31_19:11:14.78065 undefined,
2017-07-31_19:11:14.78065 {[],[]},
2017-07-31_19:11:14.78066 undefined,[],[],[],[],[],[],undefined,
2017-07-31_19:11:14.78066 undefined}
2017-07-31_19:11:14.78067 ** Reason for termination ==
2017-07-31_19:11:14.78067 ** sock_closed
2017-07-31_19:11:14.78529
2017-07-31_19:11:14.78530 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.78531 pool 'sqerl' failed to start member: {error,sock_closed}
2017-07-31_19:11:19.74314 =ERROR REPORT==== 31-Jul-2017::14:11:19 ===
2017-07-31_19:11:19.74314 ** Generic server <0.998.0> terminating
2017-07-31_19:11:19.74315 ** Last message in was timeout
2017-07-31_19:11:19.74315 ** When Server state == {starter,
2017-07-31_19:11:19.74315 {pool,sqerl,undefined,20,20,
2017-07-31_19:11:19.74315 {sqerl_client,start_link,[]},
2017-07-31_19:11:19.74316 [],0,0,1,
2017-07-31_19:11:19.74316 {1,min},
2017-07-31_19:11:19.74316 {30,sec},
2017-07-31_19:11:19.74316 pooler_sqerl_member_sup,undefined,
2017-07-31_19:11:19.74316 {dict,0,16,16,8,80,48,
2017-07-31_19:11:19.74317 {[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74317 [],[],[]},
2017-07-31_19:11:19.74317 {{[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74317 [],[],[]}}},
2017-07-31_19:11:19.74317 {dict,0,16,16,8,80,48,
2017-07-31_19:11:19.74318 {[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74318 [],[],[]},
2017-07-31_19:11:19.74318 {{[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74319 [],[],[]}}},
2017-07-31_19:11:19.74319 [],
2017-07-31_19:11:19.74319 {1,min},
2017-07-31_19:11:19.74319 folsom_metrics,folsom,
2017-07-31_19:11:19.74320 {[],[]},
2017-07-31_19:11:19.74320 20},
2017-07-31_19:11:19.74320 <0.994.0>,undefined}
2017-07-31_19:11:19.74320 ** Reason for termination ==
2017-07-31_19:11:19.74320 ** {{killed,{gen_server,call,
2017-07-31_19:11:19.74321 [pooler_sqerl_member_sup,{start_child,[]},infinity]}},
2017-07-31_19:11:19.74321 [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},
2017-07-31_19:11:19.74322 {pooler_starter,do_start_member,1,
2017-07-31_19:11:19.74322 [{file,"src/pooler_starter.erl"},{line,138}]},
2017-07-31_19:11:19.74322 {pooler_starter,handle_info,2,
2017-07-31_19:11:19.74322 [{file,"src/pooler_starter.erl"},{line,123}]},
2017-07-31_19:11:19.74322 {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},
2017-07-31_19:11:19.74322 {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},
2017-07-31_19:11:19.74323 {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
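As far as I understand, the 'sqerl' pool is erchef's PostgreSQL connection pool, so "failed to start member: {error,sock_closed}" suggests erchef cannot open connections to the bundled Postgres. In case it's useful, these are the checks I'm planning to run next (paths assume a default standalone install; adjust if your layout differs):

```shell
# Check overall service health; the postgresql service should show "run"
chef-server-ctl status

# Look at the bundled PostgreSQL log for connection/startup errors
tail -n 50 /var/log/opscode/postgresql/current

# A full disk is a common reason for Postgres refusing connections
df -h /var/opt/opscode
```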
Does anyone know what could be wrong?