[RESOLVED] Chef server not starting up - v 12.1.2

We have a standalone Chef server (v 12.1.2) installation that had been working fine for many months. Recently it started going down a few times a day, and it would come back up after I ran chef-server-ctl restart. A few days ago it went down again, and this time chef-server-ctl restart no longer brings it back.

I checked the current opscode-erchef log (/var/log/opscode/opscode-erchef/current) and saw the following errors:

2017-07-31_19:10:11.28759 Exec: /opt/opscode/embedded/service/opscode-erchef/erts-6.4/bin/erlexec -noshell -noinput +Bd -boot /opt/opscode/embedded/service/opscode-erchef/releases/12.1.1/oc_erchef -mode embedded -config /opt/opscode/embedded/service/opscode-erchef/sys.config -args_file /opt/opscode/embedded/service/opscode-erchef/vm.args -- foreground
2017-07-31_19:10:11.28761 Root: /opt/opscode/embedded/service/opscode-erchef
2017-07-31_19:11:14.73679
2017-07-31_19:11:14.73681 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.73682 pool 'sqerl': exceeded timeout waiting for 20 members
2017-07-31_19:11:14.78063 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.78064 ** Generic server <0.1020.0> terminating
2017-07-31_19:11:14.78064 ** Last message in was {tcp_closed,#Port<0.5874>}
2017-07-31_19:11:14.78064 ** When Server state == {state,gen_tcp,#Port<0.5874>,<<>>,undefined,auth,
2017-07-31_19:11:14.78065                                undefined,
2017-07-31_19:11:14.78065                                {[],[]},
2017-07-31_19:11:14.78066                                undefined,[],[],[],[],[],[],undefined,
2017-07-31_19:11:14.78066                                undefined}
2017-07-31_19:11:14.78067 ** Reason for termination ==
2017-07-31_19:11:14.78067 ** sock_closed
2017-07-31_19:11:14.78529
2017-07-31_19:11:14.78530 =ERROR REPORT==== 31-Jul-2017::14:11:14 ===
2017-07-31_19:11:14.78531 pool 'sqerl' failed to start member: {error,sock_closed}
2017-07-31_19:11:19.74314 =ERROR REPORT==== 31-Jul-2017::14:11:19 ===
2017-07-31_19:11:19.74314 ** Generic server <0.998.0> terminating
2017-07-31_19:11:19.74315 ** Last message in was timeout
2017-07-31_19:11:19.74315 ** When Server state == {starter,
2017-07-31_19:11:19.74315                             {pool,sqerl,undefined,20,20,
2017-07-31_19:11:19.74315                                 {sqerl_client,start_link,[]},
2017-07-31_19:11:19.74316                                 [],0,0,1,
2017-07-31_19:11:19.74316                                 {1,min},
2017-07-31_19:11:19.74316                                 {30,sec},
2017-07-31_19:11:19.74316                                 pooler_sqerl_member_sup,undefined,
2017-07-31_19:11:19.74316                                 {dict,0,16,16,8,80,48,
2017-07-31_19:11:19.74317                                     {[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74317                                      [],[],[]},
2017-07-31_19:11:19.74317                                     {{[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74317                                       [],[],[]}}},
2017-07-31_19:11:19.74317                                 {dict,0,16,16,8,80,48,
2017-07-31_19:11:19.74318                                     {[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74318                                      [],[],[]},
2017-07-31_19:11:19.74318                                     {{[],[],[],[],[],[],[],[],[],[],[],[],[],
2017-07-31_19:11:19.74319                                       [],[],[]}}},
2017-07-31_19:11:19.74319                                 [],
2017-07-31_19:11:19.74319                                 {1,min},
2017-07-31_19:11:19.74319                                 folsom_metrics,folsom,
2017-07-31_19:11:19.74320                                 {[],[]},
2017-07-31_19:11:19.74320                                 20},
2017-07-31_19:11:19.74320                             <0.994.0>,undefined}
2017-07-31_19:11:19.74320 ** Reason for termination ==
2017-07-31_19:11:19.74320 ** {{killed,{gen_server,call,
2017-07-31_19:11:19.74321                         [pooler_sqerl_member_sup,{start_child,[]},infinity]}},
2017-07-31_19:11:19.74321     [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},
2017-07-31_19:11:19.74322      {pooler_starter,do_start_member,1,
2017-07-31_19:11:19.74322                      [{file,"src/pooler_starter.erl"},{line,138}]},
2017-07-31_19:11:19.74322      {pooler_starter,handle_info,2,
2017-07-31_19:11:19.74322                      [{file,"src/pooler_starter.erl"},{line,123}]},
2017-07-31_19:11:19.74322      {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,593}]},
2017-07-31_19:11:19.74322      {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,659}]},
2017-07-31_19:11:19.74323      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}

Does anyone know what could be wrong?

Update: the issue was the pooler_timeout attribute in the sqerl section of the erchef sys.config file (/var/opt/opscode/opscode-erchef/sys.config). It was set to 0. I increased it to 1000 and that resolved the issue.
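
For anyone who wants to check their own server for the same setting, here is a rough Python sketch that looks for pooler_timeout in the text of a sys.config file and flags a value of 0. It assumes the setting is written as an Erlang tuple of the form {pooler_timeout, N}; the function names and the sample text are made up for illustration and are not taken from a real Chef server file.

```python
import re

# Assumption: the setting appears as an Erlang tuple such as {pooler_timeout, 0}.
# This is a plain text search, so it would also match a commented-out line.
POOLER_TIMEOUT_RE = re.compile(r"\{\s*pooler_timeout\s*,\s*(\d+)\s*\}")


def find_pooler_timeout(config_text):
    """Return the first pooler_timeout value found in the text, or None."""
    match = POOLER_TIMEOUT_RE.search(config_text)
    return int(match.group(1)) if match else None


def describe_pooler_timeout(config_text):
    """Return a short human-readable note about the pooler_timeout setting."""
    value = find_pooler_timeout(config_text)
    if value is None:
        return "pooler_timeout not found"
    if value == 0:
        return "pooler_timeout is 0 (the value that caused the problem in this thread)"
    return "pooler_timeout is %d" % value


# Hypothetical fragment, only to show the expected shape of the input.
SAMPLE = """
{sqerl, [
    {pooler_timeout, 0}
]}
"""

print(describe_pooler_timeout(SAMPLE))
```

To run it against a real server, read the contents of /var/opt/opscode/opscode-erchef/sys.config into a string and pass that to describe_pooler_timeout instead of SAMPLE.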