Chef 12 server dead in the water due to rabbitmq problems (reconfigure bombs)


#1

I just finished cutting over part of our systems to our Chef 12 server, and now the server won’t run because of a RabbitMQ problem. Can anyone help? I’ve seen this error mentioned elsewhere, but either with no solution or with one (qpidd) that doesn’t apply here.

Pleeeez send halp … Error and details are below.

Amazon Linux AMI release 2015.03
chef-server-core-12.2.0-1.el6.x86_64

[root@chef01a aec1d:a ~]# cat /etc/opscode/chef-server.rb

default_orgname "tenant01"
addons['install'] = false

nginx['ssl_ciphers'] = "HIGH:MEDIUM:!LOW:!kEDH:!aNULL:!ADH:!eNULL:!EXP:!SSLv2:!SEED:!CAMELLIA:!PSK"
nginx['ssl_protocols'] = "TLSv1 TLSv1.1 TLSv1.2"
nginx['log_rotation'] = { 'file_maxbytes' => 104857600, 'num_to_keep' => 5 }

bookshelf['log_rotation'] = { 'file_maxbytes' => 104857600, 'num_to_keep' => 5 }
oc_bifrost['log_rotation'] = { 'file_maxbytes' => 104857600, 'num_to_keep' => 5 }
opscode_erchef['log_rotation'] = { 'file_maxbytes' => 104857600, 'num_to_keep' => 5 }

[root@chef01a aec1d:a ~]# chef-server-ctl status
down: rabbitmq: 1s, normally up, want up; run: log: (pid 2369) 48s

[root@chef01a aec1d:a ~]# chef-server-ctl reconfigure
…snip

  * execute[/opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid] action run
    ================================================================================
    Error executing action run on resource 'execute[/opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid]'
    ================================================================================
    Mixlib::ShellOut::ShellCommandFailed

Expected process to exit with [0], but received '2'
---- Begin output of /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid ----
STDOUT: Waiting for rabbit@localhost ...
pid is 6250 ...
STDERR: Error: process_not_running
---- End output of /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid ----
Ran /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid returned 2
Resource Declaration:

In /opt/chef-server/embedded/cookbooks/chef-server/recipes/rabbitmq.rb

80: execute "/opt/chef-server/embedded/bin/chpst -u #{node["chef_server"]["user"]["username"]} -U #{node["chef_server"]["user"]["username"]} /opt/chef-server/embedded/bin/rabbitmqctl wait #{rabbitmq_data_dir}/rabbit@localhost.pid" do
81: retries 10
82: end
83:
Compiled Resource:

Declared in /opt/chef-server/embedded/cookbooks/chef-server/recipes/rabbitmq.rb:80:in `from_file'

execute("/opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid") do
action "run"
retries 0
retry_delay 2
guard_interpreter :default
command "/opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid"
backup 5
returns 0
cookbook_name :"chef-server"
recipe_name "rabbitmq"
end

Running handlers:
[2015-12-11T23:41:56+00:00] ERROR: Running exception handlers
Running handlers complete

[2015-12-11T23:41:56+00:00] ERROR: Exception handlers complete
[2015-12-11T23:41:56+00:00] FATAL: Stacktrace dumped to /opt/chef-server/embedded/cookbooks/cache/chef-stacktrace.out
Chef Client failed. 3 resources updated in 28.902045354 seconds
[2015-12-11T23:41:56+00:00] ERROR: execute[/opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid] (chef-server::rabbitmq line 80) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '2'
---- Begin output of /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid ----
STDOUT: Waiting for rabbit@localhost ...
pid is 6250 ...
STDERR: Error: process_not_running
---- End output of /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid ----
Ran /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid returned 2
[2015-12-11T23:41:56+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)


#2

Without actually looking up the specific error messages, I can lead off with this: the vast majority of RabbitMQ runtime issues are caused by a bad FQDN. Check that hostname -f works and that the name it returns resolves to the current machine.
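If you want to script that check, here is a minimal bash sketch. It assumes glibc's getent and the iproute2 ip command are available, and it only compares IPv4 addresses; the function names ip_is_local and check_fqdn are hypothetical, not part of Chef or RabbitMQ.

```shell
#!/usr/bin/env bash
# Sketch: check that `hostname -f` works and that the name resolves to an
# IPv4 address bound on this machine. Function names are hypothetical.

# ip_is_local IP LOCAL_IP... : succeed if IP exactly matches one of LOCAL_IP
ip_is_local() {
  local want="$1" addr
  shift
  for addr in "$@"; do
    [ "$addr" = "$want" ] && return 0
  done
  return 1
}

# check_fqdn : print the result and return non-zero on any problem
check_fqdn() {
  local fqdn resolved
  local -a local_ips
  fqdn="$(hostname -f)" || { echo "hostname -f failed"; return 1; }
  # First IPv4 address the resolver (/etc/hosts, DNS) returns for the FQDN
  resolved="$(getent ahostsv4 "$fqdn" | awk '{print $1; exit}')"
  if [ -z "$resolved" ]; then
    echo "$fqdn does not resolve"
    return 1
  fi
  # IPv4 addresses currently assigned to local interfaces, prefix length stripped
  mapfile -t local_ips < <(ip -o -4 addr show | awk '{split($4, a, "/"); print a[1]}')
  if ip_is_local "$resolved" "${local_ips[@]}"; then
    echo "ok: $fqdn -> $resolved (bound on this host)"
  else
    echo "mismatch: $fqdn -> $resolved, which is not bound on this host"
    return 1
  fi
}

# Usage (after sourcing this file): check_fqdn
```

The exact-string comparison in ip_is_local is deliberate, so an address like 10.9.207.25 is not mistaken for 10.9.207.250.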


#3

[root@chef01a aec1d:a ~]# hostname -f
chef01a.aec1d.internal


#4

Pardon, you also asked for:

[root@chef01a aec1d:a ~]# host chef01a.aec1d.internal
chef01a.aec1d.internal has address 10.9.207.250

[root@chef01a aec1d:a ~]# curl -I chef01a.aec1d.internal:443
HTTP/1.1 400 Bad Request
Server: openresty/1.7.10.1
Date: Sat, 12 Dec 2015 00:08:45 GMT
Content-Type: text/html
Content-Length: 277
Connection: close


#5

Does that name resolve in DNS?


#6

Yes. …

[root@chef01a aec1d:a ~]# dig chef01a.aec1d.internal

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.30.rc1.39.amzn1 <<>> chef01a.aec1d.internal
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11549
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;chef01a.aec1d.internal. IN A

;; ANSWER SECTION:
chef01a.aec1d.internal. 60 IN A 10.9.207.250

;; Query time: 2 msec
;; SERVER: 169.254.169.253#53(169.254.169.253)
;; WHEN: Sat Dec 12 00:10:59 2015
;; MSG SIZE rcvd: 56


#7

I don’t know RabbitMQ at all, but I’m trying. For the hell of it, I tried to reproduce the command that chef-server-ctl reconfigure runs. Does this look like a permission-denied issue? If so, on which files or directories?

[root@chef01a aec1d:a ~]# ln -s /opt/opscode/embedded/bin/erl /usr/local/bin/erl
[root@chef01a aec1d:a ~]# /opt/chef-server/embedded/bin/chpst -u chef_server -U chef_server /opt/chef-server/embedded/bin/rabbitmqctl wait /var/opt/chef-server/rabbitmq/db/rabbit@localhost.pid
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: .. Function: read_file_info. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./beam_lib.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./ram_file.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./standard_error.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./supervisor_bridge.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./user_sup.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./user.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./kernel_config.beam. Function: get_file. Process: code_server."}
{error_logger,{{2015,12,12},{0,17,9}},std_error,"File operation error: eacces. Target: ./queue.beam. Function: get_file. Process: code_server."}

=ERROR REPORT==== 12-Dec-2015::00:17:09 ===
File operation error: eacces. Target: … Function: read_file_info. Process: code_server.

=ERROR REPORT==== 12-Dec-2015::00:17:09 ===
File operation error: eacces. Target: ./beam_lib.beam. Function: get_file. Process: code_server.

…snip

If it’s a permission issue in directories like these, note that the permissions I see below are the same as on another Chef 12 server that isn’t exhibiting this problem.

[root@chef01a aec1d:a ~]# ls -ld /var/opt/opscode/rabbitmq/
drwxr-x--- 5 opscode opscode 4096 Nov 24 00:31 /var/opt/opscode/rabbitmq/

[root@chef01a aec1d:a ~]# ls -l /var/opt/opscode/rabbitmq/
total 12
drwxr-x--- 4 opscode opscode 4096 Dec 11 23:40 db
drwxr-x--- 2 opscode opscode 4096 Nov 24 00:31 etc
drwxr-xr-x 2 opscode opscode 4096 Nov 24 00:31 log
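Rather than comparing ls listings by eye, you can check ownership against the user a command actually runs as. This is a minimal bash sketch assuming GNU stat (stat -c %U); dir_owner and check_owner are hypothetical helper names. It only checks ownership, not the full permission bits. Note that the reproduced command runs as chef_server against /var/opt/chef-server, while the listing above is of /var/opt/opscode (owned by opscode), so those are not the same tree.

```shell
#!/usr/bin/env bash
# Sketch: compare a directory's owner with the user a command runs as.
# Helper names are hypothetical; requires GNU coreutils stat.

# dir_owner DIR : print the owning user name of DIR
dir_owner() {
  stat -c %U "$1"
}

# check_owner DIR EXPECTED_USER : succeed if DIR exists and is owned by EXPECTED_USER
check_owner() {
  local dir="$1" want="$2" got
  got="$(dir_owner "$dir" 2>/dev/null)" || { echo "missing: $dir"; return 1; }
  if [ "$got" = "$want" ]; then
    echo "ok: $dir owned by $got"
  else
    echo "mismatch: $dir owned by $got, expected $want"
    return 1
  fi
}

# Example usage (paths and users taken from this thread):
#   check_owner /var/opt/chef-server/rabbitmq/db chef_server   # what the failing command uses
#   check_owner /var/opt/opscode/rabbitmq/db opscode           # what the listing above shows
```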


#8

Wait … this output is calling a Chef 11 server path: /opt/chef-server

WTF … pardon me while I … uh … un-WTF this. :smile:


#9

NEVERMIND! Let’s all have a good laugh …

A recipe COUGH installed COUGH Chef 11 server on top of my Chef 12 server. A.k.a. you got your chocolate in my peanut butter!

We’re back up now.

[root@chef01a aec1d:a ~]# yum -y erase chef-server-11.1.6-1.el6.x86_64
[root@chef01a aec1d:a ~]# ln -s /opt/opscode/bin/chef-server-ctl /usr/bin/chef-server-ctl
[root@chef01a aec1d:a ~]# chef-server-ctl reconfigure
Starting Chef Client, version 12.5.0.current.0
resolving cookbooks for run list: ["private-chef::default"]
..snip
Chef Server Reconfigured!
12/12 00:27[root@chef01a aec1d:a ~]# chef-server-ctl status
run: bookshelf: (pid 2354) 2805s; run: log: (pid 2343) 2805s
run: nginx: (pid 2356) 2805s; run: log: (pid 2344) 2805s
run: oc_bifrost: (pid 2361) 2805s; run: log: (pid 2349) 2805s
run: oc_id: (pid 2358) 2805s; run: log: (pid 2347) 2805s
run: opscode-erchef: (pid 2357) 2805s; run: log: (pid 2346) 2805s
run: opscode-expander: (pid 2363) 2805s; run: log: (pid 2351) 2805s
run: opscode-expander-reindexer: (pid 2362) 2805s; run: log: (pid 2350) 2805s
run: opscode-solr4: (pid 2353) 2805s; run: log: (pid 2341) 2805s
run: postgresql: (pid 2367) 2805s; run: log: (pid 2345) 2805s
run: rabbitmq: (pid 2364) 2805s; run: log: (pid 2352) 2805s
run: redis_lb: (pid 20962) 9s; run: log: (pid 2342) 2805s
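To catch this kind of mix-up earlier, a host can be checked for both packages at once. This is a minimal bash sketch for RPM-based systems; the package names chef-server (11.x) and chef-server-core (12.x) come from this thread, while find_conflict and check_installed are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: flag a host that has both the Chef Server 11 package (chef-server-11.x)
# and the Chef Server 12 package (chef-server-core-12.x) installed.
# Helper names are hypothetical.

# find_conflict PKG... : fail with a message if both packages appear in the list
find_conflict() {
  local has11=0 has12=0 pkg
  for pkg in "$@"; do
    case "$pkg" in
      chef-server-core-*) has12=1 ;;  # Chef 12, e.g. chef-server-core-12.2.0-1.el6.x86_64
      chef-server-[0-9]*) has11=1 ;;  # Chef 11, e.g. chef-server-11.1.6-1.el6.x86_64
    esac
  done
  if [ "$has11" -eq 1 ] && [ "$has12" -eq 1 ]; then
    echo "conflict: both chef-server (11) and chef-server-core (12) are installed"
    return 1
  fi
  return 0
}

# check_installed : run find_conflict against what rpm reports on this host
check_installed() {
  local -a pkgs
  mapfile -t pkgs < <(rpm -qa 'chef-server*')
  find_conflict "${pkgs[@]}"
}

# Usage (after sourcing this file):
#   check_installed
#   command -v chef-server-ctl   # shows which chef-server-ctl the shell will run
```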