Cluster broken after attempting to add a second backend node



Hi guys,
I was trying to add a new backend node.

The installation failed with an error related to Elasticsearch.
Now, when I run chef-backend-ctl status, I get:
An unexpected error occurred:
LibCB::Status::ServiceNotFound

I then tried to remove the broken node from the main backend, but that fails too :confused:

chef-backend-ctl remove-node 192.168.9.240 --verbose

Could not find node ‘192.168.9.240’ in /cb/status
I inferred the existence of this node: 192.168.9.240 / 947ba20bd49d089f98b9ba976b9820c7
Would you like me to try to remove this node?

Are you sure you wish to proceed? Type ‘proceed’ to continue, anything else to cancel.
proceed
Name ID IP Last Checkin Time
947ba20bd49d089f98b9ba976b9820c7 192.168.9.240
DEBUG: Attempting postgresql connection: host=172.21.9.236 port=5432 user=chef_pgsql dbname=template1 password=

Continue with removing this node from the cluster?

Are you sure you wish to proceed? Type ‘proceed’ to continue, anything else to cancel.
proceed
An unexpected error occurred:
execution expired
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/protocol.rb:158:in `wait_readable'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/protocol.rb:158:in `rbuf_fill'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/protocol.rb:136:in `readuntil'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/protocol.rb:146:in `readline'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http/response.rb:40:in `read_status_line'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http/response.rb:29:in `read_new'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1437:in `block in transport_request'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1434:in `catch'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1434:in `transport_request'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1407:in `request'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1400:in `block in request'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:853:in `start'
/opt/chef-backend/embedded/lib/ruby/2.3.0/net/http.rb:1398:in `request'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/etcd-0.3.0/lib/etcd/client.rb:111:in `api_execute'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:364:in `block (2 levels) in request'
/opt/chef-backend/embedded/lib/ruby/2.3.0/timeout.rb:106:in `timeout'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:363:in `block in request'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:284:in `block in with_connection'
/opt/chef-backend/embedded/lib/ruby/2.3.0/timeout.rb:74:in `timeout'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:282:in `with_connection'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:357:in `request'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:335:in `delete'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:138:in `remove_member'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/command/remove-node.rb:120:in `remove_node_from_etcd'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/command/remove-node.rb:111:in `do_remove_node'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/command/remove-node.rb:49:in `run'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:349:in `run_with_pretty_errors'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:330:in `run_subcommand'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:263:in `run'
/opt/chef-backend/embedded/lib/ruby/gems/2.3.0/gems/libcb-0.1.0/bin/chef-backend-ctl:15:in `<top (required)>'
/bin/chef-backend-ctl:30:in `load'
/bin/chef-backend-ctl:30:in

Now even the first node is in a failed state :frowning:

leaderl running (pid 24347) 0d 0h 9m 32s leader: 0; waiting: 0; follower: 0; unknown: 1; total: 1
epmd running (pid 24408) 0d 0h 9m 32s status: local-only
etcd running (pid 24412) 0d 0h 9m 31s health: red; healthy nodes: 0/2
postgresql running (pid 24428) 0d 0h 9m 31s leader: 1; offline: 1; syncing: 0; synced: 0
elasticsearch running (pid 24445) 0d 0h 9m 30s state: yellow; nodes online: 1/2
Any thoughts?