How can I back up my Chef server so that a restore gives me an exact
replica of the old server?
I’m trying this on an Ubuntu Oneiric instance on EC2, running chef
0.10.4-1 from the opscode apt repository.
My current approach is this: On backup, I shut down all chef-server
services (chef-client, chef-server-webui, chef-server, chef-expander,
chef-solr, and couchdb) in that order. I then create a backup of the
directories /etc/chef,
/etc/couchdb, /var/lib/chef, /var/lib/couchdb, /var/cache/chef,
/var/log/chef, and /var/log/couchdb, before I start the services in
the reverse order. (BTW: If I don’t back up the /var/cache/chef
directory, the server becomes completely inaccessible, complaining
about unauthorized access when I try to access it.)
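The backup steps above can be sketched roughly as follows. The service
names, their order, and the directories come from the description above;
the use of `service NAME stop/start`, the tar flags, and the archive path
are assumptions for illustration, not something the Chef packages
prescribe. The sketch only prints the commands unless dry_run is turned
off.

```python
import subprocess

# Service names and directories as listed in the post.
SERVICES = ["chef-client", "chef-server-webui", "chef-server",
            "chef-expander", "chef-solr", "couchdb"]
DIRS = ["/etc/chef", "/etc/couchdb", "/var/lib/chef", "/var/lib/couchdb",
        "/var/cache/chef", "/var/log/chef", "/var/log/couchdb"]


def backup_commands(archive, services=SERVICES, dirs=DIRS):
    """Return the commands for a cold backup, in execution order."""
    # Stop the services in the listed order (assumed `service` wrapper).
    stop = [["service", name, "stop"] for name in services]
    # One gzip-compressed archive of all directories (assumed tar flags).
    tar = [["tar", "-czpf", archive] + list(dirs)]
    # Start the services again in the reverse order.
    start = [["service", name, "start"] for name in reversed(services)]
    return stop + tar + start


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical archive path; prints the plan without running it.
    run(backup_commands("/tmp/chef-backup.tar.gz"))
```

Building the command list separately from running it makes it easy to
review the exact order before anything is stopped.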
On restore, I shut down the services, clean out the directories above,
and restore the backed-up files before I start the services again.
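The restore steps can be sketched in the same spirit. Again, the
`service` wrapper, the `rm -rf` cleanup, the tar flags, and the archive
path are assumptions; in particular, the sketch assumes the archive was
created by tar with the leading "/" stripped from member names, so that
extracting with -C / puts the files back in place. It only prints the
commands unless dry_run is turned off.

```python
import subprocess

# Service names and directories as listed in the post.
SERVICES = ["chef-client", "chef-server-webui", "chef-server",
            "chef-expander", "chef-solr", "couchdb"]
DIRS = ["/etc/chef", "/etc/couchdb", "/var/lib/chef", "/var/lib/couchdb",
        "/var/cache/chef", "/var/log/chef", "/var/log/couchdb"]


def restore_commands(archive, services=SERVICES, dirs=DIRS):
    """Return the commands for a restore, in execution order."""
    stop = [["service", name, "stop"] for name in services]
    # Clean out each directory; extraction recreates it from the archive.
    clean = [["rm", "-rf", "--", path] for path in dirs]
    # Assumes member names are relative to / (tar's default on creation).
    extract = [["tar", "-xzpf", archive, "-C", "/"]]
    start = [["service", name, "start"] for name in reversed(services)]
    return stop + clean + extract + start


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical archive path; prints the plan without running it.
    run(restore_commands("/tmp/chef-backup.tar.gz"))
```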
This procedure almost works: the status page in the webui has an
updated ‘Last check-in’ column, etc. However, I get the following
errors when running chef-client:
INFO: HTTP Request Returned 500 Internal Server Error: Connection failed - user: chef
ERROR: Server returned error for http://localhost:4000/nodes/chef-server.telespor.org, retrying 1/5 in 3s
…
INFO: HTTP Request Returned 500 Internal Server Error: Connection failed - user: chef
ERROR: Server returned error for http://localhost:4000/nodes/chef-server.telespor.org, retrying 5/5 in 56s
INFO: HTTP Request Returned 500 Internal Server Error: Connection failed - user: chef
ERROR: Running exception handlers
FATAL: Saving node information to /var/cache/chef/failed-run-data.json
ERROR: Exception handlers complete
ERROR: Net::HTTPFatalError: 500 "Internal Server Error"
FATAL: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
ERROR: Sleeping for 1800 seconds before trying again
/var/cache/chef/chef-stacktrace.out:
Net::HTTPFatalError: 500 "Internal Server Error"
/usr/lib/ruby/1.8/net/http.rb:2105:in `error!'
/usr/lib/ruby/vendor_ruby/chef/rest.rb:237:in `api_request'
/usr/lib/ruby/vendor_ruby/chef/rest.rb:288:in `retriable_rest_request'
/usr/lib/ruby/vendor_ruby/chef/rest.rb:218:in `api_request'
/usr/lib/ruby/vendor_ruby/chef/rest.rb:130:in `put_rest'
/usr/lib/ruby/vendor_ruby/chef/node.rb:626:in `save'
/usr/lib/ruby/vendor_ruby/chef/client.rb:203:in `save_updated_node'
/usr/lib/ruby/vendor_ruby/chef/client.rb:161:in `run'
/usr/lib/ruby/vendor_ruby/chef/application/client.rb:239:in `run_application'
/usr/lib/ruby/vendor_ruby/chef/application/client.rb:229:in `loop'
/usr/lib/ruby/vendor_ruby/chef/application/client.rb:229:in `run_application'
/usr/lib/ruby/vendor_ruby/chef/application.rb:67:in `run'
/usr/bin/chef-client:25
and the following in /var/log/chef/server.log:
merb : chef-server (api) : worker (port 4000) ~ Connection failed - user: chef - (Bunny::ProtocolError)
/usr/lib/ruby/1.8/bunny/client08.rb:196:in `open_connection'
/usr/lib/ruby/1.8/bunny/client08.rb:397:in `start'
/usr/lib/ruby/1.8/bunny/client08.rb:389:in `loop'
/usr/lib/ruby/1.8/bunny/client08.rb:389:in `start'
/usr/lib/ruby/vendor_ruby/chef/index_queue/amqp_client.rb:45:in `amqp_client'
/usr/lib/ruby/vendor_ruby/chef/index_queue/amqp_client.rb:72:in `queue_for_object'
/usr/lib/ruby/vendor_ruby/chef/index_queue/indexable.rb:95:in `publish_object'
/usr/lib/ruby/vendor_ruby/chef/index_queue/indexable.rb:74:in `add_to_index'
/usr/lib/ruby/vendor_ruby/chef/couchdb.rb:118:in `store'
/usr/lib/ruby/vendor_ruby/chef/node.rb:618:in `cdb_save'
/usr/share/chef-server-api/app/controllers/nodes.rb:69:in `update'
/usr/lib/ruby/1.8/merb-core/controller/abstract_controller.rb:315:in `send'
/usr/lib/ruby/1.8/merb-core/controller/abstract_controller.rb:315:in `_call_action'
/usr/lib/ruby/1.8/merb-core/controller/abstract_controller.rb:289:in `_dispatch'
/usr/lib/ruby/1.8/merb-core/controller/merb_controller.rb:252:in `_dispatch'
/usr/lib/ruby/1.8/merb-core/dispatch/dispatcher.rb:102:in `dispatch_action'
/usr/lib/ruby/1.8/merb-core/dispatch/dispatcher.rb:74:in `handle'
/usr/lib/ruby/1.8/merb-core/dispatch/dispatcher.rb:36:in `handle'
/usr/lib/ruby/1.8/merb-core/rack/application.rb:17:in `call'
/usr/lib/ruby/1.8/merb-core/rack/middleware/static.rb:28:in `call'
/usr/lib/ruby/1.8/rack/content_length.rb:13:in `call'
/usr/lib/ruby/1.8/thin/connection.rb:76:in `pre_process'
/usr/lib/ruby/1.8/thin/connection.rb:74:in `catch'
/usr/lib/ruby/1.8/thin/connection.rb:74:in `pre_process'
/usr/lib/ruby/1.8/thin/connection.rb:57:in `process'
/usr/lib/ruby/1.8/thin/connection.rb:42:in `receive_data'
/usr/lib/ruby/1.8/eventmachine.rb:257:in `run_machine'
/usr/lib/ruby/1.8/eventmachine.rb:257:in `run'
/usr/lib/ruby/1.8/thin/backends/base.rb:57:in `start'
/usr/lib/ruby/1.8/thin/server.rb:156:in `start'
/usr/lib/ruby/1.8/merb-core/rack/adapter/thin.rb:30:in `start_server'
/usr/lib/ruby/1.8/merb-core/rack/adapter/abstract.rb:298:in `start_at_port'
/usr/lib/ruby/1.8/merb-core/rack/adapter/abstract.rb:128:in `start'
/usr/lib/ruby/1.8/merb-core/server.rb:174:in `bootup'
/usr/lib/ruby/1.8/merb-core/server.rb:159:in `daemonize'
/usr/lib/ruby/1.8/merb-core/server.rb:143:in `fork'
/usr/lib/ruby/1.8/merb-core/server.rb:143:in `daemonize'
/usr/lib/ruby/1.8/merb-core/server.rb:35:in `start'
/usr/lib/ruby/1.8/merb-core.rb:170:in `start'
/usr/sbin/chef-server:86
I would really like to resolve this error, as I don’t want to base my
entire infrastructure on a piece of software whose backups I don’t
trust.
I also realize that I could just provision a completely new
chef-server and push my chef repository (including environments, roles,
etc.), but this would require manually replacing every client’s
/etc/validation.pem and mapping nodes to roles (this last step may be
unnecessary with the chef_server_backup.rb script). I really want to
be able to provision a new chef-server without having to touch the
clients.
If there is a way to accomplish this, please update the ‘Backing Up
Chef Server’ page on the wiki, as the methods mentioned there really
don’t work very well.
Magne Rasmussen