Chef Server backup: best option?

The two main options for backing up our Chef servers seem to be 'knife ec backup' [1] and 'chef-server-ctl backup' [2].

Does anyone on the list have experience or opinions to share?

As far as I can tell, the pros of knife ec backup are:

  • Can be run on a box other than the Chef server (though that is not recommended).

  • JSON object output, which makes cleanup easier and allows restoring to a different Chef server version.

  • Can back up a Chef server that uses an external PostgreSQL database server.

Advantages of chef-server-ctl backup are:

  • May be faster in bigger environments
  • No warnings about possible future deprecation

Our needs are approximately:

  • Get this going with a few days' labor.
  • Back up two independent Chef servers with a few hundred nodes each.
  • Disaster recovery.
  • Operator error recovery.

[1] https://blog.chef.io/2017/10/16/migrating-chef-server-knife-ec-backup-knife-tidy/
[2] https://docs.chef.io/server_backup_restore.html

We use knife ec backup. As it was described to us, it provides a JSON dump of the databases, so if you ever have to recreate the server, you can import that JSON regardless of whether the server software version has changed. If you use chef-server-ctl, that's not the case: you must restore with the same version that made the backup.
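
Because the dump is plain JSON, it also helps with operator error: one object can be pulled back without a full restore. A minimal sketch, assuming Python 3 and that the backup follows a chef-fs style layout of `organizations/<org>/<type>/<name>.json` (check your own dump; the layout is an assumption here). `find_object` and `restore_command` are hypothetical helpers; the printed command uses `knife node|role|environment from file`, which is why only those three types are accepted.

```python
import json
import sys
from pathlib import Path

# Assumed backup subdirectory -> knife sub-command that supports "from file".
FROM_FILE = {"nodes": "node", "roles": "role", "environments": "environment"}


def find_object(backup_dir: Path, org: str, kind: str, name: str) -> Path:
    """Return the JSON file for one object in an (assumed) knife ec backup layout."""
    if kind not in FROM_FILE:
        raise ValueError(f"unsupported kind {kind!r}; expected one of {sorted(FROM_FILE)}")
    path = backup_dir / "organizations" / org / kind / f"{name}.json"
    if not path.is_file():
        raise FileNotFoundError(path)
    # Parse once so a truncated or corrupt file is caught before it is uploaded.
    json.loads(path.read_text())
    return path


def restore_command(kind: str, path: Path) -> str:
    """Build the knife command an operator would run to re-upload the object."""
    return f"knife {FROM_FILE[kind]} from file {path}"


if __name__ == "__main__":
    # Usage: find_object.py BACKUP_DIR ORG KIND NAME
    backup, org, kind, name = sys.argv[1:5]
    print(restore_command(kind, find_object(Path(backup), org, kind, name)))
```

The printed command should be run from a workstation whose knife config points at the matching organization.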

Hey @omacneil!

We've used knife-ec-backup + knife tidy extensively with the guide you posted with great success. @dano is spot on with their assessment.

I would highly recommend performing a test backup and a test restore with that process to verify that everything works, then trying it in your production-like environments.

Once you've got the hang of it, look into automating the backup and/or restore with a scheduled task/job!
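
Scheduling the backup could look something like this minimal sketch, assuming Python 3 on the Chef server itself, run nightly from cron or a systemd timer. The knife path `/opt/opscode/embedded/bin/knife` and the `--webui-key /etc/opscode/webui_priv.pem` and `-s` options reflect common knife-ec-backup usage on a Chef Server host, but check them against `knife ec backup --help` on your version; `BACKUP_ROOT`, `KEEP`, `SERVER_URL` and the helper names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical nightly wrapper around `knife ec backup` (a sketch, not an official tool)."""
import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path

# Assumptions: adjust all of these for your own servers.
KNIFE = "/opt/opscode/embedded/bin/knife"  # knife bundled with Chef Server (assumed path)
WEBUI_KEY = "/etc/opscode/webui_priv.pem"  # key passed via --webui-key (assumed path)
SERVER_URL = "https://localhost"           # hypothetical; use your server's URL
BACKUP_ROOT = Path("/var/backups/chef")    # hypothetical backup location
KEEP = 7                                   # hypothetical retention: newest 7 backups


def new_backup_dir(root: Path, now: datetime) -> Path:
    """Return a timestamped directory such as root/chef-backup-20240102-030405."""
    return root / now.strftime("chef-backup-%Y%m%d-%H%M%S")


def prune_old_backups(root: Path, keep: int) -> list:
    """Delete all but the newest `keep` backup directories; return the ones removed."""
    # Timestamped names sort chronologically, so a plain sort is oldest-first.
    backups = sorted(p for p in root.glob("chef-backup-*") if p.is_dir())
    doomed = backups[:-keep] if keep > 0 else backups
    for path in doomed:
        shutil.rmtree(path)
    return doomed


def main() -> int:
    BACKUP_ROOT.mkdir(parents=True, exist_ok=True)
    dest = new_backup_dir(BACKUP_ROOT, datetime.now())
    cmd = [KNIFE, "ec", "backup", str(dest), "--webui-key", WEBUI_KEY, "-s", SERVER_URL]
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Skip pruning on failure so a broken job never deletes good backups.
        print(f"knife ec backup failed with exit code {result.returncode}", file=sys.stderr)
        return result.returncode
    prune_old_backups(BACKUP_ROOT, KEEP)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

For disaster recovery the resulting directories still need to be copied off the Chef server; this sketch only keeps local copies.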

We ended up trying both 'knife ec backup' and 'chef-server-ctl backup'.

On a 4-node/client test system, knife ec backup ran in seconds, while chef-server-ctl backup took minutes.

The several steps required for chef-server-ctl restore proved slow and hard to run consistently. For example, in one of the steps Redis failed to restart after 20 minutes of chugging on a 4-node test system.

Some gotchas with 'knife ec backup BACKUP_DESTINATION_DIRECTORY':

  1. To start, I blindly copied my .chef directory from the Chef workstation to the Chef server, including some custom plugins that required chef-vault, so the backup knife command failed with 'can't load chef-vault'.

  2. Our Chef server certs were self-signed, so after restoring our first backup we got "run 'knife ssl fetch'" errors when running knife commands from the Chef workstation, and odd failures when running 'sudo chef-client' on nodes. To fix this, we added the cert directory configured in /etc/opscode/chef-server.rb to our backup.
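
Adding the certs can be as simple as archiving them together with the knife ec backup output, so the two are always restored as a pair. A minimal sketch, assuming Python 3; the cert path `/var/opt/opscode/nginx/ca` is an assumption (use whatever directory /etc/opscode/chef-server.rb actually points to), and `bundle_backup` plus the other paths are hypothetical.

```python
import tarfile
from pathlib import Path

# Assumed locations; replace with your own paths.
BACKUP_DIR = Path("/var/backups/chef/chef-backup-latest")      # knife ec backup output (hypothetical)
CERT_DIR = Path("/var/opt/opscode/nginx/ca")                    # assumed cert dir from chef-server.rb
ARCHIVE = Path("/var/backups/chef/chef-backup-latest.tar.gz")   # hypothetical archive name


def bundle_backup(backup_dir: Path, cert_dir: Path, archive: Path) -> Path:
    """Write backup_dir and cert_dir into one gzipped tarball and return its path."""
    for src in (backup_dir, cert_dir):
        if not src.is_dir():
            raise FileNotFoundError(f"missing directory: {src}")
    with tarfile.open(str(archive), "w:gz") as tar:
        # Fixed top-level names make it obvious on restore which tree is which.
        tar.add(str(backup_dir), arcname="knife-ec-backup")
        tar.add(str(cert_dir), arcname="certs")
    # The cert directory usually holds a private key, so keep the archive owner-only.
    archive.chmod(0o600)
    return archive


if __name__ == "__main__":
    print(bundle_backup(BACKUP_DIR, CERT_DIR, ARCHIVE))
```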

Next steps are to dump the production system and restore it to a VM.

FWIW, a dump of a 270-node system took about 3 minutes on a 2-core, 4G Chef server and produced a DESTINATION_DIRECTORY of about 400M (a fair amount of cruft in there).