Chef-server 11 restore from backup problems

My original plan was to upgrade from chef-server 11.0.10 to 11.1.3. To
that end, I backed up our chef-server 11.0.10 using the backup script at
https://github.com/sc0rp1us/cehf-useful-scripts/blob/master/server-side/chef-backup.sh.
I then tried to use “chef-server-ctl upgrade” to do the upgrade, but I
hit the known errors (issue 1595). So I uninstalled the older version,
installed the newer one, and restored from backup. This did not go
well: search in the webui did not work and there were errors in the
logs. At that point I decided to uninstall the newer version, go back
to 11.0.10, restore from backup, and get back to where I started. But
that did not work either; I saw the same problems as with the 11.1.3
install.

Where I know I went wrong was in restoring the backup on a test VM
before I did any of this. That test seemed to work fine, but I did not
check out the test Chef server as thoroughly as I should have. Anyway,
the biggest problem is that the /etc/chef-server config files and the
certificates contained the FQDN of our operational chef server machine.
So it is pretty clear that the parts of the server on the test VM
thought they were parts of the operational server. And I think parts of
the test server were connecting to parts of the operational server,
which was running at the same time as the test server.
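One way to check a restored machine for this kind of leftover hostname
is to scan the config directory for the old FQDN. Below is a minimal
Python sketch, assuming the configs live under /etc/chef-server as
described above; the helper name find_hostname_references is
hypothetical, not part of chef-server-ctl. It only finds plain-text
references: a certificate's subject is base64-encoded inside a PEM
file, so certificates have to be checked separately, e.g. with
`openssl x509 -in FILE -noout -subject`.

```python
import os


def find_hostname_references(root, hostname):
    """Return (path, line_number) pairs for every line under `root`
    that contains `hostname`. Unreadable files are skipped."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary files from raising decode errors
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if hostname in line:
                            hits.append((path, lineno))
            except OSError:
                continue  # permission denied, broken symlink, etc.
    return hits


if __name__ == "__main__":
    # Assumption: the FQDN baked into the backup is the operational server's.
    old_fqdn = "csprogrammer3.cira.colostate.edu"
    for path, lineno in find_hostname_references("/etc/chef-server", old_fqdn):
        print(f"{path}:{lineno} references {old_fqdn}")
```

On the operational machine these hits are expected; on a test VM
restored from the same backup, every hit is a place that may point the
test server at the production one.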

Since finding this out, I have destroyed the test VM, reinstalled chef
server on the operational machine, and restored from backup. But it
still does not work properly, and there are still errors in the log.
Note that everything works fine after just the reinstall; it is only
after I restore from backup that I see the problems.

Specifically, this is what I see from “knife bootstrap” when I try to
recreate a node. I did delete the node and its client before I ran this.

localhost Starting Chef Client, version 11.16.4
localhost Creating a new client identity for dataapiTest using the validator key.
localhost [2015-06-01T17:42:51+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 1/5 in 3s
localhost [2015-06-01T17:43:00+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 2/5 in 6s
localhost [2015-06-01T17:43:11+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 3/5 in 15s
localhost [2015-06-01T17:43:32+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 4/5 in 32s
localhost [2015-06-01T17:44:10+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 5/5 in 42s
localhost [2015-06-01T17:44:53+00:00] WARN: Failed to register new client, 4 tries remaining
localhost [2015-06-01T17:44:53+00:00] WARN: Response: HTTP 502 - 502 "Bad Gateway"
localhost [2015-06-01T17:44:53+00:00] ERROR: Server returned error 502 for https://csprogrammer3.cira.colostate.edu/clients, retrying 1/5 in 3s
localhost [2015-06-01T17:44:56+00:00] ERROR: Server returned error 502 for https://csprogrammer3.cira.colostate.edu/clients, retrying 2/5 in 8s
localhost [2015-06-01T17:45:10+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 3/5 in 11s
localhost [2015-06-01T17:45:26+00:00] ERROR: Server returned error 500 for https://csprogrammer3.cira.colostate.edu/clients, retrying 4/5 in 32s
localhost [2015-06-01T17:46:01+00:00] ERROR: Server returned error 502 for https://csprogrammer3.cira.colostate.edu/clients, retrying 5/5 in 37s
localhost [2015-06-01T17:46:38+00:00] WARN: Failed to register new client, 3 tries remaining
localhost [2015-06-01T17:46:38+00:00] WARN: Response: HTTP 502 - 502 "Bad Gateway"

And it was the same for the rest of the tries.
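The client keeps alternating between 500 and 502 responses on
/clients. To compare runs before and after a restore, a small Python
sketch like the one below can tally the status codes in the retry
lines of the bootstrap output. The function name tally_status_codes is
hypothetical, and the pattern is an assumption based only on the
"Server returned error NNN for URL, retrying" lines shown above; it
tolerates log lines that were wrapped across a line break.

```python
import re
from collections import Counter

# Matches retry lines such as
# "ERROR: Server returned error 500 for https://host/clients, retrying 1/5 in 3s"
# \s+ lets a match span a wrapped line.
RETRY_LINE = re.compile(
    r"Server\s+returned\s+error\s+(\d{3})\s+for\s+[^,\s]+,\s+retrying"
)


def tally_status_codes(log_text):
    """Count how often each HTTP status code appears in retry lines."""
    counts = Counter()
    for match in RETRY_LINE.finditer(log_text):
        counts[int(match.group(1))] += 1
    return counts
```

For example, tally_status_codes(open("bootstrap.log").read()) on a
saved copy of the output (the filename is hypothetical) gives a
Counter keyed by status code; the "WARN: Response: HTTP 502" summary
lines are deliberately not counted.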

Attached are the logs from that period for the server processes that
reported errors. Note that the erchef log shows errors like this all the
time, even before the restore.

Any help would be much appreciated.

Thanks,
Jim

Here is some new information. I tried reinstalling 11.0.10 and
restoring from backup again. Attached are the sections of the logs
covering the time of the restore. Some of them show errors and others
do not.

And a correction - the erlang log does not show any errors until the
restore.

Thanks,
Jim

On 06/01/2015 12:28 PM, Jim Fluke wrote: