I am working through a basic script to back up and restore a Habitat depot. The backup portion seems to be working; most of my code is roughly based on this GH issue thread: https://github.com/habitat-sh/on-prem-builder/issues/60.
However, when I restore our production Habitat depot onto a new, clean-slate instance using the archives created per that issue, everything restores correctly (Postgres tables, Minio datastore, etc.). But after I restart services, the packages from the production depot are not present on the newly created and restored dev instance. The files are on disk, but when I log in to the Habitat UI and search for my packages, they don’t come up.
It feels like I need to trigger some other type of reindex or refresh of the backend metadata, so the UI can “catch up”.
Has anyone else run into this, or have pointers about where I should start looking?
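For context, the dump side roughly follows the pattern from that issue thread. Here is a minimal sketch; the database names, the pwfile path, and the `hab` Postgres user are assumptions based on a default on-prem Builder install, and the script skips the dump when run off-box:

```shell
#!/usr/bin/env bash
# Hedged sketch of the backup (dump) step, assuming a default
# on-prem Builder layout. Not a verified implementation.
set -euo pipefail

BACKUP_DIR=${BACKUP_DIR:-/tmp/depot-backup}
PWFILE=/hab/svc/builder-datastore/config/pwfile
mkdir -p "$BACKUP_DIR"

if command -v pg_dump >/dev/null 2>&1 && [ -r "$PWFILE" ]; then
  # Dump each builder database (names assumed from a default install).
  for db in builder_originsrv builder_sessionsrv; do
    PGPASSWORD=$(cat "$PWFILE") pg_dump -h 127.0.0.1 -U hab "$db" \
      > "$BACKUP_DIR/$db.sql"
  done
  # Minio holds the package artifacts; archive its data directory too.
  tar -czf "$BACKUP_DIR/minio-data.tar.gz" -C /hab/svc/builder-minio data
else
  echo "pg_dump or $PWFILE unavailable; run this on the depot host"
fi
echo "backup staged in $BACKUP_DIR"
```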
Hi Kyle - it is possible that you will need to run the shard migration script on your new instance, as detailed in the on-prem README - https://github.com/habitat-sh/on-prem-builder
It is likely that when you created the new instance, you installed the latest versions of the builder services, which require the shard migration. You may also need to do a Minio migration, since the package files are now stored in Minio and not on the file system. Again, the migration information is in the README.
You can check the Builder service versions by running sudo hab svc status, and comparing the versions between your old instance and your new instance.
Please post the service versions, and any output from journalctl -fu hab-sup if the above does not work. Thanks!
All of the versions are the same on prod vs dev, with only a minor version change for builder-sessionsrv & builder-originsrv (prod = v7519, dev= v7582).
Prod:
package type desired state elapsed (s) pid group
habitat/builder-sessionsrv/7519/20180731190110 standalone up up 1646944 6050 builder-sessionsrv.default
habitat/builder-router/7519/20180731190111 standalone up up 1646945 5981 builder-router.default
habitat/builder-originsrv/7519/20180731190110 standalone up up 1646943 6248 builder-originsrv.default
habitat/builder-minio/0.1.0/20180612201128 standalone up up 96044 7208 builder-minio.default
habitat/builder-datastore/7311/20180426183913 standalone up up 1646945 5993 builder-datastore.default
habitat/builder-api-proxy/7519/20180731190110 standalone up up 1646944 6210 builder-api-proxy.default
habitat/builder-api/7554/20180808175204 standalone up up 1646944 6111 builder-api.default
Dev:
package type desired state elapsed (s) pid group
habitat/builder-sessionsrv/7582/20180822212645 standalone up up 378 19003 builder-sessionsrv.default
habitat/builder-router/7519/20180731190111 standalone up up 87833 9924 builder-router.default
habitat/builder-originsrv/7582/20180822212645 standalone up up 382 18903 builder-originsrv.default
habitat/builder-minio/0.1.0/20180612201128 standalone up up 527 18740 builder-minio.default
habitat/builder-datastore/7311/20180426183913 standalone up up 461 18799 builder-datastore.default
habitat/builder-api-proxy/7519/20180731190110 standalone up up 87813 10061 builder-api-proxy.default
habitat/builder-api/7554/20180808175204 standalone up up 87821 9945 builder-api.default
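For a quick diff of the two instances, the version is the third field of the package ident in the first column of that output. A small sketch, using the builder-originsrv lines captured above:

```shell
# Sketch: pull the version out of the ident column of captured
# `hab svc status` lines, to compare prod vs dev quickly.
prod_line="habitat/builder-originsrv/7519/20180731190110 standalone up up 1646943 6248 builder-originsrv.default"
dev_line="habitat/builder-originsrv/7582/20180822212645 standalone up up 382 18903 builder-originsrv.default"

version_of() {
  # ident is origin/name/version/release; take the third field
  echo "${1%% *}" | cut -d/ -f3
}

prod_ver=$(version_of "$prod_line")
dev_ver=$(version_of "$dev_line")
echo "prod=$prod_ver dev=$dev_ver"   # prints: prod=7519 dev=7582
if [ "$prod_ver" != "$dev_ver" ]; then
  echo "builder-originsrv versions differ"
fi
```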
Attaching a log from journalctl -fu hab-sup. I stopped all services, started the journalctl command, then started all Habitat services. Once they were done (according to hab-sup.log), I refreshed the Builder UI, which asked me to authenticate again; I did so successfully. Once I was in, I searched for one of my packages, and it did not show up.
EDIT: Can’t figure out how to attach files, should I send the file somewhere?
Also, I did try running through the steps on https://github.com/habitat-sh/on-prem-builder#migration-1, but after I run ./install.sh, I don’t see a ./scripts/migrate.sh file, so I’m not sure whether I should continue.
Weird - how did you download the on-prem-depot repo?
We cloned it from the main GH repo, and are using our cloned repo to deploy.
Oh, I see the issue - the script is called merge-shards.sh. Instructions are here: https://github.com/habitat-sh/on-prem-builder#merging-database-shards, but the script name is incorrect in the docs. Fixing that now.
Note that our repo doesn’t have a copy of that merge-shards.sh script, but I pulled a copy down from the current on-prem-depot repo and attempted to run it anyway.
drwxrwxr-x. 2 centos centos 111 Sep 24 20:10 .
drwxrwxr-x. 3 centos centos 119 Sep 24 20:10 ..
-rwxr-xr-x. 1 centos centos 779 Sep 7 19:43 hab-sup.service.sh
-rwxr-xr-x. 1 centos centos 315 Sep 7 19:43 install-hab.sh
-rw-r--r--. 1 centos centos 9755 Sep 7 19:43 on-prem-archive.sh
-rwxr-xr-x. 1 centos centos 7408 Sep 7 19:43 provision.sh
I got further this time, but it errored out when I ran the script on our DEV instance (the instance we restored the pg backup to):
# PGPASSWORD=$(sudo cat /hab/svc/builder-datastore/config/pwfile) ./scripts/merge-shards.sh originsrv migrate
[ ... snip ... ]
current schema = shard_30
Count for shard_30.origins = 1
ERROR: duplicate key value violates unique constraint "origins_name_key"
DETAIL: Key (name)=(core) already exists.
Hi Kyle - in general, it is important not to grab things piecemeal from the repo - there may be config in provision or other changes that could be impacting the migration and causing issues. The recommended path is to pull down the full repo (either directly, or by pulling all changes into your own repo), do an uninstall (it is not destructive), and then do the install.
That said, we will look further at narrowing down the root cause of the error you are seeing during the migrate.
As luck would have it, the two releases in question straddle a database migration that we did in the middle of August to merge all of the database schemas into the public schema.
7519 expects there to be 128 shards in the database, and 7582 expects to see that data migrated in a very specific way.
The error you’re getting suggests that data has already been inserted into the origins table of the public schema before merge-shards.sh is ever run, which shouldn’t happen until after the migration has run. If you open psql on your new dev instance, connect to the builder_originsrv database, and run SELECT * FROM origins; I’m guessing you’ll see a record for the core origin already present.
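A non-interactive way to run that check from the shell, as a sketch: the pwfile path and the hab Postgres user are assumptions from a default install, and the function falls back to a placeholder message when run off-box.

```shell
# Sketch: query the public.origins table non-interactively.
# Path and user are assumptions from a default on-prem install.
check_origins() {
  local pwfile=/hab/svc/builder-datastore/config/pwfile
  if command -v psql >/dev/null 2>&1 && [ -r "$pwfile" ]; then
    # -tAc: tuples only, unaligned, run a single command
    PGPASSWORD=$(cat "$pwfile") psql -h 127.0.0.1 -U hab \
      -d builder_originsrv -tAc 'SELECT name FROM origins;'
  else
    echo "psql-or-pwfile-missing"
  fi
}
check_origins
```

If any origin names come back before merge-shards.sh has run, that is the premature insert described above.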
When you do your restore, it’s important that the merge-shards.sh script be run after the services have been started up (to make sure the migrations have been run), but before any clients connect to the database and start making requests. It’s difficult to know exactly what’s happening here without seeing the code for your backup/restore process.
It’s also worth noting that you’re likely seeing this problem specifically because of the two different versions of the builder services that you’re running. If you were backing up and restoring the same versions of the Builder services, I don’t think you’d have this problem.
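The ordering described above can be sketched as a script. This is a rough outline, not a verified procedure: the pwfile path, the systemd unit name, and the presence of ./scripts/merge-shards.sh in the repo checkout are all assumptions, and the script bails out with a message when those prerequisites are missing.

```shell
#!/usr/bin/env bash
# Hedged sketch of the restore ordering: restore data, start services
# (so their own schema migrations run), run merge-shards.sh, and only
# then open the instance up to clients. Paths and names are assumed.
set -euo pipefail

PWFILE=/hab/svc/builder-datastore/config/pwfile
RESULT=""

if command -v hab >/dev/null 2>&1 && [ -x ./scripts/merge-shards.sh ]; then
  # 1. start the Builder services and wait for originsrv to come up
  sudo systemctl start hab-sup
  until sudo hab svc status | grep -q builder-originsrv; do sleep 5; done

  # 2. merge the old 128-shard layout into the public schema
  for svc in originsrv sessionsrv; do
    PGPASSWORD=$(sudo cat "$PWFILE") ./scripts/merge-shards.sh "$svc" migrate
  done

  # 3. only now re-enable the proxy / load balancer for clients
  RESULT="merge complete"
else
  RESULT="hab or merge-shards.sh not found; run from the on-prem-builder checkout"
fi
echo "$RESULT"
```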
@salam, I understand that’s not the recommended approach (and it’s not necessarily how I would have approached it either), but I am not the owner of our repo, and so I am trying to figure out why the backup and restore wasn’t working.
Based on what @raskchanky said, I think that’s probably what is going on. I’ll look at getting our repo to the same version as our PROD instance, or else wiping prod and dev and starting over so they are both consistent, with the same starting point.
Thank you both for your feedback and direction, it’s been greatly appreciated!
Cool, let us know how things go!