Service Group Encryption Issue

I am trying to use Service Group Encryption to secure my Vault cluster, so that only new nodes or containers that have a key (the public file of a generated key) can join the service group. I created 4 containers. In the first container I did the following:

  1. hab svc key generate vault.default vault_org
  2. Made sure the keys were generated in /hab/cache/keys: there's a box key file and a public key file.
  3. Started the supervisor with:
    sudo -E hab sup run --org vault_org --group vault.default --bind backend:consul.default --peer leonardo --topology leader --strategy rolling --channel stable > /home/kitchen/nohup.out & echo $! > /tmp/run.pid

The public key files are not present in the other containers. In each of those containers I started the supervisor with the same command:
sudo -E hab sup run --org vault_org --group vault.default --bind backend:consul.default --peer leonardo --topology leader --strategy rolling --channel stable > /home/kitchen/nohup.out & echo $! > /tmp/run.pid

I was already tailing nohup.out in separate windows. Initially the log said there were no active members, even though the bind's service group was present in the census. After a few seconds the hooks compiled and Vault came up sealed. I unsealed each node manually so that my hooks could run; once unsealed, the hooks ran and the cluster came up fine.

I expected that nodes without the public key would not be able to join the cluster. Instead, they joined the cluster without the key, which is exactly what I thought service group encryption was supposed to prevent.

It also ran an election and chose a leader, as shown in the picture I attached above.
To confirm this, I wrote a secret in the first container before doing the above, and afterwards I could read that secret from every other container.

Am I doing something wrong? If so, what is it? Or is this a bug?

Thanks,
Kiran Marla

I haven’t yet experimented personally with the service group encryption feature, but based on the documentation, this requires “key-based authorization prior to allowing configuration changes”. Are you able to execute hab config apply commands within this service group without putting the user’s keys in the supervisors’ /hab/cache/keys directory?

It looks like we could use some more documentation about the intended way to use this feature. I’ll do some investigation and see what I can find.

Yes, I am able to run hab config apply against this service group without the user's keys in /hab/cache/keys.

OK, I’m doing more investigation about what the actual state of this feature is. I’ll get back to you, but it may not be until tomorrow.

OK, thanks for the quick response.

I’m still digging through the code to see what’s up here. Can you include the output you see when you do the hab config apply? If things go right, I’d expect a status line that starts with ☛ Encrypting to be generated by this code.

When I run hab config apply, this is what I get:

hab-sup(MR): Setting new configuration version 1535052205 for vault.default
vault.default@vault_org(SR): The group 'consul.default' satisfies the `backend` bind
vault.default@vault_org(HK): Hooks compiled
hab-sup(MR): Setting new configuration version 1535052289 for vault.default
vault.default@vault_org(SR): The group 'consul.default' satisfies the `backend` bind
vault.default@vault_org(HK): Hooks compiled

That looks fine to me.

More Observations:
I have also observed that killing the supervisor and restarting it sometimes leads to:

vault.default@vault_org(SR): Waiting for service binds...
vault.default@vault_org(SR): The specified service group 'consul.default' for binding 'backend' is present in the census, but currently has no active members.
vault.default@vault_org(SR): Waiting for service binds...

When this message appears, I kill and restart the supervisor again. Sometimes that produces the same message; other times the node simply joins the service group and proceeds with the hooks.

I have also noticed that with an incorrect key, the node sometimes joins the cluster and sometimes fails to.

With the correct key, the node likewise sometimes joins the cluster and sometimes fails to.

Even when the key is not present on the node at all, it sometimes joins the cluster and sometimes gives the message above.

What does your hab config apply command look like? As I go through the code it looks like part of the issue may be that we haven’t properly documented how the <SERVICE_GROUP> argument should be formatted. Based on this code, it needs to contain an org. Are you calling it like hab config apply vault.default@vault_org …?
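A minimal sketch of the service group format this relies on, assuming it is <service>.<group> optionally followed by @<org>, as in the strings used in this thread (vault.default@vault_org, foo.default@foo_org). parse_service_group and require_org are hypothetical helpers for illustration only, not Habitat's own parsing code. The service key files in this thread are named after the full service.group@org, which is why a bare vault.dev2 gives nothing to encrypt for.

```python
import re
from typing import NamedTuple, Optional


class ServiceGroup(NamedTuple):
    service: str
    group: str
    org: Optional[str]


# Hypothetical format, inferred from this thread: <service>.<group>[@<org>]
_SERVICE_GROUP = re.compile(r"^(?P<service>[^.@]+)\.(?P<group>[^@]+)(?:@(?P<org>.+))?$")


def parse_service_group(text: str) -> ServiceGroup:
    """Split a string such as 'vault.default@vault_org' into its parts."""
    m = _SERVICE_GROUP.match(text)
    if m is None:
        raise ValueError(f"not a service group: {text!r}")
    return ServiceGroup(m["service"], m["group"], m["org"])


def require_org(text: str) -> ServiceGroup:
    """Reject service groups without @org, since service key names include the org."""
    sg = parse_service_group(text)
    if sg.org is None:
        raise ValueError(f"{text!r} has no @org; use e.g. 'vault.dev2@vault_org'")
    return sg
```

With this sketch, require_org("vault.dev2") raises ValueError, while require_org("vault.dev2@vault_org") returns the org vault_org.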

I’m continuing to investigate as there may be other issues further down the line.

The other main issue I see is that either HAB_USER must be set in the environment or the --user param must be passed to the hab config apply command. If the org parsing is working but not the user key, you should get an error like:

✗✗✗
✗✗✗ Crypto error: No revisions found for jbauman box key
✗✗✗
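The error names the key it was looking for. A rough sketch of that kind of lookup, assuming key files are named <name>-<14-digit revision><suffix>, as in cache entries like vault.dev2@vault_org-20180824181015.box.key. latest_revision is a hypothetical helper, not Habitat's implementation; it returns None when no revision of the named key exists, which is the situation a "No revisions found" message describes.

```python
import re
from typing import Iterable, Optional


def latest_revision(filenames: Iterable[str], name: str, suffix: str) -> Optional[str]:
    """Return '<name>-<revision>' for the newest matching key file, or None.

    Assumes files are named '<name>-<YYYYMMDDhhmmss><suffix>', e.g.
    'vault.dev2@vault_org-20180824181015.box.key'.
    """
    pattern = re.compile(re.escape(name) + r"-(\d{14})" + re.escape(suffix) + r"$")
    revisions = []
    for filename in filenames:
        m = pattern.match(filename)
        if m:
            revisions.append(m.group(1))
    if not revisions:
        return None  # analogous to "No revisions found for <name> box key"
    # Fixed-width timestamps sort correctly as plain strings.
    return f"{name}-{max(revisions)}"
```

For example, calling latest_revision(os.listdir("/hab/cache/keys"), "jbauman", ".box.key") on a machine without that user's box key would return None.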

I was able to reproduce this locally. A successful run should look more or less like:

➤ hab config apply -u chris foo.default@foo_org (date +%s)
☛ Encrypting TOML as chris-20180823211349 for foo.default@foo_org-20180823171247
» Setting new configuration version 1535059000 for foo.default@foo_org
Ω Creating service configuration
↑ Applying via peer 127.0.0.1:9632
★ Applied configuration
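The two requirements discussed above (an org in the service group, and a user supplied via --user or HAB_USER) can be checked before calling hab. This Python sketch only builds the argument list; config_apply_argv is a hypothetical helper, and it always passes --user explicitly rather than relying on hab to read HAB_USER itself.

```python
import os
from typing import List, Mapping, Optional


def config_apply_argv(
    service_group: str,
    version: int,
    file: Optional[str] = None,
    user: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Build a 'hab config apply' argument list (hypothetical helper).

    Checks that the service group carries an @org and that a user is
    known, either passed in directly or taken from HAB_USER.
    """
    if "@" not in service_group:
        raise ValueError(f"{service_group!r} needs an @org, e.g. 'vault.dev2@vault_org'")
    user = user or env.get("HAB_USER")
    if not user:
        raise ValueError("pass user=... or set HAB_USER")
    argv = ["hab", "config", "apply", "--user", user, service_group, str(version)]
    if file is not None:
        argv.append(file)  # without FILE, hab reads the TOML from stdin
    return argv
```

To actually run it, use subprocess.run(argv, check=True). If file is omitted, supply the TOML with subprocess.run(argv, input=toml_bytes, check=True) so the command does not wait on the terminal.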

I've opened some PRs that should improve the situation going forward.

Here are my keys:

ito002664:~ # ls /hab/cache/keys/
aos-20180606172220.pub                        coreapps-20180612154548.pub
chef-20160614114050.pub                       monitoring-20180724181407.pub
consul.dev2@vault_org-20180824180611.box.key  sacm-20180808172318.pub
consul.dev2@vault_org-20180824180611.pub      vault-20180712190018.pub
core-20160810182414.pub                       vault.dev2@vault_org-20180824181015.box.key
core-20180119235000.pub                       vault.dev2@vault_org-20180824181015.pub

The way I ran it was hab config apply vault.dev2 $(date +%s) /hab/pkgs/vault/vault/0.9.5/20180803013037/default.toml, so I never passed the user parameter. Now that I know about the user parameter I have passed it, but the command does not respond at all. It just sits there, stuck:

ito002664:~ # hab config apply -u vault.dev2 vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault vault.dev2@vault_org $(date +%s)

^C
ito002664:~ #
ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.pub vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.box.key vault.dev2@vault_org $(date +%s)

^C
ito002664:~ 

But the census still shows a new service group with all the nodes.

I didn't pass the user (-u) parameter before, but now that I pass it, the command gets stuck:

ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.pub vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.box.key vault.dev2@vault_org $(date +%s)

^C
ito002664:~ # hab config apply -u vault.dev2 vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault vault.dev2@vault_org $(date +%s)

^C
ito002664:~ #

It may appear to be getting stuck, but I think it’s waiting for input on stdin since no file was provided:

USAGE:
    hab config apply [OPTIONS] <SERVICE_GROUP> <VERSION_NUMBER> [FILE]
…
    <FILE>              Path to local file on disk (ex: /tmp/config.toml, default: <stdin>)
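That would explain the apparent hang: with no FILE argument, the command reads stdin until end-of-file. A hypothetical Python helper, read_config, mirrors the documented default (it is not hab's code): it reads the given path if there is one, otherwise it reads stdin, and on an interactive terminal that read blocks until Ctrl-D.

```python
import sys
from typing import Optional, TextIO


def read_config(path: Optional[str] = None, stdin: TextIO = sys.stdin) -> str:
    """Read TOML from path if given, otherwise from stdin (hypothetical helper).

    Mirrors the documented [FILE] default of <stdin>. When stdin is an
    interactive terminal, stdin.read() blocks until end-of-file (Ctrl-D),
    so the caller appears to hang.
    """
    if path is not None:
        with open(path, encoding="utf-8") as f:
            return f.read()
    return stdin.read()
```

Passing the file path, as in the next command below, avoids the stdin read entirely.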

I then ran:

ito002664:~ # hab config apply -u umarla vault.dev2@vault_org $(date +%s) /hab/pkgs/vault/vault/0.9.5/20180803013037/default.toml
☛ Encrypting TOML as umarla-20180824183707 for vault.dev2@vault_org-20180824181015
» Setting new configuration version 1535135955 for vault.dev2@vault_org
Ω Creating service configuration
↑ Applying via peer 127.0.0.1:9632
★ Applied configuration

I'm now checking whether other nodes can join this service group.

I was able to encrypt the leader node's configuration with a newly generated user key. So there are now two keys: a service key and a user key. I used the service key to start the supervisor on the leader and then ran hab config apply with the user key. The other nodes do NOT have the user key, but the service public key is present on them. I started the supervisor on those nodes and the hooks ran fine, and the census page also looks normal. So my questions are:

  1. What does hab config apply -u user servicename (date) encrypt?
  2. Why are the supervisors not only joining the service group when no user key is present, but also running the hooks? That is, why are they able to decrypt config.toml, retrieve data, and run hooks at all?
  3. What is the service key encrypting?

I also removed the service keys from the other nodes, restarted the supervisor, and loaded the package. It loaded fine and the hooks ran.

It encrypts the contents of the configuration update (read either from a file or from stdin). You can see it's the buf referenced here.

What makes you think they shouldn't be able to join the service group? I don't see anything in the docs about service group encryption that governs what supervisors can join the group. Perhaps you want something like wire encryption. What are your needs?

Both the user key and the service key are used to encrypt the config buffer, such that it can only be decrypted by the holder of the service key pair's private key. See the BoxKeyPair::encrypt that the previously linked code is calling, and more detail in the underlying crypto library:

/// The `seal()` function encrypts a message `m` for a recipient whose public key
/// is `pk`. It returns the ciphertext whose length is `SEALBYTES + m.len()`.
///
/// The function creates a new key pair for each message, and attaches the public
/// key to the ciphertext. The secret key is overwritten and is not accessible
/// after this function returns.

According to the documentation, wire encryption only encrypts the traffic between supervisors.
My needs are:

  1. I don't want any newly created supervisor to join the service group without some form of authentication. Otherwise my Vault cluster is at risk: anyone can create a node, deploy the Vault package, join the cluster, and obtain access.
  2. Even if they are able to join, they should not be able to retrieve any data unless they have a key or otherwise authenticate.

I was under the impression that service group encryption solves this problem. If it does not, is there a way I can accomplish the above, and if so, how?


I followed the new steps added to the documentation. The fourth step of "Generating a Ring Key" in the Wire Encryption section says:

The Supervisor becomes part of the named ring <RING> and uses the key for network encryption. Other supervisors that now attempt to connect to it without presenting the correct ring key will be rejected.

I created a VM that will be the first VM the other VMs peer to. I installed the consul package and ran the following:

ito002380:~ # hab ring key generate consulring
» Generating ring key for consulring
★ Generated ring key pair consulring-20180830135404.
ito002380:~ # nohup sudo -E hab sup run --ring consulring --topology leader --strategy rolling --channel stable --url http://{realfqdn}.fhc.ford.com/ &
[1] 20989
ito002380:~ # nohup: ignoring input and appending output to 'nohup.out'

ito002380:~ # tail -f nohup.out
hab-sup(MR): Supervisor Member-ID 652e19b11715438aa1cda8447be57efc
hab-sup(MR): Starting gossip-listener on 0.0.0.0:9638
hab-sup(MR): Starting ctl-gateway on 127.0.0.1:9632
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
ERROR 2018-08-30T13:56:00Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:01Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:01Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:03Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:04Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:04Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))

What does "Secret key and nonce could not decrypt ciphertext" mean? What am I doing wrong?

I am running hab 0.59.0/20180712155441
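One likely reading of this error, assuming the other VMs are still running supervisors without the consulring key (or with a separately generated one): the ring key is a shared symmetric secret, so gossip sealed with a different key, or sent without one, fails to authenticate on this VM and is rejected. The sketch below is only an analogy. It uses HMAC-SHA256 from Python's standard library in place of the libsodium encryption Habitat actually uses, and tag_message and verify_message are hypothetical names.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def tag_message(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag (analogy only; Habitat uses libsodium)."""
    return message + hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, blob: bytes) -> bytes:
    """Return the message if its tag checks out under key, else raise."""
    message, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        # Roughly where the supervisor logs "could not decrypt ciphertext".
        raise ValueError("key could not authenticate message")
    return message


ring_key = os.urandom(32)   # stands in for consulring-20180830135404 on this VM
other_key = os.urandom(32)  # a peer that generated its own ring key
```

Here verify_message(ring_key, tag_message(other_key, b"gossip")) raises, just as untagged data does. If that reading is right, every peer needs the same ring key before it starts with --ring consulring, for example by exporting it on this VM with hab ring key export consulring and importing it on the others with hab ring key import.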