I am trying to use Service Group Encryption to encrypt my Vault cluster so that only new nodes or containers that have a key (the public file for a generated key) can join the service group. I have created 4 containers. In the first container I did the following:
hab svc key generate vault.default vault_org
Made sure the keys were generated in /hab/cache/keys. There is a box file and a public key file.
Started the supervisor using:
sudo -E hab sup run --org vault_org --group vault.default --bind backend:consul.default --peer leonardo --topology leader --strategy rolling --channel stable > /home/kitchen/nohup.out & echo $! > /tmp/run.pid
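The key check in the steps above can be scripted so the same test runs on every container. This is only a minimal sketch: `svc_key_status` is a hypothetical helper, not a hab command, and it assumes key files are named `<service.group>@<org>-<revision>.pub` and `<service.group>@<org>-<revision>.box.key`, which is the naming that appears later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of hab): report which service group key
# files exist in a key directory. Assumes the naming
#   <service.group>@<org>-<revision>.pub
#   <service.group>@<org>-<revision>.box.key
svc_key_status() {
    key_dir="$1"    # e.g. /hab/cache/keys
    svc_group="$2"  # e.g. vault.default
    org="$3"        # e.g. vault_org
    prefix="$key_dir/$svc_group@$org-"

    pub="missing"
    box="missing"
    # A glob that matches nothing stays literal, so test each candidate.
    for f in "$prefix"*.pub; do
        if [ -e "$f" ]; then pub="present"; fi
    done
    for f in "$prefix"*.box.key; do
        if [ -e "$f" ]; then box="present"; fi
    done
    echo "public=$pub box=$box"
}
```

Running `svc_key_status /hab/cache/keys vault.default vault_org` on each container shows at a glance which nodes actually hold which half of the key pair.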
The public key files are not present in the other containers. In all the other containers I started the supervisor with the same command:
sudo -E hab sup run --org vault_org --group vault.default --bind backend:consul.default --peer leonardo --topology leader --strategy rolling --channel stable > /home/kitchen/nohup.out & echo $! > /tmp/run.pid
I already had tail running on the nohup output in separate windows. Initially the log said there were no other active members, even though the bind was found in the census. After a few seconds the hooks compiled and Vault came up sealed. I unsealed each node manually so that my hooks would start to run; once unsealed, the hooks ran and the cluster came up fine.
I expected that nodes without the public key would not be able to join the cluster. Instead, they joined the cluster without the key, which is exactly what I thought service group encryption was supposed to prevent.
It also ran an election and chose a leader, as shown in the picture I attached above.
To confirm this, I wrote a secret in the first container before the steps above, and afterwards I could read the secret from every other container.
Am I doing something wrong? If so, what is it? Or is this a bug?
I haven’t yet experimented personally with the service group encryption feature, but based on the documentation, this requires “key-based authorization prior to allowing configuration changes”. Are you able to execute hab config apply commands within this service group without putting the user’s keys in the supervisors’ /hab/cache/keys directory?
It looks like we could use some more documentation about the intended way to use this feature. I’ll do some investigation and see what I can find.
I’m still digging through the code to see what’s up here. Can you include the output you see when you do the hab config apply? If things go right, I’d expect a status line that starts with ☛ Encrypting to be generated by this code.
hab-sup(MR): Setting new configuration version 1535052205 for vault.default
vault.default@vault_org(SR): The group 'consul.default' satisfies the `backend` bind
vault.default@vault_org(HK): Hooks compiled
hab-sup(MR): Setting new configuration version 1535052289 for vault.default
vault.default@vault_org(SR): The group 'consul.default' satisfies the `backend` bind
vault.default@vault_org(HK): Hooks compiled
More Observations:
I have also observed that sometimes killing the supervisor and restarting it leads to:
vault.default@vault_org(SR): Waiting for service binds...
vault.default@vault_org(SR): The specified service group 'consul.default' for binding 'backend' is present in the census, but currently has no active members.
vault.default@vault_org(SR): Waiting for service binds...
When this message appears, I kill the supervisor and start it again. Sometimes that leads to the same message, and sometimes the node just joins the service group and proceeds with the hooks.
I have also noticed that a node with an incorrect key sometimes joins the cluster and sometimes fails.
A node with the correct key likewise sometimes joins the cluster and sometimes fails to join.
And a node with no key at all sometimes joins the cluster and sometimes gives the above message.
What does your hab config apply command look like? As I go through the code it looks like part of the issue may be that we haven’t properly documented how the <SERVICE_GROUP> argument should be formatted. Based on this code, it needs to contain an org. Are you calling it like hab config apply vault.default@vault_org …?
I’m continuing to investigate as there may be other issues further down the line.
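To rule out the formatting issue mentioned above before calling hab config apply, the service group argument can be sanity-checked first. The sketch below is an assumption-laden illustration: `parse_service_group` is a hypothetical helper, not something hab provides, and it only checks for the `service.group@org` shape discussed in this reply.

```shell
#!/bin/sh
# Hypothetical helper (not part of hab): split "service.group@org" into
# its parts and fail when the org (or the group) is missing.
parse_service_group() {
    ident="$1"
    case "$ident" in
        ?*.?*@?*) ;;  # looks like service.group@org
        *) echo "expected service.group@org, got: $ident" >&2; return 1 ;;
    esac
    name_group="${ident%%@*}"  # text before the first '@'
    org="${ident#*@}"          # text after the first '@'
    echo "service=${name_group%%.*} group=${name_group#*.} org=$org"
}
```

For example, `parse_service_group vault.default@vault_org` prints the three parts, while `parse_service_group vault.default` returns a non-zero status because no org is present.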
The other main issue I see is that either HAB_USER must be set in the environment or the --user param must be passed to the hab config apply command. If the org parsing is working but not the user key, you should get an error like:
✗✗✗
✗✗✗ Crypto error: No revisions found for jbauman box key
✗✗✗
I was able to locally reproduce this. It should look more or less like:
➤ hab config apply -u chris foo.default@foo_org (date +%s)
☛ Encrypting TOML as chris-20180823211349 for foo.default@foo_org-20180823171247
» Setting new configuration version 1535059000 for foo.default@foo_org
Ω Creating service configuration
↑ Applying via peer 127.0.0.1:9632
★ Applied configuration
The way I ran hab config apply was hab config apply vault.dev2 $(date +%s) /hab/pkgs/vault/vault/0.9.5/20180803013037/default.toml, so I never passed the user parameter. Now that I know about the user parameter I have passed it, but the command does not respond at all. It just hangs:
ito002664:~ # hab config apply -u vault.dev2 vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault vault.dev2@vault_org $(date +%s)
^C
ito002664:~ #
ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.pub vault.dev2@vault_org $(date +%s)
^C
ito002664:~ # hab config apply -u vault.dev2@vault_org-20180824181015.box.key vault.dev2@vault_org $(date +%s)
^C
ito002664:~
But the census still shows a new service group with all the nodes.
I was able to encrypt on the leader node using a newly generated user key. So now there are two keys: one is a svc key and the other is a user key. I used the svc key to start the supervisor on the leader and then ran hab config apply with the user key. The other nodes do NOT have the user key, but the svc pub key is present. I started the supervisor on them and the hooks ran fine. The census page also shows everything as fine. So my questions are:
What does hab config apply -u user servicename (date) encrypt?
Why are the supervisors not only joining the service group when no user key is present, but also running the hooks? In other words, why are they even able to decrypt the config.toml, retrieve the data and run the hooks?
What is the svc key encrypting?
Also I just removed the svc keys from other nodes and reran the supervisor and loaded the package. They loaded fine and ran the hooks.
What makes you think they shouldn't be able to join the service group? I don't see anything in the docs about service group encryption that governs what supervisors can join the group. Perhaps you want something like wire encryption. What are your needs?
/// The `seal()` function encrypts a message `m` for a recipient whose public key
/// is `pk`. It returns the ciphertext whose length is `SEALBYTES + m.len()`.
///
/// The function creates a new key pair for each message, and attaches the public
/// key to the ciphertext. The secret key is overwritten and is not accessible
/// after this function returns.
The wire encryption, as per the documentation, only encrypts the traffic between supervisors.
My needs are:
I don't want any newly created supervisor to join the service group without authentication. Otherwise my Vault cluster is at risk, as anyone can create a node, deploy the vault package, join the cluster and obtain access.
Even if they are able to join, it would be nice to see that they cannot retrieve any data unless they have a key or authenticate.
I was of the opinion that the service group key encryption solves this problem. If it does not then is there a way I can accomplish the above? If yes then how?
I followed the new steps added in the documentation. The 4th step in Generating a Ring Key in Wire Encryption says
The Supervisor becomes part of the named ring <RING> and uses the key for network encryption. Other supervisors that now attempt to connect to it without presenting the correct ring key will be rejected.
I have created a VM. This VM is going to be the first VM that the other VMs will peer to. I installed the consul package and did this:
ito002380:~ # hab ring key generate consulring
» Generating ring key for consulring
★ Generated ring key pair consulring-20180830135404.
ito002380:~ # nohup sudo -E hab sup run --ring consulring --topology leader --strategy rolling --channel stable --url http://{realfqdn}.fhc.ford.com/ &
[1] 20989
ito002380:~ # nohup: ignoring input and appending output to 'nohup.out'
ito002380:~ # tail -f nohup.out
hab-sup(MR): Supervisor Member-ID 652e19b11715438aa1cda8447be57efc
hab-sup(MR): Starting gossip-listener on 0.0.0.0:9638
hab-sup(MR): Starting ctl-gateway on 127.0.0.1:9632
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
ERROR 2018-08-30T13:56:00Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:01Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:01Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:03Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:04Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:04Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:07Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
ERROR 2018-08-30T13:56:10Z: habitat_butterfly::server::inbound: Error parsing protobuf: HabitatCore(CryptoError("Secret key and nonce could not decrypt ciphertext"))
What does "Secret key and nonce could not decrypt ciphertext" mean? What am I doing wrong?
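While waiting for an answer, it may help to quantify how often the error recurs, for example to compare the log before and after stopping other supervisors. The sketch below does not explain the cause; `decrypt_error_summary` is a hypothetical helper, and it assumes the log line layout shown above, where the second whitespace-separated field is the timestamp.

```shell
#!/bin/sh
# Hypothetical helper: summarise decrypt errors in a Supervisor log file.
# Assumes lines of the form "ERROR <timestamp>: ... <message>" as pasted above.
decrypt_error_summary() {
    log_file="$1"
    pattern='Secret key and nonce could not decrypt ciphertext'
    # grep -c prints 0 but exits non-zero when nothing matches.
    total=$(grep -c "$pattern" "$log_file" || true)
    # Field 2 is the timestamp; count how many distinct seconds saw an error.
    seconds=$(grep "$pattern" "$log_file" | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
    echo "errors=$total distinct_seconds=$seconds"
}
```

Calling `decrypt_error_summary nohup.out` prints the total number of matching lines and the number of distinct seconds in which they occurred.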