Sensitive Data w/ Solo

I’m curious how others are handling sensitive information (passwords,
ssh keys, etc.) that may be part of a Chef cookbook. Keeping the
repository private is one option; however, that requires either
generating an ssh key pair and providing the public key to the git
repo’s ssh server, or using a shared set of ssh keys that is used only
for accessing the git repository. Can anyone provide some insight into
the best practices for this?

I asked this question earlier today on IRC. kallistec suggested using
data bags, but I’m using chef-solo, so that is out of the question.

Best regards,
Michael Guterl

I keep the repository private and use gitosis to manage all the
committers' public keys.

-lee

--


Lee Azzarello
drop.io staff hacker

Ohai, Michael.

On Jun 17, 2010, at 12:25 PM, Michael Guterl wrote:

I'm curious how others are handling sensitive information (passwords,
ssh keys, etc) that may be part of a chef cookbook

For now, I have been keeping such data in a 'secrets' directory that is located within my Chef operations repo, but is not actually part of the repo (using gitignore). Most of the secrets are contained in a JSON file, as Chef attributes, but there are also some flat files (the current Chef server validation.pem, Apache SSL certs, etc.), which are used by other software systems (like knife) in addition to chef-client.

When I run 'rake roles', all of the JSON data gets read into the override_attributes of a special 'deployment-secrets' role by custom Ruby code in the role file -- something like this:

    name "deployment-secrets"
    description "Credentials and other sensitive attributes"

    # The JSON secrets are in a 'chef' subdirectory of the secrets folder
    OPS_DIR = File.expand_path(File.join(File.dirname(__FILE__), ".."))
    SEC_DIR = File.join(OPS_DIR, "secrets")
    SECRETS_FILE = File.join(SEC_DIR, "chef", "chef-secrets.json")

    # Load secrets from JSON data if present
    secrets = Hash.new
    if File.exist? SECRETS_FILE
      Chef::Log.debug "Loading template-specific secrets from #{SECRETS_FILE}..."
      secrets = JSON.parse(File.read(SECRETS_FILE))
    end

    # Load some data from a flat file and encode it as JSON
    # Merge it into the 'secrets' hash
    ...

    # Use the secrets hash as the override attributes of this role
    override_attributes secrets if not secrets.empty?

You can also copy flat files from the secrets directory into cookbooks at this stage:

    puts "Copying SSL certificates into company cookbook..."
    RSYNC = "rsync --exclude '.*' --copy-links -vprt"
    COMPANY_SSL_RECIPE = "#{OPS_DIR}/site-cookbooks/my_company"
    rsync_cmd = "#{RSYNC} #{SEC_DIR}/ssl/ #{COMPANY_SSL_RECIPE}/files/default/ssl/"
    # Run the rsync command in a subshell
    `#{rsync_cmd}`

The benefits of this system are:

  1. Operational secrets are stored separately from the main configuration repo in a DRY way. The usual 'rake install' process takes care of inserting all secrets into the regular workflow, for both chef-client and chef-solo.
  2. Different sets of operational secrets (for developers, testers, and production ops personnel) can be maintained for different environments, and symlinked into place as desired.
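
The symlinking in point 2 could look roughly like the sketch below. The directory layout (one subdirectory per environment under a `secret_sets` folder) and the helper name `use_secrets` are assumptions made for illustration, not something taken from the setup described above.

```ruby
# Hypothetical layout: <sets_dir>/<environment>/ holds one complete set of
# secrets, and `link` (e.g. the 'secrets' directory inside the ops repo) is a
# symlink pointing at whichever set is currently active.
def use_secrets(sets_dir, link, environment)
  target = File.join(sets_dir, environment)
  raise ArgumentError, "no secrets set for #{environment}" unless File.directory?(target)

  # Remove only an existing symlink; a real directory at `link` is left alone
  # and File.symlink will then raise instead of overwriting it.
  File.unlink(link) if File.symlink?(link)
  File.symlink(target, link)
  target
end
```

Calling `use_secrets("secret_sets", "secrets", "production")` before running the rake tasks would then make the role file pick up the production set, since it only ever reads through the `secrets` path.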

The drawbacks are:

  1. Storing secrets as attributes is not so secure in the first place; and
  2. All secrets are stored on all nodes, regardless of whether they're needed on that particular node. This makes this system even less secure.
  3. More I haven't thought of...?

Hope this helps,

  • R. Newbie

I'm not using this too much yet, but I like storing hashed passwords
and public keys. Neither is super sensitive (the hashed passwords are
more so, but not so sensitive that I'd be totally against keeping them
in a repo on my private network). To generate password hashes that work:

    openssl passwd -1

(got that from here:
http://github.com/37signals/37s_cookbooks/blob/master/users/attributes/default.rb)

The one thing I haven't figured out is how to store a hashed password
for mysql :)

On Thu, Jun 17, 2010 at 12:59 PM, Ruby Newbie rubynewbie@me.com wrote:

Ohai, Michael.

On Jun 17, 2010, at 12:25 PM, Michael Guterl wrote:

I'm curious how others are handling sensitive information (passwords,
ssh keys, etc) that may be part of a chef cookbook
For now, I have been keeping such data in a 'secrets' directory that is located within my Chef operations repo, but is not actually part of the repo (using gitignore). Most of the secrets are contained in a JSON file, as Chef attributes, but there are also some flat files (the current Chef server validation.pem, Apache SSL certs, etc.), which are used by other software systems (like knife) in addition to chef-client.

I'm still relatively new to Chef and I may be missing something
obvious. If you don't keep the secrets directory as part of the repo,
how do you get the secrets to the server that you're provisioning?

Best,
Michael Guterl

On Thu, Jun 17, 2010 at 12:42 PM, Lee Azzarello lee@dropio.com wrote:

I keep the repository private and use gitosis to manage all the
committers' public keys.

-lee

How do you provide the machine you're provisioning with access to gitosis?

Best,
Michael Guterl

Given the variety of replies that I have received, I realize I may
not have explained myself as well as I could have. I appreciate all
of the replies I have received thus far, although I still don't know
if any of them will help my particular case.

I have created a gist with the bootstrap.sh script I'm using:
gist:557d9694a9606b9edbeb

In order to gain access to the private repository, I have to either use
a set of shared SSH keys that GitHub is already set up with, or
generate a new set of SSH keys and somehow let GitHub know about
the new public key.

I think a set of shared keys is probably the easiest approach;
however, storing the text of the keys in the bootstrap script seems
wrong.

Is anyone dealing with anything similar?

Best,
Michael Guterl

Hey, Michael.

On Jun 18, 2010, at 2:48 PM, Michael Guterl wrote:

I'm still relatively new to Chef and I may be missing something
obvious. If you don't keep the secrets directory as part of the repo,
how do you get the secrets to the server that you're provisioning?

Sorry, my first response wasn't clear. I guess that the whole discussion around where to store secret data is motivated by security concerns, so instead of asking whether the secrets are "part of" the configuration repo, perhaps we should address:

  1. Whether the secret data is stored in the same source-code-management repository as the rest of the config; and
  2. Whether the secrets are available to people who need to configure new nodes (whether using chef-client, or chef-solo).

Presumably we all would like to make sure that the answer to the first question is "no". Both of the mechanisms described so far in response to your inquiry (merging the secrets into a special role at launch time, and storing them in some external server, like Chef's own data bags) satisfy this condition.

However, I can't think of an easy way to satisfy the second condition. Any person in your organization who requires the capability to configure nodes is ultimately going to require SOME level of access to whatever secrets those nodes need. Even if the secrets are stored in some Kerberos-like, time-limited key server, at some point the data needed to authenticate to THAT needs to be generated and given to the node.

So the important thing for my team was just to make sure that secrets are properly compartmentalized, so that developers, testers, and production ops personnel can all use the same shared config without overly compromising security. When production nodes are launched, they pull the secrets from the special-purpose role stored in the Chef server, but when I want to try out a node configuration using Chef solo, I have a rake task that does the following:

  1. Uses a single rsync command to push both the Chef configuration repo and the secrets (that are symlinked beneath the config repo, but again, not stored in Git).
  2. Opens an SSH connection and runs chef-solo, passing the same JSON launch data as would have been used in client-server mode.
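
The two steps above can be sketched as a pair of command builders plus a small driver. This is only an illustration under stated assumptions: the helper names, the remote directory, and the `solo.rb` config file name are hypothetical, and the rsync flags are borrowed from the earlier role-file example rather than from the actual rake task.

```ruby
require 'shellwords'

# Build the rsync command that pushes the ops repo to the node.
# --copy-links follows the symlinked secrets directory, so the secrets travel
# with the repo even though they are not tracked in Git.
def push_command(ops_dir, host, remote_dir)
  ['rsync', '--exclude', '.*', '--copy-links', '-prt',
   "#{ops_dir}/", "#{host}:#{remote_dir}/"]
end

# Build the ssh command that runs chef-solo on the node with the JSON launch
# data. The solo.rb file name is an assumption.
def solo_command(host, remote_dir, json_file)
  remote = ['chef-solo', '-c', "#{remote_dir}/solo.rb", '-j', "#{remote_dir}/#{json_file}"]
  ['ssh', host, Shellwords.join(remote)]
end

# Run both steps in order, stopping at the first failure. The runner is
# injectable so the sequence can be exercised without touching a real node.
def try_node(ops_dir, host, remote_dir, json_file, runner = method(:system))
  [push_command(ops_dir, host, remote_dir),
   solo_command(host, remote_dir, json_file)].each do |cmd|
    runner.call(*cmd) or raise "command failed: #{Shellwords.join(cmd)}"
  end
end
```

Passing the commands to `system` as argument arrays avoids a local shell, so paths containing spaces need no extra quoting on the workstation side.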

Again, this method satisfies the first security-related condition, but not the second.

Hope this helps,

  • R. Newbie

Ohai, Michael.

On Jun 18, 2010, at 3:09 PM, Michael Guterl wrote:

I have created a gist with the bootstrap.sh script I'm using:
gist:557d9694a9606b9edbeb · GitHub

OK, now it's clear what you're trying to do -- thanks.

...
however, storing the text of the keys in the bootstrap script seems
wrong.

"Wrong" from a security standpoint, correct? In my opinion, storing the private key data in your bootstrap script is not so bad -- if the bootstrap script is not part of your generic disk image, and is instead copied over a secure channel to the node at boot time. In this case, you're really treating the whole bootstrap script as "secret data". Since the script is not that long or complex (and therefore presumably not too subject to rapid change), this may work OK for you.

That said, most of us would try to separate the secrets from the code that acts on them, for reasons mentioned earlier in this thread.

Is anyone dealing with anything similar?

Yes -- I think that everyone who makes substantial use of Chef must eventually confront the following issue, whether using chef-solo or full client-server...

Chef needs to handle the secret data that your applications use (regardless of when that data gets inserted into your workflow). So we all use some kind of "meta-secret" to protect that data. In the client-server setup, the validation.pem (or individual client.pem) represents the meta-secret. In your case, it's the private half of the SSH key pair that grants access to your GitHub repo. In either case, if this data is compromised, so is all the secret data that Chef works with to configure your nodes.

For this reason, the best practice is to NOT store this meta-secret (or indeed any secret data) on your generic disk image, but to instead pass the data to the node at boot time, and if possible, to remove it after convergence. That's why Chef's knife tool can generate Amazon EC2 launch data containing the "meta-secret". You can follow the same security model with chef-solo, if you make a rake task in your Chef repo that carries out the following steps:

  1. Use your cloud provider's API, or your virtualization host's API, or whatever else you need to boot a fresh system, and store its IP address.
  2. Use 'scp' (or Ruby's Net::SCP, or whatever) to transfer the private key data (or the entire bootstrap script, if you must) to the node.
  3. Use 'ssh' (or Net::SSH) to run your bootstrap script.
  4. Use 'ssh' to DELETE THE GITHUB PRIVATE KEY FROM THE NODE! (or do so from the bootstrap script)
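
Steps 2 through 4 could be wired together roughly as follows. This is a sketch under assumptions, not a finished rake task: the function name, the remote key path, and the remote script name are hypothetical, and booting the node (step 1) is left to whatever API you use.

```ruby
require 'shellwords'

# Copy the key and bootstrap script to the node, run the script, and always
# remove the key afterwards. The runner is injectable so the sequence can be
# checked without a real node.
def bootstrap_node(host, key_file, script, runner = method(:system))
  remote_key = '.ssh/github_deploy_key' # assumed location on the node
  run = lambda do |*cmd|
    runner.call(*cmd) or raise "command failed: #{Shellwords.join(cmd)}"
  end

  run.call('scp', '-p', key_file, "#{host}:#{remote_key}") # step 2
  run.call('scp', script, "#{host}:bootstrap.sh")
  begin
    run.call('ssh', host, 'sh bootstrap.sh')               # step 3
  ensure
    # Step 4: delete the key even when the bootstrap script failed.
    runner.call('ssh', host, "rm -f #{Shellwords.escape(remote_key)}")
  end
end
```

Putting the deletion in an `ensure` block means a failed convergence does not leave the private key behind on a half-configured node.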

Make sense?
-RN

Hi,
I was thinking about putting my ssh keys in the disk image. It seems
the objection in terms of security is that someone else could use
those ssh keys and pretend to be one of your nodes. A login to your
central security server would then allow an outsider to get your
other secret information (namely the MySQL db password).

So is it expected that the central security server also needs some
DNS-based (or IP-address-based) method to tell the true nodes from the
fake ones?

Is it any more secure to be initially logging into each node with a
capistrano script (multiple-hosts, password-based ssh), so to upload
the ssh keys? Then people couldn't get them from your master disk
image. Of course they (id_rsa.pub) would still be chmod 700 in ~/.ssh
directory on the nodes.

Perhaps a more custom scheme is preferred, as such schemes are harder
to break when critical security elements aren't in the standard places
where people usually expect them?

Of course, including an obfuscation approach also by definition makes
such a scheme harder for us to understand. But tell me, can you think
of a more secure scheme whereby a central server must hold all of the
sensitive password data?

It strikes me also that in many traditional setups, there might be a
dedicated set of DB servers, a dedicated set of web servers, an LDAP
server, etc. In that case the passwords are role-specific, and a
server which belongs to one 'role' won't need to know the passwords
that exist for other roles and classes of service-provisioning
machines.

I guess that means if you do use a private git repository, then you
should delete that repository data locally after provisioning the node
with the passwords for its role(s)? Otherwise any node of one role is
going to know the secrets for all of the other roles. That matters if
you think someone might break into the weakest role type among your
nodes. By their very nature, certain server roles will present more of
a security risk and be more open to attackers than others.
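
One way to picture the role-specific split is to key the secrets by role and hand each node only the entries for the roles it actually has. The sketch below assumes such a role-keyed hash; the role names, attribute names, and the `secrets_for` helper are all hypothetical, and the values are placeholders.

```ruby
# Hypothetical secrets layout, keyed by role name. Values are placeholders.
EXAMPLE_SECRETS = {
  'web' => { 'ssl_key_path' => 'placeholder' },
  'db'  => { 'mysql_root_password' => 'placeholder' }
}.freeze

# Return only the secrets belonging to the given roles. Roles with no
# entry contribute nothing, so an unknown role yields an empty hash.
def secrets_for(all_secrets, roles)
  roles.each_with_object({}) do |role, picked|
    picked.merge!(all_secrets.fetch(role, {}))
  end
end
```

With this shape, `secrets_for(EXAMPLE_SECRETS, ['web'])` never includes the database password, so a compromised web node exposes only the web role's secrets.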

On Fri, Jun 18, 2010 at 8:05 PM, Ruby Newbie rubynewbie@me.com wrote:

Ohai, Michael.

On Jun 18, 2010, at 3:09 PM, Michael Guterl wrote:

I have created a gist with the bootstrap.sh script I'm using:
gist:557d9694a9606b9edbeb · GitHub
OK, now it's clear what you're trying to do -- thanks.

...
however, storing the text of the keys in the bootstrap script seems
wrong.
"Wrong" from a security standpoint, correct? In my opinion, storing the private key data in your bootstrap script is not so bad -- if the bootstrap script is not part of your generic disk image, and is instead copied over a secure channel to the node at boot time. In this case, you're really treating the whole bootstrap script as "secret data". Since the script is not that long or complex (and therefore presumably not too subject to rapid change), this may work OK for you.

You make a very good point here. I guess my biggest worry was sharing
the SSH keys across multiple machines. Since the key will only be
used for access to the repository, this probably isn't as big of a
deal.

That said, most of us would try to separate the secrets from the code that acts on them, for reasons mentioned earlier in this thread.

This makes sense; I'm still thinking through the solution you proposed
in your two earlier replies.

Is anyone dealing with anything similar?
Yes -- I think that everyone who makes substantial use of Chef must eventually confront the following issue, whether using chef solo or full client-server...

Chef needs to handle the secret data that your applications use (regardless of when that data gets inserted into your workflow). So we all use some kind of "meta-secret" to protect that data. In the client-server setup, the validation.pem (or individual client.pem) represents the meta-secret. In your case, it's the private half of the SSH key pair that grants access to your GitHub repo. In either case, if this data is compromised, so is all the secret data that Chef works with to configure your nodes.

This was perhaps my biggest concern from a security standpoint;
gaining access to our chef repository could open up access to our
entire infrastructure. My knowledge of security is not very strong
and prior to getting too deep into Chef, I wanted to make sure I
wasn't making any obvious blunders with securing our installation.

For this reason, the best practice is to NOT store this meta-secret (or indeed any secret data) on your generic disk image, but to instead pass the data to the node at boot time, and if possible, to remove it after convergence. That's why Chef's knife tool can generate Amazon EC2 launch data containing the "meta-secret". You can follow the same security model with chef-solo, if you make a rake task in your Chef repo that carries out the following steps:

  1. Use your cloud provider's API, or your virtualization host's API, or whatever else you need to boot a fresh system, and store its IP address.
  2. Use 'scp' (or Ruby's Net::SCP, or whatever) to transfer the private key data (or the entire bootstrap script, if you must) to the node.
  3. Use 'ssh' (or Net::SSH) to run your bootstrap script.
  4. Use 'ssh' to DELETE THE GITHUB PRIVATE KEY FROM THE NODE! (or do so from the bootstrap script)

Make sense?

I can't thank you enough for the insight and advice you have provided;
it is VERY much appreciated.

Best regards,
Michael Guterl