Database cookbook suggestion: Shell out for peer authentication?

Hello list,

I’ve been adding a postgresql recipe for the rsyslog cookbook to
provide out-of-the-box PostgreSQL backend support. It’s working well
but something has been bothering me.

MySQL users have traditionally used password authentication but
PostgreSQL has put equal emphasis on peer authentication, where the
user is given access without a password on the basis that they are
already connected via the operating system user of the same name.
Obviously this only works for local socket connections.

I have always liked this concept. Why waste time and effort trying to
hide a password when one isn’t even necessary? If an intruder has
already gained shell access to the account then it’s probably game
over anyway.

Unfortunately the database cookbook doesn’t cater for this concept at
all. It connects using the pg gem, meaning that the operating system
user is (almost?) always root. I’ve been thinking about whether it
would be possible for the “db” method to shell out to psql via su,
allowing peer authentication from any local user. This has a few
benefits.

Obviously it negates the need for many users to have a password. This
also applies to the postgres user, which currently has a password
randomly generated by default, and I gather there are problems
surrounding this.

It would also mean that the installation of “build-essential” could
potentially be avoided. This consumes substantial time and bandwidth
and I was rather disappointed to see it being installed on what I had
hoped would be rather minimal boxes.

MySQL could also reap these benefits. It’s probably not widely used but
it has supported peer authentication since 5.5.10.

You may be thinking that I could just do all my recipe database
operations as the postgres user with a password and allow other users
to connect using peer authentication from their respective applications.
This generally doesn’t work because any tables, sequences, and other
such things that get created are then owned by the postgres user.
Granting any other user access to these is very tedious, short of
making them a superuser. The most practical solution is for the end
user to create these objects in the first place.

There are certain things to be aware of. psql should work in all cases
but su might not. What if the server is remote? Connecting to a socket
requires passing a path so one could just check to see whether "host"
starts with a slash. What if chef-client is not being run as root? I’m
not sure if that’s a supported use case but I have heard of it. In that
case, passwords will be necessary so simply don’t use su. And what
about operating system support? The docs say that peer authentication
should work on Linux, most flavors of BSD including Mac OS X, and
Solaris. Like the non-root case, Windows could just connect without su.
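
The decision logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the database cookbook's actual code: the helper name psql_argv and its parameters are hypothetical, and it assumes that "su USER -c COMMAND" is accepted by the local su implementation.

```ruby
require 'shellwords'

# Hypothetical helper (not part of the database cookbook): build the argv
# for running one SQL statement through psql, wrapping it in su only when
# peer authentication can apply.
def psql_argv(sql, database:, user:, host: nil,
              running_as_root: Process.uid.zero?, windows: false)
  # -X skips ~/.psqlrc; ON_ERROR_STOP makes psql exit non-zero on SQL errors.
  psql = ['psql', '-X', '-v', 'ON_ERROR_STOP=1', '-d', database, '-c', sql]
  psql += ['-h', host] if host

  # A host that is unset or starts with a slash means a local socket.
  local_socket = host.nil? || host.start_with?('/')

  if local_socket && running_as_root && !windows
    # Peer authentication: become the OS user of the same name.
    ['su', user, '-c', Shellwords.join(psql)]
  else
    # Remote server, non-root chef-client, or Windows: connect directly,
    # in which case a password will be needed.
    psql + ['-U', user]
  end
end
```

The returned array could then be handed to something like system(*argv). Returning an array rather than a single string keeps the quoting of the SQL in one place.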

Any drawbacks? The pg gem provides nice exceptions when things go wrong
but I don’t think Chef has a mechanism to intelligently react to these
so there’s probably no big loss there. Some may think that shelling out
is just nasty and I would usually agree but Chef does plenty of it
already and I think the benefits far outweigh this concern. I honestly
can’t think of anything else.

What do you think?

Regards,
James

Is there a way to obtain the meta data of a data bag object? For example, a
way to get the time stamp of the last update or a check sum of the object.

I have hundreds of nodes that will be processing the elements of the data
bag, which could be hundreds, and I want a way to filter out those which
have already been processed. If the data bag item had been changed since
the last run, I want to make sure it is processed.

JOHN HASTY
Software as a Service - DevOps
Software Group

Phone: 1-512-804-9968 IBM
E-mail: jahasty@us.ibm.com
2407 S Congress Ave Ste E-350
Austin, TX 78704
United States

I don't think so. In fact, we have had a few discussions on the mailing list
about pain points where data bags were used with an expectation of versions,
checksums, or transactions.

On Tue, Jun 17, 2014 at 3:02 PM, JOHN HASTY jahasty@us.ibm.com wrote:

Is there a way to obtain the meta data of a data bag object? For example,
a way to get the time stamp of the last update or a check sum of the object.

I have hundreds of nodes that will be processing the elements of the data
bag, which could be hundreds, and I want a way to filter out those which
have already been processed. If the data bag item had been changed since
the last run, I want to make sure it is processed.


On Jun 17, 2014, at 2:46 PM, James Le Cuirot chewi@aura-online.co.uk wrote:

Any drawbacks? The pg gem provides nice exceptions when things go wrong
but I don't think Chef has a mechanism to intelligently react to these
so there's probably no big loss there. Some may think that shelling out
is just nasty and I would usually agree but Chef does plenty of it
already and I think the benefits far outweigh this concern. I honestly
can't think of anything else.

The primary drawback is that unless your application is a toy or experiment, you will have more than one server (usually at least 4, 2x web 2x DB for reliability). In this situation peer auth can't be used and because it is by far the more common case, complicating the code to support peer auth for single-server situations just isn't worthwhile.

--Noah

I don't believe the API has a way to do this. However, if your main pain
point is the time it takes to process the data bag (rather than the time it
takes to retrieve it), you can get the checksum of a string like this:

require 'digest/md5'
Digest::MD5.hexdigest(str)

In that case, you could download the data bag items, checksum them, and
compare with the last time you processed.
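
A minimal sketch of that comparison, assuming each data bag item is available as a plain Ruby hash (in a recipe these would presumably come from data_bag_item) and that the checksums from the previous run are kept somewhere of your choosing, such as a local file. The helper names canonical, item_checksum and changed_items are made up for illustration.

```ruby
require 'digest/md5'
require 'json'

# Recursively sort hash keys so the same content always serialises the
# same way, regardless of key order.
def canonical(value)
  case value
  when Hash
    value.sort_by { |k, _| k.to_s }.map { |k, v| [k.to_s, canonical(v)] }.to_h
  when Array
    value.map { |v| canonical(v) }
  else
    value
  end
end

# Checksum of one item's content.
def item_checksum(item)
  Digest::MD5.hexdigest(JSON.generate(canonical(item)))
end

# items: { item_id => hash }
# seen:  { item_id => checksum recorded on the previous run }
# Returns only the items that are new or whose content has changed.
def changed_items(items, seen)
  items.reject { |id, item| seen[id] == item_checksum(item) }
end
```

After processing, the new checksums would be written back to wherever the previous ones were stored. Note that this still downloads every item on each run; it only saves the processing time.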

I know that there are upcoming APIs that will probably help with this.

--John


Chef does not currently expose this information through the API. You can access some of it directly via Postgres (I think mtime is tracked, but not the username of the last editor). This will eventually be available in Enterprise Chef as part of the reporting and audit trail features, but no ETA has been announced that I know of.

--Noah

On Jun 17, 2014, at 3:07 PM, Ranjib Dey dey.ranjib@gmail.com wrote:

I don't think so. In fact, we have had a few discussions on the mailing list about pain points where data bags were used with an expectation of versions, checksums, or transactions.


On Tue, 17 Jun 2014 15:08:28 -0700
Noah Kantrowitz noah@coderanger.net wrote:

On Jun 17, 2014, at 2:46 PM, James Le Cuirot
chewi@aura-online.co.uk wrote:

MySQL users have traditionally used password authentication but
PostgreSQL has put equal emphasis on peer authentication, where the
user is given access without a password on the basis that they are
already connected via the operating system user of the same name.
Obviously this only works for local socket connections.

I have always liked this concept. Why waste time and effort trying
to hide a password when one isn't even necessary? If an intruder has
already gained shell access to the account then it's probably game
over anyway.

Unfortunately the database cookbook doesn't cater for this concept
at all. It connects using the pg gem, meaning that the operating
system user is (almost?) always root. I've been thinking about
whether it would be possible for the "db" method to shell out to
psql via su, allowing peer authentication from any local user.

The primary drawback is that unless your application is a toy or
experiment, you will have more than one server (usually at least 4,
2x web 2x DB for reliability). In this situation peer auth can't be
used and because it is by far the more common case, complicating the
code to support peer auth for single-server situations just isn't
worthwhile.

I feel that's a tad dismissive. My company might not be Google but
we've become successful enough to warrant the need for Chef. Although
we do have some dedicated database servers, we also have more
standalone systems with their own databases and we have clones of these
for failover because it works for us. I'm sure we're not alone in doing
this.

As for it complicating the code, I decided to rise to that challenge
and took a first stab at it. It works and there's actually less code
than there was before.

A few concerns arose during this work but no show-stoppers. I need to
ensure psql can be found as it's not always in the PATH. Using
a .pgpass file would be safer than setting PGPASSWORD in the
environment. Finally, I wanted to preserve the existing behaviour in
terms of transactions as best I could. Passing the SQL using the -c
option seems to be the best way to achieve this but I fear this might
hit the command line limit on some systems; probably not Linux but
possibly Windows. Unfortunately using stdin results in slightly
different behaviour.
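
To illustrate the .pgpass idea, here is a rough sketch rather than what my patch actually does. It writes a temporary password file in the documented hostname:port:database:username:password format, escaping colons and backslashes, and points psql at it through the PGPASSFILE environment variable. The helper names are hypothetical.

```ruby
require 'tempfile'

# Escape the two characters that are special in a password file entry.
def pgpass_escape(field)
  field.to_s.gsub(/[\\:]/) { |c| "\\#{c}" }
end

# Build one hostname:port:database:username:password line.
def pgpass_line(host:, port:, database:, user:, password:)
  [host, port, database, user, password].map { |f| pgpass_escape(f) }.join(':')
end

# Write the entry to a temporary file (Tempfile.create uses mode 0600),
# yield an environment hash pointing at it, and remove the file afterwards.
def with_pgpass(**entry)
  Tempfile.create('pgpass') do |f|
    f.puts(pgpass_line(**entry))
    f.flush
    yield({ 'PGPASSFILE' => f.path })
  end
end
```

Inside the block, the yielded hash could be passed as the first argument to system or Process.spawn so that only the psql child process sees it, which avoids putting the password itself in the environment.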

Regards,
James