CHEF-1621 - "recursive" attribute on "directory" resource does not apply users and groups to entire path


#1

Hi folks,

I was discussing this ticket with Dan DeLeo on IRC and we thought it
would be a good idea to get feedback from a larger audience.

Have a look at the example at http://tickets.opscode.com/browse/CHEF-1621.

The user was expecting:

777 ldm:ldm /data
777 ldm:ldm /data/realtime
777 ldm:ldm /data/realtime/fetched
777 ldm:ldm /data/realtime/fetched/radar

Here’s what chef did:

755 root:root /data
755 root:root /data/realtime
755 root:root /data/realtime/fetched
777 ldm:ldm /data/realtime/fetched/radar

The current chef behavior makes sense to me. Here are a couple of reasons why:

  1. As the directory being defined is /data/realtime/fetched/radar, I
    would not expect chef to touch the ownership or permissions on
    anything above it. I would however expect chef to create the required
    parent directories if they did not exist in order to satisfy the
    desired state.

  2. If chef modified the permissions/ownership for the entire
    structure, this could have adverse effects on other directories
    that exist under /data, /data/realtime, or /data/realtime/fetched
    (/data/foo, for example). In my opinion, this would be
    counterintuitive behavior.
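
For readers who want to see the behavior described in the two points above, here is a minimal plain-Ruby sketch. It is not Chef's implementation: create_leaf_only is a hypothetical helper, and it only sets the mode (ownership is left out because chown needs root). Missing parents are created with the process defaults, and only the named directory gets the requested mode.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not Chef code): create any missing parents,
# then apply the requested mode to the named directory only.
def create_leaf_only(path, mode)
  FileUtils.mkdir_p(path) # parents get the process defaults (umask)
  File.chmod(mode, path)  # only the leaf is explicitly managed
end

# Demo in a throwaway directory instead of the real /data.
Dir.mktmpdir do |root|
  leaf = File.join(root, "data", "realtime", "fetched", "radar")
  create_leaf_only(leaf, 0o777)

  path = root
  %w[data realtime fetched radar].each do |part|
    path = File.join(path, part)
    printf("%o %s\n", File.stat(path).mode & 0o777, path.sub(root, ""))
  end
end
```

With a typical umask of 022 the parents come out as 755 and only the last directory as 777, which matches the listing from the ticket.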

I agree with the ticket submitter that the docs aren’t 100% clear on
this and should probably be updated.

I would love to hear additional thoughts on this.

-Tommy


#2

On Sat, Oct 2, 2010 at 4:22 PM, Thomas Bishop bishop.thomas@gmail.com wrote:

The current chef behavior makes sense to me. Here are a couple of reasons why:

  1. As the directory being defined is /data/realtime/fetched/radar, I
    would not expect chef to touch the ownership or permissions on
    anything above it. I would however expect chef to create the required
    parent directories if they did not exist in order to satisfy the
    desired state.

  2. If chef modified the permissions/ownership for the entire
    structure, this could have adverse effects if there were additional
    directories defined under /data, or /data/realtime or
    /data/realtime/fetched. If /data/foo existed for example. In my
    opinion, this would be counterintuitive behavior.

That makes sense, if the tree already exists. You don’t want to muck
with existing permissions.

However, if the parent directory hierarchy were being created from
scratch (did not already exist), what should happen? Should they be
owned by root, or by the owner of the file at the bottom of the
hierarchy?

I think the latter makes sense, otherwise you’d have to explicitly
create every level in the hierarchy and assign permissions. That’d be
silly.
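
A rough plain-Ruby sketch of the alternative described here, where only the directories that had to be created receive the requested settings and existing ones are left alone. The helper name create_missing_with_mode is hypothetical, and mode stands in for owner/group (chown needs root); this is an illustration, not how Chef works.

```ruby
require "pathname"

# Hypothetical helper: find the directories that do not exist yet,
# create them top-down, then apply `mode` to each of them.
# Directories that already existed are never touched.
def create_missing_with_mode(path, mode)
  missing = []
  Pathname.new(path).ascend do |dir|
    break if dir.directory?      # first existing ancestor: stop here
    missing.unshift(dir.to_s)    # keep the list ordered top-down
  end

  missing.each { |dir| Dir.mkdir(dir) }
  # chmod leaf-first so a restrictive mode cannot lock us out midway
  missing.reverse_each { |dir| File.chmod(mode, dir) }
  missing
end
```

Called on /data/realtime/fetched/radar when only /data exists, this would return the three lower directories and leave /data as it was.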

– Chad


#3

I see your perspective in the case of a new directory.

Given the user’s example, my initial expectation is that I would have
to do the following:

directory "/data/realtime/fetched/radar" do
  action :create
  owner "ldm"
  group "ldm"
  mode "777"
end

directory "/data" do
  action :create
  owner "ldm"
  group "ldm"
  mode "777"
  recursive true
end

To me, this makes it clearer what should happen. A couple of things
lead me to this view.

  1. I’m probably influenced by how I would do this on the command
    line. (mkdir -p /data/realtime/fetched/radar; chmod -R 777 /data;
    chown -R ldm:ldm /data;)

  2. When I see ‘recursive’, I think of impacting down the tree of the
    argument specified and not up. Again, probably influenced by the
    behavior of command line tools (cp -R, chmod -R, chown -R, etc.).
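
The command-line reading in the two points above (mkdir -p, then chmod -R from the top) can be sketched in plain Ruby with FileUtils. create_then_apply_down is a hypothetical name, and only the mode is handled, since chown -R would need root.

```ruby
require "fileutils"

# Hypothetical helper: make sure `leaf` exists, then apply `mode` to
# `top` and everything beneath it, like `mkdir -p` plus `chmod -R`.
def create_then_apply_down(top, leaf, mode)
  FileUtils.mkdir_p(leaf)
  FileUtils.chmod_R(mode, top)
end
```

Note that this walks down from the top, so an unrelated sibling such as /data/foo would be changed as well.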

I guess another approach could be an attribute that says to apply the
ownership/permission settings to the entire path, or perhaps from a
specific starting point. This could be confusing as well in
combination with the recursive attribute, though. Maybe change the
recursive attribute from a boolean to fit this type of functionality?
I don’t care for either of these approaches too much.

I think this might just be a preference of implicit vs. explicit.

Personally, for something like file system
hierarchies/ownership/permissions I would prefer explicit definition.

Thoughts?

-Tommy



#4

On Sun, Oct 3, 2010 at 5:08 PM, Thomas Bishop bishop.thomas@gmail.com wrote:

  1. I’m probably influenced by how I would do this on the command
    line. (mkdir -p /data/realtime/fetched/radar; chmod -R 777 /data;
    chown -R ldm:ldm /data;)

On the command line, if I wanted an entire tree owned by a non-root
user, I would do this:

mkdir -p /home/ldm/realtime/fetched/radar

If the top-level dir happened to be writable only by root, I would do this first:

sudo mkdir /data
sudo chown ldm:ldm /data
mkdir -p /data/realtime/fetched/radar

I think you are coming from a root-user perspective, and assuming that
everything you don’t explicitly specify an owner for should be
root-owned.

I’m assuming that if I’m issuing a command to create something (e.g. a
directory tree) for a non-root user, everything that command creates
should be owned by that non-root user. If I wanted part of my
directory tree owned by root, I would do that with a separate command
without the owner specified.

– Chad


#5

Hi all,

I’m glad to see some discussion of this issue hitting the list. It’s
clear from a quick search of the chef bug database that this is
something that keeps coming up. For example, see:

http://tickets.opscode.com/browse/CHEF-205
http://tickets.opscode.com/browse/CHEF-1327
http://tickets.opscode.com/browse/CHEF-1621

And related:
http://tickets.opscode.com/browse/CHEF-690
http://tickets.opscode.com/browse/CHEF-933

Although it may confuse some users, there is agreement that having
chef change ownership of already existing parent directories is not
desirable.

The question is about the ownership of parent directories that are
created as a result of the “mkdir -p” :recursive option to the
Directory resource.

On Sun, Oct 3, 2010 at 5:42 PM, Chad Woolley thewoolleyman@gmail.com wrote:

I’m assuming that if I’m issuing a command to create something (e.g. a
directory tree) for a non-root user, everything that command creates
should be owned by that non-root user. If I wanted part of my
directory tree owned by root, I would do that with a separate command
without the owner specified.

Initially, I shared this perspective about created parent directories
being owned by the user/group specified in the resource. It seems to
balance the convenience provided by the option; to achieve
root-owns-created-parents, one extra resource call is required,
whereas to get the specified user to own the entire tree is one call
instead of a call for each level (assuming these dirs don’t already
exist with different ownership).

But it’s that last part that gives me doubt about the right thing to
do. If you want to rely on parent directories having the same owner
as the leaf dir, then you should specify that explicitly.

While I find the specified-user-owns-created-parents behavior least
surprising, I have concerns that it is ultimately not useful. If you
actually need the results of that behavior, you should do it
explicitly else your recipe will be brittle.

So if I were to choose, I think I would leave current behavior as-is,
and add some notes to the documentation to clarify what happens when
parent directories are created.

Cheers,

  • seth

#6

On Sun, Oct 3, 2010 at 9:06 PM, Seth Falcon seth@opscode.com wrote:

While I find the specified-user-owns-created-parents behavior least
surprising, I have concerns that it is ultimately not useful. If you
actually need the results of that behavior, you should do it
explicitly else your recipe will be brittle.

It does meet the principle of least surprise, and there are
justifications for it.

However, I’d like to hear the justifications for why it is “not
useful” or “brittle”.

For single-role boxes and workstations, I personally like to have as
much as possible user-owned. This approach is gaining popularity
(e.g. RVM, Homebrew). For these types of boxes, the root-centric
approach doesn’t make as much sense as it did in the old days of
multi-user multi-use boxes where you had to lock stuff down so people
wouldn’t break it (and one of chef’s main purposes is to solve that!).
Having more stuff owned by root just forces you to use sudo more (and
all the associated issues, such as jumping through hoops to preserve
environment variables properly when using sudo). Heresy? Perhaps.
Do the simplest thing, I say…

But I digress. In any case, there should be a way to do this, even if
it is a non-default option. Having to explicitly create each
non-root-owned directory in a tree is silly.

– Chad


#7

Ohai Chefs!

On Sun, Oct 3, 2010 at 9:25 PM, Chad Woolley thewoolleyman@gmail.com wrote:

On Sun, Oct 3, 2010 at 9:06 PM, Seth Falcon seth@opscode.com wrote:

While I find the specified-user-owns-created-parents behavior least
surprising, I have concerns that it is ultimately not useful. If you
actually need the results of that behavior, you should do it
explicitly else your recipe will be brittle.

It does meet the principle of least surprise, and there are
justifications for it.

However, I’d like to hear the justifications for why it is “not
useful” or “brittle”.

What worries me about setting the owner/permissions on the
intermediate directories is that the final state of the system then
depends on the starting state. If /data (for example) doesn’t exist,
you end up with that directory owned by the user specified in the
resource; if it does exist and is owned (say) by root, it stays that
way. In practice, that’s probably not a big deal, since you’re
probably going to start from a similar OS image. But if you later
change the user or ACL settings, Chef would only modify the directory
specified in the resource, leaving the intermediate directories
effectively unmanaged, i.e., you’d have

old setting: /data
old setting: /data/realtime
old setting: /data/realtime/fetched
this one was updated: /data/realtime/fetched/radar

But I digress. In any case, there should be a way to do this, even if
it is a non-default option. Having to explicitly create each
non-root-owned directory in a tree is silly.

I think we’re really discussing two things here: 1) a short-hand
syntax to have Chef manage a directory tree with a single directory
resource and 2) one possible implementation of it.

I definitely see the value in (1), but I think we can do better on
(2). For example, what if you specify the top level directory to be
managed, like:

directory("/data/realtime/fetched/radar") do
  recursive true
  recurse_upto "/data/realtime"
  owner "not-root"
  # other settings
end

That way Chef knows which directories in the tree to manage, and when
you change the owner or modes, Chef can update all of them.
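
To make the idea concrete, here is one way the set of managed directories could be computed from the leaf and the proposed recurse_upto value. This is only a sketch in plain Ruby: recurse_upto is a suggestion from this thread, not an existing attribute, and managed_dirs is a hypothetical helper name.

```ruby
require "pathname"

# Hypothetical helper: list every directory from `upto` down to `leaf`,
# inclusive, ordered top-down. These are the directories a resource
# with the proposed recurse_upto setting would manage.
def managed_dirs(leaf, upto)
  leaf_path = Pathname.new(leaf).cleanpath
  top_path = Pathname.new(upto).cleanpath

  dirs = []
  leaf_path.ascend do |dir|
    dirs.unshift(dir.to_s)
    return dirs if dir == top_path
  end
  raise ArgumentError, "#{upto} is not an ancestor of #{leaf}"
end

puts managed_dirs("/data/realtime/fetched/radar", "/data/realtime")
```

For the example resource above, this lists /data/realtime, /data/realtime/fetched and /data/realtime/fetched/radar, and leaves /data out. Because the list does not depend on what already exists on disk, the same directories would be updated on every run.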

Thoughts?

– Chad

Dan DeLeo