On Wed, 15 Jun 2011 09:21:32 -0700
Daniel DeLeo dan@kallistec.com wrote:
On Wednesday, June 15, 2011 at 8:43 AM, Brian Akins wrote:
On Wed, Jun 15, 2011 at 10:38 AM, Allan Wind
<allan_wind@lifeintegrity.com> wrote:
$ strace cp 1 2
...
open("1", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1048576, ...}) = 0
open("2", O_WRONLY|O_TRUNC) = 4
Confirmed that FileUtils.cp does the same. Probably not a good
thing.
The original behavior of these providers was to use FileUtils.cp, and
this is why it was changed. That said, mv is only atomic for moves on
the same partition. We had a discussion about this on the dev list a
few weeks back in the context of Windows ACLs, which are similarly
mangled when using mv.
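
To make the difference concrete, here is a small sketch (not Chef's
provider code) that compares the destination's inode before and after
each strategy. It assumes a POSIX filesystem and that source and
destination sit on the same partition; inode_change is a hypothetical
helper name used only for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: returns [inode_before, inode_after] of the
# destination file for the given replacement strategy (:cp or :mv).
def inode_change(strategy)
  Dir.mktmpdir do |dir|
    src  = File.join(dir, "src")
    dest = File.join(dir, "dest")
    File.write(src, "new contents\n")
    File.write(dest, "old contents\n")
    before = File.stat(dest).ino
    case strategy
    when :cp then FileUtils.cp(src, dest) # rewrites dest in place (truncate, then write)
    when :mv then FileUtils.mv(src, dest) # rename(2) when both are on the same filesystem
    end
    [before, File.stat(dest).ino]
  end
end
```

With cp the inode should stay the same, since the existing file is
rewritten in place and keeps its mode and ownership. With mv the
directory entry is swapped to point at the source's inode, which is
why the destination ends up with the source's permissions.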
cp's drawbacks compared to mv, namely that
(a) it is not atomic and
(b) it truncates the file and then writes
the new contents,
are a problem in some cases, sure. I think it
depends on what kind of file is being templated by
chef.
One of the most frequent use cases is to automate
the editing of config files - something that without
chef, is done using a text editor. I'm guessing most
editors simply do an open() with WRONLY|TRUNC, then
write out the new contents. This is not any worse
than cp.
Take ed for instance:
test.txt contains only one line -> "The first line.\n"
echo -e '$\na\n#a new line\n.\nwq\n' | strace ed test.txt
...
open("test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
fd=3 seems to be a temp. file in this case.
write(3, "#a new line", 11) = 11
lseek(3, 0, SEEK_SET) = 0
read(3, "The first line.#a new line", 4096) = 26
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f44573a5000
write(4, "The first line.\n#a new line\n", 28) = 28
close(4) = 0
So for the majority of config file updates,
chef calling FileUtils.cp does not seem any worse than
editing the files by hand. (It's probably the
closest emulation of that behavior).
However, O_TRUNC might be a problem in other cases;
I haven't come across one yet, but it's probably just
a matter of time.
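
One case where O_TRUNC could matter is a reader that opens the file
between the truncate and the write. The sketch below only makes that
window visible by recording the file size at two points during a
truncating write; sizes_seen_during_truncating_write is a hypothetical
name, and this is an illustration rather than a reproduction of any
real daemon's failure.

```ruby
# Hypothetical helper: rewrites `path` the way cp or ed does
# (O_WRONLY|O_TRUNC, then write) and records the size another
# process would observe right after the open and after the write.
def sizes_seen_during_truncating_write(path, new_contents)
  seen = []
  File.open(path, File::WRONLY | File::TRUNC) do |f|
    seen << File.size(path) # file is already empty at this point
    f.write(new_contents)
    f.flush                 # push Ruby's buffer to the OS before checking again
    seen << File.size(path)
  end
  seen
end
```

Anything that reads the file while the first size is in effect gets an
empty config, which is the failure mode a rename-based update avoids.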
My favored solution is to create a temporary file right next to the
target file, so if you were writing /etc/chef/client.rb from a
template, you would use /etc/chef/client.rb.RANDOM_SLUG as the
temporary file. Of course, if something has registered to watch
this directory, you could get misbehavior, and extreme failure
cases, such as segfaults in the Ruby runtime, could leave these
files in that directory, which would be problematic for
multi-file (e.g., conf.d and friends) configurations. It also does
not solve the permissions issue raised here, but that can be fixed by
patching the providers to set the desired file mode to the current
one when not explicitly specified; there is nothing that can be done
to make cp atomic.
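
A minimal sketch of that approach, assuming a POSIX system and a
temporary file in the same directory as the target (so the rename
stays on one filesystem). atomic_write, its mode: keyword and the
0644 fallback for a missing target are hypothetical choices for
illustration; this is not the actual provider code.

```ruby
require "fileutils"
require "securerandom"

# Hypothetical helper: write `contents` to `path` via a sibling temp
# file and rename(2). When no mode is given, reuse the mode of the
# existing target (assumed fallback: 0644 if the target is absent).
def atomic_write(path, contents, mode: nil)
  mode ||= File.exist?(path) ? (File.stat(path).mode & 0o7777) : 0o644
  tmp = "#{path}.#{SecureRandom.hex(8)}" # the RANDOM_SLUG suffix
  begin
    File.open(tmp, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
      f.write(contents)
      f.chmod(mode) # apply the desired mode before the file becomes visible
      f.fsync       # flush data to disk before the rename
    end
    File.rename(tmp, path) # atomic replacement on the same filesystem
  ensure
    FileUtils.rm_f(tmp)    # no-op after a successful rename
  end
  path
end
```

The ensure clause cleans up after ordinary exceptions, but it cannot
run if the interpreter itself crashes, so the left-behind-file concern
still applies in that case.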
That's right... I think we're dealing with the classic
difficulty of trying to ensure atomic file write operations
in a cross-platform manner.
Are the chances of
(a) a left-behind RANDOM_SLUG breaking a system after such
an extreme failure
much smaller than:
(b) a daemon being negatively impacted because of a
non-atomic config file update?
If yes, and if (a) gives us atomicity, well, why not.
I should also mention that I'm a bit uncomfortable with the outcome
of converging a resource being dependent on initial system state
outside of the resource, but then again, I would not go so far as to
make file modes required for file resources. Suggestions or thoughts
about this would be welcome.
That's indeed a tricky question. I lean towards not changing the
property of a resource unless the change was specified, but I
also share your discomfort about this: The problem is that
there will remain an undocumented/unknown property of
the system (in this case the mode of a file) which is
vital to the system as a whole. (an unknown unknown)
Making the mode required could potentially do away with
all the ambiguity; but then that would break so many
recipes already in use.
One possible (though somewhat uncool) middle-ground approach would
be to print a converge-time warning when initial system state does
not match the resources' defaults - and continue the converge keeping
the state unchanged - the risks of breaking a running system
because a file suddenly became unreadable by a daemon are very real.
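
As a rough sketch of what such a warning could look like: the check
below compares a file's current mode with a default and returns a
message instead of changing anything. DEFAULT_MODE, its value of 0644
and mode_warning are hypothetical names and values for illustration,
not part of chef.

```ruby
# Hypothetical default; a real resource would supply its own.
DEFAULT_MODE = 0o644

# Returns a warning string when the file's current mode differs from
# the default, or nil when it matches or the file does not exist.
# The file itself is never modified.
def mode_warning(path, default_mode = DEFAULT_MODE)
  return nil unless File.exist?(path)
  current = File.stat(path).mode & 0o7777
  return nil if current == default_mode
  format("WARN: %s has mode %04o, resource default is %04o; leaving it unchanged",
         path, current, default_mode)
end
```

A converge run could log this message and carry on, which keeps the
running system as it is while making the undocumented state visible.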
I do think that running systems should not be left to depend on
undocumented state. If a chef recipe worked well simply because say,
a certain library needed by some part of the system just happened
to be available, but this fact was never made explicit in the
recipe - then that situation needs to be identified and fixed.
(In this case, that holds whether or not chef makes the mode attribute
required for files.) It's a balance between that and the risk
of breaking something in production.
Faiz