remote_file checksum


#1

Hi All,

I’ve been working with the remote_file resource a bit lately. I’m using chef-solo.

I’m misunderstanding the use of the checksum attribute (at least I hope).

With the following code, should it retry the download until the retries limit is reached?

remote_file "/tmp/myfile.tmp" do
  source "http://example.com/test.tmp"
  checksum "incorrect_should_cause_redownload"
  retries 3
  retry_delay 20
end

My understanding was that it should only fetch the source if the checksums don’t match.
I’ve tried this with use_conditional_get true and use_etag true forced on.

It seems to ignore whatever I set checksum to; I’ve tried an arbitrary string, an MD5, and a SHA-256, which is what it should be.

Thanks
D.


#2

On Mar 11, 2014, at 9:59 AM, dcroche@gmail.com wrote:

With the following code, should it retry the download until the retries limit is reached?

remote_file "/tmp/myfile.tmp" do
  source "http://example.com/test.tmp"
  checksum "incorrect_should_cause_redownload"
  retries 3
  retry_delay 20
end

If the checksum you provide doesn’t match the checksum of the remote file, then I don’t think it will download anything. If the checksums do match and you don’t have a local copy of the file, then there should be multiple download attempts.

My understanding was that it should only fetch the source if the checksums don’t match.
I’ve tried this with use_conditional_get true and use_etag true forced on.

The checksum is compared against the remote file, not the local one.

It seems to ignore whatever I set checksum to; I’ve tried an arbitrary string, an MD5, and a SHA-256, which is what it should be.

IIUC, checksum is always sha256. If you use anything else, you are likely to be unpleasantly surprised.


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#3

With the previous example it is downloading the file even though the checksums don’t match.

When using AWS S3, files uploaded without multipart seem to use MD5 checksums as their ETags.
Multipart uploads are different again, depending on how they were uploaded (the web UI seems to do multipart uploads using 50 MB parts).

On 11 Mar 2014, at 15:05, Brad Knowles brad@shub-internet.org wrote:

On Mar 11, 2014, at 9:59 AM, dcroche@gmail.com wrote:

With the following code, should it retry the download until the retries limit is reached?

remote_file "/tmp/myfile.tmp" do
  source "http://example.com/test.tmp"
  checksum "incorrect_should_cause_redownload"
  retries 3
  retry_delay 20
end

If the checksum you provide doesn’t match the checksum of the remote file, then I don’t think it will download anything. If the checksums do match and you don’t have a local copy of the file, then there should be multiple download attempts.

My understanding was that it should only fetch the source if the checksums don’t match.
I’ve tried this with use_conditional_get true and use_etag true forced on.

The checksum is compared against the remote file, not the local one.

It seems to ignore whatever I set checksum to; I’ve tried an arbitrary string, an MD5, and a SHA-256, which is what it should be.

IIUC, checksum is always sha256. If you use anything else, you are likely to be unpleasantly surprised.


Brad Knowles brad@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


#4

Hi,

On Tue, Mar 11, 2014 at 2:59 PM, dcroche@gmail.com wrote:

I’ve been working with the remote_file resource a bit lately. I’m using chef-solo.

I’m misunderstanding the use of the checksum attribute (at least I hope).

With the following code, should it retry the download until the retries limit is reached?

remote_file "/tmp/myfile.tmp" do
  source "http://example.com/test.tmp"
  checksum "incorrect_should_cause_redownload"
  retries 3
  retry_delay 20
end

My understanding was that it should only fetch the source if the checksums don’t match.
I’ve tried this with use_conditional_get true and use_etag true forced on.

The point of the checksum attribute is to skip downloading a file you’ve
already got. When Chef runs, it will checksum the local file - if that
matches the checksum in the resource, it assumes it already has the latest
copy and doesn’t do anything more. (That’s documented, somewhat tersely,
here: http://docs.opscode.com/resource_remote_file.html )

Note that the checksum is not used to verify that the file downloaded
matches the file you were expecting. There’s an open ticket requesting
that behaviour here: https://tickets.opscode.com/browse/CHEF-4647

So, with currently released versions of Chef, the behaviour you’d expect
from specifying an incorrect checksum is that Chef will download the file
every time it runs. It will only download it once in each run, and
"retries" will only come into play if the download was aborted for some
reason.
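
A rough illustration of that comparison in plain Ruby (not Chef’s actual implementation, just the idea): the local file is hashed with SHA-256 and compared against the resource’s checksum attribute before deciding whether to download.

```ruby
require 'digest'
require 'tempfile'

# Stand-in for the file already on disk at the resource's path.
local = Tempfile.new('myfile')
local.write('hello')
local.flush

# Stand-in for the checksum attribute from the recipe (SHA-256 hex digest).
expected = Digest::SHA256.hexdigest('hello')

# The check Chef effectively performs before downloading:
up_to_date = Digest::SHA256.file(local.path).hexdigest == expected
puts up_to_date  # true -> the download is skipped
```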

Zac


#5

Hi Zac, that ticket (CHEF-4647) describes exactly what I’m getting.

Guess my option is an only_if on the block to verify the checksums myself.
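
That guard could look something like this sketch (path and digest are hypothetical placeholders, not values from a real recipe):

```ruby
require 'digest'

path     = '/tmp/myfile.tmp'
expected = 'replace_with_the_real_sha256_hex_digest'  # hypothetical value

remote_file path do
  source 'http://example.com/test.tmp'
  retries 3
  retry_delay 20
  # Only fetch when the file is absent or its SHA-256 differs from
  # the expected digest, i.e. verify the checksum ourselves.
  only_if do
    !::File.exist?(path) ||
      Digest::SHA256.file(path).hexdigest != expected
  end
end
```

Note this only guards whether the download runs; it doesn’t fail the run if the freshly downloaded file still doesn’t match, which is the post-download verification CHEF-4647 asks for.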

Thanks to you both. Much appreciated.

D.

On 11 Mar 2014, at 15:15, Zac Stevens zts@cryptocracy.com wrote:

Hi,

On Tue, Mar 11, 2014 at 2:59 PM, dcroche@gmail.com wrote:
I’ve been working with the remote_file resource a bit lately. I’m using chef-solo.

I’m misunderstanding the use of the checksum attribute (at least I hope).

With the following code, should it retry the download until the retries limit is reached?

remote_file "/tmp/myfile.tmp" do
  source "http://example.com/test.tmp"
  checksum "incorrect_should_cause_redownload"
  retries 3
  retry_delay 20
end

My understanding was that it should only fetch the source if the checksums don’t match.
I’ve tried this with use_conditional_get true and use_etag true forced on.

The point of the checksum attribute is to skip downloading a file you’ve already got. When Chef runs, it will checksum the local file - if that matches the checksum in the resource, it assumes it already has the latest copy and doesn’t do anything more. (That’s documented, somewhat tersely, here: http://docs.opscode.com/resource_remote_file.html )

Note that the checksum is not used to verify that the file downloaded matches the file you were expecting. There’s an open ticket requesting that behaviour here: https://tickets.opscode.com/browse/CHEF-4647

So, with currently released versions of Chef, the behaviour you’d expect from specifying an incorrect checksum is that Chef will download the file every time it runs. It will only download it once in each run, and “retries” will only come into play if the download was aborted for some reason.

Zac