How to copy big files from the cookbook to the node without running out of memory?


#1

Hi all,

I’m building a recipe where I have to copy two 2 GB files. My VM has 4 GB of RAM and it runs out of memory when I use remote_file or cookbook_file to copy the files to it.

Any idea how to copy the files in some kind of streaming mode to avoid running out of memory?
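
Roughly what I’m trying now, with a placeholder destination path and URL:

    remote_file '/opt/data/big-file-1.bin' do            # placeholder destination
      source 'https://files.example.com/big-file-1.bin'  # placeholder URL
      mode '0644'
      action :create
    end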

Thanks!


#2

This is one way to do it. Mind you, it doesn’t do things like back up the current version, retry on network failures, etc.

    ruby_block "httpclient - #{remote_path}" do
      block do
        require 'httpclient'
        Chef::Log.info("Downloading #{remote_path} to #{local_path}")
        ::File.open(local_path, 'w') do |file|
          ::HTTPClient.new.get_content(remote_path) do |chunk|
            file.write(chunk)
          end
        end
      end
    end
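
One caveat: this assumes the httpclient gem is already available in Chef’s embedded Ruby. If it isn’t, install it first with something along these lines (assumes the node can reach a gem source):

    # Install the gem into Chef's own Ruby so the ruby_block can require it.
    chef_gem 'httpclient' do
      action :install
    end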

#3

Hi Brian,

Thanks for the response!

Do we have a way to access the cookbook host as an HTTP server when using Chef Zero, or is the only way to set up an HTTP server for that?

Thanks


#4

The cookbook_file resource should work fine AFAIK. If it doesn’t, that’s a bug.
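
For reference, the usual pattern looks like this (the file name and destination are placeholders; the file itself lives under files/default/ in the cookbook):

    # Ships files/default/big-file-1.bin from the cookbook to the node.
    cookbook_file '/opt/data/big-file-1.bin' do
      source 'big-file-1.bin'
      mode '0644'
    end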


#5

Cookbooks are really not the right tool for distributing large files like that, so you did the right thing by not using cookbook_file.

When your file is such a large percentage of total memory, I expect that remote_file’s internal implementation is too inefficient. If I were to venture a guess, the problem may be that remote_file computes a hash of the file, and may keep the whole file in memory for that purpose.

Try replacing remote_file with an execute resource that uses curl, or maybe a bash resource.
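
For example, something along these lines (URL and destination path are placeholders; the creates property keeps it from re-downloading on every run):

    # Let curl stream the download to disk instead of buffering it in Chef.
    execute 'download big-file-1.bin' do
      command 'curl -fsSL -o /opt/data/big-file-1.bin https://files.example.com/big-file-1.bin'
      creates '/opt/data/big-file-1.bin'
    end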

Kevin Keane
The NetTech
http://www.4nettech.com
Our values: Privacy, Liberty, Justice
See https://www.4nettech.com/corp/the-nettech-values.html


#6

I agree with Noah, this should work. Possibly you’re hitting an issue when Chef generates the checksum or the diff (though I thought we had a maximum file size for attempting a diff). We would need more detailed logs to figure it out.
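
If the diff does turn out to be the culprit, one way to rule it out is via client.rb (both settings exist in the Chef client config; the threshold value shown is just an example):

    # /etc/chef/client.rb
    diff_disabled true                        # skip file diffs entirely
    # or cap the size of files Chef will attempt to diff:
    diff_filesize_threshold 10 * 1024 * 1024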