Application cookbook memory leak

Karol_Hosiawa · November 22, 2012, 9:59am

I'm using

https://github.com/opscode-cookbooks/application

and

the application_ruby cookbook

to deploy about 60 apps on different nodes
running chef-client every 30 minutes as a service.
After a few hours the chef-client process uses 500MB, after a day more
than 1 GB.
I have 14 nodes and only the ones that have the "application" cookbook
in the run list leak memory.

I haven't been able to pinpoint the leak yet. My initial
investigation points to the implementation of
def method_missing here:

https://github.com/opscode-cookbooks/application/blob/master/resources/default.rb

but I've not confirmed yet.
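
One way to narrow down a suspicion like this is to compare live object counts before and after the suspect code runs. The following is a minimal sketch using Ruby's core ObjectSpace module; the helper names `live_object_counts` and `growth` are made up for illustration, and nothing here is specific to Chef or to the application cookbook.

```ruby
# Count live objects per class after forcing a garbage collection,
# so that only objects still referenced somewhere are counted.
def live_object_counts
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Return [class, delta] pairs for classes whose instance count grew
# between two snapshots, largest growth first.
def growth(before, after)
  after.map { |klass, n| [klass, n - before.fetch(klass, 0)] }
       .select { |_, delta| delta.positive? }
       .sort_by { |_, delta| -delta }
end

before = live_object_counts
# ... run the suspect code here (for example, one converge) ...
after = live_object_counts
growth(before, after).first(10).each { |klass, delta| puts "#{klass}: +#{delta}" }
```

If the same classes keep growing by a similar amount after every run, whatever holds references to those instances is a reasonable place to look next.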

I just wanted to ask whether anyone else has noticed it as well. If no one else has,
then it suggests it's something specific to my setup and I should be looking
for the leak somewhere else.

Thanks
Karol

Andrea_Campi · November 22, 2012, 10:07am

On Thu, Nov 22, 2012 at 10:59 AM, Karol Hosiawa hosiawak@gmail.com wrote:

I'm using

https://github.com/opscode-cookbooks/application

and

GitHub - poise/application_ruby: Development repository for Opscode Cookbook application_ruby

to deploy about 60 apps on different nodes
running chef-client every 30 minutes as a service.
After a few hours the chef-client process uses 500MB, after a day more
than 1 GB.
I have 14 nodes and only the ones that have the "application" cookbook
in the run list leak memory.

We run chef-client from crontab so we don't have that problem, but I have
seen something I think is related:
If the chef run fails during a deployment, the JSON dump file will be
humongous.
I haven't dug in yet, but it looks like we are holding a reference to a
huge amount of state.

If you find out more, I'd be very interested to hear.

As an aside: the chef-client recently gained the ability to fork for each
run when running as a service. In theory that should mitigate the problem
for you.

Alfredo_Palhares · November 22, 2012, 10:25am

I have experienced this with other cookbooks too. Some take longer than others: it can take months
for the process to bloat on some nodes, but it can eventually consume all of the machine's RAM.
I tried to track down the problem before, without success.
So I ended up using monit[1] to monitor chef-client and restart it every time it used more than
350 MB.

[1] http://mmonit.com/monit/
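
The check that such a watchdog performs can be sketched in Ruby as follows. This is only an illustration of the threshold test, not monit's configuration syntax: it assumes a Linux /proc filesystem, the helper names `rss_kb` and `over_limit?` are hypothetical, and the 350 MB limit is the one mentioned above.

```ruby
# Extract the resident set size in kB from the text of /proc/<pid>/status.
# Returns nil if the VmRSS line is missing.
def rss_kb(status_text)
  match = status_text.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i
end

# True when the process uses more resident memory than limit_mb megabytes.
def over_limit?(status_text, limit_mb)
  kb = rss_kb(status_text)
  !kb.nil? && kb > limit_mb * 1024
end

# Example: inspect the current process (only works where /proc exists).
status_path = "/proc/#{Process.pid}/status"
if File.readable?(status_path)
  text = File.read(status_path)
  puts "RSS: #{rss_kb(text)} kB, over 350 MB? #{over_limit?(text, 350)}"
end
```

A real watchdog would read the status file of the chef-client daemon (for example via its PID file) and restart the service when the check returns true.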

–
Regards,
Alfredo Palhares

AJ_Christensen · November 22, 2012, 11:10am

Chef has built in support to fork the run when run from a daemon. Give it a
shot and see if it helps. You'll likely still see the 500MB process usage,
but the forked copy (where the memory is being used) should be terminated
at the end of the run. [0]

You'll want 'chef-(client|solo) --fork'
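
To illustrate why forking contains the growth, here is a minimal sketch of the general pattern, not chef-client's actual implementation: a long-lived supervisor forks a child for each run, and whatever memory the run allocates is released when the child exits. The helper name `run_forked` is hypothetical, and the allocation inside the block merely stands in for a converge.

```ruby
# Run the given block in a forked child process and wait for it to finish.
# Returns the child's Process::Status. Memory allocated by the block lives
# in the child only and is returned to the OS when the child exits.
def run_forked(&work)
  pid = Process.fork do
    ok = begin
      work.call
      true
    rescue StandardError
      false
    end
    exit!(ok ? 0 : 1) # exit! skips at_exit handlers inherited from the parent
  end
  _, status = Process.wait2(pid)
  status
end

# fork is not available on every platform or Ruby VM, so check first.
if Process.respond_to?(:fork)
  3.times do |i|
    status = run_forked { Array.new(200_000) { "x" * 100 } }
    puts "run #{i}: child exited #{status.exitstatus}, supervisor pid #{Process.pid}"
  end
end
```

The supervisor keeps the same PID across runs, while each run's allocations disappear with its child. This limits the symptom; it does not remove whatever is leaking inside a single run.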

Cheers,

--AJ

[0] https://github.com/chef/chef/blob/main/lib/chef/application/client.rb#L188


      
          #
          # Author:: AJ Christensen (<aj@chef.io>)
          # Author:: Christopher Brown (<cb@chef.io>)
          # Author:: Mark Mzyk (<mmzyk@chef.io>)
          # Copyright:: Copyright (c) Chef Software Inc.
          # License:: Apache License, Version 2.0
          #
          # Licensed under the Apache License, Version 2.0 (the "License");
          # you may not use this file except in compliance with the License.
          # You may obtain a copy of the License at
          #
          #     http://www.apache.org/licenses/LICENSE-2.0
          #
          # Unless required by applicable law or agreed to in writing, software
          # distributed under the License is distributed on an "AS IS" BASIS,
          # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          # See the License for the specific language governing permissions and
          # limitations under the License.
          
          require_relative "base"
          require_relative "../handler/error_report"
          require_relative "../workstation_config_loader"
          autoload :URI, "uri"
          require "chef-utils" unless defined?(ChefUtils::CANARY)
          module Mixlib
            module Authentication
              autoload :Log, "mixlib/authentication"
            end
          end
          autoload :Train, "train"
          
          # DO NOT MAKE EDITS, see Chef::Application::Base
          #
          # External code may call / subclass or make references to this class.
          #
          class Chef::Application::Client < Chef::Application::Base
          
            option :config_file,
              short: "-c CONFIG",
              long: "--config CONFIG",
              description: "The configuration file to use."
          
            unless ChefUtils.windows?
              option :daemonize,
                short: "-d [WAIT]",
                long: "--daemonize [WAIT]",
                description: "Daemonize the process. Accepts an optional integer which is the " \
                  "number of seconds to wait before the first daemonized run.",
                proc: lambda { |wait| /^\d+$/.match?(wait) ? wait.to_i : true }
            end
          
            option :pid_file,
              short: "-P PID_FILE",
              long: "--pid PIDFILE",
              description: "Set the PID file location, for the #{ChefUtils::Dist::Infra::CLIENT} daemon process. Defaults to /tmp/chef-client.pid.",
              proc: nil
          
            option :runlist,
              short: "-r RunlistItem,RunlistItem...",
              long: "--runlist RunlistItem,RunlistItem...",
              description: "Permanently replace current run list with specified items.",
              proc: lambda { |items|
                items = items.split(",")
                items.compact.map do |item|
                  Chef::RunList::RunListItem.new(item)
                end
              }
          
            option :recipe_url,
              long: "--recipe-url=RECIPE_URL",
              description: "Pull down a remote archive of recipes and unpack it to the cookbook cache. Only used in local mode."
          
            # Reconfigure the chef client
            # Re-open the JSON attributes and load them into the node
            def reconfigure
              super
          
              raise Chef::Exceptions::PIDFileLockfileMatch if Chef::Util::PathHelper.paths_eql? (Chef::Config[:pid_file] || "" ), (Chef::Config[:lockfile] || "")
          
              set_specific_recipes
          
              Chef::Config[:fips] = config[:fips] if config.key? :fips
          
              Chef::Config[:chef_server_url] = config[:chef_server_url] if config.key? :chef_server_url
          
              Chef::Config.local_mode = config[:local_mode] if config.key?(:local_mode)
          
              if Chef::Config.key?(:chef_repo_path) && Chef::Config.chef_repo_path.nil?
                Chef::Config.delete(:chef_repo_path)
                Chef::Log.warn "chef_repo_path was set in a config file but was empty. Assuming #{Chef::Config.chef_repo_path}"
              end
          
              if Chef::Config.local_mode && !Chef::Config.key?(:cookbook_path) && !Chef::Config.key?(:chef_repo_path)
                Chef::Config.chef_repo_path = Chef::Config.find_chef_repo_path(Dir.pwd)
              end
          
              if Chef::Config[:recipe_url]
                if !Chef::Config.local_mode
                  Chef::Application.fatal!("recipe-url can be used only in local-mode")
                else
                  if Chef::Config[:delete_entire_chef_repo]
                    Chef::Log.trace "Cleanup path #{Chef::Config.chef_repo_path} before extract recipes into it"
                    FileUtils.rm_rf(Chef::Config.chef_repo_path, secure: true)
                  end
                  Chef::Log.trace "Creating path #{Chef::Config.chef_repo_path} to extract recipes into"
                  FileUtils.mkdir_p(Chef::Config.chef_repo_path)
                  tarball_path = File.join(Chef::Config.chef_repo_path, "recipes.tgz")
                  fetch_recipe_tarball(Chef::Config[:recipe_url], tarball_path)
                  Mixlib::Archive.new(tarball_path).extract(Chef::Config.chef_repo_path, perms: false, ignore: /^\.$/)
                  config_path = File.join(Chef::Config.chef_repo_path, "#{ChefUtils::Dist::Infra::USER_CONF_DIR}/config.rb")
                  Chef::Config.from_string(IO.read(config_path), config_path) if File.file?(config_path)
                end
              end
          
              Chef::Config.chef_zero.host = config[:chef_zero_host] if config[:chef_zero_host]
              Chef::Config.chef_zero.port = config[:chef_zero_port] if config[:chef_zero_port]
          
              if config[:target] || Chef::Config.target
                Chef::Config.target_mode.host = config[:target] || Chef::Config.target
                if URI.parse(Chef::Config.target_mode.host).scheme
                  train_config = Train.unpack_target_from_uri(Chef::Config.target_mode.host)
                  Chef::Config.target_mode = train_config
                end
                Chef::Config.target_mode.enabled = true
                Chef::Config.node_name = Chef::Config.target_mode.host unless Chef::Config.node_name
              end
          
              if Chef::Config[:daemonize]
                Chef::Config[:interval] ||= 1800
              end
          
              if Chef::Config[:once]
                Chef::Config[:interval] = nil
                Chef::Config[:splay] = nil
              end
          
              # supervisor processes are enabled by default for interval-running processes but not for one-shot runs
              if Chef::Config[:client_fork].nil?
                Chef::Config[:client_fork] = !!Chef::Config[:interval]
              end
          
              if Chef::Config[:interval]
                if Chef::Platform.windows?
                  Chef::Application.fatal!(windows_interval_error_message)
                elsif !Chef::Config[:client_fork]
                  Chef::Application.fatal!(unforked_interval_error_message)
                end
              end
          
              if Chef::Config[:json_attribs]
                config_fetcher = Chef::ConfigFetcher.new(Chef::Config[:json_attribs])
                @chef_client_json = config_fetcher.fetch_json
              end
            end
          
            def load_config_file
              if !config.key?(:config_file) && !config[:disable_config]
                if config[:local_mode]
                  config[:config_file] = Chef::WorkstationConfigLoader.new(nil, Chef::Log).config_location
                else
                  config[:config_file] = Chef::Config.platform_specific_path("#{ChefConfig::Config.etc_chef_dir}/client.rb")
                end
              end
          
              # Load the client.rb configuration
              super
          
              # Load all config files in client.d
              load_dot_d(Chef::Config[:client_d_dir]) if Chef::Config[:client_d_dir]
            end
          
            def configure_logging
              super
              Mixlib::Authentication::Log.use_log_devices( Chef::Log )
              Ohai::Log.use_log_devices( Chef::Log )
            end
          
          end

Arnold_Krille · November 22, 2012, 8:46pm

Hi,

On Thursday 22 November 2012 10:59:04 Karol Hosiawa wrote:

I'm using
GitHub - poise/application: A Chef cookbook to deploy applications.
and
GitHub - poise/application_ruby: Development repository for Opscode Cookbook application_ruby
to deploy about 60 apps on different nodes
running chef-client every 30 minutes as a service.
After a few hours the chef-client process uses 500MB, after a day more
than 1 GB.
I have 14 nodes and only the ones that have the "application" cookbook
in the run list leak memory.

I just wanted to ask if anyone else noticed it as well ? If no one else has
then it suggests it's something specific to my setup and I should be looking
for the leak somewhere else.

I can't help, but I noticed that I might have suffered from the same problem. I
didn't investigate deeply enough to pin it on the application cookbook, but the
machine concerned has the application cookbook applied, and the others without it
don't suffer memory leaks.
I think I "solved" it by running chef-client from cron instead of as a daemon.
I'll have to see how the machine behaves...

Have fun,

Arnold

Kevin_Nuckolls · November 23, 2012, 11:56pm

I can confirm seeing this in the wild on multiple nodes in our
architecture. After upgrading everything to 10.16.2 and using the --fork
flag for our chef-client daemon the problem has gone away. I had to merge
in the --fork pull request myself, but it may have been merged into the
chef-client cookbook repository by now. If it hasn't, it should. It's an
important patch IMO.

-Kevin

Karol_Hosiawa · November 24, 2012, 9:54am

Thanks everyone.

The suggestion to use forking is really sweeping the problem under the carpet.
If there's a leak, it should be fixed; otherwise people who use the
application cookbook
and are not aware of the need to fork chef-client will run into the same problem again.
Forking is also not an option on some platforms/Ruby VMs, afaik.
I'll report back if I manage to find it.

Thanks
Karol

Andrea_Campi · November 24, 2012, 3:43pm

On Sat, Nov 24, 2012 at 10:54 AM, Karol Hosiawa hosiawak@gmail.com wrote:

The suggestion to use forking is really sweeping the problem under the
carpet.
If there's a leak it should be fixed otherwise people using the
application cookbook
not aware of the need to fork chef-client will run into the same problem
again.
Forking is also not an option on some platforms/Ruby VMs afaik.
I'll report if I manage to find it.

Oh sure, it's just that it might take a while to find, fix and merge; so I
was simply offering a short-term workaround.
But beyond that, of course we want a proper fix.
