Application cookbook memory leak


#1

I’m using
https://github.com/opscode-cookbooks/application
and
https://github.com/opscode-cookbooks/application_ruby
to deploy about 60 apps on different nodes,
running chef-client every 30 minutes as a service.
After a few hours the chef-client process uses 500 MB; after a day, more
than 1 GB.
I have 14 nodes, and only the ones that have the “application” cookbook
in their run list leak memory.

I haven’t been able to pinpoint the leak yet. My initial
investigation points to the implementation of
def method_missing here:

https://github.com/opscode-cookbooks/application/blob/master/resources/default.rb

but I haven’t confirmed that yet.

I just wanted to ask whether anyone else has noticed it as well. If no one else has,
that suggests it’s something specific to my setup and I should be looking
for the leak somewhere else.

Thanks
Karol


#2

On Thu, Nov 22, 2012 at 10:59 AM, Karol Hosiawa hosiawak@gmail.com wrote:


We run chef-client from crontab, so we don’t have that problem, but I have
seen something I think is related:
if the Chef run fails during a deployment, the JSON dump file is
humongous.
I haven’t dug in yet, but it looks like we are holding a reference to a
huge amount of state.

If you find out more, I’d be very interested to hear.

As an aside: the chef-client recently gained the ability to fork for each
run when running as a service. In theory that should mitigate the problem
for you.


#3

I have experienced this with other cookbooks too. Some take longer than others; it can take months
to bloat on some nodes, but it can eventually take all of the machine's RAM.
I tried to track down the problem before, without success.
So I ended up using monit[1] to monitor chef-client and restart it every time it used more than
350 MB.

[1] http://mmonit.com/monit/
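For readers without monit, the same policy (restart chef-client whenever it uses more than 350 MB) can be sketched as a small Python watchdog. This is not how monit works internally. Assumptions: Linux (it reads VmRSS from /proc/<pid>/status), a chef-client pidfile whose path you pass in, and a hypothetical restart command `service chef-client restart`; adjust both to your setup.

```python
import subprocess
import time

LIMIT_MB = 350  # threshold from the message above
# Hypothetical restart command; adjust to however chef-client is managed.
RESTART_CMD = ["service", "chef-client", "restart"]


def parse_vmrss_kb(status_text):
    """Return VmRSS (in kB) from a Linux /proc/<pid>/status dump, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   123456 kB"
    return None


def over_limit(rss_kb, limit_mb=LIMIT_MB):
    return rss_kb is not None and rss_kb > limit_mb * 1024


def read_pid(pidfile):
    with open(pidfile) as fh:
        return int(fh.read().strip())


def watch(pidfile, interval=60):
    """Poll the process named in pidfile and restart it when its resident
    memory exceeds the limit (Linux only: relies on /proc)."""
    while True:
        try:
            pid = read_pid(pidfile)
            with open(f"/proc/{pid}/status") as fh:
                rss_kb = parse_vmrss_kb(fh.read())
        except (OSError, ValueError):
            rss_kb = None  # process not running or pidfile unreadable
        if over_limit(rss_kb):
            subprocess.run(RESTART_CMD, check=False)
        time.sleep(interval)
```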


Regards,
Alfredo Palhares


#4

Chef has built-in support to fork each run when running as a daemon. Give it a
shot and see if it helps. You’ll likely still see the 500 MB process usage,
but the forked copy (where the memory is being used) should be terminated
at the end of the run. [0]

You’ll want ‘chef-(client|solo) --fork’

Cheers,

–AJ

[0]

On 22 November 2012 23:25, Alfredo Palhares masterkorp@masterkorp.net wrote:



#5

Hi,

On Thursday 22 November 2012 10:59:04 Karol Hosiawa wrote:


I can’t help much, but I just noticed that I might have suffered from the same problem. I
didn’t investigate deeply enough to pin it on the application cookbook, but the
affected machine has the application cookbook applied, and the others without it
don’t suffer from memory leaks.
I think I “solved” it by running chef-client from cron instead of as a daemon.
Gotta see how the machine behaves…

Have fun,

Arnold


#6

I can confirm seeing this in the wild on multiple nodes in our
architecture. After upgrading everything to Chef 10.16.2 and using the --fork
flag for our chef-client daemon, the problem has gone away. I had to merge
in the --fork pull request myself, but it may have been merged into the
chef-client cookbook repository by now. If it hasn’t, it should be. It’s an
important patch IMO.

-Kevin

On Thu, Nov 22, 2012 at 2:46 PM, Arnold Krille arnold@arnoldarts.de wrote:



#7

Thanks everyone.

The suggestion to use forking is really sweeping the problem under the carpet.
If there’s a leak, it should be fixed; otherwise people using the
application cookbook who aren’t
aware of the need to fork chef-client will run into the same problem again.
Forking is also not an option on some platforms/Ruby VMs, afaik.
I’ll report back if I manage to find it.

Thanks
Karol


#8

On Sat, Nov 24, 2012 at 10:54 AM, Karol Hosiawa hosiawak@gmail.com wrote:


Oh sure, it’s just that it might take a while to find, fix and merge; so I
was simply offering a short-term workaround.
But beyond that, of course we want a proper fix.