Chef-client memory usage

My company is pretty late to the Chef party, only getting things started
about 6 months ago (after a year of asking for it), but now that we have
things up and running we’ve run into a bit of a problem. The client
consumes a fairly large amount of memory, between 175 and 250 MB per server.
This has caused a lot of concern from the Operations team, since that amount
times N VMs can get quite expensive. I’ve been doing some research into this
and noticed that the amount of resident memory can depend on how many recipes
are loaded on a node, and the Opscode docs seem to confirm this. Right now
these cookbooks are loaded into a single base role and added to each node
for ease of use. They’re all OS-level recipes to manage hostfiles,
resolv.conf, etc. There are 20 total. We also have application roles
that can add another 3 or 4 recipes.

I’ve hacked around a bit on the Samba cookbook and removed all the code
used to create users, which has lowered the memory footprint to a
steady 192 MB, but I fear this won’t be enough to convince my ops team to
keep Chef. They want to dump it and go back to using shell and Perl scripts
for everything.

My question is, does anyone have any tips for reducing the memory usage?
I’d like to be able to keep Chef around.

Thanks!

On Dec 8, 2011, at 3:43 PM, Chris wrote:

My company is pretty late to the Chef party, only getting things started about 6 months ago (after a year of asking for it), but now that we have things up and running we've run into a bit of a problem. The client consumes a fairly large amount of memory, between 175-250m per server. This has caused a lot of concern from the Operations team since that amount * N VMs can get quite expensive. I've been doing some research into this and noticed that the amount of resident memory can depend on how many recipes are loaded on a node, and Opscode docs seem to confirm this. Right now these cookbooks are loaded into a single base role and added to each node for ease of use. They're all OS level recipes to manage hostfiles, resolv.conf etc.. etc.. There are 20 total. We also have application roles that can add another 3 or 4 recipes.

We've been doing chef since August using 0.10.4 on CentOS 5.6. We currently have 43 cookbooks and 37 roles across all of our machines, but I use roles very heavily (I'll test a new cookbook as a new role on a new machine and then when I'm happy I might include that role as part of another larger role). We just spun up a "staging" environment today, which added twelve new nodes, taking us up to 33 total being managed by chef. On one of our most complex nodes, the run_list has five main roles loaded, while the expanded run_list is sixteen roles and comprises thirty recipes.

I checked, and when chef-client is active, we hit a VSS of about 195MB, but a Resident (working) Set Size of 60-70MB. Even a dry run includes multiple invocations of Python, Perl, and various other programs and languages, many of which have VSS & RSS that are almost as big as chef-client, even though they might only persist for a few seconds during the run.
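
To get comparable numbers on your own nodes, it helps to read both figures for the same process. A minimal sketch, assuming a Linux host where /proc/<pid>/status reports VmSize (virtual size) and VmRSS (resident set) in kB; the helper names parse_proc_status and memory_of are hypothetical:

```ruby
# Read VmSize (virtual) and VmRSS (resident) for a Linux process.
# /proc/<pid>/status reports both in kB, e.g. "VmRSS:     66560 kB".

def parse_proc_status(text)
  sizes = {}
  text.each_line do |line|
    if line =~ /\A(VmSize|VmRSS):\s+(\d+)\s+kB/
      sizes[$1] = Integer($2)
    end
  end
  sizes
end

# Returns a hash with "VmSize" and "VmRSS" keys, values in kB.
def memory_of(pid)
  parse_proc_status(File.read("/proc/#{pid}/status"))
end
```

Calling memory_of with the PID of a running chef-client and dividing each value by 1024 gives MB; the VmRSS figure is the one that reflects memory the process is actually occupying.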

In comparison, the RevealCloud agent that we run on every machine has a VSS of ~160MB, although the RSS is just over 2MB. This machine is brand-new and is virtually idle, but each httpd process has a VSS of ~150MB and an RSS of just under 5MB, and we spin up a total of seventeen of them.

This is on a Rackspace "flavor 3" VM, which is allocated 1GB of RAM, 40GB of hard disk space (~35GB usable), and so on. Rackspace offers only two VM flavors smaller than this: a "flavor 2" with 512MB of RAM and a "flavor 1" with 256MB of RAM.

Compared to all the other things that this VM is doing, the overhead of chef-client seems pretty reasonable to me -- not really any more than another httpd process, or the overhead from the RevealCloud monitoring system. Not something that I would consider totally negligible, but also not that significant.

Speaking only for myself, if you've got systems where you really are this tightly constrained for memory, then I think you've got much bigger problems than whether or not you can afford to run chef-client.

--
Brad Knowles bknowles@ihiji.com
SAGE Level IV, Chef Level 0.0.1

Hi Brad,

I agree with you, there are other problems surrounding our memory
constraints. Namely, I'm not allowed to take memory away from Java and Java
is never allowed to go into swap. We routinely provision 8GB VMs and give
Java 7 of that, and the Java dev teams won't budge on that.

I'm really surprised that your client is so lean, but your CentOS version
is also newer than ours. We have systems ranging from 5.2 to 5.5, and most
have never been patched. I'm pretty sure we're missing out on some
optimizations. I'm curious, are most of your cookbooks from Opscode or
home-grown?

On Thu, Dec 8, 2011 at 3:00 PM, Brad Knowles bknowles@ihiji.com wrote:

--
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

On Dec 8, 2011, at 5:09 PM, Chris wrote:

I agree with you, there are other problems surrounding our memory constraints. Namely, I'm not allowed to take memory away from Java and Java is never allowed to go into swap. We routinely provision 8GB VMs and give Java 7 of that, and the Java dev teams won't budge on that.

Ouch. Yeah, politics is going to hurt.

I'm really surprised that your client is so lean, but your CentOS version is also newer than ours. We have systems ranging from 5.2 to 5.5, and most have never been patched. I'm pretty sure we're missing out on some optimizations. I'm curious, are most of your cookbooks from Opscode or home-grown?

We started with the Opscode cookbooks, but some of them have been modified pretty heavily, and we've created a few of our own. Some of the cookbooks have come from elsewhere on github, with some local modifications.

I'm hoping that I will be able to get up to speed enough on git in the near future that we can contribute pretty much all our work back to github, so that the community can upgrade the Opscode cookbooks as appropriate, or the few non-Opscode cookbooks that we have used. Most importantly, I think we may be the first non-Ubuntu site to make use of the edelight cookbook for MongoDB, and I'd really like to get our enhancements folded back in.

We've already signed all the CLA and CCLA forms, so now it's more a matter of me finding the time and inclination to "git" up to speed.

One other observation I'd like to make about VSS & RSS -- I think this might depend on your hypervisor implementation, but I would be surprised if you didn't get "shared" pages across multiple VMs on the same host, each running its own copy of chef-client.

So the real memory impact would not be the number of VMs times the VSS (as they claimed), but more like some cost-reduction factor times the number of VMs times the RSS of each chef-client instance. I think the more VMs you've got on a single machine, the more memory you would save overall, thanks to something akin to deduplication performed by the hypervisor in conjunction with whatever OS is running in Ring Zero. All they need to do is implement standard "copy-on-write" functionality for each of the affected pages.
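
The arithmetic behind this can be sketched with the figures reported earlier in the thread (33 nodes, a VSS of about 195MB, an RSS of about 65MB). The unshared_fraction parameter is a hypothetical stand-in for whatever share of each RSS the hypervisor cannot deduplicate; its real value depends on the hypervisor and is not measured here:

```ruby
# Compare two ways of costing chef-client across a fleet of VMs.

# Naive estimate: every VM pays the full virtual size.
def naive_cost_mb(vms, vss_mb)
  vms * vss_mb
end

# Estimate from resident memory. unshared_fraction is hypothetical: the share
# of each RSS that the hypervisor cannot deduplicate (1.0 = no sharing).
def estimated_cost_mb(vms, rss_mb, unshared_fraction)
  unless unshared_fraction > 0 && unshared_fraction <= 1
    raise ArgumentError, "unshared_fraction must be in (0, 1]"
  end
  vms * rss_mb * unshared_fraction
end
```

With those figures, naive_cost_mb(33, 195) is 6435 MB, while estimated_cost_mb(33, 65, 1.0) is 2145 MB even with no page sharing at all; any sharing (a fraction below 1.0) lowers it further.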

--
Brad Knowles bknowles@ihiji.com
SAGE Level IV, Chef Level 0.0.1

Manage the amount of data being shoved into your node object. This is
typically where the memory is being consumed. Disabling Ohai plugins you
don't use might help.

Also pay attention to how your cookbooks are using search. Search results
can get very large, depending on the size of each node object and the number of nodes returned.
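
One way to act on the search advice is to keep only the fields a recipe actually needs from the results. A minimal sketch in plain Ruby; the sample data and the slim_results helper are hypothetical, and real results from a recipe's search(:node, ...) call are Chef::Node objects, whose accessors may differ from the plain hashes used here:

```ruby
# Keep only the listed fields from each search result, so the large objects
# can be garbage-collected once nothing else (such as the original result
# array) still refers to them.
def slim_results(nodes, *keys)
  nodes.map do |node|
    slim = {}
    keys.each { |key| slim[key] = node[key] }
    slim
  end
end

# Hypothetical stand-in for search results; real ones are Chef::Node objects.
results = [
  { "name" => "web1", "ipaddress" => "10.0.0.11", "ohai_data" => "x" * 10_000 },
  { "name" => "web2", "ipaddress" => "10.0.0.12", "ohai_data" => "x" * 10_000 }
]
backends = slim_results(results, "name", "ipaddress")
results = nil # drop the only reference to the full objects
```

The saving only materializes if the full results really are released; holding on to them in a node attribute or a long-lived variable defeats the purpose.
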
On Dec 8, 2011 3:34 PM, "Brad Knowles" bknowles@ihiji.com wrote:

Also, what version of ruby are you running?

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Dec 8, 2011, at 4:09 PM, Chris wrote:

ruby 1.8.7 (2011-02-18 patchlevel 334) -- CentOS 5.5 (mostly)

On Thu, Dec 8, 2011 at 11:37 PM, Adam Jacob adam@opscode.com wrote:

On Friday, December 9, 2011 at 7:46 AM, Chris wrote:

ruby 1.8.7 (2011-02-18 patchlevel 334) -- CentOS 5.5(mostly)

Disabling unneeded ohai plugins will probably provide the biggest benefit:

http://wiki.opscode.com/display/chef/Disabling+Ohai+Plugins

You can also experiment with Ruby Enterprise Edition. The changes to memory allocation and GC tend to reduce heap fragmentation, which will in turn reduce RSS when chef-client is sleeping.

--
Dan DeLeo

I've been working on disabling plugins, but it doesn't seem to be working.
I've added these to client.rb, but when running the client with debug
output they still get loaded. Does order matter?

Ohai::Config[:disabled_plugins] = ["passwd"]
Ohai::Config[:disabled_plugins] = ["rackspace"]
Ohai::Config[:disabled_plugins] = ["dmi"]
Ohai::Config[:disabled_plugins] = ["dmi_common"]
Ohai::Config[:disabled_plugins] = ["erlang"]
Ohai::Config[:disabled_plugins] = ["groovy"]
Ohai::Config[:disabled_plugins] = ["php"]
Ohai::Config[:disabled_plugins] = ["eucalyptus"]
Ohai::Config[:disabled_plugins] = ["network_listeners"]
Ohai::Config[:disabled_plugins] = ["mono"]

On Fri, Dec 9, 2011 at 8:34 AM, Daniel DeLeo dan@kallistec.com wrote:

Ohai::Config[:disabled_plugins] = [ "passwd", "rackspace", "dmi", "dmi_common" ]

Use a single assignment like this instead of N separate calls to Ohai::Config, each of which overwrites the previous value.
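
The reason the original client.rb did not work is ordinary Ruby assignment semantics. In this sketch a plain Hash stands in for Ohai::Config so it runs without Ohai installed; in client.rb the final assignment would target Ohai::Config[:disabled_plugins] instead:

```ruby
# A plain Hash stands in for Ohai::Config so this runs without Ohai.
config = {}

# Pitfall: each assignment replaces the previous array instead of adding to it.
config[:disabled_plugins] = ["passwd"]
config[:disabled_plugins] = ["rackspace"]
config[:disabled_plugins] = ["mono"]
after_separate_assignments = config[:disabled_plugins] # only ["mono"] is left

# Fix: one assignment with every plugin to disable. In client.rb this would be
# Ohai::Config[:disabled_plugins] = [...] with the same list.
config[:disabled_plugins] = %w(passwd rackspace dmi dmi_common erlang groovy
                               php eucalyptus network_listeners mono)
after_single_assignment = config[:disabled_plugins]
```

This also explains why only the last plugin in the original list stayed disabled while the others still showed up in the debug output.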

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Dec 9, 2011, at 9:45 AM, Chris wrote:

Yep, that worked. Thanks.

Client is sitting at 138m RSS now, which is a lot better. Hopefully it will
stop creeping up over time as well.

Thanks for all the help everyone.

On Fri, Dec 9, 2011 at 8:46 AM, Adam Jacob adam@opscode.com wrote:

On Dec 9, 2011, at 11:58 AM, Chris wrote:

Yep, that worked. Thanks.

Client is sitting at 138m RSS now, which is a lot better. Hopefully it will stop creeping up over time as well.

Lately, we've been running monit (for various services) and we have it restart chef-client when memory is above x MB for n minutes.
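
The rule described here ("above x MB for n minutes") is a consecutive-sample check. monit expresses it in its own configuration language; the sketch below only shows the logic in Ruby, with hypothetical thresholds, and leaves taking the samples and performing the restart to the caller:

```ruby
# Signal a restart once memory has stayed above a limit for a number of
# consecutive samples. Thresholds are hypothetical; feed it RSS samples
# taken at a fixed interval (e.g. once a minute).
class MemoryWatchdog
  def initialize(limit_mb, cycles)
    @limit_mb = limit_mb
    @cycles = cycles
    @over = 0
  end

  # Returns true when the process has been over the limit for @cycles
  # consecutive samples; the counter resets after signalling a restart.
  def observe(rss_mb)
    if rss_mb > @limit_mb
      @over += 1
    else
      @over = 0
    end
    return false if @over < @cycles
    @over = 0
    true
  end
end
```

For example, sampling RSS once a minute with MemoryWatchdog.new(300, 5) approximates "above 300 MB for 5 minutes"; a single dip below the limit resets the count, so short spikes during a run do not trigger a restart.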

--Brian

That's not a bad idea either; we already have monit running on each client node to restart the client when it crashes.

Sent from a phone

On Dec 12, 2011, at 4:34 PM, Brian Akins brian@akins.org wrote:

I haven't actually had a chance to play with these myself, but if
you're on a modern Linux distro, you may be able to use cgroups to
isolate memory usage on a per-process basis (i.e., to keep your Java
processes safe).
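
With the cgroup v1 memory controller, capping a process amounts to creating a group, writing a limit, and moving the PID into it. A minimal sketch, assuming a kernel with cgroup support (so not CentOS 5), root privileges, and a controller mounted at a path you pass in (RHEL 6's libcgroup typically uses /cgroup/memory); the group name and helper are hypothetical:

```ruby
require "fileutils"

# Cap a process's memory with the cgroup v1 memory controller.
# Assumptions: the controller is mounted at cgroup_root, the caller is root,
# and the group name "chef-client" is a hypothetical choice.
def limit_process_memory(pid, limit_bytes, cgroup_root, group = "chef-client")
  dir = File.join(cgroup_root, group)
  # On a real cgroup filesystem, mkdir creates the control files for us.
  FileUtils.mkdir_p(dir)
  # Hard limit for everything in the group, in bytes.
  File.open(File.join(dir, "memory.limit_in_bytes"), "w") { |f| f.write(limit_bytes.to_s) }
  # Writing a PID to "tasks" moves that process into the group.
  File.open(File.join(dir, "tasks"), "w") { |f| f.write(pid.to_s) }
  dir
end
```

This caps chef-client rather than shrinking it: when the group reaches its limit, the kernel reclaims memory from it and, failing that, may OOM-kill a process inside the group, which leaves the Java processes outside the group untouched.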

-s

On Mon, Dec 12, 2011 at 8:32 PM, Chris grocerylist@gmail.com wrote:

Yeah, I wouldn't call CentOS 5.5 modern; I'll have to look and see if we can even support cgroups.

On Dec 20, 2011, at 8:56 AM, Sean OMeara someara@gmail.com wrote:

cgroups - Wikipedia
Control-groups in rhel6 - All things sysadmin

Sadly, cgroups are only available on CentOS 6.

I run chef-client under a cron job rather than as a daemon; that
will reduce memory usage over time but won't protect you from spikes.

I don't know whether Ruby's VM takes GC tuning options like Java's does, but that could help.
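
Ruby Enterprise Edition, suggested earlier in the thread, does read GC tuning settings from environment variables at interpreter startup, among them RUBY_HEAP_MIN_SLOTS, RUBY_HEAP_SLOTS_INCREMENT, RUBY_HEAP_SLOTS_GROWTH_FACTOR, and RUBY_GC_MALLOC_LIMIT; stock MRI 1.8.7 ignores them. A minimal sketch of a wrapper that sets them before starting chef-client; the values are hypothetical placeholders to tune by measuring RSS, not recommendations:

```ruby
# Placeholder values only; measure RSS before and after changing them.
# REE reads these at interpreter startup; stock MRI 1.8.7 ignores them.
GC_TUNING = {
  "RUBY_HEAP_MIN_SLOTS"           => "10000",
  "RUBY_HEAP_SLOTS_INCREMENT"     => "10000",
  "RUBY_HEAP_SLOTS_GROWTH_FACTOR" => "1", # grow the heap linearly, not geometrically
  "RUBY_GC_MALLOC_LIMIT"          => "8000000"
}

# Add the tuning variables to an environment hash, keeping any value the
# operator has already set.
def gc_tuned_env(base_env)
  env = base_env.dup
  GC_TUNING.each { |name, value| env[name] = value unless env.key?(name) }
  env
end

# Usage in a small wrapper (works on 1.8.7 and later); the variables must be
# in the environment before the chef-client interpreter starts:
#   ENV.update(gc_tuned_env(ENV.to_hash))
#   exec("chef-client")
```

Smaller heap increments generally trade extra GC work for a tighter heap, which is the direction that matters on memory-starved VMs.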

On Tue, Dec 20, 2011 at 8:29 PM, Chris grocerylist@gmail.com wrote:

On Dec 21, 2011, at 7:34 AM, Bryan Berry wrote:

sadly, cgroups are only available on centos 6

Wouldn't cgroups just limit the amount of memory chef could use? It wouldn't really "solve" the problem.

Just run chef-client from cron and come up with a simple way to trigger it. Something simple in (x)inetd should be good enough.

FWIW, we're a few weeks into using monit with chef-client, and so far so good.
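
Running from cron plus an on-demand trigger raises one practical question: what happens if both fire at once. A minimal sketch of a guard both entry points could share, using an exclusive non-blocking flock; the lock path and helper name are hypothetical, and the (x)inetd wiring itself is not shown:

```ruby
# Run the given block only if no other caller holds the lock; otherwise
# return false immediately instead of waiting.
def run_exclusively(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    # flock with LOCK_NB returns false when another open file holds the lock.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end

# Usage from cron or the trigger (hypothetical lock path):
#   run_exclusively("/var/run/chef-client.lock") { system("chef-client") }
```

Because the lock is released when the file is closed, a crashed run does not leave a stale lock behind, unlike a plain PID file.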

That would actually help, since the client's memory can balloon to 400 MB on some systems.
I suggested running from cron with a random interval, but the spike issue came up. I think we're just going to have to buy more memory.
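
A randomized schedule staggers runs across hosts, which helps with load on the Chef server and on shared hypervisors, but it does nothing for each run's own memory spike. A minimal sketch of a stable per-host offset, assuming the hostname is a good enough seed; splay_offset and hourly_cron_line are hypothetical helpers:

```ruby
require "zlib"

# Stable per-host offset in [0, interval): the same hostname always maps to
# the same slot, while different hosts tend to spread across the interval.
def splay_offset(hostname, interval)
  Zlib.crc32(hostname) % interval
end

# An /etc/cron.d-style line (minute, hour, dom, month, dow, user, command)
# for an hourly run at this host's minute.
def hourly_cron_line(hostname, command = "chef-client")
  "#{splay_offset(hostname, 60)} * * * * root #{command}"
end
```

Deriving the offset from the hostname, rather than from a random number at install time, means a rebuilt VM keeps the same slot.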

On Dec 21, 2011, at 6:50 AM, Brian Akins brian@akins.org wrote:

On 21 December 2011 15:32, Chris grocerylist@gmail.com wrote:

That would actually help, since the client memory can balloon to 400m on some systems
I suggested running from cron with a random interval, but the spike issue came up. I think we're just going to have to buy more memory

Buying a little more memory is probably a very quick ROI versus not
having automation.

I agree, but my ops team does not.

On Dec 22, 2011, at 4:43 PM, Alex Howells lists@howells.me wrote:
