Dependency solver overloaded issue


#1

I’m seeing occasional errors on chef-client runs which cause the runs to fail. I originally thought it was a resource issue on the Chef server (open source Chef server v11.1.1 on Ubuntu LTS), so I resized the server and also set erchef['db_pool_size'] = 40 and postgresql['max_connections'] = 400. However, that didn’t resolve it.

I suspect there is a hint in the erchef error below, "msg=no_depsolver_workers", but I don’t know how to debug this one further. I’d appreciate any hints/suggestions. Logs follow:

The error on the chef client run: (Note: hostnames have been changed to protect the innocent to a.b.com)

[2014-09-25T13:14:43+00:00] INFO: Starting Chef Run for a.b.com
[2014-09-25T13:14:43+00:00] INFO: Running start handlers
[2014-09-25T13:14:43+00:00] INFO: Start handlers complete.
[2014-09-25T13:14:43+00:00] INFO: HTTP Request Returned 404 Object Not Found:
[2014-09-25T13:14:44+00:00] INFO: HTTP Request Returned 503 Service Unavailable: Dependency solver overloaded. Try again later.
================================================================================
Error Resolving Cookbooks for Run List:
================================================================================

Server Unavailable
------------------
The Chef Server is temporarily unavailable

Server Response:
----------------
Dependency solver overloaded. Try again later.

I can’t find much in the server logs except the following:

nginx/access.log:
10.10.10.100 - - [25/Sep/2014:13:14:44 +0000] "POST /environments/production-0_5814/cookbook_versions HTTP/1.1" 503 "0.774" 60 "-" "Chef Client/11.12.8 (ruby-1.9.3-p484; ohai-7.0.4; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "503" "0.733" "11.12.8" "algorithm=sha1;version=1.0;" "a.b.com" "2014-09-25T13:14:43Z" "O66i4b7dFTSNpuu9trSDnzWDklE=" 1544

erchef/current
2014-09-25_13:14:44.64379 [error] {<<"method=POST; path=/environments/production-0_5814/cookbook_versions; status=503; ">>,"Service Unavailable"}

erchef/requests.log.4
2014-09-25T13:14:44Z erchef@127.0.0.1 method=POST; path=/environments/production-0_5814/cookbook_versions; status=503; user=a.b.com; req_id=tRvX5i45OFCH9a92qmPKGg==; msg=no_depsolver_workers; req_time=729; rdbms_time=709; rdbms_count=3; depsolver_time=0; depsolver_count=1;

thanks!
mike

Michael Hart
Arctic Wolf Networks
M: 226-388-4773


#2

Did you ever find a solution to this? We’re seeing this exact behavior with
our Chef server, and there doesn’t seem to be much on this out in the wild



#3

On Monday, October 20, 2014 at 9:46 AM, Sean Clemmer wrote:

Did you ever find a solution to this? We’re seeing this exact behavior with our Chef server, and there doesn’t seem to be much on this out in the wild


This isn’t really a bug per se. You’re giving the server more dependency solving problems than it can handle at a time. Your options are:

  • Increase the number of dep solver workers. This means you’re throwing more CPUs at the problem, so it won’t work unless you have spare CPU capacity on the server. See http://docs.getchef.com/config_rb_chef_server_optional_settings.html#erchef
  • Use more splay so that chef-client runs start at different times.
  • Lock down your dependencies to exact versions using environments, to make the dependency solving problem easier for the server. This entails some workflow changes. The environment cookbook pattern is one way of doing this; another is to put all of your cookbooks in one repo and have CI or some other tool generate your environments.
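
As a rough illustration of the last option, here is a minimal sketch of a tool that generates an environment with exact pins. The function name, the description text, and the cookbook names and versions are made up for the example; only the environment JSON layout and the "= x.y.z" constraint form are standard Chef.

```ruby
require 'json'

# Build a Chef environment hash that pins every cookbook to one exact version.
# `pins` maps cookbook name => version string (hypothetical input, e.g. from CI).
def pinned_environment(name, pins)
  {
    'name' => name,
    'description' => "Exact cookbook pins for #{name}",
    'json_class' => 'Chef::Environment',
    'chef_type' => 'environment',
    'default_attributes' => {},
    'override_attributes' => {},
    # "= x.y.z" is an exact-match constraint, so the solver has one candidate per cookbook
    'cookbook_versions' => pins.sort.map { |cookbook, version| [cookbook, "= #{version}"] }.to_h
  }
end

# Example with made-up cookbooks and versions
env = pinned_environment('production', 'ntp' => '1.6.2', 'apache2' => '2.0.0')
puts JSON.pretty_generate(env)
```

The output could be saved as production.json and uploaded with `knife environment from file production.json`.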

HTH,


Daniel DeLeo


#4

Do you have cookbooks that depend on themselves? i.e.:

name "foo"

depends "foo"
depends "otherthings"

If you do, get rid of the self-dependency. It is unnecessary and it
will cause loops in the depsolver for every cookbook version you’ve
uploaded (which can be solved, but takes time). Then prune all your old
cookbook versions that aren’t being used any more.
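
A quick way to look for this is to scan each metadata.rb. The sketch below is only an assumption about how one might do it: the helper name is invented, and it only recognizes the plain literal forms `name "foo"` and `depends "foo"` (no parentheses, no computed names).

```ruby
# Return the cookbook name if the metadata text declares a dependency on
# itself, otherwise nil. Only the simple quoted-literal forms are recognized.
def self_dependency(metadata_text)
  name = metadata_text[/^\s*name\s+["']([^"']+)["']/, 1]
  return nil if name.nil?

  deps = metadata_text.scan(/^\s*depends\s+["']([^"']+)["']/).flatten
  deps.include?(name) ? name : nil
end

# Assumed repo layout: cookbooks/<cookbook>/metadata.rb
Dir.glob('cookbooks/*/metadata.rb').each do |path|
  hit = self_dependency(File.read(path))
  puts "#{path}: depends on itself (#{hit})" if hit
end
```

Note that this only checks the working copy; versions already uploaded to the server would still need to be pruned separately.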


#5

Thanks guys. I believe it’s just a thundering herd problem we’re inflicting
upon ourselves: we run chef-client manually on our entire infrastructure at
once. It just seems to have gotten worse with Chef Server 11. I’ll look into
tuning and pruning.
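
For manually triggered fleet-wide runs, one possible way to soften the herd is to give each host its own delay inside a fixed window before it starts chef-client. This is a sketch under assumptions, not how chef-client's own `splay` setting is implemented: the function name, the 300-second window, and the host names are all made up.

```ruby
require 'zlib'

# Map a hostname to a stable delay in the range 0...window_seconds.
# The same host always gets the same delay, and different hosts usually differ.
def stagger_delay(hostname, window_seconds)
  Zlib.crc32(hostname) % window_seconds
end

# Hypothetical host list; print the delay each host would wait before its run
%w[web1.example.com web2.example.com db1.example.com].each do |host|
  puts "#{host}: wait #{stagger_delay(host, 300)}s before running chef-client"
end
```

A random delay would work just as well; the hash-based version just keeps each host's slot predictable from run to run.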



#6

On Monday, October 20, 2014 at 10:16 AM, Sean Clemmer wrote:

Thanks guys. I believe it’s just a thundering herd problem we’re inflicting upon ourselves: we run chef-client manually on our entire infrastructure at once. It just seems to have gotten worse with Chef Server 11. I’ll look into tuning and pruning.

A quick note about that: in Chef 10 server, the dep solver ran in the Ruby server process(es), so the number of workers was the number of unicorn workers (or the workers of whatever Ruby web server you might have used). On the same hardware, you should be able to get identical performance (though I’d expect it to be better, since we get better IO perf out of Postgres).


Daniel DeLeo


#7

We hit similar issues due to circular dependencies. This was after we started
vendoring our cookbooks with berks: in the Berkshelf 2.0 era it allowed
circular dependencies to be declared via transitive dependencies.
Berkshelf 3 fixed that.
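
If you want to check for such loops yourself, a depth-first search over the dependency graph is enough. The sketch below assumes you have already collected the dependencies into a hash of cookbook name => array of dependency names (how you build that hash is up to you); the function names and the example cookbooks are hypothetical.

```ruby
# deps maps cookbook name => array of cookbook names it depends on.
# Returns one cycle as an array of names (first == last), or nil if there is none.
def find_cycle(deps)
  state = {} # name => :visiting or :done
  deps.each_key do |name|
    cycle = walk(name, deps, state, [])
    return cycle if cycle
  end
  nil
end

# Depth-first walk; `path` holds the chain of cookbooks currently being visited.
def walk(node, deps, state, path)
  return nil if state[node] == :done
  # Reaching a node that is still being visited means we looped back onto the path
  return path[path.index(node)..-1] + [node] if state[node] == :visiting

  state[node] = :visiting
  path.push(node)
  deps.fetch(node, []).each do |dep|
    cycle = walk(dep, deps, state, path)
    return cycle if cycle
  end
  path.pop
  state[node] = :done
  nil
end

# Made-up example: a -> b -> c -> a
cycle = find_cycle('a' => ['b'], 'b' => ['c'], 'c' => ['a'])
puts cycle.join(' -> ') if cycle
```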

regards
ranjib



#8

You may just have too many cookbooks. I wrote something up on this a couple
of months ago:

http://www.pburkholder.com/post/93155061892/clearing-the-counter-cookbook-clutter-and-knife
