Erchef errors


#1

Hi,

All of a sudden my Chef server (version 11.4) started to act up. I'm getting these errors:

PostgreSQL

2013-11-26_17:09:25.72986 STATEMENT: INSERT INTO checksums(org_id, checksum)
2013-11-26_17:09:25.72986 VALUES ($1, $2)
2013-11-26_17:09:26.51887 ERROR: duplicate key value violates unique constraint "checksums_pkey"
2013-11-26_17:09:26.51891 DETAIL: Key (org_id, checksum)=(00000000000000000000000000000000, 4dc4a7ba29b5ae8e48d6fe7745a5de0e) already exists.
2013-11-26_17:09:26.51892 STATEMENT: INSERT INTO checksums(org_id, checksum)
2013-11-26_17:09:26.51893 VALUES ($1, $2)
2013-11-26_17:09:27.23903 ERROR: duplicate key value violates unique constraint "checksums_pkey"
2013-11-26_17:09:27.23909 DETAIL: Key (org_id, checksum)=(00000000000000000000000000000000, 4e10085c11e681d11be36a169c5d10f4) already exists.
2013-11-26_17:09:27.23910 STATEMENT: INSERT INTO checksums(org_id, checksum)
2013-11-26_17:09:27.23911 VALUES ($1, $2)

Erchef

2013-11-26_17:22:28.88338 webmachine error: path="/environments/web_uat/cookbook_versions"
2013-11-26_17:22:28.88338 {error,
2013-11-26_17:22:28.88338 {error,
2013-11-26_17:22:28.88339 {badrecord,chef_cookbook_version},
2013-11-26_17:22:28.88339 [{chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',1,
2013-11-26_17:22:28.88339 [{file,"src/chef_wm_depsolver.erl"},{line,266}]},
2013-11-26_17:22:28.88340 {chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',1,
2013-11-26_17:22:28.88340 [{file,"src/chef_wm_depsolver.erl"},{line,268}]},
2013-11-26_17:22:28.88340 {chef_wm_depsolver,assemble_response,3,
2013-11-26_17:22:28.88341 [{file,"src/chef_wm_depsolver.erl"},{line,266}]},
2013-11-26_17:22:28.88341 {webmachine_resource,resource_call,3,
2013-11-26_17:22:28.88342 [{file,"src/webmachine_resource.erl"},{line,166}]},
2013-11-26_17:22:28.88342 {webmachine_resource,do,3,
2013-11-26_17:22:28.88342 [{file,"src/webmachine_resource.erl"},{line,125}]},
2013-11-26_17:22:28.88343 {webmachine_decision_core,resource_call,1,
2013-11-26_17:22:28.88343 [{file,"src/webmachine_decision_core.erl"},{line,48}]},
2013-11-26_17:22:28.88343 {webmachine_decision_core,decision,1,
2013-11-26_17:22:28.88344 [{file,"src/webmachine_decision_core.erl"},{line,458}]},
2013-11-26_17:22:28.88346 {webmachine_decision_core,handle_request,2,
2013-11-26_17:22:28.88348 [{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}

Nginx

… truncated
"POST /environments/web_prod/cookbook_versions HTTP/1.1" 500 "201.387" 36 "-" "Chef Client/11.4.0

Everything points to some problem related to cookbook versions, but I
can't figure out what it is.

thanks!

-silviu


#2

Hi,

Answering my own question, in the hope that it helps somebody.

Symptoms on the Chef server:

  • higher load than usual (from 0.5-1 up to 4-5)
  • beam.smp constantly eating CPU
  • all the errors I posted earlier, in the log files

I spent some time looking at all the components, reading the .erl files,
etc. I also noticed connection exceptions from chef_wm to PostgreSQL, even
though the database was doing nothing: it allows a maximum of 200
connections, the pool size is 20, and the pool was never fully used.

The problem turned out to be something as stupid as this: somebody had
written the wrong entry for the Chef server into /etc/hosts. After I
corrected it, everything worked fine.
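If you want to check for this kind of mistake quickly, the sketch below lists every address that an /etc/hosts-style file assigns to a given name, so a stale or duplicate entry for the Chef server stands out. It is a minimal Python sketch: `hosts_entries_for` and the `chef.example.com` name are hypothetical, and it only looks at the hosts file itself, not at DNS or anything else the resolver may be configured to consult.

```python
def hosts_entries_for(hostname, hosts_text):
    """Return every address that an /etc/hosts-style text maps to hostname.

    Each non-comment line is "ADDRESS NAME [ALIAS ...]"; '#' starts a comment.
    Names are compared case-insensitively.
    """
    wanted = hostname.lower()
    addresses = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


# Hypothetical hosts file: a stale entry sits above the correct one.
sample = """\
127.0.0.1   localhost
# added by hand at some point
10.0.0.99   chef.example.com chef
10.0.0.5    chef.example.com
"""

print(hosts_entries_for("chef.example.com", sample))  # ['10.0.0.99', '10.0.0.5']
```

On the server you would pass the real file, e.g. `hosts_entries_for("chef.example.com", open("/etc/hosts").read())`. More than one address for the same name, or an address that is not the server's, is worth a closer look.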

cheers!

PS: /var/opt/chef-server/postgresql/data/pg_hba.conf contains ONLY IP
addresses, so there must be something else that uses the hostname/IP from
the hosts file.



#3

Hi there,

silviudicu@gmail.com writes:

> Answering my own question, in the hope that it helps somebody.

Sorry not to have been able to respond sooner.

> Symptoms on the Chef server:
>
>   • higher load than usual (from 0.5-1 up to 4-5)
>   • beam.smp constantly eating CPU
>   • all the errors I posted earlier, in the log files

On 11.0.8, erchef's beam.smp process pegging the CPU is a strong indication
that you’re hitting a bug in the dependency solver [1].

[1] https://tickets.opscode.com/browse/CHEF-3921

> I spent some time looking at all the components, reading the .erl files,
> etc. I also noticed connection exceptions from chef_wm to PostgreSQL, even
> though the database was doing nothing: it allows a maximum of 200
> connections, the pool size is 20, and the pool was never fully used.
>
> The problem turned out to be something as stupid as this: somebody had
> written the wrong entry for the Chef server into /etc/hosts. After I
> corrected it, everything worked fine.

Chef Server needs to generate URLs to send to clients for cookbook
content upload and download, so it is possible that changes in /etc/hosts
could affect functionality there. Such changes are unlikely to be related
to the high CPU use of the erchef beam process, though.

> PS: /var/opt/chef-server/postgresql/data/pg_hba.conf contains ONLY IP
> addresses, so there must be something else that uses the hostname/IP from
> the hosts file.

Yep, erchef treats the bookshelf service as an external
service. Cookbook upload and cookbook content download rely on erchef
being able to generate a URL to the bookshelf service (by default at
https://$SERVER_FQDN/bookshelf) that is routable by clients.
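The sketch below illustrates that requirement from the outside: it builds the default bookshelf URL from an FQDN and resolves the URL's hostname on the machine where it runs. It is a hypothetical diagnostic in Python, not erchef code (erchef is written in Erlang), and `default_bookshelf_url` and `resolved_addresses` are names made up for this example.

```python
import socket
from urllib.parse import urlparse


def default_bookshelf_url(server_fqdn):
    """Default bookshelf location as described above: https://$SERVER_FQDN/bookshelf."""
    return f"https://{server_fqdn}/bookshelf"


def resolved_addresses(url):
    """Sorted IP addresses the URL's host resolves to on this machine ([] on failure)."""
    host = urlparse(url).hostname
    if not host:
        return []
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


url = default_bookshelf_url("chef.example.com")
print(url)  # https://chef.example.com/bookshelf
```

Comparing `resolved_addresses(default_bookshelf_url("chef.example.com"))` on a client and on the server is one quick way to notice a hosts-file entry that sends one side somewhere unexpected.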

On Tue, Nov 26, 2013 at 12:28 PM, Silviu Dicu silviudicu@gmail.com wrote:

> Hi,
>
> All of a sudden my Chef server (version 11.4) started to act up. I'm getting these errors:
>
> PostgreSQL
>
> 2013-11-26_17:09:25.72986 STATEMENT: INSERT INTO checksums(org_id, checksum)
> 2013-11-26_17:09:25.72986 VALUES ($1, $2)
> 2013-11-26_17:09:26.51887 ERROR: duplicate key value violates unique constraint "checksums_pkey"
> 2013-11-26_17:09:26.51891 DETAIL: Key (org_id, checksum)=(00000000000000000000000000000000, 4dc4a7ba29b5ae8e48d6fe7745a5de0e) already exists.

These errors are (a bit sad to say) expected and happen under normal
operation. We use pg’s foreign key constraints to do reference counting
on cookbook content (checksums). The checksummed cookbook content is
shared across cookbooks and cookbook versions. When we update or delete a
cookbook version, the query we run attempts to insert or delete all of
the associated checksums, expecting each operation to succeed only where
the constraints allow it. A side effect of this approach is errors in the
log. We have a ticket to address this using a stored procedure.
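The same pattern can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the schema (a `checksums` table keyed on `(org_id, checksum)` plus a hypothetical `cookbook_version_checksums` link table) is an assumption modeled on the log above, not Chef's actual schema. A duplicate insert and a delete of a still-referenced checksum both fail with a constraint error, and the caller treats that failure as the answer rather than as a fault.

```python
import sqlite3

ORG = "00000000000000000000000000000000"
CHECKSUM = "4dc4a7ba29b5ae8e48d6fe7745a5de0e"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
# Hypothetical schema modeled on the log: checksums keyed on (org_id, checksum),
# plus a link table whose foreign key keeps referenced checksums alive.
conn.executescript("""
CREATE TABLE checksums (
    org_id   TEXT NOT NULL,
    checksum TEXT NOT NULL,
    PRIMARY KEY (org_id, checksum)
);
CREATE TABLE cookbook_version_checksums (
    cookbook_version_id INTEGER NOT NULL,
    org_id   TEXT NOT NULL,
    checksum TEXT NOT NULL,
    FOREIGN KEY (org_id, checksum) REFERENCES checksums (org_id, checksum)
);
""")


def attempt(sql, params):
    """Run one statement; True if it succeeded, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        # SQLite's counterpart of PostgreSQL's duplicate-key / FK violation errors
        return False


ADD = "INSERT INTO checksums (org_id, checksum) VALUES (?, ?)"
DROP = "DELETE FROM checksums WHERE org_id = ? AND checksum = ?"
LINK = "INSERT INTO cookbook_version_checksums VALUES (?, ?, ?)"
UNLINK = "DELETE FROM cookbook_version_checksums WHERE cookbook_version_id = ?"

results = [
    attempt(ADD, (ORG, CHECKSUM)),      # first upload: row stored
    attempt(ADD, (ORG, CHECKSUM)),      # same content again: duplicate key, rejected
    attempt(LINK, (1, ORG, CHECKSUM)),  # cookbook version 1 now references it
    attempt(DROP, (ORG, CHECKSUM)),     # still referenced: FK blocks the delete
    attempt(UNLINK, (1,)),              # version 1 deleted
    attempt(DROP, (ORG, CHECKSUM)),     # no references left: delete succeeds
]
print(results)  # [True, False, True, False, True, True]
```

PostgreSQL records such errors in its own server log even when the client handles them, which is why they show up there during normal operation.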

> Erchef
>
> 2013-11-26_17:22:28.88338 webmachine error: path="/environments/web_uat/cookbook_versions"
> 2013-11-26_17:22:28.88338 {error,
> 2013-11-26_17:22:28.88338 {error,
> 2013-11-26_17:22:28.88339 {badrecord,chef_cookbook_version},
> 2013-11-26_17:22:28.88339 [{chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',1,
> 2013-11-26_17:22:28.88339 [{file,"src/chef_wm_depsolver.erl"},{line,266}]},
> 2013-11-26_17:22:28.88340 {chef_wm_depsolver,'-assemble_response/3-lc$^0/1-0-',1,

This one is not expected, but it is difficult to tell from this trace
whether there’s bad data in the db (I doubt it); it's more likely that we
encountered an error fetching cookbook content.

> Nginx
>
> … truncated
> "POST /environments/web_prod/cookbook_versions HTTP/1.1" 500 "201.387" 36 "-" "Chef Client/11.4.0

This is the call made by chef-client to resolve dependencies.
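For context, here is a sketch of what that request carries: the body is a JSON object with a `run_list` array, as in the Chef Server API documentation for `POST /environments/NAME/cookbook_versions`, and the server answers with the cookbook versions (including dependencies) that satisfy it. This Python helper only builds the path and body; the helper name and cookbook names are hypothetical, and a real request also needs Chef's signed authentication headers, which are not shown.

```python
import json


def depsolver_request(environment, run_list):
    """Return (path, JSON body) for a cookbook_versions dependency-solving request.

    Entries follow the API docs' examples: a cookbook name, optionally pinned
    with "name@version".
    """
    path = f"/environments/{environment}/cookbook_versions"
    body = json.dumps({"run_list": list(run_list)})
    return path, body


path, body = depsolver_request("web_prod", ["apache2", "mysql@1.2.0"])
print(path)  # /environments/web_prod/cookbook_versions
print(body)  # {"run_list": ["apache2", "mysql@1.2.0"]}
```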

  • seth


Seth Falcon | Development Lead | Opscode | @sfalcon