Odd errors after restoring couchdb onto new chef server


#1

hi. i’m trying to retire one chef server, and create a new one by loading
couchdb backups from the old one. i’m taking the restored chef server
for a spin and seeing odd errors. i’d like to ask about two things here:

  1. previous knife search results are no longer present on new server

  2. weird errors running a chef-client against new server (details below)

it makes me wonder if my couchdb load is borked. i do pass "--ignore-errors"
to couchdb-load because it won’t load without that … :>

the first question:

here’s the old chef server i’m attempting to retire. i do indeed get
results from this query:

[cheftain02-auw2p chef-repo]$ knife search node 'role:postfix-server' | grep ^Node
Node Name: admin1.venus.spergacula.com
Node Name: admin2.venus.spergacula.com

new chef server which was restored from the old one’s couchdb:

[cheftain04-auw2p chef-repo]$ knife search node 'role:postfix-server'
0 items found

… nothing. why?

the second question involves lots of debugging output, seen here:
https://gist.github.com/1904175

this gist shows me trying to run cheftain04 as a chef-client against
itself. doing this worked for me on the old server, cheftain02.

any idea why i’m seeing the 2 kinds of errors therein?

ERROR: Server returned error for https://chef.venus.spergacula.com/cookbooks/users/1.0.0/files/[snip]

and

[Fri, 24 Feb 2012 21:55:03 +0000] DEBUG: Re-raising exception: EOFError - cookbook_file[/home/billeh/.ssh/known_hosts] (users::user-file-dist line 49) had an error: EOFError: end of file reached

i note that i was able to run a different chef-client against the new
chef server successfully (after i disabled something in a recipe that
relied on search results that are currently missing).

thanks much,
kallen


#2

On 25 February 2012 11:28, kallen@groknaut.net wrote:

hi. i’m trying to retire one chef server, and create a new one by loading
couchdb backups from the old one. i’m taking the restored chef server
for a spin and seeing odd errors. i’d like to ask about two things here:

  1. previous knife search results are no longer present on new server

you have to rebuild the index (solr), knife index rebuild

  2. weird errors running a chef-client against new server (details below)

did you upload/copy the cookbooks to the new server? are they in the
file cache? it kind of looks like couchdb has information about the
files, but the files aren’t on the chef-server file system (hazarding
a guess)

--AJ

the second question involves lots of debugging output, seen here:
https://gist.github.com/1904175



#3

On Sat, 25 Feb 2012, AJ Christensen wrote:

On 25 February 2012 11:28, kallen@groknaut.net wrote:

hi. i’m trying to retire one chef server, and create a new one by loading
couchdb backups from the old one. i’m taking the restored chef server
for a spin and seeing odd errors. i’d like to ask about two things here:

  1. previous knife search results are no longer present on new server

you have to rebuild the index (solr), knife index rebuild

aha. makes sense. done.

  2. weird errors running a chef-client against new server (details below)

did you upload/copy the cookbooks to the new server? are they in the
file cache? it kind of looks like couchdb has information about the
files, but the files aren’t on the chef-server file system (hazarding
a guess)

i hadn’t done an upload/copy of the cookbooks to the new server because
i thought the cookbooks were loaded in the couchdb-load. when i run
knife cookbook list, it looks like all my cookbooks are already there.

regardless, i just reuploaded all cookbooks from my chef-repo to the new
server. i saw no upload errors. now, running chef-client on a
non-chef-server node, pointing chef_server_url directly at port 4000
rather than hitting nginx proxy over https, is erroring out on
"EOFError - end of file reached", which is similar to error shown in gist.

[Fri, 24 Feb 2012 23:13:08 +0000] DEBUG: Sending HTTP Request via GET to chef.venus.spergacula.com:4000/cookbooks/mrepo/0.0.1/files/c0cffe50c70191353d8bd9a8bd568ce5
[Fri, 24 Feb 2012 23:13:08 +0000] ERROR: Running exception handlers
[Fri, 24 Feb 2012 23:13:08 +0000] FATAL: Saving node information to /var/cache/chef/failed-run-data.json
[Fri, 24 Feb 2012 23:13:08 +0000] ERROR: Exception handlers complete
[Fri, 24 Feb 2012 23:13:08 +0000] DEBUG: Re-raising exception: EOFError - end of file reached
/usr/lib/ruby/1.8/net/protocol.rb:135:in `sysread'
/usr/lib/ruby/1.8/net/protocol.rb:135:in `rbuf_fill'
/usr/lib/ruby/1.8/timeout.rb:67:in `timeout'
/usr/lib/ruby/1.8/timeout.rb:101:in `timeout'

i’m looking at chef-server debug log and couchdb log. but nothing is
jumping out at me as relevant. but i may not recognize relevancy…

last 3 lines of couchdb log fwiw:

[Fri, 24 Feb 2012 23:18:13 GMT] [info] [<0.17029.0>] 127.0.0.1 - - 'POST' /chef/_all_docs?include_docs=true 200
[Fri, 24 Feb 2012 23:18:14 GMT] [info] [<0.17030.0>] 127.0.0.1 - - 'GET' /chef/_design/id_map/_view/name_to_id?key=[%22client%22,%22admin2.venus.spergacula.com%22]&include_docs=true 200
[Fri, 24 Feb 2012 23:18:14 GMT] [info] [<0.17031.0>] 127.0.0.1 - - 'GET' /chef/_design/id_map/_view/name_to_id?key=[%22cookbook_version%22,%22mrepo-0.0.1%22]&include_docs=true 200

thoughts?



#4

On Fri, 24 Feb 2012, kallen@groknaut.net wrote:

trying various things, flailing. for the heck of it:

  • deleted possibly problematic cookbook (mrepo) and reuploaded. same errors.
  • deleted client node which i’m testing on (admin2), reran, same errors.
  • reloaded couchdb dump again

and now a different cookbook download claims “EOFError - end of file reached”.

[Sat, 25 Feb 2012 02:14:22 +0000] DEBUG: Sending HTTP Request via GET to cheftain04.venus.spergacula.com:4000/cookbooks/begin/0.0.1/files/e447bc0fcbef6dafe72007fe70507767
[Sat, 25 Feb 2012 02:14:22 +0000] ERROR: Running exception handlers
[Sat, 25 Feb 2012 02:14:22 +0000] FATAL: Saving node information to /var/cache/chef/failed-run-data.json
[Sat, 25 Feb 2012 02:14:22 +0000] ERROR: Exception handlers complete
[Sat, 25 Feb 2012 02:14:22 +0000] DEBUG: Re-raising exception: EOFError - end of file reached
/usr/lib/ruby/1.8/net/protocol.rb:135:in `sysread'
/usr/lib/ruby/1.8/net/protocol.rb:135:in `rbuf_fill'

dunno what to do here…


#5

On Friday, February 24, 2012 at 6:18 PM, kallen@groknaut.net wrote:

dunno what to do here…

The files that comprise your cookbooks aren’t stored in couch; they’re stored according to their MD5 under whatever you’ve configured for checksum_path in your server.rb. When you back up and restore your chef server, you need to restore these files as well.

The error you’re seeing occurs because the file transfer happens outside of the normal request cycle, so server-side errors aren’t converted to 500s[1]. The client just sees the connection close, hence the EOFError.

To fix it, you can either bulk delete all cookbooks using the --purge option and upload them again, or copy the files over from the old server’s checksum_path.

HTH,


Dan DeLeo

[1] Merb is trying to hand off the work to another thread (in the case of threaded servers) or the event loop (in the case of evented servers like thin) so that the transfer can happen while other requests are processed. I’m not sure if this even works any more. In any case, I believe a user has contributed a fix for this problem.
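
A rough sketch of how a restored checksum_path could be sanity-checked before pointing clients at the new server. It is not taken from the thread: it assumes that every file under checksum_path is named after the MD5 hex digest of its own contents, and that `find` and GNU `md5sum` are available. The helper names `has_checksum` and `check_checksum_dir` are hypothetical, and the directory argument is whatever checksum_path is set to in server.rb.

```shell
#!/bin/sh
# Sketch only. Assumption (not confirmed in this thread): every file stored
# under checksum_path is named after the MD5 hex digest of its own contents.

# has_checksum DIR MD5: succeed if a file named MD5 exists anywhere under DIR.
has_checksum() {
    [ -n "$(find "$1" -type f -name "$2" | head -n 1)" ]
}

# check_checksum_dir DIR: print "MISMATCH <path>" for each file whose
# contents do not hash to its own file name.
check_checksum_dir() {
    find "$1" -type f | while IFS= read -r f; do
        expected=$(basename "$f")
        # Read from stdin so md5sum prints only the digest and "-"
        actual=$(md5sum < "$f" | cut -d' ' -f1)
        if [ "$expected" != "$actual" ]; then
            echo "MISMATCH $f"
        fi
    done
}
```

For example, running `has_checksum` with your checksum_path and c0cffe50c70191353d8bd9a8bd568ce5 would show whether the file behind the failing mrepo GET earlier in the thread actually made it onto the new server.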

#6

just to close out this thread:

to solve this on the newly built server, i purged all cookbooks, then
reuploaded all from my chef-repo in git. i didn’t attempt to back up and
restore the checksum_path dir.
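
The purge-and-reupload approach described above could be scripted roughly as follows. This is a sketch, not the exact commands that were run: it assumes knife's `cookbook bulk delete` and `cookbook upload` subcommands with the `--purge`, `--yes` and `--all` options, run from inside the chef-repo. The `run` wrapper, the `DRY_RUN` variable and the `rebuild_cookbooks` name are hypothetical additions so the commands can be previewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: purge every cookbook on the server, then upload
# everything again from the local chef-repo.
rebuild_cookbooks() {
    # '.*' is a regex matching every cookbook name; --purge also removes
    # the stored files, and --yes answers the confirmation prompts.
    run knife cookbook bulk delete '.*' --purge --yes
    # Upload all cookbooks found in the configured cookbook path.
    run knife cookbook upload --all
}
```

Calling `DRY_RUN=1 rebuild_cookbooks` only prints the two knife commands; calling `rebuild_cookbooks` on its own executes them, which is destructive on the server side, so it only makes sense when the chef-repo is the source of truth.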

the new restored server seems to be working ok.

thanks for the help!

kallen
