Bad Request "corruption"

Hi,
I have been using chef-client on Windows for the last few weeks. At
some point, the client simply stops working properly. I think it’s
couchdb related.

The stack trace on the client side looks like:
[Tue, 17 Aug 2010 09:07:20 -0700] WARN: HTTP Request Returned 500
Internal Server Error: 400 "Bad Request"
C:/Ruby187/lib/ruby/1.8/net/http.rb:2101:in `error!': 500 "Internal Server Error" (Net::HTTPFatalError)
from C:/Ruby187/lib/ruby/gems/1.8/gems/chef-0.9.8/bin/../lib/chef/rest.rb:229:in `api_request'

On the server with debug enabled, I see:
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Signatures match? : 'true'
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Request time difference:
1.149902, within 900 seconds? : true
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Expected content hash is:
'8C9sV3ubH2xoJ1cEq6/xwnPrL5o='
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Request Content Hash is:
'8C9sV3ubH2xoJ1cEq6/xwnPrL5o='
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Hashes match?: true
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via GET
to localhost:5984/chef/_design/id_map/_view/name_to_id
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via GET
to localhost:5984/chef/_design/id_map/_view/name_to_id
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via PUT
to localhost:5984/chef/a1773816-65ab-4026-a342-112ef9438106
[Tue, 17 Aug 2010 16:14:30 +0000] WARN: HTTP Request Returned 400 Bad
Request: bad_request

I can't come up with a set of reproducible steps; it just seems to
"happen" suddenly. That document_id obviously corresponds to the node,
but I am not sure how to get more information out of Couch.

I would appreciate thoughts on how I can debug this further and figure
out what's going on. This is a total show-stopper for me, since I
can't find a way to prevent or fix it short of deleting the node and
re-adding it.

Help!

Thanks,
Landon

Hi Landon,

A couple of ideas below...

On Tue, Aug 17, 2010 at 9:31 AM, Landon <chef@lclark.net> wrote:

The stack trace on the client side looks like:
[Tue, 17 Aug 2010 09:07:20 -0700] WARN: HTTP Request Returned 500
Internal Server Error: 400 "Bad Request"
C:/Ruby187/lib/ruby/1.8/net/http.rb:2101:in `error!': 500 "Internal Server Error" (Net::HTTPFatalError)
from C:/Ruby187/lib/ruby/gems/1.8/gems/chef-0.9.8/bin/../lib/chef/rest.rb:229:in `api_request'
.....

Once this starts happening, does it continue to occur for subsequent
chef-client runs?

On the server with debug enabled, I see:
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Signatures match? : 'true'
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Request time difference:
1.149902, within 900 seconds? : true
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Expected content hash is:
'8C9sV3ubH2xoJ1cEq6/xwnPrL5o='
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Request Content Hash is:
'8C9sV3ubH2xoJ1cEq6/xwnPrL5o='
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Hashes match?: true
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via GET
to localhost:5984/chef/_design/id_map/_view/name_to_id
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via GET
to localhost:5984/chef/_design/id_map/_view/name_to_id
[Tue, 17 Aug 2010 16:14:29 +0000] DEBUG: Sending HTTP Request via PUT
to localhost:5984/chef/a1773816-65ab-4026-a342-112ef9438106
[Tue, 17 Aug 2010 16:14:30 +0000] WARN: HTTP Request Returned 400 Bad
Request: bad_request

I can't come up with a set of reproducible steps; it just seems to
"happen" suddenly. That document_id obviously corresponds to the node,
but I am not sure how to get more information out of Couch.

I would appreciate thoughts on how I can debug this further and figure
out what's going on. This is a total show-stopper for me, since I
can't find a way to prevent or fix it short of deleting the node and
re-adding it.

Is there anything of interest in the couchdb log file? One thing you
could try to get more info from couchdb is to turn on debug logging
(see http://localhost:5984/_utils/config.html and look for the log
section).

  • seth

--
Seth Falcon | @sfalcon | http://userprimary.net/

Yes, it happens during every run. I will have to check the log. I
will turn on the debug logging and send what I see.

Thanks,
Landon

On Tue, Aug 17, 2010 at 11:26 AM, Seth Falcon <seth@userprimary.net> wrote:

.....

On Tue, Aug 17, 2010 at 12:17 PM, Landon <chef@lclark.net> wrote:

Yes, it happens during every run. I will have to check the log. I
will turn on the debug logging and send what I see.

Another thing you could explore is capturing the HTTP request that is
erroring out using ngrep (or other packet sniffer tool). If we could
see what is getting sent, we might be able to determine where things
are going astray. If you pursue this route, you could capture the
communication between chef-server and couchdb as well as between the
client and chef-server.

  • seth

Hi Seth,
First of all, thanks for helping with this. There seem to be a
couple of things going on.

I found the error message in the couch log. The start looks like this:
[Tue, 17 Aug 2010 16:14:30 GMT] [error] [<0.24065.13>] attempted
upload of invalid JSON

The invalid JSON is 200 KB! Is this normal? It appears that Ohai on
Windows is posting a large amount of data. I have attached the whole
error line here, but it seems like something is broken if Ohai is
posting this much data, right?

The other strange thing is that a few hours later the server started
rejecting the client key, so I had to delete it. Now the client node
connects fine, but I feel like this is a ticking time bomb.

Please tell me I am not crazy!

Thanks,
Landon

On Tue, Aug 17, 2010 at 2:40 PM, Seth Falcon <seth@userprimary.net> wrote:

.....

A couple more points:

My version of CouchDB is 0.10.0, apparently the default in Ubuntu
10.04. Is anyone else having issues with this version of CouchDB?

The JSON that I posted, though large, does appear to be valid
according to JSONLint...

Again, appreciate all thoughts.

Landon

On Tue, Aug 17, 2010 at 3:31 PM, Landon <chef@lclark.net> wrote:

.....

On Tue, Aug 17, 2010 at 4:01 PM, Landon <chef@lclark.net> wrote:

A couple of more points:

My version of CouchDB is 0.10.0, apparently the default in Ubuntu
10.04. Is anyone else having issues with this version of CouchDB?

I haven't seen this problem before.

The invalid JSON is 200KB large!! Is this normal? It appears that
OHAI on windows is posting LARGE amounts of data. I have attached the
whole error line here but it seems like there is something broken if
OHAI is posting this much data, right?

The general philosophy with Ohai is to collect as much info as
possible, since disk is cheap and by the time you know you need it, it
could be too late. That said, you can opt out of any plugins you don't
want by including something like the following in your client.rb:

Ohai::Config[:disabled_plugins] << 'darwin::system_profiler' <<
'darwin::kernel' << 'darwin::ssh_host_key'

HTH,
Dan DeLeo

My version of CouchDB is 0.10.0, apparently the default in Ubuntu
10.04. Is anyone else having issues with this version of CouchDB?

I haven't seen this problem before.

Hmmm. I was afraid that was going to be the general consensus. I am
loath to upgrade CouchDB, since I am guessing most people run Chef
just fine on 0.10.0...

The invalid JSON is 200KB large!! Is this normal? It appears that
OHAI on windows is posting LARGE amounts of data. I have attached the
whole error line here but it seems like there is something broken if
OHAI is posting this much data, right?

The general philosophy with Ohai is to collect as much info as
possible, since disk is cheap and by the time you know you need it, it
could be too late. That said, you can opt out of any plugins you don't
want by including something like the following in your client.rb:

Ohai::Config[:disabled_plugins] << 'darwin::system_profiler' <<
'darwin::kernel' << 'darwin::ssh_host_key'

HTH,
Dan DeLeo

That does help. Good to know. I see your point on cheap disk, etc.
And the obvious point is that the request has been large during the
entire development cycle and didn't bomb earlier. Like I said before,
I am concerned because we are trying to roll out an infrastructure on
Chef, but I feel like there is a time bomb somewhere in the stack
waiting to go off, and I can't find the right wire to cut (OK, I took
that metaphor too far...).

Thanks to both of you for the help. Really appreciated.

Landon

On 8/17/10 6:10 PM, Landon wrote:

My version of CouchDB is 0.10.0, apparently the default in Ubuntu
10.04. Is anyone else having issues with this version of CouchDB?

I haven't seen this problem before.

Hmmm. I was afraid that was going to be the general consensus. I am
loath to upgrade CouchDB, since I am guessing most people run Chef
just fine on 0.10.0...

One thing you might try is to create a test database in couchdb and try
putting the node data JSON into a document to see if you can reproduce
the error.

Using the latest couchdb I was not able to repro the error with these steps:

start couchdb in a terminal

couchdb

in another terminal

curl -X PUT http://127.0.0.1:5984/test
curl -X PUT -H "content-type: application/json" \
  http://127.0.0.1:5984/test/doc -d @badjson.json
curl http://127.0.0.1:5984/test/doc

  • seth

One other thought:

Can you grab the last successfully saved node JSON from couchdb and
compare it with the JSON that fails? That might shed some light (or
rule something out) as well.
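A comparison along those lines can be sketched in Python; the two documents below are stand-ins for the real saved and failing node JSON, not actual Chef data:

```python
import json

def diff_keys(old, new, prefix=""):
    """Recursively list keys that were added, removed, or changed
    between two decoded JSON objects."""
    changes = []
    for key in sorted(set(old) | set(new)):
        path = prefix + key
        if key not in old:
            changes.append("added: " + path)
        elif key not in new:
            changes.append("removed: " + path)
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.extend(diff_keys(old[key], new[key], path + "."))
        elif old[key] != new[key]:
            changes.append("changed: " + path)
    return changes

# Stand-ins for the last-good and the failing node documents:
good = json.loads('{"name": "node1", "kernel": {"os": "windows"}}')
bad = json.loads('{"name": "node1", "kernel": {"os": "windows", "cs_info": "x"}}')
print(diff_keys(good, bad))  # -> ['added: kernel.cs_info']
```

Running the failing body against the last good save this way should at least point at which attribute changed between the two.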

Seth,
That's a great idea! Thanks. I tested the JSON against a fresh 0.10
install and the error was thrown, so I have much higher hopes that
upgrading Couch will make this go away! Thanks for the help. I
really do appreciate it.

Landon


Seth,
Just wanted to be clear: you were able to upload the JSON I sent into
the latest version of CouchDB? I installed the latest version on my
development machine (OS X 10.6.4) using Homebrew, and I get the same
error that I get with 0.10.0 on Ubuntu 10.04:
{"error":"bad_request","reason":"invalid UTF-8 JSON"}

Thanks,
landon

On Thu, Aug 19, 2010 at 2:15 PM, Landon <chef@lclark.net> wrote:

.....

Hi Landon,

On Thu, Aug 19, 2010 at 3:46 PM, Landon <chef@lclark.net> wrote:

Seth,
Just wanted to be clear: you were able to upload the JSON I sent into
the latest version of CouchDB? I installed the latest version on my
development machine (OS X 10.6.4) using Homebrew, and I get the same
error that I get with 0.10.0 on Ubuntu 10.04:
{"error":"bad_request","reason":"invalid UTF-8 JSON"}

Yes, I took the JSON you sent to the list and was able to do a PUT to
a local couchdb instance at the command line using curl. This was on
OS X.

I wonder if this is somehow a locale issue? I have:

echo $LANG
en_US.utf-8

  • seth

On Thu, Aug 19, 2010 at 3:46 PM, Landon <chef@lclark.net> wrote:

Seth,
Just wanted to be clear: you were able to upload the JSON I sent into
the latest version of CouchDB? I installed the latest version on my
development machine (OS X 10.6.4) using Homebrew, and I get the same
error that I get with 0.10.0 on Ubuntu 10.04:
{"error":"bad_request","reason":"invalid UTF-8 JSON"}

Thanks,
landon

I looked at the attached JSON and ran it through iconv to see if it
had invalid UTF-8 characters, and I think it does. In my terminal,
which is set to UTF-8, I see stuff like this:

(from Landon Clark<92>s )

If I run it through iconv and specify an input encoding of ISO-8859-1,
iconv doesn't report any errors. I know we've dealt with encoding
problems on Windows in Ohai before, but I wasn't involved, so I don't
know what was done about it. I'll ask around.
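That <92> is the telltale byte: 0x92 is the right single quote in Windows code page 1252, and on its own it is not valid UTF-8, which matches couchdb's complaint. A minimal sketch of the same check in Python (the sample string is made up, not the real node data):

```python
# 0x92 is the CP1252 right single quotation mark; standing alone it is
# not valid UTF-8, which is exactly what couchdb rejects.
data = b"from Landon Clark\x92s laptop"  # made-up sample line

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# ISO-8859-1 maps every byte to some character, which is why iconv with
# that input encoding reports no error; decoding as windows-1252
# recovers the intended apostrophe:
print(data.decode("windows-1252"))
```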

Dan DeLeo

Thanks Dan. I appreciate anything you can find! This has been a real
thorn in our side.

Landon