Validation.pem seems to stop working after 24-48 hours


#1

I’ve got a strange issue where my bootstraps start failing after using
the same validation.pem for more than 24-48 hours. The only way to fix
it is to do a “knife client regenerate” to generate a new private key.
Would really like to figure out why validation.pem stops working for
authentication after 24 hours. When I run “openssl rsa -in
validation.pem -pubout” I get the same public key that’s listed in the
Chef WebUI for chef-validator. It’s not a time sync problem, because
if I run knife with my own client key for my laptop, authentication is
fine, but the minute I do “-k validation.pem” I get 401 Unauthorized.
Any help is greatly appreciated. Regenerating chef-validator every 24
hours is getting old.

-J
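
The public-key comparison described above can be scripted so it is easy to repeat whenever a bootstrap starts failing. This is a minimal sketch, not something from the thread: it assumes the key shown in the WebUI has been saved to a file in the same PEM encoding that `-pubout` emits. The file names and the `derive_pubkey` / `key_matches` helpers are hypothetical, and the demo uses throwaway keys rather than a real validation.pem.

```shell
#!/bin/sh
# Sketch: check whether a private key still derives the public key that was
# saved from the server. All file names below are hypothetical.

# Print the PEM public key derived from the private key file in $1.
derive_pubkey() {
    openssl rsa -in "$1" -pubout 2>/dev/null
}

# Succeed only if the private key in $1 derives the public key saved in $2.
key_matches() {
    derived=$(derive_pubkey "$1") || return 1
    [ -n "$derived" ] && [ "$derived" = "$(cat "$2")" ]
}

# Demo with throwaway keys so the sketch runs anywhere openssl is installed.
workdir=$(mktemp -d)
openssl genrsa -out "$workdir/validation.pem" 2048 2>/dev/null
openssl genrsa -out "$workdir/other.pem" 2048 2>/dev/null
derive_pubkey "$workdir/validation.pem" > "$workdir/server_pub.pem"

if key_matches "$workdir/validation.pem" "$workdir/server_pub.pem"; then
    echo "validation.pem matches the saved public key"
fi
if ! key_matches "$workdir/other.pem" "$workdir/server_pub.pem"; then
    echo "other.pem does not match the saved public key"
fi
```

If the check passes at first and fails a day later, the key stored on the server has changed, not the local validation.pem.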


#2

Run NTP and keep your times in sync

On 17 May 2011 09:24, Jason J. W. Williams jasonjwwilliams@gmail.com wrote:

I’ve got a strange issue where my bootstraps start failing after using
the same validation.pem for more than 24-48 hours. The only way to fix
it is to do a “knife client regenerate” to generate a new private key.
Would really like to figure out why validation.pem stops working for
authentication after 24 hours. When I run “openssl rsa -in
validation.pem -pubout” I get the same public key that’s listed in the
Chef WebUI for chef-validator. It’s not a time sync problem, because
if I run knife with my own client key for my laptop, authentication is
fine, but the minute I do “-k validation.pem” I get 401 Unauthorized.
Any help is greatly appreciated. Regenerating chef-validator every 24
hours is getting old.

-J


#3

Run NTP and keep your times in sync

As noted, on the same workstation my own client key works but
chef-validator does not, so NTP is not the issue (otherwise both
keys would fail). Also, on the client servers I’m trying to bootstrap,
the first thing we do before the bootstrap is to run "ntpdate " to sync the time. It’s not an NTP issue.

-J


#4

Are you doing any custom couchdb compaction?

On Mon, May 16, 2011 at 4:29 PM, Jason J. W. Williams <
jasonjwwilliams@gmail.com> wrote:

Run NTP and keep your times in sync

As noted, on the same workstation my own client key works but
chef-validator does not, so NTP is not the issue (otherwise both
keys would fail). Also, on the client servers I’m trying to bootstrap,
the first thing we do before the bootstrap is to run "ntpdate " to sync the time. It’s not an NTP issue.

-J


Charles Sullivan
charlie.sullivan@gmail.com


#5

On Mon, May 16, 2011 at 3:32 PM, Charles Sullivan
charlie.sullivan@gmail.com wrote:

Are you doing any custom couchdb compaction?

No we’re not. Running Chef Server 0.9.16 from debs without any
customization beyond the config files.

-J


#6

Yo,

On 17 May 2011 09:34, Jason J. W. Williams jasonjwwilliams@gmail.com wrote:

On Mon, May 16, 2011 at 3:32 PM, Charles Sullivan
charlie.sullivan@gmail.com wrote:

Are you doing any custom couchdb compaction?

No we’re not. Running Chef Server 0.9.16 from debs without any
customization beyond the config files.

Sorry about the NTP dig. Can you post -l debug output from both client
and server showing the full authentication failure backtrace(s)?

–AJ


#7

On Mon, May 16, 2011 at 2:24 PM, Jason J. W. Williams
jasonjwwilliams@gmail.com wrote:

I’ve got a strange issue where my bootstraps start failing after using
the same validation.pem for more than 24-48 hours. The only way to fix
it is to do a “knife client regenerate” to generate a new private key.
Would really like to figure out why validation.pem stops working for
authentication after 24 hours. When I run “openssl rsa -in
validation.pem -pubout” I get the same public key that’s listed in the
Chef WebUI for chef-validator. It’s not a time sync problem, because
if I run knife with my own client key for my laptop, authentication is
fine, but the minute I do “-k validation.pem” I get 401 Unauthorized.
Any help is greatly appreciated. Regenerating chef-validator every 24
hours is getting old.

I know you’re going to hate my asking this - but are you sure you
aren’t having a time problem? On your laptop, when you use -k, are you
also setting -u to the validator’s client name?

Adam


Opscode, Inc.
Adam Jacob, Chief Product Officer
T: (206) 619-7151 E: adam@opscode.com


#8

I know you’re going to hate my asking this - but are you sure you
aren’t having a time problem? On your laptop, when you use -k, are you
also setting -u to the validator’s client name?

Setting -u to chef-validator. My server and my laptop both use
time.xmission.com for NTP syncing and show the same time when I run
date.

-J


#9

Cool - as AJ said, let’s check out the client and server side debug logs.

Adam

On Mon, May 16, 2011 at 2:54 PM, Jason J. W. Williams
jasonjwwilliams@gmail.com wrote:

I know you’re going to hate my asking this - but are you sure you
aren’t having a time problem? On your laptop, when you use -k, are you
also setting -u to the validator’s client name?

Setting -u to chef-validator. My server and my laptop both use
time.xmission.com for NTP syncing and show the same time when I run
date.

-J




#10

Hi AJ,

Sorry about the NTP dig. Can you post -l debug output from both client
and server showing the full authentication failure backtrace(s)?

No worries. Here’s the client side debug: https://gist.github.com/975464

Server side debug: https://gist.github.com/975480

What’s interesting is that in the server side debug it says the
expected hash matches the requested hash.

-J


#11

Actually use this gist for the client: https://gist.github.com/975507

On Mon, May 16, 2011 at 4:04 PM, Jason J. W. Williams
jasonjwwilliams@gmail.com wrote:

Hi AJ,

Sorry about the NTP dig. Can you post -l debug output from both client
and server showing the full authentication failure backtrace(s)?

No worries. Here’s the client side debug: https://gist.github.com/975464

Server side debug: https://gist.github.com/975480

What’s interesting is that in the server side debug it says the
expected hash matches the requested hash.

-J


#12

On Monday, May 16, 2011 at 3:18 PM, Jason J. W. Williams wrote:

Actually use this gist for the client: https://gist.github.com/975507

On Mon, May 16, 2011 at 4:04 PM, Jason J. W. Williams
jasonjwwilliams@gmail.com wrote:

Hi AJ,

Sorry about the NTP dig. Can you post -l debug output from both client
and server showing the full authentication failure backtrace(s)?

No worries. Here’s the client side debug: https://gist.github.com/975464

Server side debug: https://gist.github.com/975480

What’s interesting is that in the server side debug it says the
expected hash matches the requested hash.

The signature is incorrect, though, so the private key used to sign the request doesn’t match the public key being used to verify the signature.

Are you deleting /etc/chef/validation.pem on the server for any reason? Is there anything else on the server side that correlates with the validation.pem going bad, such as restarts for log rotation?


Dan DeLeo

-J


#13

Hi Dan,

The signature is incorrect, though, so the private key used to sign the
request doesn’t match the public key being used to verify the signature.
Are you deleting /etc/chef/validation.pem on the server for any reason? Is
there anything else on the server side that correlates with the
validation.pem going bad, such as restarts for log rotation?

By “on the server” I assume you mean on the server being provisioned
via chef-client? validation.pem on servers being provisioned is loaded
via “knife bootstrap” from my workstation, and it’s not changing on my
workstation. If I run “openssl rsa -noout -modulus -in validation.pem
| openssl md5” I get:

9b2a64dd6acd1e5337b5804886841208

However, if I run “openssl rsa -noout -modulus -pubin -in | openssl
md5” on the public key as shown in the Chef console I get errors:

unable to load Public Key
48166:error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag:/SourceCache/OpenSSL098/OpenSSL098-35/src/crypto/asn1/tasn_dec.c:1316:
48166:error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 error:/SourceCache/OpenSSL098/OpenSSL098-35/src/crypto/asn1/tasn_dec.c:380:Type=X509_ALGOR
48166:error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error:/SourceCache/OpenSSL098/OpenSSL098-35/src/crypto/asn1/tasn_dec.c:748:Field=algor, Type=X509_PUBKEY
48166:error:0906700D:PEM routines:PEM_ASN1_read_bio:ASN1 lib:/SourceCache/OpenSSL098/OpenSSL098-35/src/crypto/pem/pem_oth.c:83:

It’s almost as if the public key in Chef has become corrupted, which
would seem to explain the “padding error” message on the server side
logs.

-J
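
A load failure like the one above does not by itself prove the stored key is corrupt. Text copied out of a web page can pick up whitespace or line-break damage, and a PEM block in a different encoding than `-pubin` expects will also fail to parse. The sketch below is an illustration, not something from the thread. It compares the moduli directly and reports "could not be read" separately from "belongs to a different key pair". The `same_keypair` helper, its exit codes, and the file names are all hypothetical.

```shell
#!/bin/sh
# Sketch: compare the RSA modulus of a private key with that of a public key
# file, keeping "unreadable" distinct from "mismatch". Names are hypothetical.

private_modulus() { openssl rsa -noout -modulus -in "$1" 2>/dev/null; }
public_modulus()  { openssl rsa -noout -modulus -pubin -in "$1" 2>/dev/null; }

# Exit status: 0 = same key pair, 1 = different keys, 2 = a file could not be parsed.
same_keypair() {
    priv=$(private_modulus "$1") || return 2
    pub=$(public_modulus "$2") || return 2
    [ -n "$priv" ] && [ "$priv" = "$pub" ]
}

# Demo with a throwaway key, its real public half, and a deliberately bad file.
workdir=$(mktemp -d)
openssl genrsa -out "$workdir/validation.pem" 2048 2>/dev/null
openssl rsa -in "$workdir/validation.pem" -pubout -out "$workdir/pub.pem" 2>/dev/null
printf 'not a key\n' > "$workdir/damaged.pem"

for candidate in pub.pem damaged.pem; do
    rc=0
    same_keypair "$workdir/validation.pem" "$workdir/$candidate" || rc=$?
    echo "$candidate: exit status $rc"
done
```

Piping each modulus through a digest, as in the commands quoted above, only shortens the value for eyeballing; the equality check is the same.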


#14

So the last time I regenerated the validation key, I saved a copy of
the public key as seen from the chef UI. Now if I compare that saved
copy of the public key against what is now reported from the WebUI
(now that the validation.pem is not working again) they’re different.
I’m the only one using the Chef server at the moment and I haven’t
regenerated the validation key. So my question is: what else could
cause the public key on the chef server to change?

-J

On Mon, May 16, 2011 at 7:24 PM, Daniel DeLeo dan@kallistec.com wrote:

On Monday, May 16, 2011 at 3:18 PM, Jason J. W. Williams wrote:

Actually use this gist for the client: https://gist.github.com/975507

On Mon, May 16, 2011 at 4:04 PM, Jason J. W. Williams
jasonjwwilliams@gmail.com wrote:

Hi AJ,

Sorry about the NTP dig. Can you post -l debug output from both client
and server showing the full authentication failure backtrace(s)?

No worries. Here’s the client side debug: https://gist.github.com/975464

Server side debug: https://gist.github.com/975480

What’s interesting is that in the server side debug it says the
expected hash matches the requested hash.

The signature is incorrect, though, so the private key used to sign the
request doesn’t match the public key being used to verify the signature.
Are you deleting /etc/chef/validation.pem on the server for any reason? Is
there anything else on the server side that correlates with the
validation.pem going bad, such as restarts for log rotation?


Dan DeLeo

-J


#15

On Tuesday, May 24, 2011 at 11:46 AM, Jason J. W. Williams wrote:

So the last time I regenerated the validation key, I saved a copy of
the public key as seen from the chef UI. Now if I compare that saved
copy of the public key against what is now reported from the WebUI
(now that the validation.pem is not working again) they’re different.
I’m the only one using the Chef server at the moment and I haven’t
regenerated the validation key. So my question is: what else could
cause the public key on the chef server to change?

-J

You’re definitely not deleting /etc/chef/validation.pem from the filesystem of the chef-server box?

If you delete it, it will be regenerated on the next restart.


Dan DeLeo


#16

You’re definitely not deleting /etc/chef/validation.pem from the filesystem of the chef-server box?

If you delete it, it will be regenerated on the next restart.

Since I wasn’t doing it, I tried stopping the chef-client on the
server for 48 hours, and lo and behold the problem went away. Thank
you for your comment above, it made me go through the run list and I
found the chef::delete_validation recipe was getting run. I didn’t
realize deleting validation.pem on the chef-server would cause a
regeneration…figured the private key was only used by the
chef-client. Thank you for your patience and help.

-J
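
Since deleting /etc/chef/validation.pem on the chef-server box leads to a new key at the next restart, a periodic check on that file would flag the change as soon as it happens. The following is a minimal sketch of such a check, not anything that ships with Chef. The `check_key` helper and the state-file location are hypothetical, and the demo runs against a temporary directory rather than /etc/chef.

```shell
#!/bin/sh
# Sketch: detect when a key file is removed or replaced between checks.
# check_key KEYFILE STATEFILE prints one of: recorded, unchanged, changed, missing.
check_key() {
    key=$1
    state=$2
    if [ ! -f "$key" ]; then
        echo "missing"
        return 0
    fi
    current=$(cksum < "$key")
    if [ ! -f "$state" ]; then
        # First run: remember the checksum for later comparisons.
        printf '%s\n' "$current" > "$state"
        echo "recorded"
    elif [ "$current" = "$(cat "$state")" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# Demo: simulate a recipe deleting the file and a restart regenerating it.
demo=$(mktemp -d)
printf 'key-one\n' > "$demo/validation.pem"
check_key "$demo/validation.pem" "$demo/state"   # first sighting
check_key "$demo/validation.pem" "$demo/state"   # nothing has happened yet
rm "$demo/validation.pem"                        # stand-in for the deletion
check_key "$demo/validation.pem" "$demo/state"
printf 'key-two\n' > "$demo/validation.pem"      # stand-in for the regeneration
check_key "$demo/validation.pem" "$demo/state"
```

Run from cron against the real path, a "missing" or "changed" result would point at the server-side file rather than at the workstation copy.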


#17

I worked around this by naming the file validation-server.pem and updating the server config as necessary. I’ll get a ticket in for that shortly.

– Mason Turner (mobile)

On May 26, 2011, at 3:01 PM, “Jason J. W. Williams” jasonjwwilliams@gmail.com wrote:

You’re definitely not deleting /etc/chef/validation.pem from the filesystem of the chef-server box?

If you delete it, it will be regenerated on the next restart.

Since I wasn’t doing it, I tried stopping the chef-client on the
server for 48 hours, and lo and behold the problem went away. Thank
you for your comment above, it made me go through the run list and I
found the chef::delete_validation recipe was getting run. I didn’t
realize deleting validation.pem on the chef-server would cause a
regeneration…figured the private key was only used by the
chef-client. Thank you for your patience and help.

-J