Opscode nagios quick start


#1

Hey guys,

Coming from long familiarity with Puppet, I am completely jazzed by what
I’ve seen so far of the capabilities of Chef. Knife is simply the most
amazing server automation tool I’ve ever seen. Period. Especially for cloud
instances.

OK, now that I’m done gushing, let me describe the issue I’m having. The
Nagios quick start tutorial on the Opscode wiki is astounding and works
amazingly well. Except, as you might have gathered, it’s not completely
flawless in my case. For some reason my validation.pem never makes it to
the new EC2 server, and therefore my (community) Chef server cannot
validate the client and apply the roles to the new instance. So I have to
log into the new server (which in itself is cool). Then I try to run
chef-client, and it complains that it cannot validate against the Chef
server. I then scp my validation.pem into place and add the roles to the
new server on the command line. Then I run chef-client again on the new
instance, get my new Nagios server, and can log into the web interface.
Still amazing, but I want it to be as seamless as the how-to implies it can
be. :) Here’s the link to the how-to for quick reference:

http://wiki.opscode.com/display/chef/Nagios+Quick+Start

I followed all the steps of the tutorial including cloning the git repo
and especially these particular steps:

mkdir ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/knife.rb ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/USERNAME.pem ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/validation.pem ~/nagios-quick-start/.chef

Just to be clear, I am using my own open source community Chef server 10.16.
For the validation.pem, I copied the one generated by my Chef server into my
git repo. Just to be doubly sure, I scp’d it down from my Chef server to
~/nagios-quick-start/.chef and, of course, also copied over my knife.rb and
username.pem.

Here’s a quick look at my knife.rb


[dunphy@localhost:~/chef-repo] #cat ~/.chef/knife.rb
log_level :info
log_location STDOUT
node_name 'dunphy'
client_key '/Users/dunphy/.chef/dunphy.pem'
validation_client_name 'chef-validator'
validation_key '/etc/chef/validation.pem'
chef_server_url 'http://chef.mydomain.com:4000'
cache_type 'BasicFile'
cache_options( :path => '/Users/dunphy/.chef/checksums' )
cookbook_path ["/Users/dunphy/chef-repo/cookbooks"]

# EC2 Authentication

knife[:aws_access_key_id] = "aws-key-here"
knife[:aws_secret_access_key] = "aws-secret-here"

# Rackspace:

knife[:rackspace_api_key] = "rackspace-key-here"
knife[:rackspace_api_username] = "myuser"

# Linode

knife[:linode_api_key] = "linode-key-here"


This is the exact command I used:


[dunphy@localhost:~] #knife ec2 server create -G default -I ami-7000f019 -f
m1.small -S mykeypair-aws -i ~/.ssh/id_rsa -x ubuntu -r
'role[production],role[base],role[monitoring]'


When I log into the new instance, this is what the client.rb looks like

root@domU-12-31-39-0E-09-37:~# cat /etc/chef/client.rb
log_level :info
log_location STDOUT
chef_server_url "http://chef.mydomain.com:4000"
validation_client_name "chef-validator"
node_name "i-23d72052"


This is what the chef-validator looks like on my chef command line:

[dunphy@localhost:~] #knife client show chef-validator
_rev: 1-ff7b4f7168c42a35431f815bd48ddbf2
admin: false
chef_type: client
json_class: Chef::ApiClient
name: chef-validator
public_key: -----BEGIN RSA PUBLIC KEY-----
MIIBCgKCAQEA6KepnhFvTGDXyhwFFc0gxO7exMgMqOcs5BPKa+0vo5ruC0jihz5I
CZoblwHTxzVoSryQY6kJzwJvD/S6csmDGu1Wr7wuY4hMr9vaAWv9t6ODfAX59VLT
dUlkas6KyXQdGWcYqMNaV0BSqd6/IqAOiEPdVx3TfGLMa9zc+odJ0tuqmIx7Line
Y4WtWYIctAp76RdyLLO78Vv06Mwd4CL8VSk+mT2eMZGiQL5zYf20S3zejsNFBHQo
0aA92RwmWm0x9zslPTBXBtQKM98KCR7tXDtTtkJUYD/5ne+Gl1Vzu/OHej4e3RpM
pz7TwsTrAu4SXXcUy22peVGpGivMMf61/QIDAQAB
-----END RSA PUBLIC KEY-----
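
One way to confirm that a given validation.pem belongs to the chef-validator client on the server is to compare its public half with the public_key the server reports. The sketch below is only an illustration using Ruby's standard openssl library; key_pair_match? is a hypothetical helper name, not part of knife or Chef, and it assumes both inputs are RSA keys in PEM form.

```ruby
require 'openssl'

# Hypothetical helper: returns true when the private key PEM and the
# public key PEM describe the same RSA key pair, i.e. they share the
# same modulus (n) and public exponent (e).
def key_pair_match?(private_pem, public_pem)
  private_key = OpenSSL::PKey::RSA.new(private_pem)
  public_key  = OpenSSL::PKey::RSA.new(public_pem)
  private_key.n == public_key.n && private_key.e == public_key.e
end
```

For example, key_pair_match?(File.read('/etc/chef/validation.pem'), pasted_public_key) would be expected to return true only when the file is the private half of the key the server has on record. Here pasted_public_key stands for the PEM text copied out of the knife client show output.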


Back on the new instance, if I cat first-boot.json, all looks well. The
roles I applied to the server are listed:


root@domU-12-31-39-0E-09-37:~# cat /etc/chef/first-boot.json
{"run_list":["role[production]","role[base]","role[monitoring]"]}


But when I run chef-client this is what I see:

ubuntu@domU-12-31-39-15-1A-58:~$ sudo chef-client
[2012-12-31T04:54:32+00:00] INFO: *** Chef 10.16.2 ***
[2012-12-31T04:54:33+00:00] INFO: Client key /etc/chef/client.pem is not
present - registering
[2012-12-31T04:54:33+00:00] INFO: HTTP Request Returned 401 Unauthorized:
Failed to authenticate. Ensure that your client key is valid.

================================================================================
Chef encountered an error attempting to create the client "i-53ae5822"

Authentication Error:

Failed to authenticate to the chef server (http 401).

Server Response:

Failed to authenticate. Ensure that your client key is valid.

Relevant Config Settings:

chef_server_url "http://chef.mydomain.com:4000"
validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"

If these settings are correct, your validation_key may be invalid.

[2012-12-31T04:54:33+00:00] FATAL: Stacktrace dumped to
/var/chef/cache/chef-stacktrace.out
[2012-12-31T04:54:33+00:00] FATAL: Net::HTTPServerException: 401
"Unauthorized"

When I run knife client list and knife node list, I see the new EC2
instance, but the roles have not been applied. So I scp my validation.pem
up to the new instance and diff it against the one at
/etc/chef/validation.pem, and they are completely different.
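
A plain diff also reports differences in whitespace or line endings, so a slightly more robust comparison is to fingerprint the public half of each key. This is a minimal sketch, assuming both files are RSA private keys in PEM form; public_fingerprint and same_key? are hypothetical helper names, not anything shipped with Chef.

```ruby
require 'openssl'
require 'digest'

# Hypothetical helper: SHA-256 fingerprint of the public half of an RSA
# key given as PEM text. Formatting differences in the PEM file do not
# affect the result, because the key is parsed and re-encoded as DER.
def public_fingerprint(pem)
  key = OpenSSL::PKey::RSA.new(pem)
  Digest::SHA256.hexdigest(key.public_key.to_der)
end

# Hypothetical helper: true when both PEM texts hold the same RSA key.
def same_key?(pem_a, pem_b)
  public_fingerprint(pem_a) == public_fingerprint(pem_b)
end
```

Calling same_key? with the contents of the validation.pem on the workstation and the one found on the instance would show whether the two files really are different keys.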

And as mentioned, I copy the validation.pem over to the right place and
bang! I have a new Nagios server. I’m more or less willing to settle for
this level of coolness, but man, it would be amazing if I could make it
seamless. Not to mention impress my coworkers at the big website where I
work, who (believe it or not) are TOTALLY reinventing the wheel by
inventing their own ‘in-house’ version of Chef. Or rather a very Chef-like
command line tool built in Ruby that ties together Puppet and MCollective.
I kid you not!

So the problem is that for some reason the only step that’s not working is
that the new instance is using a newly generated validation.pem from the
fresh chef install and not the one specified in my knife.rb. At least this
is what I believe should happen.

Can anyone be kind enough and perspicacious enough to point out where I’m
going wrong?

Thanks!
Tim



GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


#2

Tim,

I believe the following part is what is overwriting your validation key
with something else.

I followed all the steps of the tutorial including cloning the git repo
and especially these particular steps:

mkdir ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/knife.rb ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/USERNAME.pem ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/validation.pem ~/nagios-quick-start/.chef

As per the link that you are following
(http://wiki.opscode.com/display/chef/Nagios+Quick+Start), this part is
valid only if you are using the Opscode Chef server and not the community
Chef server. If you have your own Chef server, this step may not be
required.

When you run knife ec2 server create, it will:

  1. Launch a new EC2 instance.
  2. Install Chef on this instance.
  3. Bootstrap it. It will automatically copy the correct validation.pem key
    while this happens.

Can you remove the validation.pem from ~/nagios-quick-start/.chef and try
again? Alternatively, you could copy /etc/chef/validation.pem there.

Also, if you are using an AMI to launch the instance, can you make sure it
does not have a stale /etc/chef/validation.pem on it?

Thanks
Gourav Shah
Founder and Principal Consultant
Initcron | www.initcron.com


#3

You say you are putting validation.pem into ~/nagios-quick-start/.chef,
but in your knife.rb validation_key is set to
'/etc/chef/validation.pem'.

Change your knife.rb to point to
'/Users/dunphy/nagios-quick-start/.chef/validation.pem' and perhaps that
should cover it? I’m guessing you have an old validation.pem on your
workstation in /etc/chef.
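
A quick sanity check along these lines is to read the validation_key setting out of knife.rb and see whether it points next to the knife.rb itself. The following is only a sketch: validation_key_path and key_beside_config? are hypothetical helpers, and the regex only understands a literal quoted path, whereas a real knife.rb is Ruby and may compute the path.

```ruby
# Hypothetical helper: pulls the validation_key path out of knife.rb text.
# Only handles the simple form: validation_key '/some/path'
# Returns nil when no such line is found.
def validation_key_path(knife_rb_text)
  match = knife_rb_text.match(/^\s*validation_key\s+['"]([^'"]+)['"]/)
  match && match[1]
end

# Hypothetical helper: true when the configured validation key lives in
# the same directory as the knife.rb that refers to it.
def key_beside_config?(knife_rb_path, knife_rb_text)
  key_path = validation_key_path(knife_rb_text)
  return false if key_path.nil?

  File.dirname(File.expand_path(key_path)) ==
    File.dirname(File.expand_path(knife_rb_path))
end
```

With the knife.rb from the first post, validation_key_path would report /etc/chef/validation.pem, which is not the copy placed in ~/nagios-quick-start/.chef.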

On Mon, Dec 31, 2012 at 1:34 AM, Gourav Shah gs@initcron.org wrote:

Tim,

I believe the following part is what is overwriting your validation key
with something else.

I followed all the steps of the tutorial including cloning the git repo
and especially these particular steps:

mkdir ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/knife.rb ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/USERNAME.pem ~/nagios-quick-start/.chef
cp ~/chef-repo/.chef/validation.pem ~/nagios-quick-start/.chef

As per the link that you are following
(http://wiki.opscode.com/display/chef/Nagios+Quick+Start), this part is
valid only if you are using the Opscode Chef server and not the community
Chef server. If you have your own Chef server, this step may not be
required.

When you run knife ec2 server create, it will:

  1. Launch a new EC2 instance.
  2. Install Chef on this instance.
  3. Bootstrap it. It will automatically copy the correct validation.pem
    key while this happens.

Can you remove the validation.pem from ~/nagios-quick-start/.chef and try
again? Alternatively, you could copy /etc/chef/validation.pem there.

Also, if you are using an AMI to launch the instance, can you make sure it
does not have a stale /etc/chef/validation.pem on it?

Thanks
Gourav Shah
Founder and Principal Consultant
Initcron | www.initcron.com


#4

Hey guys… yes, as Jesse pointed out, I was using an outdated validation.pem
at /etc/chef/validation.pem, as referenced by my knife.rb. I actually
realized this when I noticed that a test run I did this morning was
referencing that file and failing to apply the roles I had specified on the
command line.

So I ran diff on /etc/chef/validation.pem and
/Users/dunphy/.chef/validation.pem and recognized the key in /etc/chef as
the one I saw in /etc/chef/validation.pem on the EC2 instance that I
created last night, the one that failed. So, similar to what Jesse
suggested, I copied the key I had at .chef/validation.pem to
/etc/chef/validation.pem and voila! SUCCESS! I tried this before I saw
Jesse’s reply, but I certainly appreciate the input from both of you. I
would certainly have pulled out what remains of my hair trying to figure
this out had I not stumbled onto the answer myself or been advised of the
right one.

Thanks again guys!
Tim

On Mon, Dec 31, 2012 at 10:09 AM, Jesse Campbell hikeit@gmail.com wrote:

You say you are putting validation.pem into ~/nagios-quick-start/.chef,
but in your knife.rb validation_key is set to
'/etc/chef/validation.pem'.

Change your knife.rb to point to
'/Users/dunphy/nagios-quick-start/.chef/validation.pem' and perhaps that
should cover it? I’m guessing you have an old validation.pem on your
workstation in /etc/chef.





#5

On Monday, December 31, 2012 at 7:58 AM, Tim Dunphy wrote:

Hey guys… yes, as Jesse pointed out, I was using an outdated validation.pem at /etc/chef/validation.pem, as referenced by my knife.rb. I actually realized this when I noticed that a test run I did this morning was referencing that file and failing to apply the roles I had specified on the command line.

So I ran diff on /etc/chef/validation.pem and /Users/dunphy/.chef/validation.pem and recognized the key in /etc/chef as the one I saw in /etc/chef/validation.pem on the EC2 instance that I created last night, the one that failed. So, similar to what Jesse suggested, I copied the key I had at .chef/validation.pem to /etc/chef/validation.pem and voila! SUCCESS! I tried this before I saw Jesse’s reply, but I certainly appreciate the input from both of you. I would certainly have pulled out what remains of my hair trying to figure this out had I not stumbled onto the answer myself or been advised of the right one.

Thanks again guys!
Tim

FWIW, I wrote a key checker knife plugin. I probably ought to clean it up and add tests so it could be added to chef proper.

I developed it against Opscode Hosted Chef, and haven’t had an opportunity to test against the open source server, though I believe it should work.

https://github.com/danielsdeleo/knife-plugins/blob/master/key_check.rb


Daniel DeLeo


#6

Hey Daniel,

Nice work and thanks! I’ll keep this in mind and give it a try if this
situation ever comes up again.

Best,
Tim

On Mon, Dec 31, 2012 at 1:46 PM, Daniel DeLeo dan@kallistec.com wrote:


FWIW, I wrote a key checker knife plugin. I probably ought to clean it up
and add tests so it could be added to chef proper.

I developed it against Opscode Hosted Chef, and haven’t had an opportunity
to test against the open source server, though I believe it should work.

https://github.com/danielsdeleo/knife-plugins/blob/master/key_check.rb


Daniel DeLeo

