Hello,
I'm still very new to Chef, so I'm hoping someone can point me in the right direction.
We're just beginning to roll out Chef in our office, and I have some questions about cookbooks and Chef agents:
What would this community recommend I use to monitor that the Chef cookbooks are running as scheduled? We have New Relic; could it be used to alert us when cookbooks fail or do not run?
How do I monitor that the Chef agents are healthy on each machine they are installed on?
I'm looking for options whereby we can set up a daily task or process that either alerts us or shows the status in a dashboard, so we can investigate and keep our servers managed by Chef.
Thank you in advance for any advice you can provide.
The answer depends heavily on the environment you're deploying to, particularly the OS, but also the method you use to trigger Chef Client runs.
For example, if you're running Windows Server and Chef runs via a Scheduled Task, you can use a PowerShell query that returns the task's last run time, or a message if the task does not exist:
powershell -NoProfile -ExecutionPolicy Bypass -Command "Try { (Get-ScheduledTaskInfo -TaskName chef-client -ErrorAction Stop | Select LastRunTime).LastRunTime } Catch { 'No Scheduled Task Found' }"
If you're on Linux and running Chef as a service (one that stays running in the background), you can run this Bash command to count the running chef-client processes:
ps aux | grep chef-client | grep -v grep | wc -l
How you hook those into your monitoring system is up to you. Nagios and Zabbix, for example, are easily able to fire off those particular commands as part of their agent configuration (and Chef can even put them in place for you).
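As a rough sketch of how the Linux process count could be wrapped for a monitoring agent, the functions below follow the common Nagios plugin convention of exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function names `chef_status_from_count` and `check_chef_client` and the message text are my own illustration, not anything shipped with Chef or Nagios, and the check only tells you that a process is running, not that the last run succeeded.

```shell
#!/bin/sh
# Sketch of a Nagios-style check built on the process count above.
# Function names and messages are illustrative assumptions.

# Map a process count to a status line and a plugin-style return code
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
chef_status_from_count() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty or non-numeric input: we cannot say anything useful.
      echo "UNKNOWN - could not determine chef-client process count"
      return 3
      ;;
  esac
  if [ "$1" -eq 0 ]; then
    echo "CRITICAL - no chef-client process running"
    return 2
  fi
  echo "OK - $1 chef-client process(es) running"
  return 0
}

# Count running chef-client processes and report the result.
check_chef_client() {
  count=$(ps aux | grep chef-client | grep -v grep | wc -l | tr -d ' ')
  chef_status_from_count "$count"
}
```

To use this as a plugin, save it as a script and add a final line that calls `check_chef_client` and exits with its return status. Avoid putting "chef-client" in the script's own file name, since the `ps` pipeline would then count the check itself.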
This should get you going in the right direction, but please don't hesitate to ask if you need more help.