How to parse a webpage with Chef

I’m automating a legacy web server. As part of validating the web server, I need to scrape localhost/foobar and check whether the HTML body contains ‘Alive’.

The following works, but:

  • it doesn’t feel very ‘chef-y’; I’d rather not depend on Nokogiri.
  • it throws an ugly error.

ruby_block 'validate legacy web health' do
  block do
    require 'open-uri' # needed for open() to fetch URLs
    require 'nokogiri'
    doc = Nokogiri::HTML(open('http://localhost/foobar.aspx'))
    result = doc.at('body').content
    if result.strip != 'Alive'
      Chef::Application.fatal!('Web health check failed')
    end
  end
  action :run
  only_if { node['roles'].include?('legacy-web') }
end

A better solution would be to use HTTP status codes (e.g. 200 on success, 500 on failure), but the dev team says they won’t refactor.

This is the body the web server returns on success:

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>

</title></head>
<body>
    Alive
</body>
</html>

I don’t see a way in the Chef::HTML class to parse HTML and return just the body.

Is there a cleaner way to do this?

You can use curl piped into grep to check the HTML:

curl -s http://localhost/foobar.aspx | grep Alive

grep exits with status 0 if ‘Alive’ is present and 1 if it isn’t, so the pipeline’s exit code tells you the health status.
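
For example, you could wrap that check in an execute resource, which fails the Chef run on a non-zero exit status. This is a minimal sketch; the resource name and role guard are carried over from the recipe in the question:

execute 'validate legacy web health' do
  # grep -q is quiet: it only sets the exit status.
  # A non-zero status makes the execute resource fail the run.
  command "curl -s http://localhost/foobar.aspx | grep -q 'Alive'"
  only_if { node['roles'].include?('legacy-web') }
end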

Not sure this is the best way, but if you just want to use HTTP response codes you could try HTTParty:

require 'httparty' # lowercase; require 'HTTParty' fails on case-sensitive filesystems
page = HTTParty.get('http://localhost/foobar.aspx')
if page.code != 200
  Chef::Application.fatal!('Web health check failed') # or whatever handling fits
end
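
Since the server here always returns 200, you would check the body instead of the code. HTTParty exposes the raw response body as a string, so a substring match avoids HTML parsing entirely; this is just a sketch reusing the names from the question:

ruby_block 'validate legacy web health' do
  block do
    require 'httparty'
    response = HTTParty.get('http://localhost/foobar.aspx')
    # The success page contains only 'Alive' in its body, so a plain
    # substring check is enough; no HTML parser needed.
    unless response.body.include?('Alive')
      Chef::Application.fatal!('Web health check failed')
    end
  end
  only_if { node['roles'].include?('legacy-web') }
end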

We actually ship Nokogiri in Chef for cases like this (and because it’s a pain to compile). So if you can’t rely on just the status code, this is probably the best way.
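
With that in mind, a tidier version of the original ruby_block might just rescue the connection errors, so a down server produces a readable failure instead of a raw open-uri backtrace. A sketch; which exceptions to rescue is an assumption about how the server misbehaves:

ruby_block 'validate legacy web health' do
  block do
    require 'open-uri'
    require 'nokogiri'
    begin
      doc = Nokogiri::HTML(open('http://localhost/foobar.aspx'))
    rescue OpenURI::HTTPError, Errno::ECONNREFUSED, SocketError => e
      # Convert the raw backtrace into a readable Chef failure.
      Chef::Application.fatal!("Web health check failed: #{e.message}")
    end
    unless doc.at('body').content.strip == 'Alive'
      Chef::Application.fatal!('Web health check failed')
    end
  end
  only_if { node['roles'].include?('legacy-web') }
end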
