How to parse a webpage with Chef

I’m automating a legacy web server. As part of validating the web server, I need to scrape localhost/foobar and check whether the HTML body contains ‘Alive’.

The following works, but:

  • it doesn’t feel very ‘chef-y’; I’d rather not depend on Nokogiri.
  • it throws an ugly error.

ruby_block 'validate legacy web health' do
  block do
    require 'open-uri' # needed for open() to fetch URLs
    require 'nokogiri'
    doc = Nokogiri::HTML(open('http://localhost/foobar.aspx'))
    result = doc.at('body').content
    if result.strip != 'Alive'
      Chef::Application.fatal!('Web health check failed')
    end
  end
  action :run
  only_if { node['roles'].include?('legacy-web') }
end

A better solution would be to use HTTP status codes (e.g. 200 on success, 500 on failure), but the dev team says they won’t refactor.

This is the body the web server returns on success:

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>

</title></head>
<body>
    Alive
</body>
</html>

I don’t see a way in the Chef::HTML class to parse HTML and return just the body.

Is there a cleaner way to do this?

You can use curl piped into grep to check the HTML:

curl -s http://localhost/foobar.aspx | grep Alive

grep exits with status 0 if ‘Alive’ is present and 1 if it isn’t, so the pipeline’s exit code tells you the health status.
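
For example, you could wrap that check in an execute resource, which fails the Chef run on a non-zero exit status. This is a minimal sketch; the resource name and role guard are carried over from the recipe in the question:

execute 'validate legacy web health' do
  # grep -q is quiet: it only sets the exit status.
  # A non-zero status makes the execute resource fail the run.
  command "curl -s http://localhost/foobar.aspx | grep -q 'Alive'"
  only_if { node['roles'].include?('legacy-web') }
end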

Not sure this is the best way, but if you just want to use HTTP response codes you could try HTTParty:

require 'httparty' # lowercase; require 'HTTParty' fails on case-sensitive filesystems
page = HTTParty.get('http://localhost/foobar.aspx')
if page.code != 200
  Chef::Application.fatal!('Web health check failed') # or whatever handling fits
end
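
Since the server here always returns 200, you would check the body instead of the code. HTTParty exposes the raw response body as a string, so a substring match avoids HTML parsing entirely; this is just a sketch reusing the names from the question:

ruby_block 'validate legacy web health' do
  block do
    require 'httparty'
    response = HTTParty.get('http://localhost/foobar.aspx')
    # The success page contains only 'Alive' in its body, so a plain
    # substring check is enough; no HTML parser needed.
    unless response.body.include?('Alive')
      Chef::Application.fatal!('Web health check failed')
    end
  end
  only_if { node['roles'].include?('legacy-web') }
end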

We actually ship Nokogiri in Chef for cases like this (and because it’s a pain to compile). So if you can’t rely on just the status code, this is probably the best way.
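
With that in mind, a tidier version of the original ruby_block might just rescue the connection errors, so a down server produces a readable failure instead of a raw open-uri backtrace. A sketch; which exceptions to rescue is an assumption about how the server misbehaves:

ruby_block 'validate legacy web health' do
  block do
    require 'open-uri'
    require 'nokogiri'
    begin
      doc = Nokogiri::HTML(open('http://localhost/foobar.aspx'))
    rescue OpenURI::HTTPError, Errno::ECONNREFUSED, SocketError => e
      # Convert the raw backtrace into a readable Chef failure.
      Chef::Application.fatal!("Web health check failed: #{e.message}")
    end
    unless doc.at('body').content.strip == 'Alive'
      Chef::Application.fatal!('Web health check failed')
    end
  end
  only_if { node['roles'].include?('legacy-web') }
end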
