Strange delay between bootstrapping hosts


#1

Strange delay between bootstrapping hosts.

Want to know where to look. Any ideas?

Here is the problem:

We have two OpenStack deployments/builds: OpenstackX and OpenstackY.

If I do this on OpenstackX, the bootstrap and Chef run complete in ~1 min 43 seconds:

time knife openstack server create -N justinblahblah -f 958e7841-a90d-45ff-acf8-8a9c946bc5fe -S justin-openstack -G justin --network-ids c60e802c-516d-42f8-8b01-d9b120f245a9 --bootstrap-network dev-1-private-1a -i ~/.ssh/justin-openstack.pem -r

If I do the same thing on OpenstackY, the bootstrap and Chef run complete in ~5 min 30 seconds:

time knife openstack server create -N justinblahblah -f 958e7841-a90d-45ff-acf8-8a9c946bc5fe -S justin-openstack -G justin --network-ids c60e802c-516d-42f8-8b01-d9b120f245a9 --bootstrap-network dev-1-private-1a -i ~/.ssh/justin-openstack.pem -r

Yes, the OpenStack environments are a bit different. But data transfer tests from both environments to the same Chef server show fast performance. Any ideas on where to look to find where the delays are coming from?


Justin Franks
Lead Operations Engineer
SaaS, Cloud, Data Centers & Infrastructure
Lithium Technologies, Inc
225 Bush St., 15th Floor
San Francisco, CA 94104
tel: +1 415 757 3100 x3219


#2

I would probably write a report handler to do this. You can iterate over run_status.all_resources or run_status.updated_resources; for each resource, there is an elapsed_time accessor that hangs off it that you can do something with.
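
A minimal sketch of such a handler, assuming a Chef version where each resource exposes `elapsed_time` as described above. The class name `ResourceTimingReport` and the helper `slowest_resources` are made up for illustration. The sorting logic is kept in a plain function so it can be tried without Chef installed:

```ruby
# Hypothetical helper (not part of Chef): return the slowest resources first.
# Accepts any objects that respond to #elapsed_time; resources that never ran
# (elapsed_time is nil) are skipped.
def slowest_resources(resources, limit = 10)
  resources
    .select { |r| r.respond_to?(:elapsed_time) && r.elapsed_time }
    .sort_by { |r| -r.elapsed_time }
    .first(limit)
end

begin
  require 'chef/handler'
rescue LoadError
  # Chef is not installed here; only the helper above is defined.
end

if defined?(Chef::Handler)
  # Hypothetical report handler: logs the slowest resources of the run.
  class ResourceTimingReport < Chef::Handler
    def report
      slowest_resources(run_status.all_resources).each do |r|
        Chef::Log.info(format('%8.2fs  %s', r.elapsed_time, r))
      end
    end
  end
end
```

To try it, one would typically require the file from client.rb (the path is yours to choose) and register it with `report_handlers << ResourceTimingReport.new`. Comparing the output from a node in each environment should show whether any particular resource accounts for the difference.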

- Julian

[ Julian C. Dunn jdunn@aquezada.com * Sorry, I’m ]
[ WWW: http://www.aquezada.com/staff/julian * only Web 1.0 ]
[ gopher://sdf.org/1/users/keymaker/ * compliant! ]
[ PGP: 91B3 7A9D 683C 7C16 715F 442C 6065 D533 FDC2 05B9 ]


#3

Chef outputs the elapsed time at the end of the run, so if those times are similar, then it’s probably something to do with bootstrapping or OpenStack. If the Chef runs account for the time difference, then use a profiler as Julian suggested (more on that below) to help pinpoint why Chef is running slower.
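
The comparison can be sketched as a small script: take the wall-clock time reported by `time`, subtract the elapsed time Chef prints, and see how much is left for provisioning and bootstrapping. This is only a sketch. The helper names are made up, and the pattern assumes the summary line ends in "in N seconds", which may not match the exact wording your Chef version prints:

```ruby
# Assumed shape of Chef's end-of-run summary ("... in 12.34 seconds").
# Check your own chef-client output and adjust the pattern if it differs.
ELAPSED_PATTERN = /\bin (\d+(?:\.\d+)?) seconds\b/

# Hypothetical helper: pull the last "in N seconds" figure out of captured output.
# Returns nil when no such line is found.
def chef_run_seconds(output)
  matches = output.scan(ELAPSED_PATTERN)
  matches.empty? ? nil : matches.last.first.to_f
end

# Hypothetical helper: seconds spent outside the Chef run
# (instance creation, waiting for SSH, installing chef-client, ...).
def time_outside_chef(wall_clock_seconds, output)
  chef = chef_run_seconds(output)
  chef && (wall_clock_seconds - chef)
end
```

For the two runs above, the wall-clock inputs would be 103 and 330 seconds. If the remainder is what differs between OpenstackX and OpenstackY, the delay is outside the Chef run itself.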

There are a handful of Chef profilers already out there. This one was the first hit on Google, but I know I’ve seen others: https://github.com/joemiller/chef-handler-profiler


Daniel DeLeo


#4

Found the problem.
It is related to the Contrail SDN (software-defined networking) we use in our prod OpenStack. There is a bug in it.


From: Daniel DeLeo ddeleo@kallistec.com on behalf of Daniel DeLeo dan@kallistec.com
Sent: Wednesday, January 21, 2015 9:41 AM
To: chef@lists.opscode.com
Subject: [chef] Re: Re: Strange delay between bootstrapping hosts
