chef-provisioning AWS converge fails every time after the first converge

I have a chef-provisioning recipe that I'm using to create a node, and I run it with both the Vagrant and AWS drivers. With Vagrant I can re-run the provisioning recipe as many times as I want to update the node. With AWS, the first chef run works fine, but every subsequent run fails. I see 'waiting for my-node to be connectable' count all the way up to 110/120 before it finally reports the node is connectable. After that it takes a long time for the chef run to fail, perhaps because of retry/sleep loops or long timeouts while trying to connect to the AWS instance?

I can connect to the AWS instance via SSH, so I know it is up and reachable. I'm using chef-zero. I'm not sure what other info might be useful, so let me know if there is anything else I can add. I've included the end of a failed chef run below, lightly edited to remove some info.
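For context, the recipe is roughly shaped like this (a simplified sketch, not my exact code; the AMI, instance type, and key name are placeholders):

```ruby
# Simplified sketch of my_cluster::test -- all values are placeholders.
require 'chef/provisioning/aws_driver'

with_driver 'aws'

machine 'my-master-staging' do
  machine_options bootstrap_options: {
                    image_id:      'ami-xxxxxxxx',  # CentOS AMI
                    instance_type: 't2.medium',
                    key_name:      'my-keypair'
                  },
                  ssh_username: 'centos'
  action :converge
end
```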

Recipe: my_cluster::test
  * machine[my-master-staging] action converge
    - update node my-master-staging at chefzero://localhost:8889
    - waiting for my-master-staging (i-abcd1234 on aws) to be connectable (transport up and running) ...
    - been waiting 0/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 10/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 20/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 30/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 40/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 50/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 60/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 70/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 80/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 90/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 100/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - been waiting 110/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
    - my-master-staging is now connectable


[2016-01-29T15:32:11-05:00] ERROR: Unable to download /etc/chef/client.pem to /tmp/client.pem.1387406907 on centos@1.2.3.4 -- Error: command 'cp /etc/chef/client.pem /tmp/client.pem.1387406907' exited with code 1.

[2016-01-29T15:37:11-05:00] WARN: Unable to clean up /tmp/client.pem.1387406907 on centos@1.2.3.4 -- Error: command 'rm /tmp/client.pem.1387406907' exited with code 1.


    - generate private key (2048 bits)

    ================================================================================
    Error executing action `converge` on resource 'machine[my-master-staging]'
    ================================================================================
    
    RuntimeError
    ------------
    Error: command 'mkdir -p /etc/chef' exited with code 1.

Thanks in advance for any help.

I think I solved my problem. I turned up the debug level for the chef-client running on my provisioned node and found that the chef-client run was stuck waiting for a password when running sudo as the centos user. I added the centos user to the sudoers file and no longer have this problem. Not sure if it is the right/best solution, but I'll take it.
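For anyone hitting the same thing, the change was along these lines, expressed as a Chef resource (the drop-in file name is my own choice, and the exact layout may differ on your image). It gives the centos user passwordless sudo so the remote commands chef-provisioning runs over SSH ('mkdir -p /etc/chef', 'cp', 'rm') don't hang at a password prompt:

```ruby
# Sketch of the fix -- grant the 'centos' user passwordless sudo via a
# sudoers.d drop-in. File name is arbitrary; mode 0440 is what sudo expects.
file '/etc/sudoers.d/99-centos-nopasswd' do
  content "centos ALL=(ALL) NOPASSWD:ALL\n"
  owner 'root'
  group 'root'
  mode  '0440'
end
```

If you edit sudoers by hand instead, use visudo so a syntax error doesn't lock you out of sudo entirely.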