I have a Chef provisioning recipe that I’m using to create a node. I am running it with both the Vagrant and AWS drivers. With Vagrant I can re-run the provisioning recipe as many times as I want to update the node. With AWS the first chef run works fine, but every later run fails. I see the ‘waiting for my-node to be connectable’ messages go all the way to 110/120, and then it says the node is connectable. After that it takes a very long time for the chef run to finally fail. Could that be because of retries/sleeps or long timeouts while trying to connect to the AWS instance?
I can connect to the AWS instance via ssh, so I know it is up and reachable. I’m using chef-zero. I’m not sure what other information would be useful, so let me know if there is anything else I can add. Below is the end of the failed chef run, slightly edited to remove some info.
Recipe: my_cluster::test
* machine[my-master-staging] action converge
- update node my-master-staging at chefzero://localhost:8889
- waiting for my-master-staging (i-abcd1234 on aws) to be connectable (transport up and running) ...
- been waiting 0/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 10/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 20/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 30/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 40/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 50/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 60/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 70/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 80/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 90/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 100/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- been waiting 110/120 -- sleeping 10 seconds for my-master-staging (i-abcd1234 on aws) to be connectable ...
- my-master-staging is now connectable
[2016-01-29T15:32:11-05:00] ERROR: Unable to download /etc/chef/client.pem to /tmp/client.pem.1387406907 on centos@1.2.3.4 -- Error: command 'cp /etc/chef/client.pem /tmp/client.pem.1387406907' exited with code 1.
[2016-01-29T15:37:11-05:00] WARN: Unable to clean up /tmp/client.pem.1387406907 on centos@1.2.3.4 -- Error: command 'rm /tmp/client.pem.1387406907' exited with code 1.
- generate private key (2048 bits)
================================================================================
Error executing action `converge` on resource 'machine[my-master-staging]'
================================================================================
RuntimeError
------------
Error: command 'mkdir -p /etc/chef' exited with code 1.
Thanks in advance for any help.