Hey Varun, I would recommend checking out chef-provisioning (and chef-provisioning-aws https://github.com/chef/chef-provisioning-aws/ in particular). Unfortunately the README is pretty lacking at this point (we’re working on getting it into shape) but I can give a brief overview here.
First you need to create one node which acts as the provisioner machine. This is the machine that will talk to AWS and request instances, load balancers, etc., so it needs connectivity to the AWS APIs. It also needs Chef installed along with the chef-provisioning-aws gem; this can be handled by a recipe on the provisioning node containing chef_gem 'chef-provisioning-aws'. You must also provide credentials that have permission to perform all the operations you need. For instances, this would be EC2 create, delete, etc. See the credentials https://github.com/chef/chef-provisioning-aws#credentials section of the README on how to specify credentials.
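As a concrete sketch, one common way to supply credentials is the standard AWS shared credentials file, which the driver can pick up (the values below are placeholders, not real keys):

```
# ~/.aws/credentials -- placeholder values
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```

Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are another standard option; see the README section above for the full list of supported mechanisms.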
If you look at the ref_full.rb https://github.com/chef/chef-provisioning-aws/blob/master/docs/examples/ref_full.rb file you can see the different resources that chef-provisioning-aws can manage. It handles almost all EC2 objects as well as load balancers, but does not currently support RDS, DynamoDB, or CloudSearch. RDS support is in our backlog.
After an instance is provisioned I would recommend using a normal chef-client run to perform operations like the ones you describe (OS tweaks, installing software, process management, etc.). The goal of chef-provisioning is only to get a machine created and connected to the other AWS resources it needs. A non-provisioning chef recipe (again, run on the instance) can also perform the tasks you describe, like syncing log files and performing database backups. I think of the separation this way: a provisioning node and its recipes should only do enough to get a machine in place so we can install chef and have chef manage the node’s state. Unless you’re trying to provision immutable infrastructure, which is an altogether different question.
To handle provisioning multiple environments you can load config from chef attributes. For example, let’s use the following recipe:
require 'chef/provisioning/aws_driver'
with_driver 'aws::us-west-1'
aws_key_pair 'ref-key-pair'
with_machine_options :bootstrap_options => {
  key_name: 'ref-key-pair',
  image_id: 'ami-0d5b6c3d',
  instance_type: 'm3.medium',
}
machine 'my_machine'
If you have multiple environments you want to provision, you can use node attributes to supply those different environments. In this case I would recommend having one node for provisioning the gamma environment and one node for provisioning the beta environment. The recipe would be updated to look like:
require 'chef/provisioning/aws_driver'
with_driver 'aws::us-west-1'
aws_key_pair node[:provisioning][:key_name]
with_machine_options :bootstrap_options => node[:provisioning]
machine 'my_machine'
You can specify the :provisioning node attribute as a hash of options in both a gamma environment file and a beta environment file. Then the gamma provisioning node will converge the gamma environment according to its configuration, and the beta node will do likewise for the beta environment.
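For example, a gamma environment file might look like this (the attribute values are just the placeholders from the earlier recipe, not required names):

```ruby
# environments/gamma.rb -- values are illustrative placeholders
name 'gamma'
description 'Attributes for the gamma provisioning node'
default_attributes(
  'provisioning' => {
    'key_name'      => 'ref-key-pair',
    'image_id'      => 'ami-0d5b6c3d',
    'instance_type' => 'm3.medium'
  }
)
```

A beta environment file would be identical except for its name and the values in the provisioning hash.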
We do not currently maintain an inventory of the infrastructure separate from what can be queried in the recipe. For example, you can create a VPC named my_vpc and reference it in an aws_security_group by its name my_vpc, without having to know the actual VPC identifier vpc-123456. See the ref_full.rb document I linked earlier for more examples of these references. If you wish to query information about a resource (such as its IP address) you can do this by accessing the aws_object of a resource. This is described here https://github.com/chef/chef-provisioning-aws#looking-up-aws-objects.
If you already have a tool written that can pull inventory information from that MySQL database, it could convert this information into the Ruby hash format that most AWS calls require. That’s how I would recommend using a Chef recipe to converge infrastructure information stored in a separate data store.
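A minimal sketch of that conversion step, in plain Ruby (the table and column names here are assumptions about your inventory database, not anything chef-provisioning-aws requires):

```ruby
# Convert a row pulled from an inventory MySQL database into the
# bootstrap_options hash shape used in the recipes above.
# Column names ('key_name', 'image_id', 'instance_type') are assumptions.
def bootstrap_options_for(row)
  {
    key_name:      row['key_name'],
    image_id:      row['image_id'],
    instance_type: row['instance_type'],
  }
end

# In a real tool you might fetch the row with the mysql2 gem, e.g.:
#   client = Mysql2::Client.new(host: 'inventory-db', database: 'infra')
#   row = client.query("SELECT * FROM environments WHERE env = 'gamma'").first
# Here we stub a row to keep the sketch self-contained:
row = {
  'key_name'      => 'ref-key-pair',
  'image_id'      => 'ami-0d5b6c3d',
  'instance_type' => 'm3.medium',
}

options = bootstrap_options_for(row)
# options can then be passed to with_machine_options :bootstrap_options => options
```

The same hash could equally be stored as node attributes (as in the environment-file approach above) once it has been pulled out of the database.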
The AWS SDK is available from within the recipes, but our goal is to provide resources (a common Chef abstraction) as the layer of interaction with all AWS objects. Rather than having to call aws.client.create_rds_database(…), we would like to expose an aws_rds resource with attributes that represent the state users desire.
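In other words, something along these lines (entirely hypothetical — the aws_rds resource does not exist yet, and these attribute names are illustrative only):

```ruby
# Hypothetical future resource; does not exist in chef-provisioning-aws today
aws_rds 'my-database' do
  engine            'mysql'
  instance_class    'db.m3.medium'
  allocated_storage 10
end
```

The point is that you declare the desired state and the resource handles the underlying SDK calls, just as the machine and aws_vpc resources do today.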
I hope this has answered your questions. Feel free to join us on gitter https://gitter.im/chef/chef-provisioning or during our next office hours http://www.meetup.com/Chef-Office-Hours/events/223415651/ (1pm PST on Thursday July 2nd) if you want a less asynchronous conversation!
Cheers
-Tyler
On Jun 23, 2015, at 7:36 AM, Varun Shankar shankarvarun1@gmail.com wrote:
We have an in-house tool (coded using PHP and Ruby) for provisioning and managing our application infrastructure in the cloud. We want to port it to chef. Below are the capabilities of this tool which the Chef-based solution also needs to satisfy.
- Use AWS SDK to launch/terminate and configure a wide range of AWS offerings (ec2, load balancers, rds, dynamodb, cloudsearch and more)
- After launching ec2 instances, do things like:
  - OS kernel tweaks
  - install/configure system software
  - checkout application code from github
  - generate app config
  - start application processes
  etc.
- It is also capable of setting up new test environments, i.e. doing the above steps for all required server roles in one command. For ex: using one command I can set up a new environment called gamma (with its own set of load balancers, ec2 instance roles, rds, dynamodb etc.)
For each environment, we maintain two types of data:
A. Data which always remains the same for an environment. For ex: every time the gamma env is launched it will have the same value for each of the below:
- what all server roles to be configured in the env
- what all github repositories are deployed on each role
- what all processes should run on a role and on which port
- application level configuration
- what AMI will be used to launch instances
and so on. All this information is maintained in config files.
B. Dynamic data about the environment. Like:
- For each role what all instances are running in this environment, their public & private IP, instance-id
- DNS record for provisioned load balancers
- DNS record for provisioned databases
and so on. All this information is maintained in a MySQL database.
The tool reads required data from A and B above and generates config files on every role.
- I need some guidance on how the above architecture will fit in Chef.
- I have some specific questions:
- where does the dynamic data go? Do I still need to use a MySQL database, or does Chef have a way to store that data?
- how do I execute AWS APIs from Chef. For Ex: Do I still need to use the AWS SDK to set up a load balancer or a dynamodb table in AWS?
- we have also coded some infra management tasks like
- syncing log files to a log aggregation server
- backup database schemas
- free unused resources provisioned in AWS
etc. Do I still need to maintain my existing code repository which does these maintenance tasks, or can they too be fitted into Chef?
Thanks in advance.