Need guidance on porting an existing in-house tool to Chef

We have an in-house tool (written in PHP and Ruby) for provisioning and
managing our application infrastructure in the cloud. We want to port it to
Chef. Below are the capabilities of this tool which the Chef-based solution
also needs to satisfy.

  1. Use the AWS SDK to launch, terminate, and configure a wide range of AWS
    offerings (EC2, load balancers, RDS, DynamoDB, CloudSearch, and more)

  2. After launching EC2 instances, do things like:

    • OS kernel tweaks
    • install/configure system software
    • check out application code from GitHub
    • generate app config
    • start application processes
      etc.
  3. It is also capable of setting up new test environments, i.e. performing the
    above steps for all required server roles in one command. For example: with a
    single command I can set up a new environment called gamma (with its own set of
    load balancers, EC2 instance roles, RDS, DynamoDB, etc.)

For each environment, we maintain two types of data:
A. Data that always remains the same for an environment. For example, every
time the gamma env is launched it will have the same value for each of the
below:
- which server roles are to be configured in the env
- which GitHub repositories are deployed on each role
- which processes should run on a role, and on which port
- application-level configuration
- which AMI will be used to launch instances
and so on. All this information is maintained in config files.

B. Dynamic data about the environment, such as:
- for each role, which instances are running in this environment, along with
their public & private IPs and instance IDs
- DNS records for provisioned load balancers
- DNS records for provisioned databases
and so on. All this information is maintained in a MySQL database.

The tool reads the required data from A and B above and generates config files
on every role.

  1. I need some guidance on how the above architecture will fit into Chef.
  2. I have some specific questions:
  • Where does the dynamic data go? Do I still need to use a MySQL database,
    or does Chef have a way to store that data?
  • How do I execute AWS APIs from Chef? For example: do I still need to use
    the AWS SDK to set up a load balancer or a DynamoDB table in AWS?
  • We have also coded some infra management tasks like:
    • syncing log files to a log aggregation server
    • backing up database schemas
    • freeing unused resources provisioned in AWS
    Do I still need to maintain my existing code repository for these
    maintenance tasks, or can they too be fitted into Chef?

Thanks in advance.

Hey Varun, I would recommend checking out chef-provisioning (and chef-provisioning-aws, https://github.com/chef/chef-provisioning-aws/, in particular). Unfortunately the README is pretty lacking at this point (we’re working on getting it into shape), but I can give a brief overview here.

First you need to create one node which acts as the provisioner machine. This is the machine that will talk to AWS and request instances, load balancers, etc., so it needs connectivity to the AWS APIs. It must have Chef installed as well as the chef-provisioning-aws gem; this can be handled by a recipe on the provisioning node containing chef_gem 'chef-provisioning-aws'. You must also provide credentials which have access to perform all the operations you desire. For instances, this would be EC2 create, delete, etc. See the credentials section of the README (https://github.com/chef/chef-provisioning-aws#credentials) for how to specify credentials.
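A minimal sketch of such a recipe for the provisioner node might look like the following (the compile_time setting is an assumption about when later recipes need the gem available):

```ruby
# Install the chef-provisioning-aws gem into Chef's own Ruby so that the
# aws_* resources and drivers are available to subsequent recipes.
chef_gem 'chef-provisioning-aws' do
  compile_time true   # make the gem available while recipes are compiled
end
```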

If you look at the ref_full.rb file (https://github.com/chef/chef-provisioning-aws/blob/master/docs/examples/ref_full.rb) you can see the different resources that chef-provisioning-aws can manage. It handles almost all EC2 objects and load balancers, but does not currently support RDS, DynamoDB or CloudSearch. RDS support is in our backlog.

After an instance is provisioned I would recommend using a normal chef-client run to perform operations like the ones you describe (OS tweaks, installing software, process management, etc.). The goal of chef-provisioning is only to get a machine created and connected to the other AWS resources it needs. A non-provisioning Chef recipe (again, run on the instance itself) can also perform the tasks you describe like syncing log files and performing database backups. I think of the separation this way: a provisioning node and its recipes should only do enough to get a machine in place so we can install Chef and have Chef manage the node’s state. (Unless you’re trying to provision immutable infrastructure, which is an altogether different question.)
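To make that concrete, a plain (non-provisioning) recipe run by chef-client on the instance could cover those post-provision steps; every name, path, and value below is made up for illustration:

```ruby
# OS kernel tweak: drop a sysctl fragment and reload settings
file '/etc/sysctl.d/99-myapp.conf' do
  content "net.core.somaxconn = 1024\n"
  notifies :run, 'execute[reload-sysctl]', :immediately
end

execute 'reload-sysctl' do
  command 'sysctl --system'
  action :nothing
end

# Install system software
package 'nginx'

# Check out application code from GitHub
git '/srv/myapp' do
  repository 'https://github.com/example/myapp.git'
  revision 'master'
end

# Generate app config from a template; node attributes supply the values
template '/srv/myapp/config.yml' do
  source 'config.yml.erb'
  variables port: node['myapp']['port']
end

# Start (and keep running) the application processes
service 'myapp' do
  action [:enable, :start]
end
```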

To handle provisioning multiple environments you can load config from Chef attributes. For example, let’s use the following recipe:

require 'chef/provisioning/aws_driver'

with_driver 'aws::us-west-1'

aws_key_pair 'ref-key-pair'

with_machine_options :bootstrap_options => {
  key_name: 'ref-key-pair',
  image_id: 'ami-0d5b6c3d',
  instance_type: 'm3.medium',
}

machine 'my_machine'

If you have multiple environments you want to provision you can use node attributes to supply those different environments. In this case I would recommend having one node for provisioning the gamma environment and one node for provisioning the beta environment. The recipe would be updated to look like:

require 'chef/provisioning/aws_driver'

with_driver 'aws::us-west-1'

aws_key_pair node[:provisioning][:key_name]

with_machine_options :bootstrap_options => node[:provisioning]

machine 'my_machine'

You can specify the :provisioning node attribute as a hash of options in both a gamma environment file and a beta environment file. Then the gamma provisioning node will converge the gamma environment according to its configuration, and the beta will do likewise for the beta environment.
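For instance, the gamma environment file could carry that hash (the values here are illustrative only):

```ruby
# environments/gamma.rb -- example values only
name 'gamma'
description 'Provisioning options for the gamma environment'

default_attributes(
  'provisioning' => {
    'key_name'      => 'gamma-key-pair',
    'image_id'      => 'ami-0d5b6c3d',
    'instance_type' => 'm3.medium'
  }
)
```

Accessing the hash as node[:provisioning] with symbols works because Chef node attributes are Mashes, which accept string or symbol keys interchangeably.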

We do not currently maintain an inventory of the infrastructure separate from what can be queried in the recipe. For example, you can create a VPC named my_vpc and reference it in an aws_security_group by its name my_vpc, without having to know the actual VPC identifier (e.g. vpc-123456). See the ref_full.rb document I linked earlier for more examples of these references. If you wish to query information about a resource (such as its IP address) you can do this by accessing the aws_object of a resource. This is described at https://github.com/chef/chef-provisioning-aws#looking-up-aws-objects.
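A rough sketch of that lookup pattern; the exact accessors on the returned SDK object are assumptions, so treat the README link above as authoritative:

```ruby
# Declare a VPC and keep a handle to the resource
vpc = aws_vpc 'my_vpc' do
  cidr_block '10.0.0.0/24'
end

# aws_object resolves at converge time, so read it inside a ruby_block
ruby_block 'log-vpc-id' do
  block do
    Chef::Log.info("my_vpc id: #{vpc.aws_object.id}")
  end
end
```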

If you already have a tool that can pull inventory information from that MySQL database, it could convert this information into the Ruby hash format that most AWS calls require. That’s how I would recommend having a Chef recipe converge infrastructure information stored in a separate data store.
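As a self-contained illustration of that conversion (the row keys and option names are assumptions, and the rows are stubbed in place of a real query result from e.g. the mysql2 gem):

```ruby
# Convert inventory rows, as a MySQL client gem might return them,
# into the bootstrap_options hash shape used by with_machine_options.
def rows_to_bootstrap_options(rows)
  rows.each_with_object({}) do |row, opts|
    opts[row['role']] = {
      key_name:      row['key_name'],
      image_id:      row['ami'],
      instance_type: row['instance_type']
    }
  end
end

# Stub rows standing in for a SELECT over the inventory table
rows = [
  { 'role' => 'web', 'key_name' => 'gamma-key-pair',
    'ami' => 'ami-0d5b6c3d', 'instance_type' => 'm3.medium' }
]

opts = rows_to_bootstrap_options(rows)
opts['web'][:image_id]   # => "ami-0d5b6c3d"
```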

The AWS SDK is available from within recipes, but our goal is to provide resources (a common Chef recipe abstraction) as the layer of interaction with all AWS objects. Rather than having to call aws.client.create_rds_database(…) we would like to expose an aws_rds resource with attributes that represent the state users desire.
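In the meantime, for services without resources yet (such as DynamoDB), the SDK can be driven from inside a recipe. A hedged sketch, assuming the aws-sdk v1 gem and entirely made-up table parameters:

```ruby
require 'aws-sdk-v1'  # assumption: SDK v1; credentials come from the usual AWS config

ruby_block 'create-dynamodb-table' do
  block do
    ddb = AWS::DynamoDB.new(region: 'us-west-1')
    # Create the table only if it does not already exist, to stay idempotent
    unless ddb.tables['my_table'].exists?
      ddb.tables.create('my_table', 5, 5, hash_key: { id: :string })
    end
  end
end
```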

I hope this has answered your questions. Feel free to join us on Gitter (https://gitter.im/chef/chef-provisioning) or during our next office hours (http://www.meetup.com/Chef-Office-Hours/events/223415651/, 1pm PST on Thursday, July 2nd) if you want a less asynchronous conversation!

Cheers
-Tyler

On Jun 23, 2015, at 7:36 AM, Varun Shankar shankarvarun1@gmail.com wrote:


Thanks a lot. I somewhat get the idea. I would need some resources to get
started on this. It would be great if you could point me to some good
tutorials, to an open-source repository on GitHub where I can view an
implementation built with chef-provisioning-aws, or to any other resource
showing such an implementation in detail.

On Wed, Jun 24, 2015 at 4:17 AM, Tyler Ball tball@chef.io wrote:
