Blue/Green deployments with AWS Auto-Scaling Group and Terraform

One of the goals that my colleagues and I emphasize when implementing DevOps best practices is a seamless CI/CD process. While some AWS services come with deployment strategies out of the box, sometimes we want to avoid depending on a specific “wrapping” service (Elastic Beanstalk, CodeDeploy) and run our apps on “bare” EC2 instances to gain more control. So how can we implement effective and resilient deployments without downtime?

Phasing with AWS Auto-Scaling Group (ASG)

The base assumption is that all EC2 instances are wrapped and governed by an Auto-Scaling Group (ASG) for the purposes of automatic scaling and management. As mentioned in previous posts, infrastructure should be managed as code (IaC), and the following example shows how to configure an ASG with Terraform to perform a modified blue-green deployment, which we call “Phasing”.

The technique used to enforce phasing is to set a create_before_destroy lifecycle policy on both the Launch Configuration (LC) and the Auto-Scaling Group (ASG) that contains the instances we wish to update:

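resource "aws_launch_configuration" "myapp" {
  # No explicit name: Terraform generates a unique one, so the replacement LC
  # can be created before the old one is destroyed.
  image_id        = "${var.image}"
  instance_type   = "${var.type}"
  key_name        = "${var.key}"
  security_groups = ["${var.security_group}"]

  lifecycle {
    create_before_destroy = true
  }
}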

resource "aws_autoscaling_group" "myapp" {
  availability_zones   = ["${split(",", var.availability_zones)}"]
  desired_capacity     = "${var.instances}"
  launch_configuration = "${aws_launch_configuration.myapp.name}"
  load_balancers       = ["${aws_elb.myapp.id}"]
  max_size             = "${var.instances}"
  min_elb_capacity     = "${var.instances}"
  min_size             = "${var.instances}"

  # The name embeds the LC name: a new LC changes the ASG name, which forces
  # Terraform to create a new ASG instead of updating the existing one in place.
  name                 = "myapp-${aws_launch_configuration.myapp.name}"
  vpc_zone_identifier  = ["${split(",", var.subnet_ids)}"]

  lifecycle {
    create_before_destroy = true
  }
}
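
The ASG references aws_elb.myapp, which is not shown in this post. As a rough sketch of what such a load balancer might look like, here is a minimal classic ELB definition. The listener port and the "/health" check path are assumptions for illustration only, so adjust them to whatever your app actually serves:

```hcl
# Hypothetical ELB definition; port 80 and the /health path are assumptions.
resource "aws_elb" "myapp" {
  name    = "myapp"
  subnets = ["${split(",", var.subnet_ids)}"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  # Instances count toward min_elb_capacity only after passing this check.
  health_check {
    target              = "HTTP:80/health"
    interval            = 10
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}
```

The health check matters here because it is what tells Terraform the new instances are actually serving traffic, not merely booted.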

The following flow is initiated once the provided AMI is changed:

  1. New LC is created with the fresh AMI.
  2. New ASG is created with the fresh LC (Blue).
  3. Terraform waits for the new ASG’s instances to spin up and attach to the “myapp” Elastic Load Balancer (that is, until the instances pass its health check).
  4. Once all new instances are InService, Terraform will destroy the old ASG (Green).
  5. Once old ASG is destroyed, Terraform destroys old LC.
  6. The new ASG is now treated as Green.

If step 3 times out (after 10 minutes by default, which can easily be changed), the apply fails and Terraform does not destroy the Green ASG, so there is no downtime.
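
The timeout for step 3 is controlled by the ASG's wait_for_capacity_timeout argument. A minimal sketch, assuming 15 minutes is a better fit for your instances' boot time (the value is only an example, and the fragment omits the other ASG arguments):

```hcl
resource "aws_autoscaling_group" "myapp" {
  # ... same arguments as in the ASG example above ...

  # Assumed value: allow up to 15 minutes for the new instances to become
  # healthy before Terraform fails the apply (the default is "10m").
  wait_for_capacity_timeout = "15m"
}
```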

With this little trick you can introduce a resilient, downtime-safe deployment process with minimal effort.
