Blue-Green Deployment with AWS OpsWorks
Blue-Green deployment is a technique in which a new, parallel “green” environment is brought up and tested, and then swapped with the current “blue” production environment. In other words, all user traffic is routed from the current “blue” production environment to the new parallel “green” environment. This technique is often used to achieve zero-downtime, high-quality deployments.
OpsWorks is an application management service from Amazon that lets users set up a production-ready application in minimal time, with only basic knowledge of the configuration management tool Chef. Its architecture divides the application infrastructure into layers such as a load balancer layer, an application layer, a database layer, a monitoring layer, and custom layers, and it allows each of them to be fully customized.
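To make that layered setup concrete, here is a minimal sketch of building such a stack with boto3. The region, stack name, layer names, and both IAM ARNs are placeholders rather than values from this article.

```python
# Sketch: create an OpsWorks stack with one layer per tier using boto3.
# All names, the region, and the IAM ARNs below are placeholders.
import boto3

opsworks = boto3.client("opsworks", region_name="us-east-1")  # placeholder region

stack = opsworks.create_stack(
    Name="blue-production",                                                        # placeholder
    Region="us-east-1",
    ServiceRoleArn="arn:aws:iam::123456789012:role/aws-opsworks-service-role",     # placeholder
    DefaultInstanceProfileArn="arn:aws:iam::123456789012:instance-profile/aws-opsworks-ec2-role",  # placeholder
    ConfigurationManager={"Name": "Chef", "Version": "12"},
)
stack_id = stack["StackId"]

# One layer per tier; OpsWorks layer types "lb" (HAProxy), "rails-app",
# "db-master" (MySQL), and "monitoring-master" (Ganglia) map to the layers
# described above.
for layer_type, name in [("lb", "Load Balancer"),
                         ("rails-app", "Application"),
                         ("db-master", "Database"),
                         ("monitoring-master", "Monitoring")]:
    opsworks.create_layer(
        StackId=stack_id,
        Type=layer_type,
        Name=name,
        Shortname=name.lower().replace(" ", "-"),
    )
```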
The best part is that AWS OpsWorks has built-in support for Blue-Green deployments: it allows users to clone their entire architecture with a single click.
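The same clone can be driven through the API. Below is a minimal sketch using boto3's clone_stack call; the source stack ID and the service role ARN are placeholders.

```python
# Sketch: clone the running "blue" stack into a parallel "green" stack,
# the API equivalent of the console's one-click clone.
import boto3

opsworks = boto3.client("opsworks", region_name="us-east-1")  # placeholder region

green = opsworks.clone_stack(
    SourceStackId="11111111-2222-3333-4444-555555555555",                       # the "blue" stack (placeholder)
    ServiceRoleArn="arn:aws:iam::123456789012:role/aws-opsworks-service-role",  # placeholder
    Name="green-production",
    ClonePermissions=True,
)
print("Green stack id:", green["StackId"])
```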
The next step is to route all traffic to the cloned “green” architecture, which can be done by changing the DNS mapping of the domain via Route 53. And that’s it: a Blue-Green deployment done right.
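As a sketch of that cutover, the record can be UPSERTed with boto3; the hosted zone ID, record name, TTL, and the green load balancer's DNS name below are placeholders.

```python
# Sketch: repoint the domain at the green environment via a Route 53 UPSERT.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z3EXAMPLE",                       # placeholder hosted zone
    ChangeBatch={
        "Comment": "Cut over from blue to green",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",         # placeholder record
                "Type": "CNAME",
                "TTL": 60,                          # low TTL so the cutover propagates quickly
                "ResourceRecords": [
                    {"Value": "green-elb-1234567890.us-east-1.elb.amazonaws.com"}  # placeholder
                ],
            },
        }],
    },
)
```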
Since this is the first link in a Google search for OpsWorks Blue-Green deployment, I would suggest another option.
By the way, I have to point out that this comment is completely false: “the maximum time that it takes to update DNS records is 300 seconds”. The maximum time it takes a DNS record to resolve to an updated IP address is determined by the TTL set on the record (which lets you trade off the increased cost of more lookups against better control over what the record resolves to).
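For anyone who wants to check this themselves, a minimal sketch (assuming the dnspython package is installed; the domain is a placeholder) that prints the TTL resolvers will honour for a record:

```python
# Sketch: inspect the TTL on a record, which bounds how long a cached
# answer can keep pointing at the old environment.
import dns.resolver  # pip install dnspython

answer = dns.resolver.resolve("www.example.com", "A")   # placeholder domain
print("TTL (seconds):", answer.rrset.ttl)
for record in answer:
    print("resolves to:", record.address)
```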
Joe is right. Even with a low TTL, the version of your site presented to users would still be at the mercy of DNS caches across the public internet.
Better approaches include selective updating (e.g. pull an instance from your LB, update it, put it back), as sketched below, or last-minute process termination (update on a box but allow the old code to still be served until the new code is running – e.g. in new Unicorn processes – then kill the old ones).
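A minimal sketch of the “pull it, update it, put it back” approach against a classic ELB with boto3; the load balancer name and instance ID are placeholders, and run_update() is a hypothetical hook standing in for whatever deploys the new code on the box.

```python
# Sketch: selective updating by rotating one instance out of a classic ELB.
import boto3

elb = boto3.client("elb", region_name="us-east-1")          # placeholder region

LB_NAME = "blue-production-lb"                              # placeholder
INSTANCE = {"InstanceId": "i-0123456789abcdef0"}            # placeholder

def run_update():
    """Hypothetical hook: ship and restart the new code on the instance."""
    pass

# 1. Take the instance out of rotation so it stops receiving traffic.
elb.deregister_instances_from_load_balancer(LoadBalancerName=LB_NAME, Instances=[INSTANCE])

# 2. Update the code on the box (deploy, restart worker processes, etc.).
run_update()

# 3. Put the instance back in rotation.
elb.register_instances_with_load_balancer(LoadBalancerName=LB_NAME, Instances=[INSTANCE])
```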
In any of these cases (including the one suggested by the author), a complicating factor is synchronizing your code release with a database migration. I don’t know what the best practice is there… perhaps to develop such that no synchronization is required (i.e. phased releases).
It seems like the switch should be done at the load-balancer level, not R53, since a DNS change takes time to propagate (not to mention the end user may be stuck with a DNS cache that doesn’t respect the TTL).
Route 53 is way ahead of the curve: the maximum time that it takes to update DNS records is 300 seconds. This time gap also helps with connection draining, i.e. all the existing connections would eventually complete in this time.
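If you want an explicit bound on how long in-flight requests are allowed to finish, rather than relying on DNS timing, here is a minimal sketch (placeholder load balancer name) of enabling connection draining on a classic ELB with boto3:

```python
# Sketch: give in-flight requests up to 300 seconds to complete,
# independently of any DNS TTL.
import boto3

elb = boto3.client("elb", region_name="us-east-1")   # placeholder region

elb.modify_load_balancer_attributes(
    LoadBalancerName="blue-production-lb",           # placeholder
    LoadBalancerAttributes={
        "ConnectionDraining": {"Enabled": True, "Timeout": 300},
    },
)
```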