Enabling a TV Ad Firm to Reduce Risk and Boost Reliability with Canary Deployments

27 / Jun / 2024 by Karandeep Singh

Introduction
In today’s fast-paced software development landscape, delivering new features and updates reliably and without disruption is crucial. Canary deployments, a core strategy in the DevOps toolkit, offer a practical way to achieve this. This post tells the story of one client, a global advertising management platform and a major player in advertising and connected TV. Running a state-of-the-art Connected TV Advertising Management Platform, they needed a trusted partner to manage their critical deployment process. We explore how canary deployments helped the client keep ad delivery seamless while making their deployment pipeline more efficient and reliable.

What is Canary Deployment?
Canary deployments are a DevOps strategy that involves rolling out a new software version to a small, controlled subset of users before a full-scale release. This approach allows teams to monitor the impact of the latest changes in a real-world environment, quickly identify and resolve issues, and minimize risks associated with deployment.

Canary Deployment

Requirements and Use Cases

  • Testing Against Production Traffic: Our client needed to test various scenarios on a portion of their production traffic without disrupting the entire environment.
  • Selective Application Inclusion: Recognizing that not all applications require a canary deployment, we identified the critical ones for this implementation. By focusing on the essentials, we ensured an efficient process.
  • Balancing Automation and Manual Intervention: While aiming for full automation, we started with a manageable setup that keeps some manual steps, such as QA approval, and has a clear path to automation.
  • Ensuring Reliable Rollback: We implemented a seamless rollback process to revert to the previous state if the canary release encountered issues or failed approval.
  • Compatibility and Coexistence: Application changes were designed to support the canary version running alongside the current one, following best practices for containerized deployments.

Existing Deployment Strategy and Its Challenges

  • Our existing deployment strategy relied on the rolling update feature of AWS ECS Service. New code underwent rigorous testing in the pre-production environment, where automated regression and unit tests were run, supplemented by manual testing from the QA team. The artifacts and Docker images built in pre-production were the same ones deployed to production. Despite these measures, deploying new code directly to production posed significant risks.

    Existing Preprod Deployment

  • With the rolling update approach, all new containers received 100% of the traffic once deployed in production. This setup left no room for error; any oversight or flaw missed during testing in lower environments could lead to sudden failures in production.
  • Moreover, our work heavily depended on real-time data that testing environments couldn’t mimic. Factors like user locations, IPs, and the flow of messages exchanged in real-time were critical and couldn’t be fully tested beforehand, leaving us at risk.
  • Traditional deployment methods also struggled to keep up with the rapid pace of our industry, failing to handle sudden traffic bursts or unexpected issues effectively. If we encountered problems such as an increase in 5xx errors, memory leaks, or high CPU usage, we were forced to roll back everything in production to the previous stable version, causing disruptions and potential losses.

    Existing Prod Deployment

  • Recognizing these shortcomings, we decided to transition to a canary deployment model. This approach allowed us to mitigate risks by gradually routing a small percentage of traffic to the new code, enabling us to observe its behavior in a real-world setting. The next section, on canary deployment design, examines how this strategy addressed our challenges and ensured a more stable and reliable deployment process.

Canary Deployment Design
Jenkins Pipeline Integration

  • We added three new stages to our Jenkins pipeline dedicated to canary deployments: Canary Deployment, Approve/Disapprove Canary, and Rollback Canary Applications.
  • The canary application, named in the format app1_canary, replicates the non-canary application app1.
  • We also configured load balancer target groups to direct 1% of the traffic to the canary application and 99% to the non-canary application. These percentages are managed via Terraform and can be adjusted case by case.

    Canary Deployment Strategy
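
To make the 1% / 99% split concrete, here is a minimal Python sketch of a weighted forward action. It is an illustration only: in the setup described above the weights are managed in Terraform, and the sketch assumes an Application Load Balancer with weighted target groups. The function name and the placeholder ARNs are hypothetical.

```python
def weighted_forward_action(stable_tg_arn, canary_tg_arn, canary_percent):
    """Build an ALB 'forward' action that splits traffic between two target groups.

    Assumption: canary_percent is a whole number from 0 to 100, and the
    stable target group receives whatever the canary does not.
    """
    if not isinstance(canary_percent, int) or not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be an integer between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                # Non-canary application (app1) keeps the bulk of the traffic.
                {"TargetGroupArn": stable_tg_arn, "Weight": 100 - canary_percent},
                # Canary application (app1_canary) gets the small slice.
                {"TargetGroupArn": canary_tg_arn, "Weight": canary_percent},
            ]
        },
    }


# Placeholder ARNs, not real resources.
action = weighted_forward_action("arn-of-app1-target-group",
                                 "arn-of-app1_canary-target-group", 1)
```

A dictionary of this shape is what boto3's `elbv2.modify_listener` accepts in `DefaultActions`, so the same structure could be applied from a script if Terraform were not in use.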

Liquibase Database Updates

  • Before deploying the new code, we perform database schema updates using Liquibase.
  • Liquibase is an open-source database schema change management tool. It lets us define database changes in a format that is easy to track and to roll back if necessary, keeping the database schema consistent and reliable across deployments.
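
As a rough sketch of how the schema update and its rollback could be driven from a pipeline step, the Python below only builds the Liquibase command lines; it does not run them. It assumes the Liquibase 4.x CLI flag spellings, connection settings supplied through a liquibase.properties file, and a tag-before-update convention. The tagging convention, the function name, and the example changelog path are assumptions, not details from the project.

```python
def liquibase_commands(changelog_file, release_tag):
    """Return the Liquibase CLI invocations for one release, as argument lists.

    Assumed flow: tag the current database state, apply pending changesets,
    and, if the canary is rejected, roll back to the tag.
    """
    changelog_arg = f"--changelog-file={changelog_file}"
    return {
        # Mark the pre-release state so there is a point to roll back to.
        "tag": ["liquibase", "tag", f"--tag={release_tag}"],
        # Apply all changesets that have not yet been run.
        "update": ["liquibase", "update", changelog_arg],
        # Undo every changeset applied after the tag was created.
        "rollback": ["liquibase", "rollback", f"--tag={release_tag}", changelog_arg],
    }


# Hypothetical changelog path and tag name.
commands = liquibase_commands("db/changelog.xml", "pre-canary-release")
print(" ".join(commands["update"]))
```

Each list can be handed to `subprocess.run(..., check=True)` so that a failed schema update stops the pipeline before any new code is deployed.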

Canary Deployment

  • The new code is deployed to the canary application, where QA tests it against 1% of the traffic before the approval stage.

Approval Stage

  • The QA team tests the canary deployment with 1% of the traffic; if they approve it, the new code is deployed to the non-canary application, which handles the remaining 99% of the traffic.

    Approval Stage

Rollback

  1. Disapproval and Rollback:
    • If QA disapproves the canary deployment, a rollback is triggered.
    • The rollback involves reverting the canary application and database schema (using Liquibase) to the previous stable version.

      rollback

    • This is achieved by running a dedicated canaryRollback Jenkins job that automates the rollback process.

      Rollback Strategy
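
The approve-or-roll-back flow can be summarized in a few lines of Python. This is a sketch of the decision logic only, not the actual Jenkinsfile: the three canary stage names come from the pipeline described above, while the "Liquibase Database Update" and "Full Deployment" labels and the function name are hypothetical.

```python
LIQUIBASE_UPDATE = "Liquibase Database Update"   # hypothetical label
CANARY_DEPLOY = "Canary Deployment"
APPROVAL = "Approve/Disapprove Canary"
ROLLBACK = "Rollback Canary Applications"
FULL_DEPLOY = "Full Deployment"                  # hypothetical label


def stages_to_run(qa_approved):
    """Return the ordered pipeline stages for one release.

    Every release updates the schema, deploys the canary, and waits for QA.
    Approval leads to the full deployment; disapproval leads to the rollback
    of the canary application and the database schema.
    """
    stages = [LIQUIBASE_UPDATE, CANARY_DEPLOY, APPROVAL]
    stages.append(FULL_DEPLOY if qa_approved else ROLLBACK)
    return stages


print(stages_to_run(qa_approved=False))
```

Keeping the rollback as its own stage means an approved release never touches it, while a rejected release never reaches the application that serves 99% of the traffic.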

Monitoring

  1. Kibana Logs and CloudWatch Dashboards:
    • QA uses Kibana logs to monitor real-time application logs during the canary deployment.
    • CloudWatch dashboards are set up to track key metrics such as CPU usage, memory usage, and HTTP response codes (5xx and 4xx).

      CloudWatch Dashboard

  2. CloudWatch Alarms:
    • Specific CloudWatch alarms are configured for the canary application to alert the team of any anomalies or issues.
    • These alarms help quickly identify and respond to potential problems during the canary phase.
  3. Approval for Full Deployment:
    • After thorough testing and monitoring, QA approves the canary deployment if everything looks good. The new software version is then deployed to the entire application.
    • This ensures that the updated software handles 100% of the traffic, providing a seamless transition from canary to full deployment.

      approval stage

    • The final approval skips the canary rollback stage and proceeds with deploying the new software to the entire application.
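
The canary-specific CloudWatch alarms mentioned in this section could look something like the sketch below, which builds the parameters for an alarm on 5xx responses from the canary target group. It assumes the canary sits behind an Application Load Balancer; the alarm name, threshold, period, and dimension values are illustrative assumptions rather than the project's actual settings.

```python
def canary_5xx_alarm_params(target_group_dim, load_balancer_dim, threshold=10):
    """Build keyword arguments for a CloudWatch alarm on canary 5xx responses.

    Assumption: the alarm fires when the canary targets return more than
    `threshold` 5xx responses per minute for three consecutive minutes.
    """
    return {
        "AlarmName": "app1_canary-target-5xx",        # hypothetical name
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group_dim},
            {"Name": "LoadBalancer", "Value": load_balancer_dim},
        ],
        "Statistic": "Sum",
        "Period": 60,                 # seconds per datapoint
        "EvaluationPeriods": 3,       # consecutive breaching datapoints
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # With only 1% of traffic, minutes without any 5xx datapoints are normal.
        "TreatMissingData": "notBreaching",
    }


# Placeholder dimension values, not real resources.
params = canary_5xx_alarm_params("targetgroup/app1-canary/0123456789abcdef",
                                 "app/example-alb/0123456789abcdef")
```

The resulting dictionary matches the keyword arguments of boto3's `cloudwatch.put_metric_alarm`, so it could be applied with `put_metric_alarm(**params)` once alarm actions such as an SNS topic are added.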

By integrating canary deployments into our Jenkins pipeline, we can ensure a safer and more controlled rollout of new software versions, with the ability to roll back quickly if issues are detected.

Business Benefits After Moving To Canary Deployments

  • Maintain high availability and reliability by minimizing the impact of potential issues.
  • Deliver better customer experiences through well-tested, stable releases.
  • Improve operational efficiency with smoother, less stressful deployments and easier rollback if needed.
  • Stay competitive and innovative with faster, more frequent updates.
  • Enhance decision-making with insights gained from real-world user feedback during canary phases.
  • Detect issues early by testing against real user interactions.

Conclusion

Transitioning to a canary deployment model was transformative for our client, significantly reducing deployment risks by introducing new code to a small percentage of traffic and allowing early issue detection. At TO THE NEW, we excel in simplifying AWS cloud migrations and implementing advanced deployment strategies. Our AWS Certified Architects and DevOps Engineers are committed to saving you time and resources while enhancing business efficiency and reliability. Stay tuned for more insights and developments as we continue our journey toward deployment excellence.
