Using AWS Resilience Hub to track the resilience of an application deployed on Amazon ECS and using DynamoDB in a single region and multi-region scenarios on AWS
AWS Resilience Hub provides a central place to define, validate, and track the resilience of your applications on AWS
In the past, when the business asked whether the recovery time objective (RTO) and recovery point objective (RPO) of your application met the business need, you had to offer a “guesstimate”: a manual calculation based on the deployment, followed by a prayer that it was correct 😆
With AWS Resilience Hub, you now have a central place to define, validate, and track the resiliency of your AWS applications. It can also help you meet compliance and regulatory requirements. With it, you can:
- Analyze your infrastructure and get recommendations to improve the resiliency of your applications. In addition to architectural guidance, the recommendations provide code for implementing tests, alarms, and standard operating procedures (SOPs).
- Validate recovery time (RTO) and recovery point (RPO) targets under different conditions.
- Document the resiliency posture of your application over time.
- Integrate Resilience Hub into the CI/CD process to prevent potential resilience risks from being introduced into a production environment.
Resilience Hub supports a growing number of AWS services, listed at https://docs.aws.amazon.com/resilience-hub/latest/userguide/supported-resources.html, including Amazon ECS and DynamoDB, which our application uses. Keep an eye out for support for additional AWS services in Resilience Hub.
My intention in this blog is to use AWS Resilience Hub to determine whether the recovery time objective (RTO) and recovery point objective (RPO) for my application deployed on Amazon ECS (Elastic Container Service) can be met as per the resiliency policy the business needs.
Let us deploy and test the following scenarios:
- An App deployed in a Single AWS Region — us-west-2 (Oregon), with multiple Availability Zones.
- Multi-tiered deployment of our app deployed in Amazon ECS and using Amazon DynamoDB
- The same app deployed in two AWS Regions, us-west-2 (Oregon) and us-east-2 (Ohio), in a multi-region scenario
As always, like any developer/solutions architect worth their salt, I look for existing code/workshops to re-use 😃 I found an amazing Amazon ECS multi-region workshop, which I will leverage for this blog. An added benefit: the workshop uses CDK for all deployments!!
Note: You will incur charges when you deploy this workshop and also when you use AWS Resilience Hub. Please refer to the AWS pricing calculator as well as the Resilience Hub pricing page. As always, these are my proofs of concept; please use the official AWS documentation for the final word, and your mileage may vary ..
Let's get started. This is a slightly longer blog, as there are lots of screenshots ..
Single Region deployment of our app deployed in Amazon ECS
Follow the section related to the single region deployment in the ECS workshop.
The deployment architecture:
The first step is to add an application in Resilience Hub. Since CDK deploys the ECS service as a CloudFormation stack, we select CloudFormation as the resource source:
Resilience Hub identifies the AWS resources deployed by the stack. In this case, these are the ECS service and the Application Load Balancer.
Next, we add the resilience policy against which our application will be assessed. This should reflect your business need.
Then we publish the app .. Let us now go to the application in Resilience Hub and assess its resiliency.
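The console steps above (create a policy, add the app from a CloudFormation stack, publish, assess) can also be scripted. Below is a minimal sketch, assuming a client that behaves like `boto3.client("resiliencehub")`. The app, policy, and assessment names are hypothetical, and applying one RTO/RPO pair to every disruption type is a simplification of mine, not what the screenshots show. `Software` is, to my understanding, the API name for what the console calls "Application".

```python
from typing import Any

# Hypothetical names used only for illustration; they do not come from the workshop.
APP_NAME = "ecs-single-region-app"
POLICY_NAME = "single-region-policy"
ASSESSMENT_NAME = "first-assessment"


def build_policy_request(rto_minutes: int, rpo_minutes: int) -> dict[str, Any]:
    """Build keyword arguments for create_resiliency_policy.

    For brevity the same targets are applied to every disruption type;
    a real policy would usually set them per type.
    """
    target = {"rtoInSecs": rto_minutes * 60, "rpoInSecs": rpo_minutes * 60}
    return {
        "policyName": POLICY_NAME,
        "tier": "Critical",
        "policy": {kind: dict(target) for kind in ("Software", "Hardware", "AZ")},
    }


def onboard_and_assess(client: Any, stack_arn: str,
                       rto_minutes: int, rpo_minutes: int) -> str:
    """Run the console steps in order and return the assessment ARN."""
    policy = client.create_resiliency_policy(
        **build_policy_request(rto_minutes, rpo_minutes)
    )
    app = client.create_app(name=APP_NAME, policyArn=policy["policy"]["policyArn"])
    app_arn = app["app"]["appArn"]

    # Point the draft version at the CloudFormation stack that CDK deployed.
    # The import runs asynchronously; a real script would poll its status
    # before publishing.
    client.import_resources_to_draft_app_version(
        appArn=app_arn, sourceArns=[stack_arn]
    )

    client.publish_app_version(appArn=app_arn)
    started = client.start_app_assessment(
        appArn=app_arn, appVersion="release", assessmentName=ASSESSMENT_NAME
    )
    return started["assessment"]["assessmentArn"]
```

With real credentials this would be called as `onboard_and_assess(boto3.client("resiliencehub"), stack_arn, rto_minutes, rpo_minutes)`. Treat it as an outline of the call order rather than a finished script, and check the request shapes against the current API reference.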
Well, I got a “Policy breached” for my first assessment with the following details 😒
So basically, my ECS service has redundancy built in, with multiple ECS tasks spread across multiple Availability Zones, and it meets the targeted RTO and RPO. However, Resilience Hub pointed out that the estimated time to manually redeploy an Application Load Balancer is around 20 minutes, which is beyond the RTO target of 15 minutes!!
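The breach boils down to a per-component comparison of estimated recovery times against the policy targets. Here is a small sketch of that comparison. It is only my illustration of the idea, not how Resilience Hub computes its estimates. The 20-minute ALB estimate and the 15-minute RTO target come from the assessment above; the ECS values and the RPO target are placeholders I made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Estimated recovery figures for one application component."""
    component: str
    rto_minutes: int
    rpo_minutes: int


def breaching_components(estimates, target_rto_minutes, target_rpo_minutes):
    """Return the components whose estimated RTO or RPO exceeds the target."""
    return [
        e.component
        for e in estimates
        if e.rto_minutes > target_rto_minutes or e.rpo_minutes > target_rpo_minutes
    ]


estimates = [
    # Placeholder values: the assessment only says the ECS service meets its targets.
    Estimate("ecs-service", rto_minutes=0, rpo_minutes=0),
    # 20 minutes is the manual redeploy estimate reported for the load balancer.
    Estimate("application-load-balancer", rto_minutes=20, rpo_minutes=0),
]

RPO_TARGET_MINUTES = 5  # hypothetical; the post does not state the RPO target

print(breaching_components(estimates, 15, RPO_TARGET_MINUTES))
# ['application-load-balancer']
print(breaching_components(estimates, 20, RPO_TARGET_MINUTES))
# []
```

The second call shows why raising the RTO target to 20 minutes is enough to clear the breach: the estimate no longer exceeds the target.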
There were also some awesome operational recommendations by Resilience Hub which are specific to our deployment, which will also improve our overall Resilience score ..
Let me now create a revised resilience policy with an RTO of 20 minutes for Application, as we feel this will still satisfy the business need. The rest of the policy remains unchanged.
Let us now run the assessment again. Make sure the application's assessment policy is set to the revised resilience policy that we just created.
Yes!! We are now compliant and meet the RTO and RPO ..
Resilience Hub will also recommend setting up alarms, SOPs (standard operating procedures), and FIS (Fault Injection Simulator) experiments to enhance our application’s resiliency. It makes adding them very easy by providing CloudFormation templates customized for our application, which let you quickly provision and validate these resiliency measures in the cloud.
We deployed these operational recommendations using the CloudFormation template. By using Resilience Hub, we can keep track of the resilience posture of the application over time. As you can see the application posture increased from 28% (it failed to meet the resilience policy), to 50% (met the policy) and finally to 66% (after adding various Alarms as recommended). I did not add the FIS experiments, but the resiliency posture will improve further, if you add them.
This concludes the first part of our deployment in a single AWS region.
Multi-tiered deployment of our app deployed in Amazon ECS and using Amazon DynamoDB
While the previous segment focused on the application deployed on Amazon ECS, in the real world we will have a multi-tiered architecture. As you may have noticed, our deployment also uses DynamoDB. We can select multiple CloudFormation stacks (up to 20, as per the documentation).
It does detect that DynamoDB has also been deployed ..
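If you script this step instead of using the console, the stack selection becomes a list of stack ARNs passed as `sourceArns` to `import_resources_to_draft_app_version`. Below is a small sketch that builds that request and enforces the limit of 20 quoted above. The limit constant is taken from this post, so please verify it against the current documentation. The de-duplication is my own addition.

```python
MAX_STACKS = 20  # limit as quoted in this post; verify against the current docs


def build_import_request(app_arn: str, stack_arns: list[str]) -> dict:
    """Build keyword arguments for import_resources_to_draft_app_version."""
    # Drop duplicates while keeping the order in which stacks were given.
    unique = list(dict.fromkeys(stack_arns))
    if not unique:
        raise ValueError("at least one CloudFormation stack ARN is required")
    if len(unique) > MAX_STACKS:
        raise ValueError(f"{len(unique)} stacks given, limit is {MAX_STACKS}")
    return {"appArn": app_arn, "sourceArns": unique}
```

For the multi-tiered app, the list would hold the ARN of the ECS stack and the ARN of the stack that contains the DynamoDB table.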
I ran a resiliency assessment for this multi-tiered deployment, and unfortunately it failed .. Resilience Hub pointed out that neither AWS Backup nor Point-In-Time Recovery (PITR) is configured for the Amazon DynamoDB table, which are quite valid findings, right?
There are also some great Resiliency recommendations:
So, I heeded the advice and turned on point-in-time recovery for my DynamoDB table.
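For reference, turning on PITR outside the console is a single DynamoDB call, `update_continuous_backups`. The sketch below only builds the request and reads back the status from a `describe_continuous_backups` response, so it runs without AWS credentials. The table name in the usage note is hypothetical. Since the workshop deploys with CDK, setting this in the CDK code is probably the cleaner route, because a direct API change would drift from the stack definition.

```python
def pitr_request(table_name: str) -> dict:
    """Keyword arguments for the DynamoDB update_continuous_backups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def pitr_enabled(response: dict) -> bool:
    """Check a describe_continuous_backups response for enabled PITR."""
    status = (
        response.get("ContinuousBackupsDescription", {})
        .get("PointInTimeRecoveryDescription", {})
        .get("PointInTimeRecoveryStatus")
    )
    return status == "ENABLED"
```

With boto3 this would be used as `boto3.client("dynamodb").update_continuous_backups(**pitr_request("my-table"))`, followed by `describe_continuous_backups(TableName="my-table")` to confirm the change.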
Let's run the resiliency assessment again .. and we now meet the resiliency policy we set!!
Multi Region deployment of our app deployed in Amazon ECS
Now that we have tested the resiliency of our application deployed in a single Region, we will expand this to a multi-region setup, with deployments in two AWS Regions: us-west-2 and us-east-2. Follow the section related to the multi-region deployment in the ECS workshop.
The deployment architecture:
Global Critical Policy:
To cater to the needs of the business, which is now demanding a more stringent multi-region resilience policy, we create a new resilience policy.
When we run an assessment on the existing application again, it fails, because the application structure in Resilience Hub has not been updated and it is not aware of the second ECS service deployed in us-east-2.
Let us now add the second ECS service to the application structure on Resilience Hub.
Finally, let us publish a new version of the application and do an assessment ..
I found the part related to clubbing resources into a single AppComponent a little tricky, maybe since I was fairly new to this .. and I ran the assessment again and it passed !!
As in the single-region use case, we will also add the database component, DynamoDB, to the application.
Finally, we run the assessment again with DynamoDB, and the results show that the estimated RTO and RPO are within the resiliency policy specified for a global multi-region deployment.
YIPEEE !!
In summary, AWS Resilience Hub provides a comprehensive view of your overall application portfolio resilience status through its dashboard. To help you track the resilience of applications, AWS Resilience Hub aggregates and organizes resilience events (such as an unavailable database or a failed resilience validation), alerts, and insights from services like Amazon CloudWatch, Amazon Route 53 Application Recovery Controller, and AWS FIS.
AWS Resilience Hub also generates a resilience score, a scale that indicates the level of implementation for recommended resilience tests, alarms, and recovery SOPs. This score is used to measure resilience improvements over time. Resilience Hub and its resilience checks can also be integrated into the CI/CD process to prevent potential resilience risks from being introduced into a production environment.
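As a rough idea of what the CI/CD integration can look like: a pipeline step starts an assessment, waits for it to finish, and blocks the release if the policy is breached. The sketch below covers only the decision part. It assumes the `assessment` dictionary has the shape returned under the `assessment` key of `describe_app_assessment`, with `assessmentStatus` and `complianceStatus` fields. I did not build this pipeline for the blog, so treat the function and its messages as hypothetical.

```python
def release_gate(assessment: dict) -> tuple[bool, str]:
    """Decide whether a pipeline should proceed, based on one assessment."""
    run_status = assessment.get("assessmentStatus")
    if run_status != "Success":
        # Covers Pending, InProgress and Failed runs as well as missing data.
        return False, f"assessment not completed (status: {run_status})"

    compliance = assessment.get("complianceStatus")
    if compliance != "PolicyMet":
        return False, f"resiliency policy not met (compliance: {compliance})"

    return True, "resiliency policy met"


allowed, reason = release_gate(
    {"assessmentStatus": "Success", "complianceStatus": "PolicyBreached"}
)
print(allowed, reason)
# False resiliency policy not met (compliance: PolicyBreached)
```

In a pipeline, a `False` result would translate into a non-zero exit code so that the deployment stage does not run.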
Hope this blog was useful, and convinced you to try the awesome, awesome AWS Resilience Hub.
Thanks and namaskara 🙏 !!
Useful Resources
- AWS Resilience Hub page: https://aws.amazon.com/resilience-hub/
- Amazon ECS multi-region workshop: https://catalog.workshops.aws/ecsmultiregion/en-US
- Resilience Hub workshop: https://catalog.workshops.aws/aws-resilience-hub-lab/en-US
- Resilience Hub related blogs: https://aws.amazon.com/search/?searchQuery=resilience%20hub#facet_type=blogs