Kalio Princewill

Load Testing a Scalable AWS Application Using Grafana k6

In cloud-native applications, performance under load is a critical consideration. Ensuring applications can handle expected traffic and unexpected surges is key to maintaining a positive user experience and system stability. Load testing is essential in validating the resilience and scalability of modern applications.

This tutorial will demonstrate how to use Terraform to set up a scalable application infrastructure on Amazon Web Services (AWS). Once the infrastructure has been provisioned, we will use Grafana k6 to run smoke, average, and spike load tests against the application. We will then analyse the performance metrics captured during these tests, such as latency, throughput, and success rates, to understand how the architecture handles different traffic patterns and how its components, like the Auto Scaling Group and Application Load Balancer, contribute to its ability to scale and maintain reliability under stress.

Prerequisites

To effectively follow this guide, ensure you have the following prepared:

  • AWS Account: You'll need an active account, which you can sign up for on the AWS website if you don't have one.

  • Configured AWS CLI: Install the AWS Command Line Interface from the official AWS CLI installation guide and configure it with your credentials by following the AWS CLI configuration guide.

  • Terraform CLI Installed: Download and install the Terraform CLI from the official HashiCorp Terraform installation page.

  • Grafana k6 Installed: Install the k6 load testing tool by downloading it from the k6.io website.

  • AWS SSH Key Pair: Create an SSH key pair in your AWS region for EC2 access via the EC2 console, and note its name for your Terraform setup.

  • Basic Terraform Understanding: Familiarise yourself with Terraform's core concepts through the Terraform documentation.

  • Sufficient IAM Permissions: The AWS identity (user or role) that Terraform uses must have permissions to create and manage resources for this project, primarily involving services like VPCs, EC2 instances, ALBs, Auto Scaling Groups, Security Groups, and Subnets. While a broad policy like AdministratorAccess can be used for initial learning in a personal account (use with caution), the best practice for any environment is to apply the principle of least privilege. This involves creating a custom IAM policy with only the specific permissions needed for Terraform to manage these resources. You can learn how to do this by reviewing the AWS guide on creating IAM policies. If Terraform encounters permission errors during execution, the error messages will typically guide you on what specific permissions are missing.

Overview of the Scalable Infrastructure

Before we dive into provisioning, let's understand the architecture we'll be building on AWS. This setup is designed for scalability and high availability, ensuring our application can handle varying loads effectively. The key components, as illustrated in the diagram below, include:

  • Amazon Web Services (AWS) Cloud: Our infrastructure resides entirely within the AWS cloud, leveraging its robust and scalable services.

  • Virtual Private Cloud (VPC): We'll establish a custom VPC, which acts as a private, isolated section of the AWS cloud. This provides a secure network environment for our resources.

  • Internet Gateway (IGW): An IGW is attached to our VPC to allow communication between resources in our VPC and the internet. This is crucial for users to access our application.

  • Public Subnets across Multiple Availability Zones (AZs): Within our VPC, we will configure public subnets. To ensure high availability and fault tolerance, these subnets will be distributed across multiple Availability Zones (e.g., AZ a, AZ b, AZ c). If one AZ experiences an issue, our application can continue running in other AZs.

  • Application Load Balancer (ALB): The ALB serves as the single point of contact for clients and automatically distributes incoming application traffic across multiple targets, such as EC2 instances, in different Availability Zones. This enhances application availability and fault tolerance.

  • EC2 Instances: These are virtual servers in the AWS cloud where our simple web application will run. The application itself will be deployed using EC2 user data scripts.

  • Auto Scaling Group (ASG): The ASG is the core of our scalability. It automatically adjusts the number of EC2 instances running our application based on predefined conditions (like CPU utilisation or network traffic). If the load increases, the ASG launches more instances; if the load decreases, it terminates instances to save costs.

  • Target Group: The Application Load Balancer uses this group to route requests to one or more registered targets, which in our case are the EC2 instances managed by the Auto Scaling Group. The ALB checks the health of instances in the target group and only sends traffic to healthy instances.

  • EC2 Security Group: This acts as a virtual firewall for our EC2 instances, controlling inbound and outbound traffic. We'll configure it to allow traffic from the Application Load Balancer and necessary administrative access (e.g., SSH, if needed for debugging, though not directly used by the load test traffic itself).

This architecture ensures that incoming internet traffic passes through the IGW to the ALB. The ALB then distributes this traffic across healthy EC2 instances running in public subnets across different AZs. The Auto Scaling Group monitors these instances and scales the fleet in or out based on demand, providing both resilience and cost-effectiveness. Our Terraform scripts will define and provision all these components.
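
To make this concrete, the sketch below shows how these pieces typically wire together in Terraform. It is illustrative only: the resource names, sizes, and arguments are assumptions made for the example, not the exact contents of the tutorial repository (the ALB listener and launch template are omitted for brevity).

# Illustrative sketch -- names and values are assumptions, not the repo's exact code.
resource "aws_lb" "app" {
  name               = "app-alb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id      # public subnets across AZs
  security_groups    = [aws_security_group.alb.id]
}

resource "aws_lb_target_group" "app" {
  name     = "app-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path = "/" # only instances passing this check receive traffic
  }
}

resource "aws_autoscaling_group" "app" {
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = aws_subnet.public[*].id
  target_group_arns   = [aws_lb_target_group.app.arn] # new instances auto-register with the ALB

  launch_template {
    id      = aws_launch_template.app.id # user data bootstraps the web app
    version = "$Latest"
  }
}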

Provisioning the infrastructure with Terraform

This section outlines the steps in provisioning your application's infrastructure on AWS using Terraform. Terraform allows you to define your infrastructure as code, making the provisioning process repeatable, predictable, and versionable.

Cloning the infrastructure repository

The first step is to obtain the Terraform code that defines your AWS infrastructure.

1. Open your terminal.

2. Navigate to the directory where you want to clone the repository.

3. Clone the repository containing the Terraform configuration:

  git clone https://github.com/iamkalio/infra-load-testing

4. Change into the cloned repository directory:

  cd infra-load-testing/

This repository contains the Terraform configuration files (.tf files) that describe the desired state of your AWS resources.

Provisioning your infrastructure

Once you have the infrastructure code locally, you can use Terraform to provision the resources on AWS. Ensure you have the AWS CLI configured with appropriate credentials and permissions to create resources in your desired region.

1. Initialise your working directory: To initialise the directory and download the necessary provider plugins, run this command:

  terraform init

On successful initialisation, Terraform adds a .terraform directory and a .terraform.lock.hcl lock file to your working directory.

2. Review the execution plan: Before making any changes to your infrastructure, review the plan that Terraform will execute. To see which resources will be created, modified, or destroyed, run:

  terraform plan

Carefully examine the output to ensure that Terraform intends to make the changes you expect.

3. Apply the configuration: Apply the configuration to provision the resources on AWS. Run this command to begin the process:

  terraform apply

Terraform will prompt you to confirm the action. Type yes and press Enter to proceed. Terraform will then create the defined infrastructure resources in your AWS account.

Once provisioning completes, pasting the ALB DNS name into your browser lets you view your running application.

Noting the ALB DNS name from the output

After a successful terraform apply, Terraform will output the values of any defined outputs in your configuration. It is common practice to output the DNS name of the Application Load Balancer (ALB), as this is the entry point for traffic directed towards your application running on the provisioned infrastructure.

Look for the outputs section in the console output after terraform apply completes. Identify and note down the DNS name listed for your ALB (alb_dns_name). This is the URL you will use to access your deployed application once it's running on the provisioned infrastructure.

To retrieve this output value at any time after provisioning, run:

  terraform output alb_dns_name
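
For reference, this value is typically exposed with an output block like the following. This is a sketch: it assumes the ALB resource is named app, which may differ in the repository.

output "alb_dns_name" {
  # DNS name clients use to reach the application through the ALB
  value = aws_lb.app.dns_name
}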

Exploring the infrastructure created on AWS

After successfully running terraform apply, the infrastructure defined in your code will be active in your AWS account. You can verify and explore the created resources using the AWS Management Console or the AWS CLI.

The infrastructure provisioned includes:

  • Virtual Private Cloud (VPC): A logically isolated network for your resources.

  • Subnets: Divisions within your VPC. Production setups often pair public subnets (for load balancers) with private subnets (for application instances and databases); this tutorial uses public subnets only.

  • Internet Gateway: Allows communication between your VPC and the internet (usually associated with public subnets).

  • Route tables: Control the routing of network traffic within your VPC and to the internet.

  • Security groups: Act as virtual firewalls, controlling inbound and outbound traffic for your instances.

  • Application Load Balancer (ALB): Distributes incoming application traffic across multiple targets, such as EC2 instances or containers.

  • ALB target group: A logical grouping of targets (e.g., EC2 instances) that the ALB routes traffic to.

  • Compute resources: The EC2 instances where your application runs, bootstrapped with user data scripts.

  • Auto Scaling group: Manages the desired number of healthy instances for your application.

Running the load tests

Now that your application infrastructure is provisioned, you can create a load test script using k6 to simulate user traffic. This script will define the behaviour of virtual users accessing your application through the Application Load Balancer (ALB). Our load testing process will include a smoke test, an average load test, and a spike test to evaluate the application's performance across different scenarios.

Performing a smoke test

Let's begin with a quick smoke test. A smoke test is a lightweight load test designed to verify that your application is running, accessible, and responds correctly to basic requests. It's a crucial first step to catch any fundamental issues early on.

You will need a place to save your k6 test scripts. It's a good practice to keep your load test scripts separate from your infrastructure code.

1. Navigate to a suitable location outside of your Terraform infrastructure directory.

2. Create a directory for your load test scripts. You can name it load-tests:

  mkdir load-tests

3. Change into the load-tests directory:

  cd load-tests

4. Create a new file named smoke-test.js.

5. Populate this file with the following k6 script. Remember to replace http://YOUR_ALB_DNS_NAME/ with the actual DNS name of your ALB.

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
    vus: 3, // Key for a smoke test: keep it small, 2-5 VUs at most
    duration: '1m', // This can be shorter, or just a few iterations
};

export default () => {
    // Replace 'http://YOUR_ALB_DNS_NAME/' with the DNS name of your ALB
    const urlRes = http.get('http://YOUR_ALB_DNS_NAME/');
    check(urlRes, { 'status is 200': (r) => r.status === 200 });
    sleep(1);
};


This script sets up a basic test scenario. It uses import statements for the necessary k6 modules. The options object configures the test to use a small number of virtual users (vus: 3) for a short duration (duration: '1m'). The default function defines the actions each virtual user will perform: making an HTTP GET request to your application's URL, using check to ensure the response status is 200, and pausing with sleep(1).
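
Optionally, you can make the smoke test fail automatically when performance degrades by adding k6 thresholds. A minimal sketch, replacing the options object above; the limits here are illustrative, not prescriptive:

export const options = {
    vus: 3,
    duration: '1m',
    // Abort with a non-zero exit code if these limits are breached
    thresholds: {
        http_req_failed: ['rate<0.01'],   // fewer than 1% of requests may fail
        http_req_duration: ['p(95)<500'], // 95% of requests must complete in under 500ms
    },
};

With thresholds in place, k6 exits with a non-zero status when a limit is breached, which makes the smoke test easy to gate in a CI pipeline.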

To run the script, open your terminal, ensure you are in your load-tests folder, and execute the script:

k6 run smoke-test.js

After the test completes, k6 will output results summarising performance. For this smoke test, with 3 virtual users running for approximately 1 minute, the results confirmed the application's basic functionality and responsiveness:

  • HTTP request duration: The average request duration was 238.68ms, with 95% of requests completing within 279.67ms. The maximum duration was 852.69ms.

  • Checks and failures: All 143 checks succeeded, resulting in a 100.00% success rate and 0.00% HTTP request failures.

  • Throughput: The test generated a total of 143 HTTP requests at a rate of approximately 2.36 requests per second.

These results demonstrate a healthy and responsive application under light load. Concurrently, observing your AWS CloudWatch metrics during the test will show the request count received by your Application Load Balancer (ALB).
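
If you prefer the terminal to the console, the same ALB request count can be pulled with the AWS CLI. The dimension value below is a placeholder: for an ALB it takes the form app/<alb-name>/<id>, which you can copy from the portion of the ALB's ARN after loadbalancer/; adjust the time window to your test run.

  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value=app/YOUR-ALB-NAME/YOUR-ALB-ID \
    --start-time 2025-01-01T10:00:00Z \
    --end-time 2025-01-01T10:10:00Z \
    --period 60 \
    --statistics Sum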

Performing an average load test

Next, simulate an average level of traffic that your application might typically handle. This test runs over a longer duration and gradually increases the load to sustained levels, providing insights into performance under expected conditions.

1. Create a new file named average-load-test.js in your load-tests folder.

2. Add the following k6 script to it, making sure to replace the placeholder URL with your actual target (the ALB DNS name).


import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
    // Key configuration for an average load test
    stages: [
        { duration: '5m', target: 100 }, // traffic ramp-up from 1 to 100 users over 5 minutes
        { duration: '30m', target: 100 }, // stay at 100 users for 30 minutes
        { duration: '5m', target: 0 }, // ramp-down to 0 users
    ],
};

export default () => {
    const urlRes = http.get('http://YOUR_ALB_DNS_NAME/');
    check(urlRes, { 'status is 200': (r) => r.status === 200 });
    sleep(1);
};


What does this script do?

This script uses the stages option to define different phases of the test, simulating a more realistic traffic pattern over time:

  • Ramp-up (5 minutes): The number of virtual users linearly increases from the default (usually 1) up to target: 100.

  • Sustained load (30 minutes): The test maintains a constant load of target: 100 virtual users. This is the core of the average load test, measuring performance under steady, typical traffic.

  • Ramp-down (5 minutes): The number of virtual users decreases from 100 down to target: 0, gracefully ending the test.

The default function remains the same as the smoke test, performing the basic request and status check for each user.

3. Execute the average load test script from your load-tests folder:

  k6 run average-load-test.js

Average load test results:

This test ran for approximately 40 minutes and simulated up to 100 virtual users. The results show 168,556 total HTTP requests were made, with 99.99% of checks succeeding. The average HTTP request duration was 245.49ms, with 95% of requests completing within 301.52ms. Notably, the test recorded a very high maximum request duration of 1 minute and 8 seconds, and a few request-failed warnings indicated timeouts. Despite these outliers, the HTTP request failure rate rounded to 0.00% (just 2 failures out of more than 168,000 requests), indicating the application largely handled the sustained load.

Concurrently, CloudWatch metrics for "Total requests acknowledged" by your ALB showed the traffic pattern of ramp up, sustained load, and ramp down, confirming the load balancer successfully received the requests. The total requests recorded by CloudWatch over this period exceeded 307,000, reflecting the cumulative traffic handled by the ALB.

Performing a spike load test

Finally, conduct a spike test. This simulates a sudden, dramatic surge in traffic. It's designed to reveal how your application behaves under extreme bursts of load and its ability to recover.

1. Create a new file named spike-load-test.js in your load-tests folder.

2. Add the following k6 script, replacing the placeholder URL with your target (ALB DNS name).

import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
    // Key configuration for a spike test
    stages: [
        { duration: '2m', target: 2000 }, // fast ramp-up to a high point
        // No plateau
        { duration: '1m', target: 0 }, // quick ramp-down to 0 users
    ],
};

export default () => {
    // Replace 'http://YOUR_ALB_DNS_NAME/' with your actual ALB DNS name
    const urlRes = http.get('http://YOUR_ALB_DNS_NAME/');
    check(urlRes, { 'status is 200': (r) => r.status === 200 });
    sleep(1);
};

The spike load script uses stages to create a sudden, sharp increase and decrease in virtual users:

  • Rapid ramp-up (2 minutes): The number of virtual users quickly increases from the default up to a very high target: 2000. This simulates a sudden traffic event.

  • Rapid ramp-down (1 minute): The number of users drops just as quickly back to target: 0. There is no sustained load phase in a typical spike test.

The default function remains the same, performing the request and status check.

3. Execute the spike test script from your load-tests folder:

  k6 run spike-load-test.js

This test completed relatively quickly, running for approximately 3 minutes. Under such a rapid and high surge in traffic, the test successfully simulated up to 2000 virtual users. The results show 138,618 total HTTP requests were made, with a 100.00% success rate for all checks and 0.00% HTTP request failures. The average HTTP request duration was 383.01ms, with 95% of requests completing within 646.53ms. The maximum duration observed was 4.29 seconds, indicating some requests experienced noticeable delays during the peak load. The system sustained a high throughput of approximately 766 requests per second during the test.

CloudWatch metrics for "Total requests acknowledged" by your ALB clearly showed a sharp peak of over 120,000 requests during the spike, followed by a rapid drop-off. This pattern mirrored the intense traffic load generated by k6, confirming that the ALB successfully handled the sudden surge in traffic.
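
As a side note, the stages approach above drives a closed model: each virtual user waits for its response before iterating, so the request rate dips when the system slows down. If you want the arrival rate to stay aggressive regardless of response times, k6's ramping-arrival-rate executor models an open system instead. A sketch with illustrative rates:

import http from 'k6/http';

export const options = {
    scenarios: {
        spike: {
            executor: 'ramping-arrival-rate',
            startRate: 10,        // iterations per second at the start
            timeUnit: '1s',
            preAllocatedVUs: 500, // VUs k6 keeps ready to sustain the rate
            maxVUs: 2000,
            stages: [
                { duration: '2m', target: 800 }, // ramp to ~800 requests per second
                { duration: '1m', target: 0 },   // drop back to zero
            ],
        },
    },
};

export default () => {
    http.get('http://YOUR_ALB_DNS_NAME/');
};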

Analysing Performance

The k6 load tests provide insight into how the provisioned infrastructure performs under varying traffic.

The architecture works by having the ALB distribute incoming requests across multiple instances. To handle increased traffic, the ASG automatically adjusts the number of instances running your application. Automatic scaling is driven by CloudWatch metrics like requests per target or CPU utilisation. When these metrics cross defined thresholds, CloudWatch alarms trigger scaling policies in the ASG. This tells the ASG to launch new instances. These new instances automatically register with the ALB, and the load is spread further.
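
In Terraform, that feedback loop is commonly expressed as a target-tracking scaling policy attached to the ASG, which creates the underlying CloudWatch alarms for you. A sketch, assuming the ASG resource is named app and using an illustrative CPU target; the repository's actual policy, metric, and threshold may differ:

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0 # add instances above ~60% average CPU, remove them below it
  }
}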

This scaling process directly impacts the performance metrics observed in your k6 tests in several ways. As the ASG adds more instances, the load on each instance decreases. This allows requests to be processed faster, reducing overall latency, especially during traffic spikes. With more capacity available, instances are less likely to become overwhelmed, which reduces errors and dropped connections and leads to a higher percentage of successful requests even under significant load. More instances also mean the system can sustainably handle a higher total volume of requests per second, i.e., greater throughput. The ALB's health checks also ensure traffic only goes to healthy instances, contributing to overall reliability.

The k6 test results, particularly from the average and spike tests, demonstrate how effectively the ASG responds to handle the load. By scaling out, the architecture maintains performance and high reliability even during bursts of traffic. This dynamic adjustment of resources based on demand is a key benefit of the scalable infrastructure provisioned by Terraform.

Best practices for load testing applications

To ensure effective load testing and derive meaningful insights from your applications, follow these best practices:

  • Always test in a staging or test environment: Avoid running load tests directly on production systems unless necessary and with extreme caution. Use an isolated environment that closely mirrors your production setup.

  • Use realistic traffic patterns: Design your k6 scripts to mimic real user behaviour as closely as possible, including request types, request frequency, and user concurrency (see the sketch after this list).

  • Start small and gradually increase the load: Begin with a low number of virtual users and gradually ramp up the load. This helps you identify bottlenecks incrementally and observe how your system scales.

  • Monitor AWS metrics alongside k6 output: Combine the client-side metrics from k6 with server-side metrics from AWS CloudWatch. This provides a holistic view, showing how your infrastructure (ALB, EC2/ECS/EKS, ASG, RDS) responds to the load.

  • Consider chaos testing for resilience validation: Once stability under load is confirmed, introduce controlled failures (e.g., terminating instances) to validate your system's resilience and recovery capabilities.
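
For the realistic-traffic-patterns point above, here is a sketch of what a more lifelike user journey can look like in k6. The paths and pauses are hypothetical; model them on your application's real endpoints and observed user behaviour:

import http from 'k6/http';
import { group, check, sleep } from 'k6';

export default () => {
    group('browse homepage', () => {
        const res = http.get('http://YOUR_ALB_DNS_NAME/');
        check(res, { 'status is 200': (r) => r.status === 200 });
        sleep(Math.random() * 3 + 1); // randomised think time of 1-4 seconds
    });

    group('view a page', () => {
        // '/about' is a hypothetical endpoint -- substitute a real route
        const res = http.get('http://YOUR_ALB_DNS_NAME/about');
        check(res, { 'status is 200': (r) => r.status === 200 });
        sleep(1);
    });
};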

Cleaning up your provisioned resources

After completing your load tests and analysis, it is crucial to tear down the infrastructure you provisioned with Terraform to avoid ongoing AWS charges.

1. Navigate back to your Terraform infrastructure directory.

2. Destroy the provisioned resources by running the terraform destroy command:

    terraform destroy

3. Terraform will display a plan of all resources that will be destroyed. Type yes and press Enter to confirm the destruction of your AWS infrastructure.

This command will safely de-provision all the AWS resources that Terraform created, ensuring no unnecessary costs are incurred.

Conclusion

In this guide, we demonstrated an approach to validating the performance and scalability of cloud-native applications. We covered the entire process: setting up a cloud infrastructure on AWS using Terraform; performing various load tests with Grafana k6, from basic smoke tests to more intensive average and spike scenarios; and finally, analysing the performance metrics to understand how this architecture handles diverse traffic conditions, especially through its auto scaling capabilities driven by CloudWatch.

The key takeaway is that performance under load is paramount for modern applications. Implementing a scalable infrastructure with components like auto scaling groups is critical for ensuring reliability, consistent latency, and high throughput. Combining regular load testing with robust monitoring is essential to confirm your application can dynamically adapt to demand and provide a seamless user experience.
