#SRE

Performance Testing Microservices with K6

As Vodafone Greece modernizes its infrastructure, our Componentized Enterprise Logical Layer (CELL) microservices platform plays a pivotal role in delivering digital services. Ensuring the performance and stability of these services, especially under high-load conditions, is crucial to maintaining seamless customer experiences. The SRE team has integrated K6 performance testing into our development lifecycle to ensure that our microservices meet the demands of traffic surges, scalability, and resilience.

The Importance of Performance Testing

Performance testing is critical to maintaining a smooth user experience, especially for key microservices that power our operations. By testing these systems under various load conditions, we gain a deeper understanding of their behavior, allowing us to prevent potential issues before they impact production environments.

Our primary objectives with performance testing are:

  • Ensuring resilience under high traffic and stress scenarios.
  • Identifying and resolving bottlenecks before they escalate.
  • Supporting continuous deployment by validating stability through smoke tests after each update.

Types of Performance Tests We Conduct

To ensure the reliability of our platform, we use a combination of different performance tests, each tailored to a specific use case:

1. Smoke Tests:

After each deployment, we run smoke tests to confirm that the critical functionality remains intact. These quick, lightweight tests are essential for catching breaking issues early.

Why it's crucial: Given the importance of our platform, every deployment needs at least a basic verification before it serves customers. Smoke tests provide that feedback within seconds, so a breaking change can be caught before it disrupts service.

Example of a smoke test:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 1, // A single virtual user is enough for a smoke test
  duration: '30s', // Short duration
};

export default function () {
  const res = http.get('https://api.example.com/v1/service');
  // Record whether the endpoint answered with HTTP 200
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
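
One caveat with the example above: in K6, a failed check is recorded in the results but does not by itself make the run fail. For a pipeline to gate on the smoke test, thresholds can be added to the options. The sketch below is a minimal illustration. The limits shown are assumptions chosen for the example, not production values, and the names `smokeThresholds` and `smokeOptions` are hypothetical.

```javascript
// Illustrative thresholds for a smoke test (all values are assumptions).
// K6 marks the run as failed and exits with a non-zero code
// when any threshold is not met.
const smokeThresholds = {
  checks: ['rate>0.99'],          // more than 99% of checks must pass
  http_req_failed: ['rate<0.01'], // fewer than 1% of requests may fail
};

// Smoke-test options with the thresholds attached.
// In a K6 script this object would be exported:
//   export const options = smokeOptions;
const smokeOptions = {
  vus: 1,
  duration: '30s',
  thresholds: smokeThresholds,
};

console.log(JSON.stringify(smokeOptions, null, 2));
```

With thresholds in place, the exit code of the K6 run can be used directly as the pass/fail signal of the deployment step.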

2. Load Testing:

Load testing simulates normal traffic conditions to measure how our microservices handle sustained loads. With K6, we mimic real-world usage to ensure that our platform can scale efficiently as traffic increases.

Focus Areas:

  • Response times.
  • CPU and memory utilization.
  • Stability during peak usage.

How it helps: These tests ensure that the system can handle real-world traffic smoothly, keeping performance consistent even as user demand grows.

During stress and spike tests, monitoring CPU, memory, and network bandwidth allows us to identify resource constraints and adjust infrastructure accordingly. This helps us ensure that even during peak loads, the system maintains optimal performance.

Example of a load test:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up to 100 virtual users
    { duration: '5m', target: 100 }, // Hold at 100 users for 5 minutes
    { duration: '2m', target: 0 },   // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.example.com/v1/service');
  // Record status and latency for every request
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
}

In this example, the load ramps up to 100 virtual users over two minutes, holds at that level for five minutes, and then ramps down over two minutes. The test evaluates how well the service responds under normal operating conditions.

We also use Virtual User Minutes (VUMs) to estimate the total user load and better understand resource requirements. This is especially critical when planning for large-scale events or high-traffic periods, helping us optimize resource allocation efficiently.
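
As a rough illustration of the arithmetic, the sketch below estimates VUMs for a `stages` array like the one in the load test example. It assumes that a VUM is one virtual user running for one minute and that the number of virtual users changes linearly within each stage. The helper names `toMinutes` and `virtualUserMinutes` are hypothetical and are not part of K6.

```javascript
// Convert a duration such as '30s', '2m', or '1h' to minutes.
// Simplification: only a single number followed by one unit is handled.
function toMinutes(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const secondsPerUnit = { s: 1, m: 60, h: 3600 };
  return (Number(match[1]) * secondsPerUnit[match[2]]) / 60;
}

// Sum (average VUs x minutes) over all stages.
// With a linear ramp, the average VU count of a stage is the
// midpoint between its starting and ending VU counts.
function virtualUserMinutes(stages, startVUs = 0) {
  let previous = startVUs;
  let total = 0;
  for (const { duration, target } of stages) {
    total += ((previous + target) / 2) * toMinutes(duration);
    previous = target;
  }
  return total;
}

// Same profile as the load test example
const loadStages = [
  { duration: '2m', target: 100 }, // ramp up:   50 avg VUs x 2 min = 100
  { duration: '5m', target: 100 }, // hold:     100 avg VUs x 5 min = 500
  { duration: '2m', target: 0 },   // ramp down: 50 avg VUs x 2 min = 100
];

console.log(virtualUserMinutes(loadStages)); // 700
```

Under these assumptions, the nine-minute load test consumes about 700 VUMs, a figure that can be compared across test profiles when sizing infrastructure for an event.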

3. Stress Testing:

Stress testing pushes the system beyond its usual limits to observe its breaking point and recovery capabilities. This test is crucial for understanding how the platform behaves under extreme load conditions.

Why we do it: By identifying failure points, we can prepare for sudden surges in traffic and ensure the platform degrades gracefully rather than failing catastrophically.

Example of a stress test:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },  // Ramp up to 50 virtual users
    { duration: '5m', target: 200 }, // Push to extreme load
    { duration: '2m', target: 0 },   // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.example.com/v1/service');
  // A looser latency bound than in the load test, since the system is under stress
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 400ms': (r) => r.timings.duration < 400,
  });
}

This test ramps up to 200 virtual users, twice the level used in the load test, to find the system’s limits so that we can prepare the service for sudden traffic surges.
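
A single ramp shows whether the service survives the peak, but not at which level it starts to struggle. One common way to narrow that down is a staircase profile that raises the load in steps and holds each step long enough for metrics to settle. The helper below is a hypothetical sketch: `steppedStages` and its parameters are illustrative names, not part of K6, and the step sizes are assumptions.

```javascript
// Build a staircase of K6-style stages: ramp to a level, hold, repeat,
// then ramp down to zero so that recovery can be observed.
function steppedStages({ startVUs, stepVUs, maxVUs, rampDuration, holdDuration }) {
  if (stepVUs <= 0) throw new Error('stepVUs must be positive');
  const stages = [];
  for (let target = startVUs; target <= maxVUs; target += stepVUs) {
    stages.push({ duration: rampDuration, target }); // ramp to this step
    stages.push({ duration: holdDuration, target }); // hold at this step
  }
  stages.push({ duration: rampDuration, target: 0 }); // final ramp down
  return stages;
}

// Steps at 50, 100, 150, and 200 virtual users (illustrative values)
const stressStages = steppedStages({
  startVUs: 50,
  stepVUs: 50,
  maxVUs: 200,
  rampDuration: '1m',
  holdDuration: '2m',
});

// In a K6 script: export const options = { stages: stressStages };
console.log(stressStages);
```

Comparing error rates and response times per step then indicates the level at which the service begins to degrade.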

4. Spike Testing:

Spike tests focus on handling sudden, dramatic increases in traffic over a short period, such as during product launches or promotions. This allows us to see how well the platform manages traffic bursts without degrading performance.

For instance, during a recent product launch, spike testing helped us identify and mitigate potential bottlenecks in real time, which kept the service available through the high-traffic period. This proactive approach helped prevent outages and maintain a smooth customer experience, even under intense load.

Use cases: Sharp, unexpected traffic increases during special events.

Example of a spike test:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // Ramp up to baseline load
    { duration: '1m', target: 300 }, // Spike to peak load
    { duration: '3m', target: 300 }, // Hold at peak load
    { duration: '1m', target: 50 },  // Drop back to baseline
  ],
};

export default function () {
  const res = http.get('https://api.example.com/v1/service');
  // Latency bound relaxed further to allow for the burst
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
}

The Benefits of Performance Testing

  • Early Detection of Issues: Smoke tests immediately catch breaking changes after each deployment, reducing the risk of service downtime.
  • Scalability Assurance: Load tests confirm that our platform can scale efficiently to handle increased user traffic.
  • Resilience to Spikes: Stress and spike tests ensure stability during unexpected or extreme conditions, protecting against system failures.

By continuously testing under various conditions, we gain a clear understanding of how the platform behaves, allowing us to optimize performance and prevent issues before they occur.

Conclusion

Our microservices platform is integral to Vodafone Greece’s digital operations, and maintaining its performance is a top priority. Using K6 for smoke, load, stress, and spike tests, we can confidently deploy new features and updates, knowing that the system will continue to operate smoothly.

Performance testing has become an essential part of our strategy, allowing us to build a more resilient platform that can handle increasing demands without compromising on stability. 

While we provide insights into our testing strategy, we ensure that all proprietary information, such as internal URLs and sensitive data, remains confidential. This approach allows us to share technical knowledge without compromising security.


Note: The code snippets provided in this article are simplified examples intended for demonstration purposes. They do not represent actual production code or live testing configurations used in Vodafone Greece's systems.
