DEV Community

AISSAM ASSOUIK

Beginner friendly: deploy a Spring Boot application to Kubernetes

Deploying our applications to Kubernetes can take a lot of heavy deploy-related tasks off our hands, such as service discovery and horizontal scaling. With Kubernetes, we don't need to bake those concerns into our code; instead, they are delegated to the platform.

Use case

Text-To-Speech Spring Boot deploy on K8s

We are going to deploy two microservices to a Kubernetes cluster using the following Kubernetes resources:

  • Namespace, spring-boot, to group and isolate our resources within the cluster.

  • Deployments, tts and tts-analytics, to manage the sets of Pods running our Spring Boot applications (the Text-To-Speech and Text-To-Speech Analytics microservices).

  • Services, to expose our running applications: a ClusterIP service for pod-to-pod communication and a NodePort service for communication from outside the cluster.

  • HorizontalPodAutoscaler, to showcase Kubernetes' native autoscaling: it targets the TTS deployment and scales it out and in based on resource utilization (CPU and memory) across running replicas.

What do these microservices do?

  • TTS exposes a single endpoint that converts user-submitted text to speech using the FreeTTS Java library.

  • TTS Analytics performs analytics on the client IP address and User-Agent header in order to provide device and country information (for this lab, we only mock this behavior).

Spring Boot

It is worth mentioning the benefits of using the Jib Maven plugin to build our Spring Boot application images: Jib helps with image build optimization and customization.
We also rely on Kubernetes DNS records for service-to-service communication; for example, our TTS Analytics service is assigned the DNS name: tts-analytics-svc.spring-boot.svc.cluster.local
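A service's cluster DNS name follows the pattern <service>.<namespace>.svc.cluster.local. A small sketch of how it is composed (port 8090 matches the analytics service defined below):

```shell
#!/usr/bin/env bash
# Kubernetes service DNS names follow <service>.<namespace>.svc.cluster.local
svc_name="tts-analytics-svc"
namespace="spring-boot"
fqdn="${svc_name}.${namespace}.svc.cluster.local"

# Inside the cluster, a TTS pod can reach the analytics service at this URL
url="http://${fqdn}:8090"
echo "$url"
```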

Namespace

We are going to create the spring-boot namespace in order to group and isolate our Kubernetes resources for this lab. In practice, simple clusters should stick with the default namespace, as mentioned in When to Use Multiple Namespaces.

apiVersion: v1
kind: Namespace
metadata:
  name: spring-boot

Deployment

The TTS and TTS Analytics microservices will be deployed using the following manifest files. Additionally, we can set container-level resource requests and limits with the spec.containers[].resources field. For the TTS deployment, we configured CPU and memory requests (the resources allocated to our container at pod scheduling time) to 100 milliCPU (= 0.1 CPU) and 100 mebibytes respectively; for limits, 300 milliCPU (= 0.3 CPU) and 300 mebibytes. What do we expect from setting those limits?
CPU limits are hard limits: when a container approaches its CPU limit, the kernel restricts its access to the CPU by throttling, which guarantees the container never uses more CPU than the configured limit. Memory limits, on the other hand, are enforced by the kernel with out-of-memory (OOM) kills: if a container uses more memory than the configured limit, it is terminated.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tts
  name: tts
  namespace: spring-boot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts
  template:
    metadata:
      labels:
        app: tts
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - image: registry.hub.docker.com/${REPO_NAME}/tts:latest
        name: tts
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 300m
            memory: 300Mi
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tts-analytics
  name: tts-analytics
  namespace: spring-boot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts-analytics
  template:
    metadata:
      labels:
        app: tts-analytics
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - image: registry.hub.docker.com/${REPO_NAME}/ttsanalytics:latest
        name: tts-analytics
        ports:
        - containerPort: 8090
        resources:
          requests:
            cpu: 400m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 500Mi

Depending on your Kubernetes cluster setup and environment, you may need to consider the following:
If you are using a private registry, you may need to set imagePullSecrets by creating a docker-registry secret. For the image field as well, you may need to prefix the repository name with registry.hub.docker.com if you're using Docker Hub.
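For reference, a docker-registry secret manifest looks roughly like the following. This is a sketch with placeholder data; normally you would generate it with kubectl create secret docker-registry rather than write it by hand:

```yaml
# Sketch of an image pull secret; the data value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: spring-boot
type: kubernetes.io/dockerconfigjson
data:
  # base64-encoded Docker config containing your registry credentials
  .dockerconfigjson: <base64 of ~/.docker/config.json>
```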

Service

We expose the TTS Analytics microservice with a ClusterIP service, so the corresponding pods are reachable from inside the cluster only. The service is assigned a virtual IP address, and Kubernetes load-balances traffic across the corresponding pods.
The TTS microservice will be exposed for communication from outside the cluster; it remains reachable from inside the cluster as well, since a NodePort service is built on top of a ClusterIP service. The difference is that each node proxies the configured nodePort (30234) to our service; in other words, we can use a node's public IP address to access the NodePort service on port 30234.

apiVersion: v1
kind: Service
metadata:
  name: tts-svc
  namespace: spring-boot
spec:
  type: NodePort
  selector:
    app: tts
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30234
apiVersion: v1
kind: Service
metadata:
  name: tts-analytics-svc
  namespace: spring-boot
spec:
  type: ClusterIP
  selector:
    app: tts-analytics
  ports:
    - port: 8090
      targetPort: 8090

Horizontal Pod Autoscaler

HPA is Kubernetes' native autoscaling feature: more pods are deployed to match an increase in demand, and the workload scales in when traffic drops. The HPA controller (the component that implements the behavior of the HPA resource) calculates the desired replica count based on the ratio between the current metric value and the target metric value.
HPA supports resource metrics like CPU and memory, for which we can set a target value, average value, or average utilization at which HPA should trigger scaling actions. Worth mentioning: the Resource metric type is pod-level in scope; for more granular control, the ContainerResource metric type provides container-level scope.
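As an illustrative fragment (not used in this lab), a ContainerResource metric scoped to a single container might look like:

```yaml
# Hypothetical container-level metric: only the named container's usage counts
- type: ContainerResource
  containerResource:
    name: cpu
    container: tts
    target:
      type: Utilization
      averageUtilization: 60
```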
For our example, we are going to target the TTS deployment with bounds of 1 to 5 replicas: the replica count will not exceed 5 under maximum load and will be reduced to a single replica when traffic is down. We use two metrics, CPU and memory, at the same time; the HPA controller calculates the desired replica count for each metric separately and uses the maximum of the calculated values.

  • CPU metric, using a metric target type of Utilization: the HPA controller calculates the ratio of current CPU usage to requested CPU across all pods and tries to keep the average utilization at 60%.

  • Memory metric, using the AverageValue metric target type: the HPA controller tries to keep the average memory usage across all targeted pods at 500Mi.
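To make the per-metric calculation concrete, the controller's core formula is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A small sketch with illustrative numbers (302% observed CPU utilization, similar to what we see in the demo later):

```shell
#!/usr/bin/env bash
# HPA core formula: desired = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=1
cpu_current=302    # illustrative observed average CPU utilization (%)
cpu_target=60      # averageUtilization from the HPA spec

# integer ceiling division
desired=$(( (current_replicas * cpu_current + cpu_target - 1) / cpu_target ))

# the result is clamped to the configured maxReplicas
max_replicas=5
if (( desired > max_replicas )); then
  desired=$max_replicas
fi
echo "$desired"   # ceil(302/60) = 6, clamped to 5
```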

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tts
  namespace: spring-boot
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tts
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi

Demo

We are going to create all the corresponding resources with the following kubectl commands:

student@control-plane:~$ kubectl apply -f namespace.yaml
student@control-plane:~$ envsubst < tts-deploy.yaml | kubectl apply -f -
student@control-plane:~$ envsubst < tts-analytics-deploy.yaml | kubectl apply -f -
student@control-plane:~$ kubectl apply -f tts-hpa.yaml

We expect the following Kubernetes resources to be created:

student@control-plane:~$ kubectl -n spring-boot get pod,deploy,svc,hpa
NAME                                 READY   STATUS    RESTARTS   AGE
pod/tts-analytics-6b5f4577b5-6c9gg   1/1     Running   0          4m28s
pod/tts-d996c687d-v9wgx              1/1     Running   0          2m43s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tts             1/1     1            1           2m43s
deployment.apps/tts-analytics   1/1     1            1           4m28s

NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/tts-analytics-svc   ClusterIP   10.97.203.149    <none>        8090/TCP         4m28s
service/tts-svc             NodePort    10.110.233.162   <none>        8080:30234/TCP   2m43s

NAME                                      REFERENCE        TARGETS                                MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/tts   Deployment/tts   cpu: 4%/60%, memory: 120954880/500Mi   1         5         1          95s

Let's test if TTS and TTS Analytics are able to complete a request flow and communicate with each other:

aissam@aissam:/aissam/Downloads/test$ curl -X POST http://$PUBLIC_NODE_IP:30234/tts -H "Content-Type: application/json" -d '{"text":"Hi this is a test!!"}' -OJ
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48778  100 48748  100    30   2363      1  0:00:30  0:00:20  0:00:10 12316
aissam@aissam:/aissam/Downloads/test$ ls
f51b3d0d-0038-42a4-8eae-2a7ba1d38ce9.wav
student@control-plane:~$ kubectl -n spring-boot logs tts-d996c687d-v9wgx -f

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/

 :: Spring Boot ::                (v3.4.5)

2025-05-17T16:28:07.715Z  INFO 1 --- [producer] [           main] com.example.tts.TtsApplication           : Starting TtsApplication using Java 21.0.7 with PID 1 (/app/classes started by root in /)
2025-05-17T16:28:07.883Z  INFO 1 --- [producer] [           main] com.example.tts.TtsApplication           : No active profile set, falling back to 1 default profile: "default"
2025-05-17T16:28:20.330Z  INFO 1 --- [producer] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2025-05-17T16:28:20.532Z  INFO 1 --- [producer] [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2025-05-17T16:28:20.533Z  INFO 1 --- [producer] [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.40]
2025-05-17T16:28:21.679Z  INFO 1 --- [producer] [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2025-05-17T16:28:21.681Z  INFO 1 --- [producer] [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 13236 ms
2025-05-17T16:28:31.145Z  INFO 1 --- [producer] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2025-05-17T16:28:31.339Z  INFO 1 --- [producer] [           main] com.example.tts.TtsApplication           : Started TtsApplication in 28.549 seconds (process running for 31.811)
2025-05-17T16:28:51.978Z  INFO 1 --- [producer] [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-05-17T16:28:51.979Z  INFO 1 --- [producer] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2025-05-17T16:28:51.984Z  INFO 1 --- [producer] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 5 ms
2025-05-17T16:28:52.880Z  INFO 1 --- [producer] [nio-8080-exec-1] c.e.t.controller.TextToSpeechController  : textToSpeech request: TtsRequest[text=Hi this is a test!!]
2025-05-17T16:28:55.042Z  INFO 1 --- [producer] [nio-8080-exec-1] c.e.tts.service.TextToSpeechService      : do Something with analytics response: TtsAnalyticsResponse[device=Desktop, countryIso=FR]
Wrote synthesized speech to /output/f51b3d0d-0038-42a4-8eae-2a7ba1d38ce9.wav
student@control-plane:~$ kubectl -n spring-boot logs tts-analytics-6b5f4577b5-6c9gg -f

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/

 :: Spring Boot ::                (v3.4.5)

2025-05-17T16:28:08.589Z  INFO 1 --- [consumer] [           main] c.e.t.TtsAnalyticsApplication            : Starting TtsAnalyticsApplication using Java 21.0.7 with PID 1 (/app/classes started by root in /)
2025-05-17T16:28:08.600Z  INFO 1 --- [consumer] [           main] c.e.t.TtsAnalyticsApplication            : No active profile set, falling back to 1 default profile: "default"
2025-05-17T16:28:11.989Z  INFO 1 --- [consumer] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8090 (http)
2025-05-17T16:28:12.027Z  INFO 1 --- [consumer] [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2025-05-17T16:28:12.029Z  INFO 1 --- [consumer] [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.40]
2025-05-17T16:28:12.381Z  INFO 1 --- [consumer] [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2025-05-17T16:28:12.421Z  INFO 1 --- [consumer] [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 3677 ms
2025-05-17T16:28:14.518Z  INFO 1 --- [consumer] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8090 (http) with context path '/'
2025-05-17T16:28:14.552Z  INFO 1 --- [consumer] [           main] c.e.t.TtsAnalyticsApplication            : Started TtsAnalyticsApplication in 7.325 seconds (process running for 8.222)
2025-05-17T16:28:54.054Z  INFO 1 --- [consumer] [nio-8090-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-05-17T16:28:54.055Z  INFO 1 --- [consumer] [nio-8090-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2025-05-17T16:28:54.059Z  INFO 1 --- [consumer] [nio-8090-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 3 ms
2025-05-17T16:28:54.412Z  INFO 1 --- [consumer] [nio-8090-exec-1] c.e.t.controller.AnalyticsController     : doAnalytics request: TtsAnalyticsRequest[clientIp=10.0.0.212, userAgent=curl/8.5.0]

Everything seems to work correctly! Next, we will overwhelm the TTS microservice with requests in order to increase the load, and then inspect the HPA's behavior in response.
We are going to use the following script to run 50 concurrent requests:

#!/usr/bin/env bash

# Replace with your actual node public IP
export PUBLIC_NODE_IP=...........

for i in $(seq 1 50); do
  curl -X POST http://"$PUBLIC_NODE_IP":30234/tts \
       -H "Content-Type: application/json" \
       -d '{"text":"Hi this is a test!!"}' \
       -OJ &
done

# Wait for all background requests to finish
wait
echo "All 50 POST requests have completed."

Next, we watch the HPA's behavior before, during, and after the script runs:

student@control-plane:~$ kubectl -n spring-boot get hpa tts -w
NAME   REFERENCE        TARGETS                                MINPODS   MAXPODS   REPLICAS   AGE
tts    Deployment/tts   cpu: 4%/60%, memory: 156020736/500Mi   1         5         1          2m21s
tts    Deployment/tts   cpu: 3%/60%, memory: 156020736/500Mi   1         5         1          2m31s
tts    Deployment/tts   cpu: 4%/60%, memory: 156020736/500Mi   1         5         1          3m16s
tts    Deployment/tts   cpu: 47%/60%, memory: 181080064/500Mi   1         5         1          3m47s
tts    Deployment/tts   cpu: 302%/60%, memory: 263831552/500Mi   1         5         1          4m2s
tts    Deployment/tts   cpu: 300%/60%, memory: 268967936/500Mi   1         5         4          4m17s
tts    Deployment/tts   cpu: 248%/60%, memory: 168945664/500Mi   1         5         5          4m32s
tts    Deployment/tts   cpu: 296%/60%, memory: 133169152/500Mi   1         5         5          4m47s
tts    Deployment/tts   cpu: 265%/60%, memory: 118334259200m/500Mi   1         5         5          5m2s
tts    Deployment/tts   cpu: 250%/60%, memory: 136432844800m/500Mi   1         5         5          5m17s
tts    Deployment/tts   cpu: 166%/60%, memory: 152010752/500Mi       1         5         5          5m32s
tts    Deployment/tts   cpu: 70%/60%, memory: 153232179200m/500Mi    1         5         5          5m48s
tts    Deployment/tts   cpu: 63%/60%, memory: 153314918400m/500Mi    1         5         5          6m3s
tts    Deployment/tts   cpu: 61%/60%, memory: 153285427200m/500Mi    1         5         5          6m18s
tts    Deployment/tts   cpu: 60%/60%, memory: 153246105600m/500Mi    1         5         5          6m33s
tts    Deployment/tts   cpu: 60%/60%, memory: 153327206400m/500Mi    1         5         5          6m48s
tts    Deployment/tts   cpu: 56%/60%, memory: 153352601600m/500Mi    1         5         5          7m3s
tts    Deployment/tts   cpu: 62%/60%, memory: 153411584/500Mi        1         5         5          7m18s
tts    Deployment/tts   cpu: 60%/60%, memory: 153486950400m/500Mi    1         5         5          7m33s
tts    Deployment/tts   cpu: 55%/60%, memory: 153413222400m/500Mi    1         5         5          7m48s
tts    Deployment/tts   cpu: 59%/60%, memory: 153544294400m/500Mi    1         5         5          8m3s
tts    Deployment/tts   cpu: 59%/60%, memory: 153555763200m/500Mi    1         5         5          8m18s
tts    Deployment/tts   cpu: 56%/60%, memory: 153517260800m/500Mi    1         5         5          8m33s
tts    Deployment/tts   cpu: 61%/60%, memory: 153539379200m/500Mi    1         5         5          8m48s
tts    Deployment/tts   cpu: 59%/60%, memory: 153495142400m/500Mi    1         5         5          9m3s
tts    Deployment/tts   cpu: 60%/60%, memory: 153457459200m/500Mi    1         5         5          9m18s
tts    Deployment/tts   cpu: 60%/60%, memory: 153468108800m/500Mi    1         5         5          9m33s
tts    Deployment/tts   cpu: 60%/60%, memory: 153416499200m/500Mi    1         5         5          9m48s
tts    Deployment/tts   cpu: 59%/60%, memory: 153436979200m/500Mi    1         5         5          10m
tts    Deployment/tts   cpu: 52%/60%, memory: 153480396800m/500Mi    1         5         5          10m
tts    Deployment/tts   cpu: 59%/60%, memory: 153458278400m/500Mi    1         5         5          10m
tts    Deployment/tts   cpu: 63%/60%, memory: 153523814400m/500Mi    1         5         5          10m
tts    Deployment/tts   cpu: 46%/60%, memory: 153424691200m/500Mi    1         5         5          11m
tts    Deployment/tts   cpu: 59%/60%, memory: 153407488/500Mi        1         5         5          11m
tts    Deployment/tts   cpu: 60%/60%, memory: 153355878400m/500Mi    1         5         5          11m
tts    Deployment/tts   cpu: 61%/60%, memory: 153372262400m/500Mi    1         5         5          11m
tts    Deployment/tts   cpu: 63%/60%, memory: 153357516800m/500Mi    1         5         5          12m
tts    Deployment/tts   cpu: 60%/60%, memory: 153169100800m/500Mi    1         5         5          12m
tts    Deployment/tts   cpu: 60%/60%, memory: 153171558400m/500Mi    1         5         5          12m
tts    Deployment/tts   cpu: 61%/60%, memory: 153098649600m/500Mi    1         5         5          12m
tts    Deployment/tts   cpu: 62%/60%, memory: 153107660800m/500Mi    1         5         5          13m
tts    Deployment/tts   cpu: 61%/60%, memory: 153069158400m/500Mi    1         5         5          13m
tts    Deployment/tts   cpu: 61%/60%, memory: 152816844800m/500Mi    1         5         5          13m
tts    Deployment/tts   cpu: 44%/60%, memory: 152460492800m/500Mi    1         5         5          13m
tts    Deployment/tts   cpu: 3%/60%, memory: 152462131200m/500Mi     1         5         5          14m
tts    Deployment/tts   cpu: 3%/60%, memory: 152467046400m/500Mi     1         5         5          14m
tts    Deployment/tts   cpu: 3%/60%, memory: 152388403200m/500Mi     1         5         5          14m
tts    Deployment/tts   cpu: 3%/60%, memory: 152393318400m/500Mi     1         5         5          14m
tts    Deployment/tts   cpu: 3%/60%, memory: 151896064/500Mi         1         5         5          15m
tts    Deployment/tts   cpu: 3%/60%, memory: 151904256/500Mi         1         5         5          15m
tts    Deployment/tts   cpu: 3%/60%, memory: 151910809600m/500Mi     1         5         5          15m
tts    Deployment/tts   cpu: 3%/60%, memory: 151916544/500Mi         1         5         5          15m
tts    Deployment/tts   cpu: 3%/60%, memory: 151923916800m/500Mi     1         5         5          16m
tts    Deployment/tts   cpu: 3%/60%, memory: 151925555200m/500Mi     1         5         5          16m
tts    Deployment/tts   cpu: 3%/60%, memory: 151929651200m/500Mi     1         5         5          16m
tts    Deployment/tts   cpu: 3%/60%, memory: 151931289600m/500Mi     1         5         5          16m
tts    Deployment/tts   cpu: 3%/60%, memory: 151934566400m/500Mi     1         5         5          17m
tts    Deployment/tts   cpu: 3%/60%, memory: 151936204800m/500Mi     1         5         5          17m
tts    Deployment/tts   cpu: 3%/60%, memory: 151939481600m/500Mi     1         5         5          17m
tts    Deployment/tts   cpu: 3%/60%, memory: 151940300800m/500Mi     1         5         5          18m
tts    Deployment/tts   cpu: 3%/60%, memory: 151942758400m/500Mi     1         5         5          18m
tts    Deployment/tts   cpu: 3%/60%, memory: 151946035200m/500Mi     1         5         5          18m
tts    Deployment/tts   cpu: 3%/60%, memory: 158087168/500Mi         1         5         4          18m
tts    Deployment/tts   cpu: 3%/60%, memory: 193366016/500Mi         1         5         2          19m
tts    Deployment/tts   cpu: 3%/60%, memory: 193372160/500Mi         1         5         2          20m
tts    Deployment/tts   cpu: 4%/60%, memory: 193372160/500Mi         1         5         2          20m
tts    Deployment/tts   cpu: 4%/60%, memory: 193396736/500Mi         1         5         2          21m
tts    Deployment/tts   cpu: 3%/60%, memory: 193396736/500Mi         1         5         2          21m
tts    Deployment/tts   cpu: 4%/60%, memory: 193396736/500Mi         1         5         2          22m
tts    Deployment/tts   cpu: 3%/60%, memory: 193396736/500Mi         1         5         2          22m
tts    Deployment/tts   cpu: 4%/60%, memory: 193396736/500Mi         1         5         2          23m
tts    Deployment/tts   cpu: 4%/60%, memory: 193398784/500Mi         1         5         2          23m
tts    Deployment/tts   cpu: 3%/60%, memory: 193398784/500Mi         1         5         2          23m
tts    Deployment/tts   cpu: 4%/60%, memory: 257855488/500Mi         1         5         1          24m
tts    Deployment/tts   cpu: 3%/60%, memory: 257855488/500Mi         1         5         1          25m
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  37m   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37m   horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  22m   horizontal-pod-autoscaler  New size: 4; reason: All metrics below target
  Normal  SuccessfulRescale  22m   horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  17m   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Initially we had one replica of our TTS microservice; after the load increased, the HPA controller tried to match the target value by increasing the replica count. After the load dropped, the HPA controller took about 5 minutes to scale in from 5 replicas to 2, and finally down to minReplicas. This period is called the stabilization window; it is used to stabilize the replica count when the metric keeps fluctuating.
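By default the scale-down stabilization window is 300 seconds; it can be tuned through the HPA's behavior field. A sketch (not part of this lab's manifest):

```yaml
spec:
  behavior:
    scaleDown:
      # how far back the controller looks before scaling in (default: 300s)
      stabilizationWindowSeconds: 300
```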

Summary

We showed how to deploy our Spring Boot microservices on Kubernetes and expose them for external and internal communication. In addition, we showcased how Kubernetes' native autoscaling can help us use cluster resources efficiently by scaling out when traffic increases and scaling in when it decreases.

Project GitHub Repo
