Deploying our applications to Kubernetes offloads a lot of heavy deployment-related concerns such as service discovery and horizontal scaling. With Kubernetes, we don't need to bake those concerns into our code; instead, they are delegated to the platform.
Use case
We are going to deploy two microservices to a Kubernetes cluster using the following Kubernetes resources:
A Namespace, spring-boot, to group and isolate our resources within the cluster.
Deployments, tts and tts-analytics, to manage the sets of Pods running our Spring Boot applications (the Text-To-Speech and Text-To-Speech Analytics microservices).
Services, to expose our running applications: a ClusterIP Service for pod-to-pod communication and a NodePort Service for communication from outside the cluster.
A HorizontalPodAutoscaler, to showcase Kubernetes' native autoscaling: it targets the TTS Deployment and scales it out and in based on resource utilization (such as CPU and memory) across the running replicas.
What do these microservices do?
TTS exposes a single endpoint that converts user-submitted text to speech using the FreeTTS Java library.
TTS Analytics performs analytics on the client IP address and User-Agent in order to provide device and country information (for this lab, this behavior is only mocked).
Spring Boot
It is worth mentioning the benefits of using the Jib Maven plugin to build our Spring Boot application images: Jib helps with image build optimization and customization.
We use Kubernetes DNS records for service-to-service communication; for example, our TTS Analytics Service is assigned the DNS name tts-analytics-svc.spring-boot.svc.cluster.local (see the quick check below).
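Assuming the Jib Maven plugin is registered in each project's pom.xml, the image can be built and pushed with a single Maven goal (the image name here is only a placeholder for your own repository):

mvn compile jib:build -Dimage=registry.hub.docker.com/${REPO_NAME}/tts:latest

To quickly check the DNS name from inside the cluster, we can start a throwaway curl Pod; note that the /analytics path used here is purely illustrative, the actual endpoint path depends on the controller mapping:

kubectl -n spring-boot run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://tts-analytics-svc.spring-boot.svc.cluster.local:8090/analytics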
Namespace
We are going to create the spring-boot namespace in order to group and isolate our Kubernetes resources for this lab. In practice, for simple clusters we could stick with the default namespace, as mentioned in When to Use Multiple Namespaces.
apiVersion: v1
kind: Namespace
metadata:
  name: spring-boot
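Assuming the manifest above is saved as namespace.yaml (as in the demo below), it can be applied and verified right away:

kubectl apply -f namespace.yaml
kubectl get namespace spring-boot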
Deployment
The TTS and TTS Analytics microservices will be deployed using the manifest files below. Additionally, we can set container-level resource requests and limits with the spec.containers[].resources field. For the TTS Deployment, we configured the CPU and memory requests (the resources allocated to the container when the Pod is scheduled) to 100 milliCPU (= 0.1 CPU) and 100 mebibytes respectively, and the limits to 300 milliCPU (= 0.3 CPU) and 300 mebibytes. What do we expect from setting those limits?
CPU limits are hard limits: when a container gets close to its CPU limit, the kernel restricts its access to the CPU through CPU throttling, which guarantees the container never uses more CPU than the configured limit. Memory limits, on the other hand, are enforced by the kernel with out-of-memory (OOM) kills: if a container uses more memory than its configured limit, it is terminated (see the quick check after the manifests below).
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tts
  name: tts
  namespace: spring-boot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts
  template:
    metadata:
      labels:
        app: tts
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - image: registry.hub.docker.com/${REPO_NAME}/tts:latest
          name: tts
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 300m
              memory: 300Mi
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tts-analytics
  name: tts-analytics
  namespace: spring-boot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts-analytics
  template:
    metadata:
      labels:
        app: tts-analytics
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - image: registry.hub.docker.com/${REPO_NAME}/ttsanalytics:latest
          name: tts-analytics
          ports:
            - containerPort: 8090
          resources:
            requests:
              cpu: 400m
              memory: 100Mi
            limits:
              cpu: 1000m
              memory: 500Mi
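A simple way to see the memory limit being enforced is to inspect the container status after the fact: if a container was killed for exceeding its limit, its last termination reason is reported as OOMKilled (the Pod name below is just a placeholder):

kubectl -n spring-boot describe pod <tts-pod-name> | grep -A 3 "Last State"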
Depending on your Kubernetes cluster setup and environment, you may need to consider the following:
If you are using a private registry, you may need to set imagePullSecrets by creating a docker-registry secret (see below). For the image field as well, you may need to prefix the repository name with registry.hub.docker.com if you're using Docker Hub.
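For reference, the regcred secret referenced by imagePullSecrets can be created with kubectl; all values below are placeholders for your own registry and credentials:

kubectl -n spring-boot create secret docker-registry regcred \
  --docker-server=registry.hub.docker.com \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email>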
Service
We expose the TTS Analytics microservice with a ClusterIP Service, so the corresponding Pods are reachable from inside the cluster only. The Service is assigned a virtual IP address, and Kubernetes load-balances traffic across the matching Pods.
The TTS microservice will be exposed for communication from outside the cluster; it also remains reachable from inside the cluster, since a NodePort Service builds on a ClusterIP Service. The difference is that each node proxies the configured nodePort (30234) to our Service; in other words, we can use a node's public IP address to reach our NodePort Service on port 30234.
apiVersion: v1
kind: Service
metadata:
  name: tts-svc
  namespace: spring-boot
spec:
  type: NodePort
  selector:
    app: tts
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30234
apiVersion: v1
kind: Service
metadata:
  name: tts-analytics-svc
  namespace: spring-boot
spec:
  type: ClusterIP
  selector:
    app: tts-analytics
  ports:
    - port: 8090
      targetPort: 8090
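Once the Services are applied, we can confirm that each one has picked up its backing Pods by listing the Services together with their Endpoints:

kubectl -n spring-boot get svc,endpoints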
Horizontal Pod Autoscaler
The HorizontalPodAutoscaler (HPA) is Kubernetes' native autoscaling feature: more Pods are deployed to match an increase in demand, and the workload is scaled back in when traffic drops. Basically, the HPA controller (the control loop that acts on HPA resources) calculates the desired replica count from the ratio between the current metric value and the desired target value.
HPA supports resource metrics such as CPU and memory, for which we can set the target value, average value, or average utilization that should trigger scaling actions. It's worth mentioning that the Resource metric type is pod-level in scope; for more granular control, we can use the ContainerResource metric type for container-level metrics.
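Concretely, the Kubernetes documentation expresses the scaling algorithm as:

desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]

For example, if the current average CPU utilization is 300% of the requested CPU and the target is 60%, a single replica becomes ceil(1 * 300 / 60) = 5 replicas.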
For our example, we are going to target the TTS Deployment. With bounds of 1 to 5 replicas, the Deployment will not exceed 5 replicas under maximum load and will be reduced to a single replica when traffic is down. We use two metrics, CPU and memory, at the same time; the HPA controller calculates the desired replica count for each metric separately and uses the larger of the two values.
CPU metric: using the Utilization target type, the HPA controller computes the ratio of current CPU usage to requested CPU across all Pods and tries to keep the average utilization at 60%.
Memory metric: using the AverageValue target type, the HPA controller tries to keep the average memory usage across all targeted Pods at 500Mi.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tts
  namespace: spring-boot
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tts
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 500Mi
Demo
We are going to create all corresponding resources with the following kubectl commands:
student@control-plane:~$ kubectl apply -f namespace.yaml
student@control-plane:~$ envsubst < tts-deploy.yaml | kubectl apply -f -
student@control-plane:~$ envsubst < tts-analytics-deploy.yaml | kubectl apply -f -
student@control-plane:~$ kubectl apply -f tts-hpa.yaml
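Note that envsubst substitutes ${REPO_NAME} in the Deployment manifests, so it has to be exported beforehand; the repository name and the Service manifest file names below are placeholders for your own setup:

export REPO_NAME=<your-docker-hub-repo>
kubectl apply -f tts-svc.yaml
kubectl apply -f tts-analytics-svc.yaml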
We expect the following Kubernetes resources to be created:
student@control-plane:~$ kubectl -n spring-boot get pod,deploy,svc,hpa
NAME READY STATUS RESTARTS AGE
pod/tts-analytics-6b5f4577b5-6c9gg 1/1 Running 0 4m28s
pod/tts-d996c687d-v9wgx 1/1 Running 0 2m43s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tts 1/1 1 1 2m43s
deployment.apps/tts-analytics 1/1 1 1 4m28s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tts-analytics-svc ClusterIP 10.97.203.149 <none> 8090/TCP 4m28s
service/tts-svc NodePort 10.110.233.162 <none> 8080:30234/TCP 2m43s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/tts Deployment/tts cpu: 4%/60%, memory: 120954880/500Mi 1 5 1 95s
Let's test if TTS and TTS Analytics are able to complete a request flow and communicate with each other:
aissam@aissam:/aissam/Downloads/test$ curl -X POST http://$PUBLIC_NODE_IP:30234/tts -H "Content-Type: application/json" -d '{"text":"Hi this is a test!!"}' -OJ
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 48778 100 48748 100 30 2363 1 0:00:30 0:00:20 0:00:10 12316
aissam@aissam:/aissam/Downloads/test$ ls
f51b3d0d-0038-42a4-8eae-2a7ba1d38ce9.wav
student@control-plane:~$ kubectl -n spring-boot logs tts-d996c687d-v9wgx -f
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v3.4.5)
2025-05-17T16:28:07.715Z INFO 1 --- [producer] [ main] com.example.tts.TtsApplication : Starting TtsApplication using Java 21.0.7 with PID 1 (/app/classes started by root in /)
2025-05-17T16:28:07.883Z INFO 1 --- [producer] [ main] com.example.tts.TtsApplication : No active profile set, falling back to 1 default profile: "default"
2025-05-17T16:28:20.330Z INFO 1 --- [producer] [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port 8080 (http)
2025-05-17T16:28:20.532Z INFO 1 --- [producer] [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2025-05-17T16:28:20.533Z INFO 1 --- [producer] [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.40]
2025-05-17T16:28:21.679Z INFO 1 --- [producer] [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2025-05-17T16:28:21.681Z INFO 1 --- [producer] [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 13236 ms
2025-05-17T16:28:31.145Z INFO 1 --- [producer] [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path '/'
2025-05-17T16:28:31.339Z INFO 1 --- [producer] [ main] com.example.tts.TtsApplication : Started TtsApplication in 28.549 seconds (process running for 31.811)
2025-05-17T16:28:51.978Z INFO 1 --- [producer] [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-05-17T16:28:51.979Z INFO 1 --- [producer] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2025-05-17T16:28:51.984Z INFO 1 --- [producer] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 5 ms
2025-05-17T16:28:52.880Z INFO 1 --- [producer] [nio-8080-exec-1] c.e.t.controller.TextToSpeechController : textToSpeech request: TtsRequest[text=Hi this is a test!!]
2025-05-17T16:28:55.042Z INFO 1 --- [producer] [nio-8080-exec-1] c.e.tts.service.TextToSpeechService : do Something with analytics response: TtsAnalyticsResponse[device=Desktop, countryIso=FR]
Wrote synthesized speech to /output/f51b3d0d-0038-42a4-8eae-2a7ba1d38ce9.wav
student@control-plane:~$ kubectl -n spring-boot logs tts-analytics-6b5f4577b5-6c9gg -f
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v3.4.5)
2025-05-17T16:28:08.589Z INFO 1 --- [consumer] [ main] c.e.t.TtsAnalyticsApplication : Starting TtsAnalyticsApplication using Java 21.0.7 with PID 1 (/app/classes started by root in /)
2025-05-17T16:28:08.600Z INFO 1 --- [consumer] [ main] c.e.t.TtsAnalyticsApplication : No active profile set, falling back to 1 default profile: "default"
2025-05-17T16:28:11.989Z INFO 1 --- [consumer] [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port 8090 (http)
2025-05-17T16:28:12.027Z INFO 1 --- [consumer] [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2025-05-17T16:28:12.029Z INFO 1 --- [consumer] [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.40]
2025-05-17T16:28:12.381Z INFO 1 --- [consumer] [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2025-05-17T16:28:12.421Z INFO 1 --- [consumer] [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 3677 ms
2025-05-17T16:28:14.518Z INFO 1 --- [consumer] [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8090 (http) with context path '/'
2025-05-17T16:28:14.552Z INFO 1 --- [consumer] [ main] c.e.t.TtsAnalyticsApplication : Started TtsAnalyticsApplication in 7.325 seconds (process running for 8.222)
2025-05-17T16:28:54.054Z INFO 1 --- [consumer] [nio-8090-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-05-17T16:28:54.055Z INFO 1 --- [consumer] [nio-8090-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2025-05-17T16:28:54.059Z INFO 1 --- [consumer] [nio-8090-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 3 ms
2025-05-17T16:28:54.412Z INFO 1 --- [consumer] [nio-8090-exec-1] c.e.t.controller.AnalyticsController : doAnalytics request: TtsAnalyticsRequest[clientIp=10.0.0.212, userAgent=curl/8.5.0]
Everything seems to work correctly! Next, we will overwhelm the TTS microservice with requests in order to increase the load, and then inspect how the HPA behaves in response.
We are going to use the following script to run 50 requests:
#!/usr/bin/env bash

# Replace with your actual node public IP
export PUBLIC_NODE_IP=...........

for i in $(seq 1 50); do
  curl -X POST http://"$PUBLIC_NODE_IP":30234/tts \
    -H "Content-Type: application/json" \
    -d '{"text":"Hi this is a test!!"}' \
    -OJ &
done

# Wait for all background requests to finish
wait
echo "All 50 POST requests have completed."
Next, we observe the behavior of the HPA before, during, and after the script run:
student@control-plane:~$ kubectl -n spring-boot get hpa tts -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
tts Deployment/tts cpu: 4%/60%, memory: 156020736/500Mi 1 5 1 2m21s
tts Deployment/tts cpu: 3%/60%, memory: 156020736/500Mi 1 5 1 2m31s
tts Deployment/tts cpu: 4%/60%, memory: 156020736/500Mi 1 5 1 3m16s
tts Deployment/tts cpu: 47%/60%, memory: 181080064/500Mi 1 5 1 3m47s
tts Deployment/tts cpu: 302%/60%, memory: 263831552/500Mi 1 5 1 4m2s
tts Deployment/tts cpu: 300%/60%, memory: 268967936/500Mi 1 5 4 4m17s
tts Deployment/tts cpu: 248%/60%, memory: 168945664/500Mi 1 5 5 4m32s
tts Deployment/tts cpu: 296%/60%, memory: 133169152/500Mi 1 5 5 4m47s
tts Deployment/tts cpu: 265%/60%, memory: 118334259200m/500Mi 1 5 5 5m2s
tts Deployment/tts cpu: 250%/60%, memory: 136432844800m/500Mi 1 5 5 5m17s
tts Deployment/tts cpu: 166%/60%, memory: 152010752/500Mi 1 5 5 5m32s
tts Deployment/tts cpu: 70%/60%, memory: 153232179200m/500Mi 1 5 5 5m48s
tts Deployment/tts cpu: 63%/60%, memory: 153314918400m/500Mi 1 5 5 6m3s
tts Deployment/tts cpu: 61%/60%, memory: 153285427200m/500Mi 1 5 5 6m18s
tts Deployment/tts cpu: 60%/60%, memory: 153246105600m/500Mi 1 5 5 6m33s
tts Deployment/tts cpu: 60%/60%, memory: 153327206400m/500Mi 1 5 5 6m48s
tts Deployment/tts cpu: 56%/60%, memory: 153352601600m/500Mi 1 5 5 7m3s
tts Deployment/tts cpu: 62%/60%, memory: 153411584/500Mi 1 5 5 7m18s
tts Deployment/tts cpu: 60%/60%, memory: 153486950400m/500Mi 1 5 5 7m33s
tts Deployment/tts cpu: 55%/60%, memory: 153413222400m/500Mi 1 5 5 7m48s
tts Deployment/tts cpu: 59%/60%, memory: 153544294400m/500Mi 1 5 5 8m3s
tts Deployment/tts cpu: 59%/60%, memory: 153555763200m/500Mi 1 5 5 8m18s
tts Deployment/tts cpu: 56%/60%, memory: 153517260800m/500Mi 1 5 5 8m33s
tts Deployment/tts cpu: 61%/60%, memory: 153539379200m/500Mi 1 5 5 8m48s
tts Deployment/tts cpu: 59%/60%, memory: 153495142400m/500Mi 1 5 5 9m3s
tts Deployment/tts cpu: 60%/60%, memory: 153457459200m/500Mi 1 5 5 9m18s
tts Deployment/tts cpu: 60%/60%, memory: 153468108800m/500Mi 1 5 5 9m33s
tts Deployment/tts cpu: 60%/60%, memory: 153416499200m/500Mi 1 5 5 9m48s
tts Deployment/tts cpu: 59%/60%, memory: 153436979200m/500Mi 1 5 5 10m
tts Deployment/tts cpu: 52%/60%, memory: 153480396800m/500Mi 1 5 5 10m
tts Deployment/tts cpu: 59%/60%, memory: 153458278400m/500Mi 1 5 5 10m
tts Deployment/tts cpu: 63%/60%, memory: 153523814400m/500Mi 1 5 5 10m
tts Deployment/tts cpu: 46%/60%, memory: 153424691200m/500Mi 1 5 5 11m
tts Deployment/tts cpu: 59%/60%, memory: 153407488/500Mi 1 5 5 11m
tts Deployment/tts cpu: 60%/60%, memory: 153355878400m/500Mi 1 5 5 11m
tts Deployment/tts cpu: 61%/60%, memory: 153372262400m/500Mi 1 5 5 11m
tts Deployment/tts cpu: 63%/60%, memory: 153357516800m/500Mi 1 5 5 12m
tts Deployment/tts cpu: 60%/60%, memory: 153169100800m/500Mi 1 5 5 12m
tts Deployment/tts cpu: 60%/60%, memory: 153171558400m/500Mi 1 5 5 12m
tts Deployment/tts cpu: 61%/60%, memory: 153098649600m/500Mi 1 5 5 12m
tts Deployment/tts cpu: 62%/60%, memory: 153107660800m/500Mi 1 5 5 13m
tts Deployment/tts cpu: 61%/60%, memory: 153069158400m/500Mi 1 5 5 13m
tts Deployment/tts cpu: 61%/60%, memory: 152816844800m/500Mi 1 5 5 13m
tts Deployment/tts cpu: 44%/60%, memory: 152460492800m/500Mi 1 5 5 13m
tts Deployment/tts cpu: 3%/60%, memory: 152462131200m/500Mi 1 5 5 14m
tts Deployment/tts cpu: 3%/60%, memory: 152467046400m/500Mi 1 5 5 14m
tts Deployment/tts cpu: 3%/60%, memory: 152388403200m/500Mi 1 5 5 14m
tts Deployment/tts cpu: 3%/60%, memory: 152393318400m/500Mi 1 5 5 14m
tts Deployment/tts cpu: 3%/60%, memory: 151896064/500Mi 1 5 5 15m
tts Deployment/tts cpu: 3%/60%, memory: 151904256/500Mi 1 5 5 15m
tts Deployment/tts cpu: 3%/60%, memory: 151910809600m/500Mi 1 5 5 15m
tts Deployment/tts cpu: 3%/60%, memory: 151916544/500Mi 1 5 5 15m
tts Deployment/tts cpu: 3%/60%, memory: 151923916800m/500Mi 1 5 5 16m
tts Deployment/tts cpu: 3%/60%, memory: 151925555200m/500Mi 1 5 5 16m
tts Deployment/tts cpu: 3%/60%, memory: 151929651200m/500Mi 1 5 5 16m
tts Deployment/tts cpu: 3%/60%, memory: 151931289600m/500Mi 1 5 5 16m
tts Deployment/tts cpu: 3%/60%, memory: 151934566400m/500Mi 1 5 5 17m
tts Deployment/tts cpu: 3%/60%, memory: 151936204800m/500Mi 1 5 5 17m
tts Deployment/tts cpu: 3%/60%, memory: 151939481600m/500Mi 1 5 5 17m
tts Deployment/tts cpu: 3%/60%, memory: 151940300800m/500Mi 1 5 5 18m
tts Deployment/tts cpu: 3%/60%, memory: 151942758400m/500Mi 1 5 5 18m
tts Deployment/tts cpu: 3%/60%, memory: 151946035200m/500Mi 1 5 5 18m
tts Deployment/tts cpu: 3%/60%, memory: 158087168/500Mi 1 5 4 18m
tts Deployment/tts cpu: 3%/60%, memory: 193366016/500Mi 1 5 2 19m
tts Deployment/tts cpu: 3%/60%, memory: 193372160/500Mi 1 5 2 20m
tts Deployment/tts cpu: 4%/60%, memory: 193372160/500Mi 1 5 2 20m
tts Deployment/tts cpu: 4%/60%, memory: 193396736/500Mi 1 5 2 21m
tts Deployment/tts cpu: 3%/60%, memory: 193396736/500Mi 1 5 2 21m
tts Deployment/tts cpu: 4%/60%, memory: 193396736/500Mi 1 5 2 22m
tts Deployment/tts cpu: 3%/60%, memory: 193396736/500Mi 1 5 2 22m
tts Deployment/tts cpu: 4%/60%, memory: 193396736/500Mi 1 5 2 23m
tts Deployment/tts cpu: 4%/60%, memory: 193398784/500Mi 1 5 2 23m
tts Deployment/tts cpu: 3%/60%, memory: 193398784/500Mi 1 5 2 23m
tts Deployment/tts cpu: 4%/60%, memory: 257855488/500Mi 1 5 1 24m
tts Deployment/tts cpu: 3%/60%, memory: 257855488/500Mi 1 5 1 25m
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 37m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 37m horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 22m horizontal-pod-autoscaler New size: 4; reason: All metrics below target
Normal SuccessfulRescale 22m horizontal-pod-autoscaler New size: 2; reason: All metrics below target
Normal SuccessfulRescale 17m horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Initially we had one replica of our TTS microservice; after the load increased, the HPA controller tried to match the target values by increasing the replica count. Once the load dropped, the HPA controller waited about 5 minutes before scaling in from 5 replicas to 2, and finally down to minReplicas. This period is called the stabilization window; it is used to avoid flapping the replica count when the metric keeps fluctuating.
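The scale-down stabilization window defaults to 300 seconds (5 minutes) and can be tuned through the behavior field of the autoscaling/v2 HPA spec; for example, the following (purely illustrative) patch would shorten it to 60 seconds:

kubectl -n spring-boot patch hpa tts --type merge \
  -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":60}}}}'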
Summary
We showed how to deploy our Spring Boot microservices to Kubernetes and expose them for external and internal communication. In addition, we showcased how Kubernetes' native autoscaling feature can help us use cluster resources efficiently by scaling out when traffic increases and scaling in when it decreases.