In our previous blog, we explored how Kubernetes volumes help preserve data across container restarts. We worked with emptyDir, which retains data as long as the pod is running but loses it when the pod is deleted. Then, we improved this setup using hostPath, which lets a container persist data in a specified directory on the node.
This worked seamlessly in Minikube because it runs a single-node cluster. But what happens in multi-node cloud environments? The same solution fails there, because Kubernetes dynamically schedules pods across multiple nodes, so data stored via hostPath on one node won't be available to a pod running on another node.
To solve this, we need Persistent Volumes (PVs) and CSI (Container Storage Interface).
Introducing Persistent Volumes
A Persistent Volume (PV) is independent of individual pods and nodes, providing a stable and reusable storage location across the cluster.
To configure it, we create a new file host-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: host-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  hostPath:
    path: /data
    type: DirectoryOrCreate
Breaking Down the Configuration
- apiVersion: v1 & kind: PersistentVolume
  Defines the resource type as a Persistent Volume.
- metadata.name: host-pv
  Assigns a unique name to the PV, making it identifiable when claimed.
- Storage Capacity (capacity.storage: 1Gi)
  Specifies the storage capacity, set to 1 GiB in this example.
- Volume Mode (volumeMode: Filesystem)
  Declares how the volume is exposed. Filesystem is the right choice for general applications like ours; Block is reserved for low-level raw device access, and a Block PV cannot bind to a claim that expects a filesystem.
- Access Modes (accessModes)
  Controls how the volume can be mounted:
  - ReadWriteOnce: A single node can mount it read-write (all pods on that node can share it).
  - ReadOnlyMany: Multiple pods across nodes can access it, but only in read-only mode.
  - ReadWriteMany: Multiple pods across nodes can read and write simultaneously.
- Storage Class (storageClassName: standard)
  Matches the class our claim will request below; a PVC can only bind to a PV whose storage class matches.
- Host Path (hostPath)
  Maps /data on the node as the volume's storage location. type: DirectoryOrCreate ensures the directory exists, creating it if needed.
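With the manifest in place, we can create the PV and confirm the cluster registered it (a quick sanity check; output columns may vary slightly across Kubernetes versions):
kubectl apply -f host-pv.yaml
kubectl get pv host-pv
Until a claim binds it, kubectl get pv lists the volume with STATUS Available.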
Claiming the Persistent Volume
A Persistent Volume Claim (PVC) requests a PV from Kubernetes and ensures a pod can access it. Define this in host-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: host-pvc
spec:
  volumeName: host-pv
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
Explanation
- volumeName: host-pv
  Directly references our previously created PV, pinning the claim to it.
- Access Modes (accessModes)
  Specifies access rights; ReadWriteOnce means a single node can mount the volume read-write at a time.
- Storage Class (storageClassName: standard)
  Names the storage class, which identifies the underlying provisioner; it must match the class on the PV for the claim to bind. standard is Minikube's default; on cloud services like AWS or GCP this would differ (e.g., gp2 for AWS).
- Resource Requests (resources.requests.storage: 1Gi)
  Requests 1 GiB of storage from an available PV.
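Applying the claim lets us confirm the binding worked (assuming the PV above was created first):
kubectl apply -f host-pvc.yaml
kubectl get pvc host-pvc
STATUS should read Bound, with host-pv in the VOLUME column. If it stays Pending, double-check that the access modes, capacity, and storage class match the PV.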
Integrating Persistent Volume into Kubernetes Deployment
Finally, update deployment.yaml to use the PVC:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: story-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: story
  template:
    metadata:
      labels:
        app: story
    spec:
      containers:
        - name: story
          image: mayankcse1/kub-data-01-starting-setup-stories:1
          volumeMounts:
            - name: story-volume
              mountPath: /app/story
      volumes:
        - name: story-volume
          persistentVolumeClaim:
            claimName: host-pvc
Key Enhancements
- We increased replicas to 2, ensuring multiple instances of our application run.
- The PVC (host-pvc) is mounted inside the container at /app/story, making persistent storage accessible.
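Rolling out the updated Deployment and listing its pods verifies that both replicas start with the volume mounted:
kubectl apply -f deployment.yaml
kubectl get pods -l app=story
Note that with ReadWriteOnce, both replicas can share the volume only while they run on the same node, which is always the case on single-node Minikube.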
Why Persistent Volumes Matter in Multi-Node Clusters
Unlike hostPath, which binds storage to a single node, Persistent Volumes decouple storage from both pods and nodes. When the PV is backed by networked storage (a cloud disk, NFS, or a CSI driver), the data stays available even if a pod fails and Kubernetes reschedules it on another node. Our hostPath-backed PV stands in for such storage on single-node Minikube; in a real multi-node cluster you would swap the hostPath section for a cloud or CSI volume source.
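One way to see this in action (a sketch; the test file name is illustrative, and it assumes the container image ships a shell):
# write a file through the running app's volume
kubectl exec deploy/story-deployment -- sh -c 'echo "hello" > /app/story/test.txt'
# delete the pods; the Deployment immediately recreates them
kubectl delete pod -l app=story
# the replacement pod still sees the data
kubectl exec deploy/story-deployment -- cat /app/story/test.txt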
Summary
Moving from standard Docker Volumes to Kubernetes Volumes and finally to Persistent Volumes is essential for building scalable, cloud-ready applications.
For further exploration, refer to the official Kubernetes storage documentation.
With Persistent Volumes, your application's data survives pod failures, rescheduling, and multi-node deployments, making it reliable for production environments. Now you’re ready to manage stateful applications at scale.