Introduction
At Bacancy, we ran into the infrastructure challenges that come with scaling: maintaining consistency, automation, and security across every layer, from containerized workloads to the operating system beneath them. While Kubernetes handles most of the orchestration for our clients, it does not include built-in tools to manage operating-system-level configuration on cluster nodes.
OpenShift, which is built on Kubernetes, fills this gap with the MachineConfigPool (MCP) Server. The MCP Server enables declarative, scalable management of node configurations within OpenShift clusters. This article focuses on how we use the OpenShift MCP Server to simplify node management in production.
What Is the OpenShift MCP Server?
MCP stands for MachineConfigPool. In OpenShift, the MCP Server is responsible for managing the configuration of nodes belonging to a specific pool. It monitors changes in MachineConfig objects and ensures that the corresponding nodes are updated accordingly.
In simple terms, it lets you manage your nodes declaratively, just like Kubernetes lets you manage your applications. Instead of having to log into each node to change system files, update SSH keys, or modify kernel arguments, you just need to define the configuration once, and the OpenShift MCP Server will make sure that every targeted node stays aligned with it.
The MCP Server includes these key components:
- MachineConfig: Defines the desired state for a node, such as file contents, installed packages, kernel settings, and more.
- MachineConfigPool: A set of nodes that share the same configuration. Nodes are grouped using labels.
- MachineConfigDaemon (MCD): Runs on each node and applies the MachineConfig to that node.
- MachineConfigOperator (MCO): Orchestrates the entire process. It watches for changes, coordinates rollouts, and ensures consistency.
Together, these components provide a full control system for managing node configurations declaratively, securely, and at scale.
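If you want to see these pieces on a live cluster, the standard oc commands are enough (names and counts in the output will vary by cluster):

oc get machineconfigpool                 # pools and their rollout status
oc get machineconfig                     # MachineConfigs, including the rendered- configs the MCO generates
oc describe machineconfigpool worker     # which rendered config a pool is converging on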
Why We Needed MCP at Our Organization
Before adopting the OpenShift MCP Server, we managed node-level changes using a mix of configuration management tools, shell scripts, and some custom automation. It worked in smaller environments, but once our platform grew, the problems started to pile up.
We faced:
- Inconsistent system packages across worker nodes.
- SSH key mismatches after team updates.
- Configuration drift after patching or manual intervention.
- Difficulty tracking what version of a config was running on which node.
We needed a system that gave us version control, visibility, and safe rollouts while reducing the operational burden. The MCP Server provided all of that and integrated well into our OpenShift-based platform.
How We Use the OpenShift MCP Server in Production
Here’s a step-by-step walkthrough of how we use the OpenShift MCP Server in our production environments:
1. Define and Version Base Configurations
Our team starts by writing a MachineConfig that defines the exact configuration we want on each node type. This can include turning off unused kernel modules, updating sysctl settings, configuring timeouts, or installing specific tools.
Here is a simple example:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: custom-worker-config
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-custom.conf
          contents:
            source: data:text/plain;charset=utf-8;base64,Y29uZmlnIGNvbnRlbnRzIGhlcmU=
We save all our configuration files in Git, so we always know which version is running in which environment. If anything breaks, we can revert to an earlier version without guesswork.
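The source field embeds the file contents as a base64 data URL. To generate that payload, we encode the contents before embedding them; a small sketch, using a hypothetical sysctl setting:

# hypothetical file contents; replace with your own
echo 'vm.max_map_count = 262144' | base64 -w0
# -> dm0ubWF4X21hcF9jb3VudCA9IDI2MjE0NAo=

The resulting string goes after base64, in the source line of the MachineConfig.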
2. Group Nodes into Separate Pools
To reduce risk, we do not apply configurations to all nodes at once. We segment our clusters into smaller groups using custom labels. For example:
- worker-canary: A single non-critical node
- worker-core: Production-facing nodes
- worker-batch: Background job runners
Each pool receives the same configuration, but we roll it out in phases. Canary nodes receive updates first. If the configuration behaves as expected, we extend the rollout to the other pools.
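Creating a custom pool is a matter of labeling the nodes and defining a MachineConfigPool that selects them. A minimal sketch for our canary pool (pool and label names reflect our conventions; adapt them to your cluster):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-canary
spec:
  machineConfigSelector:
    matchExpressions:
      # canary nodes get the base worker configs plus canary-specific ones
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-canary]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-canary: ""

Once a node carries the node-role.kubernetes.io/worker-canary label, the MCO moves it into this pool and applies the matching configurations.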
3. Automate Monitoring and Rollout Observability
We use the MCP’s native status reporting to monitor updates. Using oc get mcp, we can see the rollout status, pause it if needed, or identify nodes that are stuck or unhealthy.
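A typical check during a rollout looks like this (output abridged; pool names, rendered-config hashes, and counts are illustrative):

oc get mcp
# NAME            CONFIG                          UPDATED   UPDATING   DEGRADED   MACHINECOUNT
# worker-canary   rendered-worker-canary-abc123   False     True       False      1
# worker-core     rendered-worker-core-def456     True      False      False      12

# pause a pool to stop further node updates while we investigate
oc patch mcp worker-core --type merge -p '{"spec":{"paused":true}}'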
We also set up Prometheus alerts tied to the MCP status. If any pool fails to complete a rollout or reports a degraded state, we get notified immediately and halt further changes.
This observability allows us to push configuration changes without any worry, knowing that issues will be caught early and contained.
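Alongside the alerts, a small guard script can gate a pipeline on pool health. A sketch, assuming our three pool names:

#!/usr/bin/env bash
# stop the rollout if any pool reports a Degraded condition
set -euo pipefail
for pool in worker-canary worker-core worker-batch; do
  degraded=$(oc get mcp "$pool" -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}')
  if [ "$degraded" = "True" ]; then
    echo "Pool $pool is degraded; halting rollout" >&2
    exit 1
  fi
done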
4. Recover and Roll Back Quickly
When something goes wrong, rolling back to the last stable version is easy. We simply apply the previous version of the MachineConfig. The MCP Server identifies the delta and starts rolling nodes back to the earlier configuration. Since we manage everything in Git, this is just a matter of restoring the previous commit and reapplying it.
We have had situations where a small kernel flag change resulted in performance degradation. With the help of the OpenShift MCP Server, the rollback was quick and automated. There was no need to log in to each machine or run emergency patch scripts.
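In practice, that rollback is a Git revert followed by a reapply; a sketch, assuming the manifest lives at a hypothetical path machineconfigs/custom-worker-config.yaml:

# restore the last known-good version of the config
git revert --no-edit HEAD
oc apply -f machineconfigs/custom-worker-config.yaml

# watch the pool converge back to the earlier rendered config
oc get mcp worker -w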
Top 4 Best Practices for OpenShift MCP Server
Through experience and implementation, we have established a few internal best practices that make MCP usage safer and more effective:
1. Keep Configs Minimal and Purposeful
Do not overload the MachineConfig with too many responsibilities. Limit it to low-level configurations that belong at the OS level. Use Kubernetes-native tools like ConfigMaps, Secrets, and DaemonSets for application-specific changes.
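As a rule of thumb: settings that live in the OS (sysctl values, kernel arguments, systemd units) belong in a MachineConfig; settings that configure an application belong in Kubernetes objects. A hypothetical example of the latter:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings   # hypothetical application-level config
  namespace: my-app
data:
  LOG_LEVEL: "info"
  CACHE_TTL_SECONDS: "300"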
2. Use Node Labels to Control Scope
Apply fine-grained labels on your nodes and use them to control which pools receive which configurations. This avoids unintentional rollouts and gives better control over changes across environments.
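For example, labeling a node for the batch pool keeps its changes scoped to that pool (the node name is illustrative):

oc label node worker-node-07 node-role.kubernetes.io/worker-batch=""
oc get nodes -l node-role.kubernetes.io/worker-batch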
3. Always Test in a Canary Pool First
Even when changes appear small, always test on non-critical nodes first. Kernel-level changes, security patches, or file permissions can have unexpected side effects.
4. Document and Version Every Change
Store every MachineConfig in Git with proper documentation. Make sure changes pass through pull requests, approvals, and CI checks, just as you would for application code. Infrastructure-level changes deserve the same rigor.
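Even a server-side dry run in CI catches schema mistakes before a merge; a minimal sketch, assuming the manifests live in a machineconfigs/ directory:

# validate against the live API without applying anything
oc apply --dry-run=server -f machineconfigs/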
When Should You Not Use the MCP Server?
While the OpenShift MCP Server is powerful, it is not a solution for everything. Avoid using it for temporary overrides, frequent application-level changes, or tasks better suited for Kubernetes workloads.
If your use case involves dynamic updates, runtime configuration, or secret management, look to Kubernetes-native mechanisms. MCP is best used for enforcing consistent, secure, and stable system-level configurations across the cluster.
Final Thoughts
At Bacancy, the OpenShift MCP Server has helped us manage node configurations in a simple, automated, and reliable way. It removed manual steps, reduced errors, and gave us full control over how our OpenShift clusters run at the system level.
If you are planning to scale your infrastructure or want to simplify node management, MCP is worth your attention. And if you need more guidance and support, our DevOps consulting services can help. Bacancy offers hands-on expertise to design, implement, and optimize your Kubernetes and OpenShift platforms for long-term success.