I was tasked with migrating from a Red Hat-hosted 3scale portal to a self-hosted version in ROSA (Red Hat OpenShift Service on AWS). This presented quite a challenge, as my knowledge of Kubernetes was mostly theoretical, based on studying for the Kubernetes and Cloud Native Associate (KCNA) certification exam.
The goal was to recreate a self-hosted version of 3scale using an operator in ROSA, but what I thought would be a straightforward deployment turned into a valuable learning experience.
What is Red Hat-managed/hosted 3scale?
When using Red Hat-hosted 3scale (also known as "SaaS" or managed 3scale), all infrastructure complexities are abstracted away. Red Hat handles the deployment, maintenance, updates, and scaling of the platform.
As a user, you simply access a provided portal URL and focus on managing your APIs rather than worrying about the underlying infrastructure. Your daily tasks revolve around the actual API management activities like adding backends, configuring products, creating applications, setting up authentication, and managing rate limits.
It's a convenient option that requires minimal operational overhead, allowing your team to focus on API strategy rather than platform management.
What is self-hosted 3scale?
In contrast, self-hosted 3scale brings both flexibility and responsibility. You gain complete control over your deployment configuration, integration with internal systems, customization options, and data locality.
Since the infrastructure runs on Kubernetes (in my case, ROSA - Red Hat OpenShift Service on AWS), you have access to all the native Kubernetes capabilities for scaling, monitoring, and management.
However, this freedom comes with the need to manage the entire application lifecycle within the Kubernetes ecosystem: installation via operators or templates, configuration through custom resources, scaling via horizontal pod autoscalers, implementing backup strategies, and handling upgrades.
You're responsible for ensuring high availability with proper pod distribution, performance tuning through resource allocation, and troubleshooting any issues that arise in both the 3scale application components and the underlying Kubernetes resources.
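For example, since the self-hosted deployment runs as ordinary OpenShift workloads, scaling is something you wire up yourself. A minimal sketch (the component names below assume the default 3scale template; adjust to your deployment and verify pod labels with --show-labels):
# Illustrative only: attach a horizontal pod autoscaler to the production gateway
oc autoscale dc/apicast-production --min=2 --max=4 --cpu-percent=80
# Check how the gateway replicas are spread across nodes
# (the deploymentConfig label is what the 3scale template uses; verify if unsure)
oc get pods -l deploymentConfig=apicast-production -o wide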
The Migration
Migrating from managed to self-hosted represented a significant shift in responsibilities, and I was about to discover just how much Red Hat had been handling behind the scenes.
This blog post documents a real-world troubleshooting journey that encountered and overcame significant challenges:
- Missing routes for admin access
- DNS resolution issues preventing access to Red Hat's container registry
- An architecture mismatch between my ARM-based MacBook and the x86_64 container images required for deployment
- PVC access mode issues
- Resource constraints
- A missing service for the app components
By sharing this experience, I hope to help others who might encounter similar issues during their deployment process, especially those who are transitioning from theoretical Kubernetes knowledge to practical application.
The Initial Deployment Attempt
We started by creating a dedicated namespace for our 3scale deployment:
oc create namespace 3scale-backup
After switching to this namespace (oc project 3scale-backup), we downloaded the 3scale API Management Platform template:
curl -o amp.yml https://raw.githubusercontent.com/3scale/3scale-amp-openshift-templates/master/amp/amp.yml
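Before deploying, it can help to list the parameters the template accepts (a quick sanity check; this only reads the local file and changes nothing in the cluster):
# Show all template parameters and their defaults
oc process --parameters -f amp.yml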
Then we tried to deploy 3scale using this template:
oc new-app --file=amp.yml \
--param WILDCARD_DOMAIN=apps.[YOUR-ROSA-DOMAIN].openshiftapps.com \
--param ADMIN_PASSWORD=password123
The template processing appeared successful, creating numerous resources:
- Imagestreams
- Deployment configs
- Services
- Routes
- Persistent volume claims
- Secrets
oc get all -n [your namespace]
However, when checking the status of the pods, we noticed that many deployments were either not starting at all, failing with errors (CrashLoopBackOff), or stuck in initialization phases:
oc get pods
While some components like Redis and database pods were running fine, critical components like backend-listener, backend-worker, and backend-cron were not deploying at all.
The system components were also failing during initialization.
Challenge 1: Missing Routes for Admin Access
Our first challenge was that the admin portal URL, https://3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com, was showing "Application is not available".
The reason was simple - the template had not created the necessary routes for our self-hosted 3scale services. We manually created them:
oc create route edge system-admin --service=system-provider --hostname=3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-developer --service=system-developer --hostname=3scale.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-master --service=system-master --hostname=master.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
However, after creating the routes, the admin portal still wasn't accessible. Digging into the logs with oc logs system-app-1-hook-pre, we discovered a more fundamental issue.
Challenge 2: DNS Resolution Issues
The pre-deployment hook was failing with a specific error:
ThreeScale::Core::APIClient::ConnectionError: connection refused: backend-listener:80
Further investigation revealed that the backend components weren't deployed at all. When checking the deployment configs:
oc get dc/backend-listener
NAME REVISION DESIRED CURRENT TRIGGERED BY
backend-listener 0 1 0 config,image(amp-backend:2.12)
We saw that backend-listener, backend-worker, and backend-cron had REVISION 0 and CURRENT 0, indicating they hadn't been deployed.
The root cause was found in the imagestream:
oc describe imagestream amp-backend
This showed an error:
error: Import failed (InternalError): Internal error occurred: registry.redhat.com/3scale-amp2/backend-rhel8:3scale2.12: Get "https://registry.redhat.com/v2/": dial tcp: lookup registry.redhat.com on 100.10.0.11:23: no such host
Our OpenShift cluster couldn't resolve the hostname registry.redhat.com due to DNS issues. This was confirmed by attempting to run:
nslookup registry.redhat.com
This returned "No answer" from the DNS server. In practice, it meant the cluster could not pull the required images from the Red Hat registry into the pods.
So our workaround was to manually pull the images locally with Docker, then push them into the cluster's internal OpenShift registry for our namespace.
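As a side note, running nslookup on a laptop only tests local DNS. To check name resolution from the cluster's own point of view, a throwaway pod can be used; a sketch, assuming the UBI image is pullable from the cluster (or already cached on a node):
# Resolve the registry hostname using the cluster's own DNS path
oc run dns-test --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi8/ubi -- getent hosts registry.redhat.io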
Challenge 3: Architecture Mismatch
While working to address the DNS issues, we discovered another challenge: we were trying to pull Red Hat's container images on an Apple Silicon (ARM64) MacBook, but the images were only available for the x86_64 architecture.
When attempting to pull the images directly:
docker login [RedHat Credentials]
docker pull registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12
We received:
no matching manifest for linux/arm64/v8 in the manifest list entries
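A quick way to confirm which platforms a tag actually ships is to inspect its manifest list (this assumes you are already logged in to registry.redhat.io; for multi-arch tags, each entry lists its platform):
# List the architectures published for this tag
docker manifest inspect registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 | grep architecture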
The Solution Process
We implemented a multi-step solution to overcome these challenges:
Step 1: Authentication with Red Hat Registry
First, we logged in to the Red Hat Container Registry:
docker login registry.redhat.io
Step 2: Architecture-aware Image Pulling
Because I was running Docker Desktop on an Apple Silicon Mac, a default pull targets the ARM64 architecture, which doesn't match the x86_64-only Red Hat images.
To overcome the architecture mismatch, we explicitly specified the platform when pulling:
docker pull --platform linux/amd64 registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12
This successfully pulled the x86_64 image, which Docker Desktop on macOS can still run locally through Rosetta 2 emulation.
Step 3: Exposing the OpenShift Registry
To make our OpenShift registry accessible:
oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
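You can confirm the route was created before trying to push:
# The default route for the internal registry should now exist
oc get route default-route -n openshift-image-registry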
Step 4: Pushing Images to Internal Registry
We pushed the pulled images to our OpenShift internal registry:
# Get credentials
TOKEN=$(oc whoami -t)
REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
# Login to registry
docker login -u kubeadmin -p $TOKEN $REGISTRY
# Tag and push
docker tag registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 $REGISTRY/[namespace]/amp-backend:2.12
docker push $REGISTRY/[namespace]/amp-backend:2.12
Step 5: Updating ImageStreams
We updated the imagestream to point to our locally pushed image:
oc tag $REGISTRY/[namespace]/amp-backend:2.12 amp-backend:2.12 --source=docker
This automatically triggered the deployment due to the ImageChange
trigger on the deployment config.
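We repeated the same pull/tag/push/tag cycle for the other imagestreams that had failed to import. A rough sketch of that loop (the imagestream names and source images below are illustrative; check oc get imagestreams and oc describe is <name> for the exact images your template version expects):
# Illustrative mirror loop; replace the placeholder with the source image
# shown by `oc describe is <imagestream>` for each component
for COMPONENT in amp-system amp-zync amp-apicast; do
  SRC=registry.redhat.io/3scale-amp2/[source-image-for-component]:3scale2.12
  docker pull --platform linux/amd64 $SRC
  docker tag $SRC $REGISTRY/[namespace]/${COMPONENT}:2.12
  docker push $REGISTRY/[namespace]/${COMPONENT}:2.12
  oc tag $REGISTRY/[namespace]/${COMPONENT}:2.12 ${COMPONENT}:2.12 --source=docker
done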
Results
After implementing these steps for the backend-listener component, the deployment began successfully (at least for this resource!).
Challenge 4: PVC Access Mode Issues
We then went back to the self-hosted 3scale admin portal and found it still wasn't working.
Checking the pods showed that several of them had issues: a number of deployments were failing because their pods took too long to become available (timeout errors):
- apicast-production-1-deploy: "pods took longer than 1800 seconds to become available"
- system-sidekiq-1-deploy: "pods took longer than 1200 seconds to become available"
- system-sphinx-1-deploy: "pods took longer than 1200 seconds to become available"
This typically happens when pods are stuck in a pending or initializing state for too long.
So we checked the logs of the problematic deployments, and the PVCs as well.
# Check logs for apicast-production deployment
oc logs apicast-production-1-deploy
# Check logs for system-sidekiq deployment
oc logs system-sidekiq-1-deploy
# Check logs for system-sphinx deployment
oc logs system-sphinx-1-deploy
# Check events for the pending pod
oc describe pod system-app-1-hook-pre
We discovered a storage issue where the system-storage PVC was failing to provision:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
backend-redis-storage Bound pvc-ss987s5b-026a-4srg-au97-549d8958933a 1Gi RWO gp3 53m
mysql-storage Bound pvc-72s43210-s033-4c8w-ar53-043bf3kk1496 1Gi RWO gp3 53m
system-redis-storage Bound pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4 1Gi RWO gp3 53m
system-storage Pending
The error was related to access modes:
failed to provision volume with StorageClass "gp3": rpc error: code = InvalidArgument desc = Volume capabilities MULTI_NODE_MULTI_WRITER not supported. Only AccessModes[ReadWriteOnce] supported.
We fixed it by creating a new PVC with the correct access mode:
# First, delete pods using this PVC
oc delete pod system-app-1-hook-pre
# Back up the current PVC definition
oc get pvc system-storage -o yaml > system-storage-pvc.yaml
# Delete the stuck PVC
oc delete pvc system-storage
# Create a new PVC with the correct settings
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: system-storage
namespace: [Namespace]
labels:
app: 3scale-api-management
threescale_component: system
threescale_component_element: app
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: gp3
EOF
After fixing the PVC, we retried the affected deployments:
oc rollout retry dc/system-app
oc rollout retry dc/apicast-production
oc rollout retry dc/backend-listener
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx
With that, the PVC issue was resolved and the system-storage PVC was correctly bound to a volume:
oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
backend-redis-storage Bound pvc-ss987s5b-026a-4srg-au97-549d8958933a 1Gi RWO gp3 53m
mysql-storage Bound pvc-72s43210-s033-4c8w-ar53-043bf3kk1496 1Gi RWO gp3 53m
system-redis-storage Bound pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4 1Gi RWO gp3 53m
system-storage Bound pvc-2286196a-8885-490s-11c1-654320bd8a5a6 1Gi RWO gp3 116s
Challenge 5: Resource Constraints
Even after resolving the PVC issue, pods were still stuck in Pending state due to insufficient resources:
oc describe pod system-app-2-mr25z
...
Warning FailedScheduling 2m34s default-scheduler 0/9 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity conflict.
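Before shrinking the requests, it helps to see how much headroom the schedulable worker nodes actually have (oc adm top relies on the cluster monitoring stack, which ROSA ships by default):
# Current CPU/memory usage per node
oc adm top nodes
# Requests vs. allocatable capacity on a specific node
oc describe node [node-name] | grep -A 10 "Allocated resources"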
We reduced the resource requirements to make the pods fit on the available nodes:
oc patch dc/system-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-master","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-provider","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-developer","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}}]}}}}'
oc patch dc/apicast-production -p '{"spec":{"template":{"spec":{"containers":[{"name":"apicast-production","resources":{"requests":{"cpu":"25m","memory":"128Mi"}}}]}}}}'
oc patch dc/system-sidekiq -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sidekiq","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
oc patch dc/system-sphinx -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sphinx","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
After applying these patches, we restarted the failed components:
# Retry system-sidekiq and system-sphinx deployments
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx
That got fixed too!!
➜ oc get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apicast-production ClusterIP xxx.xx.xxx.xxx <none> 8080/TCP,8090/TCP 83m
apicast-staging ClusterIP xxx.xx.xxx.xxx <none> 8080/TCP,8090/TCP 83m
backend-listener ClusterIP xxx.xx.xxx.xxx <none> 3000/TCP 83m
backend-redis ClusterIP xxx.xx.xxx.xxx <none> 6379/TCP 83m
system-developer ClusterIP xxx.xx.xxx.xxx <none> 3000/TCP 83m
system-master ClusterIP xxx.xx.xxx.xxx <none> 3000/TCP 83m
system-memcache ClusterIP xxx.xx.xxx.xxx <none> 11211/TCP 83m
system-mysql ClusterIP xxx.xx.xx.xxx <none> 3306/TCP 83m
system-provider ClusterIP xxx.xx.x.xxx <none> 3000/TCP 83m
system-redis ClusterIP xxx.xx.xxx.xxx <none> 6379/TCP 83m
system-sphinx ClusterIP xxx.xx.xxx.xxx <none> 9306/TCP 83m
zync ClusterIP xxx.xx.xxx.xx <none> 8080/TCP 83m
zync-database ClusterIP xxx.xx.xx.xxx <none> 5432/TCP 83m
➜ oc get routes
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
backend backend-3scale.apps.[YOUR-DOMAIN].openshiftapps.com backend-listener http edge/Allow None
system-admin 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3000 edge/Allow None
system-developer 3scale.apps.[YOUR-DOMAIN].openshiftapps.com /developer system-app 3001 edge/Allow None
system-master master.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3002 edge/Allow None
system-provider 3scale.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3000 edge/Allow None
zync-3scale-api-hhhjs api-3scale-apicast-production.apps.[YOUR-DOMAIN].openshiftapps.com apicast-production gateway edge/Redirect None
zync-3scale-api-phh9n api-3scale-apicast-staging.apps.[YOUR-DOMAIN].p1.openshiftapps.com apicast-staging gateway edge/Redirect None
zync-3scale-master-nhhht HostAlreadyClaimed system-master http edge/Redirect None
zync-3scale-provider-q9hh9 HostAlreadyClaimed system-developer http edge/Redirect None
zync-3scale-provider-shh6z HostAlreadyClaimed system-provider http edge/Redirect None
With the containers starting up, it should only have been a matter of minutes before we could access the admin portal.
We watched the pod status, and once system-app showed 3/3 ready, we tried accessing the admin portal at:
https://3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com
But it was still unavailable.
Challenge 6: Missing Service for App Components
Even after all pods were running, the admin portal was not accessible. The issue was that we created routes pointing to a service named "system-app" which didn't exist:
oc get routes
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
system-admin 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3000 edge/Allow None
system-developer 3scale.apps.[YOUR-DOMAIN].openshiftapps.com /developer system-app 3001 edge/Allow None
system-master master.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3002 edge/Allow None
system-provider 3scale.apps.[YOUR-DOMAIN].openshiftapps.com system-app 3000 edge/Allow None
oc describe service system-app
Error from server (NotFound): services "system-app" not found
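Before creating the service by hand, it's worth confirming which labels the system-app pods carry so the new selector will actually match them (the 3scale template labels pods with deploymentConfig; verify with --show-labels if unsure):
# Should list the system-app pod(s) along with their labels
oc get pods -l deploymentConfig=system-app --show-labels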
We fixed this by creating the missing service:
oc create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
name: system-app
namespace: [NAMESPACE]
labels:
app: 3scale-api-management
spec:
ports:
- name: provider
port: 3000
protocol: TCP
targetPort: 3000
- name: developer
port: 3001
protocol: TCP
targetPort: 3001
- name: master
port: 3002
protocol: TCP
targetPort: 3002
selector:
deploymentConfig: system-app
type: ClusterIP
EOF
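Once the service exists, it should pick up the running system-app pod as an endpoint; an empty ENDPOINTS column would mean the selector doesn't match the pod labels:
# Verify the service resolved to the system-app pod IPs
oc get endpoints system-app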
Final Result
After working through all these challenges, we finally had a fully operational 3scale deployment:
oc get pods
NAME READY STATUS RESTARTS AGE
apicast-production-4-7hh00 1/1 Running 0 2m12s
apicast-staging-1-6hh00 1/1 Running 0 83m
backend-cron-2-6955a 1/1 Running 0 23m
backend-listener-1-5hh00 1/1 Running 0 26m
backend-redis-1-mhh005 1/1 Running 0 57m
backend-worker-2-lr8gb 1/1 Running 0 23m
system-app-3-7ln8g 3/3 Running 0 85s
system-memcache-1-xddig 1/1 Running 0 80m
system-mysql-1-ee4wt 1/1 Running 0 80m
system-redis-1-45hh0 1/1 Running 0 80m
zync-1-l7ghy 1/1 Running 0 80m
zync-database-1-dt3l9 1/1 Running 0 80m
zync-que-1-wwri9 1/1 Running 2 (80m ago) 80m
With all components running, we were finally able to access the 3scale admin portal and begin configuring our APIs.
Verify Deployment
# Check all pods are running
oc get pods
# Expected output should show all pods in Running or Completed state:
# - system-app-X-XXXXX (3/3 Running)
# - apicast-production-X-XXXXX (1/1 Running)
# - apicast-staging-X-XXXXX (1/1 Running)
# - system-sidekiq-X-XXXXX (1/1 Running)
# - system-sphinx-X-XXXXX (1/1 Running)
# - backend-* pods (1/1 Running)
# - system-mysql-X-XXXXX (1/1 Running)
# - system-redis-X-XXXXX (1/1 Running)
# - zync-* pods (1/1 Running)
Key Lessons Learned
- DNS Resolution is Critical: Ensure your OpenShift cluster can resolve external registry hostnames before attempting deployments that rely on them.
- Architecture Awareness: When working with enterprise container images on ARM-based development machines, be explicit about architecture requirements using the --platform flag.
- Manual Image Mirroring: In restricted environments, manually pulling and pushing images to an internal registry is a viable workaround.
- ImageStream Mechanics: Understanding how OpenShift's ImageStreams work is essential for troubleshooting deployment issues.
- Network Policies: In enterprise environments, network policies may restrict access to external registries, requiring coordination with network administrators.
Common Issues and Solutions
- Pod stuck in Pending: Usually resource constraints - reduce resource requests
- PVC mounting issues: Check storage class and access modes
- Routes not working: Ensure services exist and selectors match pods
- Application not available: Create missing system-app service
- Database connection issues: Check that system-mysql pod is running and accessible
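When hitting any of the issues above, a quick first triage that worked well for us (oc adm top again assumes the monitoring stack is available):
# Most recent cluster events, oldest first
oc get events --sort-by=.lastTimestamp | tail -20
# Per-pod resource usage in the current namespace
oc adm top pods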
Final Verification Checklist
- All pods are in Running state (except completed deployment/hook pods)
- Routes are accessible and don't show "Application not available"
- Can log into Admin Portal successfully
- Can access Developer Portal
- Master Portal is accessible (if needed)
- Default passwords have been changed
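A quick way to check the route items on this list from the command line (the router's "Application is not available" page returns a 503, so a 200 or 302 here is a good sign):
# Smoke test the admin portal route
curl -skI https://3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com/ | head -1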
Once we can access the 3scale portals, we can begin migrating the existing 3scale components from another environment or adding new ones; a rough toolbox sketch follows the list below.
A. Backend APIs
- API definitions and configurations
- Authentication settings
- Rate limiting rules
B. Products (API Products)
- Product configurations
- Application plans
- Pricing rules
- Methods and metrics
C. Applications
- Application keys and secrets
- Application plans assignments
- Usage statistics (if needed)
D. Accounts and Users
- Developer accounts
- Admin users
- Access permissions
E. Policies
- Custom policies
- Policy chains
- Configuration settings
F. Developer Portal
- Custom pages and templates
- Documentation
- CMS content
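For the migration itself, the 3scale toolbox CLI (shipped as the 3scale_toolbox gem and as a container image) can copy many of these objects between tenants. This is only a hedged sketch; subcommands vary by toolbox version, so check 3scale help first:
# Register both tenants (admin portal URLs with access tokens)
3scale remote add old-tenant https://[ACCESS_TOKEN]@[OLD-TENANT-ADMIN-DOMAIN]
3scale remote add new-tenant https://[ACCESS_TOKEN]@3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com
# Copy a product (API) definition from the old tenant to the new one
3scale product copy --source=old-tenant --destination=new-tenant [PRODUCT-SYSTEM-NAME]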
Conclusion
Deploying complex solutions like 3scale API Management in restricted network environments or across architecture boundaries presents unique challenges. By understanding the underlying issues and implementing a systematic approach to manually mirror images, we were able to overcome these obstacles.
While this process requires more manual effort than a standard deployment, it demonstrates the flexibility of OpenShift's container management capabilities and provides a path forward for deployments in environments with similar restrictions.