Clariza Look

Migrating to Self-Hosted 3scale API Management on ROSA Kubernetes

I was tasked to migrate from a Red Hat-hosted 3scale portal to a self-hosted version in ROSA (Red Hat OpenShift Service on AWS). This presented quite a challenge as my knowledge in Kubernetes was mostly theoretical, based on studying for the Kubernetes and Cloud Native Associate (KCNA) certification exam.

The goal was to recreate a self-hosted version of 3scale using an operator in ROSA, but what I thought would be a straightforward deployment turned into a valuable learning experience.

What is Red Hat-managed/hosted 3scale?

When using Red Hat-hosted 3scale (also known as "SaaS" or managed 3scale), all infrastructure complexities are abstracted away. Red Hat handles the deployment, maintenance, updates, and scaling of the platform.

As a user, you simply access a provided portal URL and focus on managing your APIs rather than worrying about the underlying infrastructure. Your daily tasks revolve around the actual API management activities like adding backends, configuring products, creating applications, setting up authentication, and managing rate limits.

It's a convenient option that requires minimal operational overhead, allowing your team to focus on API strategy rather than platform management.

The 3scale API Management portal

What is self-hosted 3scale?

In contrast, self-hosted 3scale brings both flexibility and responsibility. You gain complete control over your deployment configuration, integration with internal systems, customization options, and data locality.

Since the infrastructure runs on Kubernetes (in my case, ROSA - Red Hat OpenShift Service on AWS), you have access to all the native Kubernetes capabilities for scaling, monitoring, and management.

However, this freedom comes with the need to manage the entire application lifecycle within the Kubernetes ecosystem: installation via operators or templates, configuration through custom resources, scaling via horizontal pod autoscalers, implementing backup strategies, and handling upgrades.

You're responsible for ensuring high availability with proper pod distribution, performance tuning through resource allocation, and troubleshooting any issues that arise in both the 3scale application components and the underlying Kubernetes resources.

The Migration

Migrating from managed to self-hosted represented a significant shift in responsibilities, and I was about to discover just how much Red Hat had been handling behind the scenes.

This blog post documents a real-world troubleshooting journey through six significant challenges:

  1. Missing Routes for Admin Access

  2. DNS resolution issues preventing access to Red Hat's container registry

  3. Architecture mismatch between my ARM-based MacBook and the x86_64 container images required for deployment

  4. PVC Access Mode Issues

  5. Resource Constraints

  6. Missing Service for App Components

By sharing this experience, I hope to help others who might encounter similar issues during their deployment process, especially those who are transitioning from theoretical Kubernetes knowledge to practical application.

The Initial Deployment Attempt

We started by creating a dedicated namespace for our 3scale deployment:

oc create namespace 3scale-backup

After switching to this namespace (oc project 3scale-backup), we downloaded the 3scale API Management Platform template:

curl -o amp.yml https://raw.githubusercontent.com/3scale/3scale-amp-openshift-templates/master/amp/amp.yml

Then we tried to deploy 3scale using this template:

oc new-app --file=amp.yml \
  --param WILDCARD_DOMAIN=apps.[domain of your openshift].openshiftapps.com \
  --param ADMIN_PASSWORD=password123

The template processing appeared successful, creating numerous resources:

  • Imagestreams
  • Deployment configs
  • Services
  • Routes
  • Persistent volume claims
  • Secrets
oc get all -n [your namespace]

3Scale API Management Rosa deployment

However, when checking the status of the pods, we noticed that many deployments were failing: pods not starting, erroring out, stuck in CrashLoopBackOff, or hanging in initialization phases:

oc get pods


While some components like Redis and database pods were running fine, critical components like backend-listener, backend-worker, and backend-cron were not deploying at all.

backend-worker issues

The system components were also failing during initialization.
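Throughout the rest of this post, the same triage routine came up again and again. Here is a hedged sketch of it; the pod name is just an example from this deployment, and the script is guarded so it only talks to a cluster when the `oc` CLI is actually available:

```shell
#!/bin/sh
# Hedged triage sketch for a failing pod. Adjust POD to whichever pod
# is misbehaving in your namespace.
POD="system-app-1-hook-pre"

triage() {
  oc logs "$POD"                                   # container logs
  oc describe pod "$POD"                           # events: scheduling, image pulls, mounts
  oc get events --sort-by=.lastTimestamp | tail -n 20
}

# Guard: only run against a cluster when oc is installed.
if command -v oc >/dev/null 2>&1; then
  triage
else
  echo "oc not found; run triage against a real cluster"
fi
```

Logs tell you what the container itself did; the `describe` events tell you what Kubernetes did (or refused to do) to the pod, which is often where the real answer lives.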

Challenge 1: Missing Routes for Admin Access

Our first challenge was that the admin portal URL, https://3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com, was showing "Application is not available".

The reason was simple - the template had not created the necessary routes for our self-hosted 3scale services. We manually created them:

oc create route edge system-admin --service=system-provider --hostname=3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-developer --service=system-developer --hostname=3scale.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-master --service=system-master --hostname=master.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com

However, after creating the routes, the admin portal still wasn't accessible. Digging into the logs with oc logs system-app-1-hook-pre, we discovered a more fundamental issue.

Challenge 2: DNS Resolution Issues

The pre-deployment hook was failing with a specific error:

ThreeScale::Core::APIClient::ConnectionError: connection refused: backend-listener:80

Further investigation revealed that the backend components weren't deployed at all. When checking the deployment configs:

oc get dc/backend-listener
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
backend-listener   0          1         0         config,image(amp-backend:2.12)

We saw that backend-listener, backend-worker, and backend-cron had REVISION 0 and CURRENT 0, indicating they hadn't been deployed.

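A DeploymentConfig whose `status.latestVersion` is still 0 has never rolled out. A quick loop over the backend components makes this easy to spot; this is a hedged sketch, guarded on `oc` being available:

```shell
#!/bin/sh
# Check whether the backend DeploymentConfigs have ever rolled out.
# latestVersion == 0 means no deployment was ever triggered.
for dc in backend-listener backend-worker backend-cron; do
  if command -v oc >/dev/null 2>&1; then
    rev=$(oc get "dc/$dc" -o jsonpath='{.status.latestVersion}')
    echo "$dc latestVersion=$rev"
  else
    echo "$dc: would run: oc get dc/$dc -o jsonpath='{.status.latestVersion}'"
  fi
done
```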

The root cause was found in the imagestream:

oc describe imagestream amp-backend

This showed an error:

error: Import failed (InternalError): Internal error occurred: registry.redhat.com/3scale-amp2/backend-rhel8:3scale2.12: Get "https://registry.redhat.com/v2/": dial tcp: lookup registry.redhat.com on 100.10.0.11:23: no such host

Our OpenShift cluster couldn't resolve the hostname registry.redhat.com due to DNS issues. This was confirmed by attempting to run:

nslookup registry.redhat.com

Which returned "No answer" from the DNS server. In other words, the cluster could not pull the required images from the Red Hat registry into the pods.

So our workaround was to pull the images locally with Docker, then push them into the cluster's internal OpenShift registry under our namespace.

Challenge 3: Architecture Mismatch

While working to address the DNS issues, we discovered another challenge: we were trying to pull Red Hat's container images on an ARM64-based machine (an Apple Silicon Mac), but the images were only available for the x86_64 architecture.

When attempting to pull the images directly:

docker login [RedHat Credentials]
docker pull registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12

We received:

no matching manifest for linux/arm64/v8 in the manifest list entries

The Solution Process

We implemented a multi-step solution to overcome these challenges:

Step 1: Authentication with Red Hat Registry

First, we logged in to the Red Hat Container Registry:

docker login registry.redhat.io

Step 2: Architecture-aware Image Pulling

Because I was using Docker Desktop on an Apple Silicon Mac, a plain docker pull requests an image matching the host's ARM64 architecture, which does not match what the cluster's x86_64 nodes require.

To overcome the architecture mismatch, we explicitly specified the platform when pulling:

docker pull --platform linux/amd64 registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12

This successfully pulled the image by using Rosetta 2 emulation on macOS.

Redhat Docker Image Pull
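Before pushing, it is worth double-checking that what you pulled really is an amd64 image. Docker can report an image's platform via `docker image inspect`; a hedged sketch, guarded so it only inspects when the image is actually present locally:

```shell
#!/bin/sh
# Verify the pulled image's platform before pushing it to the cluster.
IMAGE="registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12"

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  # Expect "linux/amd64" for x86_64 cluster nodes.
  docker image inspect "$IMAGE" --format '{{.Os}}/{{.Architecture}}'
else
  echo "image not present locally; expected platform: linux/amd64"
fi
```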

Step 3: Exposing the OpenShift Registry

To make our OpenShift registry accessible:

oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge

Step 4: Pushing Images to Internal Registry

We pushed the pulled images to our OpenShift internal registry:

# Get credentials
TOKEN=$(oc whoami -t)
REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')

# Login to registry
docker login -u kubeadmin -p $TOKEN $REGISTRY

# Tag and push
docker tag registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 $REGISTRY/[namespace]/amp-backend:2.12
docker push $REGISTRY/[namespace]/amp-backend:2.12

Step 5: Updating ImageStreams

We updated the imagestream to point to our locally pushed image:

oc tag $REGISTRY/[namespace]/amp-backend:2.12 amp-backend:2.12 --source=docker

This automatically triggered the deployment due to the ImageChange trigger on the deployment config.
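The ImageChange trigger is the piece that connects the imagestream tag to the DeploymentConfig: when a new image lands under the watched tag, a rollout starts automatically. You can inspect which triggers are wired up with `oc set triggers` — a hedged sketch:

```shell
#!/bin/sh
# List the triggers on the backend-listener DeploymentConfig. An ImageChange
# trigger on amp-backend:2.12 means pushing a new image under that tag
# automatically starts a rollout; a Config trigger fires on spec changes.
if command -v oc >/dev/null 2>&1; then
  oc set triggers dc/backend-listener
else
  echo "would run: oc set triggers dc/backend-listener"
fi
```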

Results

After implementing these steps for the backend-listener component, the deployment began successfully (at least for this resource!).

Challenge 4: PVC Access Mode Issues

We then went back to the self-hosted 3scale admin portal and found that it still wasn't working.

Checking the pods showed that several of them were having issues.

apicast-production-1-deploy Issue

The error logs showed that several deployments were failing because their pods took too long to become available (timeout errors):

  • apicast-production-1-deploy: "pods took longer than 1800 seconds to become available"
  • system-sidekiq-1-deploy: "pods took longer than 1200 seconds to become available"
  • system-sphinx-1-deploy: "pods took longer than 1200 seconds to become available"

This typically happens when pods are stuck in a pending or initializing state for too long.

So we checked the logs of the problematic pods, as well as the PVCs.

# Check logs for apicast-production deployment
oc logs apicast-production-1-deploy

# Check logs for system-sidekiq deployment
oc logs system-sidekiq-1-deploy

# Check logs for system-sphinx deployment
oc logs system-sphinx-1-deploy

# Check events for the pending pod
oc describe pod system-app-1-hook-pre

We discovered a storage issue where the system-storage PVC was failing to provision:

oc get pvc
NAME                    STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound     pvc-ss987s5b-026a-4srg-au97-549d8958933a   1Gi        RWO            gp3            53m
mysql-storage           Bound     pvc-72s43210-s033-4c8w-ar53-043bf3kk1496   1Gi        RWO            gp3            53m
system-redis-storage    Bound     pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Pending                            

The error was related to access modes:

failed to provision volume with StorageClass "gp3": rpc error: code = InvalidArgument desc = Volume capabilities MULTI_NODE_MULTI_WRITER not supported. Only AccessModes[ReadWriteOnce] supported.

We fixed it by creating a new PVC with the correct access mode:

# First, delete pods using this PVC
oc delete pod system-app-1-hook-pre

# Back up the current PVC definition
oc get pvc system-storage -o yaml > system-storage-pvc.yaml

# Delete the stuck PVC
oc delete pvc system-storage

# Create a new PVC with the correct settings
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: system-storage
  namespace: [Namespace]
  labels:
    app: 3scale-api-management
    threescale_component: system
    threescale_component_element: app
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp3
EOF

After fixing the PVC issue, restart the deployments:

oc rollout retry dc/system-app
oc rollout retry dc/apicast-production
oc rollout retry dc/backend-listener
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

That fixed the PVC issue, and the system-storage PVC was now correctly bound to a volume.

oc get pvc

NAME                    STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound    pvc-ss987s5b-026a-4srg-au97-549d8958933a    1Gi        RWO            gp3            53m
mysql-storage           Bound    pvc-72s43210-s033-4c8w-ar53-043bf3kk1496    1Gi        RWO            gp3            53m
system-redis-storage    Bound    pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Bound    pvc-2286196a-8885-490s-11c1-654320bd8a5a6   1Gi        RWO            gp3            116s

Challenge 5: Resource Constraints

Even after resolving the PVC issue, pods were still stuck in Pending state due to insufficient resources:

oc describe pod system-app-2-mr25z
...
Warning  FailedScheduling  2m34s  default-scheduler  0/9 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity conflict.
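Before lowering requests, it helps to see what each container is currently asking for. A jsonpath query can print the requests per container; a hedged sketch (container names are from this template):

```shell
#!/bin/sh
# Print each system-app container's current CPU/memory requests, so you can
# judge how far you need to lower them to fit the available nodes.
JSONPATH='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'

if command -v oc >/dev/null 2>&1; then
  oc get dc/system-app -o jsonpath="$JSONPATH"
else
  echo "would run: oc get dc/system-app -o jsonpath='$JSONPATH'"
fi
```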

We reduced the resource requirements to make the pods fit on the available nodes:

oc patch dc/system-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-master","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-provider","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-developer","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}}]}}}}'
oc patch dc/apicast-production -p '{"spec":{"template":{"spec":{"containers":[{"name":"apicast-production","resources":{"requests":{"cpu":"25m","memory":"128Mi"}}}]}}}}'
oc patch dc/system-sidekiq -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sidekiq","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
oc patch dc/system-sphinx -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sphinx","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'

After applying these patches, we restarted the failed components:

# Retry system-sidekiq and system-sphinx deployments
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

That got fixed too!


➜  oc get services

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
apicast-production   ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
apicast-staging      ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
backend-listener     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
backend-redis        ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-developer     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-master        ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-memcache      ClusterIP   xxx.xx.xxx.xxx   <none>        11211/TCP           83m
system-mysql         ClusterIP   xxx.xx.xxx.xxx   <none>        3306/TCP            83m
system-provider      ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-redis         ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-sphinx        ClusterIP   xxx.xx.xxx.xxx   <none>        9306/TCP            83m
zync                 ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP            83m
zync-database        ClusterIP   xxx.xx.xxx.xxx   <none>        5432/TCP            83m
➜  oc get routes

NAME                         HOST/PORT                                                             PATH         SERVICES             PORT      TERMINATION     WILDCARD
backend                      backend-3scale.apps.[YOUR-DOMAIN].openshiftapps.com                               backend-listener     http      edge/Allow      None
system-admin                 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                                 system-app           3000      edge/Allow      None
system-developer             3scale.apps.[YOUR-DOMAIN].openshiftapps.com                          /developer   system-app           3001      edge/Allow      None
system-master                master.apps.[YOUR-DOMAIN].openshiftapps.com                                       system-app           3002      edge/Allow      None
system-provider              3scale.apps.[YOUR-DOMAIN].openshiftapps.com                                       system-app           3000      edge/Allow      None
zync-3scale-api-hhhjs        api-3scale-apicast-production.apps.[YOUR-DOMAIN].openshiftapps.com                apicast-production   gateway   edge/Redirect   None
zync-3scale-api-phh9n        api-3scale-apicast-staging.apps.[YOUR-DOMAIN].openshiftapps.com                   apicast-staging      gateway   edge/Redirect   None
zync-3scale-master-nhhht     HostAlreadyClaimed                                                                 system-master        http      edge/Redirect   None
zync-3scale-provider-q9hh9   HostAlreadyClaimed                                                                 system-developer     http      edge/Redirect   None
zync-3scale-provider-shh6z   HostAlreadyClaimed                                                                 system-provider      http      edge/Redirect   None

Since the containers were starting up, it should have been only a matter of minutes before we could access the admin portal.

We watched the pod status, and once system-app showed 3/3 Ready, we tried accessing the admin portal at:

https://3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com

But it was still unavailable.

Challenge 6: Missing Service for App Components

Even after all pods were running, the admin portal was not accessible. The issue was that we created routes pointing to a service named "system-app" which didn't exist:

oc get routes

NAME                         HOST/PORT                                                  PATH        SERVICES      PORT    TERMINATION  WILDCARD
system-admin                 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                     system-app    3000    edge/Allow   None
system-developer             3scale.apps.[YOUR-DOMAIN].openshiftapps.com              /developer   system-app    3001    edge/Allow   None
system-master                master.apps.[YOUR-DOMAIN].openshiftapps.com                           system-app    3002    edge/Allow   None
system-provider              3scale.apps.[YOUR-DOMAIN].openshiftapps.com                           system-app    3000    edge/Allow   None
oc describe service system-app

Error from server (NotFound): services "system-app" not found

We fixed this by creating the missing service:

oc create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: system-app
  namespace: [NAMESPACE]
  labels:
    app: 3scale-api-management
spec:
  ports:
  - name: provider
    port: 3000
    protocol: TCP
    targetPort: 3000
  - name: developer
    port: 3001
    protocol: TCP
    targetPort: 3001
  - name: master
    port: 3002
    protocol: TCP
    targetPort: 3002
  selector:
    deploymentConfig: system-app
  type: ClusterIP
EOF
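With the Service created, the quickest sanity check is whether it actually selects the running pods: an empty endpoints list means the deploymentConfig=system-app selector matched nothing. A hedged sketch:

```shell
#!/bin/sh
# Confirm the new system-app Service has endpoints, i.e. its selector
# (deploymentConfig=system-app) matches at least one running pod.
if command -v oc >/dev/null 2>&1; then
  oc get endpoints system-app
  oc get pods -l deploymentConfig=system-app -o name
else
  echo "would run: oc get endpoints system-app"
fi
```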

Final Result

After working through all these challenges, we finally had a fully operational 3scale deployment:

oc get pods

NAME                        READY   STATUS      RESTARTS      AGE
apicast-production-4-7hh00  1/1     Running     0             2m12s
apicast-staging-1-6hh00     1/1     Running     0             83m
backend-cron-2-6955a        1/1     Running     0             23m
backend-listener-1-5hh00    1/1     Running     0             26m
backend-redis-1-mhh005      1/1     Running     0             57m
backend-worker-2-lr8gb      1/1     Running     0             23m
system-app-3-7ln8g          3/3     Running     0             85s
system-memcache-1-xddig     1/1     Running     0             80m
system-mysql-1-ee4wt        1/1     Running     0             80m
system-redis-1-45hh0        1/1     Running     0             80m
zync-1-l7ghy                1/1     Running     0             80m
zync-database-1-dt3l9       1/1     Running     0             80m
zync-que-1-wwri9            1/1     Running     2 (80m ago)   80m

With all components running, we were finally able to access the 3scale admin portal and begin configuring our APIs.

3Scale Api Management On Rosa Intro

Verify Deployment

# Check all pods are running
oc get pods

# Expected output should show all pods in Running or Completed state:
# - system-app-X-XXXXX (3/3 Running)
# - apicast-production-X-XXXXX (1/1 Running)
# - apicast-staging-X-XXXXX (1/1 Running)
# - system-sidekiq-X-XXXXX (1/1 Running)
# - system-sphinx-X-XXXXX (1/1 Running)
# - backend-* pods (1/1 Running)
# - system-mysql-X-XXXXX (1/1 Running)
# - system-redis-X-XXXXX (1/1 Running)
# - zync-* pods (1/1 Running)
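Rather than eyeballing the pod list, you can block until everything is Ready with `oc wait`. A hedged sketch; the field selector (which needs a reasonably recent oc/kubectl) excludes completed hook and deploy pods, and the timeout is an assumption:

```shell
#!/bin/sh
# Wait until every pod in the current namespace reports Ready.
# Completed (Succeeded) pods, like deploy hooks, are filtered out.
if command -v oc >/dev/null 2>&1; then
  oc wait pod --all --for=condition=Ready \
    --field-selector=status.phase!=Succeeded --timeout=300s
else
  echo "would run: oc wait pod --all --for=condition=Ready --timeout=300s"
fi
```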

Key Lessons Learned

  1. DNS Resolution is Critical: Ensure your OpenShift cluster can resolve external registry hostnames before attempting deployments that rely on them.

  2. Architecture Awareness: When working with enterprise container images on ARM-based development machines, be explicit about architecture requirements using the --platform flag.

  3. Manual Image Mirroring: In restricted environments, manually pulling and pushing images to an internal registry is a viable workaround.

  4. ImageStream Mechanics: Understanding how OpenShift's ImageStreams work is essential for troubleshooting deployment issues.

  5. Network Policies: In enterprise environments, network policies may restrict access to external registries, requiring coordination with network administrators.

Common Issues and Solutions

  • Pod stuck in Pending: Usually resource constraints - reduce resource requests
  • PVC mounting issues: Check storage class and access modes
  • Routes not working: Ensure services exist and selectors match pods
  • Application not available: Create missing system-app service
  • Database connection issues: Check that system-mysql pod is running and accessible

Final Verification Checklist

  • All pods are in Running state (except completed deployment/hook pods)
  • Routes are accessible and don't show "Application not available"
  • Can log into Admin Portal successfully
  • Can access Developer Portal
  • Master Portal is accessible (if needed)
  • Default passwords have been changed

Once the 3scale portals are accessible, we can begin migrating the existing 3scale components from another environment, or start adding new components in 3scale.

A. Backend APIs

  • API definitions and configurations
  • Authentication settings
  • Rate limiting rules

B. Products (API Products)

  • Product configurations
  • Application plans
  • Pricing rules
  • Methods and metrics

C. Applications

  • Application keys and secrets
  • Application plans assignments
  • Usage statistics (if needed)

D. Accounts and Users

  • Developer accounts
  • Admin users
  • Access permissions

E. Policies

  • Custom policies
  • Policy chains
  • Configuration settings

F. Developer Portal

  • Custom pages and templates
  • Documentation
  • CMS content

Conclusion

Deploying complex solutions like 3scale API Management in restricted network environments or across architecture boundaries presents unique challenges. By understanding the underlying issues and implementing a systematic approach to manually mirror images, we were able to overcome these obstacles.

While this process requires more manual effort than a standard deployment, it demonstrates the flexibility of OpenShift's container management capabilities and provides a path forward for deployments in environments with similar restrictions.
