Scaling Instances: RunMCP, OpenAPI, API-first

Scaling RunMCP is essential for production workloads. Whether you’re serving thousands of users or running mission-critical APIs, follow these patterns for reliability and performance.

Scaling Strategies

  • Horizontal scaling: Add more gateway instances behind a load balancer
  • Stateless design: Keep state such as config and sessions outside the gateway, so any instance can serve any request
  • Health checks: Use readiness and liveness probes in orchestrators
  • Autoscaling: Use Kubernetes HPA or cloud-native scaling solutions

Tip: Monitor gateway CPU, memory, and response times. Scale out before bottlenecks impact users.
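
The health-check bullet above can be sketched as a Kubernetes Deployment. This is a minimal illustration, not RunMCP's documented manifest: the image name, the port `8080`, the `/readyz` and `/healthz` paths, the `app: runmcp-gateway` label, and the resource figures are all assumptions to replace with your own values. Only the Deployment name `runmcp-gateway` comes from this page.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: runmcp-gateway
spec:
  # replicas is omitted so that an autoscaler can own the replica count
  selector:
    matchLabels:
      app: runmcp-gateway            # assumed label
  template:
    metadata:
      labels:
        app: runmcp-gateway
    spec:
      containers:
        - name: gateway
          image: registry.example.com/runmcp-gateway:1.0   # hypothetical image
          ports:
            - containerPort: 8080    # assumed port
          readinessProbe:            # gates traffic until the pod can serve
            httpGet:
              path: /readyz          # assumed endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:             # restarts the container if it hangs
            httpGet:
              path: /healthz         # assumed endpoint
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          resources:
            requests:
              cpu: 250m              # illustrative; CPU utilization targets are measured against this
              memory: 256Mi
```

Setting CPU requests matters for autoscaling: a Kubernetes HPA with a `Utilization` target computes usage as a percentage of the container's request, so pods without requests cannot be scaled on that metric.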

Example: Load Balancer Setup

load_balancer:
  type: round_robin
  backends:
    - runmcp-1.internal
    - runmcp-2.internal
    - runmcp-3.internal
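
On Kubernetes, the same role is usually filled by a Service rather than a hand-maintained backend list. The sketch below assumes the gateway pods carry a hypothetical `app: runmcp-gateway` label and listen on port `8080`; neither value is taken from RunMCP itself.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: runmcp-gateway
spec:
  type: LoadBalancer          # asks the cloud provider for an external load balancer
  selector:
    app: runmcp-gateway       # assumed pod label
  ports:
    - name: http
      port: 80                # port exposed by the load balancer
      targetPort: 8080        # assumed container port
```

Because the Service selects pods by label, instances added or removed by scaling join and leave the pool without any configuration change.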

Example: Kubernetes Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runmcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runmcp-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Caution: Test scaling and failover in staging environments before going live.
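
The autoscaler above reacts to CPU only. Since this page also recommends watching memory, one possible variant adds a memory target and slows scale-down to avoid flapping. The 80% memory threshold, the 300-second window, and the one-pod-per-minute policy are illustrative assumptions, not tuned values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runmcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runmcp-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80     # illustrative threshold
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low load before shrinking
      policies:
        - type: Pods
          value: 1                     # remove at most one pod
          periodSeconds: 60            # per minute
```

When several metrics are listed, the HPA computes a desired replica count for each and uses the largest, so adding the memory target can only make scaling more conservative about removing capacity.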

Best Practices

  • Use a stateless gateway design so instances can be added or removed freely
  • Automate scaling and failover with orchestrators
  • Monitor and alert on key metrics

Further Reading