Scaling Instances: RunMCP, OpenAPI, API-first
Scaling RunMCP is essential for production workloads. Whether you're serving thousands of users or running mission-critical APIs, follow these patterns for reliability and performance.
Scaling Strategies
- Horizontal scaling: Add more gateway instances behind a load balancer
- Stateless design: Keep state such as config and sessions outside the gateway so that any instance can serve any request
- Health checks: Use readiness and liveness probes in orchestrators
- Autoscaling: Use Kubernetes HPA or cloud-native scaling solutions
Tip: Monitor gateway CPU, memory, and response times. Scale out before bottlenecks impact users.
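The health-check strategy above can be sketched as a pair of Kubernetes probes on the gateway container. This is a minimal sketch, not RunMCP's documented configuration: the image name, port 8080, and the /healthz and /readyz paths are assumptions and should be replaced with whatever your gateway actually exposes.

```yaml
# Hypothetical container spec fragment for a runmcp-gateway Deployment.
# Image, port, and probe paths are placeholders (assumptions).
containers:
  - name: runmcp-gateway
    image: example.com/runmcp-gateway:latest
    ports:
      - containerPort: 8080
    # Liveness: the kubelet restarts the container if this check keeps failing.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    # Readiness: the pod is removed from Service endpoints while this fails.
    readinessProbe:
      httpGet:
        path: /readyz
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```

Keeping the two probes separate lets a busy instance stop receiving traffic without being restarted.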
Example: Load Balancer Setup
load_balancer:
  type: round_robin
  backends:
    - runmcp-1.internal
    - runmcp-2.internal
    - runmcp-3.internal
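If the gateway runs on Kubernetes, a Service can play the load balancer role instead of a hand-maintained backend list. The following is a sketch under assumptions: the pod label app: runmcp-gateway and container port 8080 are hypothetical and must match your own Deployment.

```yaml
# Exposes all gateway pods matching the selector behind one address.
apiVersion: v1
kind: Service
metadata:
  name: runmcp-gateway
spec:
  type: LoadBalancer        # asks the cloud provider for an external load balancer
  selector:
    app: runmcp-gateway     # assumed pod label
  ports:
    - port: 80              # port clients connect to
      targetPort: 8080      # assumed container port
```

With this approach, new replicas are added to the rotation automatically once they pass their readiness checks.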
Example: Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runmcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runmcp-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
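Utilization targets are calculated against the CPU requests of the containers, so the target Deployment needs resource requests set for this autoscaler to work. As an optional extension, the spec can also scale on memory and slow down scale-in. The fragment below is a sketch: the 80% memory target and the 300-second window are illustrative values, not recommendations from RunMCP.

```yaml
# Additional fields for the HorizontalPodAutoscaler spec (autoscaling/v2).
spec:
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # illustrative threshold
  behavior:
    scaleDown:
      # Wait for 5 minutes of consistently lower load before removing pods.
      stabilizationWindowSeconds: 300
```

When several metrics are listed, the autoscaler uses whichever one calls for the highest replica count.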
Caution: Test scaling and failover in staging environments before going live.
Best Practices
- Use stateless gateway design for easy scaling
- Automate scaling and failover with orchestrators
- Monitor and alert on key metrics