Scaling Instances: RunMCP, OpenAPI, API-first

Scaling RunMCP is essential for production workloads. Whether you’re serving thousands of users or running mission-critical APIs, follow these patterns for reliability and performance.

Scaling Strategies

Horizontal scaling: Add more gateway instances behind a load balancer
Stateless design: Keep gateway state (e.g., config, sessions) in an external store so that any instance can serve any request
Health checks: Use readiness and liveness probes in orchestrators
Autoscaling: Use Kubernetes HPA or cloud-native scaling solutions
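
As an illustration of the health-check item, here is a minimal sketch of readiness and liveness probes on a Kubernetes Deployment. The image name, port 8080, the `app` label, and the `/readyz` and `/healthz` paths are placeholder assumptions, not documented RunMCP values; substitute whatever endpoints your gateway actually exposes.

```yaml
# Sketch only: image, port, label, and probe paths are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: runmcp-gateway
spec:
  selector:
    matchLabels:
      app: runmcp-gateway
  template:
    metadata:
      labels:
        app: runmcp-gateway
    spec:
      containers:
        - name: gateway
          image: example.com/runmcp-gateway:latest  # placeholder image
          ports:
            - containerPort: 8080                   # assumed port
          readinessProbe:     # withhold traffic until the instance can serve requests
            httpGet:
              path: /readyz   # hypothetical endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:      # restart the container if it stops responding
            httpGet:
              path: /healthz  # hypothetical endpoint
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
```

Keeping the two probes separate lets the orchestrator take a slow instance out of rotation without restarting it.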

Tip: Monitor gateway CPU, memory, and response times. Scale out before bottlenecks impact users.

Example: Load Balancer Setup

# Illustrative config: send requests to three gateway instances in rotation
load_balancer:
  type: round_robin
  backends:
    - runmcp-1.internal
    - runmcp-2.internal
    - runmcp-3.internal
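
If the gateway runs on Kubernetes, a Service can play a comparable role by spreading connections across ready pods without listing backends by hand. The sketch below assumes pods labeled `app: runmcp-gateway` that listen on port 8080; both are hypothetical values chosen for illustration.

```yaml
# Sketch only: the selector label and target port are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: runmcp-gateway
spec:
  type: ClusterIP
  selector:
    app: runmcp-gateway   # assumed pod label
  ports:
    - port: 80            # port exposed inside the cluster
      targetPort: 8080    # assumed container port
```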

Example: Kubernetes Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runmcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runmcp-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
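
A `Utilization` target is measured as a percentage of each container's CPU request, so the target Deployment needs a CPU request set for the autoscaler to compute a value. A minimal fragment of the pod template follows; the container name and the request and limit values are illustrative assumptions, not recommended RunMCP settings.

```yaml
# Fragment of the runmcp-gateway Deployment (under spec.template.spec).
# Container name and resource values are illustrative assumptions.
containers:
  - name: gateway
    resources:
      requests:
        cpu: 250m       # a 70% target then means about 175m average CPU per pod
        memory: 256Mi
      limits:
        memory: 512Mi
```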

Caution: Test scaling and failover in a staging environment before going live.

Best Practices

Use stateless gateway design for easy scaling
Automate scaling and failover with orchestrators
Monitor and alert on key metrics (CPU, memory, response times)
