Building an MCP Control Plane Inside Your Agent Architecture

Most AI agent demos are impressive for about five minutes.

Then someone asks: Who authorized that tool call? What data did it access? Can we audit that?

The answer is usually silence.

That’s the gap between a demo and a production system. And the thing that closes it isn’t a better prompt — it’s a control plane inserted between your agent and every MCP tool it can reach.

Here’s how to build it.


The Core Idea

The agent has one address: the control plane. That’s it.

It doesn’t know where MCP servers live. It doesn’t hold credentials. It can’t route around policy. The control plane is the only entity in the system with network access to the MCP layer — and the agent talks to nothing else.

Agent  →  Control Plane  →  MCP Server A (market data)
              ↑          →  MCP Server B (trade execution)
         Tool Registry   →  MCP Server C (research/docs)

The control plane is a router. The agent calls two endpoints — /tools at startup, /invoke at runtime — and has no idea how many MCP servers sit behind them. That topology is entirely the control plane’s concern.

Four pillars enforce this. All code. No prompt engineering.


1. Discovery: The Registry Layer

This is where most teams get it wrong. They hard-code tool definitions into the agent’s system prompt and call it done. The agent sees everything. That’s not access control — it’s a flat namespace with a label on it.

The correct model: the agent discovers tools through the control plane, not from MCP servers directly.

On startup, the agent calls /tools. The control plane queries a Service Catalog — a DynamoDB table or a GitOps-managed JSON file — and returns only the tool names that agent’s role is permitted to see. Crucially, the response contains names only. No server URLs. No routing information. The agent gets a menu, not a map.

That filtered list is what populates the agent’s tool definitions.

A “Research Agent” is never told that “Execute Trade” exists. It can’t attempt to call it. It can’t be jailbroken into calling it. The tool is simply absent from its world.

That’s least privilege at discovery time — before a single request is ever made.
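Turning that filtered list into the agent's tool definitions is mechanical. Here's a sketch, assuming an OpenAI-style function-calling schema and a hypothetical local `TOOL_SCHEMAS` lookup for descriptions (neither is part of the control plane itself):

```python
# Hypothetical local schema details for tools a role might receive.
TOOL_SCHEMAS = {
    "read_positions": "Read current portfolio positions",
    "run_report": "Generate a research report",
}

def build_tool_definitions(discovered: list[str]) -> list[dict]:
    """Build LLM tool definitions from the names the control plane returned.

    Tools absent from `discovered` are never defined, so the model
    cannot even attempt to call them.
    """
    return [
        {"type": "function",
         "function": {"name": name,
                      "description": TOOL_SCHEMAS.get(name, ""),
                      "parameters": {"type": "object", "properties": {}}}}
        for name in discovered
    ]
```

A "Research Agent" whose discovery response omits "execute_trade" simply produces no definition for it, which is the point: absence, not refusal.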


2. Identity: Scoped, Short-Lived Credentials

Agents should never own long-lived credentials. If they do, a compromised agent is a compromised system.

The pattern: agents operate on Service Tokens or OIDC delegation. When an agent wants to invoke a tool, the control plane validates its Service Identity first. If valid, it mints a short-lived, scoped token from your IAM provider — AWS STS or HashiCorp Vault — specific to that one tool call, scoped to the specific MCP server being called.

The token expires. The blast radius of a compromised agent is bounded to a single, already-expiring credential against one server.

This is standard infrastructure security. Apply it to agents exactly the same way.
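To make the shape of that flow concrete without a real IAM backend, here is a minimal in-memory sketch. `mint_scoped_token` and `verify_scoped_token` are hypothetical stand-ins for an STS AssumeRole or Vault token call; the signing key would come from a secrets manager, not a constant:

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"control-plane-secret"  # in production: fetched from KMS/Vault

def mint_scoped_token(agent_id: str, action: str, mcp_url: str,
                      ttl_seconds: int = 60) -> str:
    # Claims bind the token to one agent, one action, one server, one window.
    claims = {"sub": agent_id, "act": action, "aud": mcp_url,
              "exp": time.time() + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_scoped_token(token: str, action: str, mcp_url: str) -> bool:
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered
    claims = json.loads(base64.urlsafe_b64decode(body))
    # Scope check: right action, right server, not expired.
    return (claims["act"] == action and claims["aud"] == mcp_url
            and claims["exp"] > time.time())
```

A token minted for read_positions against the market-data server fails verification everywhere else, and fails everywhere once the TTL lapses. That is the bounded blast radius in practice.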


3. Policy: Hard Guardrails in Code

This is the most important pillar — and the one most teams skip.

The instinct is to put guardrails in the prompt. “Never move funds in production.” That’s not a control. That’s a suggestion. LLMs hallucinate. They get jailbroken. Prompts drift.

Move the logic into code. Use Open Policy Agent (OPA) and write Rego policies that the control plane evaluates before any request touches an MCP server:

default allow = false

allow {
  input.action == "read_positions"
  input.agent_role == "analyst"
  not deny
}

deny {
  input.action == "move_funds"
  input.environment == "production"
}

If the policy evaluates to false, the request is dropped. Not logged. Not flagged for review. Dropped.

The LLM doesn’t get a vote on this. It never reaches the tool.
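If you don't want to run OPA as a sidecar yet, the same semantics, default deny with deny rules that override any allow, can be sketched in plain Python. This is a hypothetical stand-in for the OPA call, not the OPA API:

```python
ALLOW_RULES = [
    # (action, agent_role) pairs that are explicitly permitted
    ("read_positions", "analyst"),
    ("read_positions", "trader"),
    ("execute_trade", "trader"),
]

DENY_RULES = [
    # Hard blocks that win over any allow rule
    lambda inp: inp["action"] == "move_funds"
                and inp["environment"] == "production",
]

def evaluate(inp: dict) -> bool:
    # Deny overrides: any matching deny rule kills the request outright.
    if any(rule(inp) for rule in DENY_RULES):
        return False
    # Default deny: only explicitly allowed (action, role) pairs pass.
    return (inp["action"], inp["agent_role"]) in ALLOW_RULES
```

The important property is structural: anything not explicitly allowed is denied, and a deny rule cannot be argued with by the model, because the model never sees it.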


4. Observability: The Audit Wrapper

Every interaction gets wrapped in a logging decorator. Structured JSON. Four fields that matter:

  • Who — Agent ID
  • What — MCP tool invoked
  • Where — Data source targeted
  • Was — Result or failure status

Ship those logs to Kafka or an ELK stack. More importantly, propagate a trace_id from the initial user prompt all the way through the control plane to the MCP server and back. Every hop in the chain carries the same ID.

When a compliance audit lands on your desk, that trace ID lets you reconstruct the entire chain of thought and data access path for any agent, any session, any point in time.

That’s not a nice-to-have. In regulated environments, it’s the requirement.
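A minimal version of that decorator might look like the following. The field names follow the four-field scheme above; the wrapped function's signature is an assumption for illustration:

```python
import functools
import json
import logging
import uuid

logger = logging.getLogger("audit")
logging.basicConfig(level=logging.INFO)

def audited(fn):
    """Wrap a tool invocation in a structured who/what/where/was log entry."""
    @functools.wraps(fn)
    def wrapper(agent_id: str, action: str, data_source: str, *args, **kwargs):
        # Reuse an inbound trace_id if one was propagated; otherwise start one.
        trace_id = kwargs.pop("trace_id", None) or uuid.uuid4().hex
        entry = {"trace_id": trace_id, "who": agent_id,
                 "what": action, "where": data_source}
        try:
            result = fn(agent_id, action, data_source, *args, **kwargs)
            entry["was"] = "success"
            return result
        except Exception:
            entry["was"] = "failure"
            raise
        finally:
            # One JSON line per invocation, same trace_id on every hop
            logger.info(json.dumps(entry))
    return wrapper

@audited
def call_tool(agent_id, action, data_source):
    return {"status": "ok"}
```

Because the log line is emitted in a `finally` block, failures are recorded with the same trace_id as successes, so the audit trail has no gaps.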


What It Looks Like in Code

Here’s a minimal Python control plane. Two endpoints: one for discovery, one for invocation. The agent calls both. It calls nothing else.

The tool registry maps each role’s actions directly to the MCP server that handles them. The agent never sees this mapping — it only sees the tool names:

import uuid
import logging
import requests
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
logger = logging.getLogger("control_plane")
logging.basicConfig(level=logging.INFO)

# --- Tool registry: maps role → action → MCP server URL ---
# The agent sees tool names only. Server URLs stay here.

TOOL_REGISTRY = {
    "analyst": {
        "read_positions": "http://mcp-marketdata:8001",
        "run_report":     "http://mcp-research:8003",
    },
    "trader": {
        "read_positions": "http://mcp-marketdata:8001",
        "execute_trade":  "http://mcp-execution:8002",
    },
}

POLICY_RULES = {
    ("execute_trade", "production"): False,  # hard block
}

def get_scoped_token(agent_id: str, action: str, mcp_url: str) -> str:
    # In production: call Vault or AWS STS, scoped to this server
    return f"tok_{agent_id}_{action}_{uuid.uuid4().hex[:8]}"

# --- Request models ---

class DiscoveryRequest(BaseModel):
    agent_id:   str
    agent_role: str

class ToolRequest(BaseModel):
    agent_id:    str
    agent_role:  str
    action:      str
    environment: str
    payload:     dict = {}

# --- 1. Discovery endpoint ---
# Returns tool names only. No URLs. No routing info.
# The agent builds its tool definitions from this list.

@app.post("/tools")
def get_tools(req: DiscoveryRequest):
    role_tools = TOOL_REGISTRY.get(req.agent_role, {})
    tool_names = list(role_tools.keys())
    logger.info({"agent_id": req.agent_id, "role": req.agent_role,
                 "tools_returned": tool_names})
    return {"tools": tool_names}

# --- 2. Invocation endpoint ---
# Resolves the MCP server, enforces policy, mints a token,
# and calls the MCP server. The agent has no direct route to any of this.

@app.post("/invoke")
def invoke_tool(req: ToolRequest):
    trace_id = uuid.uuid4().hex

    role_tools = TOOL_REGISTRY.get(req.agent_role, {})

    # Discovery check — is this tool in this agent's allowed list?
    if req.action not in role_tools:
        logger.info({"trace_id": trace_id, "agent_id": req.agent_id,
                     "action": req.action, "result": "denied_discovery"})
        raise HTTPException(status_code=403, detail="Tool not available for this role")

    # Policy check — hard rules, no exceptions
    if POLICY_RULES.get((req.action, req.environment)) is False:
        logger.info({"trace_id": trace_id, "agent_id": req.agent_id,
                     "action": req.action, "result": "denied_policy"})
        raise HTTPException(status_code=403, detail="Action blocked by policy")

    # Resolve the MCP server for this action
    mcp_url = role_tools[req.action]

    # Mint a short-lived token scoped to this server and action
    token = get_scoped_token(req.agent_id, req.action, mcp_url)

    # Control plane calls the MCP server — not the agent
    # Control plane calls the MCP server — not the agent
    try:
        response = requests.post(
            f"{mcp_url}/invoke",
            json={"action": req.action, "payload": req.payload},
            headers={"Authorization": f"Bearer {token}",
                     "X-Trace-Id": trace_id},
            timeout=10,  # fail fast instead of hanging the agent
        )
        response.raise_for_status()
    except requests.RequestException:
        raise HTTPException(status_code=502, detail="MCP server unavailable")
    result = response.json()

    # Structured audit log — includes which MCP server was called
    logger.info({"trace_id": trace_id, "agent_id": req.agent_id,
                 "action": req.action, "mcp_server": mcp_url,
                 "environment": req.environment,
                 "result": "allowed", "status": result["status"]})

    return {"trace_id": trace_id, "result": result}

The agent’s startup sequence is a single request:

# Agent startup — discover available tools through the control plane
response = requests.post("http://control-plane/tools",
                         json={"agent_id": "agent-42", "agent_role": "analyst"})
available_tools = response.json()["tools"]
# → ["read_positions", "run_report"]
# No URLs. No server topology. Just the tool names it's allowed to use.

At runtime, every tool call goes to /invoke. The control plane resolves the server, mints the token, and makes the call. The agent is completely isolated from the MCP layer.


What the Router Gives You

Routing through the control plane isn’t just indirection for its own sake. It unlocks capabilities that a direct agent-to-MCP connection can never have:

  • Fan-out: A single action can call multiple MCP servers and aggregate results — useful for cross-system reads where the agent shouldn’t care about the seams
  • Versioning: Route read_positions_v2 to a new MCP server without the agent knowing anything changed. Zero agent-side deployment
  • Circuit breaking: If mcp-execution is down, the control plane fails fast and returns a clean error rather than hanging the agent mid-task
  • Per-server token scoping: Each MCP server gets a credential scoped specifically to it — not a shared token that opens the entire layer
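The circuit-breaking bullet is worth sketching, since it's pure control-plane logic. This is a hypothetical minimal breaker, not part of any MCP SDK; the thresholds are arbitrary:

```python
import time

class CircuitBreaker:
    """Fail fast when an MCP server keeps erroring, instead of hanging the agent."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # set when the breaker trips

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: return a clean error immediately, no network call
                raise RuntimeError("circuit open: MCP server unavailable")
            # Half-open: allow one probe request through
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

The control plane would keep one breaker per MCP server, so an outage on mcp-execution degrades only trade actions while market-data reads keep flowing.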

The Stack

If you’re building this today, the control plane looks like this:

  • Language: Python or Go
  • Protocol: FastMCP or the official MCP SDK
  • Policy engine: Open Policy Agent
  • Cache: Redis for tool schemas and server topology
  • Audit store: PostgreSQL for the trail

None of this is exotic. It’s the same infrastructure patterns that have governed data pipelines and API gateways for years. The only thing new is the surface being controlled.

The full working source code for this control plane is available at github.com/datris/sample-code.


The Takeaway

The LLM is not the system. The LLM is a component of the system.

An MCP control plane is what enforces that distinction. The agent discovers tools through it, invokes tools through it, and never holds a credential or a server address. Add as many MCP servers as your architecture demands — the agent doesn’t care, because it was never exposed to the topology in the first place.

Discovery, identity, policy, and observability — implemented as code, not instructions — is what takes an agent from a demo that works most of the time to infrastructure you can actually run in production.

Build the control plane. Then let the agents loose.