The Model Context Protocol (MCP) is rapidly becoming the standard for connecting AI agents to external data sources. We've built and maintained the Prometheus MCP Server — now used by hundreds of teams — and along the way we've learned what it takes to build MCP servers that actually work in production.
## Why MCP Matters
Most AI integrations today are brittle. They rely on custom API wrappers, hardcoded prompts, and fragile parsing logic. MCP changes this by providing a standardized protocol that any AI model can use to discover and interact with external tools and data sources.
Think of it as USB for AI — a universal interface that lets any model talk to any data source.
## The Architecture That Works
After iterating through several approaches, we settled on an architecture pattern that balances simplicity with production readiness:
1. **Thin transport layer** — handle stdio or HTTP/SSE transport without coupling it to business logic.
2. **Tool definitions as schemas** — each tool is a well-defined JSON Schema that the AI model can reason about.
3. **Async everything** — all data fetching is async, with proper timeout handling.
4. **Structured error responses** — AI models need clear error messages to recover gracefully.
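To make point 2 concrete, here is a sketch of what a tool definition might look like. The tool name, parameters, and field layout are illustrative — they follow the general shape of an MCP tool listing, not the exact wire format of our server:

```python
import json

# Illustrative tool definition: a JSON Schema the model can reason about.
# Names and fields are a sketch, not the exact Prometheus MCP Server schema.
RANGE_QUERY_TOOL = {
    "name": "prometheus_range_query",
    "description": (
        "Run a PromQL range query. Returns a list of time series, each with "
        "metric labels and [timestamp, value] pairs."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "PromQL expression, e.g. rate(http_requests_total[5m])",
            },
            "start": {"type": "string", "description": "RFC 3339 start time"},
            "end": {"type": "string", "description": "RFC 3339 end time"},
            "step": {"type": "string", "description": "Resolution step, e.g. '30s'"},
        },
        "required": ["query", "start", "end", "step"],
    },
}

print(json.dumps(RANGE_QUERY_TOOL, indent=2))
```

Because the definition is plain data, it can be validated, versioned, and unit-tested independently of the transport layer.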
## Production Lessons
Lesson 1: Timeouts are non-negotiable. Your MCP server will be called by AI agents that have their own timeout budgets. If your Prometheus query takes 30 seconds, the agent session is dead.
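A minimal sketch of this timeout discipline using `asyncio` — the fetch function, the 10-second budget, and the response shape are all assumptions for illustration:

```python
import asyncio

QUERY_TIMEOUT_S = 10  # assumed budget; keep it well under the agent's own timeout

async def fetch_metrics(query: str) -> str:
    # Stand-in for the real Prometheus HTTP call.
    await asyncio.sleep(0.01)
    return f"result for {query}"

async def run_tool(query: str) -> dict:
    try:
        data = await asyncio.wait_for(fetch_metrics(query), timeout=QUERY_TIMEOUT_S)
        return {"ok": True, "data": data}
    except asyncio.TimeoutError:
        # Return a structured error instead of letting the agent session hang.
        return {"ok": False, "error": f"query exceeded {QUERY_TIMEOUT_S}s timeout"}

print(asyncio.run(run_tool("up")))
```

The key design choice is that a timeout produces a structured error response rather than an exception that kills the session — the agent can then retry with a narrower query.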
Lesson 2: Schema descriptions matter more than you think. The AI model reads your tool descriptions to decide how to use them. Vague descriptions lead to misuse. Be precise about what each parameter does and what the response contains.
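As a hypothetical before-and-after, here are two descriptions for the same parameter. The model only sees this text, so the second version is much harder to misuse:

```python
# Two versions of the same parameter schema (illustrative names).
vague = {"step": {"type": "string", "description": "The step"}}

precise = {
    "step": {
        "type": "string",
        "description": (
            "Query resolution as a Prometheus duration string, e.g. '30s' or '5m'. "
            "Smaller steps return more data points; must be greater than zero."
        ),
    }
}

print(precise["step"]["description"])
```

A good test: could a model that has never seen your system call the tool correctly from the description alone?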
Lesson 3: Monitor your MCP server like any production service. Ironic for a monitoring tool, but we instrument our MCP server with the same rigor we apply to any production system. Request latency, error rates, and tool invocation patterns tell you when something is going wrong before your users notice.
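A sketch of per-tool instrumentation, with in-memory counters standing in for a real metrics backend (in production you would export these, e.g. via a Prometheus client library — the wrapper and metric names here are assumptions):

```python
import time
from collections import defaultdict

# In-memory counters standing in for real exported metrics.
metrics = {
    "calls": defaultdict(int),
    "errors": defaultdict(int),
    "latency_s": defaultdict(float),
}

def instrumented(tool_name: str):
    """Record call count, error count, and cumulative latency per tool."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics["errors"][tool_name] += 1
                raise
            finally:
                metrics["calls"][tool_name] += 1
                metrics["latency_s"][tool_name] += time.perf_counter() - start
        return inner
    return wrap

@instrumented("range_query")
def range_query(expr: str) -> str:
    return f"series for {expr}"

range_query("up")
print(metrics["calls"]["range_query"])
```

Tracking invocation patterns per tool also tells you which tools agents actually use — and which descriptions are leading them astray.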
## What's Next
We're working on MCP servers for more enterprise data sources and building tooling to make MCP server development faster. If you're building AI agents that need to talk to your infrastructure, reach out — we've done this before.