The MCP observability stack is growing in the wrong direction. Sentry launched MCP server monitoring. Grafana built MCP observability dashboards. Datadog, Rollbar, Prometheus - they all added MCP integrations in early 2026. Every one of them monitors the AI agent from the outside. None of them give the agent a log store it can use.

LogPiper is a different kind of MCP tool. It's not an observability dashboard. It's not a metrics collector. It's a logging service - two HTTP endpoints, a query engine, and an SSE stream - that the AI agent writes to and reads from directly. The agent is both producer and consumer of the data. This paper describes why the "logging service as MCP tool" pattern works differently from "monitoring for MCP," and how LogPiper's architecture makes it possible.

Outside-In vs. Inside-Out

The distinction matters because it determines who benefits from the data.

Outside-in tools (Sentry, Datadog, Grafana, OpenTelemetry): These instrument the MCP server or transport layer. They track tool call counts, latency percentiles, error rates, token throughput. The data flows into dashboards that ops teams read. The AI agent never sees it. An MCP server monitored by Sentry knows its p99 latency. The AI agent calling that server doesn't.

Inside-out tools (LogPiper): The AI agent writes structured log entries. The AI agent queries them. The data lives in a store the agent can access through the same HTTP protocol it already uses. Not a monitoring dashboard for humans. Infrastructure for agents.

This isn't a criticism of outside-in tools. They're solving a real problem. Production MCP deployments need monitoring. You need to know when your tool server is dropping requests, when latency spikes, when error rates climb. Sentry does this well. Datadog does this well.

LogPiper solves a different problem: giving an AI coding agent structured runtime data during development. The agent instruments code with log POSTs, reproduces the failure, queries the structured errors, reads the exact HTTP bodies, and fixes the code. That's a feedback loop. Outside-in monitoring gives you a graph. Inside-out logging gives the agent a conversation with the runtime.

The Architecture

LogPiper is two HTTP endpoints plus a query engine. No SDK, no client library, no protocol beyond HTTP and JSON.

Ingestion

POST /log accepts a single JSON entry and returns {"ok": true}. POST /logs accepts a batch array and returns {"ok": true, "count": N}. Both endpoints are unauthenticated. Any process on localhost can write. The design is fire-and-forget: a 2-second timeout, no retries, and a bare exception catch on the client side. Logging never blocks your application.

curl -X POST http://127.0.0.1:9998/log \
  -H "Content-Type: application/json" \
  -d '{
    "source": "my-app",
    "level": "error",
    "event": "api.timeout",
    "message": "Model inference exceeded 30s limit",
    "data": {"model": "llama-3.2-3b", "durationMs": 31200},
    "correlationId": "job_abc123"
  }'
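
The fire-and-forget client pattern described above (2-second timeout, no retries, bare exception catch) can be sketched in Python. This uses the stdlib urllib rather than requests, and the helper name is illustrative, not part of any LogPiper SDK:

```python
import json
import urllib.request

LOGPIPER_URL = "http://127.0.0.1:9998/log"

def log(entry: dict) -> None:
    """POST one entry to LogPiper; never raise, never block past 2s."""
    req = urllib.request.Request(
        LOGPIPER_URL,
        data=json.dumps(entry).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        urllib.request.urlopen(req, timeout=2)  # 2-second timeout, no retries
    except Exception:
        pass  # fire-and-forget: logging must never break the app

log({"source": "my-app", "level": "error", "event": "api.timeout",
     "message": "Model inference exceeded 30s limit"})
```

If LogPiper isn't running, the call silently does nothing, which is the point: instrumentation should cost nothing when the sink is absent.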

Query

GET /logs returns a JSON array sorted by timestamp. Filters narrow the results:

  • level - minimum severity (debug, info, warn, error)
  • source - origin app or script name
  • event - prefix match (event=http matches http.request, http.response, http.error)
  • correlationId - groups related entries across processes
  • since / until - ISO 8601 time range
  • limit - max entries to return

# Recent errors from any source
curl "http://127.0.0.1:9998/logs?level=error&limit=20"

# All HTTP traffic for one workflow execution
curl "http://127.0.0.1:9998/logs?correlationId=exec_abc12345"

# Engine events only
curl "http://127.0.0.1:9998/logs?event=engine&limit=50"

Stream

GET /logs/stream is an SSE endpoint. Open it with the same filter parameters and matching entries arrive the moment they're ingested. No polling. An AI agent or monitoring script can subscribe to errors in real time.
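
A minimal consumer can be sketched as a parser over the stream text, assuming standard SSE framing (`data:` lines terminated by a blank line) and one JSON log entry per frame; LogPiper's exact framing isn't shown here:

```python
import json

def parse_sse(stream_text: str) -> list:
    """Parse SSE frames into log entries.

    Assumes standard SSE framing: each event is one or more `data:`
    lines followed by a blank line, with a JSON entry as the payload.
    """
    entries = []
    for frame in stream_text.split("\n\n"):
        data_lines = [line[len("data:"):].strip()
                      for line in frame.splitlines()
                      if line.startswith("data:")]
        if data_lines:
            entries.append(json.loads("\n".join(data_lines)))
    return entries

raw = 'data: {"level": "error", "event": "api.timeout"}\n\n'
print(parse_sse(raw))  # [{'level': 'error', 'event': 'api.timeout'}]
```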

Storage

In-memory circular buffer. 5,000 entries. Oldest evicted first. No persistence by default. Export the buffer to JSON on demand with POST /export. This keeps the system fast and prevents disk bloat. LogPiper is a debugging scratchpad, not a time-series database.
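
The eviction policy can be illustrated with a bounded deque. This is a sketch of the behavior, not ToolPiper's implementation:

```python
from collections import deque

# Fixed-capacity buffer: appending past capacity drops the oldest entry.
buffer = deque(maxlen=5000)

for i in range(5001):
    buffer.append({"event": f"tick.{i}"})

print(len(buffer))         # 5000
print(buffer[0]["event"])  # tick.1  (tick.0 was evicted)
```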

The data flow is simple enough to draw:

AI Agent (Claude Code, Cursor, any HTTP client)
    | POST /log
    v
LogPiper (in ToolPiper, port 9998)
    | stored in circular buffer
    v
AI Agent
    | GET /logs?level=error
    v
Structured error data with full HTTP bodies

Why HTTP and Not MCP Protocol

LogPiper's ingestion endpoints are plain HTTP, not MCP tool calls. This is deliberate.

MCP tool calls go through the AI client's tool-use machinery. The client formats the tool invocation as tokens, sends them to the model, the model generates the call, the client executes it. That's token cost, round-trip latency, and rate limits. For a logging endpoint that needs to handle thousands of writes per minute, MCP protocol overhead is unacceptable.

A requests.post() call from instrumented Python code costs no tokens and completes in single-digit milliseconds. A fetch() from a browser extension, a curl from a shell script, a URLSession.data() from Swift - they all hit the same HTTP endpoint with the same latency profile. No model in the loop. No token budget consumed.

The implication: LogPiper works without an MCP connection. It's reachable from any process on localhost via HTTP. The AI agent can also call it directly using curl or fetch without going through MCP tooling. ToolPiper's MCP tools (status, models, chat) are the high-level control plane. LogPiper is low-level data plane infrastructure that any process, MCP-connected or not, can use.

The design principle generalizes: use HTTP for the data plane, MCP for the control plane. High-frequency data (logs, metrics, streaming responses) should flow over direct HTTP. Low-frequency commands ("load this model," "run this test," "take a browser snapshot") are fine as MCP tool calls.

The LogEntry Schema

Each log entry has seven fields. All are optional except that at least one of event, message, or data must be present.

  • timestamp (ISO 8601 string) - When it happened. Auto-set to now if omitted.
  • source (string) - Origin app or script. Your app name goes here. Used for ?source= filtering.
  • level (enum) - debug, info, warn, error. Comparable severity. ?level=warn returns warn + error.
  • event (string) - Dot-prefixed taxonomy. http.request, engine.crash, pipeline.step.3. Prefix-matched on query.
  • message (string) - Human-readable description. Optional. Useful when the AI queries raw text.
  • data (JSON object) - Arbitrary payload. Request bodies, config state, timing, error details. The interesting part.
  • correlationId (string) - Groups related entries. ToolPiper auto-generates these for workflows (exec_abc12345). Your code can pass its own.
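
The minimum-severity filter (?level=warn returning warn and error) reduces to an ordered comparison. A sketch, not LogPiper's source:

```python
LEVELS = ["debug", "info", "warn", "error"]

def at_least(entries, minimum):
    """Return entries at or above the minimum severity,
    mirroring the ?level= query parameter."""
    threshold = LEVELS.index(minimum)
    return [e for e in entries if LEVELS.index(e["level"]) >= threshold]

entries = [{"level": "info"}, {"level": "warn"}, {"level": "error"}]
print(at_least(entries, "warn"))  # [{'level': 'warn'}, {'level': 'error'}]
```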

The event prefix system is the key design choice. event=http matches http.request, http.response, http.error. event=engine matches engine.load, engine.unload, engine.crash, engine.restart. You get hierarchical filtering without a separate tag system, without a label taxonomy, without any schema to define upfront. The event string is free-form. Your apps can use any prefix scheme: myapp.auth.login, myapp.queue.stall, deploy.staging.rollback.
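
A sketch of the prefix match, assuming a bare prefix only matches at a dot boundary (the exact edge-case behavior, e.g. whether event=http would match an event named httpx, isn't specified above):

```python
def matches_prefix(event: str, query: str) -> bool:
    """Hierarchical event matching as described for ?event=.

    Matches the event itself or any dotted descendant of it.
    """
    return event == query or event.startswith(query + ".")

print(matches_prefix("http.request", "http"))  # True
print(matches_prefix("http.error", "http"))    # True
print(matches_prefix("httpx.get", "http"))     # False
```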

HTTP Body Capture

When requests flow through ToolPiper - LLM inference, cloud API proxy, MCP tool calls, model downloads - LogPiper captures full request and response bodies, truncated at 8KB. This is the feature that saves the most debugging time in AI-assisted development.

The AI agent doesn't need to guess what was sent to the model. It queries event=http and reads the exact messages array. It doesn't need to guess why the API returned a 422. The validation error is in the response body. When a cloud API returns a 400 because the tool schema has an unsupported keyword, the rejection message is captured verbatim.

// http.request entry in the buffer
{
  "event": "http.request",
  "data": {
    "url": "/v1/chat/completions",
    "method": "POST",
    "requestBody": "{\"model\":\"llama-3.2-3b\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarize this document\"}]}"
  }
}

// http.error entry for the same request
{
  "event": "http.error",
  "data": {
    "url": "/v1/chat/completions",
    "status": 503,
    "responseBody": "{\"error\":{\"type\":\"model_not_loaded\",\"message\":\"llama-3.2-3b is not loaded. Call /engine/load first.\"}}",
    "durationMs": 12
  }
}

Streaming responses are detected and tagged: isStreaming: true, chunkCount: N. Binary content shows type and size instead of a body. The 8KB truncation handles the 99% case. Full prompt-and-response payloads for LLM calls are almost always under 8KB. Large RAG contexts might be truncated, but the error details never are.
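
The 8KB cap can be sketched as a simple byte-length cut; the truncated flag here is a hypothetical field, since the document doesn't show how LogPiper marks truncated bodies:

```python
MAX_BODY = 8 * 1024  # the 8KB cap described above

def capture_body(body: bytes) -> dict:
    """Store an HTTP body, cut at 8KB. Illustration only."""
    truncated = len(body) > MAX_BODY
    return {
        "body": body[:MAX_BODY].decode("utf-8", "replace"),
        "truncated": truncated,
    }

big = b"x" * 10_000
print(capture_body(big)["truncated"])  # True
print(len(capture_body(big)["body"]))  # 8192
```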

Correlation IDs

ToolPiper auto-generates correlation IDs for workflow executions. Format: exec_abc12345. When a pipeline transcribes audio, summarizes the text, and generates speech, every log entry in the chain - the HTTP request to the STT backend, the inference call to the LLM, the TTS request, and any errors along the way - shares one ID.

Query by correlation ID and you get the complete timeline of a single operation across all components. One query, the whole story. No timestamp correlation. No grepping across log files.

# Everything that happened in workflow exec_7f2a1
curl "http://127.0.0.1:9998/logs?correlationId=exec_7f2a1"

Your own code can pass correlation IDs too. Assign a job ID when a pipeline starts, include it in every log POST, and the multi-step execution becomes a single queryable unit. This is how you trace a request through three services without building distributed tracing infrastructure.
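
Client-side, reassembling that timeline is one filter and one sort, the same operation GET /logs?correlationId= performs server-side. Field names follow the LogEntry schema; the sample data is invented:

```python
def timeline(entries, correlation_id):
    """All entries for one operation, in time order."""
    matched = [e for e in entries if e.get("correlationId") == correlation_id]
    return sorted(matched, key=lambda e: e["timestamp"])

entries = [
    {"timestamp": "2026-03-01T10:00:02Z", "event": "tts.request", "correlationId": "job_42"},
    {"timestamp": "2026-03-01T10:00:00Z", "event": "stt.request", "correlationId": "job_42"},
    {"timestamp": "2026-03-01T10:00:01Z", "event": "llm.request", "correlationId": "job_99"},
]
print([e["event"] for e in timeline(entries, "job_42")])
# ['stt.request', 'tts.request']
```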

What We Didn't Build (and Why)

No persistence to disk (by default). LogPiper is a debugging tool, not an audit log. In-memory keeps it fast and avoids disk bloat. The 5,000-entry circular buffer handles hours of active debugging. If you need a snapshot, POST /export writes the buffer to ~/Library/Application Support/ToolPiper/exports/ as JSON. But persistence isn't the default because we've seen what happens when logging tools grow into log management platforms. They get slow. LogPiper stays fast by not trying to be Elasticsearch.

No authentication on ingestion. Any local process should be able to log without credential management. No session key, no token, no handshake on /log or /logs (query). The stream endpoint and management endpoints (clear, export) require ToolPiper's session key. The trade-off: convenience vs. security. Localhost-only access makes this acceptable for a development tool. If you're running ToolPiper on a machine where untrusted processes might inject fake log entries, you have bigger problems than logging.

No MCP-specific ingestion format. Log entries are generic JSON. They don't know about MCP tools, tool schemas, or MCP sessions. They don't carry MCP-specific metadata. This keeps the system usable by non-MCP processes - shell scripts, CI runners, browser extensions, native apps. MCP tool calls through ToolPiper are logged automatically as HTTP events, so you get MCP visibility anyway, through the HTTP body capture layer.

No UI. The interface is HTTP. Build your own viewer if you want one. Pipe the SSE stream into jq. Write a React dashboard. Use a VS Code extension that renders JSON tables. We'd rather ship a solid API than a mediocre dashboard, and we'd rather spend the engineering time on the query engine than on a log viewer that competes with tools developers already use.

The Bidirectional Pattern

Most developer tools are unidirectional. You configure them, and they produce output you read. Sentry captures errors you view in a dashboard. Datadog collects metrics you graph. The data flows one way: from the system to the human.

LogPiper is bidirectional. The agent writes data AND reads it back. The agent is the producer and the consumer. This creates a feedback loop that unidirectional tools can't replicate.

In practice, it looks like this:

  1. The AI agent writes code with log instrumentation (POST /log calls at key points)
  2. The code runs and fails
  3. The agent queries the log store (GET /logs?level=error)
  4. The agent reads the exact error, including the HTTP response body that explains the failure
  5. The agent fixes the code with real data, not a guess
  6. Repeat until clean

The critical step is 3 to 4. The agent's fix quality depends entirely on the data quality at the query step. Terminal output gives the agent a fragment. LogPiper gives it the full picture: structured fields, HTTP bodies, correlation across processes, and precise filtering so the agent finds the relevant entries without parsing noise.

The bidirectional pattern is what makes LogPiper different from both outside-in monitoring and ephemeral debug logs. Sentry is write-only from the agent's perspective. Cursor Debug Mode is bidirectional but ephemeral. LogPiper is bidirectional and persistent (within the buffer window). The agent can write a log at 10 AM, query it at 2 PM, and the entry is still there.

Lessons for MCP Tool Builders

Building LogPiper surfaced three design principles that apply to any MCP tool that handles high-frequency data.

1. Give the agent infrastructure, not endpoints. An MCP tool the agent calls once is useful. A service the agent writes to continuously and queries on demand is more useful. The difference is the relationship: a tool is a transaction, infrastructure is a resource the agent can lean on across an entire debugging session. When you're designing an MCP tool, ask whether the agent would benefit from persistent state it can query later. If yes, you're building infrastructure.

2. HTTP beats MCP protocol for the data plane. Log ingestion happens thousands of times per minute. MCP tool calls are expensive - they consume tokens, require model inference for tool selection, and go through the client's rate limiting. Use HTTP for high-frequency data (logging, streaming, metrics). Use MCP for low-frequency control ("load a model," "take a snapshot," "run a test"). The protocols serve different layers. Mixing them up means paying model inference costs for what should be a direct HTTP POST.

3. Bidirectional beats unidirectional. An agent that can only read data is limited. An agent that can write data AND read it back can build feedback loops. The write-query-act cycle is the pattern that turns AI assistants from code generators into debugging partners. If your MCP tool produces data, consider whether the agent should also be able to write to the same store. If it can, you've built a feedback loop. If it can't, you've built a dashboard.

Where LogPiper Fits in the Stack

The MCP observability space is splitting into layers, and they're complementary.

Transport observability (OpenTelemetry for MCP): Traces MCP requests across client and server. Distributed tracing for MCP deployments. Useful for understanding how tool calls flow through a production system. The agent doesn't interact with this layer.

Production monitoring (Sentry, Datadog, Grafana): Error tracking, latency dashboards, alerting. Ops teams use this to keep MCP servers healthy. The agent can query some of this through MCP integrations, but the data is aggregate, not instance-specific.

Agent-native logging (LogPiper): A log store the agent writes to and reads from. Development-time debugging. Instance-specific data - the exact HTTP bodies, the exact error messages, the exact sequence of events that led to a failure. Not aggregate. Not for ops. For the agent that's trying to fix the code right now.

You probably need all three in a mature MCP deployment. OpenTelemetry for the infra team. Sentry for the ops team. LogPiper for the AI agent that's building and debugging the system.

Limitations

LogPiper runs on localhost only. It's not distributed. If your debugging problem spans two machines, LogPiper covers the local one.

The 5,000-entry buffer is fixed. For a focused debugging session, it's enough. For a process that generates thousands of entries per minute, you'll need to query frequently or export periodically.

The agent still has to be told to check the logs. It won't do this automatically. You can put "query LogPiper before guessing" in your CLAUDE.md or system prompt, and the AI will follow the instruction. But automatic log-check-on-failure is a workflow improvement that doesn't exist yet. The human closes the loop by prompting the query.

macOS only, because ToolPiper is macOS only. Linux and Windows developers can't use this today.

ToolPiper is a free download from the Mac App Store. LogPiper is included in every installation.

This is spoke 5 in the vibe debugging content cluster. For a step-by-step Claude Code integration, see How to Debug with Claude Code Using a Local Log Bus. For the full LogPiper endpoint reference, see LogPiper: A Universal Logging Bus That Ships Free Inside ToolPiper.