Vibe coding changed how we write software. It didn't change how we debug it.

AI coding assistants can generate a thousand lines of working code from a natural language prompt. They handle boilerplate, API integrations, data transformations, test scaffolding. The promise is real and the productivity gain is measurable. But the moment that generated code runs and something goes wrong, the assistant goes blind. It can't see the runtime. It can't inspect the HTTP response that came back malformed. It can't correlate an error in service A with a timeout in service B. It can't read a stack trace that scrolled past three minutes ago.

The feedback loop breaks at the most important step: understanding what went wrong.

The 80/20 Wall

Every developer building with AI assistants has hit the same wall. The AI gets the first 80% right. Often impressively right. The code compiles, the structure is clean, the patterns are idiomatic. Then you run it.

The last 20% is where the work lives. Integration bugs. Race conditions. Wrong API response shapes. An auth token that expires mid-request. A WebSocket that silently disconnects. The kind of failures that only show up at runtime, in the gap between what the code is supposed to do and what it actually does.

Stack Overflow's 2024 developer survey found that 66% of developers spend more time fixing AI-generated code than they expected. That number maps directly to this wall. The AI wrote the code. Now it can't fix it, because it can't observe the failure. You're left copying error messages into the chat, describing what you see in the terminal, and hoping the assistant can reconstruct enough context to help.

That's not a feedback loop. That's a game of telephone.

Why Terminal Output Isn't Enough

Terminal output is the default observability layer for most development. Print statements, console.log, stderr. It works when you're the one reading it. It falls apart when an AI agent needs to read it.

Terminal output is write-only. You print, it scrolls past, it's gone. If you weren't watching when the error happened, you missed it. If the relevant error was 200 lines ago, or in a different terminal entirely, it might as well not exist. There's no filtering, no structured data, no cross-process correlation.

AI agents read terminal output through a narrow window. They see the last N lines of whatever the process printed to stdout. If the bug produced a stack trace that's now off-screen, the agent can't scroll up. If the error happened in a background process running in a different terminal tab, the agent has no access to it at all. If the critical information was an HTTP response body buried inside a verbose log stream, the agent has to parse unstructured text to find it.

Cursor, Claude Code, Windsurf, and every other AI coding assistant share this limitation. They can execute commands and read the output. But "read the output" means "read a snapshot of whatever text is currently visible." That's a straw. The debugging problem needs a fire hose.

The Feedback Loop and Its Missing Piece

The productive pattern with AI coding assistants follows a tight loop: write, run, check, fix. The assistant writes code. You run it (or the assistant runs it). Something checks whether it worked. If it didn't, the assistant fixes the issue. Repeat until clean.

Addy Osmani described this as the "LLM Coding Workflow" - the iterative cycle that turns AI-assisted development from a one-shot generation into a convergent debugging process. Owain Lewis put the key insight more bluntly: "If you can't debug it with the information available, neither can the agent."

The loop works when "check" means "read test output." Tests are structured. They pass or fail. The failure message is usually specific enough for the agent to act on. But tests don't cover everything, and they especially don't cover the failures you didn't anticipate.

The loop breaks when "check" means "figure out why a multi-step AI pipeline returned garbage at step 3 of 5." Or "determine why the API proxy returns 502 but only when the upstream model is still loading." Or "understand why the WebSocket connection drops after exactly 60 seconds." These are runtime debugging problems. They require observability, not test coverage.

The write-run-check-fix loop has a missing piece. The agent needs a way to observe runtime behavior that's persistent, structured, and queryable. Not ephemeral terminal output. Not a test that somebody thought to write in advance. A log store that the agent can both write to and read from.

What LogPiper Is

LogPiper is a real-time logging bus built into ToolPiper. It's two HTTP endpoints for ingestion, a query endpoint with structured filters, and an SSE stream for real-time monitoring. That's the whole thing.

Write a log entry:

curl -X POST http://127.0.0.1:9998/log \
  -H "Content-Type: application/json" \
  -d '{"source": "my-app", "level": "error", "event": "api.timeout",
       "message": "Upstream model took 45s, proxy killed the request",
       "data": {"model": "llama-3.2-3b", "timeoutMs": 45000},
       "correlationId": "req_abc123"}'

Query logs after the fact:

curl "http://127.0.0.1:9998/logs?level=error&limit=20"

The buffer holds 5,000 entries in a circular queue. Oldest entries get evicted when the buffer is full. Ingestion is fire-and-forget with a 2-second timeout, so it never blocks your application. The ingestion endpoints are unauthenticated because the whole point is zero-friction writes from any local process.
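The eviction behavior is easy to picture with a five-line sketch. This is illustrative only; LogPiper's actual buffer lives inside ToolPiper, and only the capacity and oldest-first eviction come from the description above.

```javascript
// Illustrative sketch of the eviction behavior described above.
// Not LogPiper's implementation -- just the same policy in miniature.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }
  push(entry) {
    this.entries.push(entry);
    // Evict the oldest entry once the buffer is full.
    if (this.entries.length > this.capacity) this.entries.shift();
  }
}

const buffer = new RingBuffer(5000);
for (let i = 0; i < 5001; i++) buffer.push({ id: i });
// Entry 0 has been evicted; the buffer now holds ids 1 through 5000.
```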

No SDK. No client library. No dependency to install. If your language can make an HTTP POST, it can log to LogPiper. If your language can make an HTTP GET, it can query LogPiper. The interface is the protocol you already know.
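In JavaScript, for example, a complete client fits in a few lines. The /log endpoint and its fields come from the example above; the helper names (`makeEntry`, `log`) are my own, not part of any LogPiper API.

```javascript
// A complete LogPiper "client". The endpoint and entry fields come from
// the /log example above; makeEntry and log are hypothetical helper names.
const LOGPIPER = 'http://127.0.0.1:9998/log';

function makeEntry({ source, level, event, message, data = {}, correlationId }) {
  const entry = { source, level, event, message, data };
  if (correlationId) entry.correlationId = correlationId;
  return entry;
}

function log(fields) {
  // Fire-and-forget: a dead or missing LogPiper must never break the app.
  return fetch(LOGPIPER, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(makeEntry(fields)),
  }).catch(() => {});
}
```

That's the whole surface area: `log({ source: 'my-app', level: 'error', event: 'api.timeout', message: '...' })` and you're done.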

Every HTTP request that flows through ToolPiper is automatically logged with full request and response bodies (truncated at 8KB). Streaming responses are tagged with chunk counts. Engine lifecycle events (load, unload, crash, restart) are captured. Correlation IDs link related entries across processes. When the agent queries event=http&limit=50, it gets the exact payloads that crossed the wire, not a summary, not a status code, the actual JSON bodies.
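Because the filters are ordinary query parameters, building queries programmatically is trivial. The `event` and `limit` filters come from the examples in this article; the `logsUrl` helper name is my own.

```javascript
// Build a /logs query URL from structured filters. logsUrl is a
// hypothetical helper; the filter names come from this article's examples.
function logsUrl(filters) {
  return `http://127.0.0.1:9998/logs?${new URLSearchParams(filters)}`;
}

logsUrl({ event: 'http', limit: 50 });
// → "http://127.0.0.1:9998/logs?event=http&limit=50"
```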

How Is This Different From What Already Exists?

Three categories of tools touch this space. They solve different problems.

Cursor Debug Mode

Cursor spins up a temporary HTTP log server for a single debug session. The AI agent can write logs to it and read them back. This is genuinely useful for quick, focused debugging within Cursor. The logs vanish when the session ends. It only works inside Cursor. If you're using Claude Code, Windsurf, or any other client, this doesn't exist for you.

Sentry, Datadog, Grafana MCP Integrations

These monitoring platforms have released MCP server integrations that let AI agents query production telemetry. Sentry's MCP server gives agents access to error reports and stack traces. Datadog's gives access to metrics and dashboards. These tools monitor your AI agent from the outside. They answer "how is the system performing?" and "what errors are users hitting?" They don't answer "what did this specific HTTP request body contain during this specific local debugging session?"

Production observability and local development debugging are different problems. One watches deployed systems at scale. The other helps you figure out why your code doesn't work yet.

LogPiper

LogPiper occupies a different position. The AI agent writes structured logs AND queries them. It's bidirectional. Persistent until cleared. Cross-tool, because it's plain HTTP. The agent can POST an error log from instrumented code, then query that same log 10 minutes later when it's trying to understand a pattern across multiple runs.

Sentry monitors your AI agent. Cursor gives your AI agent a temporary log server. LogPiper gives your AI agent a persistent, queryable log store it can write to and read from.

The Workflow in Practice

Here's what this looks like when you're debugging with an AI coding assistant. This works identically in Claude Code, Cursor, Windsurf, or any tool that can make HTTP requests.

Step 1: Clear the buffer.

curl -X POST http://127.0.0.1:9998/clear

Start clean. No noise from previous sessions.

Step 2: Tell the agent to instrument the code.

The agent adds log POSTs at key points: before and after API calls, at branch points in business logic, around the code that's failing. Each log entry includes a source, level, event type, and structured data. The agent picks the instrumentation points because it wrote the code and knows where the likely failure modes are.

// The agent adds this around the suspected failure point
try {
  const response = await fetch(apiUrl, { method: 'POST', body: payload }); // payload assumed pre-serialized
  const data = await response.json();
  await fetch('http://127.0.0.1:9998/log', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      source: 'my-app', level: 'info', event: 'api.response',
      message: `Got ${response.status} from ${apiUrl}`,
      data: { status: response.status, body: data, requestPayload: payload }
    })
  }).catch(() => {});
} catch (err) {
  await fetch('http://127.0.0.1:9998/log', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      source: 'my-app', level: 'error', event: 'api.error',
      message: err.message,
      data: { url: apiUrl, payload: payload }
    })
  }).catch(() => {});
  throw err;
}

Step 3: Reproduce the bug.

Run the code. Trigger the failure. The logs land in LogPiper's buffer as they happen.

Step 4: The agent queries the logs.

curl "http://127.0.0.1:9998/logs?level=error&limit=20"

The agent sees the exact error, including the full HTTP response body, the request payload that triggered it, the timestamp, and the source. No copying and pasting from a terminal window. No "can you describe what you see?" The structured data is right there.

Step 5: The agent fixes the code.

With the actual error data in context, the agent can make a targeted fix instead of a guess. If the API returned a 422 with a validation error, the agent sees the validation message. If the response shape was wrong, the agent sees the actual shape. If the request payload was malformed, the agent sees exactly what it sent.

Step 6: Repeat until clean.

Clear, reproduce, query, fix. Each cycle narrows the problem. The agent builds a mental model of the runtime behavior from structured data, not from your narration of what you saw in a terminal.

Why Bidirectional Matters

The word that matters in this architecture is "bidirectional." The agent doesn't only read logs that something else wrote. It writes logs from code it instrumented, and then reads those same logs to understand what happened. This closes the loop in a way that terminal output never can.

Consider the difference. With terminal output, the agent writes a print statement, you run the code, the output scrolls past, and either you copy-paste the relevant line into the chat or the agent reads whatever's in the visible terminal buffer. Three handoff points where information gets lost.

With LogPiper, the agent writes a structured log POST, the code runs, the log entry lands in the buffer, and the agent queries the buffer directly. Zero handoffs. Zero information loss. The agent wrote the instrumentation, so it knows exactly what fields to query and what the values mean.

The correlation ID pattern makes this even more powerful for multi-step workflows. The agent assigns a correlation ID to a job, instruments every step with that ID, runs the job, and then queries all entries for that ID. The complete timeline of a pipeline execution, from input to output to failure, in one query. No timestamp correlation. No grepping across log files. One ID, one query, the whole story.
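As a sketch, reusing the entry fields from the /log example earlier: every step of a hypothetical pipeline builds its entry with the same ID, so one query returns the whole run. The pipeline, step names, and `stepEntry` helper are illustrative, not LogPiper API.

```javascript
// Correlation ID pattern sketch. Entry fields match the /log example
// earlier in this article; the pipeline and helper are hypothetical.
const correlationId = 'job_20250101_001';

function stepEntry(event, data) {
  // Each entry carries the job's correlation ID alongside its own data.
  return { source: 'pipeline', level: 'info', event, data, correlationId };
}

// Every step of the job logs with the same ID...
const timeline = [
  stepEntry('pipeline.start', { input: 'doc.txt' }),
  stepEntry('pipeline.step', { step: 1, action: 'parse' }),
  stepEntry('pipeline.step', { step: 2, action: 'transform' }),
  stepEntry('pipeline.done', { outputBytes: 2048 }),
];
// ...so querying /logs?correlationId=job_20250101_001 returns the
// complete timeline, from input to output, in one request.
```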

What Changes When the Agent Can Self-Diagnose

The practical difference is fewer cycles and better fixes.

Without runtime observability, the agent guesses. It sees a failure message (if you provide one) and hypothesizes about the cause. "The API might be returning a different format." "The auth token might have expired." "The timeout might be too short." Each guess produces a code change. Each code change requires a full reproduce cycle. Some guesses are wrong, and the agent needs another round.

With a queryable log store, the agent knows. It doesn't hypothesize about the API response format. It reads the actual response. It doesn't guess whether the token expired. It sees the 401 with the expiry timestamp in the response body. It doesn't wonder about the timeout. It sees the duration logged at 45,230ms against a 30,000ms limit.

The shift is from "the AI can write code" to "the AI can understand what the code does at runtime." That's the gap vibe coding opened. The AI is extraordinary at generation. It's been blind at observation. LogPiper gives it eyes.

In practice, this means fewer back-and-forth cycles per bug. Fewer "I don't know what went wrong, let me try something different" guesses. More "I see the error, here's the fix." The debugging conversation gets shorter because the agent has the data it needs on the first query.

Beyond HTTP: What LogPiper Captures Automatically

Logs from code the agent instruments are only part of the picture. ToolPiper itself logs every HTTP request and response that flows through it, automatically and with full bodies. If your application talks to ToolPiper for inference, proxy, or any other endpoint, those interactions are already in the buffer before the agent writes a single line of instrumentation code.

The automatic capture includes:

  • LLM inference requests and responses - the exact prompt, the exact completion, the model name, the duration
  • Cloud API proxy traffic - what was sent to OpenAI, Anthropic, or Gemini, and what came back, including error responses with their full bodies
  • MCP tool invocations - input parameters, output, execution time. When an AI assistant calls a ToolPiper MCP tool, the call is logged with enough detail to replay it
  • Engine lifecycle events - model load, model unload, crashes with exit codes and last stderr output, automatic restarts
  • Streaming response metadata - chunk counts, content types, whether the response was SSE or NDJSON

The agent can query event=http.error&limit=20 and immediately see every failed HTTP interaction, with the response body that explains why it failed. No instrumentation required for any traffic that touches ToolPiper.

Limitations

LogPiper is an in-memory debugging tool with a 5,000-entry circular buffer. Old entries get evicted. It's not an archival system and it's not a replacement for production observability. If you need log persistence, export the buffer to disk before clearing it.

It works on localhost only. It's not a distributed tracing system. If your debugging problem spans multiple machines, LogPiper covers the local machine and you'll need something else for the remote side.

The ingestion endpoints are unauthenticated by design, for zero-friction local writes. This is a feature for development and a non-starter for anything exposed to a network.

The AI agent still needs to be told to check the logs. It won't do it automatically. You have to include "query LogPiper for errors" as part of your debugging prompt, or the agent will fall back to reading terminal output like it always has. MCP tool calls through ToolPiper are logged automatically, but the agent querying those logs is still a manual step. Automatic log-check-on-failure is a workflow improvement we're thinking about, but it's not built yet.

And the buffer size is fixed at 5,000 entries. For a focused debugging session, this is plenty. For a long-running process that generates thousands of log entries per minute, you'll need to query frequently or export periodically. The buffer is a debugging scratchpad, not a time-series database.

The Debugging Gap Vibe Coding Created

Vibe coding made generation fast. It didn't make debugging fast. The AI writes the code in seconds. When that code fails at runtime, the debugging cycle drops back to the pre-AI pace: manual observation, manual context gathering, manual relay of information to the assistant.

LogPiper is one answer to this gap. A persistent, structured, queryable log store that the AI agent can both write to and read from. The agent instruments code, reproduces failures, queries structured errors, and fixes the code with the actual runtime data in context. Not a guess. Not your description of what happened. The data itself.

The pattern is simple and it's tool-agnostic. Claude Code, Cursor, Windsurf, Zed, any MCP client, any tool that can make HTTP requests. The protocol is HTTP. The data is JSON. The workflow is: clear, reproduce, query, fix.

ToolPiper is a free download from the Mac App Store, and LogPiper is included in every installation.

This is the pillar article in the vibe debugging series. For a step-by-step guide using Claude Code, see How to Debug with Claude Code Using a Local Log Bus. For the technical architecture of LogPiper itself, see LogPiper: A Universal Logging Bus That Ships Free Inside ToolPiper.