Your local LLM returned garbage. The response is valid JSON, correct shape, proper finish reason. But the content is wrong. The summary ignores half the input. The translation is in the wrong language. The function call has the wrong arguments. Was the prompt malformed? Did the system message get truncated? Was the temperature set to 2.0 instead of 0.2? You can't tell from the output alone. You need to see what was actually sent.

Cloud APIs have dashboards for this. OpenAI shows your request history in the playground. Anthropic's console logs your calls. But when you run a local LLM through ToolPiper's API gateway, the request goes to localhost and the payload vanishes after the response comes back. If something went wrong, you're adding console.log and re-running. Or worse, you're guessing.

LogPiper fixes this. Every HTTP request that flows through ToolPiper is automatically logged with the full request body and the full response body. No instrumentation. No configuration. No intercepting proxy. The data is already there.

What Does HTTP Body Capture Actually Mean?

Most logging systems capture metadata: URL, status code, duration, maybe a few headers. That's fine for knowing that a request happened. It's useless for knowing why it failed.

LogPiper captures the payload. The JSON your app assembled and sent. The JSON the model returned. When a cloud API rejects your request, the error message in the response body tells you exactly what went wrong. When a local model produces bad output, the prompt in the request body tells you exactly what it was working with.

This happens automatically for every HTTP request through ToolPiper. Chat completions, embeddings, transcription, cloud proxy calls, MCP tool invocations. You don't opt in per-endpoint. You don't add middleware. ToolPiper's HTTP layer writes the log entry before your code even sees the response.

What Gets Captured?

Three event types cover the full request lifecycle.

http.request events contain the outbound data:

  • url - the endpoint that was hit
  • method - GET, POST, PUT, DELETE
  • requestBody - the full JSON payload your app sent
  • toolId - which tool or feature initiated the request

http.response events contain what came back:

  • responseBody - the full JSON response from the API
  • status - HTTP status code
  • contentType - response media type
  • durationMs - round-trip time in milliseconds

http.error events contain the failure details:

  • responseBody - the actual error message from the API, not a generic status
  • status - the error status code
  • url and method - what was attempted

Bodies are truncated at 8KB. For most API payloads (chat completions, embedding requests, model configs), 8KB covers the full content. Binary responses like audio or images show type and size instead of raw bytes. Streaming responses (SSE, NDJSON) are tagged with isStreaming: true and a chunkCount indicating how many chunks the stream produced.
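Pulled together, a captured request/response pair might look like the sketch below. The field names follow the lists above; everything else (the top-level event key, bodies stored as JSON strings, the URL path) is an illustrative assumption, not documented LogPiper output.

```python
import json

# Illustrative entries built from the field lists above. The envelope
# (top-level "event" key, bodies as JSON strings) and the URL are assumed.
entries = [
    {
        "event": "http.request",
        "url": "http://127.0.0.1:9998/v1/chat/completions",
        "method": "POST",
        "toolId": "chat",
        "requestBody": json.dumps({
            "model": "llama-3.2-3b",
            "messages": [{"role": "system", "content": "You are a helpful assistant"}],
            "temperature": 0.2,
        }),
    },
    {
        "event": "http.response",
        "status": 200,
        "contentType": "application/json",
        "durationMs": 842,
        "responseBody": json.dumps({"choices": [{"finish_reason": "stop"}]}),
    },
]

def bodies_by_event(entries):
    """Index the parsed bodies by event type for quick inspection."""
    out = {}
    for e in entries:
        body = e.get("requestBody") or e.get("responseBody")
        out[e["event"]] = json.loads(body) if body else None
    return out
```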

Debugging a Bad Chat Completion

Here's a scenario that happens weekly if you're building against a local model. You send a chat completion request. The model responds. The response is structurally valid but the content is wrong.

Query the logs:

curl "http://127.0.0.1:9998/logs?event=http&limit=10"

The response is a JSON array. Find the http.request entry for your chat completion call and look at requestBody. The messages array is right there. Now you can see the problem.
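Scanning that array by eye works; a few lines of Python make it repeatable. A sketch, assuming the entry shape described above (event, url, and requestBody fields, with bodies stored as JSON strings) and that /logs returns newest entries first:

```python
import json

def latest_request_body(entries, url_substring):
    """Return the parsed requestBody of the first http.request entry whose
    url contains url_substring, or None. Assumes newest-first ordering;
    reverse the iteration if your buffer comes back oldest-first."""
    for e in entries:
        if e.get("event") == "http.request" and url_substring in e.get("url", ""):
            return json.loads(e["requestBody"])
    return None

# Live usage against a running ToolPiper (path substring is illustrative):
# from urllib.request import urlopen
# with urlopen("http://127.0.0.1:9998/logs?event=http&limit=10") as r:
#     entries = json.load(r)
# for m in latest_request_body(entries, "chat")["messages"]:
#     print(m["role"], "->", m["content"][:120])
```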

Scenario 1: Silent variable substitution failure. The system prompt reads "You are a helpful assistant" instead of your custom prompt. The template variable didn't resolve. Your code passed the raw template string to the API. Without body capture, you'd have spent twenty minutes swapping prompts and re-running. With body capture, you see the literal text that was sent and the bug is obvious.

Scenario 2: Model name typo. The http.error entry shows {"error": {"message": "model 'llama-3.2-3' not found"}}. You typed llama-3.2-3 instead of llama-3.2-3b. The error response body from ToolPiper told you exactly what happened, but your app's HTTP client swallowed it and threw a generic "request failed" exception. LogPiper has the original.

Scenario 3: Wrong parameter value. The http.request entry shows "temperature": 2.0 in the request body. You meant 0.2. A config file had a missing decimal point. The model did what you asked, technically. It generated text with maximum randomness. The output looked like garbage because the input told it to be chaotic.

Each of these is a one-query diagnosis. Without body capture, each one is a multi-round guessing game where you change things, re-run, check the output, and hope you changed the right thing.

Debugging Cloud API Proxy Errors

ToolPiper proxies cloud API requests to OpenAI, Anthropic, and Gemini, injecting API keys from your macOS Keychain so they never touch your code or environment variables. When a proxied request fails, the cloud provider's error response contains the diagnosis. But most HTTP client libraries throw an exception with the status code and discard the body.

LogPiper keeps the body.

curl "http://127.0.0.1:9998/logs?event=http.error&limit=5"

The responseBody field contains the actual error message from the provider. "Invalid API key" means the Keychain entry is wrong or expired. "Rate limited" with a retry-after header means you've hit your quota. "The model 'gpt-4-turbo' does not exist" means OpenAI renamed the model and your code has a stale identifier. These error messages are specific and actionable, but only if you can read them. LogPiper makes sure you can.
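Surfacing those messages programmatically takes one small helper. A sketch that unwraps the OpenAI-style {"error": {"message": ...}} envelope; other providers nest differently, so it falls back to the raw body:

```python
import json

def error_messages(entries):
    """Extract (status, message) pairs from http.error entries.
    Unwraps {"error": {"message": ...}} when present, otherwise
    returns the raw responseBody."""
    out = []
    for e in entries:
        if e.get("event") != "http.error":
            continue
        raw = e.get("responseBody", "")
        try:
            parsed = json.loads(raw)
            msg = parsed.get("error", {}).get("message") or raw
        except (json.JSONDecodeError, AttributeError):
            msg = raw
        out.append((e.get("status"), msg))
    return out
```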

This is particularly useful during provider API changes. Cloud providers deprecate model names, change rate limits, modify response formats. The first sign is usually an opaque error in your application. The responseBody in LogPiper tells you what changed.

Using Duration to Find Performance Problems

Every http.response event includes durationMs. This field is the round-trip time for the request, and it tells you things that no amount of prompt tweaking will reveal.

curl "http://127.0.0.1:9998/logs?event=http.response&limit=50"

Scan the durationMs values. Patterns emerge quickly.

If your first chat completion takes 12,000ms but subsequent ones take 800ms, the model is loading into memory on first use. That's normal behavior for local inference, but if your app doesn't account for the cold-start delay, users see a long hang on the first request. Now you know to add a warm-up call at startup.

If every request takes 45,000ms regardless of prompt length, the model might be too large for your available memory. ToolPiper serves models through llama.cpp, and when a model exceeds available RAM, the system pages to disk and inference grinds. The fix is to use a smaller quantization or a smaller model. The durationMs field is the signal that tells you to look.

If requests to a cloud proxy take 3,000ms on average but spike to 15,000ms intermittently, you're hitting provider-side load balancing or rate limit backoff. The timing data in LogPiper shows the pattern without requiring a dedicated profiling tool.
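All three patterns are easy to spot mechanically. A sketch that summarizes a list of durationMs values in request order; the 5x and 4x thresholds are illustrative guesses on my part, not LogPiper behavior:

```python
def duration_profile(durations_ms):
    """Summarize durationMs values pulled from http.response entries,
    in request order. Flags the cold-start pattern (first request much
    slower than the rest) and intermittent spikes."""
    if len(durations_ms) < 2:
        return {"samples": len(durations_ms)}
    first, rest = durations_ms[0], sorted(durations_ms[1:])
    median_rest = rest[len(rest) // 2]
    return {
        "samples": len(durations_ms),
        "first_ms": first,
        "median_rest_ms": median_rest,
        "cold_start": first > 5 * median_rest,   # threshold is a guess
        "spiky": rest[-1] > 4 * median_rest,     # ditto
    }
```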

Streaming Response Detection

Chat streaming is the default for most AI applications. ToolPiper serves SSE and NDJSON streams for token-by-token output. LogPiper handles these differently from regular request/response pairs.

When a streaming response completes, LogPiper creates a single http.response entry with two extra fields:

  • isStreaming: true - confirms this was a streamed response, not a batch
  • chunkCount: 47 - how many chunks (typically tokens or token groups) the stream produced

A low chunk count might mean the model hit a stop sequence early. An unexpectedly high count might mean the model is rambling past your max_tokens limit (some backends soft-cap instead of hard-cutting). If the chunk count is zero, the stream connected but produced no data, which usually means the model failed to load or ran out of memory mid-generation.

The individual chunks aren't stored. That would fill the 5,000-entry buffer in seconds during a busy streaming session. The metadata gives you what you need to triage without the storage cost.
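A triage helper along these lines can turn that metadata into a first guess. The thresholds and labels are illustrative, not part of LogPiper:

```python
def triage_stream(entry, expected_min_chunks=5):
    """Rough triage of an http.response entry using the isStreaming and
    chunkCount fields described above."""
    if not entry.get("isStreaming"):
        return "not-streaming"
    n = entry.get("chunkCount", 0)
    if n == 0:
        return "empty-stream"   # model failed to load or died mid-generation
    if n < expected_min_chunks:
        return "short-stream"   # possible early stop sequence
    return "ok"
```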

The Debugging Workflow

Four steps. Run them in order.

# 1. Clear old entries so you're working with a clean slate
curl -X POST http://127.0.0.1:9998/clear

# 2. Make the API call that's failing
# (run your app, hit the endpoint, trigger the bug)

# 3. Check what happened
curl "http://127.0.0.1:9998/logs?event=http&limit=20"

# 4. Narrow to errors if needed
curl "http://127.0.0.1:9998/logs?event=http.error&limit=10"

The /clear endpoint requires a session key header; /logs does not. You can query logs from any terminal, any script, any AI assistant without managing credentials. The log ingestion and query path is intentionally open for local development.

If you need to preserve logs from a debugging session before clearing, POST /export writes the current buffer to a JSON file in ToolPiper's data directory.

Hand It to Your AI Assistant

The real power of structured HTTP body capture isn't reading the logs yourself. It's handing them to an AI coding assistant.

When Claude Code, Cursor, or any MCP-capable assistant is debugging your API integration, the logs are one command away. Tell the assistant:

"Query LogPiper for HTTP errors: curl http://127.0.0.1:9998/logs?event=http.error&limit=10"

The assistant runs the command, reads the structured JSON, and sees the full request body that caused the error plus the full response body that explains the failure. It doesn't need to guess what your code sent. It doesn't need to add print statements and re-run. The data is there, in a format it already understands.

If you're using ToolPiper's MCP server, the assistant can call the logs tool directly without constructing curl commands. Same data, native integration.

One query. One diagnosis. One fix. Compare that to the typical cycle: the assistant guesses at the problem, rewrites the code, you run it, it fails again, the assistant guesses differently. Three rounds of inference when one round of observation would have been enough.

What HTTP Body Capture Doesn't Cover

LogPiper captures HTTP traffic that flows through ToolPiper. Requests between your app and other services on your machine (a local database, a Redis instance, a separate API server) aren't captured unless they go through ToolPiper's proxy.

Bodies are truncated at 8KB. For the vast majority of API payloads, that's the full content. If you're sending a 50KB document as part of a RAG prompt, you'll see the first 8KB of it. The truncation is a trade-off between visibility and memory pressure in a 5,000-entry circular buffer.

Individual streaming chunks aren't stored. You get the metadata (was it streaming, how many chunks) but not the token-by-token content. If you need to debug a specific token in a stream, add a custom log call in your streaming handler using POST /log.
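For that case, a per-chunk log call might look like the sketch below. The payload shape (event/message/data keys) is my assumption; the only thing established above is that POST /log exists, so check the LogPiper overview for the actual schema.

```python
import json
from urllib.request import Request, urlopen

def build_log_payload(event, message, extra=None):
    """Assemble a custom log entry. Field names here are assumed,
    not documented LogPiper schema."""
    payload = {"event": event, "message": message}
    if extra is not None:
        payload["data"] = extra
    return payload

def post_log(payload):
    """POST the entry to LogPiper's ingestion endpoint."""
    req = Request(
        "http://127.0.0.1:9998/log",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urlopen(req)

# In a streaming handler, for the chunk you care about:
# post_log(build_log_payload("stream.chunk", token_text, {"index": i}))
```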

And this is macOS only. ToolPiper runs on your Mac with Apple Silicon. If you're debugging a remote server or a Linux CI pipeline, LogPiper isn't in the picture.

Try It

ToolPiper is a free download from the Mac App Store. Install it and LogPiper is already running. Next time an API call through ToolPiper returns something unexpected, skip the print statements. Query http://127.0.0.1:9998/logs?event=http&limit=10 and read what actually happened.

This is part of the vibe debugging series on observability for AI-assisted development. For the full LogPiper technical overview, see LogPiper: A Universal Logging Bus That Ships Free Inside ToolPiper.