Add BurnCap with zero effort.
You already code with an AI agent. Pick your stack, copy the prompt, and paste it into Claude Code, Cursor, or any agent — it installs the SDK, instruments every LLM call with the right token math, and wires up budget guardrails. No docs to read.
# Task: Integrate BurnCap into this project — Next.js + Vercel AI SDK
BurnCap is an out-of-band AI cost monitor and budget guardrail. After each LLM call we
send usage metadata (never prompts) to BurnCap; optionally we ask it for an advisory
budget decision before expensive calls. **Enforcement always stays in our code — BurnCap
only advises.**
## 0. Discover before changing anything
- Find every **server-side** LLM call site in this project and list them back to me.
- Identify the framework, where env vars live, and a home for a shared server util.
- Never expose the API key to client/browser code.
## 1. Install & configure
Install: `pnpm add @burncap/sdk`
Add **server-only** env vars (never client-exposed):
```bash
BURNCAP_API_KEY=bc_your_key_here
BURNCAP_URL=https://www.burncap.app
```
If `@burncap/sdk` isn't installable from your registry, replace the client with thin `fetch()` calls to the endpoints in section 4 (same JSON, same headers).
Otherwise, create one shared client:
```ts
// lib/burncap.ts
import { BurnCap } from "@burncap/sdk";
export const burncap = new BurnCap({
  apiKey: process.env.BURNCAP_API_KEY!,
  baseUrl: process.env.BURNCAP_URL, // optional; defaults to https://www.burncap.app
  // failMode defaults to "open": a BurnCap outage must never block your traffic.
});
```
## 2. Track usage after every server-side model call
Invariants (BurnCap miscounts otherwise):
- `input_tokens` **excludes** `cached_input_tokens`; `output_tokens` **excludes** `reasoning_tokens`.
- Always pass a stable `request_id` (the provider's response id) so retries never double-count.
- Never block the user's request path on tracking — it is best-effort.
Map the provider's usage object (adapt to the SDK version in this repo):
```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { burncap } from "@/lib/burncap";
const { usage, response } = await generateText({ model: openai("gpt-5.5"), prompt });
burncap.trackUsageInBackground({
  request_id: response.id,
  provider: "openai", // the provider behind the model (anthropic/google for @ai-sdk/anthropic|google)
  model: response.modelId,
  input_tokens: (usage.inputTokens ?? 0) - (usage.cachedInputTokens ?? 0),
  cached_input_tokens: usage.cachedInputTokens ?? 0,
  output_tokens: (usage.outputTokens ?? 0) - (usage.reasoningTokens ?? 0),
  reasoning_tokens: usage.reasoningTokens ?? 0,
  feature: "chatbot",
  customer_id: user?.id,
});
```
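The two exclusion invariants can be kept in one place with a small pure helper, so every call site applies the same arithmetic. The sketch below assumes a usage object with the field names used in the mapping above. `toBurnCapTokens` and the clamp at zero are illustrative additions, not part of `@burncap/sdk`.

```ts
// Hypothetical helper (not part of @burncap/sdk): applies the two
// token-exclusion invariants in a single, testable function.
interface ProviderUsage {
  inputTokens?: number;
  cachedInputTokens?: number;
  outputTokens?: number;
  reasoningTokens?: number;
}

interface BurnCapTokens {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
  reasoning_tokens: number;
}

export function toBurnCapTokens(usage: ProviderUsage): BurnCapTokens {
  const cached = usage.cachedInputTokens ?? 0;
  const reasoning = usage.reasoningTokens ?? 0;
  return {
    // Clamp at zero (an assumption) so partial usage data never yields
    // a negative count.
    input_tokens: Math.max(0, (usage.inputTokens ?? 0) - cached),
    cached_input_tokens: cached,
    output_tokens: Math.max(0, (usage.outputTokens ?? 0) - reasoning),
    reasoning_tokens: reasoning,
  };
}
```

The result can be spread into the tracking payload next to `request_id`, `provider`, `model`, and `feature`.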
Use stable, low-cardinality `feature` labels per call site (e.g. "chatbot", "summarizer",
"agent-runs"). Other usage dimensions: `image_count`, `audio_seconds`, `tool_call_count`,
`session_id` (powers runaway-loop detection). Pass `actual_cost_usd` if you know it, else
BurnCap estimates from your workspace pricing table.
## 3. (Optional — confirm with me first) Budget guardrails
Gate expensive calls on the advisory `action`. We do the enforcing:
```ts
import { burncap } from "@/lib/burncap";
const gate = await burncap.checkBudget({ feature: "agent-runs", customerId: user.id });
if (gate.action === "block") return Response.json({ error: "budget_exceeded" }, { status: 402 });
if (gate.action === "use_cheaper_model") model = "gpt-5-mini";
if (gate.action === "warn") console.warn("AI budget nearly spent", gate.budgets);
// "continue" -> under budget, proceed as normal
```
`checkBudget` **fails open** on network errors by default — keep it that way on user-facing
paths. Use a closed/blocking fallback only for internal, spend-critical jobs.
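On user-facing paths, a slow budget check can hurt as much as a failed one. A minimal sketch of a wrapper that falls back to `continue` on error or timeout follows. `checkBudgetFailOpen`, the `BudgetDecision` shape, and the 300 ms default are assumptions for illustration, not part of `@burncap/sdk`.

```ts
// Hypothetical wrapper (not part of @burncap/sdk): bounds how long a budget
// check may delay a request and fails open on any error or timeout.
type BudgetAction = "continue" | "warn" | "use_cheaper_model" | "block";

interface BudgetDecision {
  action: BudgetAction;
  reason?: string;
}

export async function checkBudgetFailOpen(
  check: () => Promise<BudgetDecision>,
  timeoutMs = 300, // assumed latency budget; tune per route
): Promise<BudgetDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<BudgetDecision>((resolve) => {
    timer = setTimeout(() => resolve({ action: "continue", reason: "timeout" }), timeoutMs);
  });
  try {
    // Whichever settles first wins: the real decision or the fail-open default.
    return await Promise.race([check(), timeout]);
  } catch {
    // Network or SDK error: proceed as if under budget.
    return { action: "continue", reason: "check_failed" };
  } finally {
    clearTimeout(timer);
  }
}
```

A call site would pass the real check as a closure, for example `checkBudgetFailOpen(() => burncap.checkBudget({ feature: "agent-runs", customerId: user.id }))`, and keep enforcing on the returned `action`.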
## 4. API contract (source of truth)
Both endpoints use `Authorization: Bearer bc_your_key_here` against `https://www.burncap.app`.
- **POST `/api/v1/events`**: body is `{ "events": [event, ...] }`, a bare array, or a single event object. 1–500 events per call; idempotent by `request_id`.
  - `200 → { inserted, duplicates, invalid, errors[], unpriced_models[] }`
  - `429` rate limited (honor `Retry-After`) · `402 quota_exceeded` · `401` invalid key
- **GET `/api/v1/budget/check?feature=…&customer_id=…&environment=…`**
  - `200 → { allowed, state, action, reason, budgets[] }`
  - `action` ∈ `continue | warn | use_cheaper_model | block`; `allowed` is `false` only when a hard-cap budget is exhausted. **Fail open** on any network error.
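If the SDK is unavailable, the events endpoint can be called directly. The sketch below takes the path, auth header, body shape, and 1–500 batch limit from the contract above. The helper names, the single retry on `429`, and the one-second default backoff are assumptions. It relies on a global `fetch` (Node 18+).

```ts
// Sketch of a fetch() fallback for POST /api/v1/events. Helper names and the
// retry policy are illustrative, not part of any BurnCap package.
const MAX_BATCH = 500;

type UsageEvent = { request_id: string; [field: string]: unknown };

// Split events into batches that respect the per-call limit.
export function chunkEvents<T>(events: T[], size = MAX_BATCH): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push(events.slice(i, i + size));
  }
  return batches;
}

// Retry-After is either delta-seconds or an HTTP date; return milliseconds.
export function retryAfterMs(header: string | null, now = Date.now()): number {
  if (!header) return 1000; // assumed default backoff
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? 1000 : Math.max(0, date - now);
}

export async function postEvents(
  events: UsageEvent[],
  apiKey: string,
  baseUrl = "https://www.burncap.app",
): Promise<void> {
  for (const batch of chunkEvents(events)) {
    for (let attempt = 0; attempt < 2; attempt++) {
      const res = await fetch(`${baseUrl}/api/v1/events`, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ events: batch }),
      });
      if (res.status !== 429) break; // 200, 401, 402: retrying will not help
      const waitMs = retryAfterMs(res.headers.get("Retry-After"));
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Because tracking is best-effort, call it without awaiting on the request path and swallow failures, for example `void postEvents(events, key).catch(() => {})`.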
## Hard rules (do not violate)
1. Server-side only. The `bc_…` key must never reach the browser/client bundle.
2. Tracking is best-effort — never block or fail a user request because of it.
3. Always send a stable `request_id`. Honor the two token-exclusion invariants.
4. Keep `checkBudget` fail-open on anything user-facing.
5. Minimal, surgical changes — a thin shared client plus one tracking call per LLM call site.
No broad refactors.
## Verify before you're done
- Build / type-checks pass.
- Trigger one real LLM call; show me the `POST /api/v1/events` response with `inserted: 1`
(a retry of the same `request_id` returns `duplicates: 1`).
- Confirm the event appears in the BurnCap dashboard, then summarize the call sites you
instrumented and the `feature` labels you chose.
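The idempotency check in the second bullet can be scripted. In this sketch, `send` is a hypothetical function you supply: it posts one event with the given `request_id` and resolves with the parsed `/api/v1/events` response body. Only the `inserted` and `duplicates` fields from the contract are used.

```ts
// Hypothetical verification helper; names are illustrative.
interface EventsResult {
  inserted: number;
  duplicates: number;
}

export async function verifyIdempotency(
  send: (requestId: string) => Promise<EventsResult>,
  requestId: string,
): Promise<boolean> {
  const first = await send(requestId); // expect a fresh insert
  const second = await send(requestId); // expect the replay to be deduplicated
  return first.inserted === 1 && second.duplicates === 1;
}
```

Use a `request_id` that has not been sent before. Otherwise the first call already reports a duplicate and the check returns `false`.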
- Next.js + Vercel AI SDK: normalized usage from the `ai` package across every provider.
- Node / Next.js + OpenAI SDK: the `openai` client, with `chat.completions` usage mapping.
- Node + Anthropic SDK: `@anthropic-ai/sdk`, with cache-read vs cache-creation handled.
- Node + Google Gemini SDK: `@google/genai`, with `usageMetadata` and thinking tokens.
- Python (FastAPI + OpenAI / Anthropic): no SDK needed; thin `httpx` calls and a fail-open guardrail.
Create a free workspace to get your API key — the in-app prompt bakes it in for you, so the agent has everything it needs.
BurnCap never proxies your model traffic and never stores prompts. These prompts only add out-of-band usage tracking and advisory budget checks. Enforcement always stays in your code.