For your AI agent

Add BurnCap with zero effort.

You already code with an AI agent. Pick your stack, copy the prompt, and paste it into Claude Code, Cursor, or any agent — it installs the SDK, instruments every LLM call with the right token math, and wires up budget guardrails. No docs to read.

# Task: Integrate BurnCap into this project — Next.js + Vercel AI SDK

BurnCap is an out-of-band AI cost monitor and budget guardrail. After each LLM call we
send usage metadata (never prompts) to BurnCap; optionally we ask it for an advisory
budget decision before expensive calls. **Enforcement always stays in our code — BurnCap
only advises.**

## 0. Discover before changing anything
- Find every **server-side** LLM call site in this project and list them back to me.
- Identify the framework, where env vars live, and a home for a shared server util.
- Never expose the API key to client/browser code.

## 1. Install & configure
Install: `pnpm add @burncap/sdk`

Add **server-only** env vars (never client-exposed):
```bash
BURNCAP_API_KEY=bc_your_key_here
BURNCAP_URL=https://www.burncap.app
```

Create one shared client. If `@burncap/sdk` isn't installable from your registry, replace it with thin `fetch()` calls to the endpoints in section 4 (same JSON, same headers):
```ts
// lib/burncap.ts
import { BurnCap } from "@burncap/sdk";

export const burncap = new BurnCap({
  apiKey: process.env.BURNCAP_API_KEY!,
  baseUrl: process.env.BURNCAP_URL, // optional; defaults to https://www.burncap.app
  // failMode defaults to "open": a BurnCap outage must never block your traffic.
});
```
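
If the SDK cannot be installed, the `fetch()` fallback could look roughly like the sketch below. The endpoint path, the `{ "events": [...] }` body, and the Bearer header come from section 4. The names `buildEventsRequest`, `trackInBackground`, and `BurnCapEvent` are hypothetical, and the `Content-Type` header is an assumption. The sender is passed in as a parameter so the request shape can be checked without network access.

```typescript
// Hypothetical fallback for when @burncap/sdk is unavailable.
// Names here are illustrative, not part of any published SDK.
type BurnCapEvent = {
  request_id: string;
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cached_input_tokens?: number;
  reasoning_tokens?: number;
  feature?: string;
  customer_id?: string;
};

type EventsRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

type Sender = (url: string, init: EventsRequest["init"]) => Promise<unknown>;

// Builds the POST /api/v1/events request described in section 4.
function buildEventsRequest(
  events: BurnCapEvent[],
  apiKey: string,
  baseUrl = "https://www.burncap.app",
): EventsRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/events`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json", // assumption: JSON body needs this header
      },
      body: JSON.stringify({ events }),
    },
  };
}

// Best-effort send: sync throws and async rejections are both swallowed,
// so tracking can never fail or delay a user request.
function trackInBackground(send: Sender, events: BurnCapEvent[], apiKey: string): void {
  try {
    const { url, init } = buildEventsRequest(events, apiKey);
    void send(url, init).catch(() => undefined);
  } catch {
    // Intentionally ignored: tracking is best-effort.
  }
}
```

In a server route this would be called as `trackInBackground(fetch, [event], process.env.BURNCAP_API_KEY!)`.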

## 2. Track usage after every server-side model call
Invariants (BurnCap miscounts otherwise):
- `input_tokens` **excludes** `cached_input_tokens`; `output_tokens` **excludes** `reasoning_tokens`.
- Always pass a stable `request_id` (the provider's response id) so retries never double-count.
- Never block the user's request path on tracking — it is best-effort.

Map the provider's usage object (adapt to the SDK version in this repo):
```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { burncap } from "@/lib/burncap";

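// `prompt` and `user` come from the surrounding route handler.
const { usage, response } = await generateText({ model: openai("gpt-5.5"), prompt });

// Fire-and-forget: tracking must not delay the response.
burncap.trackUsageInBackground({
  request_id: response.id,
  provider: "openai", // provider behind the model: "anthropic" for @ai-sdk/anthropic, "google" for @ai-sdk/google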
  model: response.modelId,
  input_tokens: (usage.inputTokens ?? 0) - (usage.cachedInputTokens ?? 0),
  cached_input_tokens: usage.cachedInputTokens ?? 0,
  output_tokens: (usage.outputTokens ?? 0) - (usage.reasoningTokens ?? 0),
  reasoning_tokens: usage.reasoningTokens ?? 0,
  feature: "chatbot",
  customer_id: user?.id,
});
```
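
The two exclusion invariants can be pulled into one small pure helper so every call site applies the same arithmetic. This is a sketch, not SDK code: `toBurnCapTokens` and its types are hypothetical names, the input fields mirror the usage object in the snippet above, and clamping at zero is an added safeguard for providers that report inconsistent counts.

```typescript
// Input mirrors the usage fields read in the snippet above; all are
// optional because a provider may omit any of them.
type ProviderUsage = {
  inputTokens?: number;
  cachedInputTokens?: number;
  outputTokens?: number;
  reasoningTokens?: number;
};

type BurnCapTokens = {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
  reasoning_tokens: number;
};

// Hypothetical helper: splits totals into the non-overlapping buckets
// required by the invariants (input excludes cached, output excludes reasoning).
function toBurnCapTokens(usage: ProviderUsage): BurnCapTokens {
  const cached = usage.cachedInputTokens ?? 0;
  const reasoning = usage.reasoningTokens ?? 0;
  return {
    // Clamp at zero (assumption): never report a negative count.
    input_tokens: Math.max(0, (usage.inputTokens ?? 0) - cached),
    cached_input_tokens: cached,
    output_tokens: Math.max(0, (usage.outputTokens ?? 0) - reasoning),
    reasoning_tokens: reasoning,
  };
}
```

For example, a call reporting 1000 input tokens (400 of them cached) and 300 output tokens (120 of them reasoning) maps to 600 / 400 / 180 / 120, and the result can be spread into the tracking payload with `...toBurnCapTokens(usage)`.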

Use stable, low-cardinality `feature` labels per call site (e.g. "chatbot", "summarizer",
"agent-runs"). Other usage dimensions: `image_count`, `audio_seconds`, `tool_call_count`,
`session_id` (powers runaway-loop detection). Pass `actual_cost_usd` if you know it, else
BurnCap estimates from your workspace pricing table.
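
One way to keep `feature` labels stable and low-cardinality is to declare them once and let the type checker reject anything else. The sketch below is a suggestion only: the label list reuses the examples above, and `FEATURES`, `Feature`, and `isFeature` are hypothetical names.

```typescript
// Single source of truth for feature labels (values taken from the examples above).
const FEATURES = ["chatbot", "summarizer", "agent-runs"] as const;
type Feature = (typeof FEATURES)[number];

// Runtime guard for labels that arrive as plain strings (config, env, tests).
function isFeature(value: string): value is Feature {
  return (FEATURES as readonly string[]).includes(value);
}
```

Typing the `feature` field at each call site as `Feature` keeps per-user or per-request strings such as `"chatbot-user-42"` out of the label.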

## 3. (Optional — confirm with me first) Budget guardrails
Gate expensive calls on the advisory `action`. We do the enforcing:
```ts
import { burncap } from "@/lib/burncap";

// Assumes `let model` holds the default model id and `user` is the authenticated user.
const gate = await burncap.checkBudget({ feature: "agent-runs", customerId: user.id });
if (gate.action === "block") return Response.json({ error: "budget_exceeded" }, { status: 402 });
if (gate.action === "use_cheaper_model") model = "gpt-5-mini";
if (gate.action === "warn") console.warn("AI budget nearly spent", gate.budgets);
// "continue" -> under budget, proceed as normal
```
`checkBudget` **fails open** on network errors by default — keep it that way on user-facing
paths. Use a closed/blocking fallback only for internal, spend-critical jobs.
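
The mapping from advisory `action` to local enforcement can be kept in one pure function, which also makes the fail-open default explicit. This is a minimal sketch: `decide`, `Gate`, and `Decision` are hypothetical names, and treating a missing answer as `undefined` is an assumption about how the caller represents a failed or timed-out check.

```typescript
type BudgetAction = "continue" | "warn" | "use_cheaper_model" | "block";

// undefined means the budget check failed or timed out (assumed convention).
type Gate = { action: BudgetAction } | undefined;

type Decision = { proceed: boolean; model: string; warn: boolean };

// Hypothetical helper: turns BurnCap's advice into a local decision.
// BurnCap only advises; the caller acts on the returned Decision.
function decide(gate: Gate, preferredModel: string, cheaperModel: string): Decision {
  switch (gate?.action) {
    case "block":
      return { proceed: false, model: preferredModel, warn: false };
    case "use_cheaper_model":
      return { proceed: true, model: cheaperModel, warn: false };
    case "warn":
      return { proceed: true, model: preferredModel, warn: true };
    default:
      // "continue" or no answer at all: fail open and proceed as normal.
      return { proceed: true, model: preferredModel, warn: false };
  }
}
```

A route handler would return its 402 response when `proceed` is `false` and otherwise call the model named in `model`.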

## 4. API contract (source of truth)
Both endpoints use `Authorization: Bearer bc_your_key_here` against `https://www.burncap.app`.

- **POST `/api/v1/events`** — body `{ "events": [event, ...] }`, a bare array, or a single
  event object. 1–500 events per call; idempotent by `request_id`.
  - `200 → { inserted, duplicates, invalid, errors[], unpriced_models[] }`
  - `429` rate limited (honor `Retry-After`) · `402 quota_exceeded` · `401` invalid key
- **GET `/api/v1/budget/check?feature=…&customer_id=…&environment=…`**
  - `200 → { allowed, state, action, reason, budgets[] }`
  - `action` ∈ `continue | warn | use_cheaper_model | block`; `allowed` is `false` only when
    a hard-cap budget is exhausted. **Fail open** on any network error.
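
For agents using raw HTTP, three details of this contract are easy to get wrong: the 500-event batch limit, the `Retry-After` header on `429`, and query-string encoding for the budget check. The helpers below are an illustrative sketch with hypothetical names (`chunkEvents`, `retryAfterMs`, `budgetCheckUrl`). The one-second fallback delay is an assumption, and `Retry-After` is parsed in both forms HTTP allows (delay in seconds or an HTTP date).

```typescript
// Splits events into batches of at most `size` (the contract allows 1 to 500 per call).
// An empty input yields no batches, so no empty request is ever sent.
function chunkEvents<T>(events: T[], size = 500): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push(events.slice(i, i + size));
  }
  return batches;
}

// Converts a Retry-After header into a delay in milliseconds.
// fallbackMs is an assumed default for a missing or unparseable header.
function retryAfterMs(header: string | null, nowMs: number, fallbackMs = 1000): number {
  if (header === null) return fallbackMs;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds form
  const at = Date.parse(trimmed); // HTTP-date form
  return Number.isNaN(at) ? fallbackMs : Math.max(0, at - nowMs);
}

// Builds the GET /api/v1/budget/check URL, skipping unset parameters.
function budgetCheckUrl(
  baseUrl: string,
  params: { feature?: string; customer_id?: string; environment?: string },
): string {
  const keys = ["feature", "customer_id", "environment"] as const;
  const parts: string[] = [];
  for (const key of keys) {
    const value = params[key];
    if (value) parts.push(`${key}=${encodeURIComponent(value)}`);
  }
  const path = `${baseUrl.replace(/\/+$/, "")}/api/v1/budget/check`;
  return parts.length > 0 ? `${path}?${parts.join("&")}` : path;
}
```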

## Hard rules (do not violate)
1. Server-side only. The `bc_…` key must never reach the browser/client bundle.
2. Tracking is best-effort — never block or fail a user request because of it.
3. Always send a stable `request_id`. Honor the two token-exclusion invariants.
4. Keep `checkBudget` fail-open on anything user-facing.
5. Minimal, surgical changes — a thin shared client plus one tracking call per LLM call site.
   No broad refactors.

## Verify before you're done
- Build / type-checks pass.
- Trigger one real LLM call; show me the `POST /api/v1/events` response with `inserted: 1`
  (a retry of the same `request_id` returns `duplicates: 1`).
- Confirm the event appears in the BurnCap dashboard, then summarize the call sites you
  instrumented and the `feature` labels you chose.
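
The ingest check above can be scripted rather than eyeballed. The sketch below compares the response to the first send with the response to a retry of the same `request_id`. `ingestProblems` is a hypothetical name, the response shape follows section 4, and two details are assumptions: that the retry reports `inserted: 0`, and that `unpriced_models` is a list of model names.

```typescript
// Response shape of POST /api/v1/events as listed in section 4.
// Element types of errors/unpriced_models are assumptions.
type EventsResponse = {
  inserted: number;
  duplicates: number;
  invalid: number;
  errors: unknown[];
  unpriced_models: string[];
};

// Returns human-readable problems; an empty array means the check passed.
function ingestProblems(first: EventsResponse, retry: EventsResponse): string[] {
  const problems: string[] = [];
  if (first.inserted !== 1) {
    problems.push(`first send: expected inserted 1, got ${first.inserted}`);
  }
  if (first.invalid > 0) {
    problems.push(`first send: ${first.invalid} invalid event(s)`);
  }
  // Assumption: an idempotent retry inserts nothing and reports one duplicate.
  if (retry.duplicates !== 1 || retry.inserted !== 0) {
    problems.push("retry of the same request_id was not deduplicated");
  }
  if (first.unpriced_models.length > 0) {
    problems.push(`no pricing for: ${first.unpriced_models.join(", ")}`);
  }
  return problems;
}
```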

Tailored for your stack

- Next.js + Vercel AI SDK: normalized usage from the ai package across every provider.
- Node / Next.js + OpenAI SDK: the openai client, with chat.completions usage mapping.
- Node + Anthropic SDK: @anthropic-ai/sdk, with cache-read vs cache-creation handled.
- Node + Google Gemini SDK: @google/genai, with usageMetadata and thinking tokens.
- Python (FastAPI + OpenAI / Anthropic): no SDK needed, just thin httpx calls and a fail-open guardrail.

Create a free workspace to get your API key — the in-app prompt bakes it in for you, so the agent has everything it needs.

BurnCap never proxies your model traffic and never stores prompts — these prompts only add out-of-band usage tracking and advisory budget checks. Enforcement always stays in your code.