The audit log is the single most load-bearing primitive in the ar-agents stack. The whole regulator pitch ("this isn't just a UI, it's mechanically forensic") collapses if any of the four steps fails:
- Canonical serialization must be deterministic and stable across writes + reads (or the same entry would sign differently each time).
- HMAC-SHA256 must be computed identically on sign and verify (or every signed entry would falsely appear tampered).
- Storage must persist append-only across Edge instances (or reads from a different instance would see an empty log).
- Public re-verification must be available to any third party (or the "anyone can verify" claim is theoretical).
This page is the line-by-line walkthrough of how each step is implemented, why it's implemented that way, and what would break if it weren't.
1 · The data shape
Every entry is a flat object with these fields. The shape is defined in src/lib/audit.ts:
interface AuditEntry {
  id: string;                  // ISO timestamp + 8-char random suffix
  sessionId: string;           // 8-64 char [A-Za-z0-9_-]
  ts: string;                  // ISO 8601 UTC
  tool: string;                // "validate_cuit" | "crear_factura" | etc.
  governance: AuditGovernance; // RFC-001 governance class
  input: unknown;              // canonical-JSON-serializable
  output?: unknown;            // optional, omitted when the call errored
  errored?: boolean;
  durationMs?: number;
  hmac: string | null;         // "sha256:<hex>"; null only when the secret is not wired
}
Why this exact shape: the HMAC needs a fixed input space, so all fields are explicit and none are inferred at read time. The id is ISO-prefixed so the natural string sort matches chronological order, which is useful when a downstream consumer wants to merge entries from multiple sources without timestamp parsing. sessionId validates against a strict regex (/^[A-Za-z0-9_-]{8,64}$/): the length range admits 36-character UUIDs as well as other opaque tokens, and the alphabet contains no characters that need URL-encoding.
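A minimal sketch of how an id and a sessionId matching this shape could be produced and checked. Only the regex comes from the text above; the helper names makeEntryId and isValidSessionId, the "-" separator, and the hex alphabet of the suffix are assumptions for illustration, not the contents of src/lib/audit.ts.

```typescript
// Only SESSION_ID_RE is taken from the text; everything else is hypothetical.
const SESSION_ID_RE = /^[A-Za-z0-9_-]{8,64}$/;

function isValidSessionId(sessionId: string): boolean {
  return SESSION_ID_RE.test(sessionId);
}

// The ISO timestamp comes first so a plain string sort follows chronological order.
// The "-" separator and the hex suffix alphabet are assumptions.
function makeEntryId(now: Date = new Date()): string {
  const bytes = new Uint8Array(4);
  crypto.getRandomValues(bytes); // 4 random bytes -> 8 hex characters
  const suffix = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${now.toISOString()}-${suffix}`;
}
```

Two ids created in the same millisecond share a prefix and are ordered by their random suffix, so the sort is chronological only down to millisecond resolution.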
2 · Canonical-JSON serialization
The HMAC is computed over a canonical JSON serialization of the entry, with object keys sorted alphabetically. Without this, two entries with the same data but different key insertion order would sign differently — and JavaScript's default JSON.stringify uses insertion order, not alphabetical.
function canonical(value: unknown): string {
  if (value === null || typeof value !== "object")
    return JSON.stringify(value);
  if (Array.isArray(value))
    return `[${value.map(canonical).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
}
Two invariants that took a real bug to find:
- Sign and verify must omit the same fields. The hmac field obviously can't be in the serialization (it doesn't exist yet at sign time, and verify would loop on itself). The original implementation had a subtle bug: signEntry received an object with hmac: null already set, while verifyEntry destructured hmac out before serializing. Result: every signed entry appeared tampered on verify. Fixed in commit 184a424: both functions now strip hmac at runtime before serializing. Caught by the unit tests in apps/landing/test/audit.test.ts.
- Stable across object construction. A test asserts that canonical({a:1, b:2}) === canonical({b:2, a:1}). If a downstream refactor breaks this (e.g., switching to Map internally), the test fires.
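Both invariants can be exercised in a few lines. The sketch below reproduces canonical from the listing above so it is self-contained; signingPayload is a hypothetical helper name showing one way to make sign and verify share a single strip step, and the sample entries are invented.

```typescript
// canonical() is reproduced from the listing above.
function canonical(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
}

// Hypothetical helper: the single place where hmac is stripped,
// intended to be called by both the sign path and the verify path.
function signingPayload(entry: Record<string, unknown>): string {
  const { hmac: _ignored, ...rest } = entry;
  return canonical(rest);
}

// Same data, different key order, different hmac state.
const unsigned = { tool: "validate_cuit", input: { b: 2, a: 1 }, hmac: null };
const signed = { hmac: "sha256:abc", input: { a: 1, b: 2 }, tool: "validate_cuit" };

console.log(signingPayload(unsigned) === signingPayload(signed)); // true
console.log(signingPayload(unsigned));
// {"input":{"a":1,"b":2},"tool":"validate_cuit"}
```

One JavaScript detail worth keeping in mind with this serializer: JSON.stringify(undefined) returns undefined rather than a string, so a key explicitly set to undefined would serialize as the literal text undefined. Omitting optional fields entirely, as the data shape does for output, avoids that case.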
3 · HMAC-SHA256 via Web Crypto
The signature uses Web Crypto's crypto.subtle.sign + verify. Web Crypto-only is a hard requirement of the Edge Runtime contract (see /architecture); node:crypto isn't available there. The secret is a 64-char hex string from openssl rand -hex 32, lives in the AUDIT_HMAC_SECRET env var, and is imported into a CryptoKey once per process and cached.
async function getHmacKey(): Promise<CryptoKey | null> {
  const secret = process.env.AUDIT_HMAC_SECRET?.trim();
  if (!secret) return null;
  if (cachedKey.key && cachedKey.secret === secret) return cachedKey.key;
  const key = await crypto.subtle.importKey(
    "raw",
    enc.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false, // not extractable: can't be exported back
    ["sign", "verify"],
  );
  cachedKey.key = key;
  cachedKey.secret = secret;
  return key;
}
Why extractable: false: even though the secret already lives in process env, marking the key non-extractable prevents accidental serialization (e.g., via debugger tools or a downstream library that introspects keys). Defense in depth for ~2 lines of additional safety.
Verification uses crypto.subtle.verify directly — Web Crypto's implementation is constant-time on the byte comparison (per WebCrypto spec § 18). A naive === on hex strings would be timing-attack vulnerable. We never roll our own.
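The sign and verify halves might look roughly like the sketch below. It assumes a runtime that exposes Web Crypto as the global crypto object. The function names importHmacKey, signPayload and verifyPayload are hypothetical, as is the choice to accept only lowercase hex; the "sha256:<hex>" format is the one described for the hmac field.

```typescript
const enc = new TextEncoder();

async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    enc.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// Produce "sha256:<hex>" over an already-canonicalized payload string.
async function signPayload(key: CryptoKey, payload: string): Promise<string> {
  const sig = await crypto.subtle.sign("HMAC", key, enc.encode(payload));
  const hex = Array.from(new Uint8Array(sig), (b) => b.toString(16).padStart(2, "0")).join("");
  return `sha256:${hex}`;
}

// Parse the stored signature strictly, then let crypto.subtle.verify compare.
async function verifyPayload(key: CryptoKey, payload: string, hmac: string): Promise<boolean> {
  const match = /^sha256:([0-9a-f]{64})$/.exec(hmac);
  if (!match) return false; // malformed signatures are rejected before any crypto runs
  const bytes = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    bytes[i] = parseInt(match[1].slice(i * 2, i * 2 + 2), 16);
  }
  return crypto.subtle.verify("HMAC", key, bytes, enc.encode(payload));
}
```

The stored hex string is decoded back to bytes and handed to crypto.subtle.verify, so the application code never compares signatures itself.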
4 · Storage: Vercel KV (Upstash) with in-memory fallback
Entries land in a Vercel KV (Upstash Redis) list keyed by play:audit:{sessionId}. Append via RPUSH, read via LRANGE 0 -1, TTL via EXPIRE at 7 days. The TTL bounds cost — KV free tier has finite storage — and the 7-day window is long enough to span a forensic challenge cycle while short enough to let demo sessions naturally expire.
if (isKvWired()) {
  try {
    await kv.rpush(key(sessionId), entry);
    await kv.expire(key(sessionId), ENTRY_TTL_SECONDS);
  } catch {
    // KV down: fall through to in-memory so the demo doesn't break.
    const arr = memStore.get(sessionId) ?? [];
    arr.push(entry);
    memStore.set(sessionId, arr);
  }
} else {
  // No KV: in-memory only. Per-instance, no cross-Edge persistence.
  // Accepted degradation for PR previews + local dev without secrets.
  ...
}
Why both a KV path and an in-memory fallback: PR previews and local dev don't have KV credentials, but the demo still has to work for a maintainer testing a feature. The fallback is per-instance (Edge functions don't share memory across cold starts), so cross-instance reads return empty, but every smoke-test in CI hits the same instance once and verifies the round-trip works. Production-mode (KV) is verified end-to-end via the live probe session from earlier deploys.
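The fallback behaviour can be tested without real KV credentials by injecting the backend. The sketch below is an illustration under assumptions: ListBackend and appendEntry are hypothetical names, and the real code calls the Vercel KV client directly rather than going through an interface. The key format and the 7-day TTL are the ones described above.

```typescript
// Hypothetical minimal interface covering the two calls used on the write path.
interface ListBackend {
  rpush(key: string, value: unknown): Promise<unknown>;
  expire(key: string, seconds: number): Promise<unknown>;
}

const ENTRY_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days
const memStore = new Map<string, unknown[]>();
const key = (sessionId: string) => `play:audit:${sessionId}`;

function appendToMemory(sessionId: string, entry: unknown): void {
  const arr = memStore.get(sessionId) ?? [];
  arr.push(entry);
  memStore.set(sessionId, arr);
}

// Appends one entry and reports which backend ended up storing it.
async function appendEntry(
  backend: ListBackend | null,
  sessionId: string,
  entry: unknown,
): Promise<"vercel-kv" | "in-memory"> {
  if (backend) {
    try {
      await backend.rpush(key(sessionId), entry);
      await backend.expire(key(sessionId), ENTRY_TTL_SECONDS);
      return "vercel-kv";
    } catch {
      // KV unavailable: fall through to the in-memory path below.
    }
  }
  appendToMemory(sessionId, entry);
  return "in-memory";
}
```

As in the listing above, a failure in expire after a successful rpush would leave the entry in KV and also append it to memory; whether that matters depends on how reads choose a backend.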
5 · Read + verify lifecycle
The read endpoint (GET /api/play/audit/{sessionId}) returns all entries. The query param ?verify=1 additionally re-runs HMAC verification per entry and returns aggregate stats:
{
  "sessionId": "...",
  "backend": "vercel-kv",
  "count": 5,
  "entries": [...],
  "verification": { // only when ?verify=1
    "total": 5,
    "verified": 5,
    "tampered": 0,
    "hmacWired": true
  }
}
The verification re-imports the same AUDIT_HMAC_SECRET, walks each entry, strips the hmac field, recomputes the canonical serialization, and compares the recomputed signature to the stored one via constant-time crypto.subtle.verify. Any mismatch increments tampered.
The endpoint is intentionally unauthenticated. Session ids are opaque enough that enumeration is not a meaningful attack (UUIDs + the 8-64 char regex make brute-force impractical), and the public-readability is a feature: anyone can ask "is this log clean?" without coordinating with the operator. RFC-001 § 9.2 hinges on this.
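The aggregation step behind the verification object could be sketched as follows. The names StoredEntry, VerificationStats and summarize are hypothetical, and the per-entry verifier is injected so the counting logic can be read on its own. How the real endpoint counts entries whose hmac is null is not stated above; this sketch assumes they count toward total only.

```typescript
interface StoredEntry {
  hmac: string | null;
  [field: string]: unknown;
}

interface VerificationStats {
  total: number;
  verified: number;
  tampered: number;
  hmacWired: boolean;
}

// verifyOne is injected; in a real handler it would strip hmac, canonicalize,
// and call crypto.subtle.verify.
async function summarize(
  entries: StoredEntry[],
  hmacWired: boolean,
  verifyOne: (entry: StoredEntry) => Promise<boolean>,
): Promise<VerificationStats> {
  let verified = 0;
  let tampered = 0;
  for (const entry of entries) {
    // Assumption: unsigned entries (or a missing secret) are neither verified nor tampered.
    if (!hmacWired || entry.hmac === null) continue;
    if (await verifyOne(entry)) verified++;
    else tampered++;
  }
  return { total: entries.length, verified, tampered, hmacWired };
}
```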
6 · Streaming reads via SSE
For consumers that want real-time updates (e.g., the live /dashboard view or a compliance ops tool watching tenants), GET /api/play/audit-stream/{sessionId} returns Server-Sent Events. The stream sends an initial snapshot, emits deltas on a 2s tick, sends a keep-alive ping every 15s, and is capped at 5min of uptime. EventSource clients auto-reconnect.
Why polling KV every 2s rather than Redis pub/sub: the audit write is already a KV operation. Adding a separate pub/sub channel doubles the failure modes (entry lands in KV but pub/sub message lost). Polling against the same KV is simpler + idempotent — duplicate ticks read the same state and emit no events. The 2s tick is well under what KV can sustain on the free tier.
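The idempotence argument is easy to see in code. This is a minimal sketch of one polling tick, assuming the log is append-only so a numeric cursor is enough to find new entries; the names sseFrame and tick and the "entry" event name are hypothetical, not the stream's actual wire format.

```typescript
// Format one SSE frame. JSON.stringify escapes newlines, so data stays on one line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines starting with ":" are SSE comments, which EventSource clients ignore;
// that makes them usable as keep-alive pings.
const KEEP_ALIVE = ": ping\n\n";

// One tick: emit only the entries past the cursor and return the advanced cursor.
function tick<T>(entries: T[], cursor: number): { frames: string[]; cursor: number } {
  const fresh = entries.slice(cursor);
  return {
    frames: fresh.map((e) => sseFrame("entry", e)),
    cursor: entries.length,
  };
}

// A duplicate tick over unchanged state emits nothing.
const first = tick([{ tool: "validate_cuit" }], 0);
const second = tick([{ tool: "validate_cuit" }], first.cursor);
console.log(first.frames.length, second.frames.length); // 1 0
```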
7 · The badge endpoint
GET /api/badge/{sessionId} returns a 24px shields.io-style SVG that updates with the verification state: blue "verified · N/M" when clean, red "tampered · N" when at least one signature mismatches, gray "no-hmac" or "no entries" otherwise. 60-second cache.
This is the surface that propagates the forensic claim virally. An operator embeds the badge in their landing page; any visitor sees a recomputable verification status without knowing what HMAC means. The badge link itself can be shared in WhatsApp / Slack / Twitter — preview cards render the SVG directly.
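The mapping from verification state to badge text is small enough to sketch. The labels and colors are the ones listed above; the function name badgeState and the order in which the gray cases are checked are assumptions, since the text does not say which wins when a session has no entries and no secret.

```typescript
interface BadgeState {
  label: string;
  color: "blue" | "red" | "gray";
}

interface Verification {
  total: number;
  verified: number;
  tampered: number;
  hmacWired: boolean;
}

function badgeState(v: Verification): BadgeState {
  // Assumed precedence: empty log first, then missing secret.
  if (v.total === 0) return { label: "no entries", color: "gray" };
  if (!v.hmacWired) return { label: "no-hmac", color: "gray" };
  if (v.tampered > 0) return { label: `tampered · ${v.tampered}`, color: "red" };
  return { label: `verified · ${v.verified}/${v.total}`, color: "blue" };
}

console.log(badgeState({ total: 5, verified: 5, tampered: 0, hmacWired: true }).label);
// verified · 5/5
```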
8 · Probative-value reasoning (RFC-001 § 9.2)
For the audit log to be legally probative, three things must hold:
- The signature must be reproducible by a third party. Anyone with the public-readable entry + knowledge of the canonical-JSON algorithm + the operator's server-held secret can recompute. The endpoint exposes the ?verify=1 path so the third party doesn't need the secret to ask "is it clean?".
- Tampering must leave a mechanical trail. Editing any field of a signed entry breaks the HMAC. The forgery path requires either (a) the secret, or (b) a collision in HMAC-SHA256, which is computationally infeasible.
- The operator must commit to retaining the secret. If the operator rotates AUDIT_HMAC_SECRET mid-session, prior entries become un-verifiable. Production deployments should treat secret rotation as a forensic event itself: rotate, re-sign all entries with the new secret, publish the rotation event in a public log.
The full RFC-001 § 9 text covers the legal-framework arguments. This page is the engineering side of the same contract.
9 · How a regulator audits this in practice
- Open /play in a browser tab. Note the per-page-load sessionId shown in the audit pane.
- Run any scenario. Tool calls land in the audit pane, each with an HMAC suffix.
- Open /api/play/audit/{sessionId}?verify=1 in a new tab. Confirm the JSON shows verification.verified equal to verification.total and tampered: 0.
- (Optional) attempt to demonstrate tampering: hit POST /api/play/tamper-demo and confirm the response shows the original entry verifies, the mutated entry does not. The demo is read-only (it doesn't touch the live log), but it proves the algorithm catches edits mechanically.
- (Optional) verify with your own toolkit: pull the same ?verify=1 JSON, recompute HMAC-SHA256 using a server-side helper of your choice (the canonical-JSON algorithm is published above + in src/lib/audit.ts), compare to the stored signature.
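The third step can also be scripted. The sketch below assumes the response shape shown in section 5 and a runtime with a global fetch; isClean and checkSession are hypothetical names, and baseUrl is a placeholder the caller supplies.

```typescript
interface VerifyReport {
  sessionId: string;
  count: number;
  verification?: {
    total: number;
    verified: number;
    tampered: number;
    hmacWired: boolean;
  };
}

// Clean means: verification ran, the secret was wired, and every entry verified.
// Note that an empty log passes this check vacuously.
function isClean(report: VerifyReport): boolean {
  const v = report.verification;
  if (!v || !v.hmacWired) return false;
  return v.tampered === 0 && v.verified === v.total;
}

// baseUrl is a placeholder; the path is the one given in the steps above.
async function checkSession(baseUrl: string, sessionId: string): Promise<boolean> {
  const url = `${baseUrl}/api/play/audit/${encodeURIComponent(sessionId)}?verify=1`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`audit endpoint returned ${res.status}`);
  return isClean((await res.json()) as VerifyReport);
}
```

A caller that wants to treat an empty session as suspicious can add a check on count before trusting the result.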
10 · Tests as the proof contract
The audit primitives have 16 unit tests in apps/landing/test/audit.test.ts. Each test is a clause of the proof contract:
- Sign + verify must agree on the input space (the bug-fix test).
- Tampering on input or tool name must be detected.
- Malformed HMAC strings must be rejected (no parse-side oracle).
- Object-key reordering must produce the same signature (canonical-JSON stability).
- The backend autodetects from env (vercel-kv when KV vars present, else in-memory).
- Append + read order is preserved across both backends.
Plus 18 SSE primitive tests, 17 badge tests, 34 incorporate client tests = 85 TS tests. Plus 22 Python tests on the SDK port. Total: 107.
11 · Open questions
- Long-term retention: KV TTL is 7 days. For regulated workloads that need year-scale retention, the recommended pattern is a nightly cron that mirrors entries to S3 with object lock. The toolkit doesn't ship this yet — operators wire it.
- Multi-region replication: KV is currently sa-east-1 (São Paulo). Cross-region reads work via Upstash replication but add latency. Worth the trade-off for AR-side sociedades; might not be for global multi-tenant workloads.
- Threshold-based key rotation: a future iteration could split signing into "current" + "previous" keys to support rolling rotation without invalidating existing entries.
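For the rolling-rotation idea, one possible shape is to try the current key and then the previous one on verify. This is a sketch of a design the toolkit does not ship, assuming Web Crypto is available as the global crypto object; the names importKey and verifyWithKeyring and the keyring structure are hypothetical.

```typescript
const enc = new TextEncoder();

async function importKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    enc.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// Try the current key first, then the previous one.
// Returns which key matched, or null when neither does.
async function verifyWithKeyring(
  keyring: { current: CryptoKey; previous?: CryptoKey },
  payload: string,
  signature: ArrayBuffer,
): Promise<"current" | "previous" | null> {
  const data = enc.encode(payload);
  if (await crypto.subtle.verify("HMAC", keyring.current, signature, data)) {
    return "current";
  }
  if (keyring.previous && (await crypto.subtle.verify("HMAC", keyring.previous, signature, data))) {
    return "previous";
  }
  return null;
}
```

Reporting which key matched lets a verifier distinguish entries signed before a rotation from entries signed after it, instead of marking the older ones as tampered.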
References
- src/lib/audit.ts — primary implementation.
- test/audit.test.ts — 16 unit tests.
- RFC-001 § 9 — legal framework.
- /verify — the public re-verification UI.
- /dashboard/{sessionId} — the live forensic timeline.
- W3C WebCrypto §HMAC operations — the spec we depend on.
- RFC 8785 — JSON Canonicalization Scheme — the spec our canonicalization is heavily inspired by (we ship a subset, not full JCS).