ar-agents is a Mercado Pago Agent Toolkit for the Vercel AI SDK 6. It ships 89 typed tools that an LLM agent can call directly to drive Mercado Pago billing flows: subscriptions, payments, refunds, checkout pro, marketplace OAuth, cuotas (installments), QR in-store, 3DS challenge resolution, point-of-sale devices, webhooks. Sidecar packages cover AFIP/ARCA, WhatsApp Business Cloud, banking (CBU/CVU + BCRA), and shipping (Andreani/OCA/Correo Argentino).

How is this different from the official mercadopago SDK?

The official mercadopago SDK is a thin REST client. It does not ship Vercel AI SDK tool schemas, does not implement webhook HMAC verification with replay protection, does not run on Edge Runtime (Node-only), and does not gate irreversible operations. ar-agents adds all of that on top of the underlying API: 89 typed tools, deterministic idempotency keys derived from inputs, programmatic human-in-the-loop on refund/cancel/delete, npm provenance attestation, Vercel KV adapters via subpath, OpenTelemetry instrumentation. You can use both packages in the same project; ar-agents wraps the underlying API directly without depending on the official SDK.

Does ar-agents work on Edge Runtime?

Yes. The whole package is Web Crypto-based with no node:crypto dependency, so it runs on Vercel Edge Functions, Cloudflare Workers, Deno, and any other V8-isolate runtime. Webhook signature verification, HMAC, and idempotency-key generation all use the Web Crypto API.

What is HITL (human-in-the-loop) in ar-agents?

Eight tools mutate state irreversibly (refund_payment, cancel_subscription, delete_customer_card, etc.). The toolkit accepts a requireConfirmation callback that gates each invocation: the tool function literally will not execute until your callback returns true. This is a programmatic gate, not just an LLM instruction. You can show the user a UI and wait for their explicit approval before any irreversible operation runs.
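The gating pattern can be sketched as a wrapper around a tool's execute function. This is a minimal illustration, not the toolkit's actual API: `gateTool` and `ConfirmFn` are hypothetical names, and only the idea of a `requireConfirmation` callback that must return true comes from the description above.

```typescript
// Hypothetical sketch of a programmatic confirmation gate (not the ar-agents API).
type ConfirmFn = (toolName: string, input: unknown) => Promise<boolean>;

function gateTool<I, O>(
  toolName: string,
  execute: (input: I) => Promise<O>,
  requireConfirmation: ConfirmFn,
): (input: I) => Promise<O> {
  return async (input: I) => {
    // The wrapped function is never reached unless the callback resolves to true.
    const approved = await requireConfirmation(toolName, input);
    if (approved !== true) {
      throw new Error(`${toolName} was not confirmed by a human`);
    }
    return execute(input);
  };
}
```

Because the check lives in code rather than in the prompt, a model that ignores its instructions still cannot reach the underlying call without the callback's approval.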

How does idempotency work?

Every POST request gets an auto-generated idempotency key. For LLM-driven retries, four mutating tools (create_payment, create_subscription, create_payment_preference, refund_payment) use a deterministic key derived from a SHA-256 hash of the meaningful inputs (external_reference, amount, payment_method, etc.). Same inputs produce the same key, so retries return the existing resource instead of double-charging the customer.
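A deterministic key of this kind can be sketched with Web Crypto alone. The function name, the choice of fields, and the hex output format below are assumptions made for illustration; the toolkit's real derivation may differ.

```typescript
// Hypothetical sketch: name, field set, and key format are assumptions,
// not the toolkit's actual derivation.
async function deterministicKey(
  tool: string,
  fields: Record<string, string | number>,
): Promise<string> {
  // Sort field names so insertion order never changes the hash input.
  const pairs = Object.keys(fields).sort().map((k) => [k, fields[k]]);
  const bytes = new TextEncoder().encode(JSON.stringify([tool, pairs]));
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  // Hex-encode the 32-byte digest into a 64-character key.
  return Array.from(new Uint8Array(digest))
    .map((b) => ("0" + b.toString(16)).slice(-2))
    .join("");
}
```

Two calls with the same tool name and the same field values yield the same key regardless of property order, while changing any value (for example the amount) yields a different key.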

Is ar-agents free and open source?

Yes. MIT license. No paid tier, no telemetry phone-home, no usage caps. The package is published to npm under the @ar-agents scope with SLSA v1 provenance attestations.

What about AFIP, WhatsApp, banking, shipping?

Sidecar packages cover the rest of the Argentine business stack: @ar-agents/identity (CUIT/CUIL validation + AFIP/ARCA padron lookup with monotributo category and IVA condition), @ar-agents/facturacion (AFIP/ARCA factura electronica via WSFE), @ar-agents/whatsapp (WhatsApp Business Cloud API with HMAC webhook verify and AR phone normalizer), @ar-agents/banking (CBU/CVU validation + BCRA Central de Deudores), @ar-agents/shipping (Andreani, OCA, Correo Argentino), @ar-agents/identity-attest (HMAC-signed verification orchestrator). Each ships independently to npm.

Is there a Model Context Protocol (MCP) server?

Yes. @ar-agents/mcp bundles all 7 ar-agents packages into a single MCP server compatible with Claude Desktop, Cursor, Codeium, Continue, Cline, or any MCP host. Auto-detects which packages to enable from environment variables. Listed on Glama (glama.ai/mcp/servers/ar-agents/ar-agents) and the official MCP Registry (io.github.ar-agents/mcp).

/architecture/audit-log · the HMAC + KV + verify lifecycle

The forensic primitive that lets RFC-001 § 9.2 claim 'legally probative'. Canonical-JSON serialization → HMAC-SHA256 signature → Vercel KV append-only storage → public read with server-side re-verify. Every design choice traced to a concrete failure mode it prevents.

The audit log is the single most load-bearing primitive in the ar-agents stack. The whole regulator pitch ("this isn't just a UI, it's mechanically forensic") collapses if any of the four steps fails:

Canonical serialization must be deterministic and stable across writes + reads (or the same entry would sign differently each time).
HMAC-SHA256 must be computed identically on sign and verify (or every signed entry would falsely appear tampered).
Storage must persist append-only across Edge instances (or reads from a different instance would see an empty log).
Public re-verification must be available to any third party (or the "anyone can verify" claim is theoretical).

This page is the line-by-line walkthrough of how each step is implemented, why it's implemented that way, and what would break if it weren't.

1 · The data shape

Every entry is a flat object with these fields. The shape is defined in src/lib/audit.ts:

interface AuditEntry {
  id: string;                    // ISO timestamp + 8-char random suffix
  sessionId: string;             // 8-64 char [A-Za-z0-9_-]
  ts: string;                    // ISO 8601 UTC
  tool: string;                  // "validate_cuit" | "crear_factura" | etc.
  governance: AuditGovernance;   // RFC-001 governance class
  input: unknown;                // canonical-JSON-serializable
  output?: unknown;              // optional, omitted on errored
  errored?: boolean;
  durationMs?: number;
  hmac: string | null;           // "sha256:<hex>" — null only when secret not wired
}

Why this exact shape: the HMAC needs a fixed input space, so all fields are explicit, none are inferred at read time. The id is ISO-prefixed so the natural string sort matches chronological order — useful when a downstream consumer wants to merge entries from multiple sources without timestamp parsing. sessionId validates against a strict regex (/^[A-Za-z0-9_-]{8,64}$/) — short enough to be UUIDs, long enough to be opaque tokens, no characters that need URL-encoding.
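The two properties claimed for id and sessionId can be checked in a few lines. The regex is the one quoted above; `makeEntryId` is a hypothetical builder, and its separator and suffix handling are assumptions rather than the code in src/lib/audit.ts.

```typescript
// Regex quoted in the text above.
const SESSION_ID_RE = /^[A-Za-z0-9_-]{8,64}$/;

function isValidSessionId(candidate: string): boolean {
  return SESSION_ID_RE.test(candidate);
}

// Hypothetical id builder: the "-" separator and caller-supplied suffix are assumptions.
function makeEntryId(now: Date, suffix: string): string {
  return `${now.toISOString()}-${suffix}`;
}

// A hyphenated UUID (36 chars) fits the 8-64 range and the allowed alphabet.
console.log(isValidSessionId("123e4567-e89b-12d3-a456-426614174000")); // true
console.log(isValidSessionId("short")); // false (fewer than 8 chars)

// ISO 8601 timestamps are fixed-width, so plain string comparison is chronological.
const earlier = makeEntryId(new Date(0), "zzzzzzzz");
const later = makeEntryId(new Date(1000), "00000000");
console.log(earlier < later); // true
```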

2 · Canonical-JSON serialization

The HMAC is computed over a canonical JSON serialization of the entry, with object keys sorted alphabetically. Without this, two entries with the same data but different key insertion order would sign differently — and JavaScript's default JSON.stringify uses insertion order, not alphabetical.

function canonical(value: unknown): string {
  if (value === null || typeof value !== "object")
    return JSON.stringify(value);
  if (Array.isArray(value))
    return `[${value.map(canonical).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
}

Two invariants that took a real bug to find:

Sign and verify must omit the same fields. The hmac field obviously can't be in the serialization (it doesn't exist yet at sign time, and verify would loop on itself). The original implementation had a subtle bug: signEntry received an object with hmac: null already set and serialized it as-is, while verifyEntry destructured hmac out before serializing. Result: every signed entry appeared tampered on verify. Fixed in commit 184a424: both functions now strip hmac at runtime before serializing. Caught by the unit tests in apps/landing/test/audit.test.ts.
Stable across object construction. A test asserts that canonical({a:1, b:2}) === canonical({b:2, a:1}). If a downstream refactor breaks this (e.g., switching to Map internally), the test fires.
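The stability invariant is easy to see by running the function on the same data built in two different orders. The block below repeats canonical from this section unchanged so that it runs on its own; the sample objects are illustrative.

```typescript
// canonical() as shown in this section, repeated so the example is self-contained.
function canonical(value: unknown): string {
  if (value === null || typeof value !== "object")
    return JSON.stringify(value);
  if (Array.isArray(value))
    return `[${value.map(canonical).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
}

const first = canonical({ tool: "validate_cuit", input: { b: 2, a: 1 } });
const second = canonical({ input: { a: 1, b: 2 }, tool: "validate_cuit" });

console.log(first === second); // true
console.log(first); // {"input":{"a":1,"b":2},"tool":"validate_cuit"}

// Arrays keep their order: position in an array is data, not presentation.
console.log(canonical([2, 1])); // [2,1]
```

One behaviour worth knowing when porting this function: as written, a property explicitly set to undefined is serialized as the text undefined rather than dropped (JSON.stringify would omit it), so a port may want to remove such keys before signing.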

3 · HMAC-SHA256 via Web Crypto

The signature uses Web Crypto's crypto.subtle.sign + verify. Web Crypto-only is a hard requirement of the Edge Runtime contract (see /architecture); node:crypto isn't available there. The secret is a 64-char hex string from openssl rand -hex 32, lives in AUDIT_HMAC_SECRET env var, and is imported into a CryptoKey once per process and cached.

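// Module-level state this excerpt relies on (declarations assumed; the original snippet omits them).
const enc = new TextEncoder();
const cachedKey: { key?: CryptoKey; secret?: string } = {};

async function getHmacKey(): Promise<CryptoKey | null> {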
  const secret = process.env.AUDIT_HMAC_SECRET?.trim();
  if (!secret) return null;
  if (cachedKey.key && cachedKey.secret === secret) return cachedKey.key;
  const key = await crypto.subtle.importKey(
    "raw",
    enc.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,        // not extractable — can't be exported back
    ["sign", "verify"],
  );
  cachedKey.key = key;
  cachedKey.secret = secret;
  return key;
}

Why extractable: false: even though the secret already lives in process env, marking the key non-extractable prevents accidental serialization (e.g., via debugger tools or a downstream library that introspects keys). It is defense in depth that costs about two lines of code.

Verification uses crypto.subtle.verify directly — Web Crypto's implementation is constant-time on the byte comparison (per WebCrypto spec § 18). A naive === on hex strings would be timing-attack vulnerable. We never roll our own.
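A sign/verify pair built on these primitives might look like the sketch below. It is not the code in src/lib/audit.ts: `importHmacKey`, `payloadBytes`, and the loose entry type are hypothetical, and only the "sha256:<hex>" format, the strip-hmac rule, and the use of crypto.subtle.sign and crypto.subtle.verify come from the text.

```typescript
// Sketch under assumptions: helper names and the entry type are hypothetical.
const enc = new TextEncoder();

function canonical(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
}

async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw", enc.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign", "verify"],
  );
}

// Sign and verify both go through this one function, so they cannot
// disagree about which fields belong to the signed payload.
function payloadBytes(entry: Record<string, unknown>) {
  const { hmac: _omitted, ...rest } = entry;
  return enc.encode(canonical(rest));
}

async function signEntry(entry: Record<string, unknown>, key: CryptoKey): Promise<string> {
  const sig = await crypto.subtle.sign("HMAC", key, payloadBytes(entry));
  const hex = Array.from(new Uint8Array(sig))
    .map((b) => ("0" + b.toString(16)).slice(-2))
    .join("");
  return `sha256:${hex}`;
}

async function verifyEntry(entry: Record<string, unknown>, key: CryptoKey): Promise<boolean> {
  const stored = entry.hmac;
  const match = typeof stored === "string" ? /^sha256:([0-9a-f]{64})$/.exec(stored) : null;
  if (!match) return false; // missing or malformed signature string
  const hex = match[1];
  const sig = new Uint8Array(32);
  for (let i = 0; i < 32; i++) sig[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  // The comparison itself is delegated to Web Crypto, never done with ===.
  return crypto.subtle.verify("HMAC", key, sig, payloadBytes(entry));
}
```

Routing both paths through a single payload helper is one way to make the hmac: null mismatch described in section 2 structurally impossible.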

4 · Storage: Vercel KV (Upstash) with in-memory fallback

Entries land in a Vercel KV (Upstash Redis) list keyed by play:audit:{sessionId}. Append via RPUSH, read via LRANGE 0 -1, TTL via EXPIRE at 7 days. The TTL bounds cost — KV free tier has finite storage — and the 7-day window is long enough to span a forensic challenge cycle while short enough to let demo sessions naturally expire.

if (isKvWired()) {
  try {
    await kv.rpush(key(sessionId), entry);
    await kv.expire(key(sessionId), ENTRY_TTL_SECONDS);
  } catch {
    // KV down — fall through to in-memory so the demo doesn't break.
    const arr = memStore.get(sessionId) ?? [];
    arr.push(entry);
    memStore.set(sessionId, arr);
  }
} else {
  // No KV — in-memory only. Per-instance, no cross-Edge persistence.
  // Accepted degradation for PR previews + local dev without secrets.
  ...
}

Why both a KV path and an in-memory fallback: PR previews and local dev don't have KV credentials, but the demo still has to work for a maintainer testing a feature. The fallback is per-instance (Edge functions don't share memory across cold starts), so cross-instance reads return empty. Every smoke test in CI, however, hits the same instance once and verifies the round-trip works. Production mode (KV) is verified end-to-end via the live probe session from earlier deploys.
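The same degradation policy can be expressed behind a small storage interface, which also makes the "append + read order is preserved across both backends" property easy to test. Everything here (`AuditStore`, `MemoryStore`, `FallbackStore`) is a hypothetical sketch and not the structure of the real module.

```typescript
// Hypothetical storage abstraction; names are illustrative, not from the codebase.
interface AuditStore<T> {
  append(sessionId: string, entry: T): Promise<void>;
  read(sessionId: string): Promise<T[]>;
}

// Per-instance store: nothing here survives a cold start or crosses instances.
class MemoryStore<T> implements AuditStore<T> {
  private readonly bySession = new Map<string, T[]>();

  async append(sessionId: string, entry: T): Promise<void> {
    const list = this.bySession.get(sessionId) ?? [];
    list.push(entry);
    this.bySession.set(sessionId, list);
  }

  async read(sessionId: string): Promise<T[]> {
    return [...(this.bySession.get(sessionId) ?? [])];
  }
}

// Tries the primary backend first and degrades to the fallback on failure
// or when no primary is configured (primary === null).
class FallbackStore<T> implements AuditStore<T> {
  private readonly primary: AuditStore<T> | null;
  private readonly fallback: AuditStore<T>;

  constructor(primary: AuditStore<T> | null, fallback: AuditStore<T>) {
    this.primary = primary;
    this.fallback = fallback;
  }

  async append(sessionId: string, entry: T): Promise<void> {
    if (this.primary) {
      try {
        await this.primary.append(sessionId, entry);
        return;
      } catch (err) {
        // Primary is down: fall through so the caller still gets a log entry.
      }
    }
    await this.fallback.append(sessionId, entry);
  }

  async read(sessionId: string): Promise<T[]> {
    if (this.primary) {
      try {
        return await this.primary.read(sessionId);
      } catch (err) {
        // Primary is down: serve whatever this instance holds.
      }
    }
    return this.fallback.read(sessionId);
  }
}
```

A KV-backed implementation of the same interface would wrap the rpush and expire calls shown above; it is left out here because it needs live credentials.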

5 · Read + verify lifecycle

The read endpoint (GET /api/play/audit/{sessionId}) returns all entries. The query param ?verify=1 additionally re-runs HMAC verification per entry and returns aggregate stats:

{
  "sessionId": "...",
  "backend": "vercel-kv",
  "count": 5,
  "entries": [...],
  "verification": {            // only when ?verify=1
    "total": 5,
    "verified": 5,
    "tampered": 0,
    "hmacWired": true
  }
}

The verification re-imports the same AUDIT_HMAC_SECRET, walks each entry, strips the hmac field, recomputes the canonical serialization, and compares the recomputed signature to the stored one via constant-time crypto.subtle.verify. Any mismatch increments tampered.

The endpoint is intentionally unauthenticated. Session ids are opaque enough that enumeration is not a meaningful attack (UUIDs + the 8-64 char regex make brute-force impractical), and the public-readability is a feature: anyone can ask "is this log clean?" without coordinating with the operator. RFC-001 § 9.2 hinges on this.
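On the consumer side, the response shape above reduces to a three-way verdict. The interfaces mirror the JSON shown in this section; the `verdict` function and its "unverifiable" outcome are an illustrative assumption about how a client might interpret a response that lacks verification data.

```typescript
// Interfaces mirror the JSON response shown above.
interface VerificationStats {
  total: number;
  verified: number;
  tampered: number;
  hmacWired: boolean;
}

interface AuditReadResponse {
  sessionId: string;
  backend: string;
  count: number;
  entries: unknown[];
  verification?: VerificationStats; // present only when ?verify=1 was requested
}

type Verdict = "clean" | "tampered" | "unverifiable";

// Hypothetical client-side helper, not part of the toolkit.
function verdict(res: AuditReadResponse): Verdict {
  const v = res.verification;
  if (!v || !v.hmacWired) return "unverifiable";
  if (v.tampered > 0) return "tampered";
  return v.verified === v.total ? "clean" : "unverifiable";
}

const sample: AuditReadResponse = {
  sessionId: "demo-session",
  backend: "vercel-kv",
  count: 5,
  entries: [],
  verification: { total: 5, verified: 5, tampered: 0, hmacWired: true },
};
console.log(verdict(sample)); // clean
```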

6 · Streaming reads via SSE

For consumers that want real-time updates (e.g., the live /dashboard view or a compliance ops tool watching tenants), GET /api/play/audit-stream/{sessionId} returns Server-Sent Events. Initial snapshot + delta-emit on a 2s tick + 15s keep-alive ping + 5min uptime cap. EventSource clients auto-reconnect.

Why polling KV every 2s rather than Redis pub/sub: the audit write is already a KV operation. Adding a separate pub/sub channel doubles the failure modes (entry lands in KV but pub/sub message lost). Polling against the same KV is simpler + idempotent — duplicate ticks read the same state and emit no events. The 2s tick is well under what KV can sustain on the free tier.
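The "duplicate ticks emit no events" property follows from tracking a cursor into the list. The sketch below shows that logic in isolation, without a server; the event name entry, the ping comment, and the function names are assumptions, while the frame syntax is standard Server-Sent Events.

```typescript
// Formats one Server-Sent Events frame. JSON.stringify never emits raw
// newlines, so the payload always fits on a single data: line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines starting with ":" are SSE comments; clients ignore them, which makes
// them suitable as keep-alive pings.
function keepAliveFrame(): string {
  return ": ping\n\n";
}

// One polling tick: emit only entries past the cursor, then advance it.
// Hypothetical helper; in a real handler `entries` would come from the KV list read.
function diffTick<T>(entries: T[], cursor: number): { frames: string[]; cursor: number } {
  const fresh = entries.slice(cursor);
  return {
    frames: fresh.map((entry) => sseFrame("entry", entry)),
    cursor: entries.length,
  };
}

const tick1 = diffTick(["a", "b"], 0);            // initial snapshot: 2 frames
const tick2 = diffTick(["a", "b"], tick1.cursor); // nothing new: 0 frames
const tick3 = diffTick(["a", "b", "c"], tick2.cursor); // one delta frame
console.log(tick1.frames.length, tick2.frames.length, tick3.frames.length); // 2 0 1
```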

7 · The badge endpoint

GET /api/badge/{sessionId} returns a 24px shields.io-style SVG that updates with the verification state: blue "verified · N/M" when clean, red "tampered · N" when at least one signature mismatches, gray "no-hmac" or "no entries" otherwise. 60-second cache.

This is the surface that propagates the forensic claim virally. An operator embeds the badge in their landing page; any visitor sees a recomputable verification status without knowing what HMAC means. The badge link itself can be shared in WhatsApp / Slack / Twitter — preview cards render the SVG directly.
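The mapping from verification state to badge text is small enough to sketch. The labels and colors come from the description above; the precedence between the two gray states, the reading of N/M as verified/total, and the `badgeFor` name are assumptions.

```typescript
type BadgeColor = "blue" | "red" | "gray";

interface Badge {
  label: string;
  color: BadgeColor;
}

interface BadgeStats {
  total: number;
  verified: number;
  tampered: number;
  hmacWired: boolean;
}

// Hypothetical mapping; the order of the two gray checks is an assumption.
function badgeFor(stats: BadgeStats): Badge {
  if (!stats.hmacWired) return { label: "no-hmac", color: "gray" };
  if (stats.total === 0) return { label: "no entries", color: "gray" };
  if (stats.tampered > 0) return { label: `tampered · ${stats.tampered}`, color: "red" };
  return { label: `verified · ${stats.verified}/${stats.total}`, color: "blue" };
}

console.log(badgeFor({ total: 5, verified: 5, tampered: 0, hmacWired: true }).label);
// verified · 5/5
```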

8 · Probative-value reasoning (RFC-001 § 9.2)

For the audit log to be legally probative, three things must hold:

The signature must be reproducible by a third party. Anyone with the public-readable entry + knowledge of the canonical-JSON algorithm + the operator's server-held secret can recompute. The endpoint exposes the ?verify=1 path so the third party doesn't need the secret to ask "is it clean?".
Tampering must leave a mechanical trail. Editing any field of a signed entry breaks the HMAC. The forgery path requires either (a) the secret, or (b) forging a valid HMAC-SHA256 tag without it, which is computationally infeasible.
The operator must commit to retaining the secret. If the operator rotates AUDIT_HMAC_SECRET mid-session, prior entries become un-verifiable. Production deployments should treat secret rotation as a forensic event itself: rotate, re-sign all entries with the new secret, publish the rotation event in a public log.

The full RFC-001 § 9 text covers the legal-framework arguments. This page is the engineering side of the same contract.

9 · How a regulator audits this in practice

Open /play in a browser tab. Note the per-page-load sessionId shown in the audit pane.
Run any scenario. Tool calls land in the audit pane, each with an HMAC suffix.
Open /api/play/audit/{sessionId}?verify=1 in a new tab. Confirm the JSON shows verification.verified equal to verification.total and tampered: 0.
(Optional) attempt to demonstrate tampering: hit POST /api/play/tamper-demo and confirm the response shows the original entry verifies, the mutated entry does not. The demo is read-only (it doesn't touch the live log), but it proves the algorithm catches edits mechanically.
(Optional) verify with your own toolkit: pull the same ?verify=1 JSON, recompute HMAC-SHA256 using a server-side helper of your choice (the canonical-JSON algorithm is published above + in src/lib/audit.ts), compare to the stored signature.

10 · Tests as the proof contract

The audit primitives have 16 unit tests in apps/landing/test/audit.test.ts. Each test is a clause of the proof contract:

Sign + verify must agree on the input space (the bug-fix test).
Tampering on input or tool name must be detected.
Malformed HMAC strings must be rejected (no parse-side oracle).
Object-key reordering must produce the same signature (canonical-JSON stability).
The backend autodetects from env (vercel-kv when KV vars present, else in-memory).
Append + read order is preserved across both backends.

Plus 18 SSE primitive tests, 17 badge tests, 34 incorporate client tests = 85 TS tests. Plus 22 Python tests on the SDK port. Total: 107.

11 · Open questions

Long-term retention: KV TTL is 7 days. For regulated workloads that need year-scale retention, the recommended pattern is a nightly cron that mirrors entries to S3 with object lock. The toolkit doesn't ship this yet — operators wire it.
Multi-region replication: KV is currently sa-east-1 (São Paulo). Cross-region reads work via Upstash replication but add latency. Worth the trade-off for AR-side sociedades; might not be for global multi-tenant workloads.
Threshold-based key rotation: a future iteration could split signing into "current" + "previous" keys to support rolling rotation without invalidating existing entries.
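The "current" + "previous" idea from the last item could be prototyped as a verify loop over an ordered key list. This is a speculative sketch of a feature the toolkit does not ship; `importHmacKey` and `verifyWithRotation` are hypothetical names.

```typescript
// Speculative sketch of rolling rotation; not part of the toolkit.
const encoder = new TextEncoder();

async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw", encoder.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign", "verify"],
  );
}

// Tries each key in order (current first, then previous ones) and returns
// the index of the first key that verifies, or -1 when none does.
async function verifyWithRotation(
  keys: CryptoKey[],
  signature: ArrayBuffer,
  payload: string,
): Promise<number> {
  const data = encoder.encode(payload);
  for (let i = 0; i < keys.length; i++) {
    if (await crypto.subtle.verify("HMAC", keys[i], signature, data)) return i;
  }
  return -1;
}
```

Returning the index rather than a boolean lets a caller report which key generation signed an entry, which could itself be surfaced as part of the rotation event.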

References

src/lib/audit.ts — primary implementation.
test/audit.test.ts — 16 unit tests.
RFC-001 § 9 — legal framework.
/verify — the public re-verification UI.
/dashboard/{sessionId} — the live forensic timeline.
W3C WebCrypto §HMAC operations — the spec we depend on.
RFC 8785 — JSON Canonicalization Scheme — the spec our canonicalization is heavily inspired by (we ship a subset, not full JCS).
