Rate limits & errors

One error envelope for everything, per-key token buckets, and the caps the gateway enforces.

Rate limits

Each key has a sustained requests-per-minute allowance and a burst capacity. Together they form a token bucket: up to the burst capacity of requests can be sent at once, and the bucket refills at the sustained rate. Both are set when the key is provisioned; ask [email protected] to raise them.
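To make the token-bucket behavior concrete, here is a minimal client-side sketch in Python. It is an illustration of the general technique, not the gateway's implementation; the class name `TokenBucket` and its fields are hypothetical, and the real bucket may differ in details such as refill granularity.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative model of a token bucket (hypothetical, not the gateway's code)."""
    rate_per_minute: float  # sustained allowance
    burst: float            # bucket capacity
    tokens: float = -1.0    # current tokens; a negative value means "start full"
    last: float = 0.0       # time of the last update, in seconds

    def __post_init__(self) -> None:
        if self.tokens < 0:
            self.tokens = self.burst

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate_per_minute=60` and `burst=5`, five calls at the same instant succeed, the sixth is refused, and one more succeeds a second later.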

Every response reports where you stand:

| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Your sustained requests/minute |
| X-RateLimit-Remaining | Requests left in the bucket right now |
| X-RateLimit-Reset | Seconds until the bucket refills |

Past the limit you get 429 with a Retry-After header:

```json
{ "error": { "type": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 1.2s.", "request_id": "…" } }
```

Honor Retry-After and back off; all generation requests are safe to retry (nothing is committed on a failed call).
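A retry loop along these lines is one way to honor Retry-After. This is a sketch under assumptions: `send` is a hypothetical callable that performs the request and returns `(status, headers, body)`, Retry-After is assumed to be a number of seconds (possibly fractional), header lookup is assumed to use exactly this casing, and the backoff constants are arbitrary.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt."""
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on 429 and 502 up to `max_attempts` total calls."""
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers, _ = response
        if status not in (429, 502):
            break
        sleep(retry_delay(status, headers, attempt))
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. In production you may also want to add random jitter to the backoff so that many clients do not retry in lockstep.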

The error envelope

Every error — validation, auth, rate limit, model failure, crash — is one shape:

```json
{ "error": { "type": "…", "message": "…", "request_id": "…" } }
```

| Status | type | Meaning |
| --- | --- | --- |
| 400 | invalid_request | Semantically invalid: over a cap, undecodable audio, unknown model, a conversation that breaks the turn rules. |
| 401 | authentication_error | Missing or invalid API key. |
| 404 | not_found | No such path. |
| 405 | method_not_allowed | Wrong HTTP method for the path. |
| 422 | invalid_request | The body doesn't match the schema (missing field, wrong type, out-of-range value). |
| 429 | rate_limit_exceeded | Over your key's limit; honor Retry-After. |
| 500 | internal_error | Unexpected failure on our side. Report it with the request_id. |
| 502 | inference_error | The model backend failed or timed out. Retryable. |
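Because every error shares one shape, a client can decode it in one place. The sketch below assumes only the three fields shown in the envelope; the `ApiError` class, the `parse_error` function, and the `unparseable_error` placeholder type are hypothetical client-side names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    type: str
    message: str
    request_id: Optional[str]


def parse_error(status: int, body: str) -> ApiError:
    """Decode the error envelope; tolerate bodies that are not the envelope."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status, err["type"], err.get("message", ""), err.get("request_id"))
    except (ValueError, KeyError, TypeError):
        # Client-side placeholder for a body that is not valid envelope JSON
        # (for example, a proxy's HTML error page).
        return ApiError(status, "unparseable_error", body[:200], None)
```

The fallback branch is defensive: the envelope is documented as universal, but an intermediary between client and gateway could still return something else.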

Request caps

The gateway enforces hard caps before anything reaches a model:

| Cap | Value |
| --- | --- |
| Text per request/turn | 8,000 characters |
| Turns per conversation | 64 |
| Audio per turn | 25 MiB decoded WAV |

Current values are always served at GET /v1/info under limits.
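Checking the caps client-side avoids a round trip that would end in a 400. The sketch below hardcodes the values from the table as defaults; in practice you would likely refresh them from GET /v1/info, whose exact field names under limits are not shown here. The function name `check_caps` is hypothetical, and counting characters with `len()` (Unicode code points) is an assumption about how the gateway counts.

```python
from typing import List

# Defaults copied from the caps table; assumed to be overridable at runtime.
MAX_TEXT_CHARS = 8_000
MAX_TURNS = 64
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB of decoded WAV


def check_caps(turn_texts: List[str], audio_sizes: List[int]) -> List[str]:
    """Return a list of human-readable cap violations (empty if none)."""
    problems = []
    if len(turn_texts) > MAX_TURNS:
        problems.append(f"{len(turn_texts)} turns exceeds the cap of {MAX_TURNS}")
    for i, text in enumerate(turn_texts):
        if len(text) > MAX_TEXT_CHARS:
            problems.append(f"turn {i}: {len(text)} characters exceeds {MAX_TEXT_CHARS}")
    for i, size in enumerate(audio_sizes):
        # `size` is the decoded WAV size in bytes, not the size of a compressed upload.
        if size > MAX_AUDIO_BYTES:
            problems.append(f"audio {i}: {size} bytes exceeds {MAX_AUDIO_BYTES}")
    return problems
```

A request that passes this check can still be rejected by the gateway, for example for undecodable audio or an unknown model, so treat it as a fast pre-filter rather than a guarantee.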

Debugging a failed call

  1. Read error.type — it's stable and machine-matchable; message is for humans.
  2. 4xx other than 429: fix the request (the message says which field).
  3. 429 / 502: retry with backoff (Retry-After for 429).
  4. Anything persistent: send us the request_id (also in the X-Request-ID response header).
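The steps above can be sketched as a small triage function that matches on error.type first and falls back to the status code for types it does not recognize. The action names (`retry`, `fix_request`, `report`) and the function name `triage` are hypothetical; the type strings come from the error table.

```python
# Map each documented error type to a suggested client action.
ACTIONS = {
    "rate_limit_exceeded": "retry",
    "inference_error": "retry",
    "invalid_request": "fix_request",
    "authentication_error": "fix_request",
    "not_found": "fix_request",
    "method_not_allowed": "fix_request",
    "internal_error": "report",
}


def triage(error_type: str, status: int) -> str:
    """Return 'retry', 'fix_request', or 'report' for a failed call."""
    action = ACTIONS.get(error_type)
    if action is not None:
        return action
    # Unknown type: fall back on the status code.
    if status in (429, 502):
        return "retry"
    if 400 <= status < 500:
        return "fix_request"
    return "report"
```

A caller would typically escalate a `retry` that keeps failing to `report`, including the request_id from the error body or the X-Request-ID response header.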