
Rate limits & errors

One error envelope for everything, per-key token buckets, and the caps the gateway enforces.

Rate limits

Each key has a sustained requests-per-minute allowance and a burst capacity. Together they form a token bucket: the bucket holds up to the burst capacity, so that many requests can be sent at once, and it refills at the sustained rate. Both values are set when the key is provisioned; ask [email protected] to raise them.
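To make the bucket behaviour concrete, here is a minimal client-side model of it in Python. It is an illustrative sketch, not the gateway's implementation: the class name `TokenBucket`, the parameters `per_minute` and `burst`, and the injectable `clock` are assumptions chosen for the example, and the gateway's exact refill granularity is not documented here.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: holds at most `burst` tokens, refilled at `per_minute`/minute."""

    def __init__(self, per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = per_minute / 60.0      # tokens added per second
        self.capacity = float(burst)       # the bucket never holds more than this
        self.tokens = float(burst)         # a fresh bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last check, capped at the burst capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Spend one token if available; False corresponds to a 429 from the gateway."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A fake clock makes the behaviour deterministic: 60/min is one token per second.
now = [0.0]
bucket = TokenBucket(per_minute=60, burst=5, clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(6)])  # [True, True, True, True, True, False]
now[0] += 1.0
print(bucket.try_acquire())  # True
```

The burst lets a client send several requests back to back, after which throughput settles at the sustained rate.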

Every response reports where you stand:

Header	Meaning
`X-RateLimit-Limit`	Your sustained requests/minute
`X-RateLimit-Remaining`	Requests left in the bucket right now
`X-RateLimit-Reset`	Seconds until the bucket refills
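A client can read these headers after every call to pace itself before it hits a `429`. The sketch below assumes the values arrive as plain decimal strings; `RateLimitStatus` and `parse_rate_limit` are hypothetical names, and `Reset` is parsed as a float in case fractional seconds are reported.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit: int        # X-RateLimit-Limit: sustained requests per minute
    remaining: int    # X-RateLimit-Remaining: requests left in the bucket right now
    reset: float      # X-RateLimit-Reset: seconds until the bucket refills


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise them before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        # Float in case the gateway reports fractional seconds (an assumption).
        reset=float(h["x-ratelimit-reset"]),
    )


# Hypothetical header values, for illustration only.
status = parse_rate_limit(
    {"X-RateLimit-Limit": "600", "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "4"}
)
if status.remaining == 0:
    print(f"bucket empty; it refills in {status.reset}s")
```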

Past the limit you get 429 with a Retry-After header:

```json
{ "error": { "type": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 1.2s.", "request_id": "…" } }
```

Honor `Retry-After` and back off before retrying. All generation requests are safe to retry, because nothing is committed on a failed call.
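One way to follow this advice is a retry loop that waits at least `Retry-After` seconds on a `429` and uses exponential backoff with jitter on a `502`. This is a sketch under stated assumptions: `send` is a hypothetical zero-argument callable wrapping whatever HTTP client you use and returning `(status, headers, body)`, `Retry-After` is assumed to be in (possibly fractional) seconds as the example message suggests, and the attempt and delay defaults are arbitrary.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status, headers, body): a hypothetical response shape; adapt it to your HTTP client.
Response = Tuple[int, Mapping[str, str], bytes]

RETRYABLE_STATUSES = {429, 502}  # rate limited, or the model backend failed/timed out


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it succeeds, fails non-retryably, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, headers, body
        # Exponential backoff with full jitter: random wait in [0, base * 2^(attempt-1)].
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        if status == 429 and retry_after is not None:
            try:
                # Never retry sooner than the server asked (assumes seconds, e.g. "1.2").
                delay = max(delay, float(retry_after))
            except ValueError:
                pass  # e.g. an HTTP-date; keep the computed backoff
        sleep(delay)
```

`500` is deliberately left out of `RETRYABLE_STATUSES`, since the error table asks you to report those with the `request_id` rather than retry. Passing `sleep` in as a parameter also lets tests run without real waiting.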

The error envelope

Every error, whether validation, authentication, rate limiting, model failure, or a crash, has the same shape:

```json
{ "error": { "type": "…", "message": "…", "request_id": "…" } }
```

Status	`type`	Meaning
`400`	`invalid_request`	Semantically invalid: over a cap, undecodable audio, unknown model, a conversation that breaks the turn rules.
`401`	`authentication_error`	Missing or invalid API key.
`404`	`not_found`	No such path.
`405`	`method_not_allowed`	Wrong HTTP method for the path.
`422`	`invalid_request`	The body doesn't match the schema (missing field, wrong type, out-of-range value).
`429`	`rate_limit_exceeded`	Over your key's limit — honor `Retry-After`.
`500`	`internal_error`	Unexpected failure on our side. Report it with the `request_id`.
`502`	`inference_error`	The model backend failed or timed out. Retryable.
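Because every error uses the same envelope, a client needs only one parser for it. The sketch below turns any non-2xx/3xx response into a single exception type; `APIError` and `raise_for_error` are hypothetical names, and the fallback for bodies that are not the documented envelope is a defensive assumption rather than documented gateway behaviour.

```python
import json
from typing import Optional


class APIError(Exception):
    """One gateway error, unpacked from the {"error": {...}} envelope."""

    def __init__(self, status: int, error_type: str, message: str,
                 request_id: Optional[str]) -> None:
        super().__init__(f"{status} {error_type}: {message} (request_id={request_id})")
        self.status = status
        self.error_type = error_type   # stable: branch on this
        self.message = message         # human-readable: show it, don't parse it
        self.request_id = request_id   # quote this when reporting a problem


def raise_for_error(status: int, body: bytes) -> None:
    """Return silently below 400; otherwise raise APIError built from the envelope."""
    if status < 400:
        return
    try:
        err = json.loads(body)["error"]
        if not isinstance(err, dict):
            raise TypeError("'error' is not an object")
    except (ValueError, KeyError, TypeError):
        # Not the documented envelope: keep the raw body as the message.
        raise APIError(status, "unknown", body.decode("utf-8", "replace"), None) from None
    raise APIError(status, err.get("type", "unknown"), err.get("message", ""),
                   err.get("request_id"))
```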

Request caps

The gateway enforces hard caps before anything reaches a model:

Cap	Value
Text per request/turn	8,000 characters
Turns per conversation	64
Audio per turn	25 MiB decoded WAV

The current values are always served by `GET /v1/info`, under `limits`.
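Checking these caps on the client avoids a round trip that can only end in a `400`. This sketch hard-codes the values from the caps table; in practice you would refresh them from `GET /v1/info`, but the field names under `limits` are not shown here, so that step is left out. `check_caps` is a hypothetical helper, it assumes the gateway counts characters as Unicode code points (which is what Python's `len()` counts), and it expects the caller to supply the decoded WAV size of each turn's audio.

```python
from typing import List, Optional

# Values from the caps table; GET /v1/info is the authoritative source.
MAX_TEXT_CHARS = 8_000
MAX_TURNS = 64
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB of decoded WAV


def check_caps(turn_texts: List[str], audio_sizes: Optional[List[int]] = None) -> List[str]:
    """Return a list of cap violations; an empty list means the request fits."""
    problems = []
    if len(turn_texts) > MAX_TURNS:
        problems.append(f"{len(turn_texts)} turns exceeds {MAX_TURNS}")
    for i, text in enumerate(turn_texts):
        # len() counts code points; the gateway's counting rule is an assumption here.
        if len(text) > MAX_TEXT_CHARS:
            problems.append(f"turn {i}: {len(text)} characters exceeds {MAX_TEXT_CHARS}")
    for i, size in enumerate(audio_sizes or []):
        if size > MAX_AUDIO_BYTES:
            problems.append(f"turn {i}: {size} bytes of audio exceeds {MAX_AUDIO_BYTES}")
    return problems
```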

Debugging a failed call

Read `error.type`: it is stable and machine-matchable, while `message` is meant for humans.
4xx other than 429: fix the request (the `message` says which field is wrong).
429 / 502: retry with backoff (honoring `Retry-After` for 429).
Anything persistent: send us the `request_id` (also returned in the `X-Request-ID` response header).
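The checklist above can be encoded as a small triage function that logs the right next step for a failed call. This is an illustrative sketch: `triage` is a hypothetical name, the returned strings are free-form, and it falls back to the `X-Request-ID` header when the body is not the documented envelope.

```python
import json
from typing import Mapping


def triage(status: int, headers: Mapping[str, str], body: bytes) -> str:
    """Suggest the next step for a failed call, following the debugging checklist."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        err = None
    if not isinstance(err, dict):
        err = {}  # not the documented envelope; rely on status and headers
    error_type = err.get("type", "unknown")
    # request_id is in the body and also in the X-Request-ID response header.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = err.get("request_id") or lowered.get("x-request-id", "unknown")
    if status in (429, 502):
        hint = " (wait for Retry-After)" if status == 429 else ""
        return f"{error_type}: retry with backoff{hint}"
    if 400 <= status < 500:
        return f"{error_type}: fix the request: {err.get('message', '')}"
    return f"{error_type}: if it persists, report request_id {request_id}"
```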