Rate limits & errors
One error envelope for everything, per-key token buckets, and the caps the gateway enforces.
Rate limits
Each key has a sustained requests-per-minute allowance and a burst capacity (a token bucket: burst requests available at once, refilling at the sustained rate). Both are set when the key is provisioned — ask [email protected] to raise them.
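To make the burst-versus-sustained distinction concrete, here is a minimal client-side model of a token bucket. It is only a sketch: the class name, method name, and the numbers in the usage note are illustrative assumptions, and the gateway's real accounting may differ in detail.

```python
class TokenBucket:
    """Illustrative client-side model of a per-key bucket (not the gateway's code)."""

    def __init__(self, rate_per_minute: float, burst: int) -> None:
        self.rate_per_second = rate_per_minute / 60.0  # sustained refill rate
        self.burst = burst                             # bucket capacity
        self.tokens = float(burst)                     # a fresh bucket starts full
        self.last = 0.0                                # time of the last refill, in seconds

    def try_acquire(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(rate_per_minute=60, burst=5)`, five calls at the same instant succeed and a sixth is refused; one second later a single token has refilled, so exactly one more call succeeds.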
Every response reports where you stand:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Your sustained requests/minute |
| X-RateLimit-Remaining | Requests left in the bucket right now |
| X-RateLimit-Reset | Seconds until the bucket refills |
Past the limit you get 429 with a Retry-After header:
{ "error": { "type": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 1.2s.", "request_id": "…" } }
Honor Retry-After and back off; all generation requests are safe to retry (nothing is committed on a failed call).
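One way to honor Retry-After while still backing off when the header is absent is sketched below. It assumes the header carries a number of seconds (possibly fractional, as the sample message suggests) and that the header mapping is keyed exactly as `Retry-After`. The function name, the `base` and `cap` defaults, and the choice of full-jitter exponential backoff are assumptions, not something the gateway prescribes.

```python
import random
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    raw: Optional[str] = headers.get("Retry-After")
    if raw is not None:
        try:
            # Assumption: the value is a (possibly fractional) number of seconds.
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a plain number; fall through to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for `retry_delay(response_headers, attempt)` between attempts and give up after a fixed number of retries.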
The error envelope
Every error — validation, auth, rate limit, model failure, crash — is one shape:
{ "error": { "type": "…", "message": "…", "request_id": "…" } }
| Status | type | Meaning |
|---|---|---|
| 400 | invalid_request | Semantically invalid: over a cap, undecodable audio, unknown model, a conversation that breaks the turn rules. |
| 401 | authentication_error | Missing or invalid API key. |
| 404 | not_found | No such path. |
| 405 | method_not_allowed | Wrong HTTP method for the path. |
| 422 | invalid_request | The body doesn't match the schema (missing field, wrong type, out-of-range value). |
| 429 | rate_limit_exceeded | Over your key's limit; honor Retry-After. |
| 500 | internal_error | Unexpected failure on our side. Report it with the request_id. |
| 502 | inference_error | The model backend failed or timed out. Retryable. |
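Because every error shares one shape, a client can decode it in a single place. The sketch below turns a status code and response body into a small record. `ApiError`, `parse_error`, and the `"unknown"` fallback type are hypothetical names for illustration; the fallback is purely defensive, for example against an intermediary that returns a non-JSON body.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    type: str
    message: str
    request_id: Optional[str]


def parse_error(status: int, body: str) -> ApiError:
    """Decode the error envelope; tolerate bodies that are not an envelope."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status, err["type"], err.get("message", ""), err.get("request_id"))
    except (ValueError, KeyError, TypeError):
        # "unknown" is a local placeholder, not a type the gateway returns.
        return ApiError(status, "unknown", body[:200], None)
```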
Request caps
The gateway enforces hard caps before anything reaches a model:
| Cap | Value |
|---|---|
| Text per request/turn | 8,000 characters |
| Turns per conversation | 64 |
| Audio per turn | 25 MiB decoded WAV |
Current values are always served at GET /v1/info under limits.
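Checking a request against the caps before sending it avoids a round trip that would end in a 400. The sketch below hard-codes the values from the table above; in practice they could be read from GET /v1/info instead, though the field names under limits are not documented here. The function name and the input shape (a list of turn texts and a list of decoded audio sizes in bytes) are assumptions, and `len()` counts Unicode code points, which may not match how the gateway counts characters.

```python
from typing import List, Sequence

# Values copied from the caps table; treat them as a snapshot, not a contract.
MAX_TEXT_CHARS = 8_000
MAX_TURNS = 64
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB of decoded WAV


def check_caps(turn_texts: Sequence[str],
               audio_sizes: Sequence[int] = ()) -> List[str]:
    """Return a list of human-readable cap violations (empty if none)."""
    problems: List[str] = []
    if len(turn_texts) > MAX_TURNS:
        problems.append(f"{len(turn_texts)} turns exceeds the cap of {MAX_TURNS}")
    for i, text in enumerate(turn_texts):
        if len(text) > MAX_TEXT_CHARS:
            problems.append(f"turn {i}: {len(text)} characters exceeds {MAX_TEXT_CHARS}")
    for i, size in enumerate(audio_sizes):
        if size > MAX_AUDIO_BYTES:
            problems.append(f"audio {i}: {size} bytes exceeds {MAX_AUDIO_BYTES}")
    return problems
```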
Debugging a failed call
- Read error.type: it's stable and machine-matchable; message is for humans.
- 4xx other than 429: fix the request (the message says which field).
- 429/502: retry with backoff (Retry-After for 429).
- Anything persistent: send us the request_id (also in the X-Request-ID response header).
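The checklist above can be expressed as a small dispatch on error.type. This is a sketch under assumptions: the function name and the three action labels (`"retry"`, `"fix_request"`, `"report"`) are hypothetical, and the type strings are the ones listed in the error table.

```python
# Error types taken from the error table; the action labels are illustrative.
RETRYABLE = {"rate_limit_exceeded", "inference_error"}
FIXABLE = {"invalid_request", "authentication_error", "not_found", "method_not_allowed"}


def next_step(error_type: str) -> str:
    """Map an error.type string to a coarse action for the caller."""
    if error_type in RETRYABLE:
        return "retry"        # back off first; honor Retry-After on 429
    if error_type in FIXABLE:
        return "fix_request"  # the message says what to change
    return "report"           # internal_error or anything unrecognized: send the request_id
```

Matching on the type string instead of the message keeps the client stable if the human-readable wording changes.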