# Rate limits & errors

> One error envelope for everything, per-key token buckets, and the caps the gateway enforces.

## Rate limits

Each key has a sustained requests-per-minute allowance and a burst capacity (a token bucket: `burst` requests available at once, refilling at the sustained rate). Both are set when the key is provisioned; ask [hello@kalpalabs.ai](mailto:hello@kalpalabs.ai) to raise them.

Every response reports where you stand:

| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Your sustained requests/minute |
| `X-RateLimit-Remaining` | Requests left in the bucket right now |
| `X-RateLimit-Reset` | Seconds until the bucket refills |

Past the limit you get `429` with a `Retry-After` header:

```json
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Retry after 1.2s.",
    "request_id": "…"
  }
}
```

Honor `Retry-After` and back off; all generation requests are safe to retry (nothing is committed on a failed call).

## The error envelope

Every error (validation, auth, rate limit, model failure, crash) is one shape:

```json
{
  "error": {
    "type": "…",
    "message": "…",
    "request_id": "…"
  }
}
```

| Status | `type` | Meaning |
|---|---|---|
| `400` | `invalid_request` | Semantically invalid: over a cap, undecodable audio, unknown model, a conversation that breaks the [turn rules](/conversations). |
| `401` | `authentication_error` | Missing or invalid API key. |
| `404` | `not_found` | No such path. |
| `405` | `method_not_allowed` | Wrong HTTP method for the path. |
| `422` | `invalid_request` | The body doesn't match the schema (missing field, wrong type, out-of-range value). |
| `429` | `rate_limit_exceeded` | Over your key's limit; honor `Retry-After`. |
| `500` | `internal_error` | Unexpected failure on our side. Report it with the `request_id`. |
| `502` | `inference_error` | The model backend failed or timed out. Retryable. |

## Request caps

The gateway enforces hard caps before anything reaches a model:

| Cap | Value |
|---|---|
| Text per request/turn | 8,000 characters |
| Turns per conversation | 64 |
| Audio per turn | 25 MiB decoded WAV |

Current values are always served at `GET /v1/info` under `limits`.

## Debugging a failed call

1. Read `error.type`: it's stable and machine-matchable; `message` is for humans.
2. `4xx` other than `429`: fix the request (the message says which field).
3. `429` / `502`: retry with backoff (`Retry-After` for 429).
4. Anything persistent: send us the `request_id` (also in the `X-Request-ID` response header).
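
The retry guidance above can be sketched as a small client-side helper. This is a minimal illustration, not an official client: the names `retry_delay` and `call_with_retry` are hypothetical, the `send` callable stands in for whatever HTTP library you use, and the sketch assumes `Retry-After` carries a number of seconds (if it cannot be parsed as one, the code falls back to exponential backoff with jitter). The backoff constants are arbitrary choices, not gateway values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, parsed JSON body)
Response = Tuple[int, Mapping[str, str], dict]

# Only these statuses are described as retryable in this page.
RETRYABLE = {429, 502}


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying won't help."""
    if status not in RETRYABLE:
        return None  # 2xx: done; other 4xx/500: fix the request or report it
    if status == 429:
        # Assumption: Retry-After is numeric seconds. Header lookup here is
        # case-sensitive; most HTTP clients expose a case-insensitive mapping.
        value = headers.get("Retry-After")
        if value is not None:
            try:
                return max(0.0, float(value))
            except ValueError:
                pass  # not a number: fall through to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        response = send()
        delay = retry_delay(response[0], response[1], attempt)
        if delay is None or attempt == max_attempts - 1:
            break
        sleep(delay)
    return response
```

The caller still inspects the final response: a non-2xx result after the last attempt is returned as-is, so `error.type` and `request_id` remain available for logging or a support request.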