# Rate limits & errors

> One error envelope for everything, per-key token buckets, and the caps the gateway enforces.

## Rate limits

Each key has a sustained requests-per-minute allowance and a burst capacity (a token bucket: `burst` requests available at once, refilling at the sustained rate). Both are set when the key is provisioned — ask [hello@kalpalabs.ai](mailto:hello@kalpalabs.ai) to raise them.
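
To make the bucket behavior concrete, here is a minimal client-side model of it. It is a sketch, not the gateway's implementation: the `TokenBucket` class, its parameter names, and the assumption that a fresh bucket starts full are all illustrative.

```python
import time


class TokenBucket:
    """Illustrative model of a per-key token bucket (not the gateway's code)."""

    def __init__(self, rate_per_minute: float, burst: int, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = burst
        self.tokens = float(burst)  # assumption: a fresh bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Tokens accrue at the sustained rate but never exceed the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)

    def try_acquire(self) -> bool:
        """Take one token if available; False models a request that would get 429."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate_per_minute=60` and `burst=3`, three calls succeed back to back, the fourth is refused, and one more token becomes available each second after that.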

Every response reports where you stand:

| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Your sustained requests/minute |
| `X-RateLimit-Remaining` | Requests left in the bucket right now |
| `X-RateLimit-Reset` | Seconds until the bucket refills |
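
A client can read these headers after each call to decide whether to slow down. The sketch below assumes the limit and remaining values are integers and that the reset value is a number of seconds that may be fractional; `RateLimitStatus` and `read_rate_limit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit: int            # sustained requests per minute
    remaining: int        # requests left in the bucket right now
    reset_seconds: float  # seconds until the bucket refills


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_seconds=float(lowered["x-ratelimit-reset"]),
    )
```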

Past the limit you get `429` with a `Retry-After` header:

```json
{ "error": { "type": "rate_limit_exceeded", "message": "Rate limit exceeded. Retry after 1.2s.", "request_id": "…" } }
```

Honor `Retry-After` and back off; all generation requests are safe to retry (nothing is committed on a failed call).
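
One way to honor `Retry-After` is sketched below. The `send` callable is a stand-in for whatever HTTP client you use and is assumed to return `(status, headers, body)`; the attempt count, base delay, and jitter are illustrative defaults, and `Retry-After` is assumed to be expressed in seconds.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, parsed JSON body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry 429 and 502 responses; return any other response immediately."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 502) or attempt == max_attempts - 1:
            return status, headers, body
        # Fallback delay: exponential backoff plus a little jitter.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        if status == 429 and retry_after is not None:
            try:
                # Assumption: Retry-After is in seconds, possibly fractional.
                delay = max(delay, float(retry_after))
            except ValueError:
                pass  # not a number (e.g. an HTTP-date); keep the backoff delay
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The wait is never shorter than `Retry-After`, and the last response is returned as-is once attempts run out so the caller can inspect its error envelope.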

## The error envelope

Every error — validation, auth, rate limit, model failure, crash — is one shape:

```json
{ "error": { "type": "…", "message": "…", "request_id": "…" } }
```

| Status | `type` | Meaning |
|---|---|---|
| `400` | `invalid_request` | Semantically invalid: over a cap, undecodable audio, unknown model, a conversation that breaks the [turn rules](/conversations). |
| `401` | `authentication_error` | Missing or invalid API key. |
| `404` | `not_found` | No such path. |
| `405` | `method_not_allowed` | Wrong HTTP method for the path. |
| `422` | `invalid_request` | The body doesn't match the schema (missing field, wrong type, out-of-range value). |
| `429` | `rate_limit_exceeded` | Over your key's limit — honor `Retry-After`. |
| `500` | `internal_error` | Unexpected failure on our side. Report it with the `request_id`. |
| `502` | `inference_error` | The model backend failed or timed out. Retryable. |
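
Because the shape never changes, one small parser can cover every failure. The following is a sketch: `ApiError`, `parse_error`, and `RETRYABLE_TYPES` are hypothetical names, and the retryable set is simply the two types the table marks as worth retrying.

```python
import json

# Types the table above marks as worth retrying.
RETRYABLE_TYPES = {"rate_limit_exceeded", "inference_error"}


class ApiError(Exception):
    """Exception carrying the fields of the error envelope."""

    def __init__(self, status: int, type_: str, message: str, request_id: str):
        super().__init__(f"{status} {type_}: {message} (request_id={request_id})")
        self.status = status
        self.type = type_
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        return self.type in RETRYABLE_TYPES


def parse_error(status: int, body_text: str) -> ApiError:
    """Turn an error response body into an ApiError."""
    err = json.loads(body_text)["error"]
    return ApiError(status, err["type"], err["message"], err["request_id"])
```

Keeping `status` alongside `type` matters because `400` and `422` share the `invalid_request` type.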

## Request caps

The gateway enforces hard caps before anything reaches a model:

| Cap | Value |
|---|---|
| Text per request/turn | 8,000 characters |
| Turns per conversation | 64 |
| Audio per turn | 25 MiB decoded WAV |

Current values are always served at `GET /v1/info` under `limits`.
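
Checking the caps client-side avoids a round trip that would end in a `400`. This sketch hard-codes the values from the table for brevity; a real client would more likely read them from `limits`. The turn shape (a dict with optional `text` and `audio_wav` keys) is hypothetical, `audio_wav` is assumed to hold the already-decoded WAV bytes, and the gateway may count characters differently from Python's `len`.

```python
from typing import Dict, List

MAX_TEXT_CHARS = 8_000
MAX_TURNS = 64
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB


def check_caps(turns: List[Dict]) -> List[str]:
    """Return human-readable cap violations; an empty list means none found."""
    problems = []
    if len(turns) > MAX_TURNS:
        problems.append(f"{len(turns)} turns exceeds the cap of {MAX_TURNS}")
    for i, turn in enumerate(turns):
        text = turn.get("text") or ""
        audio = turn.get("audio_wav") or b""
        if len(text) > MAX_TEXT_CHARS:
            problems.append(f"turn {i}: text is {len(text)} characters, cap is {MAX_TEXT_CHARS}")
        if len(audio) > MAX_AUDIO_BYTES:
            problems.append(f"turn {i}: audio is {len(audio)} bytes, cap is {MAX_AUDIO_BYTES}")
    return problems
```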

## Debugging a failed call

1. Read `error.type` — it's stable and machine-matchable; `message` is for humans.
2. `4xx` other than `429`: fix the request (the message says which field).
3. `429` or `502`: retry with backoff; for `429`, wait at least as long as `Retry-After` says.
4. Anything persistent: send us the `request_id` (also in the `X-Request-ID` response header).
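
The steps above can be sketched as a small triage function that matches on `error.type`. The action labels (`"fix_request"`, `"retry"`, `"report"`) and the function name are illustrative, and treating an unrecognized type as something to report is an assumption.

```python
from typing import Mapping, Optional, Tuple

# Map each documented error type to what the caller should do next.
ACTIONS = {
    "invalid_request": "fix_request",
    "authentication_error": "fix_request",
    "not_found": "fix_request",
    "method_not_allowed": "fix_request",
    "rate_limit_exceeded": "retry",
    "inference_error": "retry",
    "internal_error": "report",
}


def triage(body: dict, headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (action, request_id) for a failed call."""
    error = body.get("error") or {}
    # Assumption: unknown or missing types are reported rather than retried.
    action = ACTIONS.get(error.get("type"), "report")
    # The request ID is in the envelope and also in the X-Request-ID header.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = error.get("request_id") or lowered.get("x-request-id")
    return action, request_id
```

A caller that keeps seeing `"retry"` for the same request should eventually switch to reporting it with the returned `request_id`.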
