# Models

> The public model registry, per-request model switching, and what it costs in latency.

The API exposes stable public model ids that hide the underlying checkpoints and infrastructure. List them:

```bash
curl -s https://api.kalpalabs.ai/v1/models \
  -H "Authorization: Bearer $KALPA_API_KEY"
```

```json
{
  "data": [
    {
      "id": "kalpa-conversational-v1",
      "display_name": "Kalpa Conversational v1",
      "description": "Flagship multi-speaker conversational speech model (TTS + converse).",
      "modes": ["converse", "tts"],
      "speakers": ["0", "1"],
      "default": true
    },
    {
      "id": "kalpa-conversational-8b",
      "modes": ["converse", "tts"],
      "speakers": ["0", "1"],
      "default": false
    },
    {
      "id": "kalpa-conversational-mini",
      "modes": ["converse", "tts"],
      "speakers": ["0", "1"],
      "default": false
    }
  ]
}
```

| Model | Use it for |
|---|---|
| `kalpa-conversational-v1` | The flagship: best quality; the default when `model` is omitted. |
| `kalpa-conversational-8b` | 8B variant: quality close to the flagship at lower serving cost. |
| `kalpa-conversational-mini` | Compact variant: fastest to load and cheapest to run. |

## Choosing a model per request

Every generation endpoint takes a `model` field:

```json
{ "text": "…", "model": "kalpa-conversational-mini" }
```

- Omit it (or send `null`) for the default model. The response's `model` field always echoes the **resolved** public id, so logs stay unambiguous.
- An unknown id, or a model that doesn't support the endpoint's mode, returns `400 invalid_request`.

## Speakers are per model

Each card's `speakers` lists the role labels that model understands, in turn order. Don't hardcode them: read the card and use its labels. The details (and why wrong labels degrade audio) are in [Conversations](/conversations).

## Switching cost

One model is resident on the accelerator at a time. Requests to the resident model are fast; **the first request after a switch pays the load**, ranging from seconds for the mini model to tens of seconds for the flagship.
If your traffic is latency-sensitive, keep it on one model rather than alternating, and expect the first call to a cold model to be slow (set client timeouts accordingly).
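One way to act on this advice when you have a queue of mixed work is to group jobs by model, so each model is loaded at most once per batch, and to give only the first request to each model a long timeout. The sketch below is a minimal offline planner under stated assumptions: the input format (one `model<TAB>text` job per line), the function name `plan_requests`, and the two timeout values are hypothetical and chosen for illustration, not taken from the API. It only prints a plan and makes no network calls.

```bash
#!/usr/bin/env bash
# Hypothetical timeouts in seconds (not API values): generous for the first
# request to a model, which may have to load it; shorter once it is resident.
COLD_TIMEOUT="${COLD_TIMEOUT:-120}"
WARM_TIMEOUT="${WARM_TIMEOUT:-30}"

# plan_requests (hypothetical helper): reads "model<TAB>text" jobs on stdin and
# prints "model<TAB>timeout<TAB>text", with all jobs for one model contiguous.
# sort -s is a stable sort, so jobs keep their original order within a model.
plan_requests() {
  local prev="" model text timeout
  sort -s -t $'\t' -k1,1 |
    while IFS=$'\t' read -r model text; do
      if [[ "$model" != "$prev" ]]; then
        timeout="$COLD_TIMEOUT"   # first request after a switch pays the load
      else
        timeout="$WARM_TIMEOUT"   # same model as the previous request
      fi
      printf '%s\t%s\t%s\n' "$model" "$timeout" "$text"
      prev="$model"
    done
}

# Example:
#   printf 'kalpa-conversational-mini\tHi\nkalpa-conversational-v1\tHello\n' | plan_requests
```

Each planned timeout would then be passed to curl's `--max-time` on the actual generation call. The design trades fairness for fewer switches: jobs for a model that sorts later wait behind every job for earlier models, so this suits batch work better than interactive traffic.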