Models
The public model registry, per-request model switching, and what it costs in latency.
The API exposes stable public model ids that hide checkpoints and infrastructure. List them:
curl -s https://api.kalpalabs.ai/v1/models -H "Authorization: Bearer $KALPA_API_KEY"

{
"data": [
{
"id": "kalpa-conversational-v1",
"display_name": "Kalpa Conversational v1",
"description": "Flagship multi-speaker conversational speech model (TTS + converse).",
"modes": ["converse", "tts"],
"speakers": ["0", "1"],
"default": true
},
{ "id": "kalpa-conversational-8b", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false },
{ "id": "kalpa-conversational-mini", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false }
]
}

| Model | Use it for |
|---|---|
| kalpa-conversational-v1 | The flagship: best quality; the default when model is omitted. |
| kalpa-conversational-8b | 8B variant: quality close to flagship at lower serving cost. |
| kalpa-conversational-mini | Compact variant: fastest to load and cheapest to run. |
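As a minimal sketch of consuming the listing, the Python below indexes the cards by id and finds the default. The sample data is copied from the response above and trimmed to the fields used; `index_models` and `default_model_id` are hypothetical helper names, not part of any official client.

```python
import json

# Sample listing copied from the response above, trimmed to the fields used here.
SAMPLE_LISTING = json.loads("""
{
  "data": [
    {"id": "kalpa-conversational-v1", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": true},
    {"id": "kalpa-conversational-8b", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false},
    {"id": "kalpa-conversational-mini", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false}
  ]
}
""")


def index_models(listing: dict) -> dict:
    """Map each public model id to its card (hypothetical helper)."""
    return {card["id"]: card for card in listing["data"]}


def default_model_id(listing: dict) -> str:
    """Return the id of the card flagged as default (hypothetical helper)."""
    defaults = [card["id"] for card in listing["data"] if card.get("default")]
    if len(defaults) != 1:
        # Assumption: exactly one card carries "default": true, as in the sample.
        raise ValueError(f"expected exactly one default model, found {len(defaults)}")
    return defaults[0]
```

In real use the listing would come from the `/v1/models` call shown earlier rather than from a literal; fetching it once at startup and caching the index is usually enough, since the public ids are described as stable.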
Choosing a model per request
Every generation endpoint takes a model field:
{ "text": "…", "model": "kalpa-conversational-mini" }

- Omit it (or send null) for the default model. The response's model field always echoes the resolved public id, so logs stay unambiguous.
- An unknown id, or a model that doesn't support the endpoint's mode, returns 400 invalid_request.
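The rules above can be mirrored on the client so that bad requests fail before they are sent. This is a sketch only: `build_body` and the `CARDS` table are hypothetical, the body contains just the two fields shown in the example, and the server remains the authority on what it accepts.

```python
from typing import Optional

# Cards as returned by the model listing (trimmed). In real use, fetch them
# from /v1/models instead of hardcoding them.
CARDS = {
    "kalpa-conversational-v1": {"modes": ["converse", "tts"], "default": True},
    "kalpa-conversational-mini": {"modes": ["converse", "tts"], "default": False},
}


def build_body(text: str, mode: str, model: Optional[str] = None) -> dict:
    """Build a request body, rejecting locally what the API would answer with 400.

    Hypothetical helper: `mode` is the mode of the endpoint being called.
    """
    body = {"text": text}
    if model is None:
        # Omitting the field lets the server resolve the default model.
        return body
    card = CARDS.get(model)
    if card is None:
        raise ValueError(f"unknown model id: {model!r}")
    if mode not in card["modes"]:
        raise ValueError(f"model {model!r} does not support mode {mode!r}")
    body["model"] = model
    return body
```

Because the response echoes the resolved public id, logging that field (rather than the value you sent) keeps records accurate even when the request omitted the model.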
Speakers are per model
Each card's speakers lists the role labels that model understands, in turn order. Don't hardcode them — read the card and use its labels. The details (and why wrong labels degrade audio) are in Conversations.
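One way to avoid hardcoded labels is to take them from the card at the point where turns are built. The sketch below assumes speakers strictly alternate and uses a made-up `{"speaker", "text"}` turn shape purely for illustration; the real request format is the one described in Conversations. `label_turns` is a hypothetical helper.

```python
def label_turns(card: dict, texts: list) -> list:
    """Pair each utterance with a speaker label from the card, cycling in turn order.

    Assumption: turns alternate strictly through the card's speakers list.
    The returned dict shape is illustrative, not the documented request format.
    """
    speakers = card["speakers"]
    if not speakers:
        raise ValueError("model card lists no speakers")
    return [
        {"speaker": speakers[i % len(speakers)], "text": text}
        for i, text in enumerate(texts)
    ]
```

If a future model card listed different labels, code written this way would pick them up without changes.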
Switching cost
One model is resident on the accelerator at a time. Requests to the resident model are fast; the first request after a switch pays the model load time, which ranges from seconds for the mini model to tens of seconds for the flagship. If your traffic is latency-sensitive, keep it on one model rather than alternating, and set client timeouts to allow for a slow first call to a cold model.
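For batch work where order does not matter, one way to limit switches is to reorder queued jobs so each model's jobs run back to back. This is a sketch under assumptions: the job dicts and helper names are hypothetical, and it assumes only your own traffic determines which model is resident, which the text above does not state.

```python
def resolved_model(job: dict, default_model: str) -> str:
    """A missing or null model field means the default model."""
    return job.get("model") or default_model


def group_by_model(jobs: list, default_model: str) -> list:
    """Reorder jobs so all jobs for one model run back to back.

    Models keep the order in which they first appear; jobs keep their
    relative order within each model.
    """
    batches = {}  # dicts preserve insertion order (Python 3.7+)
    for job in jobs:
        batches.setdefault(resolved_model(job, default_model), []).append(job)
    return [job for batch in batches.values() for job in batch]


def count_switches(jobs: list, default_model: str) -> int:
    """Count how often consecutive jobs target different models."""
    models = [resolved_model(job, default_model) for job in jobs]
    return sum(1 for prev, cur in zip(models, models[1:]) if prev != cur)
```

For a queue that alternates between two models, grouping reduces the number of switches to one. Interactive traffic cannot be reordered this way, so there the simpler advice of staying on a single model still applies.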