
Models

The public model registry, per-request model switching, and what it costs in latency.

The API exposes stable public model ids that hide checkpoints and infrastructure. List them:

```bash
curl -s https://api.kalpalabs.ai/v1/models -H "Authorization: Bearer $KALPA_API_KEY"
```

```json
{
  "data": [
    {
      "id": "kalpa-conversational-v1",
      "display_name": "Kalpa Conversational v1",
      "description": "Flagship multi-speaker conversational speech model (TTS + converse).",
      "modes": ["converse", "tts"],
      "speakers": ["0", "1"],
      "default": true
    },
    { "id": "kalpa-conversational-8b",   "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false },
    { "id": "kalpa-conversational-mini", "modes": ["converse", "tts"], "speakers": ["0", "1"], "default": false }
  ]
}
```
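The sketch below shows one way to read this registry from Python. It assumes only the response shape shown above. `fetch_models`, `index_models`, and `default_model_id` are hypothetical helper names and are not part of any Kalpa SDK. The HTTP call uses only the standard library's `urllib.request`.

```python
import json
import urllib.request
from typing import Dict

# URL taken from the curl example above; the helper functions are hypothetical.
MODELS_URL = "https://api.kalpalabs.ai/v1/models"


def fetch_models(api_key: str, timeout_s: float = 30.0) -> dict:
    """GET /v1/models with a bearer token and return the parsed JSON."""
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def index_models(registry: dict) -> Dict[str, dict]:
    """Map each public model id to its card."""
    return {card["id"]: card for card in registry["data"]}


def default_model_id(registry: dict) -> str:
    """Return the id of the card marked "default": true."""
    defaults = [card["id"] for card in registry["data"] if card.get("default")]
    # Defensive check: this page does not explicitly promise exactly one default.
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default model, got {defaults}")
    return defaults[0]
```

For the sample response above, `default_model_id(fetch_models(os.environ["KALPA_API_KEY"]))` would return `"kalpa-conversational-v1"`. The `os` import is needed for that call.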

| Model | Use it for |
|---|---|
| `kalpa-conversational-v1` | The flagship: best quality; the default when `model` is omitted. |
| `kalpa-conversational-8b` | 8B variant: quality close to the flagship at lower serving cost. |
| `kalpa-conversational-mini` | Compact variant: fastest to load and cheapest to run. |

Choosing a model per request

Every generation endpoint takes a `model` field:

```json
{ "text": "…", "model": "kalpa-conversational-mini" }
```

Omit it (or send `null`) to use the default model. The response's `model` field always echoes the resolved public id, so logs stay unambiguous.
An unknown id, or a model that doesn't support the endpoint's mode, returns `400 invalid_request`.
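Below is a minimal sketch of building a request body that follows these rules. It leaves `model` out to get the default and mirrors the `400 invalid_request` conditions on the client, so a typo fails before the network round trip. Several things are assumptions here. The caller is assumed to know which mode (`"tts"` or `"converse"`) the target endpoint serves. The body carries only the `text` and `model` fields from the example above. `build_request` is a hypothetical name.

```python
import json
from typing import Optional


def build_request(text: str, registry: dict, mode: str,
                  model: Optional[str] = None) -> str:
    """Return the JSON body for a generation request.

    model=None leaves the field out so the server resolves the default model
    (sending "model": null is documented as equivalent).
    """
    body = {"text": text}
    if model is not None:
        cards = {card["id"]: card for card in registry["data"]}
        # Client-side mirror of the server's 400 invalid_request rules; the
        # server stays authoritative, this only fails earlier.
        if model not in cards:
            raise ValueError(f"unknown model id: {model!r}")
        if mode not in cards[model]["modes"]:
            raise ValueError(f"{model!r} does not support mode {mode!r}")
        body["model"] = model
    return json.dumps(body)
```

A cached registry can go stale, so a newly added model might be rejected locally. Refresh the registry if that matters. After the call, log the response's `model` field rather than the value you sent, because the response carries the resolved id.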

Speakers are per model

Each model card's `speakers` field lists the role labels that model understands, in turn order. Don't hardcode them; read the card and use its labels. The details, including why wrong labels degrade audio, are in Conversations.
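As an illustration of reading labels from the card instead of hardcoding them, this sketch assigns a card's `speakers` labels to dialogue turns round-robin. It assumes a simple alternating dialogue where turn *i* goes to `speakers[i % n]`. The `{"speaker": ..., "text": ...}` turn shape and the `assign_speakers` name are hypothetical placeholders. Use the turn format documented in Conversations.

```python
from typing import Dict, List, Sequence


def assign_speakers(card: dict, turns: Sequence[str]) -> List[Dict[str, str]]:
    """Label each turn with the card's speaker labels, cycling in turn order.

    The output dict shape is a placeholder, not the documented request format.
    """
    labels = card["speakers"]
    if not labels:
        raise ValueError(f"model {card['id']!r} lists no speakers")
    return [
        {"speaker": labels[i % len(labels)], "text": text}
        for i, text in enumerate(turns)
    ]
```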

Switching cost

Only one model is resident on the accelerator at a time. Requests to the resident model are fast; the first request after a switch pays the load time, which ranges from seconds for the mini model to tens of seconds for the flagship. If your traffic is latency-sensitive, keep it on one model rather than alternating between models, and expect the first call to a cold model to be slow (set client timeouts accordingly).
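The sketch below applies this advice on the client side. It counts model switches in a batch, reorders a batch so each model's requests are contiguous, and gives the first call after a switch a longer timeout. The timeout values are assumptions; this page only says loads take seconds to tens of seconds. `last_model` reflects only this client's own traffic. If other clients share the deployment, they can also change the resident model, so treat `timeout_for` as a heuristic. All names here are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed values; tune them against your own measurements.
WARM_TIMEOUT_S = 15.0
COLD_TIMEOUT_S = 120.0


def count_switches(models: Sequence[str]) -> int:
    """Number of adjacent requests that target different models."""
    return sum(1 for a, b in zip(models, models[1:]) if a != b)


def group_by_model(requests: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Reorder (model, text) pairs so each model's requests are contiguous.

    Models keep their first-appearance order, and sorted() is stable, so
    requests for the same model keep their relative order.
    """
    order = {}
    for model, _ in requests:
        order.setdefault(model, len(order))
    return sorted(requests, key=lambda r: order[r[0]])


def timeout_for(model: str, last_model: Optional[str]) -> float:
    """Use the long timeout unless this client's last request hit the same model."""
    return WARM_TIMEOUT_S if model == last_model else COLD_TIMEOUT_S
```

For example, `group_by_model` reduces the model sequence a, b, a, b (3 switches) to a, a, b, b (1 switch). Reordering is only safe when requests are independent. Don't reorder turns whose order matters.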