
API reference

Every endpoint, field and error — generated from the committed OpenAPI contract.

Kalpa Speech API v0.1.0, generated from the committed contract (openapi.json). The base URL is https://api.kalpalabs.ai and request and response bodies are JSON. Authenticated endpoints take the header Authorization: Bearer $KALPA_API_KEY. Every error uses the single envelope described in Rate limits & errors. Each endpoint can be run from this page: fill in the fields and hit send; the request and response stay in sync on the right, copyable as curl, Python or JavaScript.

API key

Stored only in this browser; requests go straight to api.kalpalabs.ai. No key? Write to [email protected].
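The conventions above (base URL, JSON bodies, Bearer authentication) can be wrapped in a small helper. The following is a minimal sketch using only the Python standard library; the names build_request and call are hypothetical helpers, not part of any Kalpa SDK, and the key is assumed to live in the KALPA_API_KEY environment variable.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.kalpalabs.ai"  # base URL stated in this reference


def build_request(path, body=None, api_key=None):
    """Build (but do not send) a request: POST with a JSON body, else GET."""
    key = api_key or os.environ.get("KALPA_API_KEY", "")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(path, body=None, api_key=None, timeout=60):
    """Send the request and parse the JSON response.

    urllib raises urllib.error.HTTPError for 4xx/5xx statuses; the error
    body is the envelope described in Rate limits & errors.
    """
    request = build_request(path, body, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, call("/v1/models") would list models and call("/v1/tts", {"text": "Hello"}) would synthesize speech, given a valid key.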

POST /v1/tts

Synthesize speech from text.

Render the given text as speech (24 kHz mono WAV) in the requested speaker's voice.

Body params

text · required

string · 1 – 8000 chars

Text to speak.

model

string

Public model id (see GET /v1/models). Omit/null for the default model.

params

object · 7 optional fields

depth_temperature

number · 0 – 1.5

Acoustic temperature; null = follow temperature.

max_new_tokens

integer · 16 – 2048 · default 512

penalty_window

integer · 1 – 80 · default 20

quantizers

integer · ≥ 1

Decode only the first N RVQ levels; null = full depth.

repetition_penalty

number · 0 – 6 · default 3

temperature

number · 0 – 1.5 · default 0.7

top_k

integer · ≥ 1

Backbone top-k; null = full vocabulary.

speaker

string · default "0"

Speaker role to render the text as (one of the model's speakers; see GET /v1/models).
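The documented ranges can be checked on the client before a request is sent. The sketch below is one way to do that, not the gateway's own validation; build_tts_body is a hypothetical helper, and two choices in it are assumptions: None values are dropped from params so the server defaults apply, and text length is measured with Python's len, which may not match how the server counts characters.

```python
# Documented ranges for `params` as (min, max); None means no documented bound.
PARAM_RANGES = {
    "depth_temperature": (0, 1.5),
    "max_new_tokens": (16, 2048),
    "penalty_window": (1, 80),
    "quantizers": (1, None),
    "repetition_penalty": (0, 6),
    "temperature": (0, 1.5),
    "top_k": (1, None),
}
INTEGER_PARAMS = {"max_new_tokens", "penalty_window", "quantizers", "top_k"}


def build_tts_body(text, speaker=None, model=None, **params):
    """Return a JSON-ready body for POST /v1/tts, or raise ValueError."""
    if not 1 <= len(text) <= 8000:
        raise ValueError("text must be 1 to 8000 characters")
    clean = {}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown param: {name}")
        if value is None:
            continue  # assumption: omit nulls and let the server default apply
        if name in INTEGER_PARAMS and (
            isinstance(value, bool) or not isinstance(value, int)
        ):
            raise ValueError(f"{name} must be an integer")
        low, high = PARAM_RANGES[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the documented range")
        clean[name] = value
    body = {"text": text}
    if speaker is not None:
        body["speaker"] = speaker
    if model is not None:
        body["model"] = model
    if clean:
        body["params"] = clean
    return body
```

A call such as build_tts_body("Hey there!", speaker="0", temperature=0.7) yields a body that can be posted as is.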

Response 200 · TtsResponse

audio · AudioPayload

audio.data_b64 · string

Base64-encoded 16-bit PCM WAV (mono).

audio.num_quantizers · integer

Number of RVQ levels decoded into this audio.

audio.sample_rate · integer

Sample rate of the audio in Hz.

audio.format · string

Container/encoding of data_b64 (16-bit PCM WAV).

model · string

request_id · string

text · string

The text that was spoken (echoes the request).

usage · Usage

usage.input_audio_seconds · number

Seconds of input audio supplied (converse).

usage.input_chars · integer

Characters of input text billed for this request.

usage.output_audio_seconds · number

Seconds of audio generated.

meta · object

Backend-specific diagnostics (latency, frames, …).

curl -s https://api.kalpalabs.ai/v1/tts \
  -H "Authorization: Bearer $KALPA_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "Hey there! How are you doing today?",
  "speaker": "0"
}'

200 example

{
  "request_id": "req_9f2c1a",
  "model": "kalpa-conversational-v1",
  "text": "Hey there! How are you doing today?",
  "audio": {
    "format": "wav",
    "sample_rate": 24000,
    "num_quantizers": 32,
    "data_b64": "UklGRiQAAABXQVZFZm10…"
  },
  "usage": {
    "input_chars": 35,
    "input_audio_seconds": 0,
    "output_audio_seconds": 2.6
  },
  "meta": {}
}
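Because audio.data_b64 holds a complete WAV file, it can be decoded and inspected with the standard library. This is a minimal sketch; decode_audio and save_audio are hypothetical helper names, and the data_b64 value in the example above is truncated, so a full response is needed to run it.

```python
import base64
import io
import wave


def decode_audio(audio):
    """Return (wav_bytes, duration_seconds) for an AudioPayload-shaped dict."""
    wav_bytes = base64.b64decode(audio["data_b64"])
    # Read the WAV header to derive the duration from frame count and rate.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return wav_bytes, duration


def save_audio(audio, path):
    """Write the decoded WAV to `path` and return its duration in seconds."""
    wav_bytes, duration = decode_audio(audio)
    with open(path, "wb") as handle:
        handle.write(wav_bytes)
    return duration
```

With a parsed TtsResponse in response, save_audio(response["audio"], "out.wav") writes a playable file; the returned duration should be close to usage.output_audio_seconds.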

POST /v1/converse

Complete the open (final) turn of a conversation.

Given a conversation, complete its last ('open') turn. A speaker-only open turn is authored (text + audio); an open turn with text is rendered as that speaker, conditioned on the prior turns (contextual TTS).

Body params

conversation · required

array · 1 – 64 items

The conversation, oldest turn first; the last turn is the open turn to complete.

Each turn carries the fields below. In the example form on this page, turn 1 is a prior turn and turn 2 is the open turn (completed by the model).

audio_wav_b64

speaker

text

model

string

Public model id (see GET /v1/models). Omit/null for the default model.

params

object · 7 optional fields

depth_temperature

number · 0 – 1.5

Acoustic temperature; null = follow temperature.

max_new_tokens

integer · 16 – 2048 · default 512

penalty_window

integer · 1 – 80 · default 20

quantizers

integer · ≥ 1

Decode only the first N RVQ levels; null = full depth.

repetition_penalty

number · 0 – 6 · default 3

temperature

number · 0 – 1.5 · default 0.7

top_k

integer · ≥ 1

Backbone top-k; null = full vocabulary.
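A converse body is a list of prior turns followed by one open turn. The sketch below assembles such a body; make_turn and build_converse_body are hypothetical helpers, and it assumes audio_wav_b64 takes the Base64 encoding of a whole WAV file, which the field name suggests but this page does not spell out.

```python
import base64

MAX_TURNS = 64  # documented cap on the conversation array


def make_turn(speaker, text=None, wav_bytes=None):
    """Build one turn; text and audio are included only when supplied."""
    turn = {"speaker": speaker}
    if text is not None:
        turn["text"] = text
    if wav_bytes is not None:
        # Assumption: the field carries a Base64-encoded WAV file.
        turn["audio_wav_b64"] = base64.b64encode(wav_bytes).decode("ascii")
    return turn


def build_converse_body(history, open_speaker, open_text=None, model=None):
    """Append the open turn to `history` and return a JSON-ready body.

    A speaker-only open turn asks the model to author text and audio;
    an open turn with text is rendered as that speaker.
    """
    conversation = list(history) + [make_turn(open_speaker, open_text)]
    if len(conversation) > MAX_TURNS:
        raise ValueError(f"conversation exceeds {MAX_TURNS} turns")
    body = {"conversation": conversation}
    if model is not None:
        body["model"] = model
    return body
```

For instance, build_converse_body([make_turn("0", "Hi, who are you?")], "1") produces the same body as the curl example for this endpoint.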

Response 200 · ConverseResponse

model · string

reply · ConverseReply

reply.speaker · string

reply.text · string

reply.audio · AudioPayload | null

reply.audio.data_b64 · string

Base64-encoded 16-bit PCM WAV (mono).

reply.audio.num_quantizers · integer

Number of RVQ levels decoded into this audio.

reply.audio.sample_rate · integer

Sample rate of the audio in Hz.

reply.audio.format · string

Container/encoding of data_b64 (16-bit PCM WAV).

request_id · string

usage · Usage

usage.input_audio_seconds · number

Seconds of input audio supplied (converse).

usage.input_chars · integer

Characters of input text billed for this request.

usage.output_audio_seconds · number

Seconds of audio generated.

meta · object

curl -s https://api.kalpalabs.ai/v1/converse \
  -H "Authorization: Bearer $KALPA_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "conversation": [
    {
      "speaker": "0",
      "text": "Hi, who are you?"
    },
    {
      "speaker": "1"
    }
  ]
}'

200 example

{
  "request_id": "req_4b81de",
  "model": "kalpa-conversational-v1",
  "reply": {
    "speaker": "1",
    "text": "I'm a speech model built by Kalpa Labs.",
    "audio": {
      "format": "wav",
      "sample_rate": 24000,
      "num_quantizers": 32,
      "data_b64": "UklGRiQAAABXQVZFZm10…"
    }
  },
  "usage": {
    "input_chars": 16,
    "input_audio_seconds": 0,
    "output_audio_seconds": 3.2
  },
  "meta": {}
}
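To keep a dialogue going, the completed reply can replace the open turn and a new open turn can be appended. The following is a sketch of that bookkeeping only; extend_conversation is a hypothetical helper, and feeding reply.audio.data_b64 back as audio_wav_b64 is an assumption that this page does not confirm. Since reply.audio may be null, the audio is copied only when present.

```python
def extend_conversation(conversation, response, next_speaker):
    """Return a new conversation with the reply closed and a new open turn.

    `conversation` is the list that was sent (its last item is the open turn)
    and `response` is the parsed ConverseResponse. The input is not mutated.
    """
    reply = response["reply"]
    closed = {"speaker": reply["speaker"], "text": reply["text"]}
    audio = reply.get("audio")  # documented as AudioPayload | null
    if audio is not None:
        # Assumption: generated audio can be passed back as turn audio.
        closed["audio_wav_b64"] = audio["data_b64"]
    # Drop the old open turn, add the completed one, then open the next turn.
    return list(conversation[:-1]) + [closed, {"speaker": next_speaker}]
```

To supply the next speaker's words yourself, add a text field to the final turn before sending the request.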

GET /v1/models

List available public models.

Response 200 · ModelsResponse

data · ModelCard[]

The available public models.

data[].display_name · string

Human-readable model name.

data[].id · string

Stable public model id used in the model request field.

data[].modes · string[]

Supported modes: subset of ["converse", "tts"].

data[].speakers · string[]

Valid role labels for a turn's speaker, in turn order (e.g. ["0", "1"]).

data[].default · boolean

True for the model used when model is omitted.

data[].description · string

What this model is for.

What this model is for.

curl -s https://api.kalpalabs.ai/v1/models \
  -H "Authorization: Bearer $KALPA_API_KEY"

200 example

{
  "data": [
    {
      "id": "kalpa-conversational-v1",
      "display_name": "Kalpa Conversational v1",
      "description": "Flagship multi-speaker conversational speech model (TTS + converse).",
      "modes": [
        "converse",
        "tts"
      ],
      "speakers": [
        "0",
        "1"
      ],
      "default": true
    },
    {
      "id": "kalpa-conversational-8b",
      "display_name": "Kalpa Conversational 8B",
      "description": "8B multi-speaker conversational speech model (TTS + converse).",
      "modes": [
        "converse",
        "tts"
      ],
      "speakers": [
        "0",
        "1"
      ],
      "default": false
    },
    {
      "id": "kalpa-conversational-mini",
      "display_name": "Kalpa Conversational Mini",
      "description": "Compact multi-speaker conversational speech model (TTS + converse).",
      "modes": [
        "converse",
        "tts"
      ],
      "speakers": [
        "0",
        "1"
      ],
      "default": false
    }
  ]
}
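The model cards carry enough information to check a model, mode and speaker before sending a request. The sketch below shows one way to use them; pick_model is a hypothetical helper, and it takes the parsed ModelsResponse as a plain dict.

```python
def pick_model(models_response, mode, speaker=None, model_id=None):
    """Return the id of the requested model, or of the default when omitted.

    Raises ValueError when the model does not support `mode` or `speaker`,
    and LookupError when no matching model card exists.
    """
    for card in models_response["data"]:
        wanted = card["id"] == model_id if model_id else card["default"]
        if not wanted:
            continue
        if mode not in card["modes"]:
            raise ValueError(f"{card['id']} does not support mode {mode!r}")
        if speaker is not None and speaker not in card["speakers"]:
            raise ValueError(f"{card['id']} has no speaker {speaker!r}")
        return card["id"]
    raise LookupError(model_id or "no default model listed")
```

Against the example response above, pick_model(models, "tts", speaker="0") returns "kalpa-conversational-v1", the card flagged as default.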

GET /v1/info

Backend info, default params, and limits.

Response 200 · InfoResponse

backend · object

Active backend description (name, kind, sample_rate, …).

defaults · object

Default generation params.

limits · object

Request-validation caps the gateway enforces.

param_schema · object[]

UI metadata for the generation knobs.

UI metadata for the generation knobs.

curl -s https://api.kalpalabs.ai/v1/info \
  -H "Authorization: Bearer $KALPA_API_KEY"

200 example

{
  "backend": {
    "name": "menka",
    "kind": "http",
    "sample_rate": 24000,
    "ready": true
  },
  "defaults": {
    "temperature": 0.7,
    "repetition_penalty": 3,
    "penalty_window": 20,
    "max_new_tokens": 512
  },
  "limits": {
    "max_text_chars": 8000,
    "max_conversation_turns": 64,
    "max_audio_bytes": 26214400
  },
  "param_schema": [
    "…"
  ]
}
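The limits object can drive a client-side pre-flight check so that oversized requests fail before they reach the gateway. This is a sketch under stated assumptions: check_limits is a hypothetical helper, max_text_chars is assumed to apply to the tts text field as well as to each turn's text, and max_audio_bytes is assumed to apply to the decoded size of each attached WAV. The gateway remains the authority on how the caps are applied.

```python
import base64


def check_limits(body, limits):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    text = body.get("text")
    if text is not None and len(text) > limits["max_text_chars"]:
        problems.append("text exceeds max_text_chars")
    turns = body.get("conversation", [])
    if len(turns) > limits["max_conversation_turns"]:
        problems.append("conversation exceeds max_conversation_turns")
    for index, turn in enumerate(turns):
        turn_text = turn.get("text")
        # Assumption: the text cap also applies per turn.
        if turn_text is not None and len(turn_text) > limits["max_text_chars"]:
            problems.append(f"turn {index} text exceeds max_text_chars")
        encoded = turn.get("audio_wav_b64")
        # Assumption: the audio cap applies to the decoded bytes of each turn.
        if encoded and len(base64.b64decode(encoded)) > limits["max_audio_bytes"]:
            problems.append(f"turn {index} audio exceeds max_audio_bytes")
    return problems
```

In the example response, max_audio_bytes is 26214400, which is 25 MiB.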

GET /v1/usage

Your metered usage.

Running totals (requests, input characters, audio seconds) for the calling API key.

Response 200 · UsageSummaryResponse

input_audio_seconds · number

input_chars · integer

key_id · string

output_audio_seconds · number

requests · integer

last_request_ts · number | null

curl -s https://api.kalpalabs.ai/v1/usage \
  -H "Authorization: Bearer $KALPA_API_KEY"

200 example

{
  "key_id": "acme",
  "requests": 1284,
  "input_chars": 91230,
  "input_audio_seconds": 411,
  "output_audio_seconds": 3120.5,
  "last_request_ts": 1751500000
}
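The running totals are easier to read as per-request averages. The sketch below derives them from a parsed UsageSummaryResponse; summarize_usage is a hypothetical helper, and it assumes last_request_ts is a Unix timestamp in seconds, which the example value suggests but this page does not state.

```python
from datetime import datetime, timezone


def summarize_usage(usage):
    """Derive per-request averages and a UTC datetime from the usage totals."""
    requests = usage["requests"]
    timestamp = usage["last_request_ts"]  # documented as number | null

    def per_request(total):
        # Avoid dividing by zero for a key that has made no requests.
        return total / requests if requests else 0.0

    return {
        "avg_input_chars": per_request(usage["input_chars"]),
        "avg_output_audio_seconds": per_request(usage["output_audio_seconds"]),
        # Assumption: the timestamp is Unix seconds.
        "last_request": (
            datetime.fromtimestamp(timestamp, tz=timezone.utc)
            if timestamp is not None
            else None
        ),
    }
```

A key with no requests yet yields zero averages and a last_request of None.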

GET /health

Liveness probe. No authentication.

Response 200 · HealthResponse

backend · string

ready · boolean

status · string