API reference

Every endpoint, field and error — generated from the committed OpenAPI contract.

Kalpa Speech API v0.1.0, generated from the committed contract (openapi.json). Base URL: https://api.kalpalabs.ai. Bodies are JSON; authenticated endpoints take the header Authorization: Bearer $KALPA_API_KEY. Every error uses the single envelope described in Rate limits & errors. Each endpoint can be run from this page: fill in the fields and send the request. The request and response stay in sync on the right and can be copied as curl, Python, or JavaScript.
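As a rough illustration of the base URL and bearer-token convention described above, the following sketch builds a request without sending it. The helper name `build_request` is hypothetical, and sending an explicit `Content-Type: application/json` header is an assumption (this page only says bodies are JSON).

```python
import json
import os
import urllib.request

BASE_URL = "https://api.kalpalabs.ai"


def build_request(path, body=None, api_key=None):
    """Build (but do not send) a request to the Kalpa Speech API.

    Hypothetical helper: a JSON body makes it a POST, no body makes it a GET.
    """
    headers = {}
    # Fall back to the KALPA_API_KEY environment variable if no key is passed.
    key = api_key if api_key is not None else os.environ.get("KALPA_API_KEY")
    if key:
        headers["Authorization"] = f"Bearer {key}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # ASSUMPTION: the gateway expects the usual JSON content type.
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
```

The returned object can be passed to `urllib.request.urlopen` to actually perform the call; unauthenticated endpoints such as GET /health can be built with an empty key.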

API key

Stored only in this browser; requests go straight to api.kalpalabs.ai. No key? Write to [email protected].

POST /v1/tts

Synthesize speech from text.

Render the given text as speech (24 kHz mono WAV) in the requested speaker's voice.

Body params
text · required
string · 1 – 8000 chars

Text to speak.

model
string

Public model id (see GET /v1/models). Omit/null for the default model.

params
object · 7 optional fields
depth_temperature
number · 0 – 1.5

Acoustic temperature; null = follow temperature.

max_new_tokens
integer · 16 – 2048 · default 512
penalty_window
integer · 1 – 80 · default 20
quantizers
integer · ≥ 1

Decode only the first N RVQ levels; null = full depth.

repetition_penalty
number · 0 – 6 · default 3
temperature
number · 0 – 1.5 · default 0.7
top_k
integer · ≥ 1

Backbone top-k; null = full vocabulary.

speaker
string · default "0"

Speaker role to render the text as (one of the model's speakers; see GET /v1/models).
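The body params above could be assembled and pre-checked on the client along these lines. This is only a sketch: `build_tts_body` and `PARAM_RANGES` are hypothetical names, the ranges are treated as inclusive (an assumption), and Python's `len` may not count characters the same way the gateway does.

```python
# (low, high) bounds copied from the Body params list; None means no upper bound.
# ASSUMPTION: both ends of each range are inclusive.
PARAM_RANGES = {
    "depth_temperature": (0, 1.5),
    "max_new_tokens": (16, 2048),
    "penalty_window": (1, 80),
    "quantizers": (1, None),
    "repetition_penalty": (0, 6),
    "temperature": (0, 1.5),
    "top_k": (1, None),
}


def build_tts_body(text, speaker="0", model=None, **params):
    """Return a JSON-ready body for POST /v1/tts, or raise ValueError."""
    if not 1 <= len(text) <= 8000:
        raise ValueError("text must be 1 to 8000 characters")
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown param: {name}")
        if value is None:
            # null is documented for depth_temperature, quantizers and top_k;
            # passing it through for the other params is an assumption.
            continue
        low, high = PARAM_RANGES[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the documented range")
    body = {"text": text, "speaker": speaker}
    if model is not None:
        body["model"] = model  # omitted entirely to get the default model
    if params:
        body["params"] = params
    return body
```

The server remains the authority on validation; a local check like this mainly saves a round trip for obviously invalid values.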

Response 200 · TtsResponse
audio · AudioPayload
audio.data_b64 · string

Base64-encoded 16-bit PCM WAV (mono).

audio.num_quantizers · integer

Number of RVQ levels decoded into this audio.

audio.sample_rate · integer

Sample rate of the audio in Hz.

audio.format · string

Container/encoding of data_b64 (16-bit PCM WAV).

model · string
request_id · string
text · string

The text that was spoken (echoes the request).

usage · Usage
usage.input_audio_seconds · number

Seconds of input audio supplied (converse).

usage.input_chars · integer

Characters of input text billed for this request.

usage.output_audio_seconds · number

Seconds of audio generated.

meta · object

Backend-specific diagnostics (latency, frames, …).
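Since the audio arrives as base64 text inside the JSON response, a client has to decode it before playing or saving it. A minimal sketch, assuming the response has already been parsed into a Python dict; `decode_audio` is a hypothetical helper and relies only on the standard `base64` and `wave` modules.

```python
import base64
import io
import wave


def decode_audio(audio):
    """Decode an AudioPayload dict into WAV bytes plus facts read from the WAV header."""
    wav_bytes = base64.b64decode(audio["data_b64"])
    # Parse the header in memory to confirm what was actually returned.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }
    return wav_bytes, info
```

Writing `wav_bytes` to a file opened in binary mode yields a playable `.wav`; comparing `info["sample_rate"]` against `audio["sample_rate"]` is a cheap sanity check.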

POST /v1/converse

Complete the open (final) turn of a conversation.

Given a conversation, complete its last ('open') turn. If the open turn names only a speaker, the model authors it, producing both text and audio. If the open turn already has text, that text is rendered as that speaker, conditioned on the prior turns (contextual TTS).

Body params
conversation · required
array · 1 – 64 items

The conversation, oldest turn first; the last turn is the open turn to complete.

turn 1
audio_wav_b64
speaker
text
turn 2 · open turn (completed by the model)
audio_wav_b64
speaker
text
model
string

Public model id (see GET /v1/models). Omit/null for the default model.

params
object · 7 optional fields
depth_temperature
number · 0 – 1.5

Acoustic temperature; null = follow temperature.

max_new_tokens
integer · 16 – 2048 · default 512
penalty_window
integer · 1 – 80 · default 20
quantizers
integer · ≥ 1

Decode only the first N RVQ levels; null = full depth.

repetition_penalty
number · 0 – 6 · default 3
temperature
number · 0 – 1.5 · default 0.7
top_k
integer · ≥ 1

Backbone top-k; null = full vocabulary.
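To make the open-turn convention concrete, here is one way a converse body might be put together. The helper names `make_turn` and `build_converse_body` are hypothetical, and leaving unset turn fields out of the JSON (rather than sending null) is an assumption, since this page does not say which turn fields are required.

```python
def make_turn(speaker, text=None, audio_wav_b64=None):
    """Build one conversation turn, leaving out fields that were not supplied."""
    turn = {"speaker": speaker}
    if text is not None:
        turn["text"] = text
    if audio_wav_b64 is not None:
        turn["audio_wav_b64"] = audio_wav_b64
    return turn


def build_converse_body(prior_turns, open_turn, model=None, params=None):
    """Return a JSON-ready body for POST /v1/converse.

    prior_turns are ordered oldest first; open_turn goes last and is the one
    the model completes.
    """
    conversation = list(prior_turns) + [open_turn]
    if not 1 <= len(conversation) <= 64:
        raise ValueError("conversation must contain 1 to 64 turns")
    body = {"conversation": conversation}
    if model is not None:
        body["model"] = model
    if params:
        body["params"] = params
    return body
```

For example, `build_converse_body([make_turn("0", "How are you?")], make_turn("1"))` asks the model to author speaker "1"'s reply, while `make_turn("1", "Fine, thanks.")` as the open turn asks for that text to be rendered in context.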

Response 200 · ConverseResponse
model · string
reply · ConverseReply
reply.speaker · string
reply.text · string
reply.audio · AudioPayload | null
reply.audio.data_b64 · string

Base64-encoded 16-bit PCM WAV (mono).

reply.audio.num_quantizers · integer

Number of RVQ levels decoded into this audio.

reply.audio.sample_rate · integer

Sample rate of the audio in Hz.

reply.audio.format · string

Container/encoding of data_b64 (16-bit PCM WAV).

request_id · string
usage · Usage
usage.input_audio_seconds · number

Seconds of input audio supplied (converse).

usage.input_chars · integer

Characters of input text billed for this request.

usage.output_audio_seconds · number

Seconds of audio generated.

meta · object
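Because reply.audio is typed AudioPayload | null, client code should not assume audio is always present. A small sketch of null-safe unpacking, assuming the response is already a parsed dict; `unpack_reply` is a hypothetical name.

```python
import base64


def unpack_reply(response):
    """Return (speaker, text, wav_bytes) from a ConverseResponse dict.

    wav_bytes is None when reply.audio is null.
    """
    reply = response["reply"]
    audio = reply.get("audio")
    wav_bytes = base64.b64decode(audio["data_b64"]) if audio is not None else None
    return reply["speaker"], reply["text"], wav_bytes
```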

GET /v1/models

List available public models.

Response 200 · ModelsResponse
data · ModelCard[]

The available public models.

data[].display_name · string

Human-readable model name.

data[].id · string

Stable public model id used in the model request field.

data[].modes · string[]

Supported modes: subset of ["converse", "tts"].

data[].speakers · string[]

Valid role labels for a turn's speaker, in turn order (e.g. ["0", "1"]).

data[].default · boolean

True for the model used when model is omitted.

data[].description · string

What this model is for.

What this model is for.
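The model cards can be used to check a model id, mode, and speaker before sending a request. The following is a sketch over an already-parsed ModelsResponse dict; `pick_model` is a hypothetical helper, not part of the API.

```python
def pick_model(models_response, mode, model_id=None, speaker=None):
    """Return the matching model card, or raise if it cannot serve the request.

    With model_id=None, the card flagged default=true is used, mirroring what
    the API does when the model field is omitted.
    """
    cards = models_response["data"]
    if model_id is None:
        card = next((c for c in cards if c.get("default")), None)
    else:
        card = next((c for c in cards if c["id"] == model_id), None)
    if card is None:
        raise LookupError("no matching model card")
    if mode not in card["modes"]:
        raise ValueError(f"model {card['id']} does not support mode {mode!r}")
    if speaker is not None and speaker not in card["speakers"]:
        raise ValueError(f"model {card['id']} has no speaker {speaker!r}")
    return card
```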

GET /v1/info

Backend info, default params, and limits.

Response 200 · InfoResponse
backend · object

Active backend description (name, kind, sample_rate, …).

defaults · object

Default generation params.

limits · object

Request-validation caps the gateway enforces.

param_schema · object[]

UI metadata for the generation knobs.

GET /v1/usage

Your metered usage.

Running totals (requests, input characters, audio seconds) for the calling API key.

Response 200 · UsageSummaryResponse
input_audio_seconds · number
input_chars · integer
key_id · string
output_audio_seconds · number
requests · integer
last_request_ts · number | null
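A usage summary might be turned into a one-line report as sketched below. Note the loud assumption: this page types last_request_ts only as number | null, so reading it as Unix epoch seconds is a guess, as is treating null as "no request recorded". `summarize_usage` is a hypothetical name.

```python
from datetime import datetime, timezone


def summarize_usage(usage):
    """Format a UsageSummaryResponse dict as a single human-readable line."""
    ts = usage.get("last_request_ts")
    if ts is None:
        last = "none recorded"
    else:
        # ASSUMPTION: last_request_ts is Unix epoch seconds (not stated in the reference).
        last = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return (
        f"{usage['requests']} requests, {usage['input_chars']} input chars, "
        f"{usage['output_audio_seconds']:.1f} s audio out, last request: {last}"
    )
```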

GET /health

Liveness probe. No authentication.

Response 200 · HealthResponse
backend · string
ready · boolean
status · string
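A liveness probe is typically polled until the service reports ready. The sketch below takes the fetching function as an argument so that it stays independent of any HTTP client; `wait_until_ready` and `fetch_health` are hypothetical names, and only the documented ready field is inspected.

```python
import time


def wait_until_ready(fetch_health, attempts=5, delay_seconds=1.0):
    """Poll a health source until it reports ready=true.

    fetch_health is any zero-argument callable returning a parsed
    HealthResponse dict. Returns True on success, False once attempts run out.
    """
    for attempt in range(attempts):
        health = fetch_health()
        if health.get("ready"):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # no sleep after the final attempt
    return False
```

In practice `fetch_health` would wrap a GET to /health and JSON-decode the body; since the endpoint needs no authentication, no API key is involved.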