API reference
Every endpoint, field and error, generated from the committed OpenAPI contract.
Kalpa Speech API v0.1.0, generated from the committed contract (openapi.json). The base URL is https://api.kalpalabs.ai, and request and response bodies are JSON. Authenticated endpoints take the header Authorization: Bearer $KALPA_API_KEY. Every error uses the single envelope described in Rate limits & errors.
No API key? Write to [email protected].
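The base URL, JSON bodies and bearer header described above can be wrapped in a small helper. This is a minimal sketch using only the Python standard library; build_request and send are hypothetical names, and reading the key from a KALPA_API_KEY environment variable is an assumption that mirrors the placeholder above. urlopen raises urllib.error.HTTPError on non-2xx statuses, and the error body then carries the envelope documented in Rate limits & errors.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.kalpalabs.ai"


def build_request(method, path, body=None, api_key=None):
    """Build (without sending) a request to the Kalpa Speech API.

    Assumption: when api_key is not passed, it is read from the
    KALPA_API_KEY environment variable. No key means no Authorization
    header, which is fine for the unauthenticated GET /health.
    """
    headers = {"Accept": "application/json"}
    key = api_key if api_key is not None else os.environ.get("KALPA_API_KEY")
    if key:
        headers["Authorization"] = f"Bearer {key}"
    data = None
    if body is not None:
        # Request bodies are JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request, timeout=60.0):
    """Send a request and decode its JSON body.

    Non-2xx responses raise urllib.error.HTTPError; e.read() holds the error envelope.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(build_request("GET", "/v1/models")) would list the available models, provided a key is set.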
POST /v1/tts
Synthesize speech from text.
Render the given text as speech (24 kHz mono WAV) in the requested speaker's voice.
Request body
- text (required): Text to speak.
- model: Public model id (see GET /v1/models). Omit or set to null for the default model.
- params (object, 7 optional fields):
  - depth_temperature: Acoustic temperature; null = follow temperature.
  - max_new_tokens
  - penalty_window
  - quantizers: Decode only the first N RVQ levels; null = full depth.
  - repetition_penalty
  - temperature
  - top_k: Backbone top-k; null = full vocabulary.
- speaker: Speaker role to render the text as (one of the model's speakers; see GET /v1/models).
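A request body needs only text; model and every params field can be left out. The sketch below, built around a hypothetical tts_body helper, includes only the keys the caller sets. Passing None for a knob sends JSON null, which this reference documents for depth_temperature, quantizers and top_k; whether null and omission behave the same for the other knobs is not stated, so the helper keeps the two distinct. Treating speaker as optional is an assumption, since it is not marked required above, and the example values are illustrative.

```python
import json

# The seven optional generation knobs listed under params.
PARAM_NAMES = frozenset({
    "depth_temperature", "max_new_tokens", "penalty_window", "quantizers",
    "repetition_penalty", "temperature", "top_k",
})


def tts_body(text, speaker=None, model=None, **params):
    """Build a POST /v1/tts body (hypothetical helper).

    Only keys the caller supplies are included. Omitting model selects the
    default model. A knob passed as None is sent as JSON null; the reference
    documents null for depth_temperature, quantizers and top_k.
    """
    unknown = set(params) - PARAM_NAMES
    if unknown:
        raise ValueError(f"unknown params: {sorted(unknown)}")
    body = {"text": text}
    if speaker is not None:  # assumption: speaker may be omitted
        body["speaker"] = speaker
    if model is not None:
        body["model"] = model
    if params:
        body["params"] = dict(params)
    return body


if __name__ == "__main__":
    # Illustrative values; "0" follows the example role labels from GET /v1/models.
    print(json.dumps(tts_body("Hello there.", speaker="0", temperature=0.7, top_k=None)))
```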
Response
- audio (AudioPayload):
  - audio.data_b64 (string): Base64-encoded 16-bit PCM WAV (mono).
  - audio.num_quantizers (integer): Number of RVQ levels decoded into this audio.
  - audio.sample_rate (integer): Sample rate of the audio in Hz.
  - audio.format (string): Container/encoding of data_b64 (16-bit PCM WAV).
- model (string)
- request_id (string)
- text (string): The text that was spoken (echoes the request).
- usage (Usage):
  - usage.input_audio_seconds (number): Seconds of input audio supplied (converse).
  - usage.input_chars (integer): Characters of input text billed for this request.
  - usage.output_audio_seconds (number): Seconds of audio generated.
- meta (object): Backend-specific diagnostics (latency, frames, …).
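audio.data_b64 is a base64 string, so saving or inspecting the result means decoding it first. A minimal sketch with the standard base64 and wave modules, assuming the decoded bytes form a complete WAV file (header included); decode_audio is a hypothetical name, and it leaves format unchecked because this page does not list that field's literal values. The same function should apply to reply.audio from POST /v1/converse when that field is not null.

```python
import base64
import io
import wave


def decode_audio(payload):
    """Decode an AudioPayload into (wav_bytes, sample_rate, seconds).

    Assumption: data_b64 decodes to a complete WAV file, header included.
    The format field is not checked because its literal values are not listed.
    """
    wav_bytes = base64.b64decode(payload["data_b64"])
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # The reference describes the audio as mono 16-bit PCM.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV")
        rate = wav.getframerate()
        if rate != payload["sample_rate"]:
            raise ValueError(
                f"WAV header says {rate} Hz, payload says {payload['sample_rate']} Hz"
            )
        seconds = wav.getnframes() / rate
    return wav_bytes, rate, seconds
```

Writing wav_bytes to a file opened in binary mode yields a .wav file, and seconds can be compared with usage.output_audio_seconds as a sanity check.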
POST /v1/converse
Complete the open (final) turn of a conversation.
Given a conversation, complete its last (open) turn. If the open turn gives only a speaker, the turn is authored: the model produces both its text and its audio. If the open turn also gives text, that text is rendered as that speaker, conditioned on the prior turns (contextual TTS).
Request body
- conversation (required): The conversation, oldest turn first; the last turn is the open turn to complete. Each turn has the fields:
  - audio_wav_b64
  - speaker: One of the model's role labels (data[].speakers in GET /v1/models).
  - text: Omit on the open turn to have the turn authored.
- model: Public model id (see GET /v1/models). Omit or set to null for the default model.
- params (object, 7 optional fields):
  - depth_temperature: Acoustic temperature; null = follow temperature.
  - max_new_tokens
  - penalty_window
  - quantizers: Decode only the first N RVQ levels; null = full depth.
  - repetition_penalty
  - temperature
  - top_k: Backbone top-k; null = full vocabulary.
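The open-turn rules above come down to how the last element of conversation is built. A sketch under stated assumptions: the hypothetical converse_body helper gives every prior turn a speaker and text and attaches audio_wav_b64 only when the caller has that turn's audio. Which turn fields are required is not documented on this page, so treat that split as a guess to check against openapi.json.

```python
def converse_body(history, open_speaker, open_text=None, model=None, params=None):
    """Build a POST /v1/converse body (hypothetical helper).

    history: (speaker, text, audio_wav_b64 or None) tuples for the completed
    turns, oldest first. Assumption: prior turns carry speaker and text, and
    audio_wav_b64 is attached only when the caller has that turn's audio.
    """
    conversation = []
    for speaker, text, audio_b64 in history:
        turn = {"speaker": speaker, "text": text}
        if audio_b64 is not None:
            turn["audio_wav_b64"] = audio_b64
        conversation.append(turn)

    # The open turn goes last. Speaker only: the model authors text and audio.
    # Speaker plus text: the text is rendered as that speaker (contextual TTS).
    open_turn = {"speaker": open_speaker}
    if open_text is not None:
        open_turn["text"] = open_text
    conversation.append(open_turn)

    body = {"conversation": conversation}
    if model is not None:  # omitted -> default model
        body["model"] = model
    if params:
        body["params"] = dict(params)
    return body
```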
Response
- model (string)
- reply (ConverseReply):
  - reply.speaker (string)
  - reply.text (string)
  - reply.audio (AudioPayload | null):
    - reply.audio.data_b64 (string): Base64-encoded 16-bit PCM WAV (mono).
    - reply.audio.num_quantizers (integer): Number of RVQ levels decoded into this audio.
    - reply.audio.sample_rate (integer): Sample rate of the audio in Hz.
    - reply.audio.format (string): Container/encoding of data_b64 (16-bit PCM WAV).
- request_id (string)
- usage (Usage):
  - usage.input_audio_seconds (number): Seconds of input audio supplied (converse).
  - usage.input_chars (integer): Characters of input text billed for this request.
  - usage.output_audio_seconds (number): Seconds of audio generated.
- meta (object)

GET /v1/models
List available public models.
Response
- data (ModelCard[]): The available public models.
  - data[].display_name (string): Human-readable model name.
  - data[].id (string): Stable public model id used in the model request field.
  - data[].modes (string[]): Supported modes; a subset of ["converse", "tts"].
  - data[].speakers (string[]): Valid role labels for a turn's speaker, in turn order (e.g. ["0", "1"]).
  - data[].default (boolean): True for the model used when model is omitted.
  - data[].description (string): What this model is for.
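Because model may be omitted and speaker must be one of the model's role labels, a client can resolve both from this response before calling /v1/tts or /v1/converse. A minimal sketch; pick_model and check_speaker are hypothetical helpers that operate on the decoded JSON of GET /v1/models.

```python
def pick_model(models_response, mode, model_id=None):
    """Pick a ModelCard from a GET /v1/models response (hypothetical helper).

    With model_id=None this returns the card whose default flag is true,
    i.e. the model the API uses when the model field is omitted.
    """
    cards = models_response["data"]
    if model_id is None:
        card = next((c for c in cards if c.get("default") is True), None)
    else:
        card = next((c for c in cards if c["id"] == model_id), None)
    if card is None:
        raise LookupError("no default model" if model_id is None else f"unknown model id {model_id!r}")
    if mode not in card["modes"]:  # modes is a subset of ["converse", "tts"]
        raise ValueError(f"{card['id']} does not support mode {mode!r}")
    return card


def check_speaker(card, speaker):
    """Raise unless speaker is one of the card's role labels."""
    if speaker not in card["speakers"]:
        raise ValueError(f"speaker must be one of {card['speakers']}, got {speaker!r}")
```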
GET /v1/info
Backend info, default params, and limits.
Response
- backend (object): Active backend description (name, kind, sample_rate, …).
- defaults (object): Default generation params.
- limits (object): Request-validation caps the gateway enforces.
- param_schema (object[]): UI metadata for the generation knobs.
GET /v1/usage
Your metered usage.
Running totals (requests, input characters, audio seconds) for the calling API key.
Response
- input_audio_seconds (number)
- input_chars (integer)
- key_id (string)
- output_audio_seconds (number)
- requests (integer)
- last_request_ts (number | null)

GET /health
Liveness probe. No authentication.
Response
- backend (string)
- ready (boolean)
- status (string)
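Since /health needs no key, it suits a startup check. This sketch polls until ready is true and relies only on that boolean, because the possible status strings are not documented. wait_until_ready and fetch_health are hypothetical names, and the retry count and delay are arbitrary; the fetch and sleep arguments are injectable so the loop can be exercised without a network.

```python
import json
import time
import urllib.request

HEALTH_URL = "https://api.kalpalabs.ai/health"


def fetch_health(url=HEALTH_URL, timeout=5.0):
    """GET /health and decode its JSON body; no Authorization header is sent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch=fetch_health, attempts=10, delay=2.0, sleep=time.sleep):
    """Poll until the health body reports ready; return that body.

    Connection failures (URLError and HTTPError are OSError subclasses)
    count as "not ready yet". Raises TimeoutError after the last attempt.
    """
    last = None
    for i in range(attempts):
        try:
            last = fetch()
        except OSError:
            last = None
        if last and last.get("ready") is True:
            return last
        if i < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"backend not ready after {attempts} attempts: {last}")
```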