Media Stream

When a call starts, Inkbox opens a WebSocket connection to your agent at the client_websocket_url you configured on the phone number or provided when placing the call. This connection carries the live call data between your agent and the caller for the duration of the call.

What flows over this connection (text, audio, or both) depends on how your agent configures itself. Inkbox can handle text-to-speech (TTS), speech-to-text (STT), or both on your behalf. See Choosing a mode below.


Connection flow

  1. Inkbox connects to your client_websocket_url with an X-Call-Context header containing the call_id, phone_number, and direction. If your organization has a signing key, the connection also includes X-Inkbox-Request-ID, X-Inkbox-Timestamp, and X-Inkbox-Signature headers.

  2. Your agent accepts the WebSocket and declares its capabilities via two response headers:

| Header | Default | Description |
| --- | --- | --- |
| X-Use-Inkbox-Text-To-Speech | true | If true, Inkbox converts your text responses to speech. If false, your agent sends audio directly. |
| X-Use-Inkbox-Speech-To-Text | true | If true, Inkbox transcribes the caller's speech and sends you text. If false, your agent receives raw audio. |

If you omit these headers, both default to true (Inkbox handles everything).

  3. A start event is sent to your agent with the call_control_id and media format details.

  4. Streaming begins. Text or audio flows bidirectionally depending on the mode.

  5. A stop event is sent when the call ends.
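The capability declaration in step 2 can be sketched in Python as follows. The two header names and their default of true come from the table above; the helper names `capability_headers` and `effective_capabilities` are hypothetical, and the case-insensitive comparison of "true"/"false" is an assumption rather than documented behavior. How the headers are attached to the handshake response depends on your WebSocket server library and is not shown.

```python
# Header names and defaults come from the table above; the rest is illustrative.
TTS_HEADER = "X-Use-Inkbox-Text-To-Speech"
STT_HEADER = "X-Use-Inkbox-Speech-To-Text"


def capability_headers(inkbox_tts: bool = True, inkbox_stt: bool = True) -> dict[str, str]:
    """Build the two response headers to attach when accepting the WebSocket."""
    return {
        TTS_HEADER: "true" if inkbox_tts else "false",
        STT_HEADER: "true" if inkbox_stt else "false",
    }


def effective_capabilities(response_headers: dict[str, str]) -> dict[str, bool]:
    """Resolve what Inkbox would handle, applying the documented default of true."""
    lowered = {k.lower(): v for k, v in response_headers.items()}

    def flag(name: str) -> bool:
        # Assumption: values are the strings "true"/"false", compared case-insensitively.
        return lowered.get(name.lower(), "true").strip().lower() == "true"

    return {"inkbox_tts": flag(TTS_HEADER), "inkbox_stt": flag(STT_HEADER)}
```

Calling `effective_capabilities({})` models an agent that omits both headers, in which case Inkbox handles both directions.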


Choosing a mode

The combination of the two response headers gives you four configurations. The default configuration, in which Inkbox handles both STT and TTS, is described below, including the WebSocket events your agent sends and receives.
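As a rough illustration, the four configurations can be enumerated from the two header values. The mapping follows the header descriptions (a true value means your agent deals in text for that direction, a false value means audio); the function name `describe_mode` and the dictionary keys are hypothetical.

```python
from itertools import product


def describe_mode(inkbox_tts: bool, inkbox_stt: bool) -> dict[str, str]:
    """Summarize one configuration from the two response-header values."""
    return {
        # If Inkbox does TTS, the agent sends text; otherwise it sends audio.
        "agent_sends": "text" if inkbox_tts else "audio",
        # If Inkbox does STT, the agent receives text; otherwise raw audio.
        "agent_receives": "text" if inkbox_stt else "audio",
    }


# Enumerate all four header combinations.
MODES = {
    (tts, stt): describe_mode(tts, stt)
    for tts, stt in product((True, False), repeat=2)
}

for (tts, stt), mode in MODES.items():
    print(
        f"TTS by Inkbox={tts}, STT by Inkbox={stt}: "
        f"agent sends {mode['agent_sends']}, receives {mode['agent_receives']}"
    )
```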


Inkbox handles STT + TTS

Inkbox transcribes the caller and synthesizes your responses. Your agent only deals with text.

Your agent receives: Text (transcribed caller speech)

Your agent sends: Text (to be spoken to the caller)

Inkbox handles: Speech-to-text + text-to-speech

Simplest setup. Ideal when your agent is a text-based LLM and you want Inkbox to handle all audio processing.


WebSocket response headers

Your agent declares this configuration by setting these headers when accepting the WebSocket connection:

X-Use-Inkbox-Text-To-Speech: true
X-Use-Inkbox-Speech-To-Text: true

Events you receive (Inkbox → your agent)

| Event | Description |
| --- | --- |
| start | Call stream opened. Contains call metadata and media format. |
| transcript | Caller speech transcribed by Inkbox. Sent as interim results and a final result per utterance. |
| barge_in | The caller started speaking while your agent's TTS was playing. Inkbox interrupts playback. |
| stop | Call ended. |
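A minimal sketch of how an agent might dispatch these four events is shown below. The message shape is assumed, not documented here: the `event`, `text`, and `is_final` keys are hypothetical stand-ins, so check them against the actual payloads before relying on this. The `CallSession` class is likewise illustrative.

```python
import json


class CallSession:
    """Tracks one call's state from the four inbound events."""

    def __init__(self) -> None:
        self.active = False
        self.final_utterances: list[str] = []
        self.interrupted = False

    def handle(self, raw: str) -> str:
        msg = json.loads(raw)
        # Assumption: the event name lives under an "event" key.
        event = msg.get("event")
        if event == "start":
            self.active = True
        elif event == "transcript":
            # Assumption: "is_final" marks the final result and "text" holds it.
            # Interim results are ignored in this sketch.
            if msg.get("is_final"):
                self.final_utterances.append(msg.get("text", ""))
        elif event == "barge_in":
            # Inkbox interrupts playback; the agent should stop queuing speech.
            self.interrupted = True
        elif event == "stop":
            self.active = False
        else:
            return "ignored"
        return event
```

Keeping only final transcripts is one possible design choice; an agent that wants lower latency could start reasoning on interim results instead.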


Events you send (your agent → Inkbox)

| Event | Description |
| --- | --- |
| text | Stream text to be spoken to the caller. Send chunks with `done: false`, then a final message with `done: true`. Inkbox converts each chunk to speech and plays it. |
| stop | Hang up the call from your side. |
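The chunked text protocol described in the table could be sketched like this. Only the `done` flag is taken from the description above; the `event` and `text` key names, the empty text in the final message, and the helper name `text_messages` are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def text_messages(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one JSON message per chunk with done=false, then a final done=true."""
    for chunk in chunks:
        # Assumption: "event" and "text" are the key names; "done" is documented.
        yield json.dumps({"event": "text", "text": chunk, "done": False})
    # Assumption: the closing message may carry empty text.
    yield json.dumps({"event": "text", "text": "", "done": True})
```

A caller would typically feed this generator from an LLM token stream and write each yielded string to the WebSocket as it arrives.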


Audio format

When your agent sends or receives audio (any mode where TTS or STT is handled by your agent), audio is encoded as PCMU (u-law) at 8 kHz, base64-encoded inside JSON messages. This is the standard telephony format used by Telnyx.
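To make the format concrete, the sketch below decodes a base64 payload and computes its playback duration, relying on the fact that PCMU uses one byte per sample at 8,000 samples per second. The helper names and the 20 ms frame size in the usage note are illustrative choices, not values from this page; the JSON field that carries the payload is not shown because it is not documented here.

```python
import base64

SAMPLE_RATE_HZ = 8000   # PCMU sample rate stated above
ULAW_SILENCE = 0xFF     # u-law code for a zero-amplitude sample


def decode_payload(b64_audio: str) -> bytes:
    """Decode a base64 audio payload into raw PCMU bytes."""
    return base64.b64decode(b64_audio)


def duration_ms(pcmu: bytes) -> float:
    """Playback duration: one byte per sample at 8 kHz."""
    return len(pcmu) * 1000 / SAMPLE_RATE_HZ


def silence_payload(ms: int) -> str:
    """Base64-encoded u-law silence of the requested length."""
    n_samples = SAMPLE_RATE_HZ * ms // 1000
    return base64.b64encode(bytes([ULAW_SILENCE]) * n_samples).decode("ascii")
```

For example, `silence_payload(20)` encodes 160 bytes of silence, which `duration_ms` reports as 20 ms.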


Transcripts

Regardless of mode, Inkbox persists call transcripts to the database as the call progresses. In modes where Inkbox handles STT, transcripts are captured automatically. In modes where your agent handles STT, you should send transcript events so Inkbox can persist them. You can retrieve transcripts after the call via the Transcripts API.


Call duration

Each call has a maximum duration of 10 minutes. When the limit is reached, Inkbox hangs up the call with hangup_reason: "max_duration". See Rate Limits for organization-level limits.
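An agent may want to wrap up the conversation before the limit is reached. The following sketch tracks the time remaining against the 10-minute maximum; the function names and the 30-second warning margin are hypothetical, and timestamps are expected in seconds from a monotonic clock.

```python
MAX_CALL_SECONDS = 10 * 60  # documented per-call limit


def seconds_remaining(started_at: float, now: float) -> float:
    """Seconds left before the call hits the maximum duration."""
    return max(0.0, MAX_CALL_SECONDS - (now - started_at))


def should_wrap_up(started_at: float, now: float, warn_before: float = 30.0) -> bool:
    """True once the remaining time falls within the (hypothetical) warning margin."""
    return seconds_remaining(started_at, now) <= warn_before
```

In practice `started_at` could be recorded with `time.monotonic()` when the start event arrives, and `should_wrap_up` checked before each response.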


Setting your stream URL

Configure client_websocket_url on a phone number so that it is used automatically for all auto-accepted calls.

Alternatively, provide a client_websocket_url per call when placing outbound calls or responding to incoming call webhooks.

Inkbox

Copyright © 2026 Inkbox

