Reference

Hear streaming frames

The streaming speech-to-text WebSocket emits bare JSON text frames: there is no "transcript." prefix, no session.started frame, and no usage frame. Render partials live, then commit the text on speech_final / final.

connect
wss://api.pyai.com/v1/audio/transcriptions/stream?model=pyai-hear&encoding=pcm16&sample_rate=16000&interim_results=true
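As a minimal sketch, the connect URL above can be assembled from its query parameters. The helper name build_stream_url is hypothetical, and only the parameter values shown in the URL above come from this reference; whether other values are accepted is an assumption. Authentication is not described here, so it is left out.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.pyai.com/v1/audio/transcriptions/stream"


def build_stream_url(model="pyai-hear", encoding="pcm16",
                     sample_rate=16000, interim_results=True):
    """Assemble the connect URL (hypothetical helper, not part of any SDK)."""
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": str(sample_rate),
        # The documented URL spells the boolean in lowercase.
        "interim_results": "true" if interim_results else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_stream_url()
```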

Server to client frames

partial

every eager tick

Live hypothesis (may also carry stable_text / active_text). Overwrite it per utterance_id.

Payload

{text, utterance_id, t_ms}

partial_stable

when a prefix locks in

The leading portion the recognizer no longer expects to revise.

Payload

{text, utterance_id, t_ms}

speech_final

on endpoint or after commit

End of an utterance, stable. With grounding (Cue) it also carries a grounding array.

Payload

{text, utterance_id, t_ms, audio_ms}

final

follows speech_final

Corrected full-context transcript for the utterance.

Payload

{text, utterance_id, t_ms, audio_ms}

error

on fault

An error frame for a stream-level fault.

Payload

{code, message}
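The frame handling described above (overwrite the partial per utterance_id, commit on speech_final / final) could be sketched as a small reducer. This is an illustration under assumptions, not the reference client: the payloads listed above do not show how a frame names itself, so the "type" field used here is an assumption, as are the class name TranscriptState and the string utterance ids in the usage lines. partial_stable is ignored in this sketch because the latest partial already covers the display text.

```python
import json


class TranscriptState:
    """Tracks live and committed text per utterance (hypothetical helper)."""

    def __init__(self):
        self.live = {}       # utterance_id -> latest partial text
        self.committed = {}  # utterance_id -> committed text
        self.order = []      # utterance ids in first-seen order
        self.errors = []     # (code, message) pairs from error frames

    def apply(self, raw):
        frame = json.loads(raw)
        kind = frame.get("type")  # ASSUMPTION: discriminator field name
        if kind == "error":
            self.errors.append((frame.get("code"), frame.get("message")))
            return
        uid = frame.get("utterance_id")
        if uid is None:
            return
        if uid not in self.order:
            self.order.append(uid)
        if kind == "partial":
            # Overwrite, never append; skip partials for committed utterances.
            if uid not in self.committed:
                self.live[uid] = frame.get("text", "")
        elif kind in ("speech_final", "final"):
            # final follows speech_final and replaces it with corrected text.
            self.committed[uid] = frame.get("text", "")
            self.live.pop(uid, None)

    def render(self):
        parts = [self.committed.get(u, self.live.get(u, "")) for u in self.order]
        return " ".join(p for p in parts if p)


state = TranscriptState()
state.apply('{"type":"partial","text":"hello wor","utterance_id":"u1","t_ms":200}')
state.apply('{"type":"speech_final","text":"hello world","utterance_id":"u1","t_ms":900,"audio_ms":850}')
```

Keeping committed text separate from live text means a late partial cannot overwrite an utterance that has already been finalized.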

Client to server control frames

Stream binary audio frames (PCM16 little-endian mono at the negotiated sample_rate, or opus) continuously. Use these JSON text frames for control:
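For illustration, integer samples can be packed into PCM16 little-endian mono binary frames as below. The 20 ms frame duration and the helper name pcm16_frames are assumptions; this reference does not state a required or recommended frame size.

```python
import struct


def pcm16_frames(samples, sample_rate=16000, frame_ms=20):
    """Yield little-endian 16-bit mono PCM frames (hypothetical helper).

    frame_ms=20 is an assumed chunk size, not a documented requirement.
    """
    per_frame = sample_rate * frame_ms // 1000
    for start in range(0, len(samples), per_frame):
        chunk = samples[start:start + per_frame]
        # Clamp to the signed 16-bit range before packing.
        clipped = [max(-32768, min(32767, int(s))) for s in chunk]
        yield struct.pack(f"<{len(clipped)}h", *clipped)


frames = list(pcm16_frames([0, 1, -1, 32767] * 160))
```

Each yielded bytes object would be sent as one binary WebSocket message; at 16000 Hz a 20 ms frame holds 320 samples, or 640 bytes.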

{"type":"commit"}

Force-finalize the current utterance (e.g. when your VAD detects end-of-turn). The engine flushes a speech_final / final for the audio buffered so far.

{"type":"config","grounding":true}

Send as the first JSON text frame to ground transcripts against the org's knowledge base; the session then meters as Cue.

{"type":"end"}

Ignored by the engine - this is NOT a flush. Use commit (or close the socket) to finalize.
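The ordering rules above can be sketched as a plan of outgoing messages for one utterance: the config frame first (only when grounding is wanted), then binary audio, then commit rather than end. The function names are hypothetical and the sketch only builds the messages; sending them is left to whichever WebSocket client is in use.

```python
import json


def config_frame(grounding=True):
    # Must be the first JSON text frame on the socket, per the text above.
    return json.dumps({"type": "config", "grounding": grounding},
                      separators=(",", ":"))


def commit_frame():
    # Force-finalizes the current utterance; "end" would be ignored.
    return json.dumps({"type": "commit"}, separators=(",", ":"))


def plan_utterance(audio_frames, grounding=False):
    """Return the ordered messages for one utterance (hypothetical helper)."""
    messages = []
    if grounding:
        messages.append(config_frame(True))
    messages.extend(audio_frames)    # bytes -> binary WebSocket frames
    messages.append(commit_frame())  # str -> JSON text frame
    return messages


plan = plan_utterance([b"\x00\x00"], grounding=True)
```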

Close codes

Code  Meaning
1000  Normal close.
1008  Auth/policy: bad key, missing hear:stream scope, or revoked engine token.
1011  Engine error. A 1011 that arrives immediately after a final is a benign close, not a failed transcription.
4429  Over the per-key / fleet concurrency cap.
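One way a client might interpret these close codes is sketched below, including the benign 1011 case. The outcome labels and the function name classify_close are hypothetical, and "immediately after a final" is approximated by checking whether the last frame received was a final.

```python
def classify_close(code, last_frame_was_final=False):
    """Map a WebSocket close code to an outcome label (hypothetical labels)."""
    if code == 1000:
        return "ok"
    if code == 1008:
        return "auth_or_policy"   # bad key, missing scope, or revoked token
    if code == 1011:
        # Benign when it directly follows a final; otherwise an engine error.
        return "ok" if last_frame_was_final else "engine_error"
    if code == 4429:
        return "over_concurrency_cap"
    return "unknown"


outcome = classify_close(1011, last_frame_was_final=True)
```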

Migrating from another streaming STT provider? See the Deepgram to PyAI guide for the full frame mapping.
