Speech-to-text

Switch from Deepgram to PyAI Hear

Streaming transcription tuned for 8 kHz call audio.

Hear is telephony-native speech-to-text with eager streaming partials over a single WebSocket. You map Deepgram's interim/is_final results to PyAI's partial / partial_stable / speech_final / final frames, force-finalize a turn with {"type":"commit"}, and optionally turn on knowledge-base grounding.


Why teams move from Deepgram

Tuned for the phone

Hear is tuned on narrowband 8 kHz call audio, not studio takes - accuracy holds up on real lines.

Half-price batch

Queue recordings to the async jobs API at $0.0015/min - 50% off the realtime rate - with webhooks on completion.
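As a rough illustration of the pricing arithmetic: if batch at $0.0015/min is 50% off, the implied realtime rate is $0.003/min. That realtime figure is inferred from the sentence above rather than quoted, and the function name and example volumes below are hypothetical.

```javascript
// Rates in USD per audio minute. The batch rate is from the text above; the
// realtime rate is inferred from "50% off" and is an assumption.
const BATCH_RATE = 0.0015;
const REALTIME_RATE = BATCH_RATE / 0.5; // 0.003

// Estimate spend for a given number of audio minutes in each mode.
function estimateCost(realtimeMinutes, batchMinutes) {
  const realtime = realtimeMinutes * REALTIME_RATE;
  const batch = batchMinutes * BATCH_RATE;
  return { realtime, batch, total: realtime + batch };
}

// Example: 10,000 live minutes plus 40,000 minutes of queued recordings.
const estimate = estimateCost(10_000, 40_000);
console.log(estimate.total.toFixed(2)); // prints 90.00
```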

Grounding when you need it

Send {"type":"config","grounding":true} as the first JSON frame to receive knowledge-base context inline on speech_final and final frames, for pipelines that bring their own LLM and turn detection.

Before and after

Before - Deepgram

import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const dg = createClient(process.env.DEEPGRAM_API_KEY);
const conn = dg.listen.live({ model: "nova-3", interim_results: true });
conn.on(LiveTranscriptionEvents.Transcript, (d) => {
  const text = d.channel.alternatives[0].transcript;
  if (d.is_final) commit(text); else render(text);
});
// pipe PCM16 audio frames: conn.send(chunk)

After - PyAI Hear

// Pass the key as a WebSocket subprotocol (browser-safe).
const ws = new WebSocket(
  "wss://api.pyai.com/v1/audio/transcriptions/stream" +
    "?model=pyai-hear&encoding=pcm16&sample_rate=16000&interim_results=true",
  ["pyai-key." + apiKey],
);
ws.onmessage = (e) => {
  const f = JSON.parse(e.data); // bare JSON frames
  if (f.type === "partial" || f.type === "partial_stable") render(f.text);
  else if (f.type === "speech_final" || f.type === "final") commit(f.text);
};
// stream PCM16 frames up; end a turn with:
// ws.send(JSON.stringify({ type: "commit" }))
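
Because a speech_final is followed by a corrected final for the same turn, a handler that appends on both frame types can record the turn twice. One way to avoid that is to key committed text by utterance_id, as in this minimal sketch. It assumes the speech_final and final frames for a turn share the same utterance_id, which the page does not state explicitly; createTranscript and its methods are hypothetical names.

```javascript
// Keeps one committed line per utterance. Assumption: the speech_final frame
// and the corrected final frame for a turn share the same utterance_id.
function createTranscript() {
  const committed = new Map(); // utterance_id -> latest committed text
  let pendingText = "";        // text of the in-progress partial, if any

  return {
    handleFrame(frame) {
      if (frame.type === "partial" || frame.type === "partial_stable") {
        pendingText = frame.text;
      } else if (frame.type === "speech_final" || frame.type === "final") {
        // A later final overwrites the earlier speech_final for the same turn.
        committed.set(frame.utterance_id, frame.text);
        pendingText = "";
      }
    },
    // Committed turns, in the order each utterance was first committed.
    lines() {
      return [...committed.values()];
    },
    // Current uncommitted partial text ("" when nothing is in progress).
    pending() {
      return pendingText;
    },
  };
}
```

It would be wired in with something like ws.onmessage = (e) => transcript.handleFrame(JSON.parse(e.data)), rendering transcript.pending() and transcript.lines() after each frame.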

Migration checklist

Swap the connection

Change the base URL or WebSocket URL, pass a PyAI key, and keep the old client where the API shape is compatible.

Map models, voices, and formats

Use the table below to replace model ids, voice ids, response formats, sample rates, and auth headers without rewriting the product flow.

Replay customer traffic

Run real prompts, recordings, and phone-call samples through both systems. Compare latency, quality, completion rate, and all-in cost.
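
For the quality comparison, one common metric is word error rate (WER) against a human reference transcript. The sketch below is a plain word-level edit distance, not a PyAI or Deepgram tool; the function name is hypothetical, and real evaluations usually add text normalization (punctuation, numerals) that is omitted here.

```javascript
// Word error rate: word-level edit distance divided by reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  // Convention chosen for this sketch: an empty reference scores 0 if the
  // hypothesis is also empty, otherwise 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] = edit distance between the first i-1 ref words and first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      curr[j] = Math.min(substitution, prev[j] + 1, curr[j - 1] + 1);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Running the same recordings through both systems and averaging this score per call gives one number to set beside latency and cost.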

Launch with guardrails

Start on free credits or test keys, add usage alerts, then enable Trace, Recap, or managed Agents when calls need production review.

If the system you are leaving already uses OpenAI-compatible transcription, speech, or realtime APIs, start with the smallest compatible swap: base URL, key, model, and voice. The real test is whether PyAI improves cost, latency, and caller outcomes on your own traffic.

What maps to what

Deepgram	PyAI
`Authorization: Token <key>`	Subprotocol `pyai-key.<key>` (or `?api_key=`)
`wss://api.deepgram.com/v1/listen`	`wss://api.pyai.com/v1/audio/transcriptions/stream`
`interim_results=true`	`interim_results=true` (same)
Interim results frame	`partial` / `partial_stable`
`is_final` / `speech_final`	`speech_final`, then a corrected `final`
`Finalize` / `CloseStream`	`{"type":"commit"}` (or close the socket)
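
The connection rows of the table can be collected into one small helper. This is a sketch using only the URL, query parameters, and subprotocol shown on this page; hearConnection is a hypothetical name, and whether sample_rate accepts values other than the 16000 used in the example above (such as 8000) is not stated here.

```javascript
// Builds the URL and subprotocol list for a Hear streaming socket.
// Parameter names and values are taken from the example and table above.
function hearConnection(apiKey, { sampleRate = 16000, interimResults = true } = {}) {
  const url = new URL("wss://api.pyai.com/v1/audio/transcriptions/stream");
  url.searchParams.set("model", "pyai-hear");
  url.searchParams.set("encoding", "pcm16");
  url.searchParams.set("sample_rate", String(sampleRate));
  url.searchParams.set("interim_results", String(interimResults));
  // The key travels as a WebSocket subprotocol rather than a header.
  return { url: url.toString(), protocols: ["pyai-key." + apiKey] };
}

// Usage:
// const { url, protocols } = hearConnection(apiKey);
// const ws = new WebSocket(url, protocols);
```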

Good to know

Frames are bare JSON ({type, text, utterance_id, t_ms, ...}); there is no session.started or usage frame.
{"type":"end"} is ignored - use {"type":"commit"} or close the socket to flush a final.
Close codes: 1000 normal, 1008 auth/policy, 1011 engine error (benign right after a final), 4429 over the concurrency cap.
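
A client might turn the close codes above into a retry decision along these lines. The codes come from the list; the reactions (which codes to retry, and treating 1011 as fine only when a final was just received) are assumptions for illustration, and classifyClose is a hypothetical name.

```javascript
// Maps a WebSocket close code to a suggested client reaction.
// The codes come from the list above; the reactions are assumptions.
function classifyClose(code, sawFinalRecently = false) {
  switch (code) {
    case 1000:
      return { ok: true, retry: false, reason: "normal closure" };
    case 1008:
      // Retrying with the same key is unlikely to help.
      return { ok: false, retry: false, reason: "auth or policy failure" };
    case 1011:
      // Described above as benign when it arrives right after a final.
      return sawFinalRecently
        ? { ok: true, retry: false, reason: "engine closed after final" }
        : { ok: false, retry: true, reason: "engine error" };
    case 4429:
      // Back off before reconnecting; another stream has to end first.
      return { ok: false, retry: true, reason: "over the concurrency cap" };
    default:
      return { ok: false, retry: true, reason: "unexpected close code " + code };
  }
}

// Usage: ws.onclose = (e) => console.log(classifyClose(e.code, gotFinal));
```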

FAQ

How do interim results map?

Deepgram interim results map to PyAI partial / partial_stable; is_final maps to speech_final, followed by a corrected full-context final.

How do I force end-of-turn?

Send the text frame {"type":"commit"} (e.g. when your VAD detects end-of-turn). Closing the socket also flushes a final for buffered audio.
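
A minimal sketch of that VAD-driven commit follows. It assumes your own VAD supplies a speech/non-speech verdict per audio frame; createTurnCommitter, its 700 ms default, and the onFrame signature are all hypothetical, and only the {"type":"commit"} frame itself comes from this page.

```javascript
// Sends {"type":"commit"} once silence has lasted silenceMs after speech.
// readyState 1 is OPEN in the standard WebSocket API.
function createTurnCommitter(ws, silenceMs = 700) {
  let lastSpeechAt = null; // ms timestamp of the most recent speech frame

  return {
    // Call for each audio frame with the VAD verdict and a timestamp in ms.
    // Returns true when a commit frame was sent.
    onFrame(isSpeech, nowMs) {
      if (isSpeech) {
        lastSpeechAt = nowMs;
        return false;
      }
      const turnOpen = lastSpeechAt !== null;
      if (turnOpen && nowMs - lastSpeechAt >= silenceMs && ws.readyState === 1) {
        ws.send(JSON.stringify({ type: "commit" }));
        lastSpeechAt = null; // wait for new speech before committing again
        return true;
      }
      return false;
    },
  };
}
```

Passing timestamps in, instead of using timers, keeps the logic easy to test against recorded calls.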

Can I add knowledge-base grounding?

Yes - send {"type":"config","grounding":true} as the first JSON frame; speech_final/final then carry a grounding array.
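
Wiring that up might look like the sketch below. It assumes the array arrives on a frame field named grounding and treats its elements as opaque, since their shape is not described here; enableGrounding is a hypothetical helper and should be called before the socket opens.

```javascript
// Sends the grounding config as the first JSON frame once the socket opens,
// and hands any grounding array on speech_final/final frames to a callback.
// Note: this replaces any existing onopen handler on the socket.
function enableGrounding(ws, onGrounding) {
  ws.onopen = () => {
    ws.send(JSON.stringify({ type: "config", grounding: true }));
  };

  const previous = ws.onmessage; // keep an existing frame handler working
  ws.onmessage = (event) => {
    const frame = JSON.parse(event.data);
    const isFinal = frame.type === "speech_final" || frame.type === "final";
    // Assumption: the array is exposed as frame.grounding.
    if (isFinal && Array.isArray(frame.grounding)) {
      onGrounding(frame.utterance_id, frame.grounding);
    }
    if (previous) previous.call(ws, event);
  };
}
```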

Built on

Hear

Speech-to-text, telephony-native.

Cue

Turn detection + grounded context for BYO pipelines.

Stream partials from one socket.

Start with $50 in free credits - telephony-tuned STT with grounding when you want it.


No credit card - OpenAI-compatible - cancel anytime