# PyAI - full agent index (llms-full.txt)

> PyAI is a developer platform for telephony-grade voice AI behind one
> OpenAI-compatible API. This file is the long-form, machine-readable index for
> coding agents (Cursor, Claude, Copilot, etc.): products, canonical endpoints,
> auth, audio formats, the streaming wire protocol, and stable error codes - so
> you can generate CORRECT snippets without guessing.
>
> Source of truth is always the live OpenAPI spec. If anything here disagrees
> with it, the spec wins:
> - OpenAPI: https://api.pyai.com/openapi.json
> - Human docs (try-it console): https://api.pyai.com/docs
> - Short index: https://api.pyai.com/llms.txt

## Start here

- Get a free key (no card, $50 in free credits): https://console.pyai.com
- REST base URL: https://api.pyai.com/v1
- SDKs: `npm install @pyai/sdk` (TypeScript) - `pip install pyai-sdk` (Python)
- OpenAI compatibility: point any OpenAI client at `base_url=https://api.pyai.com/v1` for `/v1/audio/transcriptions` and `/v1/audio/speech`.

## Authentication

- REST: send the key as a bearer token - `Authorization: Bearer pyai_live_...` (`x-api-key: pyai_live_...` is an accepted alias).
- Environments: `pyai_live_...` (production) and `pyai_test_...` (sandbox).
- WebSocket (browser): you cannot set headers on a WS upgrade, so pass the key as a subprotocol: `Sec-WebSocket-Protocol: pyai-key.pyai_live_...`.
- WebSocket (server-side): you may instead append `?api_key=pyai_live_...` to the URL. Do not put the key in any other query param.
- Keys are opaque strings (<=512 chars). Never parse, split, decode, or log them. They are self-validating and work the instant they are created.

## Products (one line each)

- Hear - speech-to-text, telephony-native (8 kHz), Whisper-compatible, eager streaming partials + half-price async batch. $0.003/min ($0.0015/min batch).
- Speak - text-to-speech, streaming first byte in ~32-98 ms, 36 stock voices, free voice cloning + prompt-to-voice design. $0.06/min.
- Omni - end-to-end realtime voice agents over one WebSocket, ~431 ms median turn-taking, knowledge-base grounding + tools, barge-in. $0.05/min, all-in.
- Cue - turn detection + retrieved knowledge-base context for bring-your-own LLM/voice pipelines (the streaming STT surface with grounding on). $0.015/min.
- Telephony - managed US phone numbers routed to an Omni agent. $0.01/min connected, billed on a 1-minute pulse.
- Trace - automatic compliance & QA scorecards (TCPA/HIPAA/PII/brand-voice) on every call, with citations, redaction, and a tamper-evident audit hash. $0.05/min scanned (add-on).
- Agents - fully managed Omni runtime with evals, monitoring, tools, knowledge bases, and Recap included. $0.08/min.

## Canonical endpoints

| Product | Method + path | Wire | Scope |
|---|---|---|---|
| Hear - transcribe (file) | `POST /v1/audio/transcriptions` | REST | `hear:transcribe` |
| Hear - streaming STT | `GET /v1/audio/transcriptions/stream` | WebSocket | `hear:stream` |
| Hear - async batch | `POST/GET /v1/transcription/jobs` | REST | `hear:transcribe` + `transcribe:jobs` |
| Speak - synthesize | `POST /v1/audio/speech` | REST | `voice:synthesize` |
| Voices catalog | `GET /v1/voices`, `GET /v1/voices/{id}` | REST | any active key |
| Voice cloning | `GET/POST /v1/voice/clones` | REST | `voice:clone` |
| Cue - turn detection + KB | `GET /v1/audio/transcriptions/stream` (grounding on) | WebSocket | `hear:stream` |
| Models catalog | `GET /v1/models` | REST | any active key |
| Omni - voice agent | `wss://api.pyai.com/v1/omni?agent_id=` | WebSocket | `omni:session` |
| Telephony - numbers | `POST /v1/telephony/numbers` | REST | `telephony:manage` |
| Trace - scorecards | `GET /v1/trace/interactions` | REST | `trace:read` |

A scope wildcard (`hear:*`, `voice:*`, `*`) covers the specific scopes above.

## Speak - POST /v1/audio/speech

OpenAI-compatible request body: `{ model, input, voice, response_format, sample_rate, speed }`.

- `model`: `pyai-voice`.
- `voice`: a stock voice id from `GET /v1/voices` (e.g. `stock_ava_en_us`, `stock_emma_en_gb`) or a cloned voice id (e.g. `voice_abc`). Omit for the account default.
- `response_format` (enum, default `mp3`): `wav`, `mp3`, `opus`, `aac`, `flac`, `pcm`, `g711_ulaw`, `g711_alaw`.
- Response `Content-Type` by format: `audio/wav` (wav), `audio/mpeg` (mp3), `audio/ogg` (opus), `audio/aac` (aac), `audio/flac` (flac), `audio/pcm` (pcm), `audio/basic` (g711_ulaw / g711_alaw).
- `pcm` returns raw, headerless 16-bit little-endian mono samples (no container) at `sample_rate` - feed it straight into voice-agent orchestrators.
- `g711_ulaw` / `g711_alaw` return raw, headerless G.711 telephony audio at a fixed 8 kHz mono (for Twilio/Plivo/FreeSWITCH); `sample_rate` does not apply and is rejected unless set to `8000`.
- `sample_rate`: optional 8000-48000 Hz (e.g. `8000`/`16000` telephony, `24000` wideband); omit for native 24 kHz. Most relevant with `response_format: pcm`.
- Audio streams from the first byte; play progressively.

Voice cloning enrollment and prompt-to-voice design are free - you pay only for synthesized audio.

Example:

```
curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_ava_en_us","response_format":"g711_ulaw"}' \
  --output hello.ulaw
```

## Hear - POST /v1/audio/transcriptions (file)

Multipart form: `file` (wav, mp3, m4a, flac, ogg), `model` (`pyai-hear`), `response_format` (enum: `json`, `text`, `verbose_json`; default `json`), `language` (ISO-639-1 hint, e.g. `en`). Request/response shapes match OpenAI.

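
The multipart form above can also be assembled without an SDK. The following is a minimal sketch using only the Python standard library; the helper name `build_transcription_request` is hypothetical, and the generic `application/octet-stream` part type is an assumption rather than something the spec is known to require. It builds the request but does not send it.

```python
import uuid
import urllib.request
from typing import Optional

API_BASE = "https://api.pyai.com/v1"  # REST base URL from "Start here"


def build_transcription_request(
    audio: bytes,
    filename: str,  # assumed to contain no double quotes
    api_key: str,
    response_format: str = "json",
    language: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the multipart POST for /audio/transcriptions."""
    if response_format not in ("json", "text", "verbose_json"):
        raise ValueError(f"unsupported response_format: {response_format}")

    fields = {"model": "pyai-hear", "response_format": response_format}
    if language:
        fields["language"] = language

    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file part (part Content-Type is an assumption).
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body. Building the request separately from sending it keeps the key out of log statements and makes the form easy to inspect in tests.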
```
curl https://api.pyai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $PYAI_KEY" \
  -F model=pyai-hear -F file=@call.wav
```

## Hear - streaming STT (WebSocket)

Connect:

```
wss://api.pyai.com/v1/audio/transcriptions/stream?model=pyai-hear&language=en&sample_rate=16000&encoding=pcm16&interim_results=true
```

Query params: `model` (default `pyai-hear`), `language` (ISO-639-1 hint), `sample_rate` (Hz, default 16000), `encoding` (`pcm16` or `opus`, default `pcm16`), `interim_results` (default `true`). Auth via the `pyai-key.` subprotocol (browser) or `?api_key=` (server). Requires `hear:stream`.

Client -> server:

- Stream binary audio frames continuously (PCM16 little-endian mono at the negotiated `sample_rate`, or opus); ~20 ms per frame keeps latency low.
- Send the JSON text frame `{"type":"commit"}` to force-finalize the current utterance (e.g. when your VAD detects end-of-turn). Closing the socket also flushes a final for any buffered audio.
- `{"type":"end"}` is ignored (it is NOT a flush) - use `commit` or close.
- Cue (grounding): send `{"type":"config","grounding":true}` as the first JSON text frame to ground transcripts against the org's knowledge base; the session then meters as Cue.

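
The client-side rules above (connect URL, ~20 ms PCM16 frames, `commit`, and the optional Cue config frame) can be sketched as pure helpers that produce the frames in order. This is an illustrative sketch, not SDK code: the function names are hypothetical, and actually sending the frames is left to whichever WebSocket client you use.

```python
import json
from typing import Iterator, Optional, Union
from urllib.parse import urlencode

STREAM_URL = "wss://api.pyai.com/v1/audio/transcriptions/stream"


def stream_url(
    sample_rate: int = 16000,
    language: Optional[str] = None,
    api_key: Optional[str] = None,
) -> str:
    """Build the connect URL. Pass api_key only from server-side code."""
    params = {
        "model": "pyai-hear",
        "sample_rate": sample_rate,
        "encoding": "pcm16",
        "interim_results": "true",
    }
    if language:
        params["language"] = language
    if api_key:
        params["api_key"] = api_key  # never log the resulting URL
    return f"{STREAM_URL}?{urlencode(params)}"


def pcm16_frames(
    pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20
) -> Iterator[bytes]:
    """Split mono PCM16 audio into frames of about frame_ms each."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def outgoing_frames(
    pcm: bytes, sample_rate: int = 16000, grounding: bool = False
) -> Iterator[Union[str, bytes]]:
    """Yield frames in send order: str = JSON text frame, bytes = binary."""
    if grounding:
        # Cue: the config frame goes first, before any audio.
        yield json.dumps({"type": "config", "grounding": True})
    yield from pcm16_frames(pcm, sample_rate)
    # Force-finalize the utterance; {"type": "end"} would be ignored.
    yield json.dumps({"type": "commit"})
```

At 16 kHz a 20 ms PCM16 frame is 640 bytes; at 8 kHz it is 320 bytes. In a live session the audio would arrive incrementally and `commit` would follow your VAD's end-of-turn signal rather than the end of a buffer.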
Server -> client (bare JSON text frames; no `transcript.` prefix, no `session.started`, no `usage` frame):

| `type` | When | Payload |
|---|---|---|
| `partial` | every eager tick | `{text, utterance_id, t_ms}` (may carry `stable_text`/`active_text`); overwrite per `utterance_id` |
| `partial_stable` | when a prefix locks in | `{text, utterance_id, t_ms}` - the portion the recognizer won't revise |
| `speech_final` | on endpoint / `commit` | `{text, utterance_id, t_ms, audio_ms}` - end of an utterance, stable |
| `final` | follows `speech_final` | `{text, utterance_id, t_ms, audio_ms}` - corrected full-context transcript |
| `error` | on fault | `{code, message}` |

With grounding (Cue), `speech_final` and `final` also carry a `grounding` array: `[{ "content": "...", "score": 0.87 }, ...]` (top-3 KB passages; `[]` when no KB is bound or retrieval times out - fail-open).

`t_ms` is the absolute audio-timeline position; `audio_ms` is the per-utterance speech length; `utterance_id` groups one utterance's partials/finals.

Close codes: `1000` normal - `1008` auth/policy (bad key, missing `hear:stream`) - `1011` engine error (a `1011` arriving right after a `final` is a benign close) - `4429` over the concurrency cap.

## Omni - realtime voice agent (WebSocket)

- Native (preferred): `wss://api.pyai.com/v1/omni?agent_id=&format=pcm16&rate=24000` (use `rate=16000` for telephony). Send PCM16 binary frames up; receive agent audio down. Scope `omni:session`.
- OpenAI-realtime-compatible alias: `wss://api.pyai.com/v1/realtime?model=pyai-omni-realtime&agent_id=`.
- `agent_id` is an opaque label authorized by your key's org (PyAI stores no per-agent state); any id in your namespace is accepted and echoed to your own knowledge endpoint. `format` and `rate` are load-bearing on the connect URL.

```
const ws = new WebSocket(
  "wss://api.pyai.com/v1/omni?agent_id=front_desk&format=pcm16&rate=24000",
  ["pyai-key." + apiKey],
);
```

## Error format

Inference/data-plane errors use the OpenAI shape:

```
{
  "error": {
    "message": "...",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "param": null
  }
}
```

Branch on `error.code`. Stable codes:

| HTTP | code | Meaning | Action |
|---|---|---|---|
| 401 | `unauthorized` | Missing/invalid key | Stop; surface to user |
| 403 | `forbidden` | Key lacks the scope | Add scope in console; don't retry |
| 403 | `origin_not_allowed` | Publishable token origin not allow-listed | Fix origin; don't retry |
| 400 | `invalid_agent_id` | `agent_id` malformed (control chars / too long) | Fix the id; don't retry |
| 402 | `credit_exhausted` | Org out of prepaid credit | Add credit; don't retry |
| 402 | `key_budget_exceeded` | Per-key monthly budget hit | Raise budget; don't retry |
| 402 | `insufficient_quota` | Plan quota exhausted | Upgrade plan |
| 429 | `rate_limit_exceeded` | Per-key/IP rate limit | Back off; honor `Retry-After` |
| 429 | `concurrency_limit_exceeded` | Too many concurrent realtime sessions | Wait + retry |
| 429 | `daily_cap_exceeded` | Sandbox/publishable daily cap | Wait until reset |

Console/management routes (dashboard origin) instead use RFC 7807 `application/problem+json` with a `request_id`.

## Notes for agents

- Never invent endpoints, params, voices, or model ids - fetch the OpenAPI spec.
- A brand-new key on a fresh org can return `402 credit_exhausted` before phone verification; that's the billing gate, not a broken key.
- Per-minute prices above are list prices; the live rate card and quotas live in the console and `GET /v1/models`.
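
The "Branch on `error.code`" guidance from the Error format section can be sketched as a small retry policy. This is a hedged illustration: the helper names are hypothetical, the 0.5 s base and 30 s cap are assumed values rather than documented ones, and treating `daily_cap_exceeded` as non-retryable inside a short loop is a design choice (the table says to wait until reset).

```python
import json
from typing import Optional

# Codes whose documented action is "back off" or "wait + retry".
RETRYABLE = {"rate_limit_exceeded", "concurrency_limit_exceeded"}


def error_code(body: str) -> Optional[str]:
    """Extract error.code from an OpenAI-shaped error body, if present."""
    try:
        return json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return None  # not the documented shape; do not guess


def retry_delay(
    code: Optional[str],
    retry_after: Optional[str] = None,
    attempt: int = 0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None to stop."""
    if code not in RETRYABLE:
        return None
    if retry_after is not None:
        try:
            # Honor Retry-After when it is given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to backoff
    # Exponential backoff; base and cap are assumed values.
    return min(30.0, 0.5 * 2 ** attempt)
```

A caller would read the response body and the `Retry-After` header, call `error_code`, then sleep for `retry_delay(...)` seconds or surface the error when it returns `None`.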