Switch from ElevenLabs to PyAI Speak
Same text-to-speech, telephony-native and with more predictable pricing.
PyAI Speak is low-latency streaming text-to-speech with free voice cloning and prompt-to-voice design, billed per minute instead of per character. It follows the OpenAI /v1/audio/speech contract and can return raw 8 kHz g711_ulaw or g711_alaw audio, ready to hand to Twilio, Plivo, or FreeSWITCH.
Why teams move from ElevenLabs
Free voice cloning + design
Enrollment and prompt-to-voice design are free - you pay only for the audio you synthesize, per minute.
Telephony-native formats
Ask for response_format g711_ulaw or g711_alaw and get raw, headerless 8 kHz G.711 to feed your carrier directly - no transcoding step.
One OpenAI-compatible stack
Speak shares the /v1/audio/speech shape with Hear (STT) and Omni (agents), so one key and one base URL cover your whole voice pipeline.
Before and after
curl https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM \
-H "xi-api-key: $ELEVEN_KEY" \
-H "Content-Type: application/json" \
-d '{"text":"Hello there.","model_id":"eleven_flash_v2_5","output_format":"ulaw_8000"}' \
--output hello.ulaw

curl https://api.pyai.com/v1/audio/speech \
-H "Authorization: Bearer $PYAI_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"pyai-voice","voice":"stock_ava_en_us","input":"Hello there.","response_format":"g711_ulaw"}' \
--output hello.ulaw

Migration checklist
Swap the connection
Change the base URL or WebSocket URL, pass a PyAI key, and keep the old client where the API shape is compatible.
Map models, voices, and formats
Use the table below to replace model ids, voice ids, response formats, sample rates, and auth headers without rewriting the product flow.
Replay customer traffic
Run real prompts, recordings, and phone-call samples through both systems. Compare latency, quality, completion rate, and all-in cost.
Launch with guardrails
Start on free credits or test keys, add usage alerts, then enable Trace, Recap, or managed Agents when calls need production review.
If the system you are leaving already uses OpenAI-compatible transcription, speech, or realtime APIs, start with the smallest compatible swap: base URL, key, model, and voice. The real test is whether PyAI improves cost, latency, and caller outcomes on your own traffic.
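The "Replay customer traffic" step can be sketched as a small harness that runs the same prompts through both providers and reports time to first audio chunk and completion rate. This is a minimal sketch, not an official tool: `time_to_first_chunk` and `compare_providers` are hypothetical names, and each provider is assumed to be wrapped in a callable that takes text and yields audio chunks.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List

# Assumption: each provider is wrapped in a callable: text -> iterable of audio chunks.
Synthesizer = Callable[[str], Iterable[bytes]]


def time_to_first_chunk(synthesize: Synthesizer, text: str) -> float:
    """Return seconds until the provider yields its first non-empty chunk."""
    start = time.perf_counter()
    for chunk in synthesize(text):
        if chunk:
            return time.perf_counter() - start
    raise RuntimeError("provider returned no audio")


def compare_providers(
    providers: Dict[str, Synthesizer], prompts: List[str]
) -> Dict[str, Dict[str, float]]:
    """Replay every prompt through every provider and summarize the results."""
    if not prompts:
        raise ValueError("need at least one prompt")
    report: Dict[str, Dict[str, float]] = {}
    for name, synthesize in providers.items():
        samples: List[float] = []
        for prompt in prompts:
            try:
                samples.append(time_to_first_chunk(synthesize, prompt))
            except Exception:
                # A failed request counts against the completion rate.
                continue
        report[name] = {
            "median_ttfb_s": statistics.median(samples) if samples else float("nan"),
            "completion_rate": len(samples) / len(prompts),
        }
    return report
```

Feed it prompts taken from real traffic rather than synthetic strings, and compare quality by listening to the output, since this harness only measures latency and completion.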
What maps to what
| ElevenLabs | PyAI |
|---|---|
| Header xi-api-key: <key> | Authorization: Bearer pyai_live_... |
| POST /v1/text-to-speech/{voice_id} | POST /v1/audio/speech (voice in the body) |
| model_id (e.g. eleven_flash_v2_5) | model: pyai-voice |
| output_format: ulaw_8000 | response_format: g711_ulaw (8 kHz, raw) |
| Per-character billing | Per-minute billing ($0.06/min) |
| Paid voice cloning | Free cloning + prompt-to-voice design |
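As a rough illustration, the mappings in the table can be applied in one translation function. This is a hedged sketch: `to_pyai_request` and `voice_map` are hypothetical, and the voice mapping must come from you, since ElevenLabs voice ids have no automatic PyAI equivalent. Only the `ulaw_8000` to `g711_ulaw` format mapping is stated on this page, so other formats raise an error here rather than being guessed.

```python
from typing import Dict, Tuple

PYAI_SPEECH_URL = "https://api.pyai.com/v1/audio/speech"  # from the curl example

# Only the mapping documented on this page; extend it once you have verified others.
FORMAT_MAP = {"ulaw_8000": "g711_ulaw"}


def to_pyai_request(
    eleven: dict, voice_map: Dict[str, str], api_key: str
) -> Tuple[str, dict, dict]:
    """Translate an ElevenLabs-style request into (url, headers, json_body).

    `eleven` is assumed to hold the path voice id plus the JSON body fields:
    {"voice_id": ..., "text": ..., "model_id": ..., "output_format": ...}.
    """
    voice_id = eleven["voice_id"]
    if voice_id not in voice_map:
        raise KeyError(f"no PyAI voice chosen for ElevenLabs voice {voice_id!r}")

    body = {
        "model": "pyai-voice",          # replaces model_id
        "voice": voice_map[voice_id],   # voice moves from the URL into the body
        "input": eleven["text"],        # "text" becomes "input"
    }

    output_format = eleven.get("output_format")
    if output_format is not None:
        if output_format not in FORMAT_MAP:
            raise ValueError(f"no known mapping for output_format {output_format!r}")
        body["response_format"] = FORMAT_MAP[output_format]

    headers = {
        "Authorization": f"Bearer {api_key}",  # replaces xi-api-key
        "Content-Type": "application/json",
    }
    return PYAI_SPEECH_URL, headers, body
```

Raising on unmapped voices and formats keeps a migration from silently falling back to a default voice or codec.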
Good to know
- g711_ulaw / g711_alaw are always 8 kHz mono and raw (headerless); sample_rate is unnecessary, and any value other than 8000 is rejected.
- For wideband, use response_format wav/mp3/opus/aac/flac, or pcm for raw 16-bit little-endian mono at sample_rate.
- Browse voice ids with GET /v1/voices; pass a cloned voice id (voice_abc) once you enroll one.
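Because G.711 output is raw 8 kHz mono at one byte per sample, duration and framing follow directly from the byte count. The sketch below works under stated assumptions: the 20 ms frame size is a common telephony convention rather than a PyAI requirement, the message shape follows Twilio Media Streams' `media` event, and the cost estimate assumes billing is exactly proportional to audio duration at the $0.06/min rate quoted above (actual rounding rules are not stated here).

```python
import base64
import json
from typing import Iterator

G711_BYTES_PER_SECOND = 8000   # 8 kHz x 1 byte per sample, mono
FRAME_MS = 20                  # assumption: typical telephony packet size
FRAME_BYTES = G711_BYTES_PER_SECOND * FRAME_MS // 1000  # 160 bytes
PRICE_PER_MINUTE_USD = 0.06    # rate quoted in the mapping table


def g711_duration_seconds(audio: bytes) -> float:
    """Duration of raw, headerless G.711 audio."""
    return len(audio) / G711_BYTES_PER_SECOND


def estimated_cost_usd(audio: bytes) -> float:
    """Rough cost estimate, assuming strictly proportional per-minute billing."""
    return g711_duration_seconds(audio) / 60 * PRICE_PER_MINUTE_USD


def frames(audio: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Split raw audio into fixed-size frames (the last one may be shorter)."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


def twilio_media_message(stream_sid: str, frame: bytes) -> str:
    """Wrap one mu-law frame as a Twilio Media Streams "media" message."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(frame).decode("ascii")},
    })
```

No header parsing or resampling is needed before framing, which is the practical benefit of requesting g711_ulaw directly.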
FAQ
Is the response format compatible?
Speak supports wav, mp3, opus, aac, flac, pcm, g711_ulaw, and g711_alaw via response_format (default mp3). The ulaw_8000 you used on ElevenLabs maps to g711_ulaw.
What does cloning cost?
Voice cloning enrollment and prompt-to-voice design are free; you pay only for synthesized audio, billed per minute.
Can I keep streaming playback?
Yes - audio streams from the first byte (~32-98 ms TTFB) so playback starts almost immediately.
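A minimal sketch of streamed playback using only the Python standard library follows. The URL and body fields come from the curl example above; `build_speech_request` and `stream_chunks` are hypothetical helpers, and whether the endpoint needs an extra flag to enable streaming is not stated on this page, so none is sent.

```python
import io
import json
import urllib.request
from typing import Iterator


def build_speech_request(
    api_key: str,
    text: str,
    voice: str = "stock_ava_en_us",
    response_format: str = "mp3",
) -> urllib.request.Request:
    """Build the POST request shown in the curl example."""
    body = json.dumps({
        "model": "pyai-voice",
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.pyai.com/v1/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_chunks(response: io.BufferedIOBase, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio as it arrives instead of waiting for the full body.

    read1() returns whatever is available (up to chunk_size) from a single
    underlying read, so the first chunk can be played before synthesis ends.
    """
    while True:
        chunk = response.read1(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage (performs a network call; `player` is a hypothetical audio sink):
# with urllib.request.urlopen(build_speech_request(key, "Hello there.")) as resp:
#     for chunk in stream_chunks(resp):
#         player.write(chunk)
```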
Clone a voice for free and stream it.
Start with $50 in free credits, no card required. Swap one base URL and you're synthesizing.
No credit card - OpenAI-compatible - cancel anytime