Text-to-speech

Switch from ElevenLabs to PyAI Speak

Same text-to-speech, telephony-native, with pricing that is easier to predict.

PyAI Speak is low-latency streaming text-to-speech with free voice cloning and prompt-to-voice design, billed per minute instead of per character. It follows the OpenAI /v1/audio/speech contract and can return raw 8 kHz g711_ulaw or g711_alaw audio, ready to hand to Twilio, Plivo, or FreeSWITCH.


Why teams move from ElevenLabs

Free voice cloning + design

Enrollment and prompt-to-voice design are free - you pay only for the audio you synthesize, per minute.

Telephony-native formats

Ask for response_format g711_ulaw or g711_alaw and get raw, headerless 8 kHz G.711 to feed your carrier directly - no transcoding step.
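
As a rough illustration of feeding raw G.711 to a carrier, the sketch below splits a headerless mu-law buffer into 20 ms frames (160 bytes at 8 kHz, one byte per sample) and wraps each frame in a Twilio Media Streams `media` message. The helper names, the 20 ms frame size, and the padding choice are assumptions for illustration, not part of the PyAI API; check your carrier's documentation for the framing it expects.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000   # G.711 is 8 kHz, one byte per sample
FRAME_MS = 20           # assumed frame duration, common in telephony
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 bytes per frame


def ulaw_frames(audio: bytes, frame_bytes: int = FRAME_BYTES):
    """Yield fixed-size frames, padding the last one with mu-law silence (0xFF)."""
    for start in range(0, len(audio), frame_bytes):
        frame = audio[start:start + frame_bytes]
        yield frame.ljust(frame_bytes, b"\xff")


def twilio_media_message(stream_sid: str, frame: bytes) -> str:
    """Build the JSON text of a Twilio Media Streams 'media' message."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(frame).decode("ascii")},
    })
```

For g711_alaw the silence byte differs (0xD5 rather than 0xFF), so the padding value would need to change.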

One OpenAI-compatible stack

Speak exposes the OpenAI-style /v1/audio/speech endpoint and sits alongside Hear (STT) and Omni (agents), so one key and one base URL cover your whole voice pipeline.

Before and after

Before - ElevenLabs

curl https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM \
  -H "xi-api-key: $ELEVEN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello there.","model_id":"eleven_flash_v2_5","output_format":"ulaw_8000"}' \
  --output hello.ulaw

After - PyAI Speak

curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","voice":"stock_ava_en_us","input":"Hello there.","response_format":"g711_ulaw"}' \
  --output hello.ulaw
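
For teams calling the API from application code rather than the shell, here is a minimal Python sketch of the same request using only the standard library. The URL, header, and body fields are copied from the curl example above; the function name `build_speech_request` is hypothetical, and the network call only runs when a `PYAI_KEY` environment variable is set.

```python
import json
import os
import urllib.request

PYAI_SPEECH_URL = "https://api.pyai.com/v1/audio/speech"  # from the curl example


def build_speech_request(api_key, text, voice="stock_ava_en_us",
                         response_format="g711_ulaw"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "model": "pyai-voice",
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    return urllib.request.Request(
        PYAI_SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__" and os.environ.get("PYAI_KEY"):
    req = build_speech_request(os.environ["PYAI_KEY"], "Hello there.")
    # Write the raw, headerless audio to disk, like `--output hello.ulaw`.
    with urllib.request.urlopen(req, timeout=30) as resp, \
            open("hello.ulaw", "wb") as out:
        out.write(resp.read())
```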

Migration checklist

Swap the connection

Change the base URL or WebSocket URL, pass a PyAI key, and keep the old client where the API shape is compatible.

Map models, voices, and formats

Use the table below to replace model ids, voice ids, response formats, sample rates, and auth headers without rewriting the product flow.

Replay customer traffic

Run real prompts, recordings, and phone-call samples through both systems. Compare latency, quality, completion rate, and all-in cost.
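
One way to keep that comparison honest is to record the same fields for every replayed request and summarize both providers with the same function. The sketch below is an assumption about how you might structure it: the `Sample` fields and the `summarize` helper are hypothetical, and audio quality still needs a separate listening test.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    latency_ms: float  # time to first audio byte for this request
    completed: bool    # did the request return usable audio?
    cost_usd: float    # what this request cost, by your own accounting


def summarize(samples):
    """Summarize one provider's replay run."""
    done = [s for s in samples if s.completed]
    return {
        "requests": len(samples),
        "completion_rate": len(done) / len(samples) if samples else 0.0,
        # Latency is only meaningful for requests that actually completed.
        "median_latency_ms": median(s.latency_ms for s in done) if done else None,
        "total_cost_usd": round(sum(s.cost_usd for s in samples), 6),
    }
```

Run `summarize` once per provider over the same replayed inputs and compare the two dictionaries side by side.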

Launch with guardrails

Start on free credits or test keys, add usage alerts, then enable Trace, Recap, or managed Agents when calls need production review.

If the system you are leaving already uses OpenAI-compatible transcription, speech, or realtime APIs, start with the smallest compatible swap: base URL, key, model, and voice. The real test is whether PyAI improves cost, latency, and caller outcomes on your own traffic.

What maps to what

ElevenLabs	PyAI
Header `xi-api-key: <key>`	Header `Authorization: Bearer pyai_live_...`
`POST /v1/text-to-speech/{voice_id}`	`POST /v1/audio/speech` (voice in the body)
`model_id` (e.g. `eleven_flash_v2_5`)	`model: pyai-voice`
`output_format: ulaw_8000`	`response_format: g711_ulaw` (8 kHz, raw)
Per-character billing	Per-minute billing ($0.06/min)
Paid voice cloning	Free cloning + prompt-to-voice design
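
The table can be applied mechanically. The sketch below translates an ElevenLabs-style call (voice id in the path, `text` and `output_format` in the body) into the PyAI path, headers, and body. `translate_request` and `voice_map` are hypothetical: you would build `voice_map` yourself, since this page does not define a correspondence between the two providers' voice ids, and only the one format pair listed in the table is mapped.

```python
FORMAT_MAP = {"ulaw_8000": "g711_ulaw"}  # only the pair listed in the table


def translate_request(voice_id, body, voice_map, api_key):
    """Map an ElevenLabs request onto the PyAI /v1/audio/speech shape."""
    fmt = body.get("output_format")
    if fmt not in FORMAT_MAP:
        raise ValueError(f"no mapping known for output_format={fmt!r}")
    if voice_id not in voice_map:
        raise ValueError(f"no PyAI voice chosen for ElevenLabs voice {voice_id!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",  # replaces xi-api-key
        "Content-Type": "application/json",
    }
    payload = {
        "model": "pyai-voice",           # replaces model_id
        "voice": voice_map[voice_id],    # voice moves from the path to the body
        "input": body["text"],
        "response_format": FORMAT_MAP[fmt],
    }
    return "/v1/audio/speech", headers, payload
```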

Good to know

g711_ulaw and g711_alaw are always 8 kHz mono and raw (headerless); sample_rate is not configurable for them, and any value other than 8000 is rejected.
For wideband audio, use response_format wav, mp3, opus, aac, or flac, or use pcm for raw 16-bit little-endian mono at the sample_rate you request.
Browse voice ids with GET /v1/voices; once you enroll a cloned voice, pass its id (for example, voice_abc).
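
A client-side pre-check can catch the sample_rate rule before a request leaves your service. This is a minimal sketch, assuming the format list given in the FAQ on this page; `check_format` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
RAW_TELEPHONY = {"g711_ulaw", "g711_alaw"}
SUPPORTED = {"wav", "mp3", "opus", "aac", "flac", "pcm"} | RAW_TELEPHONY


def check_format(response_format, sample_rate=None):
    """Return the sample rate to send, or raise ValueError for a bad combination."""
    if response_format not in SUPPORTED:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if response_format in RAW_TELEPHONY:
        # G.711 output is fixed at 8 kHz; only an omitted or 8000 value is valid.
        if sample_rate not in (None, 8000):
            raise ValueError("g711 formats are fixed at 8000 Hz")
        return 8000
    return sample_rate
```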

FAQ

Is the response format compatible?

Speak supports wav, mp3, opus, aac, flac, pcm, g711_ulaw, and g711_alaw via response_format (default mp3). The ulaw_8000 output_format you used on ElevenLabs maps to g711_ulaw.

What does cloning cost?

Voice cloning enrollment and prompt-to-voice design are free; you pay only for synthesized audio, billed per minute.
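
To get a feel for per-minute billing, you can estimate cost from the audio itself. Raw G.711 is one byte per sample at 8 kHz, so duration is simply the byte count divided by 8000. The sketch below uses the $0.06/min rate from the mapping table and assumes cost scales linearly with duration; whether billing is rounded or has a minimum charge is not stated on this page, so treat the result as an estimate only.

```python
RATE_USD_PER_MIN = 0.06      # per-minute rate quoted in the mapping table
G711_BYTES_PER_SEC = 8000    # 8 kHz, 8 bits per sample, mono


def g711_duration_seconds(num_bytes):
    """Duration of a raw G.711 buffer in seconds."""
    return num_bytes / G711_BYTES_PER_SEC


def estimated_cost_usd(seconds, rate_per_min=RATE_USD_PER_MIN):
    """Linear estimate (assumption: no rounding and no minimum charge)."""
    return seconds / 60 * rate_per_min
```

For example, a 480,000-byte g711_ulaw file is 60 seconds of audio, which this estimate prices at $0.06.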

Can I keep streaming playback?

Yes - audio streams from the first byte (~32-98 ms TTFB) so playback starts almost immediately.
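
To check time-to-first-byte on your own traffic, you can time the first read from the response stream and then keep consuming chunks for playback. The sketch below is an assumption about how to measure it: `stream_with_ttfb` is a hypothetical helper that takes any callable returning a file-like object, and the figure it reports includes connection setup as seen from your client.

```python
import time


def stream_with_ttfb(open_stream, chunk_size=4096, clock=time.perf_counter):
    """Return (ttfb_ms, chunk_iterator) for a file-like response stream."""
    start = clock()
    stream = open_stream()
    first = stream.read(1)  # blocks until the first byte arrives (or EOF)
    ttfb_ms = (clock() - start) * 1000.0

    def chunks():
        if first:
            yield first
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk

    return ttfb_ms, chunks()
```

With a real request, `open_stream` could be something like `lambda: urllib.request.urlopen(req)`, and each chunk can be handed to your player or carrier as it arrives.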

Built on

Speak: Text-to-speech that starts in milliseconds.

Clone a voice for free and stream it.

Start with $50 in free credits, no card required. Swap one base URL and you're synthesizing.


No credit card - OpenAI-compatible - cancel anytime