Live - REST

Speak

Text-to-speech that starts in milliseconds.

Low-latency streaming TTS with 36 stock voices, free instant voice cloning, and prompt-to-voice design. On the streaming path, the first audio bytes arrive in roughly 32-98 ms and play progressively.

Price: $0.06/min
Endpoint: POST /v1/audio/speech
Scope: voice:synthesize
Model id: pyai-voice
Async synthesis: $0.04/min
Voice cloning (enroll): Free
Voice design: Free
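
To make the per-minute rates concrete, here is a minimal cost sketch. The two rates come from the table above. The function name `estimate_cost` is hypothetical, and the sketch assumes billing is pro-rated by the second, which this page does not state.

```python
from decimal import Decimal

# Per-minute rates taken from the pricing table above.
STREAMING_RATE = Decimal("0.06")  # POST /v1/audio/speech
ASYNC_RATE = Decimal("0.04")      # async synthesis


def estimate_cost(audio_seconds: float, async_mode: bool = False) -> Decimal:
    """Estimate synthesis cost in USD.

    Assumption: usage is pro-rated per second; actual rounding rules
    are not documented on this page.
    """
    rate = ASYNC_RATE if async_mode else STREAMING_RATE
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    return (rate * minutes).quantize(Decimal("0.0001"))
```

Under that assumption, ten minutes of streamed audio comes to about $0.60 and the same audio on the async path to about $0.40. Enrollment and voice design add nothing.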


What you get

Snappy by default

Audio streams from the first byte, so playback starts almost instantly instead of after the whole clip renders.
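
On the client side, progressive playback means handing each chunk onward as soon as it arrives rather than buffering the whole response. The helper below is a hedged sketch of that idea. `write_progressively` is a hypothetical name, and the sink can be a file, a pipe to an audio player, or any writable binary stream.

```python
from typing import BinaryIO, Iterable


def write_progressively(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Forward each audio chunk to the sink as soon as it arrives.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty reads
            continue
        sink.write(chunk)
        sink.flush()  # push bytes to the player or file immediately
        total += len(chunk)
    return total
```

Any iterable of bytes works as the source. With the standard library, for example, `iter(lambda: resp.read(4096), b"")` over an open HTTP response yields chunks in this shape.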

Your voice, free to clone

Enroll a voice once and synthesize with it - cloning enrollment and prompt-to-voice design are both free.

Drop-in OpenAI shape

Speak uses the same /v1/audio/speech contract your OpenAI client already speaks.

Streaming TTFB ~32-98 ms - 36 stock voices - Voice cloning (free) - Designed voices (free) - mp3 + wav

Time-to-first-byte is ~32-98 ms on the streaming path, with progressive playback the whole way through.

Start in minutes

cURL
# Synthesize one sentence and save the audio to hello.mp3.
curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_ava_en_us"}' \
  --output hello.mp3
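
For readers who prefer Python, the cURL request above might be sketched with the standard library as follows. The URL, model id, and voice id are copied from the cURL example. The function names are hypothetical, and the `Content-Type: application/json` header is an assumption about what the endpoint expects.

```python
import json
import urllib.request

API_URL = "https://api.pyai.com/v1/audio/speech"  # endpoint from the cURL example


def build_speech_request(api_key: str, text: str,
                         voice: str = "stock_ava_en_us") -> urllib.request.Request:
    """Build (but do not send) the POST request for one synthesis call."""
    body = json.dumps(
        {"model": "pyai-voice", "input": text, "voice": voice}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON body expected
        },
    )


def synthesize_to_file(api_key: str, text: str, path: str = "hello.mp3") -> None:
    """Send the request and write audio to disk chunk by chunk."""
    req = build_speech_request(api_key, text)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        while chunk := resp.read(4096):
            out.write(chunk)
```

Calling `synthesize_to_file(os.environ["PYAI_KEY"], "Hello from PyAI.")` would mirror the cURL command. Keeping request construction separate from sending makes the payload easy to inspect without network access.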

FAQ

How many voices are there?

36 stock voices today, plus your own cloned voices and prompt-designed voices - browse the live catalog at GET /v1/voices.
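
A hedged sketch of querying that catalog from Python follows. It assumes GET /v1/voices lives on the same host and takes the same bearer token as the speech endpoint. The response schema is not described on this page, so the sketch only returns the decoded JSON as-is. Both function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the catalog is served from the same host as the speech endpoint.
VOICES_URL = "https://api.pyai.com/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the voice catalog."""
    return urllib.request.Request(
        VOICES_URL,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_voices(api_key: str):
    """Return the decoded JSON catalog; its structure is not documented here."""
    with urllib.request.urlopen(build_voices_request(api_key)) as resp:
        return json.load(resp)
```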

What does cloning cost?

Enrollment is free; you pay only for the audio you synthesize, billed per minute.

Build with Speak today.

Start free with $50.00 in credit - no card. Your test key works instantly.

No credit card - OpenAI-compatible - cancel anytime