Speak
Text-to-speech that starts in milliseconds.
Low-latency streaming TTS with 36 stock voices, free instant voice cloning, and prompt-to-voice design. First audio bytes arrive in tens of milliseconds and play progressively.
- Price: $0.06/min
- Endpoint: POST /v1/audio/speech
- Scope: voice:synthesize
- Model id: pyai-voice
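At the listed $0.06/min, cost scales with the length of audio produced. The sketch below estimates the charge for a clip from its duration in seconds. It assumes the per-minute rate is prorated per second, which this page does not confirm, and `speak_cost_usd` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: estimate the charge for N seconds of synthesized audio.
# Assumption: the $0.06/min list rate is prorated per second (not rounded up).
RATE_PER_MIN=0.06

speak_cost_usd() {
  # $1 = audio duration in seconds; prints USD with four decimals.
  awk -v secs="$1" -v rate="$RATE_PER_MIN" 'BEGIN { printf "%.4f\n", secs / 60 * rate }'
}

speak_cost_usd 90    # 1.5 min of audio; prints 0.0900
```

If billing actually rounds each request up to a whole minute, short clips will cost more than this estimate.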
What you get
Snappy by default
Audio streams from the first byte, so playback starts almost instantly instead of after the whole clip renders.
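Because audio streams from the first byte, you can pipe the response straight into a player instead of waiting for a file. A minimal sketch, assuming `ffplay` (from FFmpeg) is installed and that the default response is a playable MP3 stream, as the quick start's `hello.mp3` suggests; `json_escape`, `speak_body` and `speak_stream` are hypothetical helper names.

```shell
#!/bin/sh
# Escape backslashes and double quotes so the text is safe inside a JSON string.
# Assumption: single-line input; newlines and control characters are not escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the same request body the quick-start curl sends.
# $1 = text to speak, $2 = voice id (defaults to stock_ava_en_us).
speak_body() {
  printf '{"model":"pyai-voice","input":"%s","voice":"%s"}' \
    "$(json_escape "$1")" "${2:-stock_ava_en_us}"
}

# Stream synthesized audio to stdout; -N disables curl's output buffering
# so bytes reach the player as soon as they arrive.
speak_stream() {
  curl -sS -N https://api.pyai.com/v1/audio/speech \
    -H "Authorization: Bearer $PYAI_KEY" \
    -H "Content-Type: application/json" \
    -d "$(speak_body "$1" "$2")"
}
```

Usage: `speak_stream "Hello from PyAI." | ffplay -nodisp -autoexit -loglevel quiet -` starts playback while the rest of the clip is still arriving.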
Your voice, free to clone
Enroll a voice once and synthesize with it - cloning enrollment and prompt-to-voice design are both free.
Drop-in OpenAI shape
Same /v1/audio/speech contract your OpenAI client already speaks.
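If your code already uses an official OpenAI SDK, switching may only need configuration. The official Python and Node SDKs read `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment. This sketch assumes PyAI accepts your PyAI key as the bearer token the SDK sends, and that your existing speech call passes `model="pyai-voice"` and a PyAI voice id such as `stock_ava_en_us`.

```shell
#!/bin/sh
# Point an OpenAI SDK client at PyAI instead of api.openai.com.
# Assumption: the SDK honors these variables (the official Python and Node SDKs do).
export OPENAI_BASE_URL="https://api.pyai.com/v1"
export OPENAI_API_KEY="$PYAI_KEY"
```

The SDK appends `/audio/speech` to the base URL, which yields the same endpoint as the quick-start curl.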
Time-to-first-byte ~32-98 ms on the streaming path; progressive playback the whole way through.
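To check time-to-first-byte from your own network, curl's `%{time_starttransfer}` write-out variable reports the seconds until the first response byte. That figure includes DNS, TCP and TLS setup, so a cold request may read higher than the quoted range. `measure_ttfb` and `to_ms` are hypothetical helper names.

```shell
#!/bin/sh
# Print seconds until the first response byte; the audio itself is discarded.
measure_ttfb() {
  curl -sS -o /dev/null -w '%{time_starttransfer}\n' \
    https://api.pyai.com/v1/audio/speech \
    -H "Authorization: Bearer $PYAI_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_ava_en_us"}'
}

# Convert curl's seconds value to whole milliseconds (rounded to nearest).
to_ms() {
  awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 + 0.5 }'
}
```

Usage: `for i in 1 2 3 4 5; do to_ms "$(measure_ttfb)"; done`. Each curl invocation opens a new connection, so every sample includes handshake time.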
Start in minutes
curl https://api.pyai.com/v1/audio/speech \
-H "Authorization: Bearer $PYAI_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_ava_en_us"}' \
--output hello.mp3
FAQ
How many voices are there?
36 stock voices today, plus your own cloned voices and prompt-designed voices - browse the live catalog at GET /v1/voices.
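The catalog endpoint takes the same bearer key. The response schema is not documented on this page, so this sketch avoids assuming field names: the hypothetical `stock_voice_ids` helper greps for quoted ids shaped like the quick start's `stock_ava_en_us`, on the assumption that stock voices share that prefix.

```shell
#!/bin/sh
# Fetch the live voice catalog (raw JSON).
list_voices() {
  curl -sS https://api.pyai.com/v1/voices \
    -H "Authorization: Bearer $PYAI_KEY"
}

# Hypothetical helper: extract unique "stock_..." ids from JSON on stdin.
# Assumption: stock voice ids use the stock_ prefix seen in the quick start.
stock_voice_ids() {
  grep -o '"stock_[A-Za-z0-9_]*"' | tr -d '"' | sort -u
}
```

Usage: `list_voices | stock_voice_ids | wc -l` should report 36 if the catalog matches this FAQ and the prefix assumption holds.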
What does cloning cost?
Enrollment is free; you pay only for the audio you synthesize, billed per minute.
Build with Speak today.
Start free with $50.00 in credit - no card. Your test key works instantly.
No credit card - OpenAI-compatible - cancel anytime
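For a sense of scale, assuming the whole $50.00 starter credit is spent on synthesis at the $0.06/min list rate:

```latex
\frac{\$50.00}{\$0.06\,/\,\text{min}} \approx 833.3\ \text{min} \approx 13.9\ \text{h of synthesized audio}
```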