Cast
Emotional long-form TTS for podcasts, narration, and audiobooks.
Async emotional text-to-speech built for long-form production: podcasts, narration, audiobooks, education, and brand audio. Cast pairs expressive voices with included commercial rights and a free Voice Designer, so teams can direct tone, pacing, and character without studio overhead.
- Price: $0.02/min
- Endpoint: POST /v1/audio/speech
- Scope: voice:synthesize
- Model id: pyai-voice
What you get
Direction, not just generation
Guide delivery with emotion, pacing, and performance notes instead of settling for a flat narrator.
Designed voices included
Create brand, character, and narrator voices for free; pay only for the audio you render.
Built for long-form economics
Cast is $0.02/min ($1.20/hr), so a 10-hour audiobook is $12 with no credit math or character counting.
Async synthesis is optimized for quality and long-form cost. Realtime Speak remains available for low-latency playback.
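To budget a project, the per-minute rate can be turned into a small estimator. This is a rough sketch, not an official billing formula: the helper name `cast_cost` is hypothetical, and it assumes charges scale linearly with rendered minutes, with no minimums or rounding to whole minutes.

```python
from decimal import Decimal

# Rate quoted on this page: $0.02 per rendered minute.
RATE_PER_MINUTE = Decimal("0.02")


def cast_cost(minutes) -> Decimal:
    """Estimate the cost in dollars for a given number of audio minutes.

    Assumption: billing is strictly linear in minutes (hypothetical helper).
    """
    total = Decimal(str(minutes)) * RATE_PER_MINUTE
    return total.quantize(Decimal("0.01"))


print(cast_cost(60))   # one hour of audio -> 1.20
print(cast_cost(600))  # a 10-hour audiobook -> 12.00
```

Using `Decimal` rather than floats keeps the cents exact when estimates are summed across many chapters.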
Start in minutes
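The curl quickstart in this section can also be scripted from Python. The sketch below uses only the standard library. The function names are hypothetical, the `Content-Type` header is an assumption (usual for JSON APIs), and it assumes the response body is the audio itself, as the `--output chapter.mp3` flag in the curl command suggests.

```python
import json
import os
import urllib.request

# Endpoint, model id, and stock voice are taken from this page.
API_URL = "https://api.pyai.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str,
                         voice: str = "stock_ava_en_us") -> urllib.request.Request:
    """Build (but do not send) the POST request for one synthesis call."""
    payload = {"model": "pyai-voice", "voice": voice, "input": text}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated on this page
        },
        method="POST",
    )


def synthesize_to_file(text: str, path: str) -> None:
    """Send the request and write the response bytes to `path`.

    Assumption: the response body is raw audio; error handling is omitted.
    """
    request = build_speech_request(os.environ["PYAI_KEY"], text)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With `PYAI_KEY` set in the environment, `synthesize_to_file("Read this like a warm documentary narrator.", "chapter.mp3")` would mirror the curl command.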
curl https://api.pyai.com/v1/audio/speech \
-H "Authorization: Bearer $PYAI_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"pyai-voice","voice":"stock_ava_en_us","input":"Read this like a warm documentary narrator."}' \
--output chapter.mp3
FAQ
How is Cast different from Speak?
Speak is the realtime TTS API. Cast is the customer-facing long-form workflow, combining async synthesis, emotional direction, and Voice Designer for narration use cases.
What does Voice Designer cost?
Voice design and cloning enrollment are free; Cast audio is $0.02/min with commercial rights included.
Build with Cast today.
Start free with $50.00 in credit, no card required. Your test key works instantly.
No credit card - OpenAI-compatible - cancel anytime