
Cast

Emotional long-form TTS for podcasts, narration, and audiobooks.

Async emotional text-to-speech built for long-form production: podcasts, narration, audiobooks, education, and brand audio. Cast pairs expressive voices with included commercial rights and a free Voice Designer, so teams can direct tone, pacing, and character without studio overhead.

Price: $0.02/min
Endpoint: POST /v1/audio/speech
Scope: voice:synthesize
Model id: pyai-voice
Voice cloning (enroll): Free
Voice design: Free

What you get

Direction, not just generation

Guide delivery with emotion, pacing, and performance notes instead of settling for a flat narrator.

Designed voices included

Create brand, character, and narrator voices for free; pay only for the audio you render.

Built for long-form economics

Cast is $0.02/min ($1.20/hr), so a 10-hour audiobook is $12 with no credit math or character counting.
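To make the arithmetic concrete, here is a minimal sketch of a cost estimator. It assumes billing is strictly linear at the quoted $0.02/min with no minimums or per-minute rounding, which this page does not confirm; the function name `cast_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Quoted price per rendered minute, in USD.
PRICE_PER_MINUTE = Decimal("0.02")


def cast_cost_usd(minutes):
    """Estimate the cost in USD for `minutes` of rendered audio.

    Assumption: cost scales linearly with duration, with no minimum
    charge and no rounding up to whole minutes.
    """
    cost = Decimal(str(minutes)) * PRICE_PER_MINUTE
    return cost.quantize(Decimal("0.01"))


print(cast_cost_usd(60))       # one hour of audio -> 1.20
print(cast_cost_usd(10 * 60))  # a 10-hour audiobook -> 12.00
```

Using `Decimal` rather than floats keeps the result exact to the cent, which matters when summing many chapters.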

$1.20/hr finished audio - Commercial rights included - Voice Designer (free) - Voice cloning (free) - Podcast + audiobook workflows

Async synthesis is optimized for quality and long-form cost at $0.02/min ($1.20/hr). Realtime Speak remains available for low-latency playback.

Start in minutes

cURL
curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","voice":"stock_ava_en_us","input":"Read this like a warm documentary narrator."}' \
  --output chapter.mp3
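The same call can be sketched in Python with only the standard library. This is an illustration, not an official client: the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, the JSON `Content-Type` header is an assumption typical of JSON APIs, and the sketch assumes the response body is the audio file itself, as the `--output chapter.mp3` flag suggests.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
API_URL = "https://api.pyai.com/v1/audio/speech"


def build_speech_request(api_key, text, voice="stock_ava_en_us", model="pyai-voice"):
    """Build (but do not send) the POST request for one synthesis call."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type for this body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path):
    """Send the request and write the response body to `path`.

    Assumption: the response body is the rendered audio, as the
    cURL example's `--output` flag implies.
    """
    request = build_speech_request(api_key, text)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file(os.environ["PYAI_KEY"], "Read this like a warm documentary narrator.", "chapter.mp3")` after `import os` would mirror the cURL command. Keeping request construction separate from sending makes the payload easy to inspect before spending credit.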

FAQ

How is Cast different from Speak?

Speak is the realtime TTS API. Cast is the customer-facing long-form workflow built on async synthesis, emotional direction, Voice Designer, and narration use cases.

What does Voice Designer cost?

Voice design and cloning enrollment are free; Cast audio is $0.02/min with commercial rights included.

Build with Cast today.

Start free with $50.00 in credit - no card. Your test key works instantly.

No credit card - OpenAI-compatible - cancel anytime