Text-to-Speech (TTS) is the foundation of VoxFlow Studio. In this tutorial, we'll walk you through creating your first AI voiceover — from selecting a voice to exporting the final audio.
Step 1: Open the TTS Editor
Navigate to VoxFlow Studio and sign in. The TTS editor is the default view on the home page.
You'll see:
- A text input area on the left
- A voice selector on the right
- Settings for speed, pitch, and volume below
Step 2: Choose a Voice
Click on the voice selector to open the Voice Library. You can browse voices by:
- Language — Chinese, English, Japanese, Cantonese, and more
- Gender — Male or Female
- Style — Professional, Casual, Narrative, News, etc.
Pro tip: Use the "AI Select" feature to describe your ideal voice in natural language. For example, enter "A warm and gentle female voice for bedtime stories" and the AI will recommend the best match.
Each voice has a preview button (🔊) so you can hear a sample before selecting.
Step 3: Enter Your Text
Type or paste your text into the editor. You can also:
- Import a text file (.txt, .srt, .vtt) using the import button
- Use AI Generate to create content from a prompt
- Choose from Templates for common scenarios (ads, education, stories, etc.)
Adding Pauses
Insert <#0.5#> in your text to add a 0.5-second pause between sentences; change the number to set a different length in seconds. This is great for dramatic effect or natural pacing. For example:
Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence.
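To check where pauses fall before you generate, the marker syntax can be parsed outside the editor. The Python sketch below is only an illustration of the <#seconds#> format shown above; it is not VoxFlow's own parser, and the names split_pauses and total_pause_seconds are hypothetical.

```python
import re

# Matches pause markers such as <#0.5#> or <#1.0#> and captures the seconds.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", float) segments."""
    segments = []
    position = 0
    for match in PAUSE_MARKER.finditer(text):
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("pause", float(match.group(1))))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def total_pause_seconds(text):
    """Add up the length of every pause marker in the text."""
    return sum(value for kind, value in split_pauses(text) if kind == "pause")


sample = "Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence."
for segment in split_pauses(sample):
    print(segment)
print("Total pause:", total_pause_seconds(sample), "seconds")
```

Running this on a long script gives a quick view of how much silence the markers add to the final audio.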
Step 4: Adjust Settings
Fine-tune your audio with these controls:
| Setting | Range | Default | Notes |
|---|---|---|---|
| Speed | 0.5x – 2.0x | 1.0x | Slower for narration, faster for news |
| Pitch | -12 to +12 | 0 | Adjust voice tone |
| Volume | 0 – 10 | 5 | Output volume level |
VoxFlow uses the latest TTS model by default, optimized for both speed and quality.
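If you keep settings presets alongside your scripts, it can help to check them against the ranges in the table above. This is a minimal Python sketch that assumes the documented ranges and defaults; the VoiceSettings class and its field names are hypothetical and do not correspond to any VoxFlow API.

```python
from dataclasses import dataclass

# Ranges copied from the settings table; the field names are hypothetical.
LIMITS = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0, 10),
}


@dataclass
class VoiceSettings:
    speed: float = 1.0  # default 1.0x
    pitch: int = 0      # default 0
    volume: int = 5     # default 5

    def problems(self):
        """Return a message for each value outside its documented range."""
        issues = []
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                issues.append(f"{name}={value} is outside {low} to {high}")
        return issues


print(VoiceSettings().problems())           # the defaults are all in range
print(VoiceSettings(speed=2.5).problems())  # speed exceeds the 2.0x maximum
```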
Step 5: Generate and Export
Click the Generate button (or press Ctrl+Enter). Your audio will be synthesized in seconds.
Once generated:
- Play the audio to review
- Download as WAV or MP3
- Check the History panel to compare previous generations
Advanced: Multi-Voice Dubbing
For content with multiple speakers, switch to the Dubbing Editor:
- Format your text as Speaker: dialogue
- The editor auto-detects speakers
- Assign different voices to each speaker
- Synthesize all at once with natural timing
Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?
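To see how a script in this format breaks down into speakers and lines, here is a short Python sketch. It only illustrates the Speaker: dialogue convention; how the Dubbing Editor actually detects speakers is not documented here, and the function names are hypothetical.

```python
def parse_dialogue(script):
    """Return (speaker, line) pairs from 'Speaker: dialogue' formatted text."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        # Split on the first colon only, so colons inside the dialogue survive.
        speaker, separator, dialogue = line.partition(":")
        if not separator or not speaker.strip():
            raise ValueError(f"Expected 'Speaker: dialogue', got: {line!r}")
        turns.append((speaker.strip(), dialogue.strip()))
    return turns


def speakers_in_order(turns):
    """List each speaker once, in order of first appearance."""
    seen = []
    for speaker, _ in turns:
        if speaker not in seen:
            seen.append(speaker)
    return seen


script = """\
Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?
"""
turns = parse_dialogue(script)
print(speakers_in_order(turns))
for speaker, dialogue in turns:
    print(f"{speaker} -> {dialogue}")
```

A check like this catches lines with a missing speaker label before you paste the script into the editor.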
Quota Usage
Each TTS generation costs quota based on text length:
- 100 quota = 1 basic TTS synthesis
- Free tier: 10,000 monthly quota
- See Plans & Quota for upgrade options
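As a rough budgeting aid, the figures above can be turned into a quick calculation. The Python sketch below assumes a flat cost of 100 quota per basic synthesis; because cost depends on text length, longer texts may use more, so treat the result as an upper bound. The function name is hypothetical.

```python
FREE_TIER_MONTHLY_QUOTA = 10_000  # from the free tier figure above
BASIC_SYNTHESIS_COST = 100        # quota per basic TTS synthesis


def basic_syntheses_available(quota, cost=BASIC_SYNTHESIS_COST):
    """How many basic syntheses fit in the given quota (whole runs only)."""
    return quota // cost


print(basic_syntheses_available(FREE_TIER_MONTHLY_QUOTA))  # 100
```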
Next Steps
- Try Voice Cloning to create a custom voice
- Explore the AI Podcast Generator for automated podcast production
- Check out Video Dubbing for multilingual video content
Happy creating! 🎧