Getting Started with VoxFlow TTS

VoxFlow Team 2026-03-20 3 min read

Text-to-Speech (TTS) is the foundation of VoxFlow Studio. In this tutorial, we'll walk you through creating your first AI voiceover — from selecting a voice to exporting the final audio.

Step 1: Open the TTS Editor

Navigate to VoxFlow Studio and sign in. The TTS editor is the default view on the home page.

You'll see:

A text input area on the left
A voice selector on the right
Settings for speed, pitch, and volume below

Step 2: Choose a Voice

Click on the voice selector to open the Voice Library. You can browse voices by:

Language — Chinese, English, Japanese, Cantonese, and more
Gender — Male or Female
Style — Professional, Casual, Narrative, News, etc.

Pro tip: Use the "AI Select" feature to describe your ideal voice in natural language. For example, enter "A warm and gentle female voice for bedtime stories" and the AI will recommend the best match.

Each voice has a preview button (🔊) so you can hear a sample before selecting.

Step 3: Enter Your Text

Type or paste your text into the editor. You can also:

Import a text file (.txt, .srt, .vtt) using the import button
Use AI Generate to create content from a prompt
Choose from Templates for common scenarios (ads, education, stories, etc.)

Adding Pauses

Insert <#0.5#> in your text to add a 0.5-second pause between sentences. The number between the # signs is the pause length in seconds. This is great for dramatic effect or natural pacing.

Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence.
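
If you prepare scripts outside the editor, it can help to check where your pauses fall before pasting the text in. Here is a minimal Python sketch of that idea. It assumes the marker always looks like <#N#> with N a non-negative number of seconds, and the helper names (split_pauses, total_pause_seconds) are made up for this example; they are not part of VoxFlow.

```python
import re

# Assumed marker format: <#N#>, where N is a non-negative number of seconds.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", float) segments."""
    segments = []
    position = 0
    for match in PAUSE_MARKER.finditer(text):
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("pause", float(match.group(1))))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def total_pause_seconds(text):
    """Add up the lengths of all pause markers in the text."""
    return sum(value for kind, value in split_pauses(text) if kind == "pause")


script = "Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence."
for segment in split_pauses(script):
    print(segment)
print("Total pause time (s):", total_pause_seconds(script))
```

Running it on the example line lists the two sentences with the one-second pause between them, which is a quick way to spot a mistyped marker.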

Step 4: Adjust Settings

Fine-tune your audio with these controls:

Setting	Range	Default	Notes
Speed	0.5x – 2.0x	1.0x	Slower for narration, faster for news
Pitch	-12 to +12	0	Adjust voice tone
Volume	0 – 10	5	Output volume level

By default, VoxFlow uses its latest TTS model, which is optimized for both speed and quality.
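
If you keep your preferred settings in a script or config file, a small check against the ranges in the settings table above can catch typos early. The sketch below is only an illustration: the VoiceSettings class and its field names are assumptions for this example, not a VoxFlow API. Only the ranges and defaults come from the table.

```python
from dataclasses import dataclass

# Ranges copied from the settings table; the field names are assumptions.
LIMITS = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0, 10),
}


@dataclass
class VoiceSettings:
    # Defaults match the "Default" column of the table.
    speed: float = 1.0
    pitch: int = 0
    volume: int = 5

    def problems(self):
        """Return a message for each value outside its documented range."""
        issues = []
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                issues.append(f"{name}={value} is outside {low} to {high}")
        return issues


print(VoiceSettings().problems())           # defaults are all within range
print(VoiceSettings(speed=2.5).problems())  # speed exceeds the 2.0x maximum
```

An empty list means every value is inside its range. Anything else tells you which setting to fix.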

Step 5: Generate and Export

Click the Generate button (or press Ctrl+Enter). Your audio will be synthesized in seconds.

Once generated:

Play the audio to review
Download as WAV or MP3
Check the History panel to compare previous generations

Advanced: Multi-Voice Dubbing

For content with multiple speakers, switch to the Dubbing Editor:

Format your text as Speaker: dialogue
The editor auto-detects speakers
Assign different voices to each speaker
Synthesize all at once with natural timing

Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?
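
To see how the Speaker: dialogue format breaks down, here is a rough Python sketch that splits a script into speaker turns and lists the speakers you would need to assign voices to. It is a guess at the general idea, not the Dubbing Editor's actual detection logic, and the voice names are placeholders rather than real Voice Library entries.

```python
def parse_dialogue(script):
    """Parse "Speaker: dialogue" lines into (speaker, speech) pairs."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        # Split on the first colon only, so colons inside the speech survive.
        speaker, separator, speech = line.partition(":")
        if not separator or not speaker.strip() or not speech.strip():
            raise ValueError(f"Expected 'Speaker: dialogue', got: {line!r}")
        turns.append((speaker.strip(), speech.strip()))
    return turns


def speakers_in_order(turns):
    """List each speaker once, in order of first appearance."""
    return list(dict.fromkeys(speaker for speaker, _ in turns))


script = """\
Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?
"""
turns = parse_dialogue(script)
# Placeholder voice names, used here only to show the speaker-to-voice mapping.
voices = dict(zip(speakers_in_order(turns), ["voice_a", "voice_b"]))
for speaker, speech in turns:
    print(f"[{voices[speaker]}] {speaker}: {speech}")
```

A line without a speaker name raises an error here, which is a handy way to find formatting slips in a long script before you paste it in.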

Quota Usage

Each TTS generation costs quota based on text length:

100 quota = 1 basic TTS synthesis
Free tier: 10,000 monthly quota
See Plans & Quota for upgrade options
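
For a rough sense of how far the free tier goes, the two numbers above can be combined in a quick calculation. This Python sketch assumes every generation counts as one basic synthesis at a flat 100 quota. Since cost depends on text length, your real count may differ. The function name is made up for this example.

```python
FREE_TIER_MONTHLY_QUOTA = 10_000   # from the list above
QUOTA_PER_BASIC_SYNTHESIS = 100    # from the list above


def syntheses_available(quota, cost_per_synthesis=QUOTA_PER_BASIC_SYNTHESIS):
    """Whole number of syntheses a quota balance covers at a flat cost."""
    if cost_per_synthesis <= 0:
        raise ValueError("cost_per_synthesis must be positive")
    return max(quota, 0) // cost_per_synthesis


print(syntheses_available(FREE_TIER_MONTHLY_QUOTA))  # 10,000 // 100 = 100
```

Under that assumption, the free tier covers about 100 basic syntheses per month.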

Next Steps

Try Voice Cloning to create a custom voice
Explore the AI Podcast Generator for automated podcast production
Check out Video Dubbing for multilingual video content

Happy creating! 🎧