Getting Started with VoxFlow TTS

Text-to-Speech (TTS) is the foundation of VoxFlow Studio. In this tutorial, we'll walk you through creating your first AI voiceover — from selecting a voice to exporting the final audio.

Step 1: Open the TTS Editor

Navigate to VoxFlow Studio and sign in. The TTS editor is the default view on the home page.

You'll see:

  • A text input area on the left
  • A voice selector on the right
  • Settings for speed, pitch, and volume below

Step 2: Choose a Voice

Click on the voice selector to open the Voice Library. You can browse voices by:

  • Language — Chinese, English, Japanese, Cantonese, and more
  • Gender — Male or Female
  • Style — Professional, Casual, Narrative, News, etc.

Pro tip: Use the "AI Select" feature to describe your ideal voice in natural language, for example "A warm and gentle female voice for bedtime stories", and it will recommend the closest match.

Each voice has a preview button (🔊) so you can hear a sample before selecting.

Step 3: Enter Your Text

Type or paste your text into the editor. You can also:

  • Import a text file (.txt, .srt, .vtt) using the import button
  • Use AI Generate to create content from a prompt
  • Choose from Templates for common scenarios (ads, education, stories, etc.)

Adding Pauses

Insert a pause marker such as <#0.5#> in your text to add a 0.5-second pause between sentences. The number inside the marker is the pause length in seconds. This is great for dramatic effect or natural pacing.

Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence.
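
If you prepare scripts outside the editor, it can help to check pause markers before pasting. The following is a minimal Python sketch, not VoxFlow's own parser: the function names split_pauses and total_pause_seconds are hypothetical, and the pattern assumes markers always look like <#number#> as in the example above.

```python
import re

# Assumed marker shape: <#0.5#>, <#1.0#>, etc. The number is the pause in seconds.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", float) segments, in order."""
    segments = []
    position = 0
    for match in PAUSE_MARKER.finditer(text):
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("pause", float(match.group(1))))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def total_pause_seconds(text):
    """Add up the silence that all markers in the text request."""
    return sum(value for kind, value in split_pauses(text) if kind == "pause")


script = "Welcome to today's lesson. <#1.0#> We'll be exploring artificial intelligence."
for segment in split_pauses(script):
    print(segment)
print("Total pause:", total_pause_seconds(script), "seconds")
```

Running this on the example line shows one text segment, a 1.0-second pause, and a second text segment, which is a quick way to spot a mistyped marker.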

Step 4: Adjust Settings

Fine-tune your audio with these controls:

Setting   Range          Default   Notes
Speed     0.5x – 2.0x    1.0x      Slower for narration, faster for news
Pitch     -12 to +12     0         Adjust voice tone
Volume    0 – 10         5         Output volume level
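
If you keep settings alongside your scripts, for example in a project file, a small range check can catch out-of-range values early. This is a sketch under assumptions: the TtsSettings class and its field names are hypothetical and are not part of VoxFlow; only the ranges and defaults come from the table above.

```python
from dataclasses import dataclass

# Ranges copied from the settings table; the dictionary itself is illustrative.
LIMITS = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0, 10),
}


@dataclass
class TtsSettings:
    # Defaults match the table: 1.0x speed, pitch 0, volume 5.
    speed: float = 1.0
    pitch: int = 0
    volume: int = 5

    def validate(self):
        """Return a list of error messages; the list is empty if all values are in range."""
        errors = []
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside {low} to {high}")
        return errors


print(TtsSettings().validate())
print(TtsSettings(speed=2.5, pitch=-3).validate())
```

The defaults pass with no errors, while a speed of 2.5 is reported because it exceeds the 2.0x upper limit.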

VoxFlow uses the latest TTS model by default, optimized for both speed and quality.

Step 5: Generate and Export

Click the Generate button (or press Ctrl+Enter). Your audio will be synthesized in seconds.

Once generated:

  1. Play the audio to review
  2. Download as WAV or MP3
  3. Check the History panel to compare previous generations

Advanced: Multi-Voice Dubbing

For content with multiple speakers, switch to the Dubbing Editor:

  1. Format your text as Speaker: dialogue
  2. The editor auto-detects speakers
  3. Assign different voices to each speaker
  4. Synthesize all at once with natural timing

Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?
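
To check a script before pasting it into the Dubbing Editor, you can preview which speakers a "Speaker: dialogue" layout would produce. The sketch below is an illustration only and may differ from how the editor detects speakers: it assumes the speaker label is everything before the first colon on a line, and the function names are hypothetical.

```python
import re

# Assumption: the speaker label is the text before the first colon on each line.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+)$")


def parse_dialogue(script):
    """Return a list of (speaker, dialogue) pairs; lines without a colon are skipped."""
    turns = []
    for line in script.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def speakers_in_order(turns):
    """List each speaker once, in order of first appearance."""
    seen = []
    for speaker, _ in turns:
        if speaker not in seen:
            seen.append(speaker)
    return seen


script = """Alice: Welcome to our podcast about AI innovation.
Bob: Thanks for having me! I'm excited to discuss this.
Alice: Let's start with the basics. What is generative AI?"""

turns = parse_dialogue(script)
print(speakers_in_order(turns))
print(len(turns), "lines of dialogue")
```

For the example above this lists Alice and Bob as the two speakers across three lines of dialogue, so you know how many voices to assign.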

Quota Usage

Each TTS generation costs quota based on text length:

  • 100 quota = 1 basic TTS synthesis
  • Free tier: 10,000 monthly quota
  • See Plans & Quota for upgrade options
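
As a rough budgeting aid, the figures above can be turned into a quick calculation. This sketch assumes every generation costs exactly one basic synthesis (100 quota); the page does not say how longer text scales the cost, so treat the result as an upper bound. The function name is hypothetical.

```python
# Figures from the list above.
FREE_TIER_MONTHLY_QUOTA = 10_000
QUOTA_PER_BASIC_SYNTHESIS = 100


def basic_syntheses_available(quota):
    """How many basic syntheses a quota balance covers (assumes a flat 100 quota each)."""
    return quota // QUOTA_PER_BASIC_SYNTHESIS


print(basic_syntheses_available(FREE_TIER_MONTHLY_QUOTA))
```

Under that assumption, the free tier covers 100 basic syntheses per month.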

Next Steps

  • Try Voice Cloning to create a custom voice
  • Explore the AI Podcast Generator for automated podcast production
  • Check out Video Dubbing for multilingual video content

Happy creating! 🎧