DEV Community

Developer Service

Originally published at developer-service.blog

How to Build PolySpeak: A Multilingual Phrase Practice App Using Python and ElevenLabs TTS

Pronunciation is one of the most difficult aspects of language learning — and it becomes even more challenging when learners are working across multiple languages.

Traditional tools like flashcards or dictionary apps often fall short when it comes to helping users hear and mimic native-like speech.

Worse still, many text-to-speech (TTS) tools sound robotic or are limited to one language at a time, making it hard for learners to practice mixed-language phrases in a natural context.

This is where modern AI-powered TTS can make a huge difference.

Because these models generate realistic, human-like voices from text, learners can hear how a native speaker might say a phrase, with accurate pronunciation, rhythm, and intonation.

It's an especially powerful tool for solo learners without easy access to native speakers or tutors.

Among the available options, ElevenLabs stands out with its multilingual text-to-speech model, eleven_multilingual_v2. It supports 29 languages and pronounces them fluidly, even when they are mixed in the same sentence.

In this tutorial, you’ll build PolySpeak, a simple web app that helps users practice multilingual phrases using ElevenLabs.

With just a few lines of Python and the ElevenLabs API, you'll create an interface where learners can input phrases, choose a voice, and instantly hear natural-sounding speech — all from their browser.

Whether you’re a language enthusiast, educator, or developer, this project is a great way to explore the power of multilingual TTS in a real-world use case.




Tools & Technologies

To build PolySpeak, we’ll use a small but powerful set of tools designed for rapid prototyping, clean UI, and seamless integration with ElevenLabs’ API.

Here’s a breakdown of what you’ll need:

Python 3.9+

The core language for this project. Python’s readability and vast ecosystem make it perfect for working with APIs and building lightweight web apps.

If you don’t already have Python installed, download it from python.org.

Streamlit

Streamlit is an open-source Python library that turns scripts into shareable web apps with minimal effort.

It's perfect for quickly building interactive tools without needing to dive into frontend code.

We'll use Streamlit to:

  • Display a text input area for multilingual phrases
  • Let users choose a voice
  • Trigger speech generation
  • Play and download the resulting audio

ElevenLabs Python SDK

ElevenLabs provides a Python SDK that makes it easy to access their text-to-speech features, including:

  • Multilingual voice synthesis with native-like pronunciation
  • Voice selection and customization
  • Audio generation and export

Full source code available at: https://github.com/nunombispo/PolySpeak-Article


Getting Started

Before we dive into building the PolySpeak app, let’s set up the essentials: an ElevenLabs account with an API key, and the Python dependencies.

ElevenLabs - Sign Up and Get Your API Key

  • Go to ElevenLabs and create a free account.
  • Once logged in, navigate to the "API" section of your dashboard.
  • Copy your API key (or create a new one) - this key is required to authenticate with the ElevenLabs service and should be kept private.

Install Required Python Packages

We’ll use the elevenlabs SDK to access the TTS API, streamlit to build the web interface, and python-dotenv to load the API key from a .env file.

Install them using pip:

pip install elevenlabs streamlit python-dotenv

Building the PolySpeak App

With our tools and API set up, let’s now build the core functionality of the PolySpeak app — a lightweight interface for practicing multilingual phrases with ElevenLabs-generated audio.

Project Structure

Start with a simple file structure:

polyspeak/
│
├── .env                 # To store your ElevenLabs API key
├── app.py               # The main Streamlit app
└── requirements.txt     # For dependency management

In your .env file, add:

ELEVEN_API_KEY=your-api-key-here

This keeps your API key out of the source code. If you use Git, add .env to your .gitignore so the key is never committed.
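If the key is missing from .env, the app only finds out later, when an ElevenLabs call fails with an authentication error. A small helper can fail fast with a clear message instead. The sketch below uses only the standard library. `require_env` is a hypothetical helper name, and the sketch assumes `load_dotenv()` has already copied the values from .env into the process environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=your-api-key-here to your .env file."
        )
    return value


if __name__ == "__main__":
    # Demo with a throwaway variable so no real key is printed
    os.environ["POLYSPEAK_DEMO_KEY"] = "demo-key"
    print(require_env("POLYSPEAK_DEMO_KEY"))
```

In app.py, you could use `require_env("ELEVEN_API_KEY")` in place of `os.getenv("ELEVEN_API_KEY")` wherever the key is read.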

App Code

Save the provided code into app.py:

import os

import streamlit as st
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs


# Load environment variables from .env
load_dotenv()


# Set page config
st.set_page_config(
    page_title="PolySpeak - Language Practice",
    page_icon="🗣️",
    layout="wide"
)


# Initialize the ElevenLabs client (client-based API of elevenlabs SDK v1 and newer)
client = ElevenLabs(api_key=os.getenv("ELEVEN_API_KEY"))


# Get available voices
def get_available_voices():
    """Return a {voice name: voice ID} mapping of the voices on your account"""
    try:
        response = client.voices.get_all()
        return {voice.name: voice.voice_id for voice in response.voices}
    except Exception as e:
        st.error(f"Error fetching voices: {e}")
        return {}


# Generate audio
def generate_audio(text, voice_id):
    """Generate MP3 audio for the text with the multilingual v2 model"""
    try:
        # convert() returns the audio as an iterator of byte chunks
        audio_chunks = client.text_to_speech.convert(
            voice_id=voice_id,
            text=text,
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        # Join the chunks into a single bytes object for playback and download
        return b"".join(audio_chunks)
    except Exception as e:
        st.error(f"Error generating audio: {e}")
        return None


# App title and description
st.title("🗣️ PolySpeak")
st.markdown("Practice pronunciation in multiple languages using AI-powered text-to-speech")


# Get available voices; stop early if none could be loaded (e.g. missing or invalid API key)
voice_options = get_available_voices()
if not voice_options:
    st.warning("No voices available. Check the ELEVEN_API_KEY value in your .env file.")
    st.stop()


# Input section
with st.container():
    st.subheader("Enter Text to Practice")
    text_input = st.text_area(
        "Type or paste your text here",
        height=100,
        placeholder="Enter text in any language..."
    )

    # Voice selection
    selected_voice = st.selectbox(
        "Select a voice",
        options=list(voice_options.keys()),
        index=0
    )


# Generate audio button
if st.button("Generate Audio", type="primary"):
    if not text_input.strip():
        st.warning("Please enter some text to practice")
    else:
        with st.spinner("Generating audio..."):
            audio = generate_audio(text_input, voice_options[selected_voice])
        if audio:
            # Play the MP3 in the browser
            st.audio(audio, format="audio/mpeg")

            # Download button
            st.download_button(
                label="Download Audio",
                data=audio,
                file_name="practice_audio.mp3",
                mime="audio/mpeg"
            )


# Footer
st.markdown("---")
st.markdown("Built with ❤️ using Streamlit and [ElevenLabs](https://try.elevenlabs.io/47hqn085lo2c) by [Developer Service](https://developer-service.blog)")


It does the following:

  • Loads your ElevenLabs API key securely.
  • Uses Streamlit to render the UI.
  • Fetches available voices from ElevenLabs.
  • Lets users enter text, select a voice, generate audio, and download the output.
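One subtlety in the voice lookup is that it builds a plain `{name: voice_id}` dict. If two voices in your account ever share a display name, the later one silently replaces the earlier one in the dropdown. Below is a minimal sketch of a version that keeps every voice selectable. `Voice` here is a hypothetical stand-in dataclass with the same `name` and `voice_id` attributes the app reads, so the snippet runs without an API call.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Voice:
    """Hypothetical stand-in for an ElevenLabs voice object (only the fields the app uses)."""
    name: str
    voice_id: str


def build_voice_options(voices):
    """Map a display label to a voice ID without dropping voices whose names collide.

    Unique names are used as-is; duplicated names get the voice ID appended.
    """
    counts = Counter(v.name for v in voices)
    options = {}
    for v in voices:
        label = v.name if counts[v.name] == 1 else f"{v.name} ({v.voice_id})"
        options[label] = v.voice_id
    return options


if __name__ == "__main__":
    sample = [Voice("Rachel", "id-1"), Voice("Custom", "id-2"), Voice("Custom", "id-3")]
    print(build_voice_options(sample))
    # {'Rachel': 'id-1', 'Custom (id-2)': 'id-2', 'Custom (id-3)': 'id-3'}
```

Because the real voice objects expose the same two attributes, the app could pass its list of voices straight into `build_voice_options` instead of using the dict comprehension.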

Running The App

Run the app:

streamlit run app.py

Open your browser to the displayed URL (usually http://localhost:8501) and try entering multilingual text like:

Good morning! 

Buongiorno! 

Guten Morgen! 

おはよう!

You’ll hear the voice fluidly switch between languages.
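The app sends the whole text area as a single request, so the four greetings above come back as one clip. If you would rather practice them one at a time, a small helper can split the input into individual phrases first. `split_phrases` is a hypothetical name, and treating each non-empty line as one phrase is an assumption about how learners format their input.

```python
from typing import List


def split_phrases(text: str) -> List[str]:
    """Split practice text into phrases: one per non-empty line, whitespace trimmed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "Good morning! \n\nBuongiorno! \n\nGuten Morgen! \n\nおはよう!"
    for phrase in split_phrases(sample):
        print(phrase)
```

In the app, you could loop over the result and call `generate_audio(phrase, voice_id)` once per phrase, rendering a separate `st.audio` player for each.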




Features & Functionality

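Here’s a recap of what the app gives language learners: a free-text box that accepts phrases in any supported language (mixed languages included), a voice selector populated from your ElevenLabs account, one-click generation with the eleven_multilingual_v2 model, and in-browser playback plus an MP3 download for offline practice.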

This simple interface becomes a powerful pronunciation tool, whether you're a beginner, polyglot, or educator designing learning aids.



Ideas for Extending PolySpeak

Want to take this further?

Here are some ideas to evolve the app:

  • CSV Upload: Let users bulk-upload phrases and get batch audio output.
  • Flashcard Mode: Randomize phrases and hide the text until after playback.
  • Voice Cloning Integration: If users upgrade to the Starter or Creator plan, they could practice in their own cloned voice!
  • Bookmark Favorites: Save frequently practiced phrases for daily review.
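For the CSV Upload idea, here is a minimal parser that turns an uploaded file's bytes into a list of phrases. It assumes a hypothetical one-column layout with an optional `phrase` header row. In Streamlit, the bytes would come from calling `getvalue()` on the object returned by `st.file_uploader`.

```python
import csv
import io
from typing import List


def parse_phrase_csv(data: bytes) -> List[str]:
    """Return the phrases from a one-column CSV file.

    Assumed (hypothetical) format: one phrase per row in the first column,
    with an optional header row whose first cell is "phrase".
    """
    # utf-8-sig drops the byte-order mark that spreadsheet exports often add
    reader = csv.reader(io.StringIO(data.decode("utf-8-sig")))
    phrases = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines and empty cells
        phrases.append(row[0].strip())
    if phrases and phrases[0].lower() == "phrase":
        phrases = phrases[1:]  # drop the optional header row
    return phrases


if __name__ == "__main__":
    sample = 'phrase\nGood morning!\n"Buongiorno, come stai?"\nおはよう!\n'.encode("utf-8")
    print(parse_phrase_csv(sample))
```

Using the csv module rather than splitting on commas keeps quoted phrases that contain commas intact. Each parsed phrase could then be fed to the app's `generate_audio` function for batch output.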

These features would help learners create personalized, scalable pronunciation exercises.
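The Flashcard Mode idea mostly comes down to shuffling the phrase list and tracking which card is current and whether its text has been revealed. Below is a minimal, UI-free sketch of that state. The `FlashcardDeck` class and its methods are hypothetical. In Streamlit, you would keep an instance in `st.session_state` so it survives reruns.

```python
import random
from typing import List, Optional


class FlashcardDeck:
    """Hypothetical flashcard state: shuffled phrases, a current card, hidden text."""

    def __init__(self, phrases: List[str], seed: Optional[int] = None):
        if not phrases:
            raise ValueError("A deck needs at least one phrase")
        self.cards = list(phrases)  # copy so the caller's list is left untouched
        random.Random(seed).shuffle(self.cards)  # a seed makes the order reproducible
        self.index = 0
        self.revealed = False

    @property
    def current(self) -> str:
        return self.cards[self.index]

    def visible_text(self) -> str:
        """Text to show on screen: hidden until reveal() is called."""
        return self.current if self.revealed else "???"

    def reveal(self) -> None:
        self.revealed = True

    def next_card(self) -> None:
        """Advance to the next card (wrapping around) and hide its text again."""
        self.index = (self.index + 1) % len(self.cards)
        self.revealed = False


if __name__ == "__main__":
    deck = FlashcardDeck(["Good morning!", "Buongiorno!", "Guten Morgen!"], seed=42)
    print(deck.visible_text())  # prints ??? until the card is revealed
    deck.reveal()               # e.g. after the audio for deck.current has played
    print(deck.visible_text())  # now prints the phrase itself
    deck.next_card()            # move on; the next phrase starts hidden again
```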

If you're planning to use this tool regularly — or integrate it into your own product — consider using ElevenLabs’ premium offerings:

Starter Plan ($5/month)

  • Higher character limits, more voice slots.
  • Instant Voice Cloning: Just a 1-minute sample needed — perfect for creators on a budget or anyone curious about using their own voice for language learning.

Creator Plan ($22/month)

  • Professional Voice Cloning with longer training samples.
  • Projects: Convert entire scripts into rich, multi-speaker voiceovers — ideal for audiobooks or long-form content.

Conclusion

You’ve now built PolySpeak — a multilingual TTS-powered app to help learners practice pronunciation using ElevenLabs' industry-leading speech synthesis.

This project is a great launchpad for:

  • Language educators
  • Indie app developers
  • AI tool explorers
  • Anyone interested in multilingual speech applications

With realistic AI voices and seamless language blending, ElevenLabs opens up a world of creative possibilities.

Whether you stick with the basics or integrate premium features like voice cloning, you’re now equipped to build tools that help people connect across languages.


Follow me on Twitter: https://twitter.com/DevAsService

Follow me on Instagram: https://www.instagram.com/devasservice/

Follow me on TikTok: https://www.tiktok.com/@devasservice

Follow me on YouTube: https://www.youtube.com/@DevAsService
