Language learners often ask: “Do I sound American or British?”
As a non-native English speaker and dev, I’ve always been curious about how my accent sounds to others. That curiosity led me to build accentvoice.net — a web tool that analyzes your spoken English and identifies your accent using speech recognition and classification models.
It started as a weekend project, but quickly evolved into something much more fun (and useful) than I expected.
🧠 What Is accentvoice.net?
accentvoice.net is a browser-based tool where users can record or upload a voice sample, and the system will:
Transcribe the speech using ASR (automatic speech recognition);
Analyze the speech signal and extract acoustic features;
Classify the accent based on pre-trained deep learning models;
Return a label like “American”, “British”, “Indian”, or “Australian”, with a confidence score and a radar visualization (see the example result shape below).
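For context, a result might look roughly like the following. The field names and numbers are illustrative only, not the actual accentvoice.net response schema:

```python
# Hypothetical shape of the final result -- field names and values are
# illustrative, not the real accentvoice.net schema.
result = {
    "transcript": "the rain in spain stays mainly in the plain",
    "accent": "British",          # top-1 label
    "confidence": 0.82,           # softmax probability of the top label
    "distribution": {             # the full distribution drives the radar chart
        "American": 0.09,
        "British": 0.82,
        "Australian": 0.05,
        "Indian": 0.04,
    },
}
```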
It’s designed for:
Language learners working on pronunciation;
Voice actors checking how well they imitate an accent;
Developers interested in integrating accent detection into voice-enabled products.
🧩 Tech Stack & Architecture
Here’s a breakdown of how the system works under the hood:
🎤 1. Audio Preprocessing
pydub and ffmpeg for denoising, trimming, and normalization;
Voice Activity Detection (VAD) to remove silence;
Feature extraction using MFCCs (Mel-Frequency Cepstral Coefficients); a rough sketch of this stage is shown below.
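Here's a minimal Python sketch of this stage, assuming pydub, webrtcvad, and librosa are installed and ffmpeg is on the PATH (denoising is omitted for brevity). It's illustrative, not the exact production pipeline:

```python
import librosa
import numpy as np
import webrtcvad
from pydub import AudioSegment

def preprocess(path: str) -> np.ndarray:
    # Decode via ffmpeg (through pydub); downmix to 16-bit mono at 16 kHz
    audio = (AudioSegment.from_file(path)
             .set_channels(1).set_sample_width(2).set_frame_rate(16000))
    # Peak-normalize so quiet recordings don't skew the features
    audio = audio.apply_gain(-audio.max_dBFS)

    # Voice Activity Detection: keep only voiced 30 ms frames
    vad = webrtcvad.Vad(2)  # aggressiveness 0 (lenient) to 3 (strict)
    frame_bytes = int(16000 * 0.03) * 2  # 30 ms of 16-bit samples
    raw = audio.raw_data
    voiced = b"".join(
        raw[i:i + frame_bytes]
        for i in range(0, len(raw) - frame_bytes, frame_bytes)
        if vad.is_speech(raw[i:i + frame_bytes], 16000)
    )

    # 13 MFCCs per frame -- the acoustic features fed downstream
    samples = np.frombuffer(voiced, dtype=np.int16).astype(np.float32) / 32768.0
    return librosa.feature.mfcc(y=samples, sr=16000, n_mfcc=13)
```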
🗣️ 2. Speech-to-Text (ASR)
Uses OpenAI’s Whisper for initial transcription;
Considering replacing it with a lightweight Wav2Vec2 or Conformer model to reduce inference time;
Word-level timestamps are kept alongside the transcript so prosodic cues stay available for deeper phonetic analysis (see the sketch below).
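A minimal transcription sketch with the open-source `whisper` package follows; the model size and the word-timestamp option are assumptions on my part, not necessarily what runs in production:

```python
# pip install openai-whisper (ffmpeg required)
import whisper

model = whisper.load_model("base")  # "tiny"/"base" are fast; "small"+ is more accurate
result = model.transcribe("sample.wav", language="en", word_timestamps=True)

print(result["text"])

# Word-level timing preserves rhythm/prosody cues for phonetic analysis
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f'{word["word"]!r}: {word["start"]:.2f}s -> {word["end"]:.2f}s')
```

The model size is the main latency/accuracy dial here, which is why the lighter Wav2Vec2 or Conformer alternatives mentioned above are tempting.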
🔍 3. Accent Classification Model
Core model is a fine-tuned ECAPA-TDNN neural network;
Complemented by an XGBoost multiclass classifier for fallback validation;
Training datasets: Common Voice, L2-Arctic, custom accented samples;
Output is a softmax label distribution over the accent categories, as illustrated below.
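To make the wiring concrete, here's a sketch that uses a pretrained SpeechBrain ECAPA-TDNN as the embedding backbone with a linear head over an assumed set of accent labels. In the real system the model is fine-tuned on accent data; the head below is untrained and purely illustrative:

```python
import torch
import torchaudio
from speechbrain.inference import EncoderClassifier  # speechbrain.pretrained on older versions

ACCENTS = ["American", "British", "Indian", "Australian"]  # assumed label set

# Pretrained ECAPA-TDNN produces 192-dim speaker embeddings;
# fine-tuning would retrain both the backbone and the head
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
head = torch.nn.Linear(192, len(ACCENTS))

signal, sr = torchaudio.load("sample.wav")   # expects 16 kHz mono
embedding = encoder.encode_batch(signal)     # shape: [1, 1, 192]
logits = head(embedding.squeeze())
probs = torch.softmax(logits, dim=-1)        # label distribution behind the radar chart

print({a: round(p.item(), 3) for a, p in zip(ACCENTS, probs)})
```

The softmax output doubles as both the confidence score and the per-accent distribution shown in the radar chart.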
🌐 4. Web Frontend + API
API built with Flask + Gunicorn + Celery (async task processing);
React frontend with WaveSurfer.js to visualize the audio waveform;
RESTful interface allows for future API integrations (a minimal sketch of the async flow is below).
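The request flow looks roughly like this. The endpoint paths, Redis broker, and task body are assumptions for the sketch, not the real implementation:

```python
from celery import Celery
from flask import Flask, jsonify, request

app = Flask(__name__)  # served by Gunicorn in production
celery = Celery(__name__,
                broker="redis://localhost:6379/0",    # assumed broker
                backend="redis://localhost:6379/1")

@celery.task
def analyze_audio(path: str) -> dict:
    # Preprocessing -> ASR -> classification runs here, off the request thread
    ...
    return {"accent": "British", "confidence": 0.82}  # placeholder result

@app.route("/api/analyze", methods=["POST"])
def analyze():
    f = request.files["audio"]
    path = f"/tmp/{f.filename}"
    f.save(path)
    task = analyze_audio.delay(path)  # enqueue and return immediately
    return jsonify({"task_id": task.id}), 202

@app.route("/api/result/<task_id>")
def result(task_id: str):
    task = analyze_audio.AsyncResult(task_id)
    if not task.ready():
        return jsonify({"status": "pending"}), 202
    return jsonify(task.result)
```

Because inference can take several seconds, the upload returns a task ID right away and the frontend polls for the result; that's the usual reason for the Flask + Celery split.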
💡 Why Build This?
Accent is a major part of speech identity, but it's also one of the hardest things to measure. Unlike grammar or vocabulary, it isn't binary; it sits on a fuzzy continuum.
I’ve seen a lot of ASR or TTS tools, but very few that give transparent feedback on pronunciation style or regional speech traits. That’s what I wanted to explore with this project.
Some cool use cases I’ve seen since launch:
ESL learners using it weekly to measure accent improvement;
Podcasters testing how “neutral” their voices sound;
Devs experimenting with accent-based personalization in apps (e.g., custom voice bots, learning platforms).
🚀 What's Coming Next?
Expansion into multilingual accent detection (e.g., Mandarin dialects, Spanish regional variants);
A developer-ready Accent Detection API;
“Benchmark mode” — compare your accent to target samples (e.g., RP British, General American);
AI-based pronunciation coach with suggestions on how to improve;
Visualization of accent drift over time via user history tracking.
🌐 Try It Live
The tool is live here:
👉 https://accentvoice.net
Give it a shot — record your voice, and see what accent the model hears. 😄
(Works best with ~5–10 seconds of clear English speech.)
📬 Open Questions
I'm looking for feedback from fellow devs:
Would you use accent detection in any of your projects?
Any ideas on integrating this into language learning or accessibility tools?
If I release the API, what features would you expect?
Happy to answer questions or dive deeper into the architecture. Thanks for reading!