Juddiy

I Built an AI Tool That Detects Your English Accent — Here’s How It Works (accentvoice.net)

Language learners often ask: “Do I sound American or British?”
As a non-native English speaker and dev, I’ve always been curious about how my accent sounds to others. That curiosity led me to build accentvoice.net — a web tool that analyzes your spoken English and identifies your accent using speech recognition and classification models.

It started as a weekend project, but quickly evolved into something much more fun (and useful) than I expected.

🧠 What Is accentvoice.net?
accentvoice.net is a browser-based tool where users can record or upload a voice sample, and the system will:

Transcribe the speech using ASR (automatic speech recognition);

Analyze the speech signal and extract acoustic features;

Classify the accent based on pre-trained deep learning models;

Return a label like “American”, “British”, “Indian”, “Australian”, etc., with a confidence score and radar visualization.

It’s designed for:

Language learners working on pronunciation;

Voice actors checking how well they imitate an accent;

Developers interested in integrating accent detection into voice-enabled products.

🧩 Tech Stack & Architecture
Here’s a breakdown of how the system works under the hood:

🎤 1. Audio Preprocessing
pydub and ffmpeg for denoising, trimming, and normalization;

Voice Activity Detection (VAD) to remove silence;

Feature extraction using MFCCs (Mel-Frequency Cepstral Coefficients).
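To make the VAD step concrete, here’s a minimal sketch of energy-based silence trimming. It assumes 16 kHz mono samples in a NumPy array; the frame size, hop, and threshold are illustrative defaults, not the values used in production:

```python
import numpy as np

def trim_silence(samples, frame_len=400, hop=160, threshold=0.01):
    """Drop leading/trailing frames whose RMS energy is below threshold.

    frame_len=400 / hop=160 correspond to 25 ms windows with a 10 ms
    hop at 16 kHz -- a common framing for speech features.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    energies = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    voiced = np.where(energies > threshold)[0]
    if voiced.size == 0:          # nothing above the floor: return empty
        return samples[:0]
    start = voiced[0] * hop
    end = voiced[-1] * hop + frame_len
    return samples[start:end]
```

A real pipeline would typically use a trained VAD (e.g. WebRTC VAD) rather than a raw energy gate, but the framing logic is the same.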

🗣️ 2. Speech-to-Text (ASR)
Uses OpenAI’s Whisper for initial transcription;

I’m considering replacing it with a lightweight Wav2Vec2 or Conformer model to reduce inference time;

Transcripts retain prosodic features for deeper phonetic analysis.
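Because the ASR backend may change (Whisper today, possibly Wav2Vec2 or Conformer later), a thin wrapper keeps callers backend-agnostic. The sketch below is an assumption about how such a wrapper could look — any object exposing a Whisper-style `.transcribe(path)` returning a dict with `"text"` and `"language"` keys will work:

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    text: str
    language: str

def transcribe(audio_path, backend):
    """Run ASR via an injected backend.

    With openai-whisper this would be:
        backend = whisper.load_model("base")
        transcribe("clip.wav", backend)
    """
    result = backend.transcribe(audio_path)
    return Transcript(text=result["text"].strip(),
                      language=result.get("language", "en"))
```

Swapping the model then means constructing a different backend object, with no changes downstream.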

🔍 3. Accent Classification Model
Core model is a fine-tuned ECAPA-TDNN neural network;

Complemented by an XGBoost multiclass classifier for fallback validation;

Training datasets: Common Voice, L2-Arctic, custom accented samples;

Output is a softmax-based label distribution mapped to accent categories.
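The final mapping from model logits to an accent label can be sketched as follows. The label set, threshold, and fallback behavior here are illustrative, not the production configuration:

```python
import numpy as np

ACCENTS = ["American", "British", "Indian", "Australian"]  # illustrative subset

def softmax(logits):
    z = logits - np.max(logits)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(logits, threshold=0.5):
    """Map classifier logits to (label, per-accent probability dict)."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    label = ACCENTS[idx] if probs[idx] >= threshold else "Uncertain"
    return label, {a: float(p) for a, p in zip(ACCENTS, probs)}
```

The per-accent distribution is what drives the radar visualization; the threshold decides when to fall back (e.g. to the XGBoost validator) instead of committing to a label.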

🌐 4. Web Frontend + API
API built with Flask + Gunicorn + Celery (async task processing);

React frontend with WaveSurfer.js to visualize audio waveform;

RESTful interface allows for future API integrations.

💡 Why Build This?
Accent is a major part of speech identity, but also one of the hardest things to measure. Unlike grammar or vocabulary, it’s not binary — it’s gradient and fuzzy.

I’ve seen a lot of ASR or TTS tools, but very few that give transparent feedback on pronunciation style or regional speech traits. That’s what I wanted to explore with this project.

Some cool use cases I’ve seen since launch:

ESL learners using it weekly to measure accent improvement;

Podcasters testing how “neutral” their voices sound;

Devs experimenting with accent-based personalization in apps (e.g., custom voice bots, learning platforms).

🚀 What's Coming Next?
Expansion into multilingual accent detection (e.g., Mandarin dialects, Spanish regional variants);

A developer-ready Accent Detection API;

“Benchmark mode” — compare your accent to target samples (e.g., RP British, General American);

AI-based pronunciation coach with suggestions on how to improve;

Visualization of accent drift over time via user history tracking.

🌐 Try It Live
The tool is live here:
👉 https://accentvoice.net

Give it a shot — record your voice, and see what accent the model hears. 😄
(Works best with ~5–10 seconds of clear English speech.)

📬 Open Questions
I'm looking for feedback from fellow devs:

Would you use accent detection in any of your projects?

Any ideas on integrating this into language learning or accessibility tools?

If I release the API, what features would you expect?

Happy to answer questions or dive deeper into the architecture. Thanks for reading!
