In Tanzania 🇹🇿, scammers are getting smarter. They often pretend to be someone you know or trust: a relative, a friend, a landlord, or even a job recruiter. Their goal? To trick you into sending them money.
You've probably seen texts like:
- "Ni tumie kwa namba hii Jina litakuja SALOME KALUNGA, hiyo ni namba yangu mpya ya Halotel"
- "Utanitumia kwenye ii 0615810764 airtel jina MARIAM NDUGAI namba yangu inadeni usiitumie"
- "MZEE LUKA KIMBANGU tiba asili biashala kazi masomo utajili kesi kuludisha mke&mume piga (0787-406-889)(0787-406-889)"
- "666,KARIBU FREEMASON UTIMIZE NDOTO KATIKA BIASHARA, KILIMO,UFUGAJI,MACHI MBO,MICHEZO N.K KWAMHITAJI KUJIUNGA PG: 0786543210 AU 0786543210"
These messages are dangerous, deceptive, and sadly, very common.
As a Tanzanian tech enthusiast and developer, I wanted to do something about it.
So I created the BongoScam dataset, an open dataset of over 1,500 labeled Swahili SMS messages, and a basic machine learning model to help detect scams.
The Dataset: Swahili SMS Detection
I collected and labeled 1,508 real Swahili messages, split into two categories:
- `scam`: Suspicious, misleading, or fraudulent messages.
- `trust`: Legitimate or safe messages.
Example entries:
category | sms |
---|---|
scam | "IYO PESA ITUME KWENYE NAMBA HII 0657538690 JINA ITALETA Magomba Maila" |
trust | "Nashukuru kwa kupokea simu yangu. Tutalifanyia kazi." |
➡️ Download the dataset on Kaggle:
📥 swahili-sms-detection
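Once downloaded, the data can be read with nothing but the standard library. The sketch below is only an illustration: it assumes the file is a CSV with the `category` and `sms` columns shown in the table above, and it uses an in-memory stand-in (`SAMPLE_CSV`, a made-up name) holding the two example rows instead of the real Kaggle file, whose filename I am not assuming here.

```python
import csv
import io
from collections import Counter

# Stand-in for the downloaded file: the two example entries from the table above.
# For a real run, replace io.StringIO(SAMPLE_CSV) with open(<path to the Kaggle CSV>).
SAMPLE_CSV = '''category,sms
scam,"IYO PESA ITUME KWENYE NAMBA HII 0657538690 JINA ITALETA Magomba Maila"
trust,"Nashukuru kwa kupokea simu yangu. Tutalifanyia kazi."
'''

def load_messages(handle):
    """Read (category, sms) pairs from a CSV with 'category' and 'sms' columns."""
    reader = csv.DictReader(handle)
    return [(row["category"].strip().lower(), row["sms"].strip()) for row in reader]

def label_counts(rows):
    """Count how many messages carry each label."""
    return Counter(label for label, _ in rows)

rows = load_messages(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(dict(counts))
```

Checking the label counts first is worthwhile: if one class heavily outnumbers the other, plain accuracy can look better than the model really is.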
🧠 The Model: Simple but Effective
To demonstrate what's possible, I built a lightweight machine learning model using:
- 🧹 CountVectorizer for converting text to numeric features
- 🤖 Multinomial Naive Bayes classifier

The model reaches 98.7% accuracy on the test data.
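To make the two steps concrete, here is a pure-Python sketch of the same idea: count word occurrences per class, then score a new message with smoothed log-probabilities. It is not the project's code (that uses scikit-learn's `CountVectorizer` and `MultinomialNB`). The `NaiveBayes` class and the toy training set are hypothetical, and two of the `trust` messages are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase, then keep word tokens of 2+ characters (CountVectorizer's default pattern)."""
    return re.findall(r"\b\w\w+\b", text.lower())

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # messages per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored, as a fitted vectorizer would do.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

train = [
    ("Ni tumie kwa namba hii Jina litakuja SALOME KALUNGA, hiyo ni namba yangu mpya ya Halotel", "scam"),
    ("Utanitumia kwenye ii 0615810764 airtel jina MARIAM NDUGAI namba yangu inadeni usiitumie", "scam"),
    ("IYO PESA ITUME KWENYE NAMBA HII 0657538690 JINA ITALETA Magomba Maila", "scam"),
    ("Nashukuru kwa kupokea simu yangu. Tutalifanyia kazi.", "trust"),
    ("Habari za asubuhi, tutaonana kesho ofisini.", "trust"),  # invented example
    ("Asante sana, nimepokea ujumbe wako.", "trust"),          # invented example
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])

# Tiny held-out check; the second message is invented.
held_out = [
    ("Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA", "scam"),
    ("Asante kwa simu, tutaonana kesho", "trust"),
]
accuracy = sum(model.predict(text) == label for text, label in held_out) / len(held_out)
print(accuracy)
```

A score on two held-out messages says nothing about real-world performance; the 98.7% figure comes from the full dataset and the scikit-learn model, not from this toy.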
The model is wrapped in a Flask API and deployed as a simple website for public use.
You can test it live here:
bongoscam.vercel.app
📦 Project Structure
You can explore or contribute via GitHub:
GitHub: BongoScamDetection
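```bash
# Clone the repo (git names the folder after the repository)
git clone https://github.com/Henryle-hd/BongoScamDetection
cd BongoScamDetection

# Install frontend dependencies
cd frontend
npm install

# Install backend dependencies
cd ../backend
pip install -r requirements.txt

# Run backend (from the backend directory)
python main.py

# Run frontend (in a second terminal, from the frontend directory)
cd ../frontend
npm run dev
```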
API Example
Endpoint: POST /api/predict
Request:
```json
{
  "sms": "Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA"
}
```
Response:
```json
{
  "prediction": "scam",
  "sms": "Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA"
}
```
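If you'd rather call the endpoint from a script, a small standard-library client might look like the sketch below. The helper names are hypothetical, and the default `base_url` of `http://localhost:5000` is an assumption (Flask's usual development port), so adjust it to wherever your backend actually listens. I'm also assuming the API returns `trust` for safe messages, mirroring the dataset labels.

```python
import json
import urllib.request

def build_predict_request(sms, base_url="http://localhost:5000"):
    """Build (without sending) a POST /api/predict request carrying the SMS as JSON."""
    body = json.dumps({"sms": sms}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_prediction(raw_body):
    """Pull the label out of a response shaped like the example above."""
    payload = json.loads(raw_body)
    label = payload["prediction"]
    if label not in {"scam", "trust"}:  # assumed label set, taken from the dataset categories
        raise ValueError(f"unexpected label: {label!r}")
    return label

def classify(sms, base_url="http://localhost:5000", timeout=10):
    """Send the request and return the predicted label (needs a running backend)."""
    request = build_predict_request(sms, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read().decode("utf-8"))

# Building the request does not touch the network, so this part runs anywhere.
request = build_predict_request("Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA")
print(request.get_method(), request.full_url)
```

Keeping request construction and response parsing separate from the network call makes both easy to test without a server.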
Why This Matters
This project isn't just about coding. It's about digital safety.
Millions of people in East Africa rely on SMS for communication.
Without strong tools or education, they're vulnerable.
By:
- Open-sourcing the data
- Making the model public
- Supporting the Swahili language
...I'm hoping this becomes a starting point for more localized ML solutions: in Swahili, for Africa, by Africans.
Final Thoughts
The BongoScam dataset is a small step toward fighting digital fraud in Tanzania, but I believe it can grow with your input.
If you're a:
- Developer
- Linguist
- Security researcher
- Student

...there's something in this project for you.
- Test the tool at bongoscam.vercel.app
- Explore the dataset on Kaggle
- Contribute code via GitHub

Got feedback or want to collaborate? Drop a comment or find me on LinkedIn or GitHub.
Let's build AI that speaks Swahili and protects people, not just data.