coder7475/sentiment_analysis_bangla

Investigating the Impact of Sentiment Label Imbalance on Model Performance in Bangla Sentiment Analysis

Research Problem

Sentiment analysis models often suffer from degraded performance when trained on datasets with imbalanced class distributions, where certain sentiment labels (e.g., Positive, Negative, or Neutral) dominate. In the context of the Bangla Sentiment Dataset, which includes diverse sources like newspapers, social media, and blogs, such imbalances are likely due to the real-world nature of the data (e.g., social media may skew Negative due to criticism). This issue is particularly pronounced in low-resource languages like Bangla, where limited labeled data exacerbates the challenge, yet little research has explored effective mitigation strategies for Bangla sentiment analysis.

Research Aim

To investigate the impact of class imbalance on the performance of sentiment classification models using the Bangla Sentiment Dataset and to evaluate the effectiveness of techniques such as oversampling, undersampling, and weighted loss functions in improving model accuracy and robustness across diverse Bangla text sources.

Research Question

How does class imbalance in the Bangla Sentiment Dataset affect sentiment classification performance, and can techniques like oversampling (e.g., SMOTE), undersampling, or weighted loss functions mitigate these effects to improve model accuracy and generalization?
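The three mitigation strategies named above can be sketched on label lists alone. The following is a minimal illustration, not the project's implementation: the function names (`inverse_frequency_weights`, `random_oversample`, `random_undersample`) and the toy data are hypothetical. It uses random oversampling rather than SMOTE, because SMOTE interpolates between numerical feature vectors and so would have to be applied after the text is vectorized.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def _group_by_label(texts, labels):
    """Collect texts into one list per label."""
    groups = {}
    for text, label in zip(texts, labels):
        groups.setdefault(label, []).append(text)
    return groups


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class texts at random until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = max(len(items) for items in groups.values())
    out = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((text, label) for text in items + extra)
    rng.shuffle(out)
    return [t for t, _ in out], [y for _, y in out]


def random_undersample(texts, labels, seed=0):
    """Drop majority-class texts at random until all classes match the smallest."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = min(len(items) for items in groups.values())
    out = []
    for label, items in groups.items():
        out.extend((text, label) for text in rng.sample(items, target))
    rng.shuffle(out)
    return [t for t, _ in out], [y for _, y in out]


# Toy data with placeholder strings, not real dataset entries.
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
toy_texts = [f"text_{i}" for i in range(len(toy_labels))]

weights = inverse_frequency_weights(toy_labels)
over_texts, over_labels = random_oversample(toy_texts, toy_labels)
under_texts, under_labels = random_undersample(toy_texts, toy_labels)

print(weights)                # rarer classes receive larger weights
print(Counter(over_labels))   # every class grown to the majority count
print(Counter(under_labels))  # every class cut to the minority count
```

The weight formula is the same heuristic scikit-learn uses for `class_weight="balanced"`. Resampling of this kind is normally applied to the training split only, so that validation and test sets keep the original label distribution.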

Installation

To set up the project, clone the repository and run the folder_structure_setup.sh script to create the necessary directories and files.

To create a virtual environment:

python3 -m venv venv

To activate the Python virtual environment, navigate to the project directory in your terminal and run the following command:

source venv/bin/activate

To install the project in editable mode, together with the packages listed in requirements.txt, run the following command:

pip install -e .

Usage

First, run the setup script to initialize the project:

python3 setup.py

Dataset

The Bangla Sentiment Dataset is a curated collection of sentiment-rich textual data in Bangla, focused on recent and trending topics. This dataset has been compiled from diverse sources, including Bangladeshi online newspapers, social media platforms, and blogs, ensuring a wide spectrum of language styles and sentiment expressions.


Key Features


Focus on Recent Topics

The dataset emphasizes contemporary issues, trending discussions, and popular topics in Bangladeshi society. This includes sentiments on political developments, social movements, entertainment, cultural events, and other recent happenings.


Source Variety
  • Online Newspapers: Articles, editorials, headlines, and reader comments provide structured and semi-formal sentiment data.
  • Social Media: Posts, tweets, and comments reflect informal, conversational language with high emotional expressiveness.
  • Blogs: Opinion pieces and discussions offer detailed and context-rich sentiment content.

Sentiment Labels

Each entry in the dataset is annotated with one of the following sentiment categories:

  • Positive (1): Texts expressing happiness, agreement, or optimism.
  • Negative (0): Texts reflecting criticism, disagreement, or pessimism.
  • Neutral (2): Texts presenting balanced or factual statements with minimal emotional bias.
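Given the label ids listed above, a quick way to see how imbalanced a set of annotations is would be to count each class. This is a small sketch using only the standard library; the names `ID_TO_LABEL`, `label_distribution`, and `imbalance_ratio` are hypothetical, and the toy labels are made up rather than taken from the dataset.

```python
from collections import Counter

# Label ids as documented above; the mapping name itself is hypothetical.
ID_TO_LABEL = {0: "Negative", 1: "Positive", 2: "Neutral"}


def label_distribution(labels):
    """Return the share of each sentiment class as a fraction of all labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {name: counts.get(i, 0) / total for i, name in ID_TO_LABEL.items()}


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one present."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up labels for illustration only.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]

distribution = label_distribution(toy_labels)
ratio = imbalance_ratio(toy_labels)

print(distribution)
print(ratio)
```

A ratio close to 1 indicates a roughly balanced label set; larger values indicate that one class dominates.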

Linguistic and Stylistic Diversity

The dataset captures a range of Bangla language variations, including:

  • Formal and informal Bangla usage
  • Regional dialects
  • Transliterated Bangla (Banglish) commonly used on social media

Real-World Context

The inclusion of recent topics ensures that the dataset is relevant for analyzing public sentiment around current events and trends. This makes it particularly useful for real-time sentiment analysis applications.


This dataset provides an invaluable resource for researchers and practitioners aiming to explore sentiment analysis in Bangla, with a special emphasis on modern-day relevance and real-world applicability.

Directory Structure

├── data-source
├── src
│   ├── __init__.py
│   ├── components
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── model_monitoring.py
│   │   └── model_trainer.py
│   ├── exception.py
│   ├── logger.py
│   ├── pipelines
│   │   ├── __init__.py
│   │   ├── prediction_pipeline.py
│   │   └── training_pipeline.py
│   └── utils.py
├── .gitignore
├── main.py
├── app.py
├── EDA.ipynb
├── README.md
├── requirements.txt
├── folder_structure_setup.sh
├── test-logging-integration.py
└── test-request.py
