
Capstone Project Showcase

CSE499B.16

December 2, 2024
Summer 2024
Department of ECE
North South University

Bangla Text Summarizer

Transforming lengthy Bengali articles into concise summaries using advanced NLP techniques

Summarization, Transformers, PyTorch, Bengali, mt5, text2text-generation
Bytes of Bengal
Model Size: 300M params
Tensor Type: F32
Base Model: google/mt5-small
Type: Seq2Seq
Training Length: 210 Epochs
Training Data: 20k News Articles
Latest Model: v4
Publisher: Hugging Face
License: MIT

Supervisor

Dr. Nafisa Noor [NaNr]
Assistant Professor

Team Members

Md Tashfiqul Islam (161 1593 042)
Tashin Mahmud Khan (201 1819 042)
Amir Hamja Marjan (202 1171 642)
Md Simul Hossain (171 1949 642)

Problem Statement

The digital age has brought an unprecedented surge in Bangla news content, creating a paradox of information abundance and time scarcity.

Content Overload: 1000+ daily articles

Time Constraint: 4-6h reading time/day

Information Discovery: 60% irrelevant content

Content Length: 1500+ words/article

Impact on Readers

Reduced reader engagement

Information overload

Digital divide growth

Project Overview

The Bangla Text Summarizer, developed for CSE499B, fine-tunes the google/mt5-small sequence-to-sequence model to produce concise summaries of Bengali news articles, addressing digital information overload.

User Input: Bangla Long Essay (raw content, 200-5000 words)

Text Summarizer

Model Output: Summarized Content (concise summary, ~150 words)
CSE499B
Senior Design Project II
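
The input-to-output flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the team's implementation: the helper names (`word_count`, `summarize`, `load_mt5_generator`) are hypothetical, the 200-5000 word check simply mirrors the range quoted above, and `google/mt5-small` is used only as a placeholder because the fine-tuned checkpoint name is not given here. The base model on its own would not be expected to produce useful summaries.

```python
from typing import Callable

MIN_WORDS, MAX_WORDS = 200, 5000  # input range quoted in the overview


def word_count(text: str) -> int:
    """Count whitespace-separated tokens; this also works for Bengali script."""
    return len(text.split())


def summarize(article: str, generate: Callable[[str], str]) -> str:
    """Check the article length, then delegate to a text-to-text model."""
    n = word_count(article)
    if not MIN_WORDS <= n <= MAX_WORDS:
        raise ValueError(f"expected {MIN_WORDS}-{MAX_WORDS} words, got {n}")
    return generate(article).strip()


def load_mt5_generator(model_id: str = "google/mt5-small") -> Callable[[str], str]:
    """Build a generate() callable with Hugging Face Transformers.

    model_id is a placeholder (assumption): substitute the fine-tuned
    checkpoint. Token limits and beam width below are illustrative values.
    """
    # Imported lazily so the rest of the sketch runs without Transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    def generate(text: str) -> str:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
        return tokenizer.decode(ids[0], skip_special_tokens=True)

    return generate
```

Passing the model in as a callable keeps the length check testable without downloading any weights; in use, one would call `summarize(article, load_mt5_generator(...))` with the actual checkpoint name.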

Key Features

Bengali Article Library

Extensive collection of diverse Bengali articles

Bengali Article Input

Seamless integration of new Bengali content

Concise Summaries

AI-powered extraction of key information

Responsive Design

Optimal viewing on all devices

Dark Mode Support

Enhanced reading experience in low light

Real-time Processing

Instant summarization and analysis

Revolutionizing Bengali content consumption through cutting-edge AI technology

Project Timeline

Month 1: Web Crawler

Month 2: Data Collection

Month 3-4: Model Training

Month 5: Model Testing

Month 6: Web Interface

Month 7: Model + UI Connection

Month 8: Web Deploy

System Architecture

System Architecture Diagram

Tech Stack

Model Tech Stack

  • Python v3.13.0
  • NumPy v2.1.3
  • Pandas v2.2.3
  • PyTorch v2.5.1
  • spaCy v3.8.0
  • Matplotlib v3.9.2

Web UI Tech Stack

  • React v19.0.0-rc
  • Next.js v15.0.3
  • TypeScript v5.7.2
  • ESLint v9.15.0
  • Tailwind CSS v3.4.1
  • shadcn/ui v2.1.6

ML/DevOps

  • Hugging Face v4.46.3
  • Inference API (latest)
  • Vercel v39.1.2

Training Loss

The table below reports the training loss for each training round, together with the number of epochs, the model version, and the training-set size.

Round  Loss    Epochs  Model  Data Size
1      6.9987  3       v3     10k
2      2.2538  3       v3     10k
3      1.0143  5       v3     10k
4      2.5004  100     v4     20k
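
To make the round-to-round trend easier to read, the rows above (taken as round, loss, epochs, model version, data size) can be run through a short script. This is an illustrative sketch: the function names are hypothetical, and the perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which the table does not state.

```python
import math

# Rows as read from the training-loss table:
# (round, loss, epochs, model version, number of training articles)
ROUNDS = [
    (1, 6.9987, 3, "v3", 10_000),
    (2, 2.2538, 3, "v3", 10_000),
    (3, 1.0143, 5, "v3", 10_000),
    (4, 2.5004, 100, "v4", 20_000),
]


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is per-token cross-entropy in nats."""
    return math.exp(loss)


def loss_changes(rounds):
    """Percent change in loss relative to the previous round (None for the first)."""
    changes = []
    for prev, cur in zip([None] + rounds[:-1], rounds):
        if prev is None:
            changes.append(None)
        else:
            changes.append((cur[1] - prev[1]) / prev[1] * 100.0)
    return changes


if __name__ == "__main__":
    for row, change in zip(ROUNDS, loss_changes(ROUNDS)):
        rnd, loss, epochs, model, size = row
        delta = "n/a" if change is None else f"{change:+.1f}%"
        print(f"round {rnd} ({model}, {size} articles, {epochs} epochs): "
              f"loss={loss:.4f} ppl={perplexity(loss):.1f} change={delta}")
```

Round 4 uses a different model version and twice the data, so its loss is not directly comparable with rounds 1 to 3; the percent change across that boundary should be read with that in mind.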

BLEU & METEOR Scores
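
For readers unfamiliar with the metric, BLEU is conventionally computed as the geometric mean of modified n-gram precisions multiplied by a brevity penalty. The following is a minimal single-reference, unsmoothed sketch of that definition. Whitespace tokenization is an assumption made here for simplicity, and the tooling the team actually used for evaluation is not stated on this page.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * math.exp(log_precision)
```

Because the sketch is unsmoothed, very short summaries that share no 4-gram with the reference score zero; established evaluation libraries offer smoothing options for that case.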

Content Coverage

BERTScore

Light Mode

Light Mode Preview

Dark Mode

Dark Mode Preview

Website QR Code

Scan to Visit

GitHub QR Code

Scan for GitHub