Research Sharing System

Self-explaining emotion classification through preference-aligned large language models

Author : Muhammad Hammad Fahim Siddiqui, Diana Inkpen, and Alexander Gelbukh

Affiliation : University of Ottawa

Country : Mexico

Category : NLP

Volume, Issue, Month, Year : 15, 10, May, 2025

Abstract :

Recent advancements in large language models (LLMs) have shown promise for NLP applications, yet producing accurate explanations remains a challenge. In this work, we introduce a self-explaining model for classifying emotions in X posts and construct a novel preference dataset using chain-of-thought prompting in GPT-4o. Using this dataset, we guide GPT-4o with preference alignment via Direct Preference Optimization (DPO). Beyond GPT-4o, we adapt smaller models such as LLaMA 3 (8B) and DeepSeek (32B distilled) through preference tuning using Odds Ratio Preference Optimization (ORPO), significantly boosting their classification accuracy and explanation quality. Our approach achieves state-of-the-art performance (68.85%) on the SemEval 2018 E-c multilabel emotion classification benchmark, exhibits comparable results on the DAIR AI multiclass dataset, and attains a high sufficiency score, indicating that the generated explanations are effective on their own. These findings highlight the impact of preference alignment for improving interpretability and enhancing classification.
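
As an illustration of the preference-alignment objective named in the abstract, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the authors' training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example, and the inputs stand for summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being tuned and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All names and the default beta are illustrative assumptions,
    not values taken from the paper.
    """
    # How much more likely each response is under the policy
    # than under the frozen reference model (log-ratio).
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two log-ratios.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities: the policy favors the chosen
# response more than the reference does, so the loss is below log(2).
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(loss < math.log(2))
```

When the policy and the reference assign identical log-probabilities, the loss equals log(2); it falls as the policy shifts probability toward the preferred response relative to the reference.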

Keyword : LLMs, preference alignment, emotion classification

Journal/ Proceedings Name : CS & IT

URL : https://aircconline.com/csit/abstract/v15n10/csit151027.html

User Name : alex
Posted 20-05-2026 on 14:25:13 AEDT
