Recent advancements in large language models (LLMs) have shown promise for NLP applications, yet producing accurate explanations remains a challenge. In this work, we introduce a self-explaining
model for classifying emotions in X posts and construct a novel preference dataset using chain-of-thought
prompting in GPT-4o. Using this dataset, we guide GPT-4o with preference alignment via Direct
Preference Optimization (DPO). Beyond GPT-4o, we adapt smaller models such as LLaMA 3 (8B) and
DeepSeek (32B distilled) through preference tuning using Odds Ratio Preference Optimization (ORPO),
significantly boosting their classification accuracy and explanation quality. Our approach achieves state-of-the-art performance (68.85%) on the SemEval 2018 E-c multilabel emotion classification benchmark, exhibits comparable results on the DAIR AI multiclass dataset, and attains a high sufficiency score, indicating
the standalone effectiveness of the generated explanations. These findings highlight the value of preference
alignment for improving interpretability and enhancing classification.