
Leveraging Generative Artificial Intelligence for Enhanced Data Augmentation in Emotion Intensity Classification: A Comprehensive Framework for Cross-Dataset Transfer Learning

  • University of Sheffield

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Data scarcity and stylistic heterogeneity pose major challenges for emotion intensity classification. This paper presents a cross-dataset augmentation framework that leverages prompt-conditioned generative models alongside deterministic and heuristic transformations to synthesize target-style examples for improved transfer learning. We introduce a unified taxonomy of augmentation strategies—Heuristic Lexical Perturbation (HLA), Prompt-Conditioned Generative Augmentation (CGA), Sequential Hybrid Pipeline (SHA), Rule-Guided Style Adaptation (DSGA), and Enhanced Hybrid Augmentation (EHA)—and detail an interpretability-oriented prompt engineering approach that conditions LLMs on authentic target exemplars and stylistic features extracted from the target dataset.

Augmented datasets were evaluated using multi-dimensional quality metrics (transformation quality, stylistic consistency, BLEU/chrF, Self-BLEU, uniqueness) and downstream classification via two-phase BERT-LSTM training with rigorous statistical testing. With source-dataset pretraining followed by target-dataset fine-tuning, CGA achieved the highest single-method gains in F1 and accuracy (F1 = 0.8816; accuracy = 0.8819, 95% CI recalculated). HLA and SHA exhibited improved cross-domain stability, suggesting stronger domain-generalizable features. We observe systematic trade-offs between fluency, lexical diversity, and emotion fidelity: high surface similarity often correlates with classifier performance but does not fully capture affective authenticity.
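
To make two of the diversity metrics named above more concrete, the following is a minimal sketch of a uniqueness ratio and a simplified Self-BLEU. It is not the chapter's evaluation script: the function names, whitespace tokenisation, the choice of unigrams and bigrams only, and the omission of the brevity penalty and smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def uniqueness(texts):
    """Fraction of distinct texts after lowercasing and whitespace normalisation."""
    if not texts:
        return 0.0
    normalised = [" ".join(t.lower().split()) for t in texts]
    return len(set(normalised)) / len(normalised)


def clipped_precision(hyp, refs, n):
    """BLEU-style modified n-gram precision of hyp against several references."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    overlap = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return overlap / sum(hyp_counts.values())


def self_bleu(texts, max_n=2):
    """Simplified Self-BLEU (assumption: no brevity penalty, no smoothing).

    Each text is scored against all other texts as references; the mean score
    is returned. Higher values indicate a less diverse set.
    """
    tokenised = [t.lower().split() for t in texts]
    if len(tokenised) < 2:
        return 0.0
    scores = []
    for i, hyp in enumerate(tokenised):
        refs = tokenised[:i] + tokenised[i + 1:]
        precisions = [clipped_precision(hyp, refs, n) for n in range(1, max_n + 1)]
        if min(precisions) == 0.0:
            scores.append(0.0)  # geometric mean is zero if any precision is zero
        else:
            scores.append(math.exp(sum(math.log(p) for p in precisions) / max_n))
    return sum(scores) / len(scores)
```

Under these assumptions, a set of augmented examples with high uniqueness but also high Self-BLEU would contain few exact duplicates yet many near-paraphrases, which is one way the fluency and diversity trade-off described above could show up in practice.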

We discuss methodological pitfalls, propose best practices for emotion-aware augmentation, and provide reproducible artifacts (prompts, example transformations, evaluation scripts) to facilitate further research in affective NLP.
Original language: English
Title of host publication: Generative AI in Intelligent Systems and Applications: Unleashing the Potential
Publisher: Springer Verlag
Publication status: Published - 2026

Keywords

  • Cross-domain Transfer Learning
  • Emotion-aware NLP
  • LLM-based Data Augmentation
  • Text Generation Evaluation
