Morpheme-Aware Hybrid Sentiment Analysis for Azerbaijani: A Lexicon–Embedding Fusion with Domain Adaptation


Aykac Y. E., Özkan M., Samet R.

IEEE Access, vol.14, pp.18828-18856, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 14
  • Publication Date: 2026
  • DOI Number: 10.1109/access.2026.3659042
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.18828-18856
  • Keywords: Azerbaijani, domain adaptation, low-resource languages, morphology-aware classification, neutral-class calibration, parameter-efficient fine-tuning, polarity lexicon, sentiment analysis
  • Ankara University Affiliated: Yes

Abstract

Sentiment analysis for morphologically rich, low-resource languages remains challenging, particularly in three-class settings (negative/neutral/positive) and under domain shift. We study Azerbaijani, an agglutinative Turkic language where limited labeled resources and morpheme-level polarity shifts hinder robust modeling across heterogeneous domains. We make three contributions: (1) we construct a manually annotated multi-domain corpus of 124,251 user comments collected from diverse platforms and labeled by native speakers with sentiment and domain tags; (2) we propose MahSA (Morpheme-Aware Hybrid Sentiment Analysis), a neuro-symbolic hybrid model that integrates SentiAzNet lexicon signals with morphology-aware cues, character n-grams, and transformer representations; and (3) through extensive experiments across five domains, we show that MahSA achieves 0.725 macro-F1, improving over XLM-RoBERTa fine-tuning (0.688) and attaining approximate parity with QLoRA-adapted LLM baselines such as Qwen 2.5 (0.718) and Llama 3.1 (0.709), while relying on a substantially smaller backbone at inference than LLM-based alternatives. The largest gains are observed in the smallest domain (Public Services) and on the neutral class, highlighting the benefits of lexicon- and morphology-aware modeling in low-resource, multi-domain sentiment analysis.
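The abstract does not specify how MahSA combines its lexicon, morphology, character n-gram and transformer signals. The following is a minimal sketch that assumes simple feature-level concatenation followed by a linear softmax classifier over the three classes. The toy lexicon, the negation-suffix list, the hashing dimension and every function name (`lexicon_features`, `char_ngram_features`, `fuse`, `predict`) are hypothetical illustrations. They are not SentiAzNet and not the authors' code. The sentence embedding is a placeholder for a pooled transformer output.

```python
import math
import zlib

# HYPOTHETICAL toy polarity lexicon (stem -> score); a stand-in for SentiAzNet,
# whose real format and scores are not described in the abstract.
TOY_LEXICON = {"yaxşı": 1.0, "sev": 1.0, "pis": -1.0, "bəyən": 1.0}

# Crude morpheme-level cue: verbal negation suffixes directly after a stem
# ("sevirəm" 'I love' vs. "sevmirəm" 'I do not love'). A real morphological
# analyser would be needed to avoid false hits such as necessitative "-məli".
NEG_SUFFIXES = ("mır", "mir", "mur", "mür", "ma", "mə")
LABELS = ("negative", "neutral", "positive")


def lexicon_features(tokens):
    """Return [positive mass, negative mass, polarity flips] for lowercased tokens."""
    pos = neg = flips = 0.0
    for i, tok in enumerate(tokens):
        for stem, score in TOY_LEXICON.items():
            if not tok.startswith(stem):
                continue
            negated = tok[len(stem):].startswith(NEG_SUFFIXES)
            # Copular negation: "yaxşı deyil" = 'not good'.
            if i + 1 < len(tokens) and tokens[i + 1] == "deyil":
                negated = not negated
            if negated:
                score, flips = -score, flips + 1
            if score > 0:
                pos += score
            else:
                neg -= score
            break  # at most one lexicon stem per token
    return [pos, neg, flips]


def char_ngram_features(text, n=3, dim=16):
    """Hashed character n-gram counts (hashing trick), L2-normalised."""
    vec = [0.0] * dim
    padded = f"#{text}#"
    for j in range(len(padded) - n + 1):
        gram = padded[j:j + n].encode("utf-8")
        vec[zlib.crc32(gram) % dim] += 1.0  # crc32 is stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(tokens, sentence_embedding, dim=16):
    """Concatenate symbolic features with a (placeholder) transformer embedding."""
    return (lexicon_features(tokens)
            + char_ngram_features(" ".join(tokens), dim=dim)
            + list(sentence_embedding))


def predict(features, weights, biases):
    """Linear layer + softmax over three classes; weights are assumed to be learned."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs


if __name__ == "__main__":
    # "filmi sevmədim" = 'I did not like the film'; embedding values are made up.
    feats = fuse(["filmi", "sevmədim"], sentence_embedding=[0.1, -0.2])
    print(feats[:3])  # lexicon part: negation turned "sev" into negative mass
```

Concatenation is only the simplest fusion choice. Gated or attention-based fusion would fit the same interface. Exposing an explicit flip count is one way a classifier could see morpheme-level polarity shifts that subword embeddings may blur.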