Morpheme-Aware Hybrid Sentiment Analysis for Azerbaijani: A Lexicon–Embedding Fusion with Domain Adaptation

Aykac, Yusuf; ÖZKAN, MERVE; SAMET, REFİK

doi:10.1109/access.2026.3659042

Aykac Y. E., ÖZKAN M., SAMET R.

IEEE Access, vol. 14, pp. 18828-18856, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14
Publication Date: 2026
DOI: 10.1109/access.2026.3659042
Journal Name: IEEE Access
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: 18828-18856
Keywords: Azerbaijani, domain adaptation, low-resource languages, morphology-aware classification, neutral-class calibration, parameter-efficient fine-tuning, polarity lexicon, sentiment analysis
Affiliated with Ankara University: Yes

Abstract

Sentiment analysis for morphologically rich, low-resource languages remains challenging, particularly in three-class settings (negative/neutral/positive) and under domain shift. We study Azerbaijani, an agglutinative Turkic language where limited labeled resources and morpheme-level polarity shifts hinder robust modeling across heterogeneous domains. We make three contributions: (1) we construct a manually annotated multi-domain corpus of 124,251 user comments collected from diverse platforms and labeled by native speakers with sentiment and domain tags; (2) we propose MahSA (Morpheme-Aware Hybrid Sentiment Analysis), a neuro-symbolic hybrid model that integrates SentiAzNet lexicon signals with morphology-aware cues, character n-grams, and transformer representations; and (3) through extensive experiments across five domains, we show that MahSA achieves 0.725 macro-F1, improving over XLM-RoBERTa fine-tuning (0.688) and attaining approximate parity with QLoRA-adapted LLM baselines such as Qwen 2.5 (0.718) and Llama 3.1 (0.709), while relying on a substantially smaller backbone at inference than LLM-based alternatives. The largest gains are observed in the smallest domain (Public Services) and on the neutral class, highlighting the benefits of lexicon- and morphology-aware modeling in low-resource, multi-domain sentiment analysis.
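
The abstract reports its results as macro-F1 over three classes. As a reading aid, the sketch below shows how that metric is conventionally computed: the unweighted mean of the per-class F1 scores, so the neutral class counts as much as the other two regardless of its frequency. This is the standard definition, not the authors' evaluation code; the function name `macro_f1`, the label strings, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Assumed label names for the three-class setting described in the abstract.
LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy labels, not data from the paper.
gold = ["negative", "neutral", "positive", "neutral"]
pred = ["negative", "positive", "positive", "neutral"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a model that improves only on a rare or hard class (such as neutral) can still move macro-F1 noticeably, which is why the metric suits imbalanced three-class comparisons like the one reported here.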