IEEE Access, vol. 14, pp. 18828-18856, 2026 (SCI-Expanded, Scopus)
Sentiment analysis for morphologically rich, low-resource languages remains challenging, particularly in three-class settings (negative/neutral/positive) and under domain shift. We study Azerbaijani, an agglutinative Turkic language where limited labeled resources and morpheme-level polarity shifts hinder robust modeling across heterogeneous domains. We make three contributions: (1) we construct a manually annotated multi-domain corpus of 124,251 user comments collected from diverse platforms and labeled by native speakers with sentiment and domain tags; (2) we propose MahSA (Morpheme-Aware Hybrid Sentiment Analysis), a neuro-symbolic hybrid model that integrates SentiAzNet lexicon signals with morphology-aware cues, character n-grams, and transformer representations; and (3) through extensive experiments across five domains, we show that MahSA achieves 0.725 macro-F1, improving over XLM-RoBERTa fine-tuning (0.688) and attaining approximate parity with QLoRA-adapted LLM baselines such as Qwen 2.5 (0.718) and Llama 3.1 (0.709), while relying on a substantially smaller backbone at inference than LLM-based alternatives. The largest gains are observed in the smallest domain (Public Services) and on the neutral class, highlighting the benefits of lexicon- and morphology-aware modeling in low-resource, multi-domain sentiment analysis.
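The abstract describes MahSA as fusing lexicon signals, morphology-aware cues, character n-grams, and transformer representations. A minimal sketch of such late feature fusion is given below; the toy lexicon, the fixed-width n-gram summary, and the stand-in transformer vector are all illustrative assumptions, not the paper's actual SentiAzNet resource or architecture.

```python
# Hypothetical sketch of hybrid feature fusion for sentiment analysis.
# The lexicon entries and the transformer vector are toy stand-ins.
from collections import Counter

TOY_LEXICON = {"yaxşı": 1.0, "pis": -1.0, "əla": 1.0}  # assumed polarity entries

def lexicon_score(tokens):
    """Mean polarity of tokens found in the lexicon (0.0 if none match)."""
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def char_ngrams(text, n=3):
    """Character n-gram counts: a cheap signal that is robust to
    agglutinative morphology, since it ignores word boundaries."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def fuse_features(tokens, text, transformer_vec):
    """Concatenate symbolic (lexicon, n-gram) and neural features
    into one vector for a downstream classifier."""
    counts = [c for _, c in char_ngrams(text).most_common(5)]
    counts += [0] * (5 - len(counts))  # pad to a fixed width of 5
    return [lexicon_score(tokens)] + counts + list(transformer_vec)

# Example: "film əla idi" ("the film was great"); the 2-dim vector
# stands in for a sentence embedding from a transformer encoder.
feats = fuse_features(["film", "əla", "idi"], "film əla idi", [0.1, -0.2])
```

The fused vector would then feed a small classifier head, which is one common way to combine symbolic and neural signals in low-resource settings.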