Enhanced sentence representation for extractive text summarization: Investigating the syntactic and semantic features and their contribution to sentence scoring


MUTLU BİLGE B., SEZER E.

Expert Systems with Applications, vol.227, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 227
  • Publication Date: 2023
  • DOI: 10.1016/j.eswa.2023.120302
  • Journal Name: Expert Systems with Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
  • Keywords: Enhanced sentence representation, Extractive text summarization, Sentence scoring, Summarization corpora, Syntactic and semantic features
  • Ankara University Affiliated: Yes

Abstract

The primary challenge in extractive text summarization is sentence scoring, and the critical factor in scoring is how sentences are represented. This study investigates this hypothesis and analyzes in detail the impact of the syntactic and semantic sentence representation techniques in use. The study first evaluated the empirical impact of individual syntactic and semantic features on summarization accuracy. To examine syntactic usage, a comprehensive list of 40 syntactic features was developed, while semantic representation was accomplished using sentence embeddings. Subsequently, an improved feature set was proposed that jointly utilizes syntactic and semantic features. To assess the impact of this feature set on the resulting summaries, the proposed sentence representation was tested on three distinct summarization corpora consisting of lengthy scientific documents across diverse domains. The quality of the resulting summaries was assessed using both summary evaluation metrics and classification performance metrics. The experiments indicated that summaries generated with the proposed feature set outperformed not only those obtained using individual features but also summaries produced by state-of-the-art methods.
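
As a rough illustration of the joint representation described in the abstract, the sketch below concatenates a few surface-level syntactic features with a toy stand-in for a sentence embedding, then ranks sentences by a weighted sum of the combined vector. Everything named here is an assumption made for illustration: the three features, the hashed bag-of-words `toy_embedding`, the hand-set `weights`, and all function names. The paper's 40 syntactic features, its embedding model, and its scoring model are not specified in this record.

```python
import re
import zlib


def syntactic_features(sentence, index, total):
    """Three illustrative surface features (hypothetical, not the paper's list)."""
    tokens = sentence.split()
    n = max(len(tokens), 1)
    position = 1.0 - index / max(total - 1, 1)   # earlier sentences score higher
    length = min(len(tokens) / 30.0, 1.0)        # token count, capped at 30
    numeric = sum(any(c.isdigit() for c in t) for t in tokens) / n  # share of tokens with digits
    return [position, length, numeric]


def toy_embedding(sentence, dim=8):
    """Hashed bag-of-words vector, a stand-in for a real sentence embedding."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def represent(sentence, index, total, dim=8):
    """Joint representation: syntactic features concatenated with the embedding."""
    return syntactic_features(sentence, index, total) + toy_embedding(sentence, dim)


def summarize(sentences, weights, k=2, dim=8):
    """Score each sentence as a weighted sum of its features and keep the top k."""
    total = len(sentences)
    scored = []
    for i, s in enumerate(sentences):
        rep = represent(s, i, total, dim)
        scored.append((sum(w * x for w, x in zip(weights, rep)), i))
    top = sorted(scored, reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Calling `summarize(sentences, weights, k)` with a weight vector of length `3 + dim` returns the `k` highest-scoring sentences in document order. In practice the weights would be learned from a labeled summarization corpus rather than set by hand, and the toy embedding would be replaced by a trained sentence encoder.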