Turkish corpus on health sciences (Sağlık Bilimleri Türkçe Derlemi)


Demir M. Ç., Sulubulut M. K., ARAL A.

11th Turkish National Software Engineering Symposium, UYMS 2017, Alanya, Türkiye, 18-20 October 2017, vol. 1980, pp. 304-311

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 1980
  • City of Publication: Alanya
  • Country of Publication: Türkiye
  • Pages: pp. 304-311
  • Keywords: Health sciences Turkish corpus, Turkish corpus, Turkish language processing
  • Affiliated with Ankara Üniversitesi: Yes

Abstract

Recent developments in data mining and machine learning have made data increasingly important, and software now relies on data collected from many different sources. Building on work in linguistics, software is now capable of processing natural language, and corpus-based methods are one approach to natural language processing. This work introduces a Turkish corpus intended for use in medical research. The corpus provides the lemma, part-of-speech tag, and morphological analysis of each word. It contains only publicly available academic journals published in the health sciences. The coverage of the corpus is measured against the national health sciences database. The corpus is available to all researchers, provided that this report is cited.