Turkish corpus on health sciences (Sağlık Bilimleri Türkçe Derlemi)

Demir M. Ç., Sulubulut M. K., Aral A.

11th Turkish National Software Engineering Symposium, UYMS 2017, Alanya, Türkiye, 18-20 October 2017, vol. 1980, pp. 304-311 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume: 1980
City of Publication: Alanya
Country of Publication: Türkiye
Pages: pp. 304-311
Keywords: Health sciences Turkish corpus, Turkish corpus, Turkish language processing
Ankara University Affiliation: Yes

Abstract

As a result of recent developments in data mining and machine learning, data is becoming more important every day, and software increasingly relies on data collected from different sources. Thanks to research in linguistics, software can now process natural language. Corpus-based methods are one family of approaches to natural language processing. In this work, we introduce a Turkish corpus intended for use in medical research. For each word, the corpus provides its lemma, part-of-speech tag, and morphological analysis. The corpus is built only from publicly available academic journals in the health sciences, and its coverage is measured against the national health sciences database. The corpus is available to all researchers, provided that this report is cited.
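The abstract states that each word in the corpus carries a lemma, a part-of-speech tag, and a morphological analysis, and that coverage is measured against a national database. The following is a minimal sketch of both ideas under explicit assumptions. The `Token` record layout, the Oflazer-style analysis string, and the `coverage` definition (the share of database journals that also appear in the corpus) are all hypothetical. The abstract does not specify the corpus file format, tag set, or coverage metric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical layout of one annotated word in the corpus."""
    surface: str     # word form as it appears in the article text
    lemma: str       # dictionary form of the word
    pos: str         # part-of-speech tag (tag set assumed for illustration)
    morphology: str  # full morphological analysis (notation assumed)


def coverage(corpus_ids: set[str], database_ids: set[str]) -> float:
    """Share of database entries (e.g. journals) that also occur in the corpus.

    Assumed definition for illustration: |corpus & database| / |database|.
    Returns 0.0 when the database set is empty to avoid division by zero.
    """
    if not database_ids:
        return 0.0
    return len(corpus_ids & database_ids) / len(database_ids)


# Illustrative token: "hastalar" ("patients"), analysed in Oflazer-style notation.
token = Token(
    surface="hastalar",
    lemma="hasta",
    pos="Noun",
    morphology="hasta+Noun+A3pl+Pnon+Nom",
)

# Hypothetical journal identifiers: three of the four database journals
# are present in the corpus, so coverage is 3 / 4.
print(coverage({"J1", "J2", "J3"}, {"J1", "J2", "J3", "J4"}))  # 0.75
```

Under this definition, coverage is a simple set-intersection ratio. A token-level or vocabulary-level coverage measure would need a different denominator.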