11th Turkish National Software Engineering Symposium, UYMS 2017, Alanya, Türkiye, 18-20 October 2017, vol. 1980, pp. 304-311
Recent developments in data mining and machine learning have made data increasingly important, and modern software relies on data collected from many different sources. Building on research in linguistics, software is now capable of processing natural language. Corpus-based methods are one family of approaches to natural language processing. This work introduces a Turkish corpus intended for use in medical research. For each word, the corpus provides its lemma, part-of-speech tag, and morphological analysis. The corpus contains only publicly available academic journals published in the health sciences. Its coverage is measured against the national health sciences database. The corpus is available to all researchers, provided that this report is cited.