Hybrid model approach in data mining


BAKIRARAR B., Cosgun E., ELHAN A. H.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2023
  • DOI Number: 10.1080/03610918.2023.2168012
  • Journal Name: COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Business Source Elite, Business Source Premier, CAB Abstracts, Compendex, Computer & Applied Sciences, Veterinary Science Database, zbMATH, Civil Engineering Abstracts
  • Keywords: Data mining, Hybrid models, Machine learning, Performance metrics, Supervised learning algorithms
  • Ankara University Affiliation: Yes

Abstract

Studies on the hybrid data mining approach have been increasing in recent years. Hybrid data mining is defined as an effective combination of various data mining techniques, so that the strengths of each technique are exploited and the techniques compensate for one another's weaknesses. The purpose of this study is to present state-of-the-art data mining algorithms and applications and to propose a new hybrid data mining approach for classifying medical data. The study also aimed to calculate the performance metrics of data mining methods and to compare these metrics with those obtained from the hybrid model. The study used simulated datasets generated under various scenarios and a hepatitis dataset obtained from the UCI database. Supervised learning algorithms were applied, and hybrid models were created by combining these algorithms. In the simulated datasets, MCC (Matthews correlation coefficient) values increased with larger sample sizes and with higher correlation between the independent variables. In addition, in imbalanced datasets, the performance metrics of the group with the smaller sample size increased noticeably as the correlation between the independent variables increased. A similar pattern was observed with the real datasets.
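The abstract reports MCC values and describes hybrid models built by combining supervised learning algorithms. The sketch below illustrates both ideas under stated assumptions: it computes MCC from the binary confusion matrix and, as one assumed way of combining classifiers, merges base-model predictions by per-sample majority vote. The abstract does not say how the authors' hybrid models combine algorithms, so the voting rule, the function names `mcc` and `majority_vote`, and the toy label vectors are all hypothetical illustrations, not the study's method or data.

```python
import math
from collections import Counter


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0 when a marginal total is zero,
    # because MCC is undefined in that case.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def majority_vote(*prediction_lists):
    """Hypothetical hybrid rule: per-sample majority vote of base models.

    With an odd number of binary classifiers there are no ties; otherwise
    Counter.most_common returns the label encountered first among ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


# Toy, hand-constructed labels (not data from the study): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
clf_a = [1, 1, 0, 0, 0, 1, 0, 0]  # each base model makes one FN and one FP
clf_b = [1, 0, 1, 0, 1, 0, 0, 0]
clf_c = [0, 1, 1, 0, 0, 0, 1, 0]

hybrid = majority_vote(clf_a, clf_b, clf_c)
for name, pred in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("hybrid", hybrid)]:
    print(f"{name}: MCC = {mcc(y_true, pred):.3f}")
```

In this constructed example each base model scores an MCC of 7/15 (about 0.467), while the vote cancels their non-overlapping errors and scores 1.0. This only shows the mechanics of combining predictions and scoring them with MCC; real classifiers often make correlated errors, in which case voting gains much less.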