A comparative analysis based on feature selection methods and machine learning algorithms for evaluating heart disease prediction performance



Pervin Ö. A., Pehlivan N. Y., WEBER G.

Journal of the Faculty of Engineering and Architecture of Gazi University, vol.41, no.1, pp.253-269, 2026 (SCI-Expanded, Scopus, TRDizin)

Abstract

Heart disease has become one of the leading causes of death worldwide in recent years. In this study, machine learning (ML) algorithms, namely KNN, SVM, DT, RF, and LR, were used to predict heart disease status. The classification performance of ML algorithms can be adversely affected by class imbalance and by a large number of features in the dataset. Therefore, SMOTE was applied to balance the dataset, and the feature selection methods LASSO, ElasticNet, and LARS were used to identify relevant features. Classification performance was evaluated using accuracy, precision, recall, F1-score, MCC, n-MCC, and ROC-AUC. Comparative analyses were conducted on a real-world dataset with and without SMOTE and the feature selection methods. According to the results, the highest accuracy (0.90), precision (0.89), and recall (0.90) were obtained by the RF and LARS+KNN models with SMOTE. The highest F1-score (0.90) was achieved by the RF model with SMOTE. The highest n-MCC (0.94) and ROC-AUC (0.98) were obtained by the LARS+KNN model with SMOTE. The performance of all ML algorithms considered in the study increased significantly when SMOTE was used to address class imbalance and feature selection methods were used to eliminate irrelevant features.
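
As a reading aid for the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score, MCC, and n-MCC can be computed from a binary confusion matrix. It is not the authors' code. The function name `classification_metrics` and the example counts are hypothetical, and n-MCC is assumed here to be the commonly used normalization (MCC + 1) / 2, which maps MCC from [-1, 1] to [0, 1]; the study itself may define it differently.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives, true negatives.
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Matthews correlation coefficient, in the range [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Assumption: normalized MCC rescales MCC to the range [0, 1].
    n_mcc = (mcc + 1) / 2

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "n_mcc": n_mcc,
    }


# Hypothetical counts for illustration only; they are not taken from the study.
example = classification_metrics(tp=45, fp=5, fn=5, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Because n-MCC under this assumption is a rescaled MCC, an n-MCC value is not directly comparable to accuracy or F1-score, even though all of them lie between 0 and 1.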