VI. International Applied Statistics Congress (UYIK – 2025), Ankara, Türkiye, 14-16 May 2025, no. 1252, p. 174 (Abstract Paper)
In this study, we analyzed 2005 credit risk data on cardholders of a major bank in Taiwan. The dataset covers 30,000 card clients and includes 24 characteristics, among them payment information, demographic factors, credit information, and bill statements. The aim of the study is to build accurate predictive models by analyzing the variables that determine whether a cardholder is classified as credible or non-credible based on their risk factors. To this end, we used supervised machine learning algorithms, including Logistic Regression, Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, and Naive Bayes. After pre-processing the data, we first applied feature selection methods to address problems such as multicollinearity. We then randomly divided the data into two parts, one for training the models and the other for testing their predictive performance, and fitted a model with each algorithm. For each classification algorithm, we used the Grid Search technique to identify the hyperparameter values that give the best accuracy, and we used the K-fold Cross Validation procedure to evaluate modeling performance. Finally, we compared the models in terms of classification accuracy measures.
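As a rough illustration of the evaluation procedure described above (a random train/test split, a grid search over hyperparameter values scored by K-fold cross-validation, and a final accuracy check on the held-out part), the following is a minimal sketch and not the code used in the study. It assumes synthetic two-feature data in place of the card-client dataset, a hand-written nearest-neighbors classifier, an 80/20 split, 5 folds, and a hypothetical grid of neighbor counts; all function names are illustrative.

```python
import random
from collections import Counter


def make_synthetic_clients(n, seed=0):
    """Hypothetical stand-in for the card-client data: two numeric features, binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)  # assumed coding: 1 = non-credible, 0 = credible
        center = 1.5 if label else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Share of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def k_fold_accuracy(data, k_neighbors, n_folds=5):
    """Mean validation accuracy over n_folds folds; each fold is held out once."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, validation in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(training, validation, k_neighbors))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=5):
    """Score every candidate value by cross-validated accuracy and return the best one."""
    cv_scores = {k: k_fold_accuracy(data, k, n_folds) for k in grid}
    best_k = max(cv_scores, key=cv_scores.get)
    return best_k, cv_scores


# Random split into a training part (80%) and a testing part (20%).
data = make_synthetic_clients(300, seed=42)
random.Random(0).shuffle(data)
split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]

# Hyperparameter tuning uses the training part only; odd values avoid tied votes.
best_k, cv_scores = grid_search(train_set, grid=[1, 3, 5, 7, 9])

# The held-out part is touched once, with the selected hyperparameter value.
test_accuracy = accuracy(train_set, test_set, best_k)
print(f"best k = {best_k}, CV accuracy = {cv_scores[best_k]:.3f}, "
      f"test accuracy = {test_accuracy:.3f}")
```

Tuning on the training part alone keeps the test accuracy an honest estimate of performance on unseen clients. In practice, scikit-learn's `train_test_split`, `KFold`, and `GridSearchCV` cover the same steps for all of the algorithms listed above.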