A cluster tree based model selection approach for logistic regression classifier

Tanju, ÖZGE; KALAYLIOĞLU AKYILDIZ, ZEYNEP

doi:10.1080/00949655.2018.1437442

Tanju Ö., KALAYLIOĞLU AKYILDIZ Z. I.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, vol.88, no.7, pp.1394-1414, 2018 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 88 Issue: 7
Publication Date: 2018
DOI Number: 10.1080/00949655.2018.1437442
Journal Name: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Page Numbers: pp.1394-1414
Keywords: Model selection, logistic regression, classification, clustering similarity measures, INFORMATION CRITERION, NONLINEAR-REGRESSION, ALGORITHMS
Affiliated with Ankara University: Yes

Abstract

Model selection methods are important for identifying the best approximating model. To identify the most meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach designed for logistic regression where the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also question and evaluate the performance of conventional model selection methods based on information-theoretic concepts in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite-sample performance of the cluster tree based and the information-theoretic model selection methods. The simulations account for whether or not the true model is in the candidate set. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.
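To make the two families of criteria named in the abstract more concrete, the following is a minimal sketch, not the authors' method. The abstract does not specify the cluster tree distance, so the Rand index is swapped in here as one standard clustering similarity measure, and AIC and BIC stand in for the information-theoretic criteria. All function names are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


def rand_index(labels_a, labels_b):
    """Share of observation pairs on which two partitions agree.

    A pair agrees when both partitions put the two observations in the
    same group, or both put them in different groups. This is an assumed
    stand-in, not the tree distance used in the paper.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

Under these assumptions, a candidate logistic regression model could be scored either by `aic` or `bic` computed from its fitted probabilities, or by how closely the partition induced by its predicted classes matches the partition of observed classes under `rand_index`. Lower values are preferred for the information criteria, higher values for the Rand index.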