A deep learning architecture for analyzing and predicting customer churn data in e-commerce

Msallam, Mohammed; AR, YILMAZ; DEMİR, SALİH; TUĞRUL, BÜLENT

doi:10.7717/peerj-cs.3800

PeerJ Computer Science, vol. 12, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 12
Publication Date: 2026
DOI: 10.7717/peerj-cs.3800
Journal Name: PeerJ Computer Science
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Directory of Open Access Journals
Keywords: Big data, Churn analysis, SHapley Additive exPlanations (SHAP), Synthetic minority oversampling technique (SMOTE)
Affiliated with Ankara University: Yes

Abstract

E-commerce companies face fierce competition. To succeed, they must attract and retain customers by offering the best services at affordable prices, so taking appropriate and timely action to keep customers who are likely to churn is a top priority. This study analyzes an online e-commerce dataset and uses deep learning to build a model that predicts customer churn. The proposed model was trained and tested on a dataset published on Kaggle and evaluated with several performance metrics. Because of the nature of the dataset, the class distribution is imbalanced. The experimental results show that the proposed architecture achieved its highest accuracy (94.25%) when trained on the imbalanced data. The Synthetic Minority Oversampling Technique (SMOTE) was then used to balance the class label distribution, and similar experiments were repeated on the balanced dataset to observe how the performance metrics changed. Although the SMOTE-based model does not improve overall accuracy, it achieves higher recall, indicating that it identifies a larger share of the customers who actually churn. Finally, we calculated SHapley Additive exPlanations (SHAP) values to assess the model's interpretability and the impact of each feature on the prediction outcome.
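
The abstract does not describe how SMOTE was configured, so the following is only a minimal sketch of the general idea behind the technique: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, the neighbour count `k=2`, and the toy feature values are all assumptions made for illustration and are not taken from the study; in practice a maintained library implementation would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data (not from the study): two numeric features per customer.
churned = [(1.0, 5.0), (1.5, 4.5), (2.0, 5.5)]  # minority class
retained = [(6.0, 1.0), (7.0, 2.0), (6.5, 1.5),
            (8.0, 1.0), (7.5, 2.5), (9.0, 2.0)]  # majority class

# Add enough synthetic churners to match the size of the majority class.
new_points = smote_oversample(churned, n_new=len(retained) - len(churned))
balanced_churned = churned + new_points
print(len(balanced_churned), len(retained))
```

Because every synthetic point is an interpolation of two real minority samples, the balanced set gives the classifier more churn examples to learn from without duplicating rows exactly, which is consistent with the recall gain reported above.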