Comparison of Tree-Based Machine Learning Algorithms for Classification of Livestock Breeds Based On Post-Thaw Spermatological Parameters


Özen D., Ozen H., Gul E. B., Olğaç K. T., Tekin K., Tırpan M. B., ...

VETERINARY MEDICINE AND SCIENCE, vol.11, no.5, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 11 Issue: 5
  • Publication Date: 2025
  • DOI: 10.1002/vms3.70539
  • Journal Name: VETERINARY MEDICINE AND SCIENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Affiliated with Ankara University: Yes

Abstract

Reproductive efficiency is a crucial determinant of livestock productivity, with sperm quality being a key factor in successful fertilization. The quantitative assessment of spermatozoa using computer-assisted sperm analysis (CASA) yields valuable kinetic variables that can vary across cattle breeds. This study aimed (i) to classify post-thaw semen samples from Holstein, Simmental and Charolais bulls based on eight CASA-derived variables: progressive motility (PM), non-PM, curvilinear velocity (VCL), straight-line velocity (VSL), beat-cross frequency (BCF), amplitude of lateral head displacement (ALH), hyperactivity and average path velocity (VAP); (ii) to benchmark three tree-based classifiers, C5.0, random forest (RF) and stochastic gradient boosting (SGB), for their ability to assign ejaculates to the correct breed; and (iii) to identify the most informative predictors for breed discrimination within the algorithms. The predictive performance of the three algorithms was compared after the original dataset was randomly divided into training and testing sets at ratios of 70%-30%, 75%-25% and 80%-20%. Parameter tuning was carried out using 10-fold cross-validation repeated ten times. SGB achieved the highest classification performance, with a mean balanced accuracy of 85.7% (86.4% for Holstein, 84.3% for Simmental and 86.5% for Charolais), followed by RF (83.5%) and C5.0 (73.5%). PM, hyperactivity and VSL were the most informative predictors. The results offer insights into breed-specific sperm characteristics, with potential implications for the development of breed-specific CASA calibrations and for more efficient resource allocation in livestock production.
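
The abstract reports a balanced accuracy for each breed and their mean, but does not state how the metric was computed. The sketch below shows one common definition, assumed here rather than taken from the paper: for each class, the average of one-vs-rest sensitivity and specificity read from a confusion matrix. The function name `per_class_balanced_accuracy` and the confusion-matrix counts are hypothetical and serve only as an illustration; they are not data from the study. The last lines check that the three per-breed figures quoted in the abstract for SGB average to the stated mean.

```python
from statistics import mean


def per_class_balanced_accuracy(confusion, labels):
    """Return {label: (sensitivity + specificity) / 2}, computed one-vs-rest.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # true class k, predicted k
        fn = sum(confusion[k]) - tp                   # true class k, predicted other
        fp = sum(row[k] for row in confusion) - tp    # other class, predicted k
        tn = total - tp - fn - fp                     # everything else
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        scores[label] = (sensitivity + specificity) / 2
    return scores


# Hypothetical counts for illustration only; NOT data from the study.
labels = ["Holstein", "Simmental", "Charolais"]
confusion = [
    [8, 1, 1],  # true Holstein
    [1, 8, 1],  # true Simmental
    [0, 2, 8],  # true Charolais
]

scores = per_class_balanced_accuracy(confusion, labels)
mean_balanced_accuracy = mean(scores.values())

# Per-breed SGB figures quoted in the abstract, averaged and rounded to one decimal.
reported_sgb = {"Holstein": 86.4, "Simmental": 84.3, "Charolais": 86.5}
reported_mean = round(mean(reported_sgb.values()), 1)

print(scores)
print(mean_balanced_accuracy, reported_mean)
```

Under this definition a classifier is rewarded both for recovering a breed's own samples and for not assigning other breeds' samples to it, which keeps the score informative when the classes are of unequal size.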