VETERINARY MEDICINE AND SCIENCE, Volume 11, Issue 5, 2025 (SCI-Expanded, Scopus)
Reproductive efficiency is a crucial determinant of livestock productivity, and sperm quality is a key factor in successful fertilization. Computer-assisted sperm analysis (CASA) quantifies spermatozoa through kinetic variables that can vary across cattle breeds. This study aimed (i) to classify post-thaw semen samples from Holstein, Simmental and Charolais bulls based on eight CASA-derived variables: progressive motility (PM), non-progressive motility (non-PM), curvilinear velocity (VCL), straight-line velocity (VSL), beat-cross frequency (BCF), amplitude of lateral head displacement (ALH), hyperactivity and average path velocity (VAP); (ii) to benchmark three tree-based classifiers, C5.0, random forest (RF) and stochastic gradient boosting (SGB), on their ability to assign ejaculates to the correct breed; and (iii) to identify the most informative predictors for breed discrimination in each algorithm. The original dataset was randomly divided into training and testing sets at 70:30, 75:25 and 80:20 ratios, and hyperparameters were tuned using 10-fold cross-validation repeated ten times. SGB achieved the highest classification performance, with a mean balanced accuracy of 85.7% (86.4% for Holstein, 84.3% for Simmental and 86.5% for Charolais), followed by RF (83.5%) and C5.0 (73.5%). PM, hyperactivity and VSL were the most informative predictors. These results offer insights into breed-specific sperm characteristics, with potential implications for developing breed-specific CASA calibrations and for allocating resources more efficiently in livestock production.
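The abstract reports balanced accuracy per breed and as a mean, but does not state how it was computed. The following is a minimal sketch, assuming the common one-vs-rest convention (for each breed, balanced accuracy = (sensitivity + specificity) / 2, as in the per-class statistics reported by R's caret package) and an unweighted mean across breeds. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
def per_class_balanced_accuracy(y_true, y_pred, classes):
    """One-vs-rest balanced accuracy per class: (sensitivity + specificity) / 2.

    Assumption: this mirrors the per-breed figures in the abstract; the
    authors' exact computation is not stated there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    scores = {}
    for c in classes:
        # Treat class c as "positive" and every other breed as "negative".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        scores[c] = (sensitivity + specificity) / 2
    return scores


def mean_balanced_accuracy(y_true, y_pred, classes):
    """Unweighted (macro) mean of the per-class balanced accuracies (assumed)."""
    scores = per_class_balanced_accuracy(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)


# Hypothetical toy test-set labels for 12 ejaculates (NOT the study's data).
breeds = ["Holstein", "Simmental", "Charolais"]
y_true = ["Holstein"] * 4 + ["Simmental"] * 4 + ["Charolais"] * 4
y_pred = ["Holstein", "Holstein", "Holstein", "Simmental",
          "Simmental", "Simmental", "Charolais", "Simmental",
          "Charolais", "Charolais", "Charolais", "Charolais"]

print(per_class_balanced_accuracy(y_true, y_pred, breeds))
# {'Holstein': 0.875, 'Simmental': 0.8125, 'Charolais': 0.9375}
print(mean_balanced_accuracy(y_true, y_pred, breeds))
# 0.875
```

Under this assumption, the unweighted mean of the three per-breed values reported for SGB, (86.4 + 84.3 + 86.5) / 3 ≈ 85.7%, agrees with the stated mean balanced accuracy.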