Machine learning approaches for binary classification of sorghum (Sorghum bicolor L.) seeds from image color features


Çiftci B., ÇETİN N., Günaydın S., KAPLAN M.

Journal of Food Composition and Analysis, vol. 140, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 140
  • Publication Date: 2025
  • DOI: 10.1016/j.jfca.2025.107208
  • Journal Name: Journal of Food Composition and Analysis
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Analytical Abstracts, BIOSIS, Biotechnology Research Abstracts, CAB Abstracts, Food Science & Technology Abstracts, Veterinary Science Database
  • Keywords: Color, Genotype classification, Image analysis, Machine learning, Sorghum seed
  • Affiliated with Ankara University: Yes

Abstract

Innovative approaches to seed classification for breeding and quality assessment simplify the process, reduce labor time, and inspire the design of grading machines and planters. This study investigates machine learning-based binary classification of five sorghum genotypes based on their color attributes. Six machine learning models (multilayer perceptron, MLP; support vector machine, SVM; k-nearest neighbors, k-NN; random forest, RF; extreme gradient boosting, XGBoost; and light gradient boosting machine, LightGBM) were built to evaluate classification performance. The most successful models were k-NN and MLP, each with an accuracy of 95.2 % for all color channels. Although the accuracies of these two models were similar, the PRC Area and ROC Area values of MLP were higher. In every pair that included it, genotype PI-2 was discriminated from the other genotype with an accuracy of 100.0 % for all models. According to the confusion matrix, the PI-3 and PI-4 pair followed (99 out of 100 in the true class). The lowest accuracy, 86.5 %, was obtained for the PI-1 and PI-5 pair with the k-NN model. In that model, the TPR was 0.920 for PI-1 and 0.810 for PI-5, and the ROC Area was 0.865 for both genotypes. The findings indicate that MLP and k-NN models are suitable and unbiased methods for classifying different genotypes of sorghum seeds. The results of this study contribute to the design of automatic classification machinery, to seed breeding studies, and to improving feed quality efficiency and food safety by increasing the traceability of seeds.
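
The figures reported for the PI-1 and PI-5 pair can be cross-checked with a little arithmetic. The sketch below is an illustration only, not the authors' code: it assumes 100 seeds per genotype (a class size suggested by the "99 out of 100" remark but not stated for this pair), and the function name `pair_accuracy` is hypothetical. Under that assumption, the two per-class TPR values reproduce the stated 86.5 % accuracy.

```python
def pair_accuracy(tpr_a, tpr_b, n_a=100, n_b=100):
    """Overall accuracy of a two-class problem from per-class TPRs.

    n_a and n_b are ASSUMED class sizes (100 seeds each); the abstract
    does not state them for this genotype pair.
    """
    # Correctly classified seeds in each class = TPR * class size.
    correct = tpr_a * n_a + tpr_b * n_b
    return correct / (n_a + n_b)


# TPR values reported for the k-NN model: PI-1 = 0.920, PI-5 = 0.810.
acc = pair_accuracy(0.920, 0.810)
print(f"PI-1 vs PI-5 accuracy: {acc:.3f}")
```

With equal class sizes the accuracy is simply the mean of the two TPR values, so the result does not depend on the exact number of seeds as long as both genotypes are equally represented.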