Energy-time modelling of distributed multi-population genetic algorithms with dynamic workload in HPC clusters


Escobar J. J., Sánchez-Cuevas P., Prieto B., SAVRAN KIZILTEPE R., Díaz-del-Río F., Kimovski D.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, cilt.167, 2025 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 167
  • Basım Tarihi: 2025
  • Doi Numarası: 10.1016/j.future.2025.107753
  • Dergi Adı: FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Anahtar Kelimeler: Distributed computing, Energy–time modelling, Genetic algorithms, Heterogeneous clusters, Parameter optimisation, Task scheduling
  • Ankara Üniversitesi Adresli: Hayır

Özet

Time and energy efficiency is a highly relevant objective in high-performance computing systems, with high costs for executing the tasks. Among these tasks, evolutionary algorithms are of consideration due to their inherent parallel scalability and usually costly fitness evaluation functions. In this respect, several scheduling strategies for workload balancing in heterogeneous systems have been proposed in the literature, with runtime and energy consumption reduction as their goals. Our hypothesis is that a dynamic workload distribution can befitted with greater precision using metaheuristics, such as genetic algorithms, instead of linear regression. Therefore, this paper proposes anew mathematical model to predict the energy-time behaviour of applications based on multi-population genetic algorithms, which dynamically distributes the evaluation of individuals among the CPU-GPU devices of heterogeneous clusters. An accurate predictor would save time and energy by selecting the best resource set before running such applications. The estimation of the workload distributed to each device has been carried out by simulation, while the model parameters have been fitted in a two-phase run using another genetic algorithm and the experimental energy-time values of the target application as input. When the new model is analysed and compared with another based on linear regression, the one proposed in this work significantly improves the baseline approach, showing normalised prediction errors of 0.081 for runtime and 0.091 for energy consumption, compared to 0.213 and 0.256 shown in the baseline approach.