TRAITEMENT DU SIGNAL, no. 5, pp. 2673-2681, 2024 (SCI-Expanded)
Generative large language models (LLMs) are trained to perform natural language processing (NLP) tasks but are known to have emergent properties that go beyond generating trained text-based language responses. Recently, LLMs have been further augmented with multimodal capabilities such as image annotation and analysis. In this study, we aimed to investigate the perceptual visual complexity analysis ability of LLMs by having them evaluate graphical user interfaces. For this purpose, visual complexity evaluation of user interfaces (UIs), a non-trivial task, was used to explore the possible roles and capabilities of LLMs. ChatGPT-4 and Bard, two of the most advanced multimodal LLMs, were explored and comparatively evaluated. In this exploration, both LLMs were able to evaluate the visual complexity of different input user interfaces and rank them according to their visual complexity. Although the rankings of the two LLMs were mostly similar to each other, relatively large differences from the user evaluation-based rankings were observed.
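Since the comparison rests on rank agreement, one natural way to quantify how far an LLM-produced ranking departs from a user evaluation-based one is Spearman's rank correlation. The sketch below is illustrative only, assuming hypothetical rankings for six UI screenshots; it is not the paper's reported method or data.

```python
# A minimal sketch (hypothetical data): comparing an LLM-produced ranking of
# UI screenshots against a user-study ranking with Spearman's rank correlation.
from scipy.stats import spearmanr

# Assumed complexity rankings for six UI screenshots (1 = least complex).
llm_ranking = [1, 2, 3, 4, 5, 6]    # e.g., an LLM's ordering
user_ranking = [2, 1, 4, 3, 6, 5]   # e.g., the ordering from user evaluations

# rho near 1 means close agreement; lower values mean larger ranking gaps.
rho, p_value = spearmanr(llm_ranking, user_ranking)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```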