6th International Conference on Problems of Cybernetics and Informatics, PCI 2025, Baku, Azerbaijan, 25-28 August 2025 (Full-Text Paper)
In Unix-like systems, users can interact with the operating system through a command-line interpreter such as Bash. Although the command line provides a powerful interface to the computer, its commands are often verbose and difficult to write. In this work, we evaluate the performance of several large language models (LLMs) in translating natural language prompts into Bash commands. We investigate the effect of giving the models feedback from external tools, such as a Bash syntax analyzer. We also investigate an agentic approach in which the models themselves can request to use the tools. We use the NL2SH-ALFA dataset and the InterCode-ALFA benchmark to evaluate the results. Finally, we compare performance when the natural language prompts are written in a non-English language by creating a Turkish version of the test set.
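As an illustration of the tool-feedback idea, the following is a minimal sketch and not the authors' implementation. It assumes that `bash -n` (parse without executing) stands in for the Bash syntax analyzer, and that `generate` is a hypothetical callable wrapping an LLM; the function names and the retry limit are likewise assumptions made for this example.

```python
import subprocess
from typing import Callable, Optional


def bash_syntax_error(command: str) -> Optional[str]:
    """Return Bash's parser message for `command`, or None if it parses.

    Assumption: `bash -n` plays the role of the syntax analyzer. The -n
    option makes Bash read the command and check syntax without running it.
    """
    result = subprocess.run(
        ["bash", "-n", "-c", command],
        capture_output=True,
        text=True,
    )
    return None if result.returncode == 0 else result.stderr.strip()


def generate_with_feedback(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]] = bash_syntax_error,
    max_rounds: int = 3,
) -> str:
    """Ask the model for a command, feeding tool errors back until it parses.

    `generate(prompt, feedback)` is a hypothetical stand-in for an LLM call;
    `feedback` is None on the first round and the tool's message afterwards.
    """
    feedback: Optional[str] = None
    command = ""
    for _ in range(max_rounds):
        command = generate(prompt, feedback)
        feedback = check(command)
        if feedback is None:  # the tool accepted the command
            break
    return command  # last attempt, even if it still fails the check
```

A syntax check only confirms that a command parses; it says nothing about whether the command does what the prompt asked, which is why functional evaluation on a benchmark is still needed.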