A benchmark of expert-level academic questions to assess AI capabilities


Creative Commons License

Hendrycks D., Mazeika M., Zhang O., Hausenloy J., Ren R., Kim R., ...

Nature, vol.649, no.8099, pp.1139-1146, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 649 Issue: 8099
  • Publication Date: 2026
  • DOI: 10.1038/s41586-025-09962-4
  • Journal Name: Nature
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Chemical Abstracts Core, EMBASE, Geobase, INSPEC, MEDLINE, MLA - Modern Language Association Database, PsycINFO, zbMATH, Nature Index
  • Page Numbers: pp.1139-1146
  • Affiliated with Ankara Üniversitesi: Yes

Abstract

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve more than 90% accuracy on popular benchmarks such as Measuring Massive Multitask Language Understanding¹, limiting informed measurement of state-of-the-art LLM capabilities. Here, in response, we introduce Humanity’s Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be an expert-level closed-ended academic benchmark with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered by internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a marked gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.