Evaluating accuracy and concordance of pathologists and the utility of AI assistance software for digital HER2 IHC assessment in breast cancer including HER2-ultralow scoring: An international multicenter observational study


Haab G. A., Cheng C., Quang L., Soliman M., Koo J., Tuncel E., ...

Journal of Clinical Oncology, vol. 43, no. 16, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 43 Issue: 16
  • Publication Date: 2025
  • DOI: 10.1200/jco.2025.43.16_suppl.1078
  • Journal Name: Journal of Clinical Oncology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, MEDLINE, Nature Index
  • Ankara University Affiliated: Yes

Abstract

Background: The emergence of novel therapeutic agents demonstrating improved progression-free survival (PFS) and overall survival (OS) in breast cancer patients with low HER2 expression underscores the need for accurate and reproducible HER2 status assessment. However, challenges such as subjective interpretation of immunohistochemistry (IHC) staining and variability in assay quality hinder diagnostic consistency. AI-based decision support software could enhance diagnostic accuracy and reproducibility. To date, systematic evaluation of pathologist performance in scoring low HER2 expression, as well as the role of AI assistance, remains limited in real-world, multicenter settings.

Methods: Six academic centers from different countries provided digital HER2 IHC-stained breast cancer images (n = 728) generated with five whole-slide scanner models and one microscope camera. In a two-arm observational study, consensus ground truth (GT) scores were established by two expert pathologists per center without AI assistance. Subsequently, two additional pathologists (scorers) evaluated each case both without and with AI support. Scoring followed ASCO/CAP 2023 HER2 interpretation guidelines, with an additional subclassification of IHC 0 cases into “null” (IHC 0 with no staining) and “ultralow” (IHC 0 with membrane staining).

Results: For the HER2-low decision range, AI software alone achieved 91.0% accuracy in distinguishing HER2 0 from 1+/2+/3+ scores against GT. Across the four categories, AI achieved 80.3% accuracy compared to 77.6% for scorers alone and 81.4% with AI assistance. AI support improved inter-reader agreement from 73.5% to 86.4%. When the HER2 ultralow category was included, AI assistance increased scorers’ average accuracy across all classes from 70.4% to 74.7% and boosted inter-reader agreement from 65.6% to 80.6%. For differentiating HER2 null from HER2 ultralow, AI improved scorers’ accuracy from 68.6% to 77.9%, resulting in 40% more cases being classified as HER2 ultralow and a 65% reduction in the number of incorrectly scored HER2 null cases.

Conclusions: This first international multicenter study on HER2 IHC diagnosis, including HER2 ultralow scoring, highlights the challenges faced by pathologists and the significant benefits of AI decision-support systems in real-world settings. AI assistance improved pathologist concordance and accuracy, particularly at the HER2 null vs. ultralow boundary, reducing diagnostic errors. Incorporating AI into routine clinical diagnostics has the potential to optimize treatment selection for breast cancer patients.

Research Sponsor: AstraZeneca.
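
The abstract reports accuracy against GT and inter-reader agreement at three granularities: all classes with IHC 0 split into null and ultralow, the four ASCO/CAP categories, and the binary HER2 0 versus 1+/2+/3+ decision. The sketch below illustrates one way such figures could be computed. It assumes simple percent agreement, which the abstract does not confirm as the study's method; the function names and the six-case toy data are hypothetical and are not study data.

```python
# Illustrative sketch only: label names, helper names and toy scores are
# hypothetical. Percent agreement is an assumption; the abstract does not
# state which agreement statistic the study used.

def to_four_class(score):
    """Merge the null/ultralow subclasses back into IHC 0."""
    return "0" if score in ("null", "ultralow") else score

def to_binary(score):
    """Collapse to the HER2 0 versus 1+/2+/3+ decision."""
    return "0" if to_four_class(score) == "0" else "1+/2+/3+"

def percent_match(scores_a, scores_b, mapping=lambda s: s):
    """Fraction of cases where two score lists agree after applying mapping.

    With scores_b as ground truth this is accuracy; with two readers it is
    pairwise inter-reader agreement.
    """
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(mapping(a) == mapping(b) for a, b in zip(scores_a, scores_b))
    return hits / len(scores_a)

# Toy example with six hypothetical cases.
ground_truth = ["null", "ultralow", "1+", "2+", "3+", "ultralow"]
reader_a = ["null", "null", "1+", "2+", "3+", "ultralow"]
reader_b = ["ultralow", "null", "1+", "1+", "3+", "ultralow"]

acc_five = percent_match(reader_a, ground_truth)                 # all classes
acc_four = percent_match(reader_a, ground_truth, to_four_class)  # ASCO/CAP categories
acc_binary = percent_match(reader_b, ground_truth, to_binary)    # 0 vs 1+/2+/3+
agree_ab = percent_match(reader_a, reader_b)                     # reader A vs reader B
```

In the toy data, reader A's only error is scoring an ultralow case as null, so that error disappears once IHC 0 is treated as a single category. This mirrors why accuracy in the abstract is lower when the ultralow category is included.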