Archives and Records, 2025 (AHCI, Scopus)
This study explores the application of Artificial Intelligence (AI)-driven technologies to improve the accessibility and usability of historical audio archives through the integration of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) techniques. A dataset of 14 historical and oral history recordings, sourced from the Library of Congress and the Smithsonian Institution, was used to evaluate three ASR models: Whisper large-v3, OWSM v3.1, and MMS-1b-all. Comparative analysis based on Word Error Rate (WER) revealed that Whisper large-v3 consistently achieved the lowest error rates, while the open-source OWSM v3.1 demonstrated competitive performance, positioning it as a viable transparent alternative. The multilingual MMS-1b-all exhibited higher WERs, highlighting the trade-off between language coverage and domain-specific optimization. Downstream Named Entity Recognition (NER) and automated text summarization were conducted on the transcribed outputs, with human evaluations confirming high levels of informativeness, fluency, and faithfulness. This integrated approach underscores the potential of AI technologies to automate metadata generation, enhance the accessibility of archival content, and support more efficient archival workflows. Furthermore, the study addresses ethical considerations and proposes strategies for the future integration of AI-generated metadata into existing cataloguing systems, offering valuable insights for archivists and information professionals engaged in digital preservation and heritage management.
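For readers unfamiliar with the evaluation metric, WER is the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. A minimal, self-contained sketch is given below; the example sentences are illustrative only and are not drawn from the study's corpus.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming table of edit distances between prefixes.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative comparison: one substitution ("in" -> "at") and one
# deletion ("three") against a 9-word reference gives WER = 2/9.
ref = "the committee met in washington in nineteen sixty three"
hyp = "the committee met at washington in nineteen sixty"
print(round(wer(ref, hyp), 3))  # 0.222
```

In practice, a library such as `jiwer` is commonly used for this computation; the hand-rolled version above simply makes the metric's definition explicit.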