Dynamic speaker localization based on a novel lightweight R–CNN model


ÇATALBAŞ M. C., Dobrisek S.

Neural Computing and Applications, vol.35, no.14, pp.10589-10603, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 35 Issue: 14
  • Publication Date: 2023
  • DOI: 10.1007/s00521-023-08251-3
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Pages: pp.10589-10603
  • Keywords: Sound source localization, Deep regression network, R-CNN, GCC-PHAT, TDOA
  • Ankara University Affiliated: Yes

Abstract

© 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature. In this study, a novel sound localization approach is proposed that provides the 3D coordinates of a real moving speaker. Sound recordings of a real user in an indoor environment were used. Four conventional microphones simultaneously recorded speech signals as the user moved between 14 predetermined locations. To separate environmental noise from the recorded sound signals and accurately determine the origin of speech, a z-score-based peak detection approach is used. The delays between the acquired speech signals are calculated with the generalized cross-correlation with phase transform (GCC-PHAT) approach. The determined delays are transformed into a special distance matrix, and each of these matrices is assigned to a particular speaker location in 3D space. A novel lightweight convolutional neural network-based deep regression network was constructed to learn the relationship between these distance matrices and the real 3D location information. As a result, the sound localization problem is transformed from an iterative solution into a regression problem. With the low-cost conventional microphones and hardware used in this approach, the position of the moving speaker is determined with higher accuracy than with the particle swarm optimization-based time difference of arrival (TDOA) approach. According to the performance comparison, the average localization deviation of 45.826 cm obtained with the TDOA-based sound source localization approach was reduced to 16.298 cm with the proposed approach.
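
The delay-estimation step named in the abstract can be illustrated with a minimal GCC-PHAT sketch. This is not the authors' implementation: the function name `gcc_phat`, its parameters, the 16 kHz sampling rate, the synthetic noise signal, and the 343 m/s speed of sound are all assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)           # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:                 # optionally restrict to plausible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

if __name__ == "__main__":
    fs = 16000                              # assumed sampling rate (Hz)
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)         # synthetic stand-in for a microphone signal
    delay_samples = 5
    sig = np.concatenate((np.zeros(delay_samples), ref[:-delay_samples]))

    tau = gcc_phat(sig, ref, fs)
    speed_of_sound = 343.0                  # m/s, assumed room-temperature value
    print("estimated delay (samples):", round(tau * fs))
    print("path-length difference (m):", tau * speed_of_sound)
```

Multiplying each pairwise delay by the speed of sound gives a path-length difference between two microphones. How the paper arranges these values into its distance matrix is not specified in the abstract, so that step is left out here.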