A supervised learning approach for detecting erroneous samples in embeddings


Saygili G.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, cilt.28, sa.5, ss.2883-2894, 2020 (SCI-Expanded, Scopus, TRDizin) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 28 Sayı: 5
  • Basım Tarihi: 2020
  • Doi Numarası: 10.3906/elk-1909-162
  • Dergi Adı: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
  • Sayfa Sayıları: ss.2883-2894
  • Anahtar Kelimeler: Dimensionality reduction, error estimation, t-SNE, random forests, matching, DIMENSIONALITY REDUCTION
  • Ankara Üniversitesi Adresli: Evet

Özet

Visualizing multidimensional data has been a crucial task in recent years regarding the growing amount of data from various sources. To achieve this, dimensionality reduction algorithms have been used to reduce the number of dimensions for visualization of the data on a screen. However, these algorithms may fail to faithfully represent high dimensional data in lower dimensions and eventually lead to erroneous visualizations. In this work, we propose an error detection algorithm for dimensionality reduction algorithms based on recently developed error prediction algorithms for medical image registration. The proposed algorithm matches the neighborhoods of high and low dimensional data with different similarity measures and predicts the errors using a random forest classifier. The results on three datasets show that the proposed algorithm can successfully detect errors with an accuracy up to 86% and area under the curve score of 0.81.