Sign Language Video Synthesis using Skeleton Sequence


Gencoglu S., Keles H. Y.

28th Signal Processing and Communications Applications Conference (SIU), ELECTR NETWORK, 5-7 October 2020, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/siu49456.2020.9302436
  • Country of Publication: ELECTR NETWORK
  • Keywords: Generative adversarial networks, conditional generative adversarial networks, convolutional neural networks, video-to-video synthesis
  • Affiliated with Ankara University: Yes

Abstract

Generative Adversarial Networks (GANs) enable the generation of realistic synthetic images. However, the majority of research in this domain focuses on the image-to-image synthesis problem. The aim of this study is to develop a model that encodes high-quality video frames, with true motion dynamics, using only a reference image frame and a skeleton sequence. In this context, the Ankara University Turkish Sign Language dataset is used to synthesize new sign videos from a given signer frame, used as a reference, and a skeleton stream. To solve this challenging problem, a conditional generative adversarial network (GAN) is designed in which the skeletal data is used as the condition. Using the trained model, we are able to generate sign video streams of the given signer, in which the motion dynamics are successfully and fluently encoded. Moreover, we evaluated the quality of the generated images using the Fréchet Inception Distance (FID) metric; the FID score is 26.
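
For readers unfamiliar with the FID metric mentioned above, the following is a minimal sketch of the standard Fréchet distance between two Gaussians fitted to feature sets. It is not the authors' evaluation code. FID is usually computed on Inception-v3 activations of real and generated images; that feature extraction step is omitted here, and the function name `frechet_distance` and its array arguments are assumptions made for illustration.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument is a 2-D array with one row per sample and at least
    two feature columns (hypothetical inputs; FID would use Inception
    activations of real and generated frames here).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which avoids an explicit matrix sqrt.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)
```

Lower values indicate that the two feature distributions are closer. Identical feature sets give a distance of approximately zero, and shifting every feature by a constant changes only the mean term.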