Sign Language Video Synthesis using Skeleton Sequence

Gencoglu S., Keles H. Y.

28th Signal Processing and Communications Applications Conference (SIU), ELECTR NETWORK, 5-7 October 2020, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1109/siu49456.2020.9302436
Country of Publication: ELECTR NETWORK
Keywords: Generative adversarial networks, conditional generative adversarial networks, convolutional neural networks, video-to-video synthesis
Affiliated with Ankara University: Yes

Abstract

Generative Adversarial Networks (GANs) enable the generation of realistic synthetic images. However, the majority of research in this domain focuses on the image-to-image synthesis problem. The aim of this study is to develop a model that encodes high-quality video frames, with true motion dynamics, using only a reference image frame and a skeleton sequence. In this context, the Ankara University Turkish Sign Language dataset is used to synthesize new sign videos from a given signer frame, used as a reference, and a skeleton stream. To solve this challenging problem, a conditional generative adversarial network (GAN) is designed, in which skeletal data is used as the condition. Using the trained model, we are able to generate sign video streams of the given signer, in which the motion dynamics are successfully and fluently encoded. Moreover, we evaluated the quality of the generated images using the Fréchet Inception Distance (FID) metric; the FID score is 26.
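
For readers unfamiliar with the metric, the FID mentioned in the abstract is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The formula below is the standard definition, not something stated in this record; the symbols $\mu_r, \Sigma_r$ (mean and covariance of real-image features) and $\mu_g, \Sigma_g$ (mean and covariance of generated-image features) are notation assumed here for illustration. The record does not say which feature layer or how many samples were used.

```latex
\[
\mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
\;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,\bigl(\Sigma_r \Sigma_g\bigr)^{1/2} \right)
\]
```

Lower values indicate that the generated-image feature distribution is closer to the real one; a score of 0 would mean that the two fitted Gaussians coincide.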