%0 Journal Article
%T ZS-GR: zero-shot gesture recognition from RGB-D videos
%A Razieh Rastgoo
%A Kourosh Kiani
%A Sergio Escalera
%J Multimedia Tools and Applications
%D 2023
%V 82
%F Razieh Rastgoo2023
%O HUPBA
%O exported from refbase (http://158.109.8.37/show.php?record=3879), last updated on Tue, 06 Feb 2024 15:10:49 +0100
%X Gesture Recognition (GR) is a challenging research area in computer vision. To tackle the annotation bottleneck in GR, we formulate the problem of Zero-Shot Gesture Recognition (ZS-GR) and propose a two-stream model that takes two input modalities: RGB and Depth videos. To benefit from vision Transformer capabilities, we use two vision Transformer models, one for human detection and one for visual feature representation. We configure a Transformer encoder-decoder architecture as a fast and accurate human detection model to overcome the challenges of current human detection models. Using the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on five datasets (Montalbano II, MSR Daily Activity 3D, CAD-60, NTU-60, and isoGD), obtaining state-of-the-art results compared to existing ZS-GR models as well as Zero-Shot Action Recognition (ZS-AR) models.
%U https://link.springer.com/article/10.1007/s11042-023-15112-7
%P 43781-43796