Corpus

Overview

The SL-ReDu GSL corpus is an extensive RGB+D video collection of Greek Sign Language (GSL), comprising approximately 36 hours of signing by 21 informants. It was recorded under studio conditions suitable for GSL recognition and covers the domain of language education along with some general content. The database was collected as part of the SL-ReDu project, which focuses on the educational use case of systematically teaching GSL as a second language. The corpus contains three distinct RGB+D video subsets: (i) isolated signs; (ii) continuous signing; and (iii) fingerspelling.

Sample RGB frames from the SL-ReDu GSL video data collection, showing the 21 informants while signing.

Dataset statistics

Task	Signers	Unique content	Vocab. size	Avg. units/video	Videos	Frames	Duration (hrs:mins)
Isolated	21	369 signs	369 signs	1 sign	22,632	2,715,840	25:15
Continuous	21	799 sentences	294 glosses	2.86 glosses	5,930	889,500	8:24
Fingerspelling	21	950 words	24 letters	4.55 letters	1,554	234,360	2:17
Total	21	–	–	–	30,116	3,839,700	35:56
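
As a quick sanity check on the table above, the following minimal sketch recomputes the totals row and derives the average number of frames per video for each subset. The figures are copied from the table; the names `SUBSETS`, `totals`, `format_duration`, and `frames_per_video` are illustrative and not part of the corpus distribution.

```python
# Figures copied from the statistics table:
# (number of videos, number of frames, duration in minutes).
SUBSETS = {
    "Isolated": (22_632, 2_715_840, 25 * 60 + 15),
    "Continuous": (5_930, 889_500, 8 * 60 + 24),
    "Fingerspelling": (1_554, 234_360, 2 * 60 + 17),
}


def totals(subsets):
    """Sum videos, frames, and minutes over all subsets."""
    videos = sum(v for v, _, _ in subsets.values())
    frames = sum(f for _, f, _ in subsets.values())
    minutes = sum(m for _, _, m in subsets.values())
    return videos, frames, minutes


def format_duration(minutes):
    """Render a duration in minutes as hrs:mins, as in the table."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def frames_per_video(subsets):
    """Average number of frames per video for each subset."""
    return {name: frames / videos for name, (videos, frames, _) in subsets.items()}


if __name__ == "__main__":
    videos, frames, minutes = totals(SUBSETS)
    print(videos, frames, format_duration(minutes))
    for name, avg in frames_per_video(SUBSETS).items():
        print(f"{name}: {avg:.1f} frames per video on average")
```

The recomputed totals agree with the last row of the table (30,116 videos, 3,839,700 frames, 35:56).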

Download

Video data files and their annotations are available for download. Translation is also provided for the continuous signing subset of the corpus.

We also provide recommended data splits for training, validating, and testing sign language recognition (SLR) models, separately for each recognition task (isolated, continuous, fingerspelling), to foster comparable and reproducible research on the topic. For each task, the test set is kept identical across three experimental settings, which allows a fair comparison between them. Namely:

MS: Multi-signer setting, where data from all signers are split between training, validation, and testing (a single fold is used).
SI: Signer-independent setting, where a 7-fold cross-validation framework is adopted. Each fold contains training and validation data from 18 signers, with testing performed on the remaining 3. The process is repeated over all 7 folds so that all signers are covered.
SA: Signer-adapted setting, which follows the signer-independent scheme but adds, for each fold, a set of adaptation data for the 3 test signers, so that adaptation experiments can be carried out. Users of the database may use this adaptation set as they see fit (e.g., for training and/or validation). Note that individual models may be adapted and tested for each of the 3 signers of any given fold.
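
The fold structure of the signer-independent setting can be illustrated with a minimal sketch: 21 signers are partitioned into 7 disjoint test groups of 3, with the remaining 18 signers supplying training and validation data. The signer IDs and the consecutive grouping below are hypothetical; the actual assignment of signers to folds is defined by the released split files, which should be used for any reported results. The sketch also does not model the adaptation data of the SA setting.

```python
def signer_independent_folds(signers, test_size=3):
    """Yield (train_val_signers, test_signers) pairs, one per fold.

    Signers are grouped consecutively, which is an arbitrary choice
    made for illustration only.
    """
    if len(signers) % test_size != 0:
        raise ValueError("number of signers must be divisible by test_size")
    for start in range(0, len(signers), test_size):
        test = signers[start:start + test_size]
        train_val = signers[:start] + signers[start + test_size:]
        yield train_val, test


# Hypothetical signer IDs; the corpus uses its own identifiers.
SIGNERS = [f"signer{i:02d}" for i in range(1, 22)]
FOLDS = list(signer_independent_folds(SIGNERS))

if __name__ == "__main__":
    for index, (train_val, test) in enumerate(FOLDS, start=1):
        print(f"fold {index}: {len(train_val)} train/val signers, test on {test}")
```

Each signer appears in exactly one test group, so the 7 folds together cover all 21 signers.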

In addition, the following split is also available:

MS2: A multi-signer setting with a more traditional data split ratio among the training, validation, and test sets (close to an 80%-10%-10% split), resulting in a smaller test set than the earlier MS split (again, a single fold is used).
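
For readers unfamiliar with this kind of ratio, the sketch below shows one generic way to produce an 80%-10%-10% split over a list of video identifiers. It is an illustration only: the identifiers, the function name `split_80_10_10`, and the seeded random shuffle are assumptions, and the sketch does not reproduce the released MS2 split, which is only described as being close to this ratio.

```python
import random


def split_80_10_10(video_ids, seed=0):
    """Shuffle IDs deterministically and cut them into train/val/test parts."""
    ids = sorted(video_ids)             # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_train = len(ids) * 8 // 10
    n_val = len(ids) // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]        # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Hypothetical video identifiers.
    example_ids = [f"video_{i:04d}" for i in range(1000)]
    train, val, test = split_80_10_10(example_ids)
    print(len(train), len(val), len(test))
```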

Additional data splits may be introduced in the future, in response to suggestions or requests from database users.

Publication

If you use this dataset, cite our work using the citation below:
@inproceedings{SL-REDU_Dataset23,
author = {K. Papadimitriou and G. Sapountzaki and K. Vasilaki and E. Efthimiou and S.-E. Fotinea and G. Potamianos},
title = {{SL-REDU GSL}: {A} Large Greek Sign Language Recognition Corpus},
booktitle={Proc. IEEE International Conference on Acoustics, Speech and Signal Processing Workshop on Sign Language Translation and Avatar Technology (ICASSPW-SLTAT)},
pages={1-5},
year = {2023},
doi={10.1109/ICASSPW59220.2023.10193306}}

Contact

For any queries regarding the dataset, please use the following contact emails:
aipapadimitriou (at) uth (dot) gr
gpotam (at) ieee (dot) org