Overview
The SL-ReDu GSL corpus is a large RGB+D video collection of Greek Sign Language (GSL), comprising about 36 hours of signing by 21 informants, recorded under studio conditions suitable for GSL recognition. Its content covers the domain of language education along with some general material. The corpus was collected as part of the SL-ReDu project, which focuses on the educational use case of systematically teaching GSL as a second language. It contains three distinct RGB+D video subsets: (i) isolated signs; (ii) continuous signing; and (iii) fingerspelling.
Dataset statistics
Task | Signers | Unique content | Vocab. size | Avg. units /video | Videos | Frames | Duration (hrs:mins) |
---|---|---|---|---|---|---|---|
Isolated | 21 | 369 signs | 369 signs | 1 sign | 22,632 | 2,715,840 | 25:15 |
Continuous | 21 | 799 sentences | 294 glosses | 2.86 glosses | 5,930 | 889,500 | 8:24 |
Fingerspelling | 21 | 950 words | 24 letters | 4.55 letters | 1,554 | 234,360 | 2:17 |
Total | 21 | – | – | – | 30,116 | 3,839,700 | 35:56 |
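The totals row can be cross-checked against the per-task rows with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from the table above, while the helper names (`to_minutes`, `from_minutes`, `per_task_rates`) are illustrative and not part of the corpus distribution. The per-task frame rate it derives is approximate, because durations are reported only to the minute.

```python
# Per-task figures copied from the statistics table: (videos, frames, "hrs:mins").
STATS = {
    "Isolated": (22_632, 2_715_840, "25:15"),
    "Continuous": (5_930, 889_500, "8:24"),
    "Fingerspelling": (1_554, 234_360, "2:17"),
}


def to_minutes(hhmm: str) -> int:
    """Convert an 'hrs:mins' string to a number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def from_minutes(total: int) -> str:
    """Convert a number of minutes back to an 'hrs:mins' string."""
    return f"{total // 60}:{total % 60:02d}"


def per_task_rates() -> dict:
    """Derive (average frames per video, approximate frames per second) per task."""
    rates = {}
    for task, (videos, frames, duration) in STATS.items():
        seconds = to_minutes(duration) * 60
        rates[task] = (frames / videos, frames / seconds)
    return rates


total_videos = sum(videos for videos, _, _ in STATS.values())
total_frames = sum(frames for _, frames, _ in STATS.values())
total_duration = from_minutes(sum(to_minutes(d) for _, _, d in STATS.values()))

print(total_videos, total_frames, total_duration)
for task, (frames_per_video, fps) in per_task_rates().items():
    print(f"{task}: {frames_per_video:.2f} frames/video, ~{fps:.1f} fps")
```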
Download
Video data files and their annotations are available for download. Translations are also provided for the continuous-signing subset of the corpus.
We also provide recommended data splits for training, validating, and testing sign language recognition (SLR) models, separately for each recognition task (isolated, continuous, fingerspelling), to foster comparable and reproducible research on the topic. For each task, the test set is kept identical across the first three experimental settings listed below (MS, SI, and SA), which allows a fair comparison between them; the fourth setting (MS2) uses a smaller test set. The settings are:
- MS: Multi-signer setting, where data from all signers are split between training, validation, and testing (a single fold is used).
- SI: Signer-independent setting, where a 7-fold cross-validation framework is adopted. Each fold contains training and validation data from 18 signers, with testing performed on the remaining 3; the process repeats over all 7 folds so that every signer is tested once.
- SA: Signer-adapted setting, which follows the signer-independent scheme but adds, for each fold, a set of adaptation data for the 3 test signers so that adaptation experiments can be carried out. Users of the database may use this adaptation set as they see fit (e.g., for training and/or validation). Note that individual models may be adapted and tested for each of the 3 test signers of any given fold.
- MS2: A multi-signer setting with a more traditional split ratio among the training, validation, and test sets (close to an 80%-10%-10% split), resulting in a smaller test set than the MS split above (again, a single fold is used).
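The fold arithmetic of the signer-independent scheme (21 signers, 7 folds, 3 held-out test signers per fold) can be sketched as follows. This is a minimal illustration, not the recommended split itself: the signer identifiers and the contiguous assignment of signers to folds are assumptions made here, and the actual assignment, including how the 18 remaining signers' data are divided between training and validation, is defined by the split files distributed with the corpus.

```python
from typing import Dict, List

NUM_SIGNERS = 21  # from the statistics table
NUM_FOLDS = 7     # SI and SA settings
TEST_PER_FOLD = NUM_SIGNERS // NUM_FOLDS  # 3 held-out signers per fold


def signer_independent_folds(signers: List[str]) -> List[Dict[str, List[str]]]:
    """Partition signers into folds with disjoint test groups.

    Assumption: test signers are taken in contiguous groups of three;
    the official splits may group signers differently.
    """
    if len(signers) != NUM_SIGNERS:
        raise ValueError(f"expected {NUM_SIGNERS} signers, got {len(signers)}")
    folds = []
    for k in range(NUM_FOLDS):
        test = signers[k * TEST_PER_FOLD:(k + 1) * TEST_PER_FOLD]
        # The other 18 signers supply both training and validation data.
        train_val = [s for s in signers if s not in test]
        folds.append({"train_val": train_val, "test": test})
    return folds


# Hypothetical signer identifiers, used only for this illustration.
signers = [f"signer{i:02d}" for i in range(1, NUM_SIGNERS + 1)]
folds = signer_independent_folds(signers)

for k, fold in enumerate(folds, start=1):
    print(f"fold {k}: test={fold['test']} ({len(fold['train_val'])} train/val signers)")
```

In the signer-adapted setting, each fold would additionally carry an adaptation subset drawn from its 3 test signers; how that subset is selected is not specified here, so it is left out of the sketch.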
Publication
If you use this dataset, please cite our work using the BibTeX entry below:
@inproceedings{SL-REDU_Dataset23,
author = {K. Papadimitriou and G. Sapountzaki and K. Vasilaki and E. Efthimiou and S.-E. Fotinea and G. Potamianos},
title = {{SL-REDU GSL}: {A} Large Greek Sign Language Recognition Corpus},
booktitle={Proc. IEEE International Conference on Acoustics, Speech and Signal Processing Workshop on Sign Language Translation and Avatar Technology (ICASSPW-SLTAT)},
pages={1-5},
year = {2023},
doi={10.1109/ICASSPW59220.2023.10193306}}
Contact
For queries regarding the dataset, please contact:
aipapadimitriou (at) uth (dot) gr
gpotam (at) ieee (dot) org