Welcome to the Language Resource Management Agency of SADiLaR. This repository provides access to all of the collections, data sets, tools and other language resources that are distributed by SADiLaR.

The repository will eventually replace all of the functionality of the original RMA site, and all of the resources available from the RMA will also be available from this repository.

Select a community to browse its collections.

Language Resource Management Agency [526]
  • Ex Machina: Using NLP and statistical learning models to model eyewitness statements and choosing behaviour 

    Nortje, Alicia, et al. (SADiLaR, 2019-05-07)
    This curated database includes data from various empirical studies where eyewitness statements and descriptions were collected. The original studies, ...
  • Autshumato English-Tshivenḓa Parallel Corpora 

    McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
    Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced ...
  • Autshumato Monolingual Tshivenḓa Corpus 

    McKellar, Cindy (North-West University; Centre for Text Technology (CTexT), 2023-12-12)
    Monolingual corpus for Tshivenḓa. The data is given as a single UTF-8 text file, with each segment on its own line.
  • Morphologically annotated corpus for isiNdebele 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in isiNdebele converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data ...
  • Morphologically annotated corpus for isiXhosa 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in isiXhosa converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
  • Morphologically annotated corpus for isiZulu 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in isiZulu converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
  • Morphologically annotated corpus for Siswati 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in Siswati converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
  • Morphologically annotated corpus for Sesotho 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in Sesotho converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
  • Morphologically annotated corpus for Sepedi 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in Sepedi converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
  • Morphologically annotated corpus for Setswana 

    Gaustad, Tanja (Centre for Text Technology (CTexT), 2024-01-31)
    NCHLT corpus of morphologically annotated tokens in Setswana converted to the tags used during phases 1 and 2 of the SADiLaR-II project. The data is ...
