CLARIN.SI's repositories
babushka-bench
Benchmarking NLP tools on Slovene, Croatian and Serbian
parlaspeech
Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
reldi-tokeniser
A two-mode (standard, nonstandard) tokeniser for South Slavic languages
dialect-copa
Data for the DIALECT-COPA unshared task of dialectal causal common-sense reasoning
Slovenian-Language-Technologies-Overview
An ever-expanding overview of knowledge on large language models (LLMs), speech technologies, and other NLP technologies for the Slovenian language.
TEI-schema
Recommended TEI schema for CLARIN.SI resources, cf. also https://clarinsi.github.io/TEI-schema/
slobench-eval-docker
Repository for SloBench evaluation Docker images
Slovene_normalizator
Slovene text normalization tool
clarin-dspace
LINDAT/CLARIN digital repository based on DSpace
slovene_g2p
A tool that converts Slovene words to their IPA and/or SAMPA transcriptions.
classla-training
Training scripts for the CLASSLA pipeline
hbs_features
Tool for extracting linguistic features with the highest (known) variation among the HBS standards
mezzanine_resources
Repo for tracking resources for the Mezzanine project
parlasent_analysis
Code for the ParlaSent research note
swell-editor
Editor for normalising learner texts (error annotation and tagging)