CLARIN.SI

CLARIN.SI's repositories

classla

CLASSLA Fork of the Official Stanford NLP Python Library for Many Human Languages

Language: Python; License: NOASSERTION; 40 4 45

mte-msd

MULTEXT-East morphosyntactic specifications

Language: HTML; 10 4 3

babushka-bench

Benchmarking NLP tools on Slovene, Croatian and Serbian

Language: Python; 7 8 2

parlaspeech

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts

Language: Jupyter Notebook; License: Apache-2.0; 5 4 1

reldi-tokeniser

A two-mode (standard, nonstandard) tokeniser for South Slavic languages

Language: Python; License: Apache-2.0; 5 4 3

STARK

Language: Python; License: Apache-2.0; 4 70

benchich

BENCHić - the benchmark for Bosnian, Croatian, Montenegrin, Serbian (and friends)

Language: Python; 2 4 1

dialect-copa

Data for the DIALECT-COPA unshared task of dialectal causal common-sense reasoning

2 4 1

Slovenian-Language-Technologies-Overview

An ever-expanding overview of the knowledge on large language models (LLMs), speech technologies, and other NLP technologies for the Slovenian language.

200

TEI-schema

Recommended TEI schema for CLARIN.SI resources; see also https://clarinsi.github.io/TEI-schema/

Language: XSLT; 2 40

cordex

Language: Python; License: MIT; 1 6 2

dialect-copa-zero

Language: Python; 1 40

drevesnik

Web portal for searching and displaying syntactically annotated corpora

Language: JavaScript; 1 50

slobench-eval-docker

Repository for SloBench evaluation docker images

Language: Perl; 1 20

Slovene_normalizator

Slovene text normalization tool

Language: Python; License: Apache-2.0; 1 60

clarin-dspace

LINDAT/CLARIN digital repository based on DSpace

Language: Java; License: BSD-3-Clause; 05 41

senta-dizajn

Language: TypeScript; License: Apache-2.0; 040

slovene_g2p

A converter from Slovene words to their IPA and/or SAMPA transcriptions.

Language: Python; License: Apache-2.0; 040

CLARINprojekt2024-koreferencnost

Language: Python; License: MIT; 000

classla-resources

License: Apache-2.0; 060

classla-training

Training scripts for the CLASSLA pipeline

Language: Python; License: Apache-2.0; 060

conllu-diff

Language: Python; License: Apache-2.0; 040

gigafida_segmentacija

Language: Python; 040

hbs_features

Tool for extracting the linguistic features with the highest (known) variation among the HBS standards

Language: Python; 030

mezzanine_resources

Repo for tracking resources for the Mezzanine project

030

parlasent_analysis

Code for ParlaSent research note

Language: Jupyter Notebook; License: GPL-3.0; 030

ROG

Language: Elixir; 000

rsdo_gos

Software for the GOS corpus of spoken Slovenian

Language: C#; 050

swell-editor

Editor for normalising learner texts (error annotation and tagging).

Language: TypeScript; License: MIT; 020

trankit-train

Language: Python; License: MIT; 060