NorBERT

This repository contains in-house code used in training and evaluating NorBERT-1 and NorBERT-2: large-scale Transformer-based language models for Norwegian. The models were trained by the Language Technology Group at the University of Oslo. The computations were performed on resources provided by UNINETT Sigma2 - the National Infrastructure for High Performance Computing and Data Storage in Norway.

For most of the training, we used NVIDIA's BERT for TensorFlow implementation, with minor changes to their code; see the patches_for_NVIDIA_BERT subdirectory.

Training of the NorBERT models was conducted as part of the NorLM project. See this paper for more details:

Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen. Large-Scale Contextualised Language Modelling for Norwegian. NoDaLiDa 2021.

NorBERT-3

In 2023, we released NorBERT-3, a new family of language models for Norwegian. We now generally recommend using these models over NorBERT-1 and NorBERT-2.

NorBERT-3 is described in detail in this paper: NorBench – A Benchmark for Norwegian Language Models (Samuel et al., NoDaLiDa 2023)
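As a minimal sketch, one way to try a NorBERT-3 model is through the Hugging Face transformers library. The Hub identifier ltg/norbert3-base and the need for trust_remote_code=True (the models use a custom architecture) are assumptions here; check the model cards on the Hugging Face Hub for the exact names, variants, and loading instructions.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed Hub identifier; see the ltg organization on the Hugging Face Hub
# for the exact model names and available sizes.
MODEL_NAME = "ltg/norbert3-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# NorBERT-3 uses a custom architecture, so trusting remote code is assumed
# to be required when loading through the Auto classes.
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME, trust_remote_code=True)
model.eval()

# Fill in a masked token in a Norwegian sentence.
text = f"Universitetet i {tokenizer.mask_token} er Norges eldste universitet."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Assumes the custom model returns standard masked-LM logits.
    logits = model(**inputs).logits

# Pick the highest-scoring token at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(top_id))
```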


About

Large-scale language models for Norwegian.

License: Creative Commons Zero v1.0 Universal

Languages

Python 77.0%, Shell 23.0%