zj360202 / models

A collection of large language models for natural language generation.


Models

LLMs for NLG (in German)

This repository provides a comprehensive overview of available large language models (LLMs) for natural language generation (NLG). Of particular interest to us are models with German language capabilities.

| Name | Size | Model Card | License | Implementation | Paper |
|------|------|------------|---------|----------------|-------|
| GPT-J | 6B | EleutherAI/gpt-j-6B | MIT, weights: Apache 2.0 | https://github.com/kingoflolz/mesh-transformer-jax/ | - |
| BLOOMZ | 560M<br>1.1B<br>1.7B<br>3B<br>7.1B<br>176B | bigscience/bloomz-560m<br>bigscience/bloomz-1b1<br>bigscience/bloomz-1b7<br>bigscience/bloomz-3b<br>bigscience/bloomz-7b1<br>bigscience/bloomz | RAIL | https://github.com/bigscience-workshop/xmtf | Muennighoff, Niklas, et al. Crosslingual Generalization through Multitask Finetuning. (2022). DOI: https://doi.org/10.48550/arXiv.2211.01786 |
| BLOOM German | 350M<br>1.5B<br>6.4B | malteos/bloom-350m-german<br>malteos/bloom-1b5-clp-german<br>malteos/bloom-6b4-clp-german | RAIL | https://github.com/malteos/clp-transfer | Ostendorff, Malte; Rehm, Georg. Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning. (2023). DOI: https://doi.org/10.48550/arXiv.2301.09626 |
| OPT | 125M<br>350M<br>1.3B<br>2.7B<br>6.7B<br>13B<br>30B<br>66B | facebook/opt-125m<br>facebook/opt-350m<br>facebook/opt-1.3b<br>facebook/opt-2.7b<br>facebook/opt-6.7b<br>facebook/opt-13b<br>facebook/opt-30b<br>facebook/opt-66b | OPT-LICENSE | https://github.com/facebookresearch/metaseq | Zhang, Susan, et al. OPT: Open Pre-trained Transformer Language Models. (2022). DOI: https://doi.org/10.48550/arXiv.2205.01068 |
| FLAN-T5 | 80M<br>250M<br>780M<br>3B<br>11B | google/flan-t5-small<br>google/flan-t5-base<br>google/flan-t5-large<br>google/flan-t5-xl<br>google/flan-t5-xxl | Apache 2.0 | https://github.com/google-research/t5x | Chung, Hyung Won, et al. Scaling Instruction-Finetuned Language Models. (2022). DOI: https://doi.org/10.48550/arXiv.2210.11416 |
| mT0 | 300M<br>580M<br>1.2B<br>3.7B<br>13B | bigscience/mt0-small<br>bigscience/mt0-base<br>bigscience/mt0-large<br>bigscience/mt0-xl<br>bigscience/mt0-xxl | Apache 2.0 | https://github.com/bigscience-workshop/xmtf | Muennighoff, Niklas, et al. Crosslingual Generalization through Multitask Finetuning. (2022). DOI: https://doi.org/10.48550/arXiv.2211.01786 |
| GPT2 | 117M<br>117M<br>1.5B<br>?<br>?<br>? | benjamin/gpt2-wechsel-german<br>malteos/gpt2-wechsel-german-ds-meg<br>malteos/gpt2-xl-wechsel-german<br>benjamin/gerpt2<br>benjamin/gerpt2-large<br>dbmdz/german-gpt2 | MIT | https://github.com/CPJKU/wechsel<br>https://github.com/bminixhofer/gerpt2 | Minixhofer, Benjamin, et al. WECHSEL: Effective Initialization of Subword Embeddings for Cross-Lingual Transfer of Monolingual Language Models. (2021). DOI: http://dx.doi.org/10.18653/v1/2022.naacl-main.293 |
| mGPT | 1.3B | sberbank-ai/mGPT | Apache 2.0 | https://github.com/ai-forever/mgpt | Shliazhko, Oleh, et al. mGPT: Few-Shot Learners Go Multilingual. (2022). DOI: https://doi.org/10.48550/arXiv.2204.07580 |
| mT5 | 300M<br>580M<br>1.2B<br>3.7B<br>13B | google/mt5-small<br>google/mt5-base<br>google/mt5-large<br>google/mt5-xl<br>google/mt5-xxl | Apache 2.0 | https://github.com/google-research/multilingual-t5 | Xue, Linting, et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. (2020). DOI: https://doi.org/10.18653/v1/2021.naacl-main.41 |
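As a quick start, the decoder-only models listed above can be loaded through the Hugging Face `transformers` library. The sketch below uses the comparatively small dbmdz/german-gpt2 checkpoint purely as an illustration; any other causal-LM model card from the table can be substituted, with correspondingly higher memory requirements for the larger checkpoints. The prompt text is an arbitrary example of our choosing.

```python
# Minimal German text generation sketch using Hugging Face transformers.
# dbmdz/german-gpt2 is chosen only because it is small; swap in any
# causal-LM model card from the table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dbmdz/german-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Heute ist ein schöner Tag"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding keeps the example deterministic; enable sampling
# (do_sample=True, temperature, top_p, ...) for more varied output.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Note that the encoder-decoder models in the table (FLAN-T5, mT0, mT5) are loaded with `AutoModelForSeq2SeqLM` instead of `AutoModelForCausalLM`.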


License:MIT License

