There are 41 repositories under the language-model topic.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can dynamically retrieve information to do so.
Code and documentation to train Stanford's Alpaca models, and generate the data.
Large-scale Chinese corpus for NLP.
🔍 Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-3 and the like). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
NeMo: a toolkit for conversational AI
A PyTorch-based Speech Toolkit
PyTorch implementation of Google AI's 2018 BERT.
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
An open source implementation of CLIP.
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard.
RWKV is an RNN with transformer-level LLM performance. It can be trained in parallel like a GPT, combining the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
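A toy sketch of why RNN-style inference is cheap (this is an illustration, not RWKV's actual formulas): the model carries a fixed-size state forward one token at a time, so each generation step costs O(1) regardless of context length, whereas transformer attention reads the whole history. The `decay` parameter and the moving-average update below are hypothetical stand-ins for a real recurrent cell.

```python
# Illustrative only: a fixed-size recurrent state updated per token.
# A real RWKV cell uses learned time-mixing and channel-mixing instead.

def rnn_step(state: float, x: float, decay: float = 0.9) -> float:
    """Exponential-moving-average update; memory use is constant per step."""
    return decay * state + (1.0 - decay) * x

def run(tokens):
    """Consume a sequence one token at a time; only `state` is kept."""
    state = 0.0
    for x in tokens:
        state = rnn_step(state, x)
    return state

print(run([1.0, 1.0]))  # each step touches one token and one scalar state
```

The key contrast with attention is that nothing about the history is stored beyond the state itself, which is what makes long-context inference fast and VRAM-light.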
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Implementation of BERT that can load official pre-trained models for feature extraction and prediction.
A curated list of pretrained sentence and word embedding models
Library to scrape and clean web pages to create massive datasets.
LSTM and QRNN Language Model Toolkit for PyTorch
Russian GPT-3 models.
C++ Implementation of PyTorch Tutorials for Everyone
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
The implementation of DeBERTa
🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
Self-contained Machine Learning and Natural Language Processing library in Go
Pre-trained Chinese ELECTRA models.
General technology for enabling AI capabilities w/ LLMs and MLLMs
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
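A minimal sketch of the idea behind LoRA (not the loralib API): instead of updating a full weight matrix W of shape d_out x d_in, train a low-rank pair B (d_out x r) and A (r x d_in) and use W + (alpha / r) * B @ A at forward time. All names and sizes below are illustrative.

```python
# Pure-Python sketch of the LoRA low-rank update; not loralib itself.

def matmul(X, Y):
    """Plain matrix multiply for the sketch."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight used at inference."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

# Toy sizes: d_out = d_in = 4, rank r = 1.
d_out, d_in, r, alpha = 4, 4, 1, 4
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
B = [[1.0] for _ in range(d_out)]       # d_out x r, trainable
A = [[0.25] * d_in for _ in range(r)]   # r x d_in, trainable

W_eff = lora_effective_weight(W, A, B, alpha, r)

# The point of LoRA: trainable parameters shrink from d_out*d_in
# to r*(d_out + d_in), while W itself stays frozen.
full_params = d_out * d_in           # 16 for this toy
lora_params = r * (d_out + d_in)     # 8 for this toy
```

Because B @ A can be merged into W after training, the adapted model adds no inference latency, which is the design choice the paper emphasizes.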
A curated collection of papers for the NLP practitioner 📖👩‍🔬
Korean BERT pre-trained cased (KoBERT)
Cramming the training of a (BERT-type) language model into limited compute.
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN