Boyuan Deng's starred repositories
Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
speechbrain
A PyTorch-based Speech Toolkit
open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Baichuan-7B
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
VisualGLM-6B
Chinese and English multimodal conversational language model
llm-numbers
Numbers every LLM developer should know
Baichuan-13B
A 13B large language model developed by Baichuan Intelligent Technology
langchain-serve
⚡ Langchain apps in production using Jina & FastAPI
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
setuptools_scm
The blessed package to manage your versions by SCM tags
scikit-build
Improved build system generator for CPython C, C++, Cython and Fortran extensions
vector-search-class-notes
Class notes for the course "Long Term Memory in AI - Vector Search and Databases" COS 597A @ Princeton Fall 2023
PromptCBLUE
PromptCBLUE: a large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain in Chinese
hai-platform
A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute
scikit-build-core
A next generation Python CMake adaptor and Python API for plugins
open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
hai-platform-studio
An integrated user interface for use with HAI Platform