Jason's Lab's repositories
ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
CIF-HieraDist
[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation
damaihelper
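Ticket-grabbing script for concerts and shows, supporting multiple platforms including 大麦网 (Damai), 淘票票 (Taopiaopiao), and 缤玩岛 (Binwandao).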
denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
funNLP
NLP tips
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
grok-1
Grok open release
hello-algo
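"Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code, supporting Java, C++, Python, Go, JS, TS, C#, Swift, Rust, Dart, Zig, and other languages.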
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
LanguageBind
[ICLR 2024🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
LaVIT
LaVIT: Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Llama2-Chinese
Llama Chinese community: the best Chinese Llama large language models, fully open source and commercially usable.
LLaSM
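The first open-source, commercially usable dialogue model to support bilingual (Chinese and English) speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.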
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
MetaGPT
🌟 The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo
MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
patchelf
A small utility to modify the dynamic linker and RPATH of ELF executables
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
RAG-Survey
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
SpeechTokenizer
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
stable-diffusion
A latent text-to-image diffusion model
stable-diffusion-webui
Stable Diffusion web UI
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks