
zsy23's starred repositories

LLM-Pretrain-FineTune

Deepspeed, LLM, Medical_Dialogue, medical large language model, pre-training, fine-tuning

Language: Python | 21800

MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Language: Python | License: Apache-2.0 | 299600

Huatuo-Llama-Med-Chinese

Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge.

Language: Python | License: Apache-2.0 | 440600

DISC-MedLLM

Repository of DISC-MedLLM, a comprehensive solution that leverages Large Language Models (LLMs) to provide accurate and truthful medical responses in end-to-end conversational healthcare services.

Language: Python | License: Apache-2.0 | 45200

RadFM

The official code for "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data".

Language: Python | 31100

XrayGLM

🩺 The first Chinese multimodal medical large language model that can read chest X-rays, aimed at chest radiograph summarization.

Language: Python | License: NOASSERTION | 84800

LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

Language: Python | License: MIT | 342900

awesome-deep-rl

A curated list of awesome Deep Reinforcement Learning resources.

License: MIT | 64000

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 436100

DISC-FinLLM

DISC-FinLLM, a Chinese financial large language model (LLM) designed to provide users with professional, intelligent, and comprehensive financial consulting services in financial scenarios.

Language: Python | License: Apache-2.0 | 51500

DISC-LawLLM

DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services

Language: Python | License: Apache-2.0 | 48700

dataset

A list of medical imaging datasets (An Index for Medical Imaging Datasets)

234400

MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python | License: BSD-3-Clause | 2516000

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | 736100

PayloadsAllTheThings

A list of useful payloads and bypasses for Web Application Security and Pentest/CTF

Language: Python | License: MIT | 5849000

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

1070300

TigerBot

TigerBot: A multi-language multi-task LLM

Language: Python | License: Apache-2.0 | 222400

books

A list of e-books collected by 编程随想 (multiple disciplines, with download links)

License: CC0-1.0 | 1788300

MNBVC

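MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even text written in Martian script (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.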

License: MIT | 323200

bookcorpus

Crawl BookCorpus

Language: Python | License: MIT | 79600

ebooks

A collection of classic e-books on history, politics, psychology, philosophy, mathematics, and computer science (about 100,000 books)

Language: JavaScript | 311500

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | 2664500

LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

License: MIT | 225400

spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge

Language: Python | License: Apache-2.0 | 75900

test-suite-sql-eval

Semantic Evaluation for Text-to-SQL with Distilled Test Suites

Language: Python | License: Apache-2.0 | 18500

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and VLLM, for local or cloud deployment. Includes demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | 1050600
