YanShuang17

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | License: NOASSERTION | 10831 88 299

QAnything

Question and Answer based on Anything.

Language: Python | License: Apache-2.0 | 10827 97 346

Chinese-LLaMA-Alpaca-2

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

Language: Python | License: Apache-2.0 | 7019 77 387

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | 6206 38 893

text2vec

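text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.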

Language: Python | License: Apache-2.0 | 4295 30 146

Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Language: Python | License: Apache-2.0 | 4041 40 387

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | 3535 33 1157

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Language: Python | License: MIT | 3325 27 265

BERT-NER-Pytorch

Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)

Language: Python | License: MIT | 2040 13 104

EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit

Language: Python | License: Apache-2.0 | 2011 36 122

RAG-Survey

1596 29 15

entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Language: Python | License: MIT | 1464 41 13

Awesome-Text2SQL

Curated tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis and more.

License: MIT | 1423 16 4

BCEmbedding

Netease Youdao's open-source embedding and reranker models for RAG products.

Language: Python | License: Apache-2.0 | 1239 9 68

OpenBuddy

Open Multilingual Chatbot for Everyone

License: Apache-2.0 | 1213 24 66

pymilvus

Python SDK for Milvus.

Language: Python | License: Apache-2.0 | 955 19 832

rank_bm25

A Collection of BM25 Algorithms in Python

Language: Python | License: Apache-2.0 | 930 10 31

Chinese-LlaMA2

Repo for adapting Meta LlaMA2 in Chinese! A Chinese-localized version of Meta's newly released LlaMA2 (fully open source and available for commercial use).

Language: Python | 748 17 11

LongBench

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Language: Python | License: MIT | 556 6 63

EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Language: Python | License: Apache-2.0 | 555 9 36

PIXIU

This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

Language: Jupyter Notebook | License: MIT | 465 7 9

Yanshuang's starred repositories

llama

LLaMA-Factory

docker_practice

jina

unsloth

ragflow

llama-recipes