
Vicuna-LangChain

A simple LangChain-like implementation based on Sentence Embedding+local knowledge base, with Vicuna (FastChat) serving as the LLM. Supports both Chinese and English, and can process PDF, HTML, and DOCX formats of documents as knowledge base.


Introduction

This is a very simple LangChain-like implementation. The detailed procedure is as follows (a minimal retrieval sketch follows the list):

  1. Extract the text from the documents in the knowledge base folder and split it into chunks of size chunk_length.
  2. Obtain the embedding of each text chunk with the shibing624/text2vec-base-chinese model.
  3. Calculate the cosine similarity between the embedding of the question and the embedding of each text chunk.
  4. Return the top k text chunks with the highest cosine similarity and build a prompt from them.
  5. Replace the prompt history with the initial question.
  6. Generate a response with Vicuna according to the prompt.
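
A minimal sketch of steps 1-4 under these assumptions: the shibing624/text2vec-base-chinese model is loaded through sentence-transformers, chunks are plain fixed-length slices, and the chunk_length/k values are illustrative. vicuna_cli.py may implement these details differently.

# Minimal sketch of steps 1-4 (illustrative; the repository's code may differ).
# Assumption: the shibing624/text2vec-base-chinese model loads via sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_length=300):
    # Step 1: split the extracted document text into fixed-size chunks.
    return [text[i:i + chunk_length] for i in range(0, len(text), chunk_length)]

def top_k_chunks(question, chunks, k=3):
    # Steps 2-4: embed question and chunks, rank by cosine similarity, keep the top k.
    model = SentenceTransformer("shibing624/text2vec-base-chinese")
    chunk_emb = model.encode(chunks, normalize_embeddings=True)            # (n, d)
    question_emb = model.encode([question], normalize_embeddings=True)[0]  # (d,)
    scores = chunk_emb @ question_emb  # cosine similarity, since vectors are unit-normalized
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]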

Demo

Tell me about the Wei Lun Hall at The University of Hong Kong (a student dormitory)

❌ Without knowledge base -> fabricated response ✅ With knowledge base -> factual response

Install

  1. Clone FastChat repository
git clone https://github.com/lm-sys/FastChat.git
  2. Add the following method after append_message (line 124) of FastChat/fastchat/conversation.py
def correct_message(self, message):
    # Replace the text of the latest user turn in the conversation history
    self.messages[-2][-1] = message
  3. Switch to FastChat/ and install the modified FastChat
cd FastChat
pip install .
  4. Obtain the Vicuna v1.1 weights following the instructions
  5. Clone this repository and install the requirements
git clone https://github.com/HaxyMoly/Vicuna-LangChain.git
pip install -r requirements.txt
  6. Create a folder named documents and put your documents in it. Note that only PDF, HTML, and DOCX documents are supported (a text-extraction sketch for these formats follows the steps).
cd Vicuna-LangChain
mkdir documents
  7. Have fun!
# With knowledge base
python vicuna_cli.py --vicuna-dir /path/to/vicuna/weights --knowledge-base

# Without knowledge base
python vicuna_cli.py --vicuna-dir /path/to/vicuna/weights
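
Referring to step 6, here is a rough sketch of how text could be pulled out of the three supported formats. The parsers shown (pypdf, beautifulsoup4, python-docx) are assumptions for illustration and are not necessarily the ones pinned in requirements.txt.

# Illustrative text extraction for the documents/ folder (step 6).
# Assumption: pypdf, beautifulsoup4, and python-docx are available; the repository
# may rely on different parsers.
from pathlib import Path
from pypdf import PdfReader
from bs4 import BeautifulSoup
import docx

def extract_text(path):
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(str(path))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix in (".html", ".htm"):
        soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
        return soup.get_text(separator="\n")
    if suffix == ".docx":
        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    raise ValueError(f"Unsupported format: {suffix}")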

Introduction

This project is a simple LangChain-like implementation. The details are as follows (a sketch of prompt construction and history correction follows the list):

  1. Extract the text of the documents in the knowledge base folder and split it into chunks of size chunk_length.
  2. Compute the embedding of each text chunk with the shibing624/text2vec-base-chinese model.
  3. Calculate the cosine similarity between the embedding of the question and the embedding of each text chunk.
  4. Return the k text chunks with the highest cosine similarity as context and build the prompt from them.
  5. Replace the prompt history with the original question.
  6. Pass the prompt to Vicuna to generate the answer.
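
A rough sketch of steps 4-6 above, assuming conv is a FastChat Conversation patched with the correct_message method from the install steps below; the prompt wording and the generate_fn callable are illustrative placeholders rather than the repository's exact code.

# Illustrative only: the exact prompt template and generation call in vicuna_cli.py
# are not reproduced here. conv is assumed to be a patched FastChat Conversation,
# and generate_fn is a hypothetical prompt -> text callable.
def ask_with_knowledge(conv, generate_fn, question, top_chunks):
    context = "\n".join(top_chunks)
    augmented = (
        "Answer the question based on the following information:\n"
        f"{context}\n\nQuestion: {question}"
    )
    conv.append_message(conv.roles[0], augmented)  # user turn carrying the retrieved chunks
    conv.append_message(conv.roles[1], None)       # placeholder for the assistant turn
    prompt = conv.get_prompt()                     # step 4: prompt built from the retrieved context
    conv.correct_message(question)                 # step 5: history keeps only the original question
    answer = generate_fn(prompt)                   # step 6: Vicuna generates from the prompt
    conv.messages[-1][-1] = answer                 # record the assistant reply in the history
    return answer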

Demo

Do you know the relationship between viruses and gene editing?

❌ Without knowledge base -> fabricated response ✅ With knowledge base -> factual response

Install

  1. Clone the FastChat repository
git clone https://github.com/lm-sys/FastChat.git
  2. Add the following method after the append_message function (line 124) of FastChat/fastchat/conversation.py
def correct_message(self, message):
    # Replace the text of the latest user turn in the conversation history
    self.messages[-2][-1] = message
  3. Switch to the FastChat/ directory and install the modified FastChat
cd FastChat
pip install .
  4. Obtain the Vicuna v1.1 weights following the Vicuna instructions
  5. Clone this repository and install the dependencies
git clone https://github.com/HaxyMoly/Vicuna-LangChain.git
pip install -r requirements.txt
  6. Create a folder named documents and put the documents you want to use as the knowledge base in it. Note that only PDF, HTML, and DOCX documents are supported.
cd Vicuna-LangChain
mkdir documents
  7. Give it a try!
# With knowledge base
python vicuna_cli.py --vicuna-dir /path/to/vicuna/weights --knowledge-base

# Without knowledge base
python vicuna_cli.py --vicuna-dir /path/to/vicuna/weights


License: Apache License 2.0

