[Bug] The RAG feature does not work well; please advise

sparkssssssss opened this issue a month ago

sparkssssssss commented a month ago

Bug Description

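The plugin is not invoked the way it is in the author's screenshot.
The vector files are visible in storage, but their contents are all garbled.

The conversation is as follows:

The PDF content is as follows: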

Steps to Reproduce

None provided.

Expected Behavior

None provided.

Deployment Method

Docker
Vercel
Server

Issues-translate-bot · Answer 1 · Wed Jun 05 2024 16:38:04 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

Title: [Bug] The RAG feature does not work well; please advise.

Hk-Gosuto · Answer 2 · Wed Jun 05 2024 19:15:41 GMT+0800 (China Standard Time)

Could you share the PDF so I can try it?

Issues-translate-bot · Answer 3 · Wed Jun 05 2024 19:15:54 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

Could you send the PDF so I can try it?

sparkssssssssss · Answer 4 · Wed Jun 05 2024 20:59:17 GMT+0800 (China Standard Time)

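I don't think the PDF is the problem. I have uploaded it several times before, and some uploads displayed normally. However, I have never seen the prompt indicating that the RAG plugin was called, so I would like to ask what the correct way to use it is.
So far I have tried:
1- Disabling all plugins: no effect
2- Disabling all plugins except the PDF viewer: no effect
3- Leaving the plugin settings untouched: no effect
I am using the latest nightly image.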

020-GB_T 33133.2-2021 信息安全技术祖冲之序列密码算法第2部分：保密性算法.pdf

Issues-translate-bot · Answer 5 · Wed Jun 05 2024 20:59:30 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

I don't think the PDF is the problem. I have uploaded it several times before, and some uploads displayed normally. However, I have never seen the prompt indicating that the RAG plugin was called, so I would like to ask what the correct way to use it is.
So far I have tried:
1- Disabling all plugins: no effect
2- Disabling all plugins except the PDF viewer: no effect
3- Leaving the plugin settings untouched: no effect
I am using the latest nightly image.

020-GB_T 33133.2-2021 Information Security Technology ZUC Stream Cipher Algorithm Part 2: Confidentiality Algorithm.pdf

Hk-Gosuto · Answer 6 · Thu Jun 06 2024 20:44:10 GMT+0800 (China Standard Time)

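I tried this PDF. The file appears to have been specially processed: text copied directly from it is also garbled, so the text cannot be parsed correctly from this kind of file.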

Issues-translate-bot · Answer 7 · Thu Jun 06 2024 20:44:23 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

I tried this PDF. The file appears to have been specially processed: text copied directly from it is also garbled, so the text cannot be parsed correctly from this kind of file.
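
The readability check described above (whether text extracted from a PDF is usable) can be approximated before indexing. The following is a minimal Python sketch, not the project's actual parser: the function names `garbled_ratio` and `looks_garbled` and the 0.3 threshold are hypothetical. It only flags private-use, unassigned, control, surrogate, and replacement characters; a PDF whose fonts map glyphs to valid but wrong characters would pass this check and still needs the manual copy-and-paste test.

```python
import unicodedata

# Unicode categories that rarely appear in real text but often appear
# when a PDF font has no usable character mapping (assumed heuristic).
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc", "Cs"}


def garbled_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # nothing extractable is treated as unusable
    suspicious = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose share of suspicious characters reaches the threshold."""
    return garbled_ratio(text) >= threshold
```

A file flagged by such a check would be better rejected at upload time than embedded, since vectors built from garbled text cannot return useful context.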

sparkssssssss · Answer 8 · Fri Jun 07 2024 13:07:15 GMT+0800 (China Standard Time)

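I tried this PDF. The file appears to have been specially processed: text copied directly from it is also garbled, so the text cannot be parsed correctly from this kind of file.

I uploaded an arbitrary PDF and the vector files were generated, but asking questions about it yields no meaningful answers.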

Issues-translate-bot · Answer 9 · Fri Jun 07 2024 13:07:28 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

I tried this PDF. The file appears to have been specially processed: text copied directly from it is also garbled, so the text cannot be parsed correctly from this kind of file. ![image](https://private-user-images.githubusercontent.com/14031260/337267597-1b67047c-e8f2-43fd-b1f5-02437945d995.png)

I uploaded an arbitrary PDF and the vector files were generated, but asking questions about it yields no meaningful answers.

Hk-Gosuto · Answer 10 · Tue Jun 11 2024 21:57:39 GMT+0800 (China Standard Time)

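Retrieval is currently implemented as a plugin, so you need to guide GPT to use rag-search before it will return the relevant context.
I have a lot going on this month, so please wait for later optimizations.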

Issues-translate-bot · Answer 11 · Tue Jun 11 2024 21:57:51 GMT+0800 (China Standard Time)

Bot detected the issue body's language is not English, translate it automatically.

Retrieval is currently implemented as a plugin, so you need to guide GPT to use rag-search before it will return the relevant context.
I have a lot going on this month, so please wait for later optimizations.
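
As a rough illustration of guiding the model toward the plugin, the sketch below builds a user message that names `rag-search` explicitly. Only the plugin name comes from this thread; the function `build_rag_prompt` and the prompt wording are assumptions, and whether the model actually calls the plugin still depends on the model and on the plugin being enabled.

```python
def build_rag_prompt(question: str, tool_name: str = "rag-search") -> str:
    """Wrap a question in an instruction that names the retrieval plugin.

    Hypothetical helper: the wording is illustrative, not taken from the project.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return (
        f"Use the {tool_name} plugin to search the uploaded document, "
        "then answer based only on the retrieved context.\n\n"
        f"Question: {question}"
    )
```

Sending the returned string as the chat message, instead of the bare question, gives the model an explicit reason to call the plugin.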