QuivrHQ / quivr

Your GenAI Second Brain 🧠 A personal productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) and apps using Langchain, GPT 3.5 / 4 Turbo, Private, Anthropic, VertexAI, Ollama, Groq, and other LLMs, that you can share with users! A local and private alternative to OpenAI GPTs & ChatGPT, powered by retrieval-augmented generation.

Home Page: https://quivr.app

[Bug]: Knowledge not being returned from brain in latest release

Croccodoyle opened this issue · comments

What happened?

Latest build. Context is visible in a doc in the brain, but I get the following message: "2024-03-09 12:41:07 [INFO] modules.brain.knowledge_brain_qa [knowledge_brain_qa.py:107]: No source documents found or source_documents is not a list." Could this be because I switched the brain from an Ollama model to an OpenAI model? When I check knowledge_brain_qa.py, Pylance reports `Import "langchain.callbacks.streaming_aiter" could not be resolved (reportMissingImports)` and `Import "pydantic_settings" could not be resolved (reportMissingImports)`.

Relevant log output

No response

Twitter / LinkedIn details

No response

Based on the error message you're receiving, the source_documents parameter passed to the generate_source function is probably either None or not a list. You can add a type check at the beginning of generate_source so that it only proceeds when source_documents is a list. Here's an example of how you can do this:

def generate_source(self, source_documents):
    # Fail fast with a clear error if the caller passes None or a non-list
    if not isinstance(source_documents, list):
        raise TypeError("source_documents must be a list")
    # ... rest of the function
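Alternatively, since the line in your log is an INFO message rather than a traceback, the code appears to degrade gracefully instead of crashing. Here is a minimal sketch of that defensive pattern; the logger setup and the empty-list return are assumptions for illustration, not Quivr's actual implementation:

```python
import logging

logger = logging.getLogger("modules.brain.knowledge_brain_qa")

def generate_source(source_documents):
    # Log and return no sources instead of raising, mirroring
    # the INFO message quoted in the issue report
    if not source_documents or not isinstance(source_documents, list):
        logger.info(
            "No source documents found or source_documents is not a list."
        )
        return []
    sources = []
    # ... the real function would build Sources objects here ...
    return sources
```

With this guard, a None or malformed source_documents produces the log line from your report while the chat response still returns, just without source citations.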

Regarding the unresolved imports of langchain.callbacks.streaming_aiter and pydantic_settings in the knowledge_brain_qa.py file, it seems like the modules or packages might not be installed in your environment, or there could be a typo in the module name.

For the langchain.callbacks.streaming_aiter import, make sure the langchain package is installed in the environment Pylance is analyzing (e.g., pip install langchain, or the project's own dependency tooling). If the package is installed but Pylance still flags the import, check that your editor has selected the project's virtual environment as its Python interpreter. Also note that LangChain reorganizes its module layout between releases, so verify that your installed version still provides langchain.callbacks.streaming_aiter.

For the pydantic_settings import, pydantic_settings is a real package, not a typo. Starting with Pydantic v2, the BaseSettings class used for configuration management was moved out of the core library into the separate pydantic-settings package. Install it and import from there:

pip install pydantic-settings

from pydantic_settings import BaseSettings

If your project is pinned to Pydantic v1, BaseSettings still lives in the core package:

from pydantic import BaseSettings

Either way, the Pylance warning means the relevant package is missing from the environment your editor is analyzing, so install it into that environment.

Regarding the structure and expected format of the source_documents parameter in the generate_source function, it is expected to be an iterable collection of objects, where each object represents a document and contains a metadata attribute. This metadata attribute is a dictionary that may include keys such as original_file_name and file_name. The function processes these documents to generate a list of Sources objects, where each Sources object includes the document's name, type (url or file), source URL, and original file name.

Here's a breakdown of the expected structure based on the code:

  • Each item in source_documents should be an object that has a metadata dictionary.
  • The metadata dictionary should have:
    • original_file_name: Used to determine if the document is a URL (starts with "http") and to get the document's name if it is a URL.
    • file_name: Used to get the document's name if it is not a URL.
  • The function checks if original_file_name exists and starts with "http" to classify the document as a URL; otherwise, it is treated as a file.
  • For files, a file_path is constructed using brain_id and the file_name from metadata, and a signed URL is generated for accessing the file.
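The steps above can be sketched as a standalone function. The Doc class, the brain_id value, and the path format are illustrative assumptions; the real code builds Sources objects and generates signed URLs for file paths:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # Stand-in for a document object carrying a metadata dict
    metadata: dict = field(default_factory=dict)

def classify_sources(source_documents, brain_id="example-brain"):
    """Return (name, type, source_path) tuples following the logic above."""
    results = []
    for doc in source_documents:
        meta = doc.metadata
        original = meta.get("original_file_name")
        if original and original.startswith("http"):
            # URL document: the URL doubles as the document's name
            results.append((original, "url", original))
        else:
            # File document: name comes from file_name, and the path is
            # built from brain_id + file_name (then signed in the real code)
            name = meta.get("file_name", "unknown")
            results.append((name, "file", f"{brain_id}/{name}"))
    return results
```

Feeding it one URL-style and one file-style document yields one "url" entry and one "file" entry, matching the classification rules described above.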

Example of a minimal source_documents item structure based on the description (shown as plain dicts for readability; in the actual code each item is a document object whose metadata attribute holds this dictionary):

source_documents = [
    {
        "metadata": {
            "original_file_name": "http://example.com/document1.pdf",
            "file_name": "document1.pdf"
        }
    },
    {
        "metadata": {
            "file_name": "document2.pdf"
        }
    }
]

This example includes two documents, one treated as a URL and the other as a file, based on the presence and content of the original_file_name key.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.