activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

[BUG] Langchain & Deeplake: SelfQueryRetriever Error on querying code

kaan9700 opened this issue

Severity

None

Current Behavior

I have a Deep Lake vector database containing the code chunks of a project. Given an issue, I want to find the code chunks relevant to it, so I wrote a SelfQueryRetriever.
However, it throws an error whenever the query contains an expression like 'train.py script'. Without that expression, there is no error. The retriever is supposed to work automatically for any issue, so telling issue authors to avoid such expressions is not an option.

Steps to Reproduce

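import traceback

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores import DeepLake


def CustomRetriever(files, dataset_path, issue):
    # Metadata fields the LLM may turn into filters when it builds a structured query.
    metadata_field_info = [
        AttributeInfo(
            name="source",
            description="The source file the chunk was extracted from",
            type="string",
        ),
        AttributeInfo(
            name="file_name",
            description="The name of the file the chunk was extracted from",
            type="string",
        ),
        AttributeInfo(
            name="chunk_id",
            description="The id of the chunk",
            type="string",
        ),
    ]
    document_content_description = "The source code of a project"
    model = ChatOpenAI(model="gpt-4")

    embeddings = OpenAIEmbeddings(disallowed_special=())
    # exec_option='python' runs searches in pure Python rather than a TQL-capable engine.
    db = DeepLake(dataset_path=dataset_path, read_only=True, embedding=embeddings, exec_option='python')
    # Unfiltered search over all chunks (the result is not used below).
    docs = db.similarity_search(query=" ", k=10000000)
    retriever = SelfQueryRetriever.from_llm(
        model, db, document_content_description, metadata_field_info, verbose=True
    )
    try:
        # The call that raises the error
        print('TEST', retriever.get_relevant_documents(
            f"Which documents contain code to resolve the following issue? -> {issue}"))
    except ValueError:
        print(traceback.format_exc())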

Here is the error:

query='CNN instead of BERT model in train.py script, handle data better, generated using Tensorflow, integrated into logic, adapted to word vectors, change code' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='source', value='train.py'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='file_name', value='train.py')]) limit=None

Traceback (most recent call last):
  File "/Users/kaanerbay/GitHub/Github_Issue_Solver/langchainLogic/retriever2.py", line 93, in CustomRetriever
    print('TEST', retriever.get_relevant_documents(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/schema/retriever.py", line 208, in get_relevant_documents
    raise e
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/schema/retriever.py", line 201, in get_relevant_documents
    result = self._get_relevant_documents(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/retrievers/self_query/base.py", line 135, in _get_relevant_documents
    docs = self.vectorstore.search(new_query, self.search_type, **search_kwargs)
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/base.py", line 121, in search
    return self.similarity_search(query, **kwargs)
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 475, in similarity_search
    return self._search(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 348, in _search
    return self._search_tql(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/langchain/vectorstores/deeplake.py", line 267, in _search_tql
    result = self.vectorstore.search(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 429, in search
    utils.parse_search_args(
  File "/Users/kaanerbay/miniconda3/envs/main/lib/python3.10/site-packages/deeplake/core/vectorstore/vector_search/utils.py", line 229, in parse_search_args
    raise ValueError(
ValueError: User-specified TQL queries are not support for exec_option=python.
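The printed structured query and the last line of the traceback together suggest a likely chain of events. Because the query names train.py, the LLM extracts a metadata filter (source == 'train.py' AND file_name == 'train.py'). That filter is turned into a TQL query, and a TQL query is rejected when exec_option='python'. Queries without a file name presumably produce no filter and therefore no TQL. The sketch below models this chain under stated assumptions: Comparison, Operation, to_tql, check_search_args and plan_search are hypothetical stand-ins, not LangChain's translator or deeplake's parse_search_args, and the TQL syntax shown is only illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Comparison:
    """Hypothetical stand-in for one `attribute <comparator> value` condition."""
    comparator: str  # e.g. "eq"
    attribute: str
    value: str


@dataclass
class Operation:
    """Hypothetical stand-in for a logical combination of conditions."""
    operator: str  # "and" or "or"
    arguments: List[Union["Comparison", "Operation"]]


# Assumed mapping from comparator names to TQL operators (illustrative only).
COMPARATORS = {"eq": "==", "ne": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def to_tql(node: Union[Comparison, Operation]) -> str:
    """Recursively render a filter tree as a TQL-style WHERE condition."""
    if isinstance(node, Comparison):
        return f"metadata['{node.attribute}'] {COMPARATORS[node.comparator]} '{node.value}'"
    joiner = f" {node.operator} "
    return "(" + joiner.join(to_tql(arg) for arg in node.arguments) + ")"


def check_search_args(exec_option: str, tql_query: Optional[str]) -> None:
    """Reject a TQL query on the pure-Python execution path, like the reported error."""
    if tql_query is not None and exec_option == "python":
        raise ValueError("TQL queries are not supported for exec_option='python'")


def plan_search(flt: Optional[Union[Comparison, Operation]], exec_option: str) -> Optional[str]:
    """No filter -> plain vector search; a filter -> a TQL query that must pass the check."""
    tql = None if flt is None else "SELECT * WHERE " + to_tql(flt)
    check_search_args(exec_option, tql)
    return tql


# Query without a file name: no filter extracted, so the python path works.
print(plan_search(None, "python"))  # None

# The filter from the log above: rendered to TQL, then rejected.
flt = Operation("and", [
    Comparison("eq", "source", "train.py"),
    Comparison("eq", "file_name", "train.py"),
])
try:
    plan_search(flt, "python")
except ValueError as err:
    print("ValueError:", err)
```

If this model is right, an exec_option that can execute TQL (deeplake 3.x documents "compute_engine" and "tensor_db" besides "python") might avoid the error. This has not been verified against the package versions reported in this issue.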

Here is the issue text used:

a CNN should be used instead of the BERT model in the train.py script, because it can handle the type of data better.
The CNN should not be too complex, but also not too simple and should be generated using Tensorflow.
The CNN should be integrated into the logic and adapted according to the word vectors used. Change the code of it, as good as you can.

Expected/Desired Behavior

Queries that mention file names should work like any other query. Currently, if you replace the expression 'train.py script' with, for example, 'training process', the error disappears and the query executes correctly.
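Replacing 'train.py script' with 'training process' avoids the error. This suggests the trigger is an explicit file name, which gives the LLM something to turn into a metadata filter. As a hypothetical pre-check (a heuristic of our own, not part of LangChain or deeplake), issue text could be scanned for file-name-like tokens before it is sent to the retriever. The extension list below is an assumption.

```python
import re
from typing import List

# Assumed, non-exhaustive set of extensions that mark a token as a file name.
FILE_NAME_PATTERN = re.compile(r"\b[\w-]+\.(?:py|json|js|ts|md|txt|yaml|yml)\b")


def mentioned_file_names(text: str) -> List[str]:
    """Return file-name-like tokens (e.g. 'train.py') found in free text."""
    # The pattern has only non-capturing groups, so findall returns whole matches.
    return FILE_NAME_PATTERN.findall(text)


print(mentioned_file_names("a CNN should be used instead of the BERT model in the train.py script"))
```

A non-empty result only flags queries likely to produce a filter. It does not fix the underlying exec_option problem.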

Python Version

3.10.13

OS

MacOS Ventura 13.5.2

IDE

PyCharm

Packages

langchain==0.0.293, lark==1.1.7, deeplake==3.6.26

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR (Thank you!)

Hey @kaan9700!

Thanks for sharing the error! I'm curious: have you installed the latest deeplake version? And have you installed deeplake[enterprise]? The problem is related to exec_option not being cast correctly. This can happen either because you're using an old deeplake version or because deeplake[enterprise] is not installed.

To install deeplake[enterprise] please run the following command:

pip install 'deeplake[enterprise]'
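Both questions (deeplake version and enterprise extra) can be answered from the same interpreter that runs the retriever with a small standard-library check. This sketch assumes the enterprise extra installs a distribution named libdeeplake. That name is an assumption and may differ between releases.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


# "libdeeplake" is assumed to be the package pulled in by deeplake[enterprise].
print(installed_versions(["deeplake", "libdeeplake", "langchain"]))
```

Run it in the environment PyCharm uses for the project so the result matches what the retriever actually imports.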

Hey @adolkhan
I already have deeplake[enterprise] installed, so unfortunately that is not the solution to this error. I also have the latest deeplake version installed.

I noticed that this error occurs when the query contains file names like 'train.py' or 'package.json' together with other text.
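The failure appears only when a filter is extracted. One possible workaround while exec_option stays 'python' is to run a plain similarity search (for example db.similarity_search(query, k=...)) and apply the metadata condition in Python afterwards, instead of letting it be translated into TQL. The sketch below is hypothetical: Doc stands in for LangChain's Document (which also exposes page_content and metadata), and the filter is reduced to simple equality checks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Stand-in for a retrieved document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_by_metadata(docs: List[Doc], equals: Dict[str, str]) -> List[Doc]:
    """Keep only documents whose metadata matches every key/value pair in `equals`."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in equals.items())
    ]


docs = [
    Doc("model = BertModel(...)", {"source": "train.py", "file_name": "train.py"}),
    Doc('{"name": "app"}', {"source": "package.json", "file_name": "package.json"}),
]
# Equality conditions mirroring the filter the LLM extracted in the log above.
hits = filter_by_metadata(docs, {"source": "train.py", "file_name": "train.py"})
print([d.metadata["file_name"] for d in hits])  # ['train.py']
```

If source stores full paths rather than bare file names, an exact match on 'train.py' may match nothing. Filtering on file_name alone could be more robust in that case.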

I see, thank you! I'll rerun the script and get back to you.