tensorlakeai / indexify

A realtime indexing and structured extraction engine for Unstructured Data to build Generative AI Applications

Home Page: https://getindexify.ai

Returning sources for a RAG

tzumby opened this issue

Hi there,

I'm trying to return the content sources when running questions through a basic RAG pipeline.
I found this example in LangChain that looks very similar to the way you're retrieving answers in the basic RAG example.

def format_docs(docs):
    # Debug output: print the text of each retrieved document
    for doc in docs:
        print(doc.page_content)
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain_from_docs = (
    {"context": retriever, "question": RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))}
    | prompt
    | model
    | StrOutputParser()
)

When I try to print the context, I get a list of documents:

[Document(page_content="Content 1"), Document(page_content="Content 2")]

But when I add this:

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

I get a TypeError: Object of type Document is not JSON serializable
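For anyone reading along, that message is the one Python's standard `json` module produces when it is handed an object it has no encoder for. Here is a minimal sketch that reproduces it without LangChain or Indexify. The `Document` class below is a hypothetical stand-in, not the real LangChain class, and this says nothing about where in the chain the serialization actually happens.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def try_dump(payload):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(payload)
    except TypeError as exc:
        return f"TypeError: {exc}"


docs = [Document("Content 1"), Document("Content 2")]

# Handing the objects straight to json.dumps fails, because the encoder
# only knows built-in types such as dict, list, str and numbers.
raw_result = try_dump({"context": docs})

# Converting each document to a plain dict first lets the encoder succeed.
plain_result = try_dump({"context": [asdict(d) for d in docs]})

print(raw_result)
print(plain_result)
```

So wherever the chain ends up sending `Document` objects into a JSON encoder (an HTTP request body, for example), that step needs plain strings or dicts instead.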

I'm actively debugging this and I feel I'm missing something very basic, but any help would be much appreciated!
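To make the intended data flow easier to reason about, here is a rough plain-Python sketch of what the `RunnableParallel(...).assign(answer=...)` pattern is meant to produce: a dict that keeps the retrieved documents next to the answer. Every function here (`retrieve`, `generate_answer`, `answer_with_sources`) is a hypothetical stand-in and not a LangChain or Indexify API.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def retrieve(question):
    """Hypothetical retriever: always returns the same two documents."""
    return [Document("Content 1"), Document("Content 2")]


def format_docs(docs):
    """Join the document texts into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def generate_answer(inputs):
    """Hypothetical stand-in for prompt | model | StrOutputParser()."""
    context = format_docs(inputs["context"])
    return (
        f"Stub answer to {inputs['question']!r} "
        f"using {len(inputs['context'])} sources ({len(context)} characters)"
    )


def answer_with_sources(question):
    # Mirrors RunnableParallel({"context": retriever, "question": RunnablePassthrough()}):
    # both keys are computed from the same input question.
    result = {"context": retrieve(question), "question": question}
    # Mirrors .assign(answer=...): add one key and keep the existing ones,
    # so the source documents survive alongside the answer.
    result["answer"] = generate_answer(result)
    return result


result = answer_with_sources("What is in the documents?")
print(result["answer"])
print([doc.page_content for doc in result["context"]])
```

One thing the sketch makes visible: the answer step receives a dict whose `context` already holds the retrieved documents, not the bare question string, so every step inside it has to accept that dict as input.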

@tzumby Looking into this. Thanks for reporting.

@tzumby This looks like an open issue in LangChain, and a workaround is mentioned here: langchain-ai/langchain#2222 (comment)

Please let me know if this solves your issue!

Thank you @diptanu, I'm going to track this there. I'm taking a step back and trying to understand LangChain a bit more, because even with the pointers there I still can't get it to work. Once I figure this out, I can contribute some docs to Indexify if you think they would fit in with the rest of the resources.