Returning sources for a RAG

Question

tzumby opened this issue 3 months ago

Hi there,

I'm trying to return the source documents along with the answer when running questions through a basic RAG.
I found this example in LangChain that looks very similar to the way you're retrieving answers in the basic RAG example.

def format_docs(docs):
    print(docs)  # debug: inspect the retrieved documents (`doc` is not defined at this point)
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain_from_docs = (
    {"context": retriever, "question": RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))}
    | prompt
    | model
    | StrOutputParser()
)

When I try to print the context, I get a list of documents:

[Document(page_content="Content 1"), Document(page_content="Content 2")]

But when I add this:

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

I get a TypeError: Object of type Document is not JSON serializable

I'm actively debugging this and I feel I'm missing something very basic, but any help would be much appreciated!

Diptanu Choudhury · Answer 1 · Wed Mar 13 2024 10:43:41 GMT+0800 (China Standard Time)

@tzumby Looking into this. Thanks for reporting.

Diptanu Choudhury · Answer 2 · Wed Mar 13 2024 10:50:17 GMT+0800 (China Standard Time)

@tzumby This looks like an open issue in LangChain, and a workaround is mentioned here: langchain-ai/langchain#2222 (comment)

Please let me know if this solves your issue!
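
For anyone hitting the same TypeError: the message comes from Python's json module, which cannot encode arbitrary objects such as Document. The sketch below is not the workaround from the linked comment. It only illustrates one common approach, which is to convert each document to a plain dict before serializing. The Document class is a hypothetical stand-in so the example runs without LangChain, and document_to_dict, serialize_result, and the result shape (context, question, answer) are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def document_to_dict(doc):
    # Keep only plain, JSON-friendly fields from the document.
    return {"page_content": doc.page_content, "metadata": doc.metadata}


def serialize_result(result):
    # Copy the result and replace the Document objects under "context"
    # with plain dicts; the other keys are left untouched.
    serializable = dict(result)
    serializable["context"] = [document_to_dict(d) for d in result["context"]]
    return json.dumps(serializable)


# Assumed output shape: retrieved documents under "context",
# plus the original question and the generated answer.
result = {
    "context": [Document("Content 1"), Document("Content 2")],
    "question": "What is in the docs?",
    "answer": "An example answer.",
}

# Calling json.dumps(result) directly would raise:
# TypeError: Object of type Document is not JSON serializable
payload = serialize_result(result)
print(payload)
```

With a real chain, the same conversion would be applied to the chain's output before it is serialized. Whether that is the point where the error arises in this setup is an assumption.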

Raz · Answer 3 · Thu Mar 14 2024 01:28:05 GMT+0800 (China Standard Time)

Thank you @diptanu, I'm going to track this there. I'm taking a step back and trying to understand LangChain a bit more because even with the pointers there, I still can't get it to work. Once I figure this out, I can contribute some docs to Indexify if you think they would fit in with the rest of the resources.