Returning sources for a RAG
tzumby opened this issue · comments
Hi there,
I'm trying to return the content sources when running questions through a basic RAG.
I found this example in the LangChain docs that looks very similar to the way you're retrieving answers in the basic RAG example.
def format_docs(docs):
    # Debug output: print each retrieved document before joining them.
    for doc in docs:
        print(doc.page_content)
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain_from_docs = (
    {"context": retriever, "question": RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))}
    | prompt
    | model
    | StrOutputParser()
)
When I try to print the context, I get a list of documents:
[Document(page_content="Content 1"), Document(page_content="Content 2")]
But when I add this:
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
I get a TypeError: Object of type Document is not JSON serializable
I'm actively debugging this and I feel I'm missing something very basic, but any help would be much appreciated!
@tzumby Looking into this. Thanks for reporting.
@tzumby This looks like an open issue in LangChain, and a workaround is mentioned here: langchain-ai/langchain#2222 (comment)
Please let me know if this solves your issue!
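For reference, here is a minimal sketch of one common way around this kind of error. It is not necessarily the workaround from the linked comment. It assumes the TypeError is raised when something calls json.dumps on a result that still contains Document objects. The Document class below is a hypothetical stand-in with only the two fields used here, so the snippet runs without LangChain installed, and the sample values and the "source" metadata key are invented for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def document_to_dict(doc: Document) -> dict:
    """Convert a Document into a plain dict that json can handle."""
    return {"page_content": doc.page_content, "metadata": doc.metadata}


def encode_document(obj: Any) -> Any:
    """Fallback for json.dumps; called only for objects json cannot serialize itself."""
    if isinstance(obj, Document):
        return document_to_dict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Assumed shape of the output of a chain that keeps its sources (values invented).
result = {
    "context": [
        Document("Content 1", {"source": "a.txt"}),
        Document("Content 2", {"source": "b.txt"}),
    ],
    "question": "What is in the documents?",
    "answer": "An example answer.",
}

# Serializes without raising, because Documents are converted on the fly.
serialized = json.dumps(result, default=encode_document)

# The sources can also be pulled out directly from the retrieved documents.
sources = [doc.metadata.get("source") for doc in result["context"]]
```

If the serialization happens inside a library call rather than in your own code, the same conversion can be applied earlier by mapping document_to_dict over the retrieved documents before passing them on.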
Thank you @diptanu, I'm going to track this there. I'm taking a step back and trying to understand LangChain a bit more, because even with the pointers there I still can't get it to work. Once I figure this out, I can contribute some docs to Indexify if you think they would fit in with the rest of the resources.