Possible issue with filters not working with Langchainrb and Pinecone (serverless)
sbraford opened this issue · comments
Hello. I'm guessing I am just doing something incorrectly here, but on the remote chance that there's an underlying issue with langchainrb, I thought I'd post this here.
I've distilled our code down to the basics here. At the core, we are just trying to load a piece of legislation's bill text into Pinecone, then pass that context to ChatGPT and generate a summary of the legislation. It seems the filtering is not working. Sometimes langchain returns a summary that has nothing to do with the filtered context provided, and is instead clearly getting context from another text that has been loaded into Pinecone. Other times, langchain says the bill's "content and implications are not provided in the context." We are using Pinecone's serverless feature (in case that makes a difference).
An example service object that loads the data into Pinecone:
class PineconeService
  def load_into_pinecone
    client.add_texts(
      ids: [123],
      texts: [bill_text],
      metadata: metadata
    )
  end

  # Full bill text omitted here for the sake of brevity. But in testing, we fully load it.
  def bill_text
    "Senate File 19 - Introduced SENATE FILE 19 An Act regarding donated leave by state employees ..."
  end

  def metadata
    {
      id: 123,
      title: "Legislation text for bill SF19",
      bill_number: "SF19",
    }
  end

  def client
    @client ||= Langchain::Vectorsearch::Pinecone.new(
      api_key: ENV['PINECONE_API_KEY'],
      index_name: "ia-development",
      environment: "us-east-1-aws",
      llm: llm
    )
  end

  def llm
    @llm ||= Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_ACCESS_TOKEN"])
  end
end
And then the Langchain service which prompts langchain to provide the bill summary:
class LangchainService
  def process_bill
    client.ask(question: prompt, filter: pinecone_filter)
  end

  def prompt
    "You are an expert at summarizing legislation. Please summarize the legislation in the provided context."
  end

  def pinecone_filter
    { bill_number: bill.bill_number }
  end

  def client
    @client ||= Langchain::Vectorsearch::Pinecone.new(
      api_key: ENV['PINECONE_API_KEY'],
      index_name: "ia-development",
      environment: "us-east-1-aws",
      llm: llm
    )
  end

  def llm
    llm_options = { completion_model_name: "gpt-4-1106-preview", chat_completion_model_name: "gpt-4-1106-preview" }
    @llm ||= Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_ACCESS_TOKEN"], default_options: llm_options)
  end
end
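One thing worth checking on the filter itself: Pinecone's metadata filtering treats a bare value like `{ bill_number: "SF19" }` as shorthand for an explicit `$eq` operator. A minimal sketch of the equivalent explicit form, assuming langchainrb passes the filter hash through to Pinecone unchanged:

```ruby
# Hypothetical helper: builds the explicit-operator form of the filter.
# Pinecone interprets { bill_number: "SF19" } as the $eq form below,
# so writing it out explicitly rules out any shorthand-handling issues.
def pinecone_filter(bill_number)
  { "bill_number" => { "$eq" => bill_number } }
end

pinecone_filter("SF19")
# => { "bill_number" => { "$eq" => "SF19" } }
```

This would then be passed as `client.ask(question: prompt, filter: pinecone_filter("SF19"))`.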
Any pointers in the right direction would be greatly appreciated!
I think it has something to do with not chunking the text correctly before submitting it to Pinecone. I'll close this ticket if I figure it out on my own!
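For reference, a minimal sketch of what chunking before `add_texts` might look like. This is a hypothetical fixed-size splitter with overlap (langchainrb ships real chunkers that could replace it); the point is that each chunk gets its own id while repeating the bill's metadata, so the filter still matches every chunk:

```ruby
# Hypothetical chunker: splits text into fixed-size pieces with overlap
# so each embedded vector stays within the embedding model's limits.
def chunk_text(text, chunk_size: 1000, overlap: 100)
  chunks = []
  step = chunk_size - overlap
  (0...text.length).step(step) do |start|
    chunks << text[start, chunk_size]
    # Stop once a chunk reaches the end of the text.
    break if start + chunk_size >= text.length
  end
  chunks
end

# Each chunk would then be loaded with its own id and the shared metadata:
#   chunks = chunk_text(bill_text)
#   client.add_texts(
#     ids: chunks.each_index.map { |i| "123-#{i}" },
#     texts: chunks,
#     metadata: metadata
#   )
```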
I'm guessing this isn't a langchainrb-related issue, since no one else has reported it before. I'm going to go ahead and close this.