Possible issue with filters not working with Langchainrb and Pinecone (serverless)
sbraford opened this issue · comments
Hello. I'm guessing I am just doing something incorrectly here, but on the remote chance that there's an underlying issue with langchainrb, I thought I'd post this here.
I've distilled our code down to the basics here. At the core, we are just trying to load a piece of legislation's bill text into Pinecone, then pass that context to ChatGPT and generate a summary of the legislation. It seems the filtering is not working. Sometimes langchain returns a summary that has nothing to do with the filtered context provided, and is instead clearly getting context from another text that has been loaded into Pinecone. Other times, langchain says the bill's "content and implications are not provided in the context." We are using Pinecone's serverless feature (in case that makes a difference).
An example service object that loads the data into Pinecone:
class PineconeService
  def load_into_pinecone
    client.add_texts(
      ids: [123],
      texts: [bill_text],
      metadata: metadata
    )
  end

  # Full bill text omitted here for the sake of brevity. But in testing, we fully load it.
  def bill_text
    "Senate File 19 - Introduced SENATE FILE 19 An Act regarding donated leave by state employees ..."
  end

  def metadata
    {
      id: 123,
      title: "Legislation text for bill SF19",
      bill_number: "SF19",
    }
  end

  def client
    @client ||= Langchain::Vectorsearch::Pinecone.new(
      api_key: ENV['PINECONE_API_KEY'],
      index_name: "ia-development",
      environment: "us-east-1-aws",
      llm: llm
    )
  end

  def llm
    @llm ||= Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_ACCESS_TOKEN"])
  end
end
And then the Langchain service which prompts langchain to provide the bill summary:
class LangchainService
  def process_bill
    client.ask(question: prompt, filter: pinecone_filter)
  end

  def prompt
    "You are an expert at summarizing legislation. Please summarize the legislation in the provided context."
  end

  def pinecone_filter
    { bill_number: bill.bill_number }
  end

  def client
    @client ||= Langchain::Vectorsearch::Pinecone.new(
      api_key: ENV['PINECONE_API_KEY'],
      index_name: "ia-development",
      environment: "us-east-1-aws",
      llm: llm
    )
  end

  def llm
    llm_options = { completion_model_name: "gpt-4-1106-preview", chat_completion_model_name: "gpt-4-1106-preview" }
    @llm ||= Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_ACCESS_TOKEN"], default_options: llm_options)
  end
end
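One thing worth checking on the filter itself: Pinecone's metadata filtering treats a bare value like `{ bill_number: "SF19" }` as shorthand for an explicit `$eq` operator. A minimal sketch of the equivalent explicit form, assuming langchainrb passes the filter hash through to Pinecone unchanged:

```ruby
# Hypothetical helper: builds the explicit-operator form of the filter.
# Pinecone interprets { bill_number: "SF19" } as the $eq form below,
# so writing it out explicitly rules out any shorthand-handling issues.
def pinecone_filter(bill_number)
  { "bill_number" => { "$eq" => bill_number } }
end

pinecone_filter("SF19")
# => { "bill_number" => { "$eq" => "SF19" } }
```

This would then be passed as `client.ask(question: prompt, filter: pinecone_filter("SF19"))`.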
Any pointers in the right direction would be greatly appreciated!
I think it has something to do with not chunking the text correctly before submitting it to Pinecone. I'll close this ticket if I figure it out on my own!
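For reference, a minimal sketch of what chunking before `add_texts` might look like. This is a hypothetical fixed-size splitter with overlap (langchainrb ships real chunkers that could replace it); the point is that each chunk gets its own id while repeating the bill's metadata, so the filter still matches every chunk:

```ruby
# Hypothetical chunker: splits text into fixed-size pieces with overlap
# so each embedded vector stays within the embedding model's limits.
def chunk_text(text, chunk_size: 1000, overlap: 100)
  chunks = []
  step = chunk_size - overlap
  (0...text.length).step(step) do |start|
    chunks << text[start, chunk_size]
    # Stop once a chunk reaches the end of the text.
    break if start + chunk_size >= text.length
  end
  chunks
end

# Each chunk would then be loaded with its own id and the shared metadata:
#   chunks = chunk_text(bill_text)
#   client.add_texts(
#     ids: chunks.each_index.map { |i| "123-#{i}" },
#     texts: chunks,
#     metadata: metadata
#   )
```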
I'm guessing this isn't a langchainrb-related issue, since no one else has reported it before. I'm going to go ahead and close this.