MaartenGr / KeyBERT

Minimal keyword extraction with BERT

Home Page: https://MaartenGr.github.io/KeyBERT/


KeyLLM error with bedrock model

HannaHUp opened this issue · comments

Hi,
I'm using a Bedrock model in SageMaker for keyword extraction, and I'm trying to implement this guide: https://maartengr.github.io/KeyBERT/guides/keyllm.html#4-efficient-keyllm

It mentions there that we can use LangChain for any LLM, so I used Bedrock Claude. But I got an error.

Here is my code:

import boto3
from langchain.chains.question_answering import load_qa_chain
from langchain.llms.bedrock import Bedrock

# Bedrock runtime client for the region hosting the model
bedrock_runtime = boto3.client("bedrock-runtime")

llm = Bedrock(model_id="anthropic.claude-v2", client=bedrock_runtime, model_kwargs={'max_tokens_to_sample': 200})

# bedrock_embeddings = BedrockEmbeddings(client=bedrock_runtime)
chain = load_qa_chain(llm, chain_type = "stuff")

from keybert.llm import LangChain
from keybert import KeyLLM
# Create your LLM
llm = LangChain(chain)

# Load it in KeyLLM
kw_model = KeyLLM(llm)

documents = ["Travelers, truckers and shippers worldwide can thank the U.S. for helping keep a lid on oil and gasoline prices this year."]
keywords = kw_model.extract_keywords(documents)
keywords

Here is my error:

AttributeError Traceback (most recent call last)
Cell In[50], line 2
1 ### 1. Create Keywords with KeyLLM
----> 2 keywords = kw_model.extract_keywords(documents)

File /opt/conda/lib/python3.10/site-packages/keybert/_llm.py:126, in KeyLLM.extract_keywords(self, docs, check_vocab, candidate_keywords, threshold, embeddings)
123 keywords = [in_cluster_keywords[index] for index in range(len(docs))]
124 else:
125 # Extract keywords using a Large Language Model (LLM)
--> 126 keywords = self.llm.extract_keywords(docs, candidate_keywords)
128 # Only extract keywords that appear in the input document
129 if check_vocab:

File /opt/conda/lib/python3.10/site-packages/keybert/llm/_langchain.py:100, in LangChain.extract_keywords(self, documents, candidate_keywords)
98 prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))
99 input_document = Document(page_content=document)
--> 100 keywords = self.chain.run(input_documents=input_document, question=self.prompt).strip()
101 keywords = [keyword.strip() for keyword in keywords.split(",")]
102 all_keywords.append(keywords)

File /opt/conda/lib/python3.10/site-packages/langchain/chains/base.py:512, in Chain.run(self, callbacks, tags, metadata, *args, **kwargs)
507 return self(args[0], callbacks=callbacks, tags=tags, metadata=metadata)[
508 _output_key
509 ]
511 if kwargs and not args:
--> 512 return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
513 _output_key
514 ]
516 if not kwargs and not args:
517 raise ValueError(
518 "run supported with either positional arguments or keyword arguments,"
519 " but none were provided."
520 )

File /opt/conda/lib/python3.10/site-packages/langchain/chains/base.py:312, in Chain.__call__(self, inputs, return_only_outputs, callbacks, tags, metadata, run_name, include_run_info)
310 except BaseException as e:
311 run_manager.on_chain_error(e)
--> 312 raise e
313 run_manager.on_chain_end(outputs)
314 final_outputs: Dict[str, Any] = self.prep_outputs(
315 inputs, outputs, return_only_outputs
316 )

File /opt/conda/lib/python3.10/site-packages/langchain/chains/base.py:306, in Chain.__call__(self, inputs, return_only_outputs, callbacks, tags, metadata, run_name, include_run_info)
299 run_manager = callback_manager.on_chain_start(
300 dumpd(self),
301 inputs,
302 name=run_name,
303 )
304 try:
305 outputs = (
--> 306 self._call(inputs, run_manager=run_manager)
307 if new_arg_supported
308 else self._call(inputs)
309 )
310 except BaseException as e:
311 run_manager.on_chain_error(e)

File /opt/conda/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py:123, in BaseCombineDocumentsChain._call(self, inputs, run_manager)
121 # Other keys are assumed to be needed for LLM prediction
122 other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
--> 123 output, extra_return_dict = self.combine_docs(
124 docs, callbacks=_run_manager.get_child(), **other_keys
125 )
126 extra_return_dict[self.output_key] = output
127 return extra_return_dict

File /opt/conda/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py:170, in StuffDocumentsChain.combine_docs(self, docs, callbacks, **kwargs)
156 def combine_docs(
157 self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
158 ) -> Tuple[str, dict]:
159 """Stuff all documents into one prompt and pass to LLM.
160
161 Args:
(...)
168 element returned is a dictionary of other keys to return.
169 """
--> 170 inputs = self._get_inputs(docs, **kwargs)
171 # Call predict on the LLM.
172 return self.llm_chain.predict(callbacks=callbacks, **inputs), {}

File /opt/conda/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py:126, in StuffDocumentsChain._get_inputs(self, docs, **kwargs)
111 """Construct inputs from kwargs and docs.
112
113 Format and the join all the documents together into one input with name
(...)
123 dictionary of inputs to LLMChain
124 """
125 # Format each document according to the prompt
--> 126 doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
127 # Join the documents together to put them in the prompt.
128 inputs = {
129 k: v
130 for k, v in kwargs.items()
131 if k in self.llm_chain.prompt.input_variables
132 }

File /opt/conda/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py:126, in <listcomp>(.0)
111 """Construct inputs from kwargs and docs.
112
113 Format and the join all the documents together into one input with name
(...)
123 dictionary of inputs to LLMChain
124 """
125 # Format each document according to the prompt
--> 126 doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
127 # Join the documents together to put them in the prompt.
128 inputs = {
129 k: v
130 for k, v in kwargs.items()
131 if k in self.llm_chain.prompt.input_variables
132 }

File /opt/conda/lib/python3.10/site-packages/langchain_core/prompts/base.py:248, in format_document(doc, prompt)
214 def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
215 """Format a document into a string based on a prompt template.
216
217 First, this pulls information from the document from two sources:
(...)
246 >>> "Page 1: This is a joke"
247 """
--> 248 base_info = {"page_content": doc.page_content, **doc.metadata}
249 missing_metadata = set(prompt.input_variables).difference(base_info)
250 if len(missing_metadata) > 0:

AttributeError: 'tuple' object has no attribute 'page_content'

Here are my package versions:
Name: keybert
Version: 0.8.3
Summary: KeyBERT performs keyword extraction with state-of-the-art transformer models.

Name: langchain
Version: 0.0.352
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain

Can you take a look? Thank you

Hmmm, not sure what is happening here. It might be that the document is not correctly processed, or that LangChain has changed its API. Do you get the same error if you use another LLM in LangChain?

@MaartenGr No, I didn't try another LLM; I don't have access to many other LLMs. Thank you.

I just pushed a fix to the main branch which should have resolved your issue. Could you try it out?

I don't see an updated version; it is still 0.8.3.

I pushed the changes to the main branch, so if you install it from the latest commit or the main branch, you should get the fix. Install it like so:

pip install git+https://github.com/MaartenGr/KeyBERT.git@master

and it should work. You will indeed still see 0.8.3, but that does not matter since I have not yet created an official release. I want to wait until I get confirmation that the fix works.

In other words, use the above installation to try out the fix.
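For anyone hitting this later: the `'tuple' object has no attribute 'page_content'` error is consistent with a single Document being passed where the stuff chain expects a list of Documents — iterating a pydantic-style model yields (field, value) tuples instead of Documents. A minimal stand-alone sketch of that failure mode (plain Python, no LangChain; the classes here are stand-ins, not the real implementation):

```python
# Minimal stand-in for a pydantic-style Document which, like pydantic v1
# models, yields (field_name, value) tuples when iterated over.
class Document:
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self):
        yield ("page_content", self.page_content)
        yield ("metadata", self.metadata)


def combine_docs(docs):
    # Mimics the stuff chain: iterate `docs` and read .page_content from each.
    return "\n".join(doc.page_content for doc in docs)


doc = Document("some text")

# Passing the bare Document iterates its *fields*, producing tuples:
try:
    combine_docs(doc)
except AttributeError as e:
    print(e)  # 'tuple' object has no attribute 'page_content'

# Wrapping it in a list gives the chain what it expects:
print(combine_docs([doc]))  # some text
```

So the fix on KeyBERT's side amounts to handing the chain a list of Documents rather than a single one.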

Hi, thank you.
The results don't look like what I was expecting, but it works now.
[screenshot of the notebook output]

When I tried to create keywords with KeyLLM, the result looked more like a split-up summary.
When I tried to extract keywords with KeyLLM, it gave me an empty result.

No problem! Prompt engineering is extremely important when using different LLMs. Some follow instructions better than others, so it helps to experiment with prompts that work well with Claude.

The empty results are due to check_vocab. Set it to False and it should work.
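For intuition, check_vocab keeps only keywords that literally appear in the input document, which is why paraphrased output from Claude can come back empty. A rough pure-Python sketch of that filter (my reading of keybert/_llm.py, not the exact implementation):

```python
def filter_by_vocab(document, keywords):
    # Keep only keywords that occur verbatim in the source document.
    # Paraphrased or summarized keywords from the LLM are dropped,
    # which can leave the result list empty.
    return [kw for kw in keywords if kw in document]


doc = "Travelers, truckers and shippers worldwide can thank the U.S."
print(filter_by_vocab(doc, ["truckers", "oil prices"]))  # ['truckers']
```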

But in the example here, I don't see where I can use a custom prompt for either 1. Create Keywords with KeyLLM or 2. Extract Keywords with KeyLLM.

You can add a custom prompt using the prompt parameter of keybert.llm.LangChain. See the docstrings: https://github.com/MaartenGr/KeyBERT/blob/master/keybert/llm/_langchain.py
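For example (the prompt wording below is hypothetical; the mechanism is the prompt parameter described in those docstrings, which is sent as the question to the QA chain for every document):

```python
# Hypothetical Claude-oriented prompt; keep it strict so Claude returns
# only a comma-separated keyword list rather than a summary.
prompt = (
    "What keywords best describe this document? "
    "Answer with only the keywords, separated by commas, and nothing else."
)

# Reusing the `chain` built earlier with load_qa_chain:
# from keybert.llm import LangChain
# from keybert import KeyLLM
# llm = LangChain(chain, prompt=prompt)
# kw_model = KeyLLM(llm)
# keywords = kw_model.extract_keywords(documents, check_vocab=False)
```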