MaartenGr / KeyBERT

Minimal keyword extraction with BERT

Home Page:https://MaartenGr.github.io/KeyBERT/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

KeyLLM seems to use OpenAI parameters that are deprecated

lfoppiano opened this issue · comments

First of all, this tool is amazing :-)

I'm trying to use keyLLM using OpenAI API, but when I import the OpenAI module from keybert, I cannot not noticed that the default parameters look having quite old defaults, something like "gpt-3.5-instruct".

The code is something like this:

from keybert.llm import OpenAI

lc_chatgpt = OpenAI(model="gpt-3.5-turbo")
kw_model = KeyLLM(llm=lc_chatgpt)

[...]

keywords_abstracts = kw_model.extract_keywords(abstracts, embeddings=embeddings_abstracts, threshold=0.9)

When trying following your instructions I get a deprecation error:

openai.lib._old_api.APIRemovedInV1: 

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

here the libraries versions:

openai                             1.3.3
keybert                            0.8.3

Thank you in advance

Ah, that is correct! It seems that openai has updated their package with some breaking changes. Perhaps if you set openai to 0.28, it might just work. I'll make sure to update the backend so that it works with their newest release. That likely will introduce a breaking change since I want to only support openai>1.

@lfoppiano I just pushed a fix to #189, if you have the time. Could you check whether it works for you?

Hi,

I cant seem to get it to work.

I've installed keybert and openai as follows:

pip install keybert
pip install openai

The versions are:

keybert                   0.8.3
openai                    1.3.7

I've subsequently run the following:

import openai
from keybert.llm import OpenAI
from keybert import KeyLLM

client = openai.OpenAI(api_key=OpenAI.api_key)
llm = OpenAI(client)
kw_model = KeyLLM(llm)

[...]

keywords = kw_model.extract_keywords(docs, check_vocab=True)

However, I end up with the following error:

---------------------------------------------------------------------------
APIRemovedInV1                            Traceback (most recent call last)
Cell In[105], line 2
      1 # Extract keywords
----> 2 keywords = kw_model.extract_keywords(docs, check_vocab=True)

File ~\.conda\envs\PhDProjectsWork\Lib\site-packages\keybert\_llm.py:126, in KeyLLM.extract_keywords(self, docs, check_vocab, candidate_keywords, threshold, embeddings)
    123         keywords = [in_cluster_keywords[index] for index in range(len(docs))]
    124 else:
    125     # Extract keywords using a Large Language Model (LLM)
--> 126     keywords = self.llm.extract_keywords(docs, candidate_keywords)
    128 # Only extract keywords that appear in the input document
    129 if check_vocab:

File ~\.conda\envs\PhDProjectsWork\Lib\site-packages\keybert\llm\_openai.py:177, in OpenAI.extract_keywords(self, documents, candidate_keywords)
    175         response = chat_completions_with_backoff(**kwargs)
    176     else:
--> 177         response = openai.ChatCompletion.create(**kwargs)
    178     keywords = response["choices"][0]["message"]["content"].strip()
    180 # Use a non-chat model
    181 else:

File ~\.conda\envs\PhDProjectsWork\Lib\site-packages\openai\lib\_old_api.py:39, in APIRemovedInV1Proxy.__call__(self, *_args, **_kwargs)
     38 def __call__(self, *_args: Any, **_kwargs: Any) -> Any:
---> 39     raise APIRemovedInV1(symbol=self._symbol)

APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

Reading the help documentation for OpenAI with
help(OpenAI)
it shows:

|  Using the OpenAI API to extract keywords
 |  
 |      The default method is `openai.Completion` if `chat=False`.
 |      The prompts will also need to follow a completion task. If you
 |      are looking for a more interactive chats, use `chat=True`
 |      with `model=gpt-3.5-turbo`.

This would suggest that the error received is correct as openai.Completion is deprecated.

I thought the fix applied works for openai >1.0 ?
Could you help clarify what I'm not doing correctly?

On the other hand, if I try the following:

client = openai.OpenAI(api_key=OpenAI.api_key)
llm = OpenAI(client, chat=True, model="gpt-3.5-turbo")
kw_model = KeyLLM(llm)

I end up with the following error instead

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[108], line 3
      1 # Create LLM
      2 client = openai.OpenAI(api_key=OpenAI.api_key)
----> 3 llm = OpenAI(client, chat=True, model="gpt-3.5-turbo")
      5 # Load it in KeyLLM
      6 kw_model = KeyLLM(llm)

TypeError: OpenAI.__init__() got multiple values for argument 'model'

What am I doing wrong?

@adegboyegaFAU You are not using the fix. To install the fix, you should run the following instead:

pip install -U git+https://github.com/MaartenGr/KeyBERT@openai_fix

Works now! Thanks @MaartenGr. I'd actually previously tried that from @lfoppiano's post on fix #189 and it didn't work. Turns out that what I didn't do after uninstalling keybert then was to restart anaconda.

I really do love the tool by the way. Great work

@MaartenGr any estimate on when this fix will be released?

@lfoppiano I just pushed the fix to the main branch, an official release will follow either this or next week.

Great, thanks!
I've been testing it extensively these days and works fine

@MaartenGr the fix is not yet released, right?

@lfoppiano Can you provide a minimal working example? I am running into problems when using openai LLM for keyword generation.

openai.api_key = os.getenv('OPENAI_API_KEY')
llm = OpenAI(
    client = openai,
    model = "gpt-3.5-turbo-instruct",
    prompt = "Summarize the following text of keywords with a maximum of 5 keywords: \n\n-",
    chat = False,
    verbose = False,
    )

kw_model_2 = KeyLLM(llm)

year = 2010
texts_to_process = unique_keywords_2[year]
topics = kw_model_2.extract_keywords(texts_to_process)
KeyError                                  Traceback (most recent call last)
File [~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:759](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:759), in BaseModel.__getattr__(self, item)
    [758](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:758) try:
--> [759](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:759)     return pydantic_extra[item]
    [760](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:760) except KeyError as exc:

KeyError: 'message'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[10], [line 17](vscode-notebook-cell:?execution_count=10&line=17)
     [15](vscode-notebook-cell:?execution_count=10&line=15) year = 2010
     [16](vscode-notebook-cell:?execution_count=10&line=16) texts_to_process = unique_keywords_2[year]
---> [17](vscode-notebook-cell:?execution_count=10&line=17) topics = kw_model_2.extract_keywords(texts_to_process)
     [19](vscode-notebook-cell:?execution_count=10&line=19) print(topics)

File [~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:126](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:126), in KeyLLM.extract_keywords(self, docs, check_vocab, candidate_keywords, threshold, embeddings)
    [123](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:123)         keywords = [in_cluster_keywords[index] for index in range(len(docs))]
    [124](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:124) else:
    [125](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:125)     # Extract keywords using a Large Language Model (LLM)
--> [126](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:126)     keywords = self.llm.extract_keywords(docs, candidate_keywords)
    [128](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:128) # Only extract keywords that appear in the input document
    [129](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/_llm.py:129) if check_vocab:

File [~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:189](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:189), in OpenAI.extract_keywords(self, documents, candidate_keywords)
    [187](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:187)     else:
    [188](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:188)         response = self.client.completions.create(model=self.model, prompt=prompt, **self.generator_kwargs)
--> [189](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:189)     keywords = response.choices[0].message.content.strip()
    [190](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:190) keywords = [keyword.strip() for keyword in keywords.split(",")]
    [191](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/keybert/llm/_openai.py:191) all_keywords.append(keywords)

File [~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:761](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:761), in BaseModel.__getattr__(self, item)
    [759](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:759)         return pydantic_extra[item]
    [760](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:760)     except KeyError as exc:
--> [761](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:761)         raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') from exc
    [762](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:762) else:
    [763](https://vscode-remote+wsl-002bubuntu-002d20-002e04.vscode-resource.vscode-cdn.net/home/fabmeyer/Dev/Python/Moritz_project/~/.local/share/virtualenvs/Moritz_project-3DWIh2uO/lib/python3.9/site-packages/pydantic/main.py:763)     if hasattr(self.__class__, item):

AttributeError: 'CompletionChoice' object has no attribute 'message'

@fabmeyer I use the gpt3.5-turbo openai model and chat=True.

I assembled an example from the code I've used (disclaimer: I did not test it):

client = openai.OpenAI()
chatgpt = OpenAI(client, model="gpt-3.5-turbo", chat=True)
kw_model = KeyLLM(llm=chatgpt)
model = SentenceTransformer('all-MiniLM-L6-v2')

abstracts = [work['abstract'] if 'abstract' in work and work['abstract'] is not None else "" for work in
                 works]
embeddings_abstracts = model.encode(abstracts, convert_to_tensor=True)
keywords_abstracts = kw_model.extract_keywords(abstracts, embeddings=embeddings_abstracts, threshold=0.5)

Ah right, I should definitely release an official version. Let me work on it for a bit and I'll let you know when I release 0.8.4.

Apologies for the late delay (and thanks for the ping)! I just pushed 0.8.4 to PyPI, so all changes to the main branch should now be in the official release.