MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Home Page: https://maartengr.github.io/BERTopic/

AttributeError: 'NoneType' object has no attribute 'strip'

ytpf23 opened this issue · comments

commented

I get the following error when using the OpenAI representation model with BERTopic. It happens when the same cluster shows up twice in the logs: here, cluster 143 passes the first time and raises the error the second time.

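# Imports inferred from the calls below; Params and bert_topic_label_prompt are defined elsewhere in the project.
from openai import AzureOpenAI
from bertopic.representation import OpenAI

def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(client, model=Params.openai_deployment_gpt3, chat=True, prompt=prompt, exponential_backoff=True)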

2024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.34it/s]2024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.29it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3637, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3938, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 222, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'

Can you try installing BERTopic from its main branch? I believe a fix for this can be found there.

commented

The error is still there after I cloned the master branch.

Could you share the full code and error message after cloning and installing the branch?

commented

# Imports inferred from the calls below; Params and bert_topic_label_prompt are defined elsewhere in the project.
from openai import AzureOpenAI
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, OpenAI

def initialize_representation_models():
    keybert_model = KeyBERTInspired()
    openai_model = setup_openai_client()
    return {
        "KeyBERT": keybert_model,
        "OpenAI": openai_model,
    }

def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(client, model=Params.openai_deployment_gpt3, chat=True, prompt=prompt, delay_in_seconds=0.3, diversity=0.2)  # exponential_backoff=True,

def fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_model):  # embedding_model,
    topic_model = BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
        representation_model=representation_model,
        top_n_words=20,
        verbose=True,
    )

    topics, probs = topic_model.fit_transform(sentences, embeddings)
    topics_df = topic_model.get_topic_info()
    print(f"Number of unique topics found: {len(set(topics))}")
    return topics_df

2024-04-10 12:19:42,781 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 12:20:36,721 - BERTopic - Dimensionality - Completed ✓
2024-04-10 12:20:36,721 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 12:20:39,694 - BERTopic - Cluster - Completed ✓
2024-04-10 12:20:39,700 - BERTopic - Representation - Extracting topics from clusters using representation models.
 77%|██████████████████████████████████████████████████████████████████████████████████████████████████▎                            | 151/195 [01:16<00:22,  1.97it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
    auto_topics = final_auto_topics(project_id=project_id, n_reviews=n_reviews)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3782, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 4083, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 223, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'
    
commented

Do you have any suggestions?

I'm actually not sure what is happening here. I would expect OpenAI to return at least some value, especially when you check for it. It might be that OpenAI applies additional filters and rejects certain inputs or outputs that do not adhere to its guidelines.

One other thing that I can think of is that their API changed a while ago. Are you using the latest version of their package?
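
For anyone checking the same thing, a minimal sketch of how the installed package versions could be printed from Python. The helper name `installed_version` is made up for this example; it relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to this issue.
for name in ("openai", "bertopic"):
    print(name, installed_version(name) or "not installed")
```

Comparing the printed `openai` version against the latest release on PyPI shows whether an upgrade is worth trying.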

commented

Yes. I have implemented a custom solution, and the problem is a policy violation.

ERROR MESSAGE: Value Error 'Azure has not provided the response due to a content filter being triggered'

However, in BERTopic I don't get that error message: the response content is None, and execution stops with the AttributeError above.

I will use my custom implementation to catch these errors, but I think many people may run into this issue in the future.

Thank you for your reply!
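
For readers who hit the same problem, here is a minimal sketch of the kind of guard described above. It is not the poster's custom implementation and not BERTopic's actual fix. The helper names `extract_label` and `fake_response` and the fallback text are made up for this example, and the stand-in objects only mimic the shape of a chat completion response. It assumes the filtered response arrives with `content` set to None, as the traceback suggests, possibly with a `finish_reason` of `"content_filter"`.

```python
from types import SimpleNamespace


def extract_label(response, fallback="Unlabeled topic"):
    """Return a cleaned label, or `fallback` when the API returned no text."""
    choice = response.choices[0]
    content = choice.message.content
    if content is None:
        # Assumption: content is None when a content filter blocked the output.
        reason = getattr(choice, "finish_reason", None)
        print(f"No content returned (finish_reason={reason!r}); using fallback label.")
        return fallback
    return content.strip().replace("topic: ", "")


def fake_response(content, finish_reason):
    """Build a stand-in object shaped like a chat completion response."""
    message = SimpleNamespace(content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


ok = fake_response("topic: battery life", "stop")
filtered = fake_response(None, "content_filter")

print(extract_label(ok))        # normal response
print(extract_label(filtered))  # filtered response, falls back instead of raising
```

Returning a fallback label lets the remaining topics be processed, whereas raising an explicit error would make the content-filter cause visible right away. Which behavior fits better depends on the pipeline.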