MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Home Page: https://maartengr.github.io/BERTopic/


bertopic.representation.LangChain UnboundLocalError

eschaffn opened this issue · comments

When using the `doc_length` parameter of the `LangChain` class from `bertopic.representation`, the following error occurs:


2024-04-10 18:42:39,015 - BERTopic - Embedding - Transforming documents to embeddings.
Batches: 100%|████████████████████████████████████████████████████████| 247/247 [00:39<00:00,  6.27it/s]
2024-04-10 18:43:21,658 - BERTopic - Embedding - Completed ✓
2024-04-10 18:43:21,659 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 18:43:50,487 - BERTopic - Dimensionality - Completed ✓
2024-04-10 18:43:50,487 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 18:43:50,883 - BERTopic - Cluster - Completed ✓
2024-04-10 18:43:50,883 - BERTopic - Representation - Extracting topics from clusters using representation models.
Traceback (most recent call last):
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/route_utils.py", line 230, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/blocks.py", line 1590, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "gradio_app.py", line 116, in initialize_model
    topics, probs = topic_model.fit_transform(data)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 3637, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 3938, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 172, in extract_topics
    chain_docs: List[List[Document]] = [
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 173, in <listcomp>
    [
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 175, in <listcomp>
    page_content=truncate_document(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_utils.py", line 57, in truncate_document
    return truncated_document
UnboundLocalError: local variable 'truncated_document' referenced before assignment
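
The traceback points at `truncate_document` in `bertopic/representation/_utils.py`: `truncated_document` is only assigned inside the tokenizer-specific branches, so when `doc_length` is set but `tokenizer` is left at its default of `None`, no branch runs and the final `return truncated_document` raises. A minimal sketch of the failing pattern, assuming the structure the traceback suggests (not a verbatim copy of the library code):

# Sketch of the shape implied by the traceback, not the exact library source.
def truncate_document(doc_length, tokenizer, document):
    if doc_length is not None:
        if tokenizer == "char":
            truncated_document = document[:doc_length]
        elif tokenizer == "whitespace":
            truncated_document = " ".join(document.split()[:doc_length])
        # ... other tokenizer options ...
        # With tokenizer=None, none of the branches above assigns the
        # variable, so the return below raises UnboundLocalError.
        return truncated_document
    return document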

Code used to run:

# Imports assumed from the surrounding setup (langchain paths circa April 2024).
from langchain.chains.question_answering import load_qa_chain
from langchain_community.llms import Ollama
from bertopic.representation import LangChain

# Build a "stuff" QA chain around a local Ollama model.
chain = load_qa_chain(Ollama(model=args.llm), chain_type="stuff")

representation_model = {
    "LLM Summary": LangChain(
        chain=chain,
        nr_docs=4,
        doc_length=args.sentence_model_max_seq_len
    )
}

The error only occurs when the `doc_length` parameter is set; omitting it works fine.

Sorry, I had it configured incorrectly.
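
For anyone who hits the same error: `doc_length` is counted in units defined by a tokenizer, so the two parameters need to be set together. A hedged sketch of the corrected configuration, assuming the `"whitespace"` tokenizer option (one of the documented choices, alongside `"char"`, `"vectorizer"`, or a callable tokenizer) fits your setup:

representation_model = {
    "LLM Summary": LangChain(
        chain=chain,
        nr_docs=4,
        doc_length=args.sentence_model_max_seq_len,
        # Tells BERTopic how to count doc_length units; without a tokenizer,
        # truncate_document has no branch to take and raises UnboundLocalError.
        tokenizer="whitespace"
    )
}

With `tokenizer="whitespace"`, `doc_length` is interpreted as a number of whitespace-separated tokens rather than characters, so pick the value accordingly.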