bertopic.representation.LangChain Unbound Local Error
eschaffn opened this issue
eschaffn commented
When using the doc_length parameter in the LangChain() class from bertopic.representation, the following error occurs:
2024-04-10 18:42:39,015 - BERTopic - Embedding - Transforming documents to embeddings.
Batches: 100%|████████████████████████████████████████████████████████| 247/247 [00:39<00:00, 6.27it/s]
2024-04-10 18:43:21,658 - BERTopic - Embedding - Completed ✓
2024-04-10 18:43:21,659 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 18:43:50,487 - BERTopic - Dimensionality - Completed ✓
2024-04-10 18:43:50,487 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 18:43:50,883 - BERTopic - Cluster - Completed ✓
2024-04-10 18:43:50,883 - BERTopic - Representation - Extracting topics from clusters using representation models.
Traceback (most recent call last):
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/route_utils.py", line 230, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/blocks.py", line 1590, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "gradio_app.py", line 116, in initialize_model
    topics, probs = topic_model.fit_transform(data)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 3637, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/_bertopic.py", line 3938, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 172, in extract_topics
    chain_docs: List[List[Document]] = [
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 173, in <listcomp>
    [
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_langchain.py", line 175, in <listcomp>
    page_content=truncate_document(
  File "/opt/conda/envs/scraawl-ng/lib/python3.8/site-packages/bertopic/representation/_utils.py", line 57, in truncate_document
    return truncated_document
UnboundLocalError: local variable 'truncated_document' referenced before assignment
Code used to run:
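# Imports assumed; the original report did not show them.
from bertopic.representation import LangChain
from langchain.chains.question_answering import load_qa_chain
from langchain_community.llms import Ollama

chain = load_qa_chain(Ollama(model=args.llm), chain_type="stuff")
representation_model = {
    "LLM Summary": LangChain(
        chain=chain,
        nr_docs=4,
        doc_length=args.sentence_model_max_seq_len,
    )
}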
This is only an issue with the doc_length parameter.
eschaffn commented
Sorry, I had it configured incorrectly.
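The thread does not say what the misconfiguration was. For anyone who hits the same traceback, one plausible cause (an assumption, not confirmed here) is setting doc_length without also choosing a tokenizer: BERTopic's documentation describes a tokenizer option alongside doc_length, and a helper that only assigns its result inside tokenizer-specific branches will fail in exactly this way when none of them match. The sketch below is a simplified stand-in, not BERTopic's actual code; truncate_sketch and truncate_checked are hypothetical names, and only the "char" and "whitespace" modes are modeled.

```python
from typing import Optional


def truncate_sketch(document: str, doc_length: Optional[int], tokenizer: Optional[str]) -> str:
    """Hypothetical stand-in for a truncation helper (not BERTopic's code)."""
    if doc_length is not None:
        if tokenizer == "char":
            truncated_document = document[:doc_length]
        elif tokenizer == "whitespace":
            truncated_document = " ".join(document.split()[:doc_length])
        # No else branch: if tokenizer is None, nothing was assigned above,
        # so the return below raises UnboundLocalError.
        return truncated_document
    return document


def truncate_checked(document: str, doc_length: Optional[int], tokenizer: Optional[str]) -> str:
    """Same logic, but an unsupported tokenizer fails with a clear message."""
    if doc_length is None:
        return document
    if tokenizer == "char":
        return document[:doc_length]
    if tokenizer == "whitespace":
        return " ".join(document.split()[:doc_length])
    raise ValueError(
        f"doc_length={doc_length} requires a supported tokenizer, got {tokenizer!r}"
    )


# Reproduce the failure mode: a length limit with no tokenizer selected.
try:
    truncate_sketch("a b c d", doc_length=2, tokenizer=None)
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)
print(truncate_checked("a b c d", doc_length=2, tokenizer="whitespace"))
```

If this is indeed the cause, passing a tokenizer value together with doc_length when constructing LangChain(...) should avoid the error; check the BERTopic documentation for the values your installed version accepts.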