[macOS] ingest.py: IndexError: list index out of range
xcodeassociated opened this issue
Janusz Majchrzak commented
I'm trying to run python ingest.py on my MacBook with the sample Orca_paper.pdf file as input, but I'm getting this error:
2024-02-04 18:44:15,874 - INFO - ingest.py:144 - Loading documents from /Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS
Importing: Orca_paper.pdf
/Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS/Orca_paper.pdf loaded.
/Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS/Orca_paper.pdf loading error:
partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)
2024-02-04 18:44:17,439 - INFO - ingest.py:153 - Loaded 1 documents from /Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS
2024-02-04 18:44:17,439 - INFO - ingest.py:154 - Split into 0 chunks of text
2024-02-04 18:44:18,088 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2024-02-04 18:44:19,221 - INFO - ingest.py:185 - Loaded embeddings from hkunlp/instructor-large
Traceback (most recent call last):
File "/Users/xcodeassociated/code/localGPT/ingest.py", line 198, in <module>
main()
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/Users/xcodeassociated/code/localGPT/ingest.py", line 187, in main
db = Chroma.from_documents(
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
return cls.from_texts(
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 577, in from_texts
chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 187, in add_texts
embeddings = self._embedding_function.embed_documents(texts)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 169, in embed_documents
embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/InstructorEmbedding/instructor.py", line 524, in encode
if isinstance(sentences[0],list):
IndexError: list index out of range
The repo is up to date. Can someone take a look?
PS. I was able to make it work on my Linux desktop PC running Debian with CUDA, so it looks like a macOS-related issue.
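For context, the log above reports "Split into 0 chunks of text" right after the charset_normalizer loading error, so the embedding step most likely received an empty list, and indexing element 0 of an empty list raises the IndexError in the traceback. A minimal sketch of a guard that could run before the embedding call is shown below; `require_chunks` is a hypothetical helper for illustration, not part of ingest.py or any library.

```python
def require_chunks(texts):
    """Return texts unchanged, or fail early with a readable message.

    Hypothetical helper: not part of ingest.py, LangChain, or InstructorEmbedding.
    """
    if not texts:
        raise ValueError(
            "No text chunks to embed: loading or splitting produced nothing. "
            "Check the loader errors logged before this step."
        )
    return texts


# Reproduce the failure mode from the traceback: indexing an empty list.
empty_chunks = []
try:
    empty_chunks[0]
except IndexError as exc:
    print(f"IndexError: {exc}")

# With the guard, the same situation produces a clearer error instead.
try:
    require_chunks(empty_chunks)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

A guard like this does not fix the underlying loading error; it only surfaces the real problem (no chunks were produced) before the embedding model is called.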
Oluwasegun Apejoye commented
I had a similar issue on my Windows machine.
See this StackOverflow answer.
The issue is with the charset_normalizer module.
Uninstalling and reinstalling it fixed the issue for me.
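To confirm whether the reinstall helped, a quick check like the one below can be run in the same Python environment that runs ingest.py. This is only a sketch: `can_import` is a hypothetical helper name, and it catches any exception because the error in the log above ("partially initialized module ... has no attribute") is raised as an AttributeError rather than an ImportError.

```python
import importlib


def can_import(module_name):
    """Return (True, None) if the module imports cleanly, else (False, error text).

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # the partial-initialization failure is not an ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


ok, error = can_import("charset_normalizer")
if ok:
    print("charset_normalizer imports cleanly")
else:
    print(f"charset_normalizer import failed: {error}")
```

If the import still fails after reinstalling, the printed error text should show whether it is the same circular-import message or a different problem.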