PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, so everything stays 100% private.

[macOS] ingest.py: IndexError: list index out of range

xcodeassociated opened this issue

I'm trying to run python ingest.py on my MacBook with the sample Orca_paper.pdf file as input, but I'm getting this error:

2024-02-04 18:44:15,874 - INFO - ingest.py:144 - Loading documents from /Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS
Importing: Orca_paper.pdf
/Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS/Orca_paper.pdf loaded.

/Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS/Orca_paper.pdf loading error:
partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)

2024-02-04 18:44:17,439 - INFO - ingest.py:153 - Loaded 1 documents from /Users/xcodeassociated/code/localGPT/SOURCE_DOCUMENTS
2024-02-04 18:44:17,439 - INFO - ingest.py:154 - Split into 0 chunks of text
2024-02-04 18:44:18,088 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2024-02-04 18:44:19,221 - INFO - ingest.py:185 - Loaded embeddings from hkunlp/instructor-large
Traceback (most recent call last):
  File "/Users/xcodeassociated/code/localGPT/ingest.py", line 198, in <module>
    main()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/xcodeassociated/code/localGPT/ingest.py", line 187, in main
    db = Chroma.from_documents(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 577, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 187, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 169, in embed_documents
    embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ai/lib/python3.10/site-packages/InstructorEmbedding/instructor.py", line 524, in encode
    if isinstance(sentences[0],list):
IndexError: list index out of range

The repo is up to date. Can someone take a look?

PS. I was able to make it work on my Linux desktop PC running Debian and CUDA, so this looks like a macOS-related issue.
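
A note for anyone reading the traceback: the log reports "Split into 0 chunks of text" right after the charset_normalizer loading error, so an empty list appears to reach the embedding call, and `sentences[0]` then fails. Below is a minimal sketch of that failure mode plus a guard that would surface the real problem earlier. The helper name `require_chunks` is hypothetical and not part of localGPT; where such a check would go in ingest.py is an assumption.

```python
def require_chunks(chunks, source_dir):
    """Return chunks unchanged, or raise a descriptive error if there are none.

    Hypothetical helper (not part of localGPT): it stops before an empty
    list is handed to the embedding model.
    """
    if not chunks:
        raise ValueError(
            f"No text chunks were produced from {source_dir}; "
            "check the loader errors logged earlier before building the vector store."
        )
    return chunks


# Reproduces the same kind of failure as the traceback: indexing an empty list.
sentences = []
try:
    sentences[0]
except IndexError as exc:
    print(f"IndexError: {exc}")
```

With a guard like this, the run would stop with a message pointing back at the document loader rather than at the embedding library.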

I had a similar issue on my Windows machine.

See this StackOverflow answer.
The issue is with the charset_normalizer module.

Uninstalling and reinstalling it fixed the issue for me.
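
To check whether the module is the culprit before and after reinstalling, a small import test can help. This is only a sketch: the function name `check_import` is hypothetical, and the pip commands in the comments are the usual way to reinstall a package, not commands quoted from this thread.

```python
import importlib


def check_import(module_name):
    """Try to import a module and report (ok, detail) without crashing."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a broken install can raise more than ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    ok, detail = check_import("charset_normalizer")
    if ok:
        print("charset_normalizer imports cleanly")
    else:
        print(f"charset_normalizer failed to import: {detail}")
        # Assumed reinstall steps, run in the same environment as ingest.py:
        #   python -m pip uninstall -y charset-normalizer
        #   python -m pip install charset-normalizer
```

If the import still fails after reinstalling, the error detail printed here is worth adding to the issue.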