jonfairbanks / local-rag

Ingest files for retrieval augmented generation (RAG) with open-source Large Language Models (LLMs), all without 3rd parties or sensitive data leaving your network.


seems like attempting to index a website gets llama_index to reach out to openai?

wolfspyre opened this issue

I'll create another issue for the chunk overlap not being propagated from the UI... but

My scenario: I have Ollama running on an adjacent host, and I'm running local-rag on my desktop.
Upon opening it, after validating that local-rag saw my Ollama instance, selecting the embedding model, and changing chunk size and overlap to 128 and 16 respectively (for shiggles), I added a website to be indexed.

2024-03-18 16:36:51,440 - ollama - INFO - Ollama chat client created successfully
2024-03-18 16:36:51,472 - ollama - INFO - Ollama models loaded successfully
2024-03-18 16:37:20,302 - ollama - INFO - Ollama LLM instance created successfully
2024-03-18 16:37:20,303 - llama_index - INFO - Using cpu to generate embeddings
2024-03-18 16:37:22,624 - llama_index - INFO - Embedding model created successfully
2024-03-18 16:37:22,940 - llama_index - ERROR - Failed to create service_context: Got a larger chunk overlap (200) than chunk size (128), should be smaller.
2024-03-18 16:37:22,940 - rag_pipeline - INFO - Documents are already available; skipping document loading
2024-03-18 16:37:22,940 - rag_pipeline - INFO - Documents are already available; skipping document loading
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 33.59it/s]
Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]
Retrying llama_index.embeddings.openai.base.get_embeddings in 0.2629274909803947 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
Retrying llama_index.embeddings.openai.base.get_embeddings in 1.5223636242693028 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
Retrying llama_index.embeddings.openai.base.get_embeddings in 3.117398967725308 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
Retrying llama_index.embeddings.openai.base.get_embeddings in 4.185735070177902 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
Retrying llama_index.embeddings.openai.base.get_embeddings in 2.398372666297842 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
2024-03-18 16:37:34,917 - llama_index - ERROR - Index creation failed: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-abc123. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
2024-03-18 16:37:34,917 - llama_index - ERROR - Error when creating Query Engine: 'bool' object has no attribute 'as_query_engine'

that... doesn't seem like it's spozed to do that, right? :)
(what info would be helpful here?)

Not sure if you WANT me to create an issue for the overlap, as you've already got a TODO about it in llama_index.py,
but the chunk_overlap entry in service_context is commented out... maybe this is intentional?

        service_context = ServiceContext.from_defaults(
            llm=llm,
            system_prompt=system_prompt,
            embed_model=embedding_model,
            chunk_size=int(chunk_size),
            # chunk_overlap=int(chunk_overlap),
        )

There is a relationship between chunk_size and chunk_overlap: the overlap must be smaller than the chunk size. Tweaking chunk sizes currently needs to be done with some care, and additional logic is needed on the UI side to warn users when their settings will cause errors.
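A minimal sketch of the kind of guard the UI could apply (a hypothetical helper, not code from the repo). Note that because the chunk_overlap argument above is commented out, LlamaIndex's default overlap of 200 is what collided with the chunk size of 128:

    # Hypothetical UI-side guard; validate_chunk_settings is illustrative,
    # not a real local-rag function.
    def validate_chunk_settings(chunk_size: int, chunk_overlap: int) -> None:
        if chunk_size <= 0:
            raise ValueError(f"chunk_size must be positive, got {chunk_size}")
        if chunk_overlap >= chunk_size:
            raise ValueError(
                f"chunk_overlap ({chunk_overlap}) must be smaller than "
                f"chunk_size ({chunk_size})"
            )

    validate_chunk_settings(1024, 200)  # fine
    validate_chunk_settings(128, 200)   # raises, matching the error in the log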

As for the error about reaching out to OpenAI, this is a somewhat misleading error message coming out of Llama Index. Setting up the embed model with the provided values failed, so it falls back to OpenAI (its default) for embeddings.
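One way to make that failure loud instead of silent, sketched against the llama-index 0.10-era API (package layout may differ across versions; the model names and host are just the ones from this thread):

    # Sketch only: pin the embed model explicitly and fail fast if it
    # can't be built, instead of letting LlamaIndex fall back to its
    # default OpenAI embeddings.
    from llama_index.core import ServiceContext
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama

    llm = Ollama(model="jaigouk/nous-capybara-34b-q3:latest",
                 base_url="http://myotherhost:11434")

    try:
        embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")
        service_context = ServiceContext.from_defaults(
            llm=llm,
            embed_model=embed_model,
            chunk_size=1024,
        )
    except Exception as exc:
        # Surfacing the real failure avoids the misleading 401s from
        # api.openai.com seen in the log above.
        raise RuntimeError(f"Embedding setup failed: {exc}") from exc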

My intent was to have a large overlap and lots of small chunks, to help with a small context window... but after seeing the 200 hardcoded there, I switched back to 1024 as the chunk size... which didn't fix the problem I'm seeing:

with the model: jaigouk/nous-capybara-34b-q3:latest
and endpoint set to http://myotherhost:11434

resetting the embedding to bge-large-en-v1.5, or using salesforce/sfr-embedding-mistral
with the chunksize at 1024/overlap 200

adding a 1 MB markdown file (the Hugo documentation):

2024-03-18 17:47:39,244 - helpers - INFO - Directory ~/ML_AI/local-rag/data did not exist so creating it
2024-03-18 17:47:39,250 - helpers - INFO - Upload README.md saved to disk
2024-03-18 17:47:39,253 - llama_index - INFO - Service Context created successfully
2024-03-18 17:47:39,255 - rag_pipeline - INFO - Documents are already available; skipping document loading

and attempting to chat with the info just results in a response of

'Please confirm settings and upload files before proceeding'

I know that local-rag has connected, as I see GETs of '/api/tags',
but I never see a POST to '/api/chat' from local-rag...

nothing interesting is logged in local-rag.log
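To rule out connectivity, the two endpoints can be probed by hand; a rough sketch using the host and model from above (/api/tags and /api/chat are standard Ollama routes):

    # Manual probe of the Ollama endpoints local-rag should be hitting.
    import requests

    base = "http://myotherhost:11434"

    # local-rag's observed GETs of /api/tags correspond to this model listing:
    print(requests.get(f"{base}/api/tags").json())

    # If this POST succeeds, the Ollama side is fine, and the missing
    # /api/chat traffic points at local-rag never reaching the chat step:
    resp = requests.post(f"{base}/api/chat", json={
        "model": "jaigouk/nous-capybara-34b-q3:latest",
        "messages": [{"role": "user", "content": "ping"}],
        "stream": False,
    })
    print(resp.json()["message"]["content"])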

suggestions?

(Screenshots attached: local-rag-36_1, local-rag-36_2, local-rag-36_3, local-rag-36_4)

Under Settings > Advanced, at the bottom you should be able to view the Application State.

Here you will need a few elements to successfully chat:

  • documents - uploaded files/repos/websites are processed into documents
  • query_engine - a vectorized index of uploaded documents; will be used to chat with directly
  • service_context - holds configuration details regarding the embedding model, system prompt, etc.

Initially these will be NULL, but should have values after document processing. If any of these items are still NULL after processing, it may indicate that an error was encountered when loading documents, setting up the embedding model, or transforming the documents using the embedding model.
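Roughly, those three entries map onto a pipeline like the following (a sketch of the general llama-index flow, not the exact local-rag code; service_context as configured earlier):

    # Illustrative pipeline behind the three Application State entries.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    documents = SimpleDirectoryReader("data").load_data()   # -> documents
    index = VectorStoreIndex.from_documents(                # embeds the docs
        documents, service_context=service_context)
    query_engine = index.as_query_engine()                  # -> query_engine

The earlier "'bool' object has no attribute 'as_query_engine'" error suggests a failed index creation returned False and was then used where an index was expected, which is consistent with query_engine staying NULL.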

Sample View: (see attached screenshot, Screenshot_2)

A possible contributor could be the lack of CUDA... as I'm running on Macs (I haz no Nvidia gear), so with the included lockfile, the pipenv install doesn't succeed; however, it does if you remove the lockfile and let pipenv re-sort its deps.

I'm trying the same process now on the machine that's running Ollama locally (MBP M2, 96 GB) to see if it behaves differently (tho I doubt it'll change anything).

Not having CUDA should not cause issues here, as the default is to use the CPU unless CUDA is available. I'll work to update the Pipfile, however, to ensure smoother installation for Mac users. Thanks for the note.
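For reference, the device choice is typically made like this (a sketch; the "Using cpu to generate embeddings" log line above suggests local-rag does something similar, and the MPS branch is a hypothetical extension for Apple Silicon):

    import torch

    # Prefer CUDA when present, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Hypothetical: on Apple Silicon, PyTorch's Metal backend could be
    # preferred over plain CPU for faster embedding generation.
    if device == "cpu" and torch.backends.mps.is_available():
        device = "mps"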

Well... I'm getting a response (or at least it's not behaving the same) on the MBP M2 with Ollama running locally, so I'm gonna try again once this host finishes indexing the document (4h to index a 1.5mb md file... I find it hard to understand why it would take so long, but that's orthogonal to this issue... so I'mma drop the 'please confirm settings' thing until I can narrow down a why).

Do you have a link to the MD file you are using? I can use it in some testing.

The next release will migrate the way settings are applied in the backend and should prevent odd OpenAI-related errors.