run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

Home Page: https://docs.llamaindex.ai

[Bug]: Bedrock Cohere embeddings are not working as expected.

omarinho opened this issue · comments

Bug Description

I know that Cohere (through Bedrock) has a limit of 512 tokens (roughly 2048 characters, at about 4 characters per token) per input in the Bedrock API.

Additionally, I'm aware that a fix was recently accepted from a pull request by @gsuriano to address an issue with this limit. (See: #13126)

However, the problem persists despite the fix. For example, when I input the following:

from llama_index.core import ServiceContext, StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(
    vector_store=vector_store
)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=400,
    chunk_overlap=80,
)
VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    service_context=service_context,
)

Here llm is, for example, anthropic.claude-3-haiku-20240307-v1:0 and embed_model is cohere.embed-english-v3. When I try to make embeddings of a PDF, I get an error saying "Input too long for selected model", so even with a chunk size of less than 512 it is still failing.

We are using llama-index-embeddings-bedrock==0.1.5, which includes the fix by @gsuriano, so apparently there is something else that needs to be fixed.

Any suggestions are welcome.

Version

0.10.19

Steps to Reproduce

Try to make embeddings of a PDF file using chunk_size = 400 and chunk_overlap = 80.

In our case, we are using MongoDB Atlas as the vector store.
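
For reference, a sketch of how the documents and vector_store used above can be constructed; the directory path, database/collection/index names, and connection string are placeholders:

import pymongo
from llama_index.core import SimpleDirectoryReader
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# Connect to Atlas and point the vector store at a search-indexed collection
mongo_client = pymongo.MongoClient("<ATLAS_CONNECTION_STRING>")
vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=mongo_client,
    db_name="genai",
    collection_name="chunks",
    index_name="vector_index",
)

# Load the PDF(s) to embed
documents = SimpleDirectoryReader("./data").load_data()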

Relevant Logs/Tracebacks

https://github.com/run-llama/llama_index/pull/13126

I suggest setting chunk_size and similar parameters with Settings instead of ServiceContext, as shown below.
Here is the documentation about the migration: https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/service_context_migration/
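
A minimal sketch of the migration, reusing the llm, embed_model, vector_store, and documents from the snippet above:

from llama_index.core import Settings, StorageContext, VectorStoreIndex

# Global defaults replace the deprecated ServiceContext (llama_index >= 0.10)
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 400
Settings.chunk_overlap = 80

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)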

To address the "Input too long for selected model" error you're encountering, even after implementing the recent fix and adjusting your chunk size to 400 with an overlap of 80, consider the following steps:

  1. Check Model Capabilities: Ensure the model (cohere.embed-english-v3) supports the chunk sizes you're using. Different models have varying input size limitations.

  2. Adjust Chunk Parameters: If some chunks are still too large, reduce the chunk size further or adjust the overlap. This might help keep the chunks within the model's acceptable input range.

  3. Explore Other Models: If adjusting sizes doesn't work, look into using a different model that can handle larger inputs. The list_supported_models method can help identify suitable alternatives.

  4. Customize Request Body: For the Cohere provider, you might need to customize how the request body is constructed, particularly how text is split and sent. This could involve adjusting settings like truncate (see the boto3 sketch after this list).

  5. Review API and SDK Settings: Ensure your API requests are optimally configured, which might help with processing large inputs more efficiently.

  6. Consult Documentation and Support: If issues persist, the official documentation or support channels for the models you're using may offer additional insights or solutions.

By following these steps, you should be better equipped to resolve the issue with input size when making embeddings of a PDF file.
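
For step 4, here is a minimal boto3 sketch of calling the Cohere embedding model on Bedrock directly with an explicit truncate setting; the region and input text are assumptions:

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Cohere embed request body; "truncate" may be "NONE", "START", or "END".
# "END" truncates overlong inputs instead of rejecting them.
body = json.dumps(
    {
        "texts": ["some chunk of the PDF text"],
        "input_type": "search_document",
        "truncate": "END",
    }
)

response = client.invoke_model(
    modelId="cohere.embed-english-v3",
    body=body,
    accept="application/json",
    contentType="application/json",
)
embeddings = json.loads(response["body"].read())["embeddings"]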


@gsuriano, Thanks for the suggestion. I will migrate to the Settings approach for sure; however, I am still encountering the same error.

Perhaps there is another limit at play: I have conducted some testing, and it begins to fail when the total character count of "texts" is close to 8192 characters.

To effectively change embed_batch_size you should use the Settings object. That's why I suggested using Settings for chunk_size as well: I think the split is being done with llama_index's default values and not with the chunk_size that you set on the ServiceContext.
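
For example (a sketch; the BedrockEmbedding constructor arguments shown are illustrative and may differ by package version):

from llama_index.core import Settings
from llama_index.embeddings.bedrock import BedrockEmbedding

# embed_batch_size controls how many texts are sent per Bedrock request
Settings.embed_model = BedrockEmbedding(
    model="cohere.embed-english-v3",
    embed_batch_size=1,
)
Settings.chunk_size = 400
Settings.chunk_overlap = 80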

@gsuriano, Thank you for the advice. I was able to change the embed_batch_size via Settings. However, the error still persists even when I set embed_batch_size=1, so apparently, this is not the problem.

Upon debugging, I found that the error starts to occur at this node:

{"texts": ["page_label: 2\ngenai_document_id: 1681b55d-a16b-4fe7-9aa5-a8edf60501b6\ngenai_tenant_id: 5853\ngenai_created_date_utc: 2024-05-09T16:06:43.000Z\ngenai_key1: value 1\ngenai_key2: value 2\ngenai_application: CE\ngenai_entitytype: Program\ngenai_entityid: 51137\ngenai_llmmodel: anthropic.claude-3-haiku-20240307-v1:0\ngenai_embeddingmodel: cohere.embed-english-v3\ngenai_embeddingdimension: 1024\n\nIT Change Management \nUTRGV 2 \n \nTable of Contents \nIntroduction ............................................................................................................................................................. 3 \nDefining Change .................................................................................................................................................... 3 \nRoles and Responsibilities ..............."], "input_type": "search_document", "truncate": "NONE"}

The "texts" field has only 848 characters. However, the error I encountered is:

File "C:\work\LambdaModules10\Lib\site-packages\llama_index\embeddings\bedrock\base.py", line 345, in _get_embedding
response = self._client.invoke_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\work\LambdaModules10\Lib\site-packages\botocore\client.py", line 565, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\work\LambdaModules10\Lib\site-packages\botocore\client.py", line 1021, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Input is too long for requested model.

If "texts" has fewer characters, it works without issues.

I would appreciate any suggestions.

I have finally found the issue.

This is caused by the "truncate": "NONE" parameter in this code (llama_index\embeddings\bedrock\base.py):

        request_body = json.dumps(
            {
                "texts": payload,
                "input_type": input_types[input_type],
                "truncate": "NONE",
            }
        )

If you remove this truncate parameter, it works! The Amazon documentation says the default value is NONE, but when it is explicitly specified as "NONE", it causes the "input too long" error. It appears to be an internal bug in Amazon Bedrock.

Anyway, if the default value is NONE, removing the truncate parameter should have the same behavior in theory. Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html
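
In other words, the patched request construction simply omits the key (a sketch of the change; see the pull request below for the actual diff):

        request_body = json.dumps(
            {
                "texts": payload,
                "input_type": input_types[input_type],
                # "truncate" omitted: Bedrock applies its documented
                # default (NONE) without rejecting the request
            }
        )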

I have created a pull request with this fix.

#13442