TimDettmers / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Home Page: https://huggingface.co/docs/bitsandbytes/main/en/index

Error on VectorstoreIndexCreator

MohammadAminDHM opened this issue

System Info

Running on Kaggle.

Reproduction

I get this error:


ValidationError                           Traceback (most recent call last)
Cell In[3], line 33
     31     index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "persist", "embedding": embedding_function}).from_loaders([loader])
     32 else:
---> 33     index = VectorstoreIndexCreator(vectorstore_kwargs={"embedding": embedding_function}).from_loaders([loader])
     35 chain = ConversationalRetrievalChain.from_llm(
     36     llm=model,
     37     retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
     38 )
     40 chat_history = []

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)

Please help me solve this.

Expected behavior

When I try to use RAG, I get this error.

I'm getting the same error! Did you manage to find a solution?

Same here.

Hi all,

This seems to be an issue for LangChain, and not bitsandbytes.

I'm not sure what you are trying to do, but in my case I was using LangChain and running this example: https://python.langchain.com/docs/integrations/document_loaders/hugging_face_dataset/

It seems to work with the changes below (each change is explained in a comment above the affected line):

# import from langchain.indexes.vectorstore rather than langchain.indexes as in the example
from langchain.indexes.vectorstore import VectorstoreIndexCreator 
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)

embeddings = HuggingFaceEmbeddings()

dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)

# pass the embedding as a parameter; in the example the constructor is called with no arguments
index = VectorstoreIndexCreator(embedding=embeddings).from_loaders([loader])

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

query = "What are the most used hashtag?"

# it looks like query() now needs an llm to be passed explicitly
result = index.query(query, llm=llm)
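Why the top-level `embedding=` argument matters: the traceback in the original report shows `VectorstoreIndexCreator` being validated as a pydantic model, and the error says its `embedding` field is required. A key named `"embedding"` inside the `vectorstore_kwargs` dict, as in the original report's call, does not fill that field. The stdlib-only sketch below illustrates the same rule with a dataclass; `IndexCreatorSketch` and `try_build` are hypothetical stand-ins, not LangChain code, and a dataclass raises `TypeError` where pydantic raises `ValidationError`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class IndexCreatorSketch:
    """Hypothetical stand-in for a pydantic model whose `embedding` field has no default."""

    embedding: Any  # required: no default value, like the field named in the error
    vectorstore_kwargs: Dict[str, Any] = field(default_factory=dict)


def try_build(**kwargs: Any) -> str:
    """Return "ok" if construction succeeds, otherwise the raised error's type name."""
    try:
        IndexCreatorSketch(**kwargs)
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


# Embedding only inside vectorstore_kwargs: the top-level field stays unset.
print(try_build(vectorstore_kwargs={"embedding": "emb"}))  # prints "TypeError"
# Embedding as a top-level argument: the required field is satisfied.
print(try_build(embedding="emb", vectorstore_kwargs={"persist_directory": "persist"}))  # prints "ok"
```

Under that reading, the original report's call would presumably become `VectorstoreIndexCreator(embedding=embedding_function, vectorstore_kwargs={"persist_directory": "persist"})`, though this has not been checked against a specific LangChain version.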

I hope this helps.

I can confirm having this issue as well, following the code at https://github.com/Ryota-Kawamura/LangChain-for-LLM-Application-Development

In addition, the LangChain docs show that the code below should run, but it fails with the error shown below:

from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders.hugging_face_dataset import (
    HuggingFaceDatasetLoader,
)
dataset_name = "tweet_eval"
page_content_column = "text"
name = "stance_climate"

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column, name)
index = VectorstoreIndexCreator().from_loaders([loader])
Error:
ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  field required (type=value_error.missing)

Below are the installed packages from my Pipfile (all at their latest versions):

[packages]
langchain = "*"
python-dotenv = "*"
openai = "==0.28"
langchain-community = "*"
langchain-core = "*"
tiktoken = "*"
docarray = "*"
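Because "latest" changes over time, reporting the exact installed versions may make the issue easier to reproduce. A minimal stdlib sketch using `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, and the package list simply mirrors the Pipfile above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


if __name__ == "__main__":
    # Package names copied from the Pipfile [packages] section.
    pkgs = [
        "langchain",
        "langchain-community",
        "langchain-core",
        "openai",
        "tiktoken",
        "docarray",
        "python-dotenv",
    ]
    for name, ver in installed_versions(pkgs).items():
        print(f"{name}: {ver or 'not installed'}")
```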

Following; same error here. I tried a few alternatives...