chroma-core / chroma

the AI-native open-source embedding database

Home Page: https://www.trychroma.com/

[Bug]: chromadb 0.5.4 crashes on windows

petacube opened this issue · comments

What happened?

Calling collection.add crashes after 100 documents have been inserted.

Versions

chromadb 0.5.4, Python 3.9

Relevant log output

No response

Rolling back to the 0.5.0 release of chromadb resolves the issue.
Please explain what is causing the crash.

Do you have a stack trace or any output?

It crashes silently: the whole Python process dies without even throwing an exception.
I can try testing on Linux tomorrow to see whether I can replicate the crash, and run systrace to see whether a core dump can be captured.

A similar (possibly the same) issue was reported in Discord - https://discord.com/channels/1073293645303795742/1261229903383236720

Windows Fatal Exception: Access Violation
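
For anyone trying to get a trace out of this kind of silent crash: Python's standard faulthandler module can dump the Python traceback when the interpreter dies from a fatal error, and on Windows it reports fatal exceptions with a message of this form. The sketch below is illustrative only; the helper name enable_crash_log and the log path are hypothetical and do not come from this thread.

```python
import faulthandler


def enable_crash_log(path="chroma_crash.log"):
    """Hypothetical helper: write fatal-error tracebacks to `path`.

    Returns the open file object. The caller must keep it alive, because
    faulthandler writes to its file descriptor at crash time.
    """
    log_file = open(path, "w")
    # all_threads=True dumps every Python thread, not just the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling log = enable_crash_log() at the top of the script, before collection.add, should leave a traceback in the log file if the process dies inside native code. Running the script with python -X faulthandler is an alternative that needs no code changes and writes to stderr.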

@petacube, I was unable to reproduce this on the GitHub windows-latest runner.

Here's the test code - https://github.com/amikos-tech/chrm-2513-exp/blob/main/test_import.py

With the following workflow - https://github.com/amikos-tech/chrm-2513-exp/actions/runs/9919966674/workflow

Conda env with Python 3.9 and Chroma 0.5.4

I tried adding things in bulk and separately. I also intentionally have high dimensional vectors (4096).

Let me know if you encounter the error in a similar setting.

Hmm, I wonder if this is due to a chroma-hnswlib version mismatch. Can you run pip show chroma-hnswlib? It should be 0.7.5 for chroma 0.5.4

My version of chroma-hnswlib is 0.7.3.
Shouldn't a dependency like this be handled at the chromadb level?

'chroma-hnswlib==0.7.5',

It is set here. I am not sure how you updated, but maybe something went wrong. Can you upgrade the dependency and try again?

I ran pip install --upgrade chromadb==0.5.4, so perhaps that does not upgrade dependencies?
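
For reference, the pairing mentioned above (chroma-hnswlib 0.7.5 for chromadb 0.5.4) can also be checked from inside the environment itself, which rules out pip show and the running interpreter pointing at different installs. This is a minimal sketch using the standard importlib.metadata module; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Pairing reported in this thread: chromadb 0.5.4 pins chroma-hnswlib 0.7.5.
    for name in ("chromadb", "chroma-hnswlib"):
        print(name, installed_version(name))
```

If chroma-hnswlib prints anything other than 0.7.5 next to chromadb 0.5.4, the dependency was not upgraded along with chromadb.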

I had the same issue: a silent crash after updating to chromadb 0.5.4 on Windows, EVEN WITH chroma-hnswlib version 0.7.5.

I moved back to chromadb 0.5.0 and chroma-hnswlib 0.7.3 and everything is working like before.

@kaixxx, can you confirm whether you were using anaconda? A user in Discord reported that the problems were resolved when he switched from anaconda to pip.

On a related note: if your environment rebuilds chroma-hnswlib, that can be the culprit. Can you let me know what Python version and CPU architecture you have? We have prebuilt wheels for amd64 only on Windows (py39-py312).

Thanks for looking into this. Here is some additional info:

  • I am using anaconda to manage my environments. However, I do not install any packages from Anaconda but use pip for everything.
  • hnswlib: After reading the above message about the possible problem with version 0.7.3, I checked and confirmed that I had 0.7.5 installed. Chromadb still crashed.
  • I am using Python 3.10.13
  • CPU: AMD Ryzen 7 6800U

@kaixxx, in your venv can you run the following code with python:

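import hnswlib
import numpy as np

# Build an L2 index for 1024-dimensional vectors with room for 1000 items.
index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
# Insert 1000 random float32 vectors with ids 0..999.
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))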

Let me know if this crashes

Yes, it seems to crash.
I've created a new environment, installed chromadb (0.5.4 with chroma-hnswlib 0.7.5).
Then I've added the line print('finished') to the end of your script. This line is never reached. The script exits silently without any error message.
In my other environment with chromadb 0.5.0, the script runs fine and prints 'finished'.

Another test: I've now downgraded to chroma-hnswlib 0.7.3 but kept chromadb 0.5.4 and your script runs fine!

@kaixxx thanks for confirming. Can you add debug prints like this to identify whether it fails in the init of the index or when adding vectors:

import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
print("New index - ok")
index.init_index(max_elements=1000, ef_construction=100, M=16)
print("Init index - ok")
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
print("All good")

Yes, output:

New index - ok
Init index - ok

(no "All good")
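
Because the process dies without raising, running the reproduction in a child interpreter makes the failure visible to the parent process. The following is a minimal sketch with the standard subprocess module; run_isolated is a hypothetical helper name, and the REPRO string simply repeats the script from this thread.

```python
import subprocess
import sys

# The reproduction from this thread, kept as a string so that it only runs
# inside the child interpreter.
REPRO = """
import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
"""


def run_isolated(code):
    """Run `code` in a fresh Python process and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode
```

run_isolated(REPRO) returns 0 when the script finishes normally; a nonzero value means the child raised or died. On Windows an access violation usually surfaces as exit code 0xC0000005 (3221225477), so the parent can log it even though the child printed nothing.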

@kaixxx, fantastic. Thank you for following up. 0.7.5 adds this change to add_items functionality - chroma-core/hnswlib@408c5d1?diff=split&w=0#diff-ab27cbb27975c68cb0c6da824871058623f7f76a761c3c8365ef2e1395cf7cd9R1706-R1708

Can I ask you to rebuild the HNSW lib locally (if you have the necessary deps):

pip install --no-binary :all: chroma-hnswlib==0.7.5

Hey @tazarov, I've tried to build it but it results in an error from the linker that a certain file could not be opened. It may be that my build environment is not set up properly, but I don't have the time to dig into that. Is there anything else I can do?

When the documents are long enough, the bug occurs on inserting the 100th one, whether you insert the data one by one or all at once.

Reproduced for python 3.12 and 3.10 on our windows machine (though this does not show up in CI, we should figure out why - perhaps the number of embeddings we insert in CI is not large enough to trigger this).

@HammadB and I are looking into it.

I have confirmed that running with --no-binary (building from source) fixes this as a workaround. This points to an issue in the wheel build. Investigating further.

It seems the Windows wheels were being built with AVX/SSE enabled whenever the runners they were compiled on supported it. I guess that for 0.7.3 the runner just happened not to have AVX/SSE, but now it does. I have pushed an alpha release, 0.7.6.alpha1.

@dddxst and @kaixxx and @petacube can you pip install chroma-hnswlib==0.7.6a1 and let me know if that fixes your issue? If so, I can issue a main release. Thanks.

Thanks!
I've tested chroma-hnswlib 0.7.6a1 with the above script and it still crashes, unfortunately. Exactly the same behavior as described in #2513 (comment)

Have reproduced the 0.7.6a1 failure on our windows machine. The next step is to put a debugger on the cpp code itself. This will be a bit hairy but will coordinate with @HammadB to ship a fix.

I had the same problem with 0.5.5 and downgrading to 0.5.3/0.7.3 has solved it for now!

@EricBLivingston what version of python are you on?

Version 3.11.9

It would appear that the issue exists on hnswlib 0.7.3 too (Windows 10, AMD Ryzen 5) - https://discord.com/channels/1073293645303795742/1265778818422145149

@tazarov can you please post a summary of this long conversation here for easy reference? There is a lot going on and it's unclear to me what the issue is. Which python version is the user on?

@atroyn, the user has the following config:

Windows 10
AMD Ryzen 5 3600XT
Python 3.12
Running in local jupyter notebook

Versions where they manage to reproduce the bug: 0.5.3 and 0.5.4

They had a build chain (MSVC) and tried to build from source, but encountered a build error (something related to ninja). Attaching the build failure here.
chroma-hnswlib==0.7.3-build-failure.txt

We should advise users on Windows to downgrade to python 3.10 for their Chroma environments.