hora-search / hora

๐Ÿš€ efficient approximate nearest neighbor search algorithm collections library written in Rust ๐Ÿฆ€ .

Home Page:http://horasearch.com/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

panick when use `CosineSimilarity`

vaaaaanquish opened this issue ยท comments

Hi.

With the latest hora, I get a panic when using hora::core::metrics::Metric::CosineSimilarity.

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/hora-0.1.1/src/core/neighbor.rs:32:54

I think this is an error caused by partial_cmp.
partial_cmp expect it to be None when the given value cannot be ordered.

commented

yes, you are right, because it's float type(https://en.wikipedia.org/wiki/IEEE_754) which have lots of situations, such as NaN. it can not be compared, so I have to make sure for every place where has use partial_cmp is valid.

I will fix it as soon as possible and do something to avoid invalid value(maybe just panic?).

commented

hi @vaaaaanquish, Iโ€™ve read a post about hora as I was googling this project. Is it written by you? ;-)

your application is great๐Ÿ‘๐Ÿ‘๐Ÿ‘, and if you want to reach hora's full potential, you can enableSIMD support

firstly, use some command, such as lscpu to check which SIMD instruction your platform support, and export an env variable to enable it

export RUSTFLAGS='-C target-feature=+avx2,+fma'

and in your Cargo.toml to enable simd feature

hora = {version = "0.1.0", features = ["simd"]}

then, use nightly to run your application, you would find that blazingly fast!

PS: The lack of SIMD-supported documents is my carelessness ๐Ÿ˜ข, I will fill it up immediately, for more information, you can check this documentation, feel free to contribute!

@salamer Thank you.
As you said, I had assumed that SIMD was enabled. It is very helpful.

PS: I published example, and I will do some PR in the future. Thx good project ๐Ÿ˜„
https://github.com/vaaaaanquish/rust-ann-search-example

commented

Can I display this example link in hora's readme?

I think this is a very complete and great example. The model in my demo is still in Python (huggingface), the others are rust, using RPC to connect. not pure Rust ๐Ÿ˜‚๐Ÿ˜‚

Finally, thanks for using hora, I am very happy to see someone using it.

BTW, I'm continuing to study Japanese, ใ€Œๅฐใ•ใชๆ‹ใฎใ†ใŸใ€ is a song I once learned. ๐Ÿ˜‚๐Ÿ˜‚

Can I display this example link in hora's readme?

Of course. I'm glad :)

I'm continuing to study Japanese, ใ€Œๅฐใ•ใชๆ‹ใฎใ†ใŸใ€ is a song I once learned.

Wow. I was wondering why. It's an interesting connection!

commented

my Japanese level may only be able to read but not the level of daily communication ๐Ÿ˜‚.

and of course, the most direct benefit of learning Japanese is to help the Japanese doc which I am writing ๐Ÿ˜‚.

As far as I can tell, this is because the dot product of a vector and itself is always negative, so the square root is always NaN. Is this intentional? It seems like it is as there is a test in calc.rs:

#[cfg(test)]
mod tests {
    use crate::core::calc::dot;
    #[test]
    fn test_dot() {
        let a = [1., 2., 3.];
        let b = [1., 2., 3.];
        assert_eq!(dot::<f32>(&a, &b).unwrap(), -14.0);
    }
}

And both the simd and non-simd implementations negate the sums:

fn dot_product(a: &[$type_id], b: &[$type_id]) -> Result<$type_id, &'static str> {
                assert_eq!(a.len(), b.len());

                #[cfg(feature = $simd_size)]
                {
                    let size = a.len() - (a.len() % $size);
                    let c = a
                        .chunks_exact($size)
                        .map($simd_type::from_slice_unaligned)
                        .zip(b.chunks_exact($size).map($simd_type::from_slice_unaligned))
                        .map(|(a, b)| a * b)
                        .sum::<$simd_type>()
                        .sum();
                    let d: $type_id = a[size..].iter().zip(&b[size..]).map(|(p, q)| p * q).sum();
                    Ok(-(c + d))  // negated here
                }
                #[cfg(not(feature = $simd_size))]
                {
                    Ok(-(a.iter().zip(b).map(|(p, q)| p * q).sum::<$type_id>())) // negated here
                }
            }

Removing the negation allows cosine similarity to work. Any suggestions about where to go from here?

Great repository! I was able to integrate hora search into a rust application even as an inexperienced rust programmer.

But I cannot use hora::core::metrics::Metric::CosineSimilarity because of the same panic: "called Option::unwrap() on a None value"

What is the status of a fix here?