yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add new QBG methods to C API

lerouxrgd opened this issue · comments

Hello,

Following up on #136, in order to have a type-safe API in ngt-rs for QBG indexes, the following functions would be required in the C API:

  • ObjectID qbg_append_object_as_uint8(QBGIndex index, uint8_t *obj, uint32_t obj_dim, QBGError error);
  • uint8_t* qbg_get_object_as_uint8(QBGIndex index, ObjectID id, QBGError error);

On a related topic, I have questions about the QBGConstructionParameters:

What is the internal_data_type ? Should it always be the same as data_type ?

Similarly, what is extended_dimension compared to dimension ? Are there some rules between them ?

Hello,
V2.0.13 has been released and includes the C APIs you requested.

internal_data_type is the vector data type in the graph for QBG. You should always set float to internal_data_type and data_type, because testing against other data types is not sufficient.

dimension must be set as the dimension of the given data. When the rotation is enabled, the given vector can be extended inside QBG to improve quantization errors. extended_dimension should be equal to or larger than dimension, and should be a multiple of 16.

Hello,

Thank you for the updates and the explanations.
I was able to use qbg_append_object_as_uint8 and qbg_get_object_as_uint8 on my side.

However I am not sure to understand your answer regarding data_type/internal_data_type. If I look at the C API I can see that QBGConstructionParameters is initialized as follows:

void qbg_initialize_construction_parameters(QBGConstructionParameters *parameters)
{
  // ...
  parameters->internal_data_type = NGTQ::DataTypeFloat;
  parameters->data_type = NGTQ::DataTypeFloat;
  // ...
}

So I thought that we could also use NGTQ::DataTypeUint8 to create a QBG containing uint8_t objects. Is this correct ?
Or is the only possible value NGTQ::DataTypeFloat ?

Actually, if I try to set them both to NGTQ::DataTypeUint8 I get a SIGABRT with the following gdb backtrace:

Thread 9 "qbg::index::tes" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdbffd6c0 (LWP 221840)]
0x00007ffff729f26c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff729f26c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff724fa08 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff7238538 in abort () from /usr/lib/libc.so.6
#3  0x00005555557ae73a in NGTQ::QuantizerInstance<unsigned char>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
#4  0x0000555555769b75 in NGTQ::QuantizerInstance<unsigned char>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, NGT::Property&, bool)
    ()
#5  0x000055555578567f in NGTQ::Index::getQuantizer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, NGT::Property&, bool) ()
#6  0x0000555555785b38 in NGTQ::Index::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
#7  0x0000555555785c1e in QBG::Index::Index(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) ()
#8  0x0000555555785ff7 in qbg_open_index ()
#9  0x00005555555c8eef in ngt::qbg::index::QbgIndex<u8, ngt::qbg::index::ModeWrite>::create<u8, &std::path::Path> (path=..., create_params=...) at src/qbg/index.rs:50

Hello,
NGTQ::DataTypeUint8 is for the obsolete module NGTQ. I will consider reimplementing uint8 as a data type for QBG in the near future.

Hello,
I have released v2.0.15 which includes reimplementation of uint8.

data_type is the type of the stored objects. internal_data_type is the type of the objects on the graph used by QBG. You can specify different data types for them.

Thanks a lot, it works fine now !

Looking at your commit it seems that QBG can also handle float16 ? Does QG handle float16 too ?

Since I found that v2.0.15 does not work well in some cases, I have released v2.0.16 to fix it.

QG and QBG can handle float16 as well.

Thank you for the precision.

Then I guess some more functions would be needed in the C API for float16, I will list them in #141