By Free Mind Labs, Inc. - Dive into your Stream
Use Elasticsearch as vector storage for Microsoft Kernel Memory (KM)
Kernel Memory (KM) is an open-source service and plugin specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines.
Utilizing advanced embeddings and LLMs, the system enables Natural Language querying for obtaining answers from the indexed data, complete with citations and links to the original sources.
This repository contains the Elasticsearch adapter that allows KM to use Elasticsearch as a vector database, thus allowing developers to perform hybrid and semantic search directly on Elasticsearch, on-premises or in the cloud.
Embeddings can be generated using commercial models from OpenAI and Azure OpenAI, or open source models hosted on Hugging Face, including those used by Sentence Transformers.
- To implement and maintain an open source Elasticsearch IMemoryDb connector for Kernel Memory.
- Free Mind Labs requires such a connector to finish developing Videomatic.
- KM can also be used as a memory store for Semantic Kernel.
- The basic connector (i.e. the complete implementation of IMemoryDb) will be free of charge and open source.
- In the future we hope to add additional features (e.g. advanced search options for pre- and post-filtering, analytics, ES-specific features, etc.) that could generate some revenue to support this and other projects.
- Patreon?
- GitHub donations?
- Other?
We'd love to hear what you think about this.
Click on 📓 DIARY to read daily thoughts and what is happening behind the scenes.
This is a screenshot of the solution. We highlighted some of the most important files for you to explore.
Here are some screenshots of the tests included in the project. Look at the output window to see what they do.
Click here to see the source code of the test.
Click here to see the source code of the test.
The examples use OpenAI's text-embedding-ada-002 model. It is possible to use any other embedding model supported by SK (e.g. Azure OpenAI and Hugging Face).
Here are some screenshots of the data stored in ES, after running the tests in the solution.
Here's an example of how to run semantic search directly on ES.
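As a rough illustration of what such a query can look like, the C# sketch below builds the JSON body of an approximate kNN search for the Elasticsearch 8 `_search` API and prints it. The index name `km-default`, the field names `embedding` and `content`, and the three-dimensional query vector are hypothetical placeholders, not the names or dimensions this connector actually uses. A real query vector must be produced by the same embedding model that was used at indexing time.

```csharp
using System;
using System.Text.Json;

// Hypothetical names: replace with the index and fields used in your cluster.
const string indexName = "km-default";
const string vectorField = "embedding";

// Placeholder vector; a real one has the dimensionality of the embedding model.
float[] queryVector = { 0.12f, -0.48f, 0.87f };

// Body of an approximate kNN search request (POST /{index}/_search).
var body = new
{
    knn = new
    {
        field = vectorField,
        query_vector = queryVector,
        k = 5,               // number of nearest neighbours to return
        num_candidates = 50  // candidates examined per shard, at least k
    },
    _source = new[] { "content" } // hypothetical text field to return
};

string json = JsonSerializer.Serialize(body, new JsonSerializerOptions { WriteIndented = true });

Console.WriteLine($"POST /{indexName}/_search");
Console.WriteLine(json);
```

The printed request can be pasted into the Kibana Dev Tools console or sent with any HTTP client, using the credentials and certificate fingerprint of your ES instance.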
- A running instance of Elasticsearch 8
- Semantic search does not seem to be available in v7.
- Please follow the instructions at Elasticsearch to install and configure Elasticsearch.
- Make sure you have the following information, as it will be needed for configuration:
- The user name and password to connect to ES.
- The certificate fingerprint generated by the ES server.
The configuration instructions can be found here.
- The connector is currently in development and not ready for production use yet.
- The connector is not yet available as a NuGet package.
- We hope to complete this project and the associated documentation by the end of 2023.
The new API in the Elasticsearch client is not yet feature complete and there are bugs.
- AutoMap(), etc. are missing
- CreateIndexDescriptor Mappings -> Map Api usage issue #7929
- FEATURE - Support AutoMap to allow creation of mappings using type inference #6610
- flobernd comment on Aug 17 suggests this:
    var mapResponse = client.Indices.PutMapping("index", x => x
        .Properties<Person>(p => p
            .DenseVector(x => x.Data, d => d
                .Index(true)
                .Similarity("dot_product"))));
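One caveat with the `dot_product` similarity used in the snippet above: Elasticsearch expects float vectors indexed with it to have unit length. The sketch below is not part of the connector. `Normalize` is a hypothetical helper that shows how a vector could be scaled to unit length before indexing, in case the embedding model does not already return normalized vectors.

```csharp
using System;
using System.Linq;

// Hypothetical helper: scales a vector to unit (L2) length.
static float[] Normalize(float[] vector)
{
    // Accumulate in double to limit rounding error on long vectors.
    double norm = Math.Sqrt(vector.Sum(x => (double)x * x));
    if (norm == 0)
        throw new ArgumentException("Cannot normalize a zero vector.", nameof(vector));

    return vector.Select(x => (float)(x / norm)).ToArray();
}

float[] unit = Normalize(new[] { 3f, 4f });

// For a unit-length vector, the dot product with itself is approximately 1.
double selfDot = unit.Sum(x => (double)x * x);

string formatted = string.Join(", ", unit);
Console.WriteLine($"[{formatted}] has squared length {selfDot:F4}");
```

If normalizing is not an option, the `cosine` similarity is the usual alternative, since it does not require unit-length input.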
- Elastic's official docs on the client:
- NEST 7.17: https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.17/nest-getting-started.html
- New client 8.9: https://www.elastic.co/guide/en/elasticsearch/client/net-api/8.9/introduction.html
- This client is not yet feature complete.
- In addition, the docs are not up to date. For some things we need to look at NEST's docs.
- Semantic Kernel/Memory-Kernel