dluc / FreeMindLabs.SemanticKernel

The Elasticsearch adapter for Microsoft Kernel Memory.

Elasticsearch Memory Storage for Kernel Memory

By Free Mind Labs, Inc. - Dive into your Stream

License: MIT

Use Elasticsearch as vector storage for Microsoft Kernel Memory (KM)

Kernel Memory (KM) is an open-source service and plugin specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines.

Utilizing advanced embeddings and LLMs, the system enables Natural Language querying for obtaining answers from the indexed data, complete with citations and links to the original sources.

This repository contains the Elasticsearch adapter that allows KM to use Elasticsearch as a vector database, so developers can perform hybrid and semantic search directly on Elasticsearch, on-premises or in the cloud.

Embeddings can be generated with commercial models from OpenAI or Azure OpenAI, or with open-source models hosted on Hugging Face, including those used by Sentence Transformers.
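Under the hood, semantic search ranks stored text by how close its embedding vector is to the embedding of the query. The C# sketch below shows an exact, brute-force version of that ranking using cosine similarity over an in-memory collection; Elasticsearch's approximate kNN search produces a similar ranking with an HNSW index so that it scales to large collections. The document ids and the three-dimensional vectors are made up for illustration; real embeddings have many more dimensions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
static double Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Exact (brute-force) k-nearest-neighbour search: score every document, keep the best k.
static List<(string Id, double Score)> TopK(
    IReadOnlyDictionary<string, float[]> docs, float[] query, int k) =>
    docs.Select(d => (Id: d.Key, Score: Cosine(d.Value, query)))
        .OrderByDescending(r => r.Score)
        .Take(k)
        .ToList();

// Hypothetical toy data: 3-dimensional "embeddings" instead of real model output.
var documents = new Dictionary<string, float[]>
{
    ["doc-cats"] = new[] { 0.9f, 0.1f, 0.0f },
    ["doc-dogs"] = new[] { 0.8f, 0.3f, 0.1f },
    ["doc-taxes"] = new[] { 0.0f, 0.2f, 0.9f },
};
var questionVector = new[] { 1.0f, 0.2f, 0.0f };

foreach (var (id, score) in TopK(documents, questionVector, k: 2))
    Console.WriteLine($"{id}: {score:F3}");
```

Scoring every document is fine for a few thousand vectors; beyond that, an approximate index such as the one Elasticsearch builds is the usual trade of a little recall for much lower latency.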

Goals

  1. To implement and maintain an open-source Elasticsearch IMemoryDb connector for Kernel Memory.

    1. Free Mind Labs requires such a connector to finish developing Videomatic.
    2. KM can also be used as a memory store for Semantic Kernel.
    3. The basic connector (i.e. the complete implementation of IMemoryDb) will be free of charge and open source.
  2. In the future we hope to add additional features (e.g. advanced search options for pre- and post-filtering, analytics, ES-specific features, etc.) that could generate some revenue to support this and other projects.

    1. Patreon?
    2. GitHub donations?
    3. Other?

We'd love to hear what you think about this.

Click on 📓 DIARY to read daily thoughts and what is happening behind the scenes.

The .NET Solution

This is a screenshot of the solution. We highlighted some of the most important files for you to explore.

Here are some screenshots of the tests included in the project. Look at the output window to see what they do.

Click here to see the source code of the test.

Click here to see the source code of the test.

Mappings

The examples use OpenAI's text-embedding-ada-002 model. It is possible to use any other embedding model supported by Semantic Kernel (SK), e.g. Azure OpenAI and Hugging Face models.
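Because text-embedding-ada-002 returns 1536-dimensional vectors, the vector field in the index mapping needs `dims` set to 1536. The C# sketch below builds a create-index body with `System.Text.Json.Nodes` (.NET 6 or later). The field names `text`, `tags` and `embedding` are hypothetical and not necessarily the names the adapter uses; the resulting JSON would be sent with `PUT /<index-name>`.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Builds a create-index body with a dense_vector field sized for the embedding model.
// Field names are illustrative assumptions, not the adapter's actual schema.
static JsonObject BuildIndexMapping(int dimensions, string similarity = "cosine") =>
    new JsonObject
    {
        ["mappings"] = new JsonObject
        {
            ["properties"] = new JsonObject
            {
                ["text"] = new JsonObject { ["type"] = "text" },     // full-text searchable chunk
                ["tags"] = new JsonObject { ["type"] = "keyword" },  // exact-match filtering
                ["embedding"] = new JsonObject
                {
                    ["type"] = "dense_vector",
                    ["dims"] = dimensions,     // must match the embedding model's output size
                    ["index"] = true,          // build an index so approximate kNN search works
                    ["similarity"] = similarity,
                },
            },
        },
    };

// text-embedding-ada-002 produces 1536-dimensional vectors.
var body = BuildIndexMapping(1536);
Console.WriteLine(body.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```

Switching to a different embedding model means changing `dims` (and usually re-indexing), which is one reason to keep the dimension as a parameter rather than a constant.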

Kibana

Here are some screenshots of the data stored in ES, after running the tests in the solution.

KNN Query

Here's an example of how to run semantic search directly on ES.
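In Elasticsearch 8.4 and later, a kNN search can be expressed with the `knn` option of the `_search` API. The C# sketch below builds such a request body; it assumes a `dense_vector` field named `embedding` and a `keyword` field named `tags` (both hypothetical names). In practice the query vector must come from the same embedding model used at indexing time; a short placeholder vector stands in for it here.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

// Builds the body for POST /<index-name>/_search using the top-level "knn" option.
// "embedding", "tags" and "text" are hypothetical field names.
static JsonObject BuildKnnQuery(float[] queryVector, int k, int numCandidates, string? tag = null)
{
    // Elasticsearch rejects num_candidates values smaller than k.
    if (numCandidates < k)
        throw new ArgumentException("num_candidates must be >= k.");

    var knn = new JsonObject
    {
        ["field"] = "embedding",
        ["query_vector"] = new JsonArray(queryVector.Select(v => (JsonNode?)JsonValue.Create(v)).ToArray()),
        ["k"] = k,                          // number of nearest neighbours to return
        ["num_candidates"] = numCandidates, // candidates examined per shard (accuracy vs. speed)
    };

    // Optional filter: only documents matching it are considered as neighbours.
    if (tag is not null)
        knn["filter"] = new JsonObject { ["term"] = new JsonObject { ["tags"] = tag } };

    return new JsonObject
    {
        ["knn"] = knn,
        ["_source"] = new JsonArray(JsonValue.Create("text")), // return only the text field
    };
}

var body = BuildKnnQuery(new[] { 0.12f, -0.03f, 0.88f }, k: 5, numCandidates: 50, tag: "docs");
Console.WriteLine(body.ToJsonString());
```

Raising `num_candidates` relative to `k` usually improves recall at the cost of latency.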

Pre-requisites

  1. A running instance of Elasticsearch 8
    • Semantic search does not seem to be available in v7.
    • Please follow the instructions at Elasticsearch to install and configure Elasticsearch.
  2. Make sure you have the following information, as they will be needed for configuration:
    1. The user name and password to connect to ES.
    2. The certificate fingerprint generated by the ES server.
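Elasticsearch 8 prints the SHA-256 fingerprint of its HTTP CA certificate the first time it starts. If that output was lost, the fingerprint can be recomputed from the CA certificate file (typically `http_ca.crt` in the `config/certs` directory of the installation). A certificate fingerprint is the SHA-256 hash of the certificate's DER-encoded bytes, so the C# sketch below computes it with the .NET base class library (.NET 5 or later). The path in the comment is an assumption about a typical install; the demo uses a throwaway self-signed certificate so it runs without one.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// SHA-256 fingerprint = SHA-256 hash of the certificate's DER-encoded bytes, as hex.
// Convert.ToHexString returns upper-case hex without separators.
static string Sha256Fingerprint(X509Certificate2 certificate) =>
    Convert.ToHexString(SHA256.HashData(certificate.RawData));

// With a real server, load its CA certificate instead (the path is an assumption):
// using var ca = X509Certificate2.CreateFromPemFile("/path/to/elasticsearch/config/certs/http_ca.crt");

// Demo: a throwaway self-signed certificate so the sketch runs without an ES install.
using var rsa = RSA.Create(2048);
var request = new CertificateRequest("CN=demo", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
using var demoCert = request.CreateSelfSigned(DateTimeOffset.UtcNow, DateTimeOffset.UtcNow.AddDays(1));

Console.WriteLine(Sha256Fingerprint(demoCert));
```

If a tool expects lower-case hex or colon-separated bytes, the string can be reformatted; the underlying hash is the same.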

Configuration

The configuration instructions can be found here.

Current status

  1. The connector is currently in development and not ready for production use yet.
  2. The connector is not yet available as a NuGet package.

Timeline

  1. We hope to complete this project and the associated documentation by the end of 2023.

Challenges

The new API of the Elasticsearch .NET client is not yet feature-complete and still has bugs.

  1. AutoMap(), etc. are missing
    1. CreateIndexDescriptor Mappings -> Map Api usage issue #7929
    2. FEATURE - Support AutoMap to allow creation of mappings using type inference #6610
      1. A comment by flobernd on Aug 17 suggests this workaround:
// Declare the dense_vector mapping explicitly, since AutoMap() is not available.
var mapResponse = client.Indices.PutMapping("index", m => m
    .Properties<Person>(p => p
        .DenseVector(person => person.Data, d => d // Data holds the embedding vector
            .Index(true)                           // index the field for kNN search
            .Similarity("dot_product"))));         // requires unit-length vectors
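The workaround above uses the `dot_product` similarity, which Elasticsearch accepts for float vectors only when they are unit-length. For unit-length vectors the dot product equals the cosine similarity but is cheaper to compute; OpenAI states that its embeddings are already normalized to length 1. The C# sketch below normalizes two vectors, checks that both measures agree, and applies the `(1 + similarity) / 2` transformation that the Elasticsearch `dense_vector` documentation describes for turning cosine and dot-product similarity into a non-negative `_score`. The vectors are illustrative.

```csharp
using System;
using System.Linq;

// Scales a vector to unit length (L2 norm = 1), as dot_product similarity requires.
static float[] Normalize(float[] v)
{
    double norm = Math.Sqrt(v.Sum(x => (double)x * x));
    if (norm == 0) throw new ArgumentException("Cannot normalize a zero vector.");
    return v.Select(x => (float)(x / norm)).ToArray();
}

static double Dot(float[] a, float[] b) => a.Zip(b, (x, y) => (double)x * y).Sum();

static double Cosine(float[] a, float[] b) =>
    Dot(a, b) / (Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b)));

// Maps a similarity in [-1, 1] to a score in [0, 1].
static double ToScore(double similarity) => (1 + similarity) / 2;

var docVector = Normalize(new[] { 3f, 4f });   // roughly [0.6, 0.8]
var queryVector = Normalize(new[] { 4f, 3f }); // roughly [0.8, 0.6]

// For unit-length vectors, the dot product and the cosine similarity coincide.
Console.WriteLine($"dot = {Dot(docVector, queryVector):F4}, cosine = {Cosine(docVector, queryVector):F4}");
Console.WriteLine($"score = {ToScore(Dot(docVector, queryVector)):F4}");
```

If a model's vectors are not normalized, either normalize them before indexing and querying or use the `cosine` similarity instead.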

Resources

  1. Elastic's official docs on the client.

    1. NEST 7.17: https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.17/nest-getting-started.html
    2. New client 8.9: https://www.elastic.co/guide/en/elasticsearch/client/net-api/8.9/introduction.html
      1. This client is not yet feature-complete.
        1. Look here for details: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/release-notes-8.0.0.html
      2. In addition, the docs are not up to date; for some features we need to look at NEST's docs.
  2. Elasticsearch.Net GitHub repository

  3. Semantic Kernel / Kernel Memory

    1. Introduction to Semantic Memory (feat. Devis Lucato) | Semantic Kernel
    2. 11.29.2023 - Semantic Kernel Office Hours (US/Europe Region)
