microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory

RedisMemory support

slorello89 opened this issue · comments

👋 Hi there, I'm Steve from Redis.

I wanted to check that you all would accept a PR adding Redis as one of the MemoryStorage services. Our Vector API should be compatible with IMemoryDb, with the one caveat that getting tags working right might be a bit tricky: Redis requires you to specify which fields are indexed at index creation time, so for:

Task CreateIndexAsync(string index, int vectorSize, CancellationToken cancellationToken = default);

I can certainly get the vectors indexed easily, but non-vector fields might need to be worked in via configuration.

Anyway, let me know what you all think.
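To make the constraint concrete, here is a minimal sketch of the arguments a Redis search index needs at creation time when only the vector field is known. It takes the same `index` and `vectorSize` inputs as `CreateIndexAsync` and builds the argument list for the RediSearch `FT.CREATE` command. The class name, the `embedding` field name, the key prefix, and the choice of HNSW with cosine distance are all assumptions for illustration, not something decided in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RedisVectorIndexSketch
{
    // Builds the argument list for an FT.CREATE command that indexes a single
    // vector field stored in Redis hashes. The key prefix, field name,
    // algorithm and distance metric are illustrative assumptions.
    public static List<string> BuildCreateIndexArgs(string index, int vectorSize)
    {
        if (string.IsNullOrWhiteSpace(index))
            throw new ArgumentException("Index name is required.", nameof(index));
        if (vectorSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(vectorSize));

        return new List<string>
        {
            index,
            "ON", "HASH",
            "PREFIX", "1", index + ":",
            "SCHEMA",
            // "6" is the number of attribute arguments that follow for the vector field.
            "embedding", "VECTOR", "HNSW", "6",
            "TYPE", "FLOAT32",
            "DIM", vectorSize.ToString(CultureInfo.InvariantCulture),
            "DISTANCE_METRIC", "COSINE",
        };
    }
}
```

Nothing in this schema mentions tags: any non-vector field that should be filterable has to be added to the `SCHEMA` section as well, which is why the method signature alone is not enough.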

Hi @slorello89, we would love to have Redis support.

There are a couple of approaches: we could host the code here, though you would have to go through our processes, or you could develop a NuGet package in a separate repo, and we would add it to the service as one option. We can also start with a branch here and later move the code to a repo of yours.

Here's an example of what a separate repo would look like:

> with the one caveat that getting tags working right might be a bit tricky: Redis requires you to specify which fields are indexed at index creation time

I'm not too familiar with this, but I hope we can get a serialization format that allows both indexing and search :-)

Hi @dluc - sounds good to me. Whatever is easier for you all. I'll probably just start with a PR here, and if it makes sense I can always kick it out to another repo/package.

So wrt indexing. The trick is that Redis (traditionally a pure key-value store) can only search on values in a defined secondary index. So in our case, we'd have to set whatever tags you want to be able to search on in here. My initial thought (short of adding an extra API) would be to define the tags you want to index in the Configuration of the service?

> My initial thought (short of adding an extra API) would be to define the tags you want to index in the Configuration of the service?

Seems reasonable. So the Redis class will receive a configuration object containing the list of tags, and the user is responsible for setting up the right configuration. There are some known tags that we'll need to support by default, see the "reserved" tags (e.g. check Constants.cs)
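A minimal sketch of that idea, assuming a hypothetical configuration class: the user lists the tags to index, the reserved tags are always included, and the result is the fragment of the `FT.CREATE` schema that declares each one as a `TAG` field. The class and member names are invented for illustration, and the two reserved names below are placeholders only; the actual values are the ones defined in Constants.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical configuration type: the user supplies the tag names to index.
public sealed class RedisTagConfigSketch
{
    // Placeholder stand-ins for the reserved tags; check Constants.cs for the real names.
    public static readonly IReadOnlyList<string> ReservedTags =
        new[] { "__document_id", "__file_id" };

    // Tags the user wants to be able to filter on.
    public List<string> Tags { get; } = new List<string>();

    // Returns the SCHEMA fragment declaring every reserved and configured tag
    // as a TAG field, without duplicates, reserved tags first.
    public List<string> BuildTagSchemaArgs()
    {
        var args = new List<string>();
        foreach (var tag in ReservedTags.Concat(Tags).Distinct(StringComparer.Ordinal))
        {
            args.Add(tag);
            args.Add("TAG");
        }
        return args;
    }

    // A filter on a tag that was never declared cannot be served by the index.
    public bool IsIndexed(string tag) =>
        ReservedTags.Contains(tag, StringComparer.Ordinal)
        || Tags.Contains(tag, StringComparer.Ordinal);
}
```

With this shape, a connector could check `IsIndexed` before running a filtered search and fail early with a clear message when a filter names a tag that was not configured, rather than silently returning no matches.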

Exactly, I mean it's a tiny bit hacky, but short of adding a new API (which would break the other clients) idk how else you'd get there.

Question about MemoryRecord.Payload - is it safe to assume that this can be safely serialized into a JSON string? I see it's being done elsewhere, but naturally, if you serialize anything other than primitives, it'll be difficult for the resulting MemoryRecord to keep the same type information between the storage and the requester.

> Question about MemoryRecord.Payload - is it safe to assume that this can be safely serialized into a JSON string?

I'd say so, yes, and we don't search inside the payload. Depending on the engine the content might not be readable, but I wouldn't worry.
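The type-information concern raised in the question can be seen with a small round trip. This sketch assumes the payload is a `Dictionary<string, object>` and uses `System.Text.Json`; the class and method names are hypothetical. After deserialization the values are no longer `string` or `int` but `JsonElement`, so the caller has to convert them back explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class PayloadRoundTripSketch
{
    // Serializes the payload to the JSON string that would be stored in Redis.
    public static string Serialize(Dictionary<string, object> payload) =>
        JsonSerializer.Serialize(payload);

    // Restores the payload; with System.Text.Json, values typed as object
    // come back as JsonElement rather than their original CLR types.
    public static Dictionary<string, object> Deserialize(string json) =>
        JsonSerializer.Deserialize<Dictionary<string, object>>(json)
        ?? new Dictionary<string, object>();

    // Example of the explicit conversion a reader of the payload needs.
    public static int ReadInt(Dictionary<string, object> payload, string key) =>
        payload[key] is JsonElement element
            ? element.GetInt32()
            : Convert.ToInt32(payload[key]);
}
```

For example, storing `["page"] = 3` and reading it back gives a `JsonElement` whose `GetInt32()` is 3, so the data survives but the original type does not. That matches the answer above: the round trip is safe as long as nothing depends on the exact runtime type of the payload values.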

@slorello89 FYI, I just finished integrating Postgres, and I think external repos might make the integration too expensive to maintain, because the Abstractions DLL (the only dependency used) changes often, and all these external repos would have to keep updating even for patches, to avoid compilation warnings. One option would be to version the Abstractions library separately, but we're not ready to do that yet.

I was thinking we could test this approach, see if it helps collaborations:

  • create in this repo a folder dedicated to storage adapters, e.g. Qdrant, Postgres, Redis, etc.
  • each storage adapter has its own maintainer, responsible for development/fixes/major upgrades, sending PRs that we merge - the PR must change only code in this folder
  • as KM maintainers, we take care of minor upgrades across the entire repo, e.g. upgrading the Abstractions NuGet version, maybe also fixing simple build warnings/code style issues reported by R#.

@dluc wondering if it's possible to use the semantic-kernel memory connectors, so that we could avoid creating another set of memory connectors for kernel memory and make the most of semantic-kernel

@WeihanLi the SK connectors are limited and missing important features such as filtering, hybrid search, custom schemas, plus they are very expensive to maintain because they need to be written in 3 languages (C#, Python and Java). The connectors under development here will eventually replace those (if one wants), reducing the cost because KM comes as a service and offers a plugin, bringing the extra features of RAG and memory management.

Hi @dluc - I generally agree with you that it wouldn't make sense to kick such an integration out of this repository. I get the sense that things in SK/KM land are quite dynamic at the moment, and colocating the connectors in this repository and tying them into the same CI should prevent connectors from breaking whenever something changes in Abstractions. The PR I opened initially has the connector written into "core" (a bit odd, but that seemed to be the trend with the other datasources); I'll kick it out to a separate project.

@dluc - moved it out of core and into a separate project under adapters/Redis/ given your comment above - not sure exactly what you want the directory structure to be.

Repo updated: there's an extensions folder that should be easy to follow to add new DBs and other things. I also added an empty Redis placeholder if you'd like to work on it.