microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory

KernelMemory Crash with LLamaSharp Embedder during Importing Document

LSXAxeller opened this issue

I had a go at using Kernel Memory to store documents (about 1.3 MB), but it keeps crashing: with the Semantic Kernel memory builder the crash happens after memory.SaveReferenceAsync, and with the latest LLamaSharp commit and the Kernel Memory LLamaSharp connector it happens in GenerationMemory.ImportDocumentAsync.

LLamaSharp 0.8.1 & LLamaSharp.KernelMemory 0.8.1

        try
        {
            var GenerationModelParameters = new ModelParams("THE_PATH_TO_MODEL")
            {
                ContextSize = 4096,
                GpuLayerCount = 0
            };
            var GenerationModel = LLamaWeights.LoadFromFile(GenerationModelParameters);
            // LLamaEmbedder is assumed here; the original paste used a target-typed new(...), which does not compile with var.
            var GenerationModelEmbedder = new LLamaEmbedder(GenerationModel, GenerationModelParameters);
            var MemoryContext = GenerationModelEmbedder.Context;

            var memory = new Microsoft.SemanticKernel.Memory.MemoryBuilder()
                .WithTextEmbeddingGeneration(new LLamaSharp.SemanticKernel.TextEmbedding.LLamaSharpEmbeddingGeneration(GenerationModelEmbedder))
                .WithMemoryStore(new Microsoft.SemanticKernel.Memory.VolatileMemoryStore())
                .Build();

            Dictionary<string, string> fileContents = new();
            string[] files = Directory.GetFiles("twi", "*.txt");
            foreach (string file in files)
            {
                string fileName = Path.GetFileName(file);
                string content = File.ReadAllText(file);
                fileContents.Add(fileName, content);
            }

            foreach (var entry in fileContents)
            {
                var result = await memory.SaveReferenceAsync(
                    collection: "Twilight",
                    externalSourceName: "FanFiction",
                    externalId: entry.Key,
                    text: entry.Value);
            }

            
            var qs = "Who is the Volturi's greatest enemy?";
            var memories = memory.SearchAsync("Twilight", qs, limit: 10, minRelevanceScore: 0.5);
            var stringBuilder = new StringBuilder();
            await foreach (var result in memories)
            {
                stringBuilder.AppendLine("  Path:     : " + result.Metadata.Id);
                stringBuilder.AppendLine("  Result    : " + result.Metadata.Text);
                stringBuilder.AppendLine("  Relevance: " + result.Relevance);
                stringBuilder.AppendLine();
            }

            File.WriteAllText("res.txt", stringBuilder.ToString());
        }
        catch (OperationCanceledException ex)
        {
            File.WriteAllText("res01.txt", ex.Message);
        }
        catch (Exception ex)
        {
            File.WriteAllText("res01.txt", ex.Message);
        }
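
As a way to narrow this down, one option is to bypass the Semantic Kernel memory pipeline and feed each file to the LLamaSharp embedder directly. The sketch below is only a diagnostic idea, not tested code: it reuses the `fileContents` and `GenerationModelEmbedder` variables from the snippet above and assumes the `GetEmbeddings(string, bool)` overload quoted further down is callable on the embedder.

```
// Diagnostic sketch: embed each file directly with LLamaSharp,
// bypassing MemoryBuilder / SaveReferenceAsync entirely.
foreach (var entry in fileContents)
{
    // Print before the call: if the process exits with the native crash,
    // the last printed name identifies the offending input.
    Console.WriteLine($"Embedding {entry.Key} ({entry.Value.Length} chars)...");
    float[] vector = GenerationModelEmbedder.GetEmbeddings(entry.Value, true);
    Console.WriteLine($"  ok, {vector.Length} values");
}
```

If the crash reproduces here, this loop is already a self-contained repro that could be attached to a LLamaSharp issue.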

Latest LLamaSharp commit & Microsoft.KernelMemory.AI.LlamaSharp 0.24.231228.5 (latest)

try
        {

            var GenerationModelParameters = new ModelParams("THE_PATH_TO_MODEL")
            {
                ContextSize = 4096,
                GpuLayerCount = 0
            };
            var GenerationModel = LLamaWeights.LoadFromFile(GenerationModelParameters);
            // LLamaEmbedder is assumed here as well; the original paste used a target-typed new(...).
            var GenerationModelEmbedder = new LLamaEmbedder(GenerationModel, GenerationModelParameters);
            var MemoryContext = GenerationModelEmbedder.Context;

            var GenerationMemory = new KernelMemoryBuilder()
                .WithSearchClientConfig(new SearchClientConfig { MaxMatchesCount = 2, AnswerTokens = 100 })
                .WithLLamaSharpTextGeneration(new LlamaSharpTextGenerator(GenerationModel, MemoryContext))
                .WithLLamaSharpTextEmbeddingGeneration(new LLamaSharpTextEmbeddingGenerator(GenerationModelEmbedder))
                .WithSimpleFileStorage(new SimpleFileStorageConfig { StorageType = FileSystemTypes.Disk })
                .Build<MemoryServerless>();

            await GenerationMemory.ImportDocumentAsync(new Document("twilight")
            .AddFiles(new[] {
                "tl.pdf"
            }));

            var qs = "Who is the Volturi's greatest enemy?";
            var answer = await GenerationMemory.AskAsync(qs);
            File.WriteAllText("res.txt", answer.Result);
        }
        catch (OperationCanceledException ex)
        {
            File.WriteAllText("res01.txt", ex.Message);
        }
        catch (Exception ex)
        {
            File.WriteAllText("res01.txt", ex.Message);
        }

Both code samples fail with the same error:

The target process exited with code -2146233082 (0x80131506)
while evaluating the function
`System.SpanDebugView<T>.SpanDebugView`.

0x80131506 is the HRESULT for a fatal CLR execution-engine error, which usually indicates that process memory was corrupted, for example by native code.

The crash happens at `return embeddings.ToArray();` in LLamaSharp's `GetEmbeddings`:

```
public float[] GetEmbeddings(string text, bool addBos)
{
    var embed_inp_array = Context.Tokenize(text, addBos);

    // TODO(Rinne): deal with log of prompt

    if (embed_inp_array.Length > 0)
        Context.Eval(embed_inp_array, 0);

    var embeddings = NativeApi.llama_get_embeddings(Context.NativeHandle);
    if (embeddings == null)
        return Array.Empty<float>();

    return embeddings.ToArray();
}
```

@LSXAxeller I believe the error is in the LLamaSharp library, since the error message comes from there. You'll have to step through the code to find out what data is being sent to LLamaSharp so you can reproduce the bug, and then raise an issue in the LLamaSharp repo.
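
One concrete thing to check while stepping through (a hypothesis only, since the documents are large relative to the 4096-token context) is how many tokens each piece of text produces before it reaches `GetEmbeddings`. A rough sketch, assuming the `MemoryContext` and `fileContents` variables from the first snippet and the same `Context.Tokenize(text, addBos)` call used in the quoted method:

```
// Sketch: report the token count of each input before embedding it.
// An input longer than the configured 4096-token context would be a
// prime suspect and easy to attach to a LLamaSharp issue.
const int contextSize = 4096;
foreach (var entry in fileContents)
{
    var tokens = MemoryContext.Tokenize(entry.Value, true);
    Console.WriteLine($"{entry.Key}: {tokens.Length} tokens");
    if (tokens.Length > contextSize)
        Console.WriteLine($"  exceeds the {contextSize}-token context window");
}
```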