What RAG? (WIP)

Input talk about trying to run RAG w/ Go, Leipzig Gophers #41, 2024-02-27 1900

🚧 This material is very much WIP.

A large language model can be flaky compressors. How to improve reliability of the output? One approach is to retrieval-augmented generation (Meta, 2020).

Underpinning all foundation models, including LLMs, is an AI architecture known as the transformer. It turns heaps of raw data into a compressed representation of its basic structure. Starting from this raw representation, a foundation model can be adapted to a variety of tasks with some additional fine-tuning on labeled, domain-specific knowledge.

But fine-tuning alone rarely gives the model the full breadth of knowledge it needs to answer highly specific questions in an ever-changing context. In a 2020 paper, Meta came up with a framework called retrieval-augmented generation to give LLMs access to information beyond their training data. RAG allows LLMs to build on a specialized body of knowledge to answer questions in more accurate way. -- What is retrieval-augmented generation?

Can this approach help finding code, documentation, text passages faster?

Countless hours spent searching for documentation on peculiar, intellectually uninteresting aspects; the efforts to learn an overly complicated API, often without good reason; writing immediately usable programs that I would discard after a few hours. These are all things I do not want to do, especially now, with Google having become a sea of spam in which to hunt for a few useful things. -- http://antirez.com/news/140

and also:

I have also learned that LLMs are a bit like Wikipedia and all the video courses scattered on YouTube: they help those with the will, ability, and discipline, but they are of marginal benefit to those who have fallen behind.

How do we learn, when and how to use this technology?

Have a peek at this blog post that is going around lately: The pain points of building a copilot These people are brimming with excitement about all the new problems that LLMs are bringing to the table. -- Why We Can't Have Nice Software

LeCun on 2023-02-13 (780.8K views as of 2024-02-06, archived):

My unwavering opinion on current (auto-regressive) LLMs

They are useful as writing aids.

They make stuff up or retrieve stuff approximately.

Current LLMs should be used as writing aids, not much more.

Marrying them with tools such as search engines is highly non trivial.

There will be better systems that are factual, non toxic, and controllable. They just won't be auto-regressive LLMs. [...]

Warning folks that AR-LLMs make stuff up and should not be used to get factual advice.

Warning that only a small superficial portion of human knowledge can ever be captured by LLMs.

Being clear that better system will be appearing, but they will be based on different principles. They will not be auto-regressive LLMs. [...]

All this does not seem to stop people to build (lots of) stuff.

LLM tooling

Proliferation of new frameworks and tool categories. Some problems:

download, models onto machine, copy, packaging, wrapper, customization (e.g. ollama to run local models)
api wrappers, adding custom data to the generation process; general libraries like llamaindex, with adapters, these libraries then use some paid or hosted API, like openai, claude - llama and friends; langchain, ...
running models: llamafile
...

Go LLM framework: lingoose

Many tools written in Python, but is there something similar in Go?

$ go run examples/hello.go
2024/02/27 14:41:42 using default ollama endpoint with model stablelm2:1.6b-zephyr-fp16
Thread:
user:
        Type: text
        Text: tell me a joke about geese
assistant:
        Type: text
        Text: Why did the chicken cross the road?

To get to the other side, of course!

But seriously, here's another one:

What do you call a group of geese with no leader?

A gaggle!

Tasks

setup threads, communicate with LLM via API (different options)
loading content (e.g. wrappers aroung filetypes, data sources, like pubmed, ...)
vector database (or just json file)

Basic RAG example

Splitting the Go 1.22 release notes on newlines, calculating an embedding.

takes about 4ms for short documents to create an embedding

$ go run examples/lingoose/rag/main.go
2024/02/27 18:00:11 adding 4 docs
2024/02/27 18:00:11 user interaction
Combining Data: The chosen data segments from the database are combined
        with the user’s initial query, creating an expanded prompt.

Loading files

Learning from PDFs. 62 files, 146M, mostly arxiv papers.

$ go run examples/lingoose/embeddings/knowledge_base/main.go
Learning Knowledge Base...

Follow up questions

this is all "frozen RAG", where documents are added, compared, but not learned from
would something like FAISS or ANNOY be enough, e.g. for just finding similar documents?

miku / whatrag