ElkinStas / KG_and_LLM

An educational project on studying how knowledge graphs work and how they can be used together with language models.


KG_and_LLM is a research project investigating how knowledge graphs work and how they can be used in combination with language models.

The growing popularity of Large Language Models (LLMs) has given rise to new areas of work and research. One such area is Retrieval Augmented Generation (RAG), which gives models access to external memory so they can work with facts not covered during training.
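The core RAG loop (retrieve relevant facts, then augment the prompt) can be sketched as follows. This is a minimal toy illustration, not the project's implementation: the keyword-overlap retriever, document list, and prompt template are all invented, and the final LLM call is omitted.

```python
# Minimal RAG sketch: retrieve facts, then build an augmented prompt.
# Everything here (retriever, documents, template) is a toy stand-in.

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Prepend retrieved facts so the LLM answers from external memory."""
    facts = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"

docs = [
    "The Eiffel Tower was completed in 1889.",
    "REBEL is a seq2seq relation extraction model.",
]
question = "When was the Eiffel Tower completed?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

In a real system the retriever would be a vector index or a knowledge-graph query rather than keyword overlap, and `prompt` would be passed to an LLM.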

At present, RAG systems are in the early stages of development, and one promising foundation for them is knowledge graphs (KGs). Our research explores how LLMs and KGs can be integrated to build a promising RAG system.

Initially, our concept was to construct a RAG based on knowledge graphs for construction documentation. However, as we recognized the potential, our focus shifted towards a more research-oriented approach. We began an in-depth review of literature and scientific resources related to the construction of knowledge graphs and RAG.

| Link | Name | Summary |
| --- | --- | --- |
| Springer Link | BEAR (on GitHub) | Requires payment, but interesting examples are available on GitHub. |
| Arxiv Paper | LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities | A brief overview of the tasks where LLMs currently excel and where they do not: good at reasoning, worse at construction. Suggests an agent-based approach to graph construction, though the method is somewhat vague. |
| Arxiv Paper | AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering | An older, pre-GPT work on graph construction built on BERT. The method is interesting and worth considering. Briefly, it is a three-step process: 1. OpenIE for creating triplets; 2. encoding with BERT; 3. entity linking (though it is unclear how). |
| Arxiv Paper | Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text | Proposes a benchmark for the task of knowledge graph generation. A very recent article with a new benchmark. |
| Arxiv Paper | Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction | Proposes an iterative, LLM-prompting-based pipeline for automatically generating KGs without human effort. It introduces well-formed LLM prompts for each stage of the process and achieves impressive accuracy results. |
| PLOS One Article | | Similar to the previous article. If we decide to build graphs, this article should be examined carefully. Essentially, it proposes the same agent-based approach with good accuracy on their specific task. |
| FCST Article | | A Chinese-language article. Judging from the figures, it appears to involve multi-layered prompts. |
| Arxiv Paper | Unifying Large Language Models and Knowledge Graphs: A Roadmap | An excellent overview of current developments, with dedicated subsections for each process of interest. |
| Arxiv Paper | PiVe: Prompting with Iterative Verification Improving Graph-based Generative Capability of LLMs | Uses a model with a verifier module for graph construction. Briefly, they trained a small transformer on correct/incorrect responses and generate iteratively with it. |
| Arxiv Paper | BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models | Constructs graphs using prompts and a trained BERT. Achieves up to 70% accuracy in some cases. |
| Arxiv Paper | CodeKGC: Code Language Model for Generative Knowledge Graph Construction | |
| Arxiv Paper | Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks | Describes an ERE model based on Packed Levitated Marker. The code did not work well, so it is hard to verify the method's effectiveness. |
| Arxiv Paper | Packed Levitated Marker for Entity and Relation Extraction | Describes the Packed Levitated Marker method for ERE data labeling. The code is functional and the metrics are confirmed. It is hard to say whether it should be used in our work. |
| GitHub | REBEL: Relation Extraction By End-to-end Language generation | REBEL, a seq2seq model based on BART, performs end-to-end relation extraction for more than 200 relation types. It works excellently and is currently in use. |
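Since REBEL is the extractor currently in use, a sketch of how its linearized output decodes into triplets may be useful. REBEL generates sequences with `<triplet>`, `<subj>`, and `<obj>` markers; the parser below follows that markup. The sample string is hand-written for illustration, not actual model output (in practice it would come from generating with the `Babelscape/rebel-large` checkpoint via Hugging Face `transformers`).

```python
# Sketch: decode REBEL's linearized output into (head, relation, tail)
# triplets. REBEL emits sequences of the form:
#   <triplet> head <subj> tail <obj> relation <triplet> ...

def extract_triplets(text: str) -> list[tuple[str, str, str]]:
    triplets = []
    head = tail = relation = ""
    current = None  # which field the next plain tokens belong to
    for token in text.replace("<s>", "").replace("</s>", "").split():
        if token == "<triplet>":
            # A new triplet starts; flush the previous one if complete.
            if head and relation and tail:
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            head, tail, relation = "", "", ""
            current = "head"
        elif token == "<subj>":
            current = "tail"
        elif token == "<obj>":
            current = "relation"
        elif current == "head":
            head += " " + token
        elif current == "tail":
            tail += " " + token
        elif current == "relation":
            relation += " " + token
    if head and relation and tail:
        triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets

# Hand-written sample in REBEL's output format (illustrative only).
sample = "<s><triplet> Moscow <subj> Russia <obj> capital of</s>"
print(extract_triplets(sample))  # [('Moscow', 'capital of', 'Russia')]
```

The resulting triplets can be inserted directly into a graph store, which is what makes REBEL convenient as a knowledge-graph construction backend.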

After two months of studying knowledge graphs and scientific articles, we have come to the conclusion that it would be valuable to build a Retrieval Augmented Generation (RAG) model based on knowledge graphs. We plan to compare it to a conventional RAG model using vector-based approaches and potentially encapsulate this research into either a library or a scientific paper.

Among the current tasks:

- Dataset generation and compilation for RAG. An acceptable dataset for RAG has not yet been gathered, so we will need to develop it ourselves.

- Metric definition for RAG.

- Testing RAG in various formats, including Text-To-Cypher and semantic search over the graph.
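To make the two retrieval formats concrete: in Text-To-Cypher, an LLM translates the user's question into a Cypher query (e.g. `MATCH (p)-[:WORKS_AT]->(c {name: "Acme"}) RETURN p`), which is then run against a graph database. The sketch below uses a toy in-memory triple store and a pattern match as a stand-in for Neo4j; the graph data and relation names are invented for illustration.

```python
# Toy stand-in for graph retrieval. A real Text-To-Cypher setup would
# send the question to an LLM and run the generated Cypher on Neo4j;
# here a pattern match over invented triples plays that role.

TRIPLES = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "globex"),
    ("acme", "located_in", "berlin"),
]

def match(head=None, relation=None, tail=None):
    """Return triples matching the pattern; None acts as a wildcard,
    like an unbound variable in a Cypher MATCH clause."""
    return [
        (h, r, t)
        for (h, r, t) in TRIPLES
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    ]

# "Who works at acme?" -> the structured pattern Text-To-Cypher would produce.
print(match(relation="works_at", tail="acme"))  # [('alice', 'works_at', 'acme')]
```

Semantic search over the graph, by contrast, would embed nodes or triples and retrieve them by vector similarity to the question rather than by an exact structural pattern; comparing the two is one of the planned experiments.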



Languages

Language: Jupyter Notebook 100.0%