A Neo4j-JDBC driver SqlTranslator implementation that uses OpenAI to create Cypher statements from natural language descriptions.
The translator depends on an existing vector index with populated embeddings for information about the Cypher language.
For this PoC, I took the full reference PDF version of the Cypher documentation and ran it through a PDF parser.
This was heavily inspired by the work done in the Spring AI project, especially in the PagePdfDocumentReader.
Because the current version reuses the already established JDBC connection for the similarity search, this data must be present in the database you connect to.
The translator passes through every statement that does not start with the prefix 🤖, (hey, this is just a PoC, let me have some fun), so that valid SQL or Cypher is never interpreted as a prompt.
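The pass-through rule could be sketched roughly like this. This is not the translator's actual code: the class `PromptPrefix` and its method are hypothetical names, and trimming the text after the prefix is my assumption. The emoji is written as a Unicode escape so the source compiles regardless of file encoding.

```java
import java.util.Optional;

// Hypothetical helper, not part of the driver: decides whether a statement is a prompt.
final class PromptPrefix {

    // The robot emoji (U+1F916) followed by a comma.
    static final String PREFIX = "\uD83E\uDD16,";

    private PromptPrefix() {
    }

    // Returns the natural language prompt if the statement starts with the prefix,
    // or an empty Optional if the statement should be passed through untouched.
    static Optional<String> extractPrompt(String statement) {
        if (statement == null || !statement.startsWith(PREFIX)) {
            return Optional.empty();
        }
        // Assumption: whitespace around the remaining text is not significant.
        return Optional.of(statement.substring(PREFIX.length()).strip());
    }
}
```

With this sketch, a plain `MATCH (n) RETURN n` yields an empty `Optional` and would be executed as is, while a prefixed statement yields only the text after the prefix.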
Within the translator, the first query that will be issued is the similarity search with the provided search term on the configured index.
CALL db.index.vector.queryNodes($0, 10, $1)
YIELD node AS node, score
RETURN node.content AS content
ORDER BY score DESC
This will return the 10 best matching documents (pages) from the database. Their content is then embedded in a predefined system prompt as the only source of information.
The system prompt and the search term (as the user prompt) are sent together in an OpenAI chat request. The response is returned as (unverified) Cypher.
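To make the two-step flow a bit more tangible, here is a minimal sketch of how the retrieved pages and the search term might be combined into chat messages. The template wording, the class `PromptAssembly` and the record `ChatMessage` are made up for illustration; the translator's real predefined system prompt is not reproduced here. Only the `system` and `user` role names come from the OpenAI chat API. It assumes Java 17 or newer.

```java
import java.util.List;

// Hypothetical sketch of the prompt assembly step, not the translator's actual code.
final class PromptAssembly {

    // Assumed wording; the real predefined system prompt may look different.
    static final String SYSTEM_TEMPLATE = """
            You are an assistant that writes Cypher statements.
            Use only the following documentation pages as your source of information:

            %s

            Reply with a single Cypher statement and nothing else.
            """;

    // One message of a chat request: a role ("system" or "user") and its text.
    record ChatMessage(String role, String content) {
    }

    private PromptAssembly() {
    }

    // pages: the node.content values returned by the similarity search.
    // searchTerm: the user's text, already stripped of the prefix.
    static List<ChatMessage> buildMessages(List<String> pages, String searchTerm) {
        String context = String.join("\n---\n", pages);
        return List.of(
                new ChatMessage("system", SYSTEM_TEMPLATE.formatted(context)),
                new ChatMessage("user", searchTerm));
    }
}
```

The resulting list would then be sent with the configured chat model and temperature, and whatever text comes back is treated as the Cypher statement.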
You need to provide your OpenAI API access token as the OPEN_AI_TOKEN environment variable or directly as a driver property.
Some parameters are mandatory, others have defaults that can be overridden by the user via driver properties.
parameter | type | default |
---|---|---|
indexName (mandatory) | String | null |
embeddingModel | String | text-embedding-ada-002 |
chatModel | String | gpt-3.5-turbo |
chatTemperature | Double | 0.0 |
openAIToken | String | environment variable OPEN_AI_TOKEN |
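As a rough illustration, the parameters from the table could be collected in a `java.util.Properties` object like this. I am assuming here that the property keys are exactly the parameter names from the table, without any prefix; the class `TranslatorConfig` is a hypothetical helper.

```java
import java.util.Objects;
import java.util.Properties;

// Hypothetical helper that collects the translator parameters as driver properties.
final class TranslatorConfig {

    private TranslatorConfig() {
    }

    // indexName is mandatory; openAIToken may be null to fall back to OPEN_AI_TOKEN.
    static Properties driverProperties(String indexName, String openAIToken) {
        Properties props = new Properties();
        // Mandatory: the vector index holding the embedded Cypher documentation.
        props.setProperty("indexName", Objects.requireNonNull(indexName, "indexName is mandatory"));
        // Optional: the values below simply repeat the defaults from the table.
        props.setProperty("embeddingModel", "text-embedding-ada-002");
        props.setProperty("chatModel", "gpt-3.5-turbo");
        props.setProperty("chatTemperature", "0.0");
        // Only set the token explicitly when one was given.
        if (openAIToken != null && !openAIToken.isBlank()) {
            props.setProperty("openAIToken", openAIToken);
        }
        return props;
    }
}
```

The result can be handed to `DriverManager.getConnection(url, props)`, together with the usual `user` and `password` properties for your database.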
The RagToCypherTranslatorFactory will register itself via the META-INF/services mechanism with the Neo4j JDBC Driver.
The required OpenAI token can be passed as a system environment variable or via properties passed to the JDBC driver itself.
The Neo4j JDBC Driver can be configured so that every statement to be executed automatically goes through the translation layer.
This is done by setting enableSQLTranslation to true, either as a URL parameter or as a driver property.
After this is done, every JDBC Statement call, e.g. Statement#executeQuery, will invoke the translator before executing the statement.
The translator will only kick in for statements prefixed with 🤖, (the robot emoji followed by a comma).
If a statement is prefixed accordingly, it will be interpreted as textual input for the machine; otherwise it will be treated as normal Cypher and executed as is.
The OpenAI response is hopefully a working Cypher statement, as it will be executed.
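Both ways of switching on automatic translation might look like the following sketch. The helper class `AutoTranslation` is hypothetical; only the `enableSQLTranslation` name is taken from the text above, and the base URL you pass in is up to your setup.

```java
import java.util.Properties;

// Hypothetical helper showing both ways to enable automatic translation.
final class AutoTranslation {

    private AutoTranslation() {
    }

    // Option 1: append the setting as a URL parameter.
    static String urlWithTranslation(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "enableSQLTranslation=true";
    }

    // Option 2: add the setting to a copy of the given driver properties.
    static Properties propertiesWithTranslation(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("enableSQLTranslation", "true");
        return props;
    }
}
```

Either the URL or the properties can then be used with `DriverManager.getConnection`; after that, a prefixed string passed to `Statement#executeQuery` is translated first and the resulting Cypher is executed.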
Another option is to just put the translator on the classpath and configure the OpenAI token, but leave enableSQLTranslation set to false (which is the default).
JDBC provides Connection#nativeSQL, which you can then call as needed, for example like this: con.nativeSQL("🤖, how many IWasHere nodes are in the graph");
This should give you a Cypher statement that you can either use as is or amend as needed.
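Wrapped into a small method, the on-demand variant could look like this. The class `NativeSqlExample` and the method name `toCypher` are hypothetical; `Connection#nativeSQL` is standard JDBC. The sketch only asks for the translation and executes nothing, so you can inspect the returned Cypher before running it.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical wrapper around Connection#nativeSQL for on-demand translation.
final class NativeSqlExample {

    // The robot emoji (U+1F916) followed by a comma.
    private static final String PREFIX = "\uD83E\uDD16,";

    private NativeSqlExample() {
    }

    // Sends the question through the translator and returns whatever it produces.
    // Nothing is executed against the database by this method itself.
    static String toCypher(Connection connection, String question) throws SQLException {
        return connection.nativeSQL(PREFIX + " " + question);
    }
}
```

Since the returned Cypher is not verified, it is probably a good idea to review it before passing it to `Statement#executeQuery`.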
See the test RegToCypherTranslatorAIExample in the src/test folder for a working example.
Node creation example:
Input:
Create a Message node with a property 'content' containing 'the dynamic duo rocks'
Output:
CREATE (:Message {content: 'the dynamic duo rocks'})
Count query generator:
Input:
How many nodes labeled 'Message' are in the graph.
Output:
MATCH (n:Message) RETURN count(n)