neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"

Home Page: https://neo4j.com/labs/apoc

The apoc.import.json constraint check should work only when a unique constraint exists

ToucanBran opened this issue · comments

Expected Behavior (Mandatory)

Json import successfully imports

Actual Behavior (Mandatory)

Getting a missing constraint error for every node type

How to Reproduce the Problem

This is the same issue referenced in #2930. According to PR #3099, this should have been fixed in release 5.8.0; however, I confirmed it is still present even in that release.

Simple Dataset (where possible)

You can use https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json

Steps (Mandatory)

  1. Open Neo4j Desktop and create a new database, set to version 5.8.0
  2. Install the APOC library
  3. Start the DB and run the query below
CALL apoc.import.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')

You should see the error shown in the screenshot below. When I add the constraint, the import works.
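For reference, the workaround that made the import succeed for me: per the APOC documentation, apoc.import.json identifies nodes by the neo4jImportId property and expects a uniqueness constraint on it for every label in the file. A sketch in Neo4j 5 syntax; the label names here are placeholders, not labels taken from wiki.json:

```cypher
// apoc.import.json matches nodes on the neo4jImportId property,
// so each label in the file needs a uniqueness constraint on it.
// Replace Person/Site with the labels your export actually contains.
CREATE CONSTRAINT person_import_id IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS UNIQUE;

CREATE CONSTRAINT site_import_id IF NOT EXISTS
FOR (n:Site) REQUIRE n.neo4jImportId IS UNIQUE;
```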

Screenshots (where possible)

(screenshot: missing unique constraint error for each node label)

Specifications (Mandatory)

Versions

  • OS: Windows 10
  • Neo4j: 5.8.0
  • Neo4j-Apoc: 5.8.0

Hi! I am not sure I understand, but this is correct behaviour. You need a unique constraint for the import to work. The issue you linked to was about the procedure working when any constraint was present (e.g. a NOT NULL constraint) instead of the specific uniqueness one.
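To illustrate the distinction (Neo4j 5 syntax; the label and property names are only examples), an existence constraint no longer satisfies the check, only a uniqueness constraint does:

```cypher
// This existence (NOT NULL) constraint is NOT enough for apoc.import.json:
CREATE CONSTRAINT person_id_exists IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS NOT NULL;

// This uniqueness constraint is what the procedure checks for:
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS UNIQUE;
```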

@gem-neo4j Hmm... I was following the example shown here. I know the presenter is using an older version of Neo4j (which I confirmed still works), but this behavior would be ideal.

Here's my use case:

We have a Neo4j instance running for our CI/CD pipeline to run integration tests against. When a branch triggers a pipeline:

  • a new database is created on that server
  • the JSON containing specific test data is loaded into that database, based on the current test data loaded for master
  • other cypher scripts are run which may modify the graph based on the branch updates
  • integration tests are run

If everything passes:

  • apoc.export.json.all is called
  • The resulting file is pushed to S3, overwriting the current test-data JSON mentioned earlier
  • branch is merged into master
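The export step in that pipeline is a one-liner; a sketch (the file name and config options are illustrative, not taken from the original post):

```cypher
// Export the whole test database to a single JSON file.
// Requires apoc.export.file.enabled=true in the APOC config.
CALL apoc.export.json.all('test-data.json', {useTypes: true});
```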

Having to create these uniqueness constraints any time we want to load data from a file that APOC produced in the first place seems a little odd to me. If that's the intended behavior, though, maybe I'm going about this the wrong way?

Hi again :)

I looked a bit more into this, so you're correct that this was not always the case, but it was introduced about a year and a half ago to the procedure (and unfortunately not well documented, hence the confusion!).

Without these constraints the import would take forever, as it would do a full label scan for every node imported.

This procedure is not marked as a schema-operation procedure, so it is not able to create the constraints itself.

I'll create a ticket for my team to improve the documentation around this!

Is it possible for your application to know the label names in advance and create those constraints, perhaps using apoc.schema.assert?

Hmm, that is tricky. The only thing I can think of would be to use apoc.load.json to find all the label names and then apoc.schema.assert to create constraints from those first. But that might be pretty inefficient 😅
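A sketch of that two-pass idea, assuming the file is in the line-per-object format apoc.export.json.all produces, where node objects carry type: "node" and a labels array (the URL is the sample file from earlier in the thread; the dropExisting flag and shape of the assert map are my reading of the apoc.schema.assert signature, so verify against the APOC docs for your version):

```cypher
// Pass 1: scan the file for distinct node labels, then assert a
// uniqueness constraint on neo4jImportId for each label found.
// The final `false` keeps existing indexes/constraints instead of dropping them.
CALL apoc.load.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')
YIELD value
WITH value WHERE value.type = 'node'
UNWIND value.labels AS label
WITH collect(DISTINCT label) AS labels
CALL apoc.schema.assert({}, apoc.map.fromPairs([l IN labels | [l, ['neo4jImportId']]]), false)
YIELD label AS assertedLabel, key
RETURN assertedLabel, key;

// Pass 2: run the import as before.
// CALL apoc.import.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')
```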

Okay, thank you. Sorry I couldn't help more!