neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"

Home Page: https://neo4j.com/labs/apoc

The apoc.import.json constraint check should work only when a unique constraint exists

ToucanBran opened this issue · comments

Expected Behavior (Mandatory)

Json import successfully imports

Actual Behavior (Mandatory)

Getting a missing constraint error for every node type

How to Reproduce the Problem

This is the same issue referenced in #2930. According to PR #3099, this should have been fixed in release 5.8.0; however, I confirmed it is still present even in that release.

Simple Dataset (where possible)

You can use https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json

Steps (Mandatory)

  1. Open Neo4j Desktop and create a new database, set to version 5.8.0
  2. Install the APOC library
  3. Start the DB and run the query below
CALL apoc.import.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')

You should see the error shown in the screenshot below. When I add the constraint, the import works.
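For reference, the workaround that made the import succeed for me: per the APOC documentation, apoc.import.json identifies nodes by the neo4jImportId property and expects a uniqueness constraint on it for every label in the file. A sketch in Neo4j 5 syntax; the label names here are placeholders, not labels taken from wiki.json:

```cypher
// apoc.import.json matches nodes on the neo4jImportId property,
// so each label in the file needs a uniqueness constraint on it.
// Replace Person/Site with the labels your export actually contains.
CREATE CONSTRAINT person_import_id IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS UNIQUE;

CREATE CONSTRAINT site_import_id IF NOT EXISTS
FOR (n:Site) REQUIRE n.neo4jImportId IS UNIQUE;
```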

Screenshots (where possible)

(screenshot: missing unique constraint error for each node label)

Specifications (Mandatory)

Versions

  • OS: Windows 10
  • Neo4j: 5.8.0
  • Neo4j-Apoc: 5.8.0

Hi! I am not sure I understand, but this is correct behaviour. You need a unique constraint for the import to work. The issue you linked to was about the procedure working when any constraint was present (e.g. a NOT NULL constraint) instead of the specific uniqueness one.
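To illustrate the distinction (Neo4j 5 syntax; the label and property names are only examples), an existence constraint no longer satisfies the check, only a uniqueness constraint does:

```cypher
// This existence (NOT NULL) constraint is NOT enough for apoc.import.json:
CREATE CONSTRAINT person_id_exists IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS NOT NULL;

// This uniqueness constraint is what the procedure checks for:
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (n:Person) REQUIRE n.neo4jImportId IS UNIQUE;
```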

@gem-neo4j Hmm... I was following the example shown here. I know the presenter is using an older version of Neo4j (which I confirmed still works), but this behavior would be ideal.

Here's my use case:

We have a Neo4j instance running for our CI/CD pipeline to run integration tests against. When a branch triggers a pipeline:

  • a new database is created on that server
  • the JSON containing specific test data is loaded into that database, based on the current test data loaded for master
  • other cypher scripts are run which may modify the graph based on the branch updates
  • integration tests are run

If everything passes:

  • apoc.export.json.all is called
  • The resulting file is pushed to S3, overwriting the current test-data JSON mentioned earlier
  • branch is merged into master
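The export step in that pipeline is a one-liner; a sketch (the file name and config options are illustrative, not taken from the original post):

```cypher
// Export the whole test database to a single JSON file.
// Requires apoc.export.file.enabled=true in the APOC config.
CALL apoc.export.json.all('test-data.json', {useTypes: true});
```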

Having to create these uniqueness constraints any time we want to load data from a file that APOC produced in the first place seems a little odd to me. If that's the intended behavior, though, maybe I'm going about this the wrong way?

Hi again :)

I looked a bit more into this, so you're correct that this was not always the case, but it was introduced about a year and a half ago to the procedure (and unfortunately not well documented, hence the confusion!).

Without these constraints the import would take forever, as it would do a full label scan for every node imported.

This procedure is not marked as a schema-operation procedure, so it is not able to create the constraints itself.

I'll create a ticket for my team to improve the documentation around this!

Is it possible for your application to know the label names in advance and create those constraints, perhaps using apoc.schema.assert?

Hmm, that is tricky. The only thing I can think of would be to use apoc.load.json to find all the label names and then apoc.schema.assert to create constraints from those first. But that might be pretty inefficient 😅
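A sketch of that two-pass idea, assuming the file is in the line-per-object format apoc.export.json.all produces, where node objects carry type: "node" and a labels array (the URL is the sample file from earlier in the thread; the dropExisting flag and shape of the assert map are my reading of the apoc.schema.assert signature, so verify against the APOC docs for your version):

```cypher
// Pass 1: scan the file for distinct node labels, then assert a
// uniqueness constraint on neo4jImportId for each label found.
// The final `false` keeps existing indexes/constraints instead of dropping them.
CALL apoc.load.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')
YIELD value
WITH value WHERE value.type = 'node'
UNWIND value.labels AS label
WITH collect(DISTINCT label) AS labels
CALL apoc.schema.assert({}, apoc.map.fromPairs([l IN labels | [l, ['neo4jImportId']]]), false)
YIELD label AS assertedLabel, key
RETURN assertedLabel, key;

// Pass 2: run the import as before.
// CALL apoc.import.json('https://raw.githubusercontent.com/cj2001/nodes2021_kg_workshop/main/json_files/wiki.json')
```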

Okay, thank you. Sorry I couldn't help more!