neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"

Home Page: https://neo4j.com/labs/apoc

Query generates arbitrary nodes; termination doesn't roll them back

aubsartre opened this issue

Description

Sorry in advance if this isn't an apoc.refactor.cloneNodes() issue per se, but rather a problem with Cypher, the query planner, or something else. I'm not sure where to file it, so let me know if it belongs somewhere else.

The query below generates an unexpected, seemingly arbitrary number of nodes. On my small Docker instance it reliably generates dozens or hundreds of nodes, but a test over the weekend on a Neo4j sandbox instance generated 11,516,488 nodes. So many, in fact, that running MATCH (n) DETACH DELETE n failed with the error message: Neo.DatabaseError.Statement.ExecutionFailed Java heap space, and my sandbox is no longer usable.

NB: terminating the query mid-execution does not roll back the nodes already created, as noted for cloneNodes() in issue #3960.

Expected Behavior

If the query is not legal, I would expect it to raise an error; if it is legal, I would expect it to generate a fixed, deterministic number of nodes matching the input size.

Actual Behavior

A large number of clone nodes is generated, anywhere from dozens to many millions, and I'm not sure what determines the count. The only case in which the query seems to work as expected is when there is just one input node to begin with.

How to Reproduce the Problem

This is the offending Cypher:

CREATE (primary:demo {uid: 1}), (secondary:demo {uid: 2})
WITH primary AS _
MATCH (n)
CALL apoc.refactor.cloneNodes([n], false, [])
YIELD input, output AS clone, error
RETURN clone

Steps

  1. Run the Cypher query (execution can take seconds or hours)
  2. Observe that many clone nodes were created: MATCH (n) RETURN COUNT(n)

Specifications

Versions Tested

Both versions I have ready access to exhibit the fault.

Enterprise Edition (online sandbox)

  • OS: ? (Sandbox)
  • Neo4j: 5.16.0 EE
  • Neo4j-Apoc: 5.17.0

Community Edition

  • OS: Ubuntu 18.04.5 LTS
  • Neo4j: 5.10.0 CE
  • Neo4j-Apoc: 5.10.0

Hello! Thanks for writing in. This happens because the procedure was opening its own transaction: once that transaction commits, the outer MATCH (n) sees the newly created clones, clones them too, and so on. This is definitely strange behaviour. I have stopped it from doing this (which also fixes the other bug you reported), and the fix will be available in 5.18 😊

A workaround is to collect the matched nodes into a list first, something like this:

CREATE (primary:demo {uid: 1}), (secondary:demo {uid: 2})
WITH primary AS _
MATCH (n)
WITH collect(n) as nodes
CALL apoc.refactor.cloneNodes(nodes, false, [])
YIELD input, output AS clone, error
RETURN clone
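To see why the original query keeps growing while the collect() version does not, here is a toy Python model of the mechanism described above. It is an illustration under stated assumptions, not Neo4j's or APOC's actual implementation: nodes are plain integers, and `live_scan_clone` and `snapshot_clone` are hypothetical names. In the first function the scan walks the live node list, so every committed clone is matched and cloned again, and `max_clones` stands in for terminating the query. In the second, the input is fixed up front, as `WITH collect(n) AS nodes` does, so the output size equals the input size. The real query's varying counts (and the single-node case that behaves) are not reproduced by this simplified model.

```python
def live_scan_clone(initial_nodes, max_clones):
    """Toy model of the bug (hypothetical, not APOC code): the scan reads the
    live node list, so clones committed mid-scan are matched and cloned again.
    max_clones stands in for the user terminating the query."""
    nodes = list(initial_nodes)
    next_id = max(nodes, default=0) + 1
    created = []
    i = 0
    while i < len(nodes) and len(created) < max_clones:
        source = nodes[i]
        clone = next_id
        next_id += 1
        nodes.append(clone)  # committed clone becomes visible to the ongoing scan
        created.append((source, clone))
        i += 1
    return created


def snapshot_clone(initial_nodes):
    """Toy model of the workaround (hypothetical): the input is materialised
    first, like WITH collect(n) AS nodes, so clones are never fed back in."""
    nodes = list(initial_nodes)
    snapshot = tuple(nodes)  # fixed before any cloning starts
    next_id = max(nodes, default=0) + 1
    created = []
    for source in snapshot:
        clone = next_id
        next_id += 1
        nodes.append(clone)  # exists in the "database", but not in the snapshot
        created.append((source, clone))
    return created


if __name__ == "__main__":
    print(live_scan_clone([1, 2], max_clones=5))
    print(snapshot_clone([1, 2]))
```

Calling `live_scan_clone([1, 2], max_clones=5)` returns `[(1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]`: from the third pair on, the sources are themselves clones, and without the cap the loop would never finish. `snapshot_clone([1, 2])` returns `[(1, 3), (2, 4)]`, one clone per input node.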

Awesome, @gem-neo4j, thanks for that