Generated cql for large datasets chokes Neo4j

Question

osamakhan opened this issue 4 years ago

I've started building a large Aspen file with 1000+ lines, which results in a generated .cql file of ~2000 lines. I'm using Neo4j Desktop, so I paste the query from the .cql file into Neo4j Browser to run it. The query fails with the following error:
There is not enough stack size to perform the current task.
Increasing the JVM stack size works, but the same error comes back once the Aspen file grows further.

Would it be possible to have a command-line switch that breaks the generated CQL query into chunks that can be run as batches? I found some useful info on batching queries here:
https://medium.com/neo4j/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-cypher-73c7f693c8cc

Matt Cloyd · Answer 1 · Thu Dec 24 2020 21:09:15 GMT+0800 (China Standard Time)

@osamakhan I think this is a great idea! (Sorry about the delay in responding. I haven't been able to prioritize Aspen for a while and completely missed this when it came in!)

I think the primary challenge here is how to handle batching data when the node and relationship labels are highly diverse.
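One possible way to approach the label diversity, sketched here in Python purely for illustration: group the nodes by label first, so that each label gets its own batch. This assumes labels cannot be passed as ordinary query parameters (true for classic Cypher; newer Neo4j releases may differ). The `(label, properties)` pair shape and the name `group_nodes_by_label` are hypothetical and not taken from Aspen's code.

```python
from collections import defaultdict


def group_nodes_by_label(nodes):
    """Group (label, properties) pairs so each label can be batched on its own.

    `nodes` is an assumed intermediate form, not Aspen's actual data model.
    """
    groups = defaultdict(list)
    for label, props in nodes:
        groups[label].append(props)
    return dict(groups)


# Example input with two labels mixed together.
nodes = [
    ("Person", {"name": "Alice"}),
    ("Employer", {"name": "Acme"}),
    ("Person", {"name": "Bob"}),
]
grouped = group_nodes_by_label(nodes)
```

With many distinct labels this produces many small groups, so the number of statements grows with the number of labels rather than the number of nodes.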

Have you figured out any code for batching that's helped you with this upload? That would be a great starting point for figuring out how to get batching into Aspen.

Matt Cloyd · Answer 2 · Fri Oct 29 2021 03:57:04 GMT+0800 (China Standard Time)

At the moment, Aspen makes a new MERGE line for every node AND every relationship. I think we could start by batching nodes. Batching relationships might take more work.
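As a rough sketch of what batching nodes could look like (in Python for illustration, not Aspen's actual implementation): instead of one MERGE line per node, nodes that share a label are sent as a list parameter and expanded with UNWIND, a fixed number of rows at a time. The function name `batched_node_merges`, the `name` merge key, and the batch size of 500 are all assumptions.

```python
def batched_node_merges(label, rows, key="name", batch_size=500):
    """Yield (query, params) pairs, one per batch, for nodes sharing a label.

    The label and key are interpolated into the query text, so they are
    checked to be plain identifiers first. Row values travel as parameters.
    """
    if not (label.isidentifier() and key.isidentifier()):
        raise ValueError("label and key must be plain identifiers")
    # One short statement reused for every batch; only $rows changes.
    query = (
        f"UNWIND $rows AS row "
        f"MERGE (n:{label} {{{key}: row.{key}}}) "
        f"SET n += row"
    )
    for start in range(0, len(rows), batch_size):
        yield query, {"rows": rows[start:start + batch_size]}


# Example: five nodes in batches of two gives three statements.
rows = [{"name": f"p{i}"} for i in range(5)]
batches = list(batched_node_merges("Person", rows, batch_size=2))
```

Each pair could then be handed to a driver call such as `session.run(query, params)` from the Neo4j Python driver. Because the query text stays short no matter how many nodes there are, this should avoid the very long single statement that triggers the stack size error, though that has not been verified against Aspen's output. Relationships would need their own approach, since each one has to match two existing nodes.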