thepeergroup / aspen

Aspen is a markup language for turning text into graph data (via Cypher for Neo4j).

Home Page: https://aspen-lang.org

Generated cql for large datasets chokes Neo4j

osamakhan opened this issue

I've started building a large Aspen file with 1000+ lines, which results in a generated .cql file of ~2000 lines. I'm using Neo4j Desktop, so I paste the contents of the .cql file into Neo4j Browser to run it. The query fails with the following error:
There is not enough stack size to perform the current task.
Increasing the JVM stack size works, but the same error comes back once the Aspen file grows further.

Would it be possible to add a command-line switch that breaks the generated Cypher query into chunks that can be run as batches? I found some useful info on batching queries here:
https://medium.com/neo4j/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-cypher-73c7f693c8cc
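To make the request concrete, here is a minimal sketch of what such chunking could look like, written in Python rather than as an actual Aspen feature. The function name `chunk_statements` and the sample lines are hypothetical. It also assumes every line is a self-contained MERGE statement that shares no variables with other lines; if relationship lines refer to node variables bound earlier in the file, each chunk would additionally need to re-MERGE the nodes it uses.

```python
from typing import Iterator, List


def chunk_statements(lines: List[str], size: int) -> Iterator[str]:
    """Yield Cypher queries containing at most `size` statements each.

    Assumption: each input line is an independent statement, so lines can
    be regrouped freely without breaking variable references.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # Drop blank lines and bare ";" terminators so only real statements count.
    statements = [line.strip().rstrip(";") for line in lines]
    statements = [s for s in statements if s]
    for start in range(0, len(statements), size):
        # Terminate every chunk so it can be run as its own query.
        yield "\n".join(statements[start:start + size]) + ";"


# Hypothetical sample input, standing in for lines of a generated .cql file.
sample = [
    'MERGE (:Person { name: "Matt" })',
    'MERGE (:Person { name: "Brianna" })',
    "",
    'MERGE (:Person { name: "Eliza" })',
    ";",
]
batches = list(chunk_statements(sample, 2))
```

Each element of `batches` could then be pasted or sent to Neo4j as a separate query, which keeps any single query small regardless of how large the Aspen file grows.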

@osamakhan I think this is a great idea! (I'm sorry about the delay in responding; I've been unable to prioritize Aspen for a bit and totally missed this when it came in!)

I think the primary challenge here is how to handle batching data when the node and relationship labels are highly diverse.

Have you figured out any code for batching that's helped you with this upload? That would be a great starting point for figuring out how to get batching into Aspen.

At the moment, Aspen makes a new MERGE line for every node AND every relationship. I think we could start by batching nodes. Batching relationships might take more work.
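As a starting point for discussion, node batching could look roughly like the sketch below. It is written in Python for illustration only and is not Aspen's code. It follows the UNWIND-with-parameters pattern for batched updates: nodes are grouped by label, and each group is sent as a list of rows. Grouping by label is needed because, as far as I know, labels are written into the query text rather than passed as parameters, which is also why highly diverse labels make batching harder. The function name `batch_node_merges`, the `(label, properties)` node shape, and the use of `name` as the identifying property are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical node shape: (label, properties). Not Aspen's internal model.
Node = Tuple[str, Dict[str, object]]


def batch_node_merges(
    nodes: List[Node], batch_size: int
) -> Iterator[Tuple[str, Dict[str, list]]]:
    """Yield (query, parameters) pairs, one per label and batch of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Group property maps by label, preserving first-seen label order.
    by_label: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for label, props in nodes:
        by_label[label].append(props)
    for label, rows in by_label.items():
        # The label is interpolated into the query text, so it must be a
        # trusted identifier. Assumption: `name` identifies a node.
        query = (
            f"UNWIND $rows AS row "
            f"MERGE (n:`{label}` {{ name: row.name }}) "
            f"SET n += row"
        )
        for start in range(0, len(rows), batch_size):
            yield query, {"rows": rows[start:start + batch_size]}


# Hypothetical sample data.
nodes = [
    ("Person", {"name": "Matt"}),
    ("Person", {"name": "Brianna"}),
    ("Company", {"name": "Acme"}),
    ("Person", {"name": "Eliza"}),
]
batches = list(batch_node_merges(nodes, 2))
```

With this shape, the number of queries depends on the number of labels and the batch size rather than on the number of nodes. Relationships would need a similar grouping by relationship type plus the labels at both ends, which is part of why they might take more work.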