Is there any APOC procedure that can make allShortestPaths execute in parallel?

Question

Reid00 opened this issue 9 months ago

Neo4j version: 5.7

I have five pairs, (A, B), (A, C), (A, D), (B, C), (B, D), and I want to get allShortestPaths for each of them with the statement below:

format!("MATCH (start:{}{{keyNo: $start}})  OPTIONAL MATCH (end:{}{{keyNo:$end}}),
                    p = allShortestPaths ((start)-[:INVEST|LEGAL|EMPLOY|BRANCH|HISINVEST|HISLEGAL|HISEMPLOY*..{}]-(end))
                    RETURN p LIMIT 15", start_label, end_label, hop);

The driver is the HTTP API (https://neo4j.com/docs/http-api/current/transactions/), which I use to execute multiple queries.

But the HTTP API runs multiple queries in sequence, which is slow. So I want to know whether there is an APOC procedure that can wrap the query and execute it in parallel.

I tried apoc.periodic.iterate and apoc.cypher.runManyReadOnly, but neither returned any paths. I would like to use apoc.cypher.runParallel, but I have no idea how to use it.

Michael Hunger · Answer 1 · Mon Oct 16 2023 16:16:47 GMT+0800 (China Standard Time)

In this case it won't help you with intra-query parallelism. Best to wait for the new concurrent execution that will be coming soon.

You can also parallelize the execution from Python.
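A minimal sketch of what client-side parallelization could look like, assuming each (start, end) pair is sent as its own query from a thread pool. The names build_query, run_pairs_in_parallel and run_query are hypothetical and not part of any Neo4j API; run_query is a callable supplied by the caller that executes one query with its parameters and returns the rows. The query is a simplified form of the one in the question (plain MATCH instead of OPTIONAL MATCH).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

REL_TYPES = "INVEST|LEGAL|EMPLOY|BRANCH|HISINVEST|HISLEGAL|HISEMPLOY"


def build_query(start_label, end_label, max_hops):
    """Build the per-pair query; labels and hop count cannot be parameters in Cypher."""
    return (
        f"MATCH (start:{start_label} {{keyNo: $start}}), (end:{end_label} {{keyNo: $end}}) "
        f"MATCH p = allShortestPaths((start)-[:{REL_TYPES}*..{max_hops}]-(end)) "
        "RETURN p LIMIT 15"
    )


def run_pairs_in_parallel(run_query, nodes, max_hops=10, workers=8):
    """Run one query per unordered pair of nodes, several at a time.

    nodes is a list of (label, keyNo) tuples.
    run_query(query, params) is a caller-supplied callable (an assumption
    of this sketch) that executes a single query and returns its rows.
    Returns a dict mapping each pair to whatever run_query returned.
    """
    pairs = list(combinations(nodes, 2))

    def task(pair):
        (start_label, start_key), (end_label, end_key) = pair
        query = build_query(start_label, end_label, max_hops)
        return pair, run_query(query, {"start": start_key, "end": end_key})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, pairs))
```

With the official Neo4j Python driver, run_query could probably be something like lambda q, p: driver.execute_query(q, p).records, but that wiring is an assumption and is not tested here. Because labels are interpolated into the query string, they should come from a fixed whitelist rather than from user input.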

If you want to use intra-query parallelism, you need larger lists of starting nodes that you would then parallelize over.

https://neo4j.com/labs/apoc/4.3/overview/apoc.cypher/apoc.cypher.mapParallel2/

Something like:

MATCH (start:Start {keyNo: $start})
OPTIONAL MATCH (end:End)
WITH start, collect(end) AS list

CALL apoc.cypher.mapParallel2(
  // fragment; _ refers to the current element of the list
  'MATCH p = allShortestPaths((start)-[:INVEST|LEGAL|EMPLOY|BRANCH|HISINVEST|HISLEGAL|HISEMPLOY*..10]-(_)) RETURN p LIMIT 15',
  // parameters that are passed in
  {start: start},
  // list of nodes to parallelize over
  list,
  // number of partitions (example value)
  4
) YIELD value
RETURN value.p AS p

Reid00 · Answer 2 · Mon Oct 16 2023 16:27:50 GMT+0800 (China Standard Time)

Thank you very much @jexp.
In my case, I have at most 55 (start, end) pairs. I have listed some of them below; the search spans different labels. Below I try to use apoc.cypher.runParallel (the query is in fact a syntax error).
Could you please give me some advice on how to refactor my Cypher?

MATCH (n:Person) where n.keyNo IN ["p501f298f6fc918f1c1e0fb94268f53b", "p756de5fbd0f8acc034a55cd43c97821",
"p5baf3e85f5b3e69e5b47871813d0bfb", "pr45eadf7927a52ca7d6a96a1216794e"]
WITH collect(n) as starts
MATCH (m:Company) where m.keyNo IN ["8c9f7ddc1a7bcee3d1f7676773fe9404", "f9caa5f860e1c9faf7867c301b6b6a06", 
"0bf44c895427b4e60801aa4150b908a8", "dd67092006c584c069b36944bbb6043f", "1ac04be0065f9740d1647cade5790adc",
"f5dbba14219585b4578bb63040a3d418"]
WITH  starts, collect(m) as ends
WITH starts + ends AS nodes
UNWIND nodes as n
UNWIND nodes as m
WITH n,m WHERE id(n) < id(m)
MATCH path = allShortestPaths( (n)-[:INVEST|LEGAL|EMPLOY|BRANCH|HISINVEST|HISLEGAL|HISEMPLOY*..15]-(m) )
CALL apoc.cypher.parallel('RETURN path', {}, 'm,n')
YIELD value RETURN value.title as title

Michael Hunger · Answer 3 · Tue Oct 17 2023 07:08:50 GMT+0800 (China Standard Time)

Just wait a week and then try prefixing your query with cypher runtime=parallel on regular Neo4j.

Reid00 · Answer 4 · Tue Oct 17 2023 09:49:36 GMT+0800 (China Standard Time)

Thank you @jexp. Do you mean cypher runtime=parallel is a new feature in the latest Neo4j version? Are there any docs yet?

Reid00 · Answer 5 · Mon Dec 11 2023 10:53:02 GMT+0800 (China Standard Time)

Hello, is there any update? Where can I find the docs?

Michael Hunger · Answer 6 · Mon Dec 11 2023 11:18:46 GMT+0800 (China Standard Time)

Here are the docs for the parallel runtime:
https://neo4j.com/docs/cypher-manual/current/planning-and-tuning/runtimes/reference/
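For illustration, a small sketch of how the pairwise query from this thread might be prefixed to request the parallel runtime. The helper with_parallel_runtime and the parameter names $personKeys and $companyKeys are hypothetical. As far as I know the parallel runtime arrived in Neo4j 5.13 (Enterprise Edition), so a 5.7 server would need an upgrade, and whether it actually speeds up allShortestPaths here would have to be checked with PROFILE.

```python
def with_parallel_runtime(query: str) -> str:
    """Prefix a Cypher query with the option that requests the parallel runtime."""
    return "CYPHER runtime=parallel\n" + query.lstrip()


# Pairwise variant of the query posted above, with the key lists passed
# as parameters ($personKeys and $companyKeys are illustrative names).
PAIRWISE_QUERY = """
MATCH (n:Person) WHERE n.keyNo IN $personKeys
WITH collect(n) AS starts
MATCH (m:Company) WHERE m.keyNo IN $companyKeys
WITH starts, collect(m) AS ends
WITH starts + ends AS nodes
UNWIND nodes AS n
UNWIND nodes AS m
WITH n, m WHERE id(n) < id(m)
MATCH path = allShortestPaths((n)-[:INVEST|LEGAL|EMPLOY|BRANCH|HISINVEST|HISLEGAL|HISEMPLOY*..15]-(m))
RETURN path
"""

if __name__ == "__main__":
    # Print the final query text that would be sent to the server.
    print(with_parallel_runtime(PAIRWISE_QUERY))
```

The prefixed text is then sent like any other read query, over the HTTP API or through a driver.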

Reid00 · Answer 7 · Thu Dec 14 2023 13:15:07 GMT+0800 (China Standard Time)

Got it, thanks

Michael Hunger · Answer 8 · Thu Feb 22 2024 20:47:05 GMT+0800 (China Standard Time)

Closing this issue as handled by the parallel runtime.