memgraph / gqlalchemy

GQLAlchemy is a library developed to assist in writing and running queries on Memgraph. It supports a high-level connection to Memgraph as well as a modular query builder.

Home Page: https://pypi.org/project/gqlalchemy/

Slow relationship creation

POMXARK opened this issue

Discussed in https://github.com/memgraph/gqlalchemy/discussions/224

Originally posted by POMXARK March 15, 2023
Why doesn't this code work? If I run it in the console, everything works. Is it possible to send multiple queries in one request?

from gqlalchemy import Memgraph

memgraph = Memgraph(host='127.0.0.1', port=7687)
memgraph.execute("""
MATCH (parent_stmt:TL), (child_stmt:TL)
WHERE parent_stmt.parent_stmt_id = 2 AND child_stmt.child_stmt_id = 4
CREATE (child_stmt)-[:STMT_BINDING {is_archived: 1}]->(parent_stmt);
MATCH (parent_stmt:TL), (child_stmt:TL)
WHERE parent_stmt.parent_stmt_id = 2 AND child_stmt.child_stmt_id = 6
CREATE (child_stmt)-[:STMT_BINDING {is_archived: 1}]->(parent_stmt); 
""")

An optimization of this kind did not help either:

MATCH
    (p_1:TL {parent_stmt_id: 2}), (c_1:TL {c_1: 4}),
    (p_2:TL {parent_stmt_id: 3}), (c_2:TL {c_2: 6})
CREATE
    (p_1)-[:STMT_BINDING {is_archived: 1}]->(c_1),
    (p_2)-[:STMT_BINDING {is_archived: 0}]->(c_2)

The average speed was 40 requests/sec.
Creating 66,000 relationships took 1:30 (an hour and 30 minutes).
Is there a way to optimize this?
Would running the queries in different threads help?

mgmigrate can perform 7,000 requests per second. Is this a limitation of the Python implementation?

Hi @POMXARK, thanks for opening the issue. That is very slow for such simple queries, which suggests something is fundamentally wrong with the setup. Did you add indexes on labels and on label + property pairs?
https://memgraph.com/docs/memgraph/reference-guide/indexing

@antejavor Thank you for your response. No. Could you suggest which indexes need to be created?

For the queries above, you need the following label + property indexes:

CREATE INDEX ON :TL(parent_stmt_id);
CREATE INDEX ON :TL(c_1);

If you will be matching on just a label, not label + property, for example:

MATCH (node:TL)

You will also need a separate label index:

CREATE INDEX ON :TL;

Also take a look at the guide: https://memgraph.com/docs/memgraph/under-the-hood/indexing

You will need these as well:

CREATE INDEX ON :TL(c_2);
CREATE INDEX ON :TL(child_stmt_id);
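The indexes listed in this thread can also be created from Python. The sketch below is a minimal illustration, assuming each index statement is sent as its own query (one statement per call, with no trailing semicolon). `index_statements` is a hypothetical helper, not part of GQLAlchemy.

```python
from typing import Iterable, List


def index_statements(
    label: str, properties: Iterable[str], label_index: bool = True
) -> List[str]:
    """Build Memgraph CREATE INDEX queries, one query per string.

    Hypothetical helper: the optional label index comes first, followed by
    one label + property index per property name.
    """
    # No trailing ';' because each string is meant to be sent on its own.
    statements = [f"CREATE INDEX ON :{label}"] if label_index else []
    statements += [f"CREATE INDEX ON :{label}({prop})" for prop in properties]
    return statements
```

With a connected `Memgraph` instance, each string from `index_statements("TL", ["parent_stmt_id", "child_stmt_id", "c_1", "c_2"])` would then be passed to `memgraph.execute(...)` one at a time.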

@POMXARK At the moment, GQLAlchemy does not support multiple commands inside one execute_and_fetch call.

I think mgconsole parses the input and then executes each query separately.
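Because one call runs one command, a multi-statement script like the one in the original question could be split client-side and sent piece by piece, similar to what mgconsole appears to do. This is a minimal sketch: `split_statements` and `execute_script` are hypothetical helpers, and the naive split assumes no `;` appears inside string literals or comments.

```python
from typing import List, Protocol


class Executor(Protocol):
    """Anything with an execute(query) method, e.g. a GQLAlchemy Memgraph object."""

    def execute(self, query: str) -> None: ...


def split_statements(script: str) -> List[str]:
    """Naively split a Cypher script on ';' and drop empty pieces.

    Assumption: ';' never occurs inside string literals or comments.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def execute_script(db: Executor, script: str) -> int:
    """Run each statement with its own execute() call; return how many ran."""
    statements = split_statements(script)
    for statement in statements:
        db.execute(statement)
    return len(statements)
```

Passing `Memgraph(host='127.0.0.1', port=7687)` as `db` would send the two `MATCH ... CREATE` queries from the question as two separate requests.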

@antejavor @Josipmrden Thank you very much! Throughput has grown from 40 to 4,000 requests per second. Indexing is important :)

CREATE INDEX ON :entity_0;
CREATE INDEX ON :entity_1;
CREATE INDEX ON :entity_0(user_id);
CREATE INDEX ON :entity_1(user_id);
SHOW INDEX INFO;
MATCH (parent:entity_0), (child:entity_1) 
WHERE parent.user_id = 0 AND child.user_id = 1  
CREATE (child)-[:tree_binging]->(parent)
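Indexes were the fix in this thread. A further step, not suggested by the maintainers here and offered only as a sketch under stated assumptions, is to send many bindings per request with `UNWIND` over a list parameter instead of issuing one query per relationship. `BATCH_QUERY`, `build_batches`, and the default batch size of 1,000 are hypothetical; the labels, property, and relationship type are copied as-is from the query above. Sending it also assumes your GQLAlchemy version's `execute()` accepts a parameters dictionary as its second argument.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical batched form of the query above: one request creates many
# relationships. The label + property indexes are still needed for the MATCH.
BATCH_QUERY = """
UNWIND $pairs AS pair
MATCH (parent:entity_0 {user_id: pair.parent_id}),
      (child:entity_1 {user_id: pair.child_id})
CREATE (child)-[:tree_binging]->(parent)
"""


def build_batches(
    bindings: Iterable[Tuple[int, int]], batch_size: int = 1000
) -> List[Dict[str, Any]]:
    """Group (parent_id, child_id) tuples into parameter maps for BATCH_QUERY.

    Each returned dict is meant to be sent with one execute() call.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rows = [{"parent_id": parent, "child_id": child} for parent, child in bindings]
    return [
        {"pairs": rows[start:start + batch_size]}
        for start in range(0, len(rows), batch_size)
    ]
```

Usage would look like `for params in build_batches(bindings): memgraph.execute(BATCH_QUERY, params)`. Larger batches mean fewer round trips but bigger transactions, so the batch size is a tuning knob rather than a fixed value.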

@antejavor @Josipmrden Is it possible to index relationships?

At the moment, it is not possible to index relationships.