memgraph / gqlalchemy

GQLAlchemy is a library developed to assist in writing and running queries on Memgraph. It supports a high-level connection to Memgraph as well as a modular query builder.

Home Page: https://pypi.org/project/gqlalchemy/


Support: Confusion over usage of `Field` class

MartinBubel opened this issue · comments

Hi there,
I have some issues with using Memgraph for more than just simple notebooks. Instead, I want to embed some utilities for my graph database into a Python package.

For that, I am defining some entities to be found in the database. What I see in the documentation and tutorials is, e.g., this:

class Source(Node):
    url: Field(unique=True, db=db)
    timestamp: Field(unique=False, db=db)

which works fine for Jupyter notebooks where the db attribute is globally available. However, if I want to import the Source class from a Python file, I currently cannot find any way to get this done.

My current approach is this

class Source(Node):
    url: str  # TODO: make it a Field(unique=True)
    timestamp: str

    def __init__(self, url: str, timestamp: str, **kwargs) -> None:
        kwargs = {"url": url, "timestamp": timestamp} | kwargs
        super().__init__(**kwargs)

but I cannot find a way to make url a Field.

Can someone help me with this one? I feel like I am missing something super trivial here, but I have been trying for three days now and feel stuck.

Sorry for the potentially "how to Python" question, but it seems nontrivial to me and could be something limiting others as well.

What I can get done is this:

def get_source(db, url: str, timestamp: str):
    class Source(Node):
        url: str = Field(index=True, unique=True, exists=True, db=db)
        timestamp: str = Field(index=True, unique=False, exists=True, db=db)

    return Source(url=url, timestamp=timestamp)

But I am not sure this is the "nicest" fix, as it does not allow exporting the type (type information can still be obtained, but not directly).

Hi @MartinBubel, thanks for opening the issue. We try to be helpful with a wide range of questions, so no worries.

If I understood this correctly, you want to enable dynamic creation of classes that have some custom properties like URL, timestamp, etc.? Do you want to enforce this during class creation?

I think Python has a construct for this: https://peps.python.org/pep-0487/#:~:text=An%20__init_subclass__%20hook,defined%20in%20the%20class%2C%20and

The __init_subclass__ hook lets you define a base class that customizes each subclass at the moment it is defined, so every class construction goes through it from that point on.
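As a minimal sketch of that PEP 487 hook (the class and attribute names here are illustrative, not gqlalchemy's actual internals): a base class can inspect each subclass's annotations as soon as the subclass is defined, before any instance exists.

```python
class NodeBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition, before any instance exists --
        # a natural place to register fields or create indexes.
        cls.registered_fields = dict(getattr(cls, "__annotations__", {}))


class Source(NodeBase):
    url: str
    timestamp: str


print(Source.registered_fields)  # {'url': <class 'str'>, 'timestamp': <class 'str'>}
```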

Hi @antejavor
thanks for your reply. Your hint guided me to a better understanding of what my problem actually is.

In fact, my confusion over the usage and initialization of Nodes with Fields was resolved by checking out the pydantic docs (I initially did not notice it was a pydantic type, as it is imported from gqlalchemy, which is fine).

So what I am now using is this

class Source(Node):
    url: Field(index=True, unique=True, exists=True)
    timestamp: Field(index=True, unique=False, exists=True)

However, what remains unclear to me is the role of the db keyword. See, e.g., the gqlalchemy-query-build doc on the Memgraph blog, which contains this code:

class Movie(Node):
    id: int = Field(index=True, unique=True, exists=True, db=memgraph)
    title: Optional[str]

which uses db=memgraph when "registering" (not instantiating?!) the Node's Field.

Could you please explain a bit what this db argument is used for, whether it is necessary for functionality, and what happens if it is omitted?
In the above snippet, title does not have a db reference. Why is that?
Also, is it necessary to pass a db instance to a Field of a Node?

Ultimately, I want to import the class Source and instantiate it in some other module, where db=memgraph is not globally available but is passed as an argument, e.g. like this:

from somelib.entities import Source  # importing Source from a lib/module where db is available

...
db=Memgraph()
...

source = Source(url, timestamp)  # could also pass db=db here if necessary

Hi @MartinBubel, I'm jumping in here as one of the GQLAlchemy contributors/maintainers.

Whenever you want to create an index or constraint on a Node, db must be provided on each property you want to index or add a constraint on.
The db argument is needed because indexes and constraints are created in the database before any instance of that Node exists, so you need to forward the correct database connection at class definition. This happens before any instance of the Node is created because indexes and constraints are expected to be set before any data import; otherwise, index and constraint creation would be retried every time a class instance was created.

In the background, when you create your class:

class Source(Node):
    url: Field(index=True, unique=True, exists=True)
    timestamp: Field(index=True, unique=False, exists=True)

this is what happens: __new__ from NodeMetaclass is called. It checks the properties, and if index or constraint options are set to True, it looks for the db kwarg, because it will run db.create_index() or db.create_constraint(). I would expect the code above to throw GQLAlchemyDatabaseMissingInNodeClassError, meaning the indexes and constraints are not properly created. You can check this by running print(db.get_indexes()). Here is the documentation on indexes and constraints in OGM.
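To make the mechanism concrete, here is a pure-Python sketch (not gqlalchemy's real code; FakeDB, SketchField, and NodeMeta are made-up stand-ins) of why db must be present at class-definition time: a metaclass inspects Field-like markers in __new__ and calls the database before any instance exists.

```python
class FakeDB:
    """Stand-in for a Memgraph connection; records index creations."""
    def __init__(self):
        self.indexes = []

    def create_index(self, label, prop):
        self.indexes.append((label, prop))


class SketchField:
    """Stand-in for gqlalchemy's Field marker."""
    def __init__(self, index=False, db=None):
        self.index = index
        self.db = db


class NodeMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Runs at class definition, before any instance is created.
        for prop, value in namespace.items():
            if isinstance(value, SketchField) and value.index:
                if value.db is None:
                    # Analogous to GQLAlchemyDatabaseMissingInNodeClassError
                    raise ValueError(f"db missing for indexed property {prop!r}")
                value.db.create_index(name, prop)
        return cls


db = FakeDB()


class Movie(metaclass=NodeMeta):
    id = SketchField(index=True, db=db)
    title = None  # no index/constraint, so no db needed


print(db.indexes)  # [('Movie', 'id')]
```

Defining Movie is enough to trigger the index creation; no Movie instance was ever constructed.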

To explain what happened in the example from the blog post:

class Movie(Node):
    id: int = Field(index=True, unique=True, exists=True, db=memgraph)
    title: Optional[str]

The db was needed on the id property because its indexes/constraints are created in the database. The title property didn't need it because it is only included for schema validation, and no queries need to be run for it at class definition.

To save the source, you will need to run source = Source(url, timestamp).save(db) or

source = Source(url, timestamp)
db.save_node(source)

and that's where you will need db again.
Here is the how-to guide that might help with that.

Let me know if that helps, we can always hop on a quick call if you have more questions :)
Btw. we're also available on Discord, so feel free to join.

Hi @katarinasupe
thanks for following up and the explanations, I really appreciate them.

As you expected, my snippet throws a GQLAlchemyDatabaseMissingInNodeClassError. However, I still want to be able to create my nodes dynamically. So if I conclude correctly, this means that if I don't have a global Memgraph object available at class definition, I need to come up with some sort of factory method?

def get_source(db, url: str, timestamp: str):
    class Source(Node):
        url: str = Field(index=True, unique=True, exists=True, db=db)
        timestamp: str = Field(index=True, unique=False, exists=True, db=db)

    return Source(url=url, timestamp=timestamp)

Not sure if this will help, but I had a similar problem while working on an app a while ago. To resolve it, I created models.py, where I defined all node and relationship classes, and I imported db from the backend. Here is a part of the code where I defined the database. Does that help in your case somehow?

And just to make it clear: do you need to dynamically create instances of nodes (new nodes), or the class that defines a node? As far as I can see, your Source node depends on URL and timestamp, and that's why you're creating it dynamically, but you can still define the Source class somewhere else and import it into the file where you're dynamically creating a node. That would be similar to what I did in twitch_data.py, where I imported the models I defined along with the db object.
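The "define once in models.py, import everywhere" pattern can be sketched in one runnable file with a stand-in for gqlalchemy's Memgraph (so the snippet runs without a live database; FakeMemgraph and the file-boundary comments are illustrative, real code would use `from gqlalchemy import Memgraph`):

```python
class FakeMemgraph:
    """Stand-in for gqlalchemy.Memgraph; records saved nodes."""
    def __init__(self):
        self.saved = []

    def save_node(self, node):
        self.saved.append(node)
        return node


# --- contents of a hypothetical models.py ---
db = FakeMemgraph()  # really: db = Memgraph()


class Source:  # really: class Source(Node) with Field(..., db=db) properties
    def __init__(self, url: str, timestamp: str):
        self.url = url
        self.timestamp = timestamp


# --- elsewhere in the app: `from models import Source, db` ---
source = Source("https://example.com", "2023-01-01T00:00:00")
db.save_node(source)
print(len(db.saved))  # 1
```

Because db lives at module level in models.py, every importing module shares the same connection object, and the Node classes can reference it at definition time.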

That helps for sure!

I had withdrawn that idea a while ago because, for some reason, I wanted to take a different approach. However, considering how I will use my library, this is definitely a good solution. I think I will adopt the idea of importing db from the backend.

Thank you for the fast and helpful support 👍 :)

I will close the issue. Maybe, if questions like this keep appearing, it would be cool to add some "mini-demo-project" to the examples, demonstrating this approach.

Yes, I agree with your suggestion of adding a "mini-demo-project" and will keep that in mind. Don't hesitate to reach out on Discord or here if you have additional questions 😄