Support: Confusion over usage of `Field` class
MartinBubel opened this issue · comments
Hi there,
I have some issues with using Memgraph for more than just simple notebooks: I want to embed some utilities for my graph database into a Python package. For that, I am defining some entities to be found in the database. What I see in the documentation and tutorials is e.g. this:
    class Source(Node):
        url: str = Field(unique=True, db=db)
        timestamp: str = Field(unique=False, db=db)
which works fine for jupyter notebooks where the db
attribute is globally available. However, if I want to import the Source
class from a python file, I currently cannot find any way to get this done.
My current approach is this:

    class Source(Node):
        url: str  # TODO: make it a Field(unique=True)
        timestamp: str

        def __init__(self, url: str, timestamp: str, **kwargs) -> None:
            kwargs = {"url": url, "timestamp": timestamp} | kwargs
            super().__init__(**kwargs)
but I cannot find a way to make `url` a `Field`.
Can someone help me with this one? I feel like I am missing something super trivial here, but I have been trying for three days now and feel stuck.
Sorry for the potentially "how to Python" question, but it seems nontrivial to me and could be something limiting others as well.
What I can get done is this:

    def get_source(db, url: str, timestamp: str):
        class Source(Node):
            url: str = Field(index=True, unique=True, exists=True, db=db)
            timestamp: str = Field(index=True, unique=False, exists=True, db=db)

        return Source(url=url, timestamp=timestamp)
But I am not sure this is the "nicest" fix, as it does not allow for exporting the type (type information can still be obtained, but not directly).
Hi @MartinBubel, thanks for opening the issue. We try to be helpful with a wide range of questions, so no worries.
If I understood this correctly, you want to enable dynamic creation of classes that have some custom properties like URL, timestamp, etc.? Do you want to enforce this during class creation?
I think Python has a construct for this: https://peps.python.org/pep-0487/#:~:text=An%20__init_subclass__%20hook,defined%20in%20the%20class%2C%20and
The `__init_subclass__` hook enables you to define a base class, and every class construction from that point on will go through it.
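As a minimal sketch of that PEP 487 hook (the names here are illustrative, not from GQLAlchemy): the base class is notified of every subclass at definition time, before any instance exists, so it could validate or register fields there.

```python
# Minimal sketch of PEP 487's __init_subclass__: the base class sees every
# subclass at class-definition time, before any instance is created.
class Registered:
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition; a real base class could
        # validate fields or set up database state here.
        Registered.registry.append(cls.__name__)


class Source(Registered):
    url: str
    timestamp: str


print(Registered.registry)  # ['Source']
```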
Hi @antejavor,
thanks for your reply. Your hint guided me to a better understanding of what my problem actually is.
In fact, my confusion over the usage and initialization of `Node`s with `Field`s was resolved by checking out the `pydantic` docs (I initially did not notice it was a `pydantic` type, as it is imported from `gqlalchemy`, which is fine).
So what I am now using is this:

    class Source(Node):
        url: str = Field(index=True, unique=True, exists=True)
        timestamp: str = Field(index=True, unique=False, exists=True)
However, what remains unclear to me is the role of the `db` keyword. See e.g. the gqlalchemy query-builder doc on the Memgraph blog, which has this code:
    class Movie(Node):
        id: int = Field(index=True, unique=True, exists=True, db=memgraph)
        title: Optional[str]
which uses `db=memgraph` when "registering" (not instantiating?!) the `Node`'s `Field`.
Could you please explain a bit what this `db` argument is used for, and whether it is necessary for functionality / what happens if it is omitted?
In the above snippet, `title` does not have a `db` reference. Why is that?
Also, is it necessary to pass a `db` instance to a `Field` of a `Node`?
Ultimately, I want to import the class `Source` and instantiate it in some other module, where `db=memgraph` is not globally available but passed as an argument, e.g. like this:
    from somelib.entities import Source  # importing Source from a lib/module where db is available

    ...
    db = Memgraph()
    ...
    source = Source(url, timestamp)  # could also pass db=db here if necessary
Hi @MartinBubel, I'm jumping in here as one of the GQLAlchemy contributors/maintainers.
Whenever you want to create an index or constraint on a `Node`, `db` must be provided to the property you want to index or constrain.
The `db` argument is needed because indexes and constraints are created in the database before any instance of that `Node` exists, so you need to forward the correct database connection. This happens before any instance of the `Node` is created because indexes and constraints are expected to be set up before any data import. Also, there would be repeated attempts to create indexes and constraints if this were done every time a class instance is created.
In the background, when you create your class:
    class Source(Node):
        url: str = Field(index=True, unique=True, exists=True)
        timestamp: str = Field(index=True, unique=False, exists=True)
this is what happens: `__new__` from `NodeMetaclass` is called. It checks the properties, and if indexes or constraints are set to true, it looks for the `db` kwarg, because it will run `db.create_index()` or `db.create_constraint()`. I would expect the code above to throw `GQLAlchemyDatabaseMissingInNodeClassError` and that indexes and constraints are not properly created. This can be checked by running `print(db.get_indexes())`. Here is the documentation on indexes and constraints in OGM.
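To make that sequence concrete, here is a hypothetical, heavily simplified sketch (not GQLAlchemy's actual code) of how a metaclass can run side effects such as index creation at class-definition time; `FakeDB`, `NodeMeta`, and the `indexed` attribute are made-up names for illustration:

```python
# Hypothetical, simplified sketch (not GQLAlchemy's actual metaclass).
created_indexes = []


class FakeDB:
    """Stand-in for a database connection; records index creation."""

    def create_index(self, label, prop):
        created_indexes.append((label, prop))


class NodeMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Index creation runs here, at class definition, before any
        # instance exists, which is why a live db connection must be
        # available when the class is defined.
        db = namespace.get("db")
        for prop in namespace.get("indexed", ()):
            db.create_index(name, prop)
        return cls


memgraph = FakeDB()


class Source(metaclass=NodeMeta):
    db = memgraph
    indexed = ("url",)


print(created_indexes)  # [('Source', 'url')]
```

The key design point mirrored here is that the side effect belongs to class creation, not instance creation, so it happens exactly once per class definition.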
To explain what happened in the example from the blog post:
    class Movie(Node):
        id: int = Field(index=True, unique=True, exists=True, db=memgraph)
        title: Optional[str]
The `db` was needed on the `id` property because indexes/constraints were created in the database. The `title` property didn't need anything because it is only included for schema validation, and no queries need to be run on class definition.
To save the source you will need to run `source = Source(url, timestamp).save(db)` or

    source = Source(url, timestamp)
    db.save_node(source)

and that's where you will need `db` again.
Here is the how-to guide that might help with that.
Let me know if that helps, we can always hop on a quick call if you have more questions :)
Btw. we're also available on Discord, so feel free to join.
Hi @katarinasupe,
thanks for following up and for the explanations, I really appreciate them.
As you expect, my snippet throws a `GQLAlchemyDatabaseMissingInNodeClassError`. However, I still want to be able to create my nodes dynamically. So if I conclude correctly, this means that if I don't have a global `memgraph` object available at class definition, I need to come up with some sort of factory method?
    def get_source(db, url: str, timestamp: str):
        class Source(Node):
            url: str = Field(index=True, unique=True, exists=True, db=db)
            timestamp: str = Field(index=True, unique=False, exists=True, db=db)

        return Source(url=url, timestamp=timestamp)
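One caveat with a factory like this: the class body, and therefore any index/constraint creation, would run on every call. A hedged sketch of one way around that, caching the generated class per connection key (`db_key` and the class body here are illustrative placeholders, not GQLAlchemy API):

```python
from functools import lru_cache


# Sketch: cache the dynamically created class per connection key so the
# class body (and hence any index/constraint creation) runs only once.
# `db_key` and the class body are placeholders, not GQLAlchemy API.
@lru_cache(maxsize=None)
def get_source_class(db_key):
    class Source:
        # In GQLAlchemy, Field(..., db=...) declarations would live here.
        origin = db_key

    return Source


a = get_source_class("memgraph-main")
b = get_source_class("memgraph-main")
print(a is b)  # True: the class body executed only once
```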
Not sure if this will help, but I had a similar problem while working on an app a while ago. To resolve that, I created models.py where I defined all node and relationship classes and I imported db from the backend. Here is a part of the code where I defined the database. Does that help in your case somehow?
And just to make it clear: do you need to dynamically create instances of nodes (new nodes), or the class that defines a node? As far as I can see, your Source node depends on URL and timestamp and that's why you're creating it dynamically, but you can still define the Source class somewhere else and import it into the file where you're dynamically creating a node. That would be similar to what I did in twitch_data.py, where I imported the models I defined along with the db object.
That helps for sure!
I dismissed that idea a while ago because, for some reason, I wanted to take a different approach. However, considering how I will use my lib, this is for sure a good solution. I think I will adopt the idea of importing `db` from the backend.
Thank you for the fast and helpful support 👍 :)
I will close the issue. Maybe, if questions like this keep appearing, it would be cool to add some "mini-demo-project" to the examples, demonstrating this approach.
Yes, I agree with your suggestion of adding "mini-demo-project" and will keep that in mind. Don't hesitate to reach out on Discord or here if you have additional questions 😄