memgraph / gqlalchemy

GQLAlchemy is a library developed to assist in writing and running queries on Memgraph. It supports a high-level connection to Memgraph as well as a modular query builder.

Home Page: https://pypi.org/project/gqlalchemy/

Table to Graph Importer CSV example

karrtikiyer opened this issue

Memgraph version 2.6.1

I am trying to reproduce the one to many example shown here: https://memgraph.com/docs/gqlalchemy/how-to-guides/table-to-graph-importer

  address: []        # currently needed, leave [] if no relations to define
  individuals:
    - foreign_key: # foreign key used for mapping;
      column_name: add_id         # specifies its column
      reference_table: address    # name of table from which the foreign key is taken
      reference_key: add_id       # column name in reference table from which the foreign key is taken
    label: LIVES_IN        # label applied to relationship created
      from_entity: False     # (optional) define direction of relationship created

Could someone please help me with the CSV files for this example?
Also, how should a one-to-one relationship be specified? Please help.

Thanks in advance.

Also, the - at the start of foreign_key seems to be misplaced in the YAML example above. Can someone help with a proper config file for a one-to-many relationship?

Just to add, this is related to a discussion on Discord.

Also, logically, in the above example of individuals and addresses the relationship is many-to-one, wherein one individual can have multiple addresses. So shouldn't the foreign key be in the address table? And then shouldn't the address: [] configuration hold the foreign key to the individual ID?

Could you also please provide an example of a many-to-many configuration and CSV?
For many-to-many, should the relations be in a separate CSV file, or should we duplicate rows in the referenced table?
For example, a student-teacher relationship is many-to-many: one student can have many teachers and one teacher can have many students. Where should the relationship between students and teachers be placed? Should it go in a separate CSV file?

Hello @karrtikiyer!
For CSV files you can use the CSVLocalFileSystemImporter.
One-to-one relationships can be defined with the one_to_many_relations field: you simply declare the foreign_key in the config file. Let me know if you want me to provide an example.
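Since a one-to-one relationship is declared exactly like a one-to-many one, the difference lies only in the data: no foreign-key value is referenced by more than one row. The minimal stdlib sketch below checks this for a CSV column. The helper name fk_is_one_to_one and the inline sample rows are hypothetical, and the sketch assumes the importer itself does not enforce uniqueness.

```python
import csv
import io
from collections import Counter


def fk_is_one_to_one(csv_text: str, fk_column: str) -> bool:
    """Hypothetical helper: True if no foreign-key value appears in more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[fk_column] for row in reader)
    return all(n == 1 for n in counts.values())


# Hypothetical data: each individual points to a different address.
one_to_one = "ind_id,name,add_id\n1,Ana,10\n2,Ivo,11\n"
# Hypothetical data: two individuals share address 10, so this is one-to-many.
shared = "ind_id,name,add_id\n1,Ana,10\n2,Ivo,10\n"

print(fk_is_one_to_one(one_to_one, "add_id"))  # True
print(fk_is_one_to_one(shared, "add_id"))      # False
```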

You are right about the formatting error; the proper way to write it is:

one_to_many_relations:
  address: []
  individual:
  - foreign_key:
      column_name: add_id 
      reference_table: address
      reference_key: add_id
    label: LIVES_IN    

Thanks @brunos252, an example CSV along with the corresponding config would help.

For many_to_many_relations, the config would look like this:

many_to_many_relations:
  incident_individual:
    foreign_key_from:
      column_name: inc_id
      reference_table: incident
      reference_key: inc_id
    foreign_key_to:
      column_name: ind_id
      reference_table: individuals
      reference_key: ind_id
    label: INCIDENT

Here you need a separate associative table (incident_individual.csv) containing the foreign-key columns, for example:

inc_id,ind_id,relation
72,23,DRIVER
12,21,PASSENGER
...
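To make the role of the associative table concrete, the sketch below turns each of its rows into one edge between the referenced incident and individual. It is plain Python with the csv module, not GQLAlchemy's implementation; the helper name is hypothetical, and keeping the leftover relation column as an edge property is an assumption about how such columns are used. A student/teacher many-to-many would work the same way with a hypothetical student_teacher.csv holding pairs of student and teacher IDs.

```python
import csv
import io

# Inline copy of the two sample rows of incident_individual.csv shown above.
INCIDENT_INDIVIDUAL = """inc_id,ind_id,relation
72,23,DRIVER
12,21,PASSENGER
"""


def associative_rows_to_edges(csv_text, from_key, to_key, label):
    """Hypothetical helper: one (from, label, to, props) edge per associative row.

    from_key / to_key play the role of foreign_key_from / foreign_key_to in the
    many_to_many_relations config; remaining columns are kept as edge properties.
    """
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = {k: v for k, v in row.items() if k not in (from_key, to_key)}
        edges.append((row[from_key], label, row[to_key], props))
    return edges


for edge in associative_rows_to_edges(INCIDENT_INDIVIDUAL, "inc_id", "ind_id", "INCIDENT"):
    print(edge)
# ('72', 'INCIDENT', '23', {'relation': 'DRIVER'})
# ('12', 'INCIDENT', '21', {'relation': 'PASSENGER'})
```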

The individual.csv for the above example of one_to_many_relations would look like:

ind_id,name,surname,add_id
1,Tomislav,Petrov,1
2,Ivan,Horvat,3
3,Marko,Horvat,3
4,John,Doe,2
5,John,Though,4

and address.csv would be:

add_id,street,street_num,city
1,Ilica,2,Zagreb
2,Death Valley,0,Knowhere
3,Horvacanska,3,Horvati
4,Broadway,12,New York
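The one_to_many_relations config above effectively joins individual.csv to address.csv on add_id. Below is a minimal stdlib sketch of that resolution step, not the importer's actual code; the helper name one_to_many_edges is hypothetical, and the individual-to-address direction of LIVES_IN is an assumption.

```python
import csv
import io

# Inline copies of individual.csv and address.csv from above.
INDIVIDUAL_CSV = """ind_id,name,surname,add_id
1,Tomislav,Petrov,1
2,Ivan,Horvat,3
3,Marko,Horvat,3
4,John,Doe,2
5,John,Though,4
"""

ADDRESS_CSV = """add_id,street,street_num,city
1,Ilica,2,Zagreb
2,Death Valley,0,Knowhere
3,Horvacanska,3,Horvati
4,Broadway,12,New York
"""


def one_to_many_edges(child_csv, child_id, parent_csv, column_name, reference_key, label):
    """Hypothetical helper: link each child row to the parent row its foreign key names.

    column_name / reference_key mirror the foreign_key fields in the config.
    """
    parent_keys = {row[reference_key] for row in csv.DictReader(io.StringIO(parent_csv))}
    edges = []
    for child in csv.DictReader(io.StringIO(child_csv)):
        fk = child[column_name]
        if fk in parent_keys:  # rows whose foreign key has no match are skipped
            edges.append((child[child_id], label, fk))
    return edges


edges = one_to_many_edges(INDIVIDUAL_CSV, "ind_id", ADDRESS_CSV, "add_id", "add_id", "LIVES_IN")
for ind_id, label, add_id in edges:
    print(f"(individual {ind_id})-[:{label}]->(address {add_id})")
```

Because addresses 3 is referenced by two individuals, the same address node ends up with two incoming LIVES_IN edges, which is what makes this one-to-many.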

I hope this answers your questions. Let me know if there is anything else, so we can update the how-to guide :)

Thanks @brunos252! I think that in the example below the foreign key should be in the address table, not in individual; that is, every address should have a link to its individual.

one_to_many_relations:
  address: []
  individual:
  - foreign_key:
      column_name: add_id 
      reference_table: address
      reference_key: add_id
    label: LIVES_IN    

Also @brunos252, the section at this link is titled "Loading a CSV file from the local file system", but the code shown uses ParquetLocalFileSystemImporter. I think it should be CSVLocalFileSystemImporter:

importer = ParquetLocalFileSystemImporter(
    data_configuration=parsed_yaml,
    path="/home/user/table_data",
)

importer.translate(drop_database_on_start=True)
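With the class swapped as suggested, the snippet would look roughly like this. It is a sketch under several assumptions: the module path gqlalchemy.loaders matches older GQLAlchemy releases and may differ in newer ones, the helper name import_tables is hypothetical, PyYAML is assumed for parsing the config file, and a running Memgraph instance with default connection settings is assumed.

```python
def import_tables(config_path: str, data_dir: str) -> None:
    """Hypothetical helper: load the CSV tables described by a YAML config into Memgraph."""
    # Imports are inside the function so this sketch can be loaded without
    # GQLAlchemy or PyYAML installed; in a normal script put them at the top.
    import yaml
    from gqlalchemy.loaders import CSVLocalFileSystemImporter  # assumed path; may vary by version

    with open(config_path, "r") as stream:
        parsed_yaml = yaml.safe_load(stream)

    importer = CSVLocalFileSystemImporter(
        data_configuration=parsed_yaml,
        path=data_dir,
    )
    # Drops the existing database contents before importing, as in the guide's snippet.
    importer.translate(drop_database_on_start=True)
```

For example, import_tables("config.yaml", "/home/user/table_data") should import the tables described in a (hypothetical) config.yaml from that directory.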

@karrtikiyer
I created a PR to fix the bugs in the how-to guide and will try to add more examples. Thank you for your input!

Regarding the individual/address foreign keys, wouldn't an associative table be preferable in that case? Although an individual could have multiple addresses, there could also be multiple individuals at a single address, so it feels wrong to put a foreign key in the address table. This is only an example, so I don't consider it much of an issue, but feel free to comment if you think differently.

Thanks @brunos252, I understand that it is just an example; I only pointed it out because it was quoted as the one-to-many example. Ideally, I agree it should be many-to-many. But if we want to present it as one-to-many, I think the right way would be to put the foreign key in address, hence my suggestion. Overall, I am okay with it.
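For comparison, the variant suggested here, with the foreign key stored in address.csv, could be written as the following one_to_many_relations section. It is shown as the Python dict the importer receives as data_configuration (that is, the parsed YAML); the ind_id column in address.csv is hypothetical, and the rest of the configuration is omitted.

```python
# Hypothetical variant: address.csv gains an ind_id column pointing at its
# individual, and individual.csv no longer carries a foreign key.
one_to_many_relations = {
    "individual": [],  # no foreign keys defined on individual in this variant
    "address": [
        {
            "foreign_key": {
                "column_name": "ind_id",          # hypothetical column in address.csv
                "reference_table": "individual",  # table the key points to
                "reference_key": "ind_id",        # matching column in individual.csv
            },
            "label": "LIVES_IN",
        }
    ],
}

# Only the relations section is shown; other config sections are omitted.
data_configuration = {"one_to_many_relations": one_to_many_relations}
```

The guide describes the optional from_entity key as defining the direction of the created relationship, so it may be needed here to keep LIVES_IN pointing from individual to address.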

I am closing this issue because Bruno improved the how-to guide and provided the example. @karrtikiyer if you have more questions please open a new issue or ask on Discord.