Table to Graph Importer CSV example

Question

karrtikiyer opened this issue a year ago

Memgraph version 2.6.1

I am trying to reproduce the one to many example shown here: https://memgraph.com/docs/gqlalchemy/how-to-guides/table-to-graph-importer

  address: []        # currently needed, leave [] if no relations to define
  individuals:
    - foreign_key: # foreign key used for mapping;
      column_name: add_id         # specifies its column
      reference_table: address    # name of table from which the foreign key is taken
      reference_key: add_id       # column name in reference table from which the foreign key is taken
    label: LIVES_IN        # label applied to relationship created
      from_entity: False     # (optional) define direction of relationship created

Can someone please provide an example CSV file that matches this configuration?
Also, how should a one-to-one relationship be specified? Please help.

Thanks in advance.

karrtikiyer · Answer 1 · Mon Apr 03 2023 18:21:47 GMT+0800 (China Standard Time)

Also, the - at the start of foreign_key seems to be misplaced in the YAML example above. Can someone help with a proper config file for a one-to-many relationship?

Katarina Supe · Answer 2 · Mon Apr 03 2023 19:21:48 GMT+0800 (China Standard Time)

Just to add, this is related to a discussion on Discord.

karrtikiyer · Answer 3 · Tue Apr 04 2023 13:34:51 GMT+0800 (China Standard Time)

Also, logically, the individuals and address example above is many to one, in that one individual can have multiple addresses, so shouldn't the foreign key be in the address table? And shouldn't the address: [] entry in the configuration then hold the foreign key to the individual ID?

karrtikiyer · Answer 4 · Tue Apr 04 2023 13:49:43 GMT+0800 (China Standard Time)

Could you also please provide an example of a many-to-many configuration and its CSV?
For many to many, should the relations be in a separate CSV file, or should we duplicate rows in the referenced table?
E.g. a student-to-teacher relationship is many to many: one student can have many teachers and one teacher can have many students. Where should we place the relationship between student and teacher? Should it go in a separate CSV file?

Bruno Sačarić · Answer 5 · Tue Apr 04 2023 17:57:11 GMT+0800 (China Standard Time)

Hello @karrtikiyer !
For CSV files you can use the CSVLocalFileSystemImporter.
One-to-one relationships can also be defined with the one_to_many_relations field: you simply declare the foreign_key in the config file. Let me know if you want me to provide an example.

You are right about the wrong formatting. The proper way to write it would be:

one_to_many_relations:
  address: []
  individual:
  - foreign_key:
      column_name: add_id 
      reference_table: address
      reference_key: add_id
    label: LIVES_IN
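
For anyone puzzled by where the - belongs, it may help to see the structure a YAML parser would produce from the config above. The dict below is written by hand (no YAML library is used), and check_one_to_many is a hypothetical helper for illustration only, not part of GQLAlchemy:

```python
# Hand-written equivalent of the corrected YAML above.
# The "-" starts a list item; foreign_key and label are sibling keys
# of that one item, which is why they share the same indentation.
config = {
    "one_to_many_relations": {
        "address": [],  # no relations defined for this table
        "individual": [
            {
                "foreign_key": {
                    "column_name": "add_id",
                    "reference_table": "address",
                    "reference_key": "add_id",
                },
                "label": "LIVES_IN",
            }
        ],
    }
}

REQUIRED_FK_FIELDS = {"column_name", "reference_table", "reference_key"}


def check_one_to_many(cfg):
    """Hypothetical sanity check: return a list of problems found."""
    problems = []
    relations = cfg.get("one_to_many_relations", {})
    for table, mappings in relations.items():
        for i, mapping in enumerate(mappings):
            fk = mapping.get("foreign_key")
            if not isinstance(fk, dict):
                problems.append(f"{table}[{i}]: foreign_key must be a mapping")
                continue
            missing = REQUIRED_FK_FIELDS - set(fk)
            if missing:
                problems.append(f"{table}[{i}]: missing {sorted(missing)}")
            if "label" not in mapping:
                problems.append(f"{table}[{i}]: missing label")
    return problems


print(check_one_to_many(config))
```

A misplaced - typically shows up here as foreign_key being empty, with column_name and the other fields ending up as siblings of it instead of children.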

karrtikiyer · Answer 6 · Tue Apr 04 2023 18:17:54 GMT+0800 (China Standard Time)

Thanks @brunos252, an example CSV along with the corresponding config would help.

Bruno Sačarić · Answer 7 · Tue Apr 04 2023 18:20:34 GMT+0800 (China Standard Time)

For many_to_many_relations, the config part would look like:

many_to_many_relations:
  incident_individual:
    foreign_key_from:
      column_name: inc_id
      reference_table: incident
      reference_key: inc_id
    foreign_key_to:
      column_name: ind_id
      reference_table: individuals
      reference_key: ind_id
    label: INCIDENT

For this you need a separate associative table (incident_individual.csv) that contains the foreign key columns, for example:

inc_id,ind_id,relation
72,23,DRIVER
12,21,PASSENGER
...
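
To make the role of the associative table concrete, here is a minimal stdlib-only sketch of how each row of incident_individual.csv could be read as one relationship between an incident and an individual. It is not GQLAlchemy code. Treating the extra relation column as a relationship property, and the direction from incident to individual, are assumptions made for illustration.

```python
import csv
import io

# The two rows shown above; the elided "..." rows are not reproduced.
INCIDENT_INDIVIDUAL_CSV = """inc_id,ind_id,relation
72,23,DRIVER
12,21,PASSENGER
"""

# Mirrors the many_to_many_relations config above.
mapping = {
    "foreign_key_from": {"column_name": "inc_id", "reference_table": "incident"},
    "foreign_key_to": {"column_name": "ind_id", "reference_table": "individuals"},
    "label": "INCIDENT",
}


def rows_to_relationships(csv_text, m):
    """Turn each associative-table row into one relationship record."""
    from_col = m["foreign_key_from"]["column_name"]
    to_col = m["foreign_key_to"]["column_name"]
    relationships = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: columns that are not foreign keys become properties.
        props = {k: v for k, v in row.items() if k not in (from_col, to_col)}
        relationships.append(
            {
                "from": (m["foreign_key_from"]["reference_table"], row[from_col]),
                "to": (m["foreign_key_to"]["reference_table"], row[to_col]),
                "label": m["label"],
                "properties": props,
            }
        )
    return relationships


for rel in rows_to_relationships(INCIDENT_INDIVIDUAL_CSV, mapping):
    print(rel)
```

A student/teacher relationship would follow the same pattern: a third CSV with one row per student-teacher pair, rather than duplicated rows in either table.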

Bruno Sačarić · Answer 8 · Tue Apr 04 2023 18:26:10 GMT+0800 (China Standard Time)

The individual.csv for the above example of one_to_many_relations would look like:

ind_id,name,surname,add_id
1,Tomislav,Petrov,1
2,Ivan,Horvat,3
3,Marko,Horvat,3
4,John,Doe,2
5,John,Though,4

While the address.csv would be:

add_id,street,street_num,city
1,Ilica,2,Zagreb
2,Death Valley,0,Knowhere
3,Horvacanska,3,Horvati
4,Broadway,12,New York
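
As a rough illustration of what the config and these two files describe, the stdlib-only sketch below resolves each individual's add_id against address.csv and lists the resulting LIVES_IN pairs. It is not the importer's implementation, and the direction shown (individual to address) is an assumption; the optional from_entity setting is what defines direction in the config.

```python
import csv
import io

INDIVIDUAL_CSV = """ind_id,name,surname,add_id
1,Tomislav,Petrov,1
2,Ivan,Horvat,3
3,Marko,Horvat,3
4,John,Doe,2
5,John,Though,4
"""

ADDRESS_CSV = """add_id,street,street_num,city
1,Ilica,2,Zagreb
2,Death Valley,0,Knowhere
3,Horvacanska,3,Horvati
4,Broadway,12,New York
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_foreign_key(rows, ref_rows, column_name, reference_key):
    """Pair every row with the referenced row whose key matches."""
    by_key = {ref[reference_key]: ref for ref in ref_rows}
    return [
        (row, by_key[row[column_name]])
        for row in rows
        if row[column_name] in by_key
    ]


pairs = resolve_foreign_key(
    read_rows(INDIVIDUAL_CSV), read_rows(ADDRESS_CSV), "add_id", "add_id"
)
for person, address in pairs:
    who = f"{person['name']} {person['surname']}"
    where = f"{address['street']} {address['street_num']}, {address['city']}"
    print(f"({who})-[:LIVES_IN]->({where})")
```

With this data, Ivan and Marko Horvat both resolve to address 3, while each individual has exactly one add_id. So with the foreign key in individual.csv, one address can have many individuals, but not the other way round.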

Hope this answers all your questions, and let me know if there is anything else, so we can update the how-to guide :)

karrtikiyer · Answer 9 · Tue Apr 04 2023 19:30:29 GMT+0800 (China Standard Time)

Thanks @brunos252. I think that for the example below the foreign key should be in the address table, not in individual, so that every address links to an individual.

one_to_many_relations:
  address: []
  individual:
  - foreign_key:
      column_name: add_id 
      reference_table: address
      reference_key: add_id
    label: LIVES_IN

karrtikiyer · Answer 10 · Tue Apr 04 2023 19:33:10 GMT+0800 (China Standard Time)

Also @brunos252 in this link:
It says "Loading a CSV file from the local file system",
but the code shown uses ParquetLocalFileSystemImporter. I think this should be CSVLocalFileSystemImporter:

importer = ParquetLocalFileSystemImporter(
    data_configuration=parsed_yaml,
    path="/home/user/table_data",
)

importer.translate(drop_database_on_start=True)
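
For comparison, a CSV version of the snippet above would presumably swap in the CSVLocalFileSystemImporter mentioned earlier in this thread. The sketch below assumes it is importable from gqlalchemy.loaders and accepts the same data_configuration and path arguments as the Parquet importer; both points should be checked against the installed GQLAlchemy version. The dict stands in for parsed_yaml (a real configuration may need further sections), and run_csv_import is a hypothetical wrapper. Running it needs a reachable Memgraph instance and the CSV files in the given directory.

```python
# Stand-in for parsed_yaml: the one-to-many config from this thread.
parsed_yaml = {
    "one_to_many_relations": {
        "address": [],
        "individual": [
            {
                "foreign_key": {
                    "column_name": "add_id",
                    "reference_table": "address",
                    "reference_key": "add_id",
                },
                "label": "LIVES_IN",
            }
        ],
    }
}


def run_csv_import(data_configuration, path):
    """Hypothetical wrapper around the CSV importer.

    The import is done lazily so this module loads without GQLAlchemy.
    """
    # Assumption: import location and keyword arguments match the
    # Parquet importer shown above.
    from gqlalchemy.loaders import CSVLocalFileSystemImporter

    importer = CSVLocalFileSystemImporter(
        data_configuration=data_configuration,
        path=path,
    )
    # Same call as in the snippet above; existing data is dropped first.
    importer.translate(drop_database_on_start=True)


# Example call (left commented out because it needs a running Memgraph):
# run_csv_import(parsed_yaml, "/home/user/table_data")
```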

Bruno Sačarić · Answer 11 · Tue Apr 04 2023 22:54:29 GMT+0800 (China Standard Time)

@karrtikiyer
As you can see above I created a PR to fix the bugs in the how-to, and will try to add more stuff. Thank you for your inputs!

Regarding the individual/address foreign keys, would it not be preferable in that case to have an associative table? Although an individual could have multiple addresses, there could also be multiple individuals at a single address, so it feels wrong to put a foreign key in the address table. This is only an example, so I don't consider it much of an issue, but feel free to comment if you think differently.

karrtikiyer · Answer 12 · Wed Apr 05 2023 13:14:53 GMT+0800 (China Standard Time)

Thanks @brunos252. I understand that it is just an example; I only pointed it out because it was quoted as the one-to-many example. I agree that ideally it should be many to many. But if we want to present it as one to many, I think the right way would be to put the foreign key in address, hence my suggestion. Overall I am okay with it.

Katarina Supe · Answer 13 · Tue Dec 19 2023 20:25:29 GMT+0800 (China Standard Time)

I am closing this issue because Bruno improved the how-to guide and provided the example. @karrtikiyer if you have more questions please open a new issue or ask on Discord.