DeepGraphLearning / ULTRA

A foundation model for knowledge graph reasoning

Adding my own dataset

baseballtrout opened this issue

Hello, first off, this tool works really well. I am inspired by you all. My name is Bradford Patton, and I go to school at Meharry Medical College. I am having one issue with adding my own knowledge graph: processing my triples won't work. What separator should the triples in my KG files use? It is giving me the error below. I figured I would message the masterminds behind this tool.

  File "C:\Users\bpatton23\ULTRA\script\run.py", line 243, in <module>
    dataset = util.build_dataset(cfg)
  File "C:\Users\bpatton23\ULTRA\ultra\util.py", line 149, in build_dataset
    dataset = ds_cls(**data_config)
  File "C:\Users\bpatton23\ULTRA\ultra\datasets.py", line 246, in __init__
    super().__init__(root, transform, pre_transform)
  File "C:\Users\bpatton23\AppData\Local\anaconda3\envs\UltraGPU\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 76, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log)
  File "C:\Users\bpatton23\AppData\Local\anaconda3\envs\UltraGPU\lib\site-packages\torch_geometric\data\dataset.py", line 102, in __init__
    self._process()
  File "C:\Users\bpatton23\AppData\Local\anaconda3\envs\UltraGPU\lib\site-packages\torch_geometric\data\dataset.py", line 235, in _process
    self.process()
  File "C:\Users\bpatton23\ULTRA\ultra\datasets.py", line 291, in process
    train_results = self.load_file(train_files[0], inv_entity_vocab={}, inv_rel_vocab={})
  File "C:\Users\bpatton23\ULTRA\ultra\datasets.py", line 264, in load_file
    u, r, v = l.split() if self.delimiter is None else l.strip().split(self.delimiter)
ValueError: not enough values to unpack (expected 3, got 1)
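To see why the error says "got 1", here is a small standalone sketch that reuses the split expression from the failing line of `load_file` above. The helper name `split_like_load_file` and the example triples are hypothetical, for illustration only. With `delimiter = None`, Python's `str.split()` breaks on any run of whitespace, so a comma-separated line comes back as a single field, while an entity name containing spaces produces extra fields.

```python
def split_like_load_file(line, delimiter=None):
    # Hypothetical helper: same expression as the failing line in load_file.
    # delimiter=None -> split on any whitespace run (tabs AND spaces);
    # otherwise strip the line and split on the given delimiter only.
    return line.split() if delimiter is None else line.strip().split(delimiter)


tsv_line = "aspirin\ttreats\theadache\n"
csv_line = "aspirin,treats,headache\n"
spaced_line = "acetyl salicylic acid\ttreats\theadache\n"

print(split_like_load_file(tsv_line))           # ['aspirin', 'treats', 'headache']
print(split_like_load_file(csv_line))           # ['aspirin,treats,headache']
print(len(split_like_load_file(spaced_line)))   # 5
print(split_like_load_file(spaced_line, "\t"))  # ['acetyl salicylic acid', 'treats', 'headache']

try:
    u, r, v = split_like_load_file(csv_line)
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 1)
```

So a file that uses commas (or any separator other than whitespace) reproduces this traceback, and names with spaces would fail the other way ("too many values to unpack") unless the delimiter is set explicitly to "\t".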

Hi, the default separator is a Tab symbol "\t", so the expected format of input triples is TSV.
You can adjust it to your case by setting delimiter = <your symbol> in your custom dataset class; for example, delimiter = "," for comma-separated subject,predicate,object lines.
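Before rebuilding the dataset, it can help to scan each split file with the delimiter you plan to use. This is a minimal sketch (the function name `find_bad_lines` and the UTF-8 encoding are assumptions, not part of ULTRA) that applies the same split rule as the traceback and lists every line that would not unpack into exactly three fields.

```python
import os
import tempfile


def find_bad_lines(path, delimiter=None, encoding="utf-8"):
    """Return (line_number, field_count, text) for lines without exactly 3 fields.

    Split rule copied from the traceback: whitespace split when delimiter
    is None, otherwise strip the line and split on the delimiter.
    """
    bad = []
    with open(path, encoding=encoding) as f:
        for lineno, line in enumerate(f, start=1):
            if delimiter is None:
                parts = line.split()
            else:
                parts = line.strip().split(delimiter)
            if len(parts) != 3:
                bad.append((lineno, len(parts), line.rstrip("\r\n")))
    return bad


if __name__ == "__main__":
    # Illustrative file: line 2 uses commas, line 3 is blank.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("aspirin\ttreats\theadache\naspirin,causes,bleeding\n\n")
        path = tmp.name
    print(find_bad_lines(path))  # [(2, 1, 'aspirin,causes,bleeding'), (3, 0, '')]
    os.remove(path)
```

An empty result for train, valid, and test means every line splits cleanly under that delimiter; anything listed points at the exact line to fix.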

Thank you for the information. I will try splitting my data into train, valid, and test files in TSV format.
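One possible way to produce those files is sketched below. It assumes the triples are already (head, relation, tail) string tuples; the function name `write_tsv_splits`, the file names `train.txt` / `valid.txt` / `test.txt`, and the 80/10/10 split are assumptions, so match whatever your dataset class expects. It writes one tab-separated triple per line and rejects fields containing a tab or newline. If any names contain spaces, set `delimiter = "\t"` rather than relying on the default whitespace split.

```python
import random
from pathlib import Path


def write_tsv_splits(triples, out_dir, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (head, relation, tail) triples and write train/valid/test TSV files.

    File names and split fractions are assumptions; adjust them to match
    what your dataset class downloads or reads.
    """
    rows = [tuple(t) for t in triples]
    for row in rows:
        if len(row) != 3 or any("\t" in x or "\n" in x for x in row):
            raise ValueError(f"bad triple: {row!r}")
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible split

    n_valid = int(len(rows) * valid_frac)
    n_test = int(len(rows) * test_frac)
    splits = {
        "valid.txt": rows[:n_valid],
        "test.txt": rows[n_valid:n_valid + n_test],
        "train.txt": rows[n_valid + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, split_rows in splits.items():
        # newline="" keeps "\n" line endings on Windows instead of "\r\n"
        with open(out / name, "w", encoding="utf-8", newline="") as f:
            f.writelines("\t".join(row) + "\n" for row in split_rows)
    return {name: len(split_rows) for name, split_rows in splits.items()}
```

For 10 triples with the default fractions this writes 8 training, 1 validation, and 1 test line.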

Hello again. It's still not working and gives me the same error as before, even though my data is now in TSV files. Is it possible that once it has downloaded a file, it will just keep loading that same file and won't download a new one from a new link I put there?

Yes, you have to clean up the dataset cache folder and download the new files.

How, and where would this dataset cache folder be found?

The default path in the config files (e.g., for transductive inference) is ~/git/ULTRA/kg-datasets (unless you put your own path in the config). There, delete the folder named after your custom dataset, and that should be sufficient.
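If you prefer to script the cleanup, here is a minimal sketch. It relies on torch_geometric's `Dataset` skipping `download()` when the raw files already exist and skipping `process()` when the processed files already exist, which is why old files keep being loaded until the folder is removed. The function name `clear_dataset_cache` and the folder name in the usage comment are hypothetical; check what is actually inside your dataset root before deleting anything.

```python
import shutil
from pathlib import Path


def clear_dataset_cache(root, dataset_folder):
    """Delete <root>/<dataset_folder> so raw and processed files are rebuilt.

    root mirrors the dataset root in your config (e.g. ~/git/ULTRA/kg-datasets);
    dataset_folder is the single folder your custom dataset created there.
    Returns True if a folder was removed, False if nothing was there.
    """
    # Refuse empty names or nested paths so the whole root is never deleted.
    if dataset_folder in ("", ".", "..") or Path(dataset_folder).name != dataset_folder:
        raise ValueError(f"expected a single folder name, got {dataset_folder!r}")
    target = Path(root).expanduser() / dataset_folder
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Hypothetical folder name; look inside kg-datasets for the real one first.
# clear_dataset_cache("~/git/ULTRA/kg-datasets", "MyCustomKG")
```

`Path.expanduser()` also resolves `~` on Windows (to your user profile folder), but if your config points somewhere else, pass that root instead.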

Thank you again, it worked. I will try my datasets again and see if they work.