zjukg / UMAEA

[Paper][ISWC 2023] Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

Home Page: https://arxiv.org/abs/2307.16210


Missing ent and char embedding pkl files

renjith-digicat opened this issue · comments

I am trying to reproduce the unsupervised training and results, but it seems the ent and char word embeddings are missing. Where can I get these embeddings?

save_path_name = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_name.pkl")

The entire function below returns None, None because the files are missing.

import os
import os.path as osp
import pickle

def load_word_char_features(node_size, word2vec_path, args, logger):
    """
    node_size : ent num
    """
    name_path = os.path.join(args.data_path, "DBP15K", "translated_ent_name", "dbp_" + args.data_split + ".json")
    assert osp.exists(name_path)
    save_path_name = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_name.pkl")
    save_path_char = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_char.pkl")
    if osp.exists(save_path_name) and osp.exists(save_path_char):
        logger.info(f"load entity name emb from {save_path_name} ... ")
        ent_vec = pickle.load(open(save_path_name, "rb"))
        logger.info(f"load entity char emb from {save_path_char} ... ")
        char_vec = pickle.load(open(save_path_char, "rb"))
        return ent_vec, char_vec
    # if either cache file is missing, execution falls through here and
    # the caller receives None

They are not available at the data download URL provided in the README. If there are any steps to generate them, that would be great.

Thank you for your attention.

In fact, we have not conducted any experiments related to "char" and "name" in UMAEA, but our framework is theoretically capable of supporting them. If the original raw data contains this type of data, the relevant cache files should be generated automatically.

Please refer to our previous MEAformer repository for details, and make sure the data format and paths are aligned with it.

After obtaining this part of the cache, some sections of the training code will also need to be modified, primarily in the modal fusion part.
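For reference, here is a minimal sketch of what such automatic cache generation might look like, in the style of EVA-lineage name/char features. The JSON layout (a list of [entity_id, name] pairs), the GloVe-format word-vector file, and all helper names below are assumptions for illustration, not the actual MEAformer/UMAEA code:

import json
import os
import pickle

import numpy as np

def build_name_char_cache(args, node_size, word2vec_path, dim=300):
    # hypothetical: average word vectors for name embeddings, character
    # bigram counts for char features, then cache both pkl files
    name_path = os.path.join(args.data_path, "DBP15K", "translated_ent_name",
                             f"dbp_{args.data_split}.json")
    # assumed layout: a list of [entity_id, "translated entity name"] pairs
    with open(name_path, "r", encoding="utf-8") as f:
        ent_names = json.load(f)

    # GloVe-style text file: "token v1 v2 ... v{dim}" per line
    word2vec = {}
    with open(word2vec_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                word2vec[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    # fix the char feature dimension from all bigrams seen in the names
    bigrams = sorted({name[i:i + 2].lower() for _, name in ent_names
                      for i in range(len(name) - 1)})
    bigram2id = {bg: i for i, bg in enumerate(bigrams)}

    ent_vec = np.zeros((node_size, dim), dtype=np.float32)
    char_vec = np.zeros((node_size, len(bigram2id)), dtype=np.float32)
    for eid, name in ent_names:
        tokens = [word2vec[t] for t in name.lower().split() if t in word2vec]
        if tokens:
            ent_vec[eid] = np.mean(tokens, axis=0)
        for i in range(len(name) - 1):
            char_vec[eid, bigram2id[name[i:i + 2].lower()]] += 1

    emb_dir = os.path.join(args.data_path, "embedding")
    os.makedirs(emb_dir, exist_ok=True)
    pickle.dump(ent_vec, open(os.path.join(emb_dir, f"dbp_{args.data_split}_name.pkl"), "wb"))
    pickle.dump(char_vec, open(os.path.join(emb_dir, f"dbp_{args.data_split}_char.pkl"), "wb"))
    return ent_vec, char_vec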

Thank you for the response.

Even in the MEAformer repo, the above part of the code seems unchanged.
https://github.com/zjukg/MEAformer/blob/2bda7b32fb67243bf45072bb4b2aeedc48a7fc03/src/data.py#L210

I could not find where the embeddings for the two files below are generated.

save_path_name = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_name.pkl")
save_path_char = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_char.pkl")

Could you please guide me to the file or part of the code where the caching happens?

It just loads the pkl files, assuming they already exist.

    save_path_name = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_name.pkl")
    save_path_char = os.path.join(args.data_path, "embedding", f"dbp_{args.data_split}_char.pkl")
    if osp.exists(save_path_name) and osp.exists(save_path_char):
        logger.info(f"load entity name emb from {save_path_name} ... ")
        ent_vec = pickle.load(open(save_path_name, "rb"))
        logger.info(f"load entity char emb from {save_path_char} ... ")
        char_vec = pickle.load(open(save_path_char, "rb"))
        return ent_vec, char_vec

My question is: how can I generate the mentioned pkl files (the ent and char embeddings)? They are not provided at the data download URL in the repo, and there is no mention of how to create or generate them either.

Thank you so much for patiently answering my dumb question!!!
I was making some silly mistakes with the arguments and completely missed that part of the code.

@renjith-digicat No worries. UMAEA may not be fully compatible with this part of the data, as I have not attempted to run the model using it before. If you wish to proceed, you will need to modify parts of the code, including but not limited to the section found here.
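To be concrete about the fusion part, the sketch below shows an EVA-style weighted fusion over a list of per-modality embeddings; the class and all names are mine for illustration only, not UMAEA's actual API. Name/char features would be appended as extra entries in modal_embs:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedModalFusion(nn.Module):
    # hypothetical sketch: project each modality to a shared size, then
    # concatenate them with learned softmax attention weights
    def __init__(self, modal_dims, hidden_dim):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, hidden_dim) for d in modal_dims)
        self.modal_weights = nn.Parameter(torch.ones(len(modal_dims)))

    def forward(self, modal_embs):
        # modal_embs: one [num_entities, dim_i] tensor per modality,
        # e.g. graph, relation, attribute, image, and now name/char
        w = F.softmax(self.modal_weights, dim=0)
        fused = [w[i] * self.projs[i](x) for i, x in enumerate(modal_embs)]
        return torch.cat(fused, dim=1)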

I also suggest using the code without CMMI (w/o CMMI) initially.

In addition, if you successfully get it to work, you are welcome to submit a pull request.

Best wishes,
Zhuo