Missing ontotype

Question

simongray opened this issue a year ago

Simon Gray commented a year ago

https://wordnet.dk/dannet/data/synset-78300

Simon Gray · Answer 1 · Fri Mar 10 2023 22:01:46 GMT+0800 (China Standard Time)

Ok, so the issue relates to data in the 2023 adjective dataset and it seems to be the following:

Ontotypes are not being displayed because multiple composite ontotypes are being attached to a single synset, which breaks the assumptions of the UI.
In one type of case, the existing synset is actually an old synset that already has an ontotype. When this is the case, it shouldn't inherit anything. This has been fixed in 7c737d3.
In the other case, the same sense ID appears multiple times in the adjectives dataset. Since the synthesized synset IDs are made from the sense IDs, this results in several identical synthesized synset IDs, e.g. for 21038758.
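
The second case can be checked by counting how often each sense ID occurs. The following is a minimal sketch, not the project's actual code: `duplicated-ids` and `example-sense-ids` are hypothetical names, and the vector stands in for the sense ID column of the adjectives dataset.

```clojure
(defn duplicated-ids
  "Returns the set of IDs that occur more than once in `ids`."
  [ids]
  (->> (frequencies ids)
       (filter (fn [[_ n]] (> n 1)))
       (map first)
       set))

;; Hypothetical stand-in for one column of the adjectives dataset.
(def example-sense-ids
  ["21038758" "21049162" "21038758" "21086269"])

(duplicated-ids example-sense-ids)
;; => #{"21038758"}
```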

Simon Gray · Answer 2 · Fri Mar 10 2023 22:55:10 GMT+0800 (China Standard Time)

For the other case, I wanted to solve it by adding -N for each dupe at the end of the synthesized ID, e.g. synset-s21038758-0 and synset-s21038758-1. My first attempt just preprocessed the rows, adding this information as metadata...

However, this other case is pretty hairy, since the synthesized IDs are generated not only for the sek_id of a particular row, but also for its siblings, so how do I know if the siblings are dupes? I need to know this when finding siblings too.
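
The -N suffixing itself could look roughly like the sketch below. It is only an illustration under stated assumptions: `with-dupe-suffixes` is a hypothetical helper, the input is assumed to be the sense IDs in row order, and leaving non-duplicated IDs unsuffixed is a choice made here, not something confirmed above. It does not address the sibling problem, since a sibling reference alone does not say which duplicate it means.

```clojure
(defn with-dupe-suffixes
  "Takes sense IDs in row order and returns one synthesized synset ID per row.
  IDs occurring once stay unsuffixed (an assumption); each occurrence of a
  duplicated ID gets -0, -1, ... in order of appearance."
  [sense-ids]
  (let [freqs (frequencies sense-ids)]
    (first
      (reduce (fn [[out seen] id]
                (let [n    (get seen id 0)       ; occurrences seen so far
                      base (str "synset-s" id)]
                  [(conj out (if (> (freqs id) 1)
                               (str base "-" n)
                               base))
                   (assoc seen id (inc n))]))
              [[] {}]
              sense-ids))))

(with-dupe-suffixes ["21038758" "21049162" "21038758"])
;; => ["synset-s21038758-0" "synset-s21049162" "synset-s21038758-1"]
```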

Simon Gray · Answer 3 · Mon Mar 27 2023 20:55:11 GMT+0800 (China Standard Time)

At least the hypernyms seem to be distinct from the new adjectives.

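;; Intersect the values in columns 5 and 7 (zero-indexed) of the TSV rows.
;; `:preprocess rest` skips the first row (presumably the header).
;; Assumes `read-triples` is in scope and `clojure.set` is aliased as `set`.
(let [rows (read-triples [identity
                          "bootstrap/other/dannet-new/adjectives.tsv"
                          :encoding "UTF-8"
                          :separator \tab
                          :preprocess rest])]
  (set/intersection (set (map #(nth % 5) rows))
                    (set (map #(nth % 7) rows))))

;; => #{""}
;; i.e. the only value the two columns share is the empty string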

Simon Gray · Answer 5 · Tue Mar 28 2023 15:44:00 GMT+0800 (China Standard Time)

A side issue I have discovered is that some of the dannetsemid values in the dataset are not defined in the label dataset from Thomas, yet they do exist in our own dataset (e.g. lydig, sense 21049162), so I have to make sure to also check wordsenses.csv when creating these links.

https://wordnet.dk/dannet/data/sense-21049162
https://wordnet.dk/dannet/data/synset-79018
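
The fallback described above amounts to checking a second source when the first one has no entry. A minimal sketch, assuming both sources have already been loaded into maps keyed by sense ID; `sense-label`, `label-dataset`, `wordsenses` and the placeholder values are hypothetical and not taken from the actual files.

```clojure
(defn sense-label
  "Looks up `id` in `primary` first, then falls back to `fallback`.
  Returns nil when neither source knows the ID."
  [primary fallback id]
  (or (get primary id)
      (get fallback id)))

;; Hypothetical in-memory stand-ins; the real data would be read from files.
(def label-dataset {"21038758" "placeholder label"})
(def wordsenses    {"21049162" "lydig"})

(sense-label label-dataset wordsenses "21049162")
;; => "lydig"
```

A nil result marks an ID that is missing from both sources, which makes such rows easy to report rather than silently skip.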

Simon Gray · Answer 6 · Tue Mar 28 2023 16:35:52 GMT+0800 (China Standard Time)

Hmmm... an unsplit duplicate has appeared here: http://localhost:3456/dannet/data/sense-21086269