HLR / DomiKnowS

Proposal of semantics of sensor/EdgeSensor return values

guoquan opened this issue · comments

Here is a wrap-up of today's discussion about the dimensions of sensor return values, plus an additional proposal for the compositional-relation case (for example, pair).

  1. In general, the sensor returns a list or a tensor whose:

    1. length or first dimension matches the number of datanodes of this concept;
    2. and all the other dimensions, or the items in the list, are associated with the datanode at that index.
  2. If the sensor is an EdgeSensor (or instance of its subclass), and assuming there is only one relation involved, the relation attribute of this sensor indicates which Relation (contains, equal, is_a, has_a) is considered in the returned value. In 'forward' mode, the source and destination of the sensor are the same as the source and destination of the relation. In 'backward' mode, they are reversed. Then:

    1. the first dimension of the returned tensor, or the first level of the returned list, matches the number of datanodes of the source of the sensor (which is the destination of the relation in 'backward' mode);
    2. the second dimension of the tensor, or the nested list of the return value, indicates the datanodes of the current concept that are in this relationship with the source concept. (discussion: In the case of a tensor, padding may be needed.)
    3. and the other dimensions or the items in the nested list are associated with the datanode.
  3. (Proposal) If multiple relations are considered in one EdgeSensor, indicated by a tuple of N relation attributes (for example, two has_a relations used together to generate a pair), then:

    1. the first N dimensions are associated with the source concepts indicated by the N relations, following the order of the relations;
    2. the (N+1)-th dimension is associated with the current concept and indicates the datanodes of the current concept that are in all these relationships. (discussion: In the has_a case, this dimension is just 1. I cannot come up with an example where this dimension is helpful, but it makes the semantics more consistent with case 2 above.)
    3. and the other dimensions are associated with the datanode.

Note that, under the current interpretation of case 2, the second dimension is not aligned with the number of datanodes of the concept. For example, take "John works for IBM.": considering that the sentence contains words, the result is 1x5, which looks fine. But considering that words contain characters, it is 5x5, since "works" has the most characters (5) and the other words need padding.

Another way is to make the second dimension the number of all the current datanodes, which results in a 5x16 matrix since there are 16 characters. This has a benefit: if we want to collect information for the words from the characters, and we have character features of shape 16xM (following case 1), the collection is done naturally by a matrix multiplication, reducing to 5xM, as in the sketch below.
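
(A minimal sketch of this reduction, assuming PyTorch; the match matrix and character features below are made up for illustration.)

import torch

# 5 words, 16 characters, M = 3 feature dims (all hypothetical).
# 'match' maps each character (column) to its word (row); values in {0, 1}.
match = torch.zeros(5, 16)
char_starts = [0, 4, 9, 12, 15]  # 'John'(4) 'works'(5) 'for'(3) 'IBM'(3) '.'(1)
char_counts = [4, 5, 3, 3, 1]
for w, (s, c) in enumerate(zip(char_starts, char_counts)):
    match[w, s:s + c] = 1

char_feat = torch.randn(16, 3)       # 16 x M character features (case 1)
word_feat = match.matmul(char_feat)  # (5,16) x (16,3) = (5,3) word features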

In the above description of multiple relations, do you mean multiple relations between the same type of nodes? Can you clarify the proposal using an example, e.g., WorksFor(John, IBM), Owner(John, IBM)? It would be helpful to have a concrete case with our latest graph representation and assigned sensors here, and then discuss the same instances with multiple relations.

Hi,
Sorry for the confusion. Multiple relations in a sensor means we want to consider multiple relations when calculating one property of a concept. For example, to calculate a property of a pair (John, IBM), I might need the two relations arg1 and arg2, defined by arg1, arg2 = pair.has_a(word, word).

I will add a series of examples for each of the cases here.

It would be great to add an example, thanks.
This is not clear to me yet. Also, why do you call this multiple relations? Do you mean multiple outgoing edges?

(discussion)

Example:

Graph

word = Concept()
phrase = Concept()
sentence = Concept()
(pcw,) = phrase.contains(word)
(scw,) = sentence.contains(word)
(scp,) = sentence.contains(phrase)

pair = Concept()
(pa1, pa2) = pair.has_a(arg1=word, arg2=word)

people = phrase()  # IS_A relation will be generated
prp = people.relate_to(phrase)[0]

Data

reader = [{'text': 'John works for IBM .'}]

1. Sensor with 1 concept, no relation:

  • N x ... Tensor,
  • or list of N elements,
  • where N is the number of the concept's datanodes
  • each ... of Tensor or element in the list will be associated with a datanode

Example 1.1 sentence

Based on the example data, only one sentence is involved. The sensor should return a tensor of 1 x ...:

sentence['ids'] = DummySensor()
tensor([[48, 97, 72, 9, 83]])  # shape = (1,5)

or a list of 1 element

sentence['text'] = DummySensor()
['John works for IBM .']

Example 1.2 word

A dummy sensor setting: assume we assign to words directly. Tokenizers will be introduced later.
In Tensor:

word['ids'] = DummySensor()
tensor([48, 97, 72, 9, 83])  # shape = (5,)

or list:

word['text'] = DummySensor()
['John', 'works', 'for', 'IBM', '.']

2. Sensor with 1 concept, 1 relation:

  • M x N x ... Tensor,
  • or a list of M lists, each of which has N_m elements,
  • where M is the number of datanodes of the sensor's source concept and N is that of the sensor's destination concept

Example 2.1 (scw,) = sentence.contains(word) forward

# scw.src = sentence
# scw.dst = word
# mode=forward
# sensor.src = sentence
# sensor.dst = word

This is the trivial tokenizer setting.
In Tensor:

word['ids'] = DummySensor(relation=scw, mode='forward')
tensor([[48, 97, 72, 9, 83]])  # shape = (1,5)

or list:

word['text'] = DummySensor(relation=scw, mode='forward')
[['John', 'works', 'for', 'IBM', '.']]

Example 2.2 (pcw,) = phrase.contains(word) forward

# pcw.src = phrase
# pcw.dst = word
# mode=forward
# sensor.src = phrase
# sensor.dst = word

This is a tokenizer over multiple instances. (A padding sketch follows this example.)
In Tensor:

word['ids'] = DummySensor('ids', relation=pcw, mode='forward')
tensor([[48, 0],
              [97, 72],
              [9, 0],
              [83, 0]])  # shape = (4,2)
# 0 is padding value

or list:

word['text'] = DummySensor('text', relation=pcw, mode='forward')
[['John'],
 ['works', 'for'],
 ['IBM'],
 ['.']]
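
(A minimal sketch of how such a ragged result could be padded into the tensor form, assuming PyTorch; the ids below are the hypothetical values used throughout this example.)

import torch
from torch.nn.utils.rnn import pad_sequence

# Per-phrase id lists for 'John' / 'works for' / 'IBM' / '.'
per_phrase_ids = [torch.tensor([48]),
                  torch.tensor([97, 72]),
                  torch.tensor([9]),
                  torch.tensor([83])]
padded = pad_sequence(per_phrase_ids, batch_first=True, padding_value=0)
# padded == tensor([[48, 0], [97, 72], [9, 0], [83, 0]])  # shape = (4,2)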

Example 2.3 (pcw,) = phrase.contains(word) backward

# pcw.src = phrase
# pcw.dst = word
# mode=backward
# sensor.src = word
# sensor.dst = phrase

This is the case when using BIO tagging to create phrases from words.
In Tensor:

phrase['ids'] = DummySensor('ids', relation=pcw, mode='backward')
tensor([[48, 0, 0, 0],
              [0, 97, 0, 0],
              [0, 72, 0, 0],
              [0, 0, 9, 0],
              [0, 0, 0, 83]])  # shape = (5, 4)
# 0 is padding value

or list:

phrase['text'] = DummySensor('text', relation=pcw, mode='backward')
# 'text'
[['John', None, None, None],
 [None, 'works', None, None],
 [None, 'for', None, None],
 [None, None, 'IBM', None],
 [None, None, None, '.']]
# or an equivalent and more elegant dict
[{0: 'John'},
 {1: 'works'},
 {1: 'for'},
 {2: 'IBM'},
 {3: '.'}]

(discussion)
However, what we expect for 'ids' is a 4x... tensor,

tensor([[48, 0],
              [97, 72],
              [9, 0],
              [83, 0]])  # shape = (4,2)

and 'text' should be a list of 4 strings

['John', 'works for', 'IBM', '.']

To achieve this, we would need to use phrase (the source of the relation and the destination of the sensor) as the first dimension. But then no concept connects to the second dimension, and nothing encodes the association of words to phrases anymore.

The following is what we do now (adapted from the equality example), just for reference.

phrase['match'] = DummySensor(relation=pcw, mode='backward')
tensor([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]]) # shape = (5,4)

Then other features will use this 'match', follow the 1-concept-no-relation paradigm, and use the external property from word.

phrase['ids'] = DummySensor('match', word['ids'])
tensor([[48, 0],
              [97, 72],
              [9, 0],
              [83, 0]])  # shape = (4,2)

or list:

phrase['text'] = DummySensor('match', word['text'])
['John', 'works for', 'IBM', '.']

Using this external property from word['ids'] or word['text'] looks dangerous because the sensor cannot make any assumption about which concept's property can be used.
(WIP)

When instantiating the example, even with one relation, it becomes hard to generalize over all the relations.
contains tends to have M being the parent and N being the children, and only the parent's own children count.
equal needs M to be all datanodes of the source and N to be all datanodes of the destination.
is_a needs no mapping, so M x ... is enough.
has_a only happens in the scenario where multiple has_a relations are used together.
Backward cases are all different again.
Should we define different rules of interpretation for different relations?

I am thinking maybe we should separate the relation association and property propagation.
Currently, we are kind of mixing them.

For example, in the tokenizer, a 1 x 5 x 300 feature matrix means 1 sentence contains 5 words, and each 300-dimensional vector is associated with one word. But the phrase-to-word tokenizer is more complicated: it is 4x2x300, where 4 means we have 4 phrases and 2 means there are at most 2 words in each phrase (and thus we need to handle padding).
If we want to transfer additional properties from phrases to words, we need to generate another 4x2x100 whose mask/padding must match the first tokenizer.


How about generating the relation association/mapping (1x5) and calculating the features for words (5x300) separately?
The phrase-to-word case would then be a 4x5 mapping matrix and a 5x300 feature matrix.
If we want to transfer other properties from phrases to words, the 4x5 mapping can be used repeatedly (instead of being generated repeatedly).

Proposal for separation of relation mapping and feature propagating:

  • For relation mapping:
    I want to use the 'match' property mentioned above as a general way to maintain the mapping, but store it on the edge/relation.
    It will always be a big NxM matrix with {0,1} values.
    Or we can make specifications for 1-to-1 (vector), 1-to-many (vector), many-to-1 (vector), and many-to-many (matrix) relations. This is different from distinguishing the semantics of has_a and is_a, which would lead to coding for each relation type.
  • For feature propagating:
    All the properties should just follow N x ... or a list of N elements, where N is the number of datanodes of the concept.

For example, a phrase-to-word tokenizer in the above example on the edge pcw should return a mapping matrix:

tensor([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]]) # shape = (5,4)

or 1-to-many vector:

tensor([0, 1, 1, 2, 3])  # shape = (5,)
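
(A small sketch of the equivalence between the two forms, assuming PyTorch; one_hot converts the index vector into the {0,1} mapping matrix.)

import torch
import torch.nn.functional as F

word_to_phrase = torch.tensor([0, 1, 1, 2, 3])      # the 1-to-many vector, shape (5,)
mapping = F.one_hot(word_to_phrase, num_classes=4)  # the (5,4) {0,1} mapping matrix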

And the features are collected separately. Here the features should go to word; the ids feature will just be

tensor([48, 97, 72, 9, 83])  # shape = (5,)

with which we don't need to care about the source (phrase), and just respect the rule for word.

The question then is where to store the mapping matrix or vector, how to write it in the model declaration, and how this mapping can help transfer or transform the features automatically.
Example:

phrase[pcw] = DummySensor(relation=pcw)  # resulting in the above (5,4) matrix
# assume phrase['emb'] = 
tensor([[0.73, -1.02, ..., 0.23],
              [0.85, 0.94, ..., -0.72],
              [-1.9, 0.24, ..., 0.02],
              [0.38, -1.40, ..., 0.01]])  # shape = (4,100)
word['emb'] = DummySensor(phrase['emb'], relation=pcw.forward)

The sensor can detect that it is using an external property from a concept for which a mapping relation is provided.
It can then apply the aggregation automatically by matrix multiplication:

tensor([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]]).matmul(
tensor([[0.73, -1.02, ..., 0.23],
              [0.85, 0.94, ..., -0.72],
              [-1.9, 0.24, ..., 0.02],
              [0.38, -1.40, ..., 0.01]])
) = 
tensor([[0.73, -1.02, ..., 0.23],
              [0.85, 0.94, ..., -0.72],
              [0.85, 0.94, ..., -0.72],
              [-1.9, 0.24, ..., 0.02],
              [0.38, -1.40, ..., 0.01]])
#  (5,4) x (4,100) = (5,100)

With the 1-to-many vector containing the indices, there is an equivalent implementation with torch.scatter().
Matrix multiplication represents the relation mapping in general. We might allow customizing the reduction (instead of the summation in matrix multiplication) to make it more flexible. A sketch of the index-based equivalents follows.
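
(A minimal sketch, assuming PyTorch; index_add_ is used here as the scatter-style reverse direction, and all names are hypothetical.)

import torch

word_to_phrase = torch.tensor([0, 1, 1, 2, 3])  # word i belongs to phrase word_to_phrase[i]
phrase_emb = torch.randn(4, 100)

# Broadcast phrase features down to words: same result as the (5,4) x (4,100) matmul.
word_emb = phrase_emb[word_to_phrase]  # shape (5,100)

# Reverse direction with summation, same as the transposed matmul:
phrase_sum = torch.zeros(4, 100).index_add_(0, word_to_phrase, word_emb)  # shape (4,100)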

For the case of using pcw.backward to get the phrase representation from the word representation, it works naturally.
For the case of the pair with two has_a relations, we can extend this to the Cartesian product.

Following the above data example, a phrase tokenizer will be like the following:

Given

(pcw,) = phrase.contains(word)

phrase['text'] = DummySensor1()
['John', 'works for', 'IBM', '.']

phrase['emb'] = DummySensor2()
tensor([[0.73, -1.02, ..., 0.23],
        [0.85, 0.94, ..., -0.72],
        [-1.9, 0.24, ..., 0.02],
        [0.38, -1.40, ..., 0.01]])  # shape = (4,100)

Then the tokenizer

word[pcw, 'text', 'ids'] = DummySensor3(phrase['text'])
# pcw = 
tensor([[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]])  # shape = (5, 4)
# text =
['John', 'works', 'for', 'IBM', '.']
# ids =
tensor([48, 97, 72, 9, 83])  # shape = (5,)

Transforming other features will be

word['emb_p'] = DummySensor4(pcw(phrase['emb']))
# internally phrase['emb'] is transformed by pcw, and the sensor will just copy the value, or do the customized reduction
# emb =
tensor([[0.73, -1.02, ..., 0.23],
        [0.85, 0.94, ..., -0.72],
        [0.85, 0.94, ..., -0.72],
        [-1.9, 0.24, ..., 0.02],
        [0.38, -1.40, ..., 0.01]])

New proposal: multiple-relation case

Given

pa1, pa2 = pair.has_a(arg1=word, arg2=word)

word['text'] = DummySensor20()
['John', 'works', 'for', 'IBM', '.']

word['emb'] = DummySensor21()
tensor([[0.63, 1.12, ..., -0.83],
        [0.05, -0.94, ..., 2.72],
        [0.91, 0.24, ..., 0.12],
        [0.84, -0.22, ..., -0.72],
        [0.08, 1.10, ..., 0.01]])  # shape = (5,100)
pair[pa1.backward, pa2.backward] = DummySensor22(word['text'])
# for example, the sensor will filter out self-connected pairs of words
# pa1.backward
tensor([[1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        ...,
        [0, 0, 1, 0, 0],
        ...,
        [0, 0, 0, 1, 0],
        ...,
        [0, 0, 0, 0, 1],
        ...])  # shape = (20,5)
# pa2.backward
tensor([[0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        ...])  # shape = (20,5)
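
(A hypothetical sketch of how these two mapping matrices could be built: enumerate all word pairs, drop self-pairs, and one-hot each argument. With this ordering, pairs[6] == (1, 3), matching the 7th-row example below.)

import torch

n_words = 5
pairs = [(i, j) for i in range(n_words) for j in range(n_words) if i != j]  # 20 pairs

pa1_backward = torch.zeros(len(pairs), n_words)  # shape (20,5)
pa2_backward = torch.zeros(len(pairs), n_words)  # shape (20,5)
for k, (i, j) in enumerate(pairs):
    pa1_backward[k, i] = 1  # row k marks arg1 of the k-th pair
    pa2_backward[k, j] = 1  # row k marks arg2 of the k-th pair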

Internally, to identify, for example, the 7th pair's arguments, it is as simple as getting the 7th row of pa1.backward

        [0, 1, 0, 0, 0]  # indicating the second word

and the 7th row of pa2.backward

        [0, 0, 0, 1, 0]  #  indicating the fourth word

Automatic transformation of other features also comes naturally:

pair['emb_w'] = DummySensor(pa1.backward(word['emb']), pa2.backward(word['emb']))
# to get something (20,200)

Internally, the first argument is calculated by

# pa1.backward x word['emb']
tensor([[1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        ...,
        [0, 0, 1, 0, 0],
        ...,
        [0, 0, 0, 1, 0],
        ...,
        [0, 0, 0, 0, 1],
        ...]).matmul(
tensor([[0.63, 1.12, ..., -0.83],
        [0.05, -0.94, ..., 2.72],
        [0.91, 0.24, ..., 0.12],
        [0.84, -0.22, ..., -0.72],
        [0.08, 1.10, ..., 0.01]])) =
tensor([[0.63, 1.12, ..., -0.83],
        [0.63, 1.12, ..., -0.83],
        [0.63, 1.12, ..., -0.83],
        [0.63, 1.12, ..., -0.83],
        [0.05, -0.94, ..., 2.72],
        ...,
        [0.91, 0.24, ..., 0.12],
        ...,
        [0.84, -0.22, ..., -0.72],
        ...,
        [0.08, 1.10, ..., 0.01],
        ...])
# (20,5) x (5,100) = (20,100)

It will select and collect each identified word's embedding. The second argument is calculated by pa2.backward x word['emb'] = (20,5) x (5,100) = (20,100).
Then the user just needs to apply a concatenation, or whatever op merges the two embeddings, to calculate the pair's embedding; for example, concatenating the two (20,100) matrices into (20,200), as in the sketch below.
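
(A minimal sketch of this merge step, assuming PyTorch; the two mapping matrices here are random stand-ins for the one-hot pa1.backward/pa2.backward above.)

import torch

pa1_backward = torch.randint(0, 2, (20, 5)).float()  # stand-in mapping, shape (20,5)
pa2_backward = torch.randint(0, 2, (20, 5)).float()
word_emb = torch.randn(5, 100)

arg1_emb = pa1_backward.matmul(word_emb)            # (20,100)
arg2_emb = pa2_backward.matmul(word_emb)            # (20,100)
pair_emb = torch.cat([arg1_emb, arg2_emb], dim=-1)  # (20,200)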

Having pa2.backward, we automatically have pa2 as the transpose of the matrix.
So we can also transform pairs' features to words with pa2.

Here, for example, suppose we have bits that somehow encode a pair's type.
Given

pair['type'] = DummySensor30()
tensor([[0, 0, 0],
        [0, 0, 1],
        ...,
        [0, 1, 1]])  # shape = (20, 3)
word['type_pa2'] = DummySensor31(pa2(pair['type']))
# internally, pa2 x pair['type'] =
tensor([[0, 0, 0, 0, 1, 0, ...],
        [1, 0, 0, 0, 0, 0, ...],
        [0, 1, 0, 0, 0, 1, ...],
        [0, 0, 1, 0, 0, 0, ...],
        [0, 0, 0, 1, 0, 0, ...]])  # shape = (5,20)
.matmul(
tensor([[0, 0, 0],
        [0, 0, 1],
        ...,
        [0, 1, 1]])
) = 
tensor([[4, 0, 3],
        [1, 3, 4],
        [3, 4, 2],
        [4, 2, 0],
        [4, 1, 3]])  # shape = (5,3)

This internal result notes how many times each word participates as arg2 in pairs of each type (since summation is the reduction function in matrix multiplication).
If we don't like the summation, we can allow customizing the reduction with some PyTorch tricks (see the masking sketch after the interface examples below). In the worst case, we can do it with a for-loop-implemented op function.
The interface will be like

word['type_pa2'] = DummySensor41(pa2(pair['type'], fn=max))
tensor([[1, 0, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 0],
        [1, 1, 1]])  # shape = (5,3)

or even keep the dimension

word['type_pa2'] = DummySensor41(pa2(pair['type'], fn=torch.stack))
# some {0,1} tensor, shape = (5,3,20)
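
(A hypothetical sketch of the max reduction via masking, assuming PyTorch; it relies on the type values being non-negative {0,1} bits, and pa2/pair_type are random stand-ins.)

import torch

pa2 = torch.randint(0, 2, (5, 20)).float()        # word-by-pair mapping, shape (5,20)
pair_type = torch.randint(0, 2, (20, 3)).float()  # per-pair type bits, shape (20,3)

# Keep the pair dimension (cf. fn=torch.stack), then reduce it with max.
stacked = pa2.unsqueeze(-1) * pair_type.unsqueeze(0)  # shape (5,20,3)
word_type = stacked.amax(dim=1)                       # shape (5,3)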

Another interesting potential usage is a nested query like

sentence['word_participant'] = DummySensor50(scw.backward(pa2(pair['type'], fn=max), fn=sum))
# internally it gives (1,5) x (5,20) x (20,3) = (1,3)

test_regr/examples/conll04 is updated based on these semantics in c8ddaba.
@auszok please update the datanode builder to support this example.
If other tests yield errors because of the change, let me know so I can also change those tests.

Implemented by 238610f