Proposal of semantics of sensor/EdgeSensor return values
guoquan opened this issue · comments
Here is a wrap-up of today's discussion about the dimensions of sensor return values, and also an additional proposal for the compositional relation case (for example, `pair`).
- In general, the sensor returns a list or a tensor whose:
  - length or first dimension matches the number of datanodes of this concept;
  - and all the other dimensions, or the items in the list, are associated with the datanode at this index.
- If the sensor is an `EdgeSensor` (or an instance of its subclass), and assuming there is only one relation involved, the `relation` attribute of this sensor indicates which relation (`contains`, `equal`, `is_a`, `has_a`) is considered in the returned value. In `'forward'` mode, the source and destination of the sensor are the same as the source and destination of the relation. In `'backward'` mode, they are reversed. Then:
  - the first dimension of the returned tensor, or the first level of the returned list, matches the number of datanodes of the source of the sensor (which is the destination of the relation in `'backward'` mode);
  - the second dimension of the tensor, or the nested list of the return value, indicates the datanodes of the current concept that are in this relationship with the source concept (discussion: in the tensor case, padding may be needed);
  - and the other dimensions, or the items in the nested list, are associated with the datanode.
- (Proposal) If multiple relations are considered in one `EdgeSensor`, indicated by a tuple of N relations in the `relation` attribute (for example, two `has_a` relations used together to generate a `pair`), then:
  - the first N dimensions are associated with the source concepts indicated by the N relations, following the order of the relations;
  - the (N+1)-th dimension is associated with the current concept, and indicates the datanodes of the current concept that are in all these relationships (discussion: in the `has_a` case, this dimension is just 1. I haven't come up with an example where this dimension is helpful, but it makes the semantics more consistent with the second case above);
  - and the other dimensions are associated with the datanode.
It should be noted that with the current interpretation of case 2, the second dimension is not aligned with the number of datanodes of the concept. For example, for "John works for IBM.", considering that the sentence contains words, the value will be 1x5, which looks fine. But considering that words contain characters, it will be 5x5, where `works` has the most characters (5) and the other words need padding.
Another way is to take the second dimension as the number of all the current datanodes, which results in a 5x16 matrix since there are 16 characters. This has a benefit: if we want to collect information for the words from the characters, and we have features for the characters, which should be 16xM according to case 1, the collection is naturally and easily done by a matrix multiplication, reducing to 5xM.
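The character-to-word collection described above can be sketched with a single matrix multiplication; the sentence and word lengths come from the example, while the character features here are random placeholders:

```python
import torch

# "John works for IBM ." has 5 words and 16 characters (whitespace dropped).
# word_char is the proposed 5 x 16 {0,1} matrix: row i marks the characters
# that belong to word i.
word_lengths = [4, 5, 3, 3, 1]  # John, works, for, IBM, .
n_words, n_chars, feat_dim = 5, 16, 8

word_char = torch.zeros(n_words, n_chars)
start = 0
for i, length in enumerate(word_lengths):
    word_char[i, start:start + length] = 1
    start += length

# 16 x M character features (case 1 semantics for the character concept).
char_feat = torch.randn(n_chars, feat_dim)

# Collecting word information from character features is one matmul,
# reducing 16 x M to 5 x M.
word_feat = word_char @ char_feat
```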
In the above description for multiple relations, do you mean multiple relations between same type of node? Can you clarify the proposal using an example? WorksFor (John, IBM), Owner(John, IBM)? It would be helpful if we have a concrete case with our latest representation in the graph and assigned sensors here, then discuss the same instances with multiple relations.
Hi,
Sorry for the confusion. Multiple relations in a sensor means we want to consider multiple relations when calculating one property of a concept. For example, to calculate a property of a pair (John, IBM), I might need the two relations `arg1` and `arg2`, which are defined by `arg1, arg2 = pair.has_a(word, word)`.
I will add a series of examples for each of the cases here.
It would be great to add an example, thanks.
This is not clear to me yet, also why you call this multiple relations, do you mean multiple outgoing edges?
(discussion)
Example:
Graph
word = Concept()
phrase = Concept()
sentence = Concept()
(pcw,) = phrase.contains(word)
(scw,) = sentence.contains(word)
(scp,) = sentence.contains(phrase)
pair = Concept()
(pa1, pa2) = pair.has_a(arg1=word, arg2=word)
people = phrase() # IS_A relation will be generated
prp = people.relate_to(phrase)[0]
Data
reader = [{'text': 'John works for IBM .'}]
1. Sensor with 1 concept, no relation:
- N x ... `Tensor`,
- or `list` of N elements,
- where N is the number of the concept's datanodes
- each ... of the `Tensor`, or element in the `list`, will be associated with a datanode
Example 1.1 `sentence`
Based on example data, only one sentence is involved. The sensor should return a tensor of 1 x ...:
sentence['ids'] = DummySensor()
tensor([[48, 97, 72, 9, 83]]) # shape = (1,5)
or a list of 1 element
sentence['text'] = DummySensor()
['John works for IBM .']
Example 1.2 `word`
Dummy sensor setting. Assume we assign to words directly. Tokenizer will be introduced later.
In `Tensor`:
word['ids'] = DummySensor()
tensor([48, 97, 72, 9, 83]) # shape = (5,)
or `list`:
word['text'] = DummySensor()
['John', 'works', 'for', 'IBM', '.']
2. Sensor with 1 concept, 1 relation:
- M x N x ... `Tensor`,
- or `list` of M `list`s, each of which has N_m elements
- where M is the number of datanodes of the source of the sensor and N is that of the destination
Example 2.1 `(scw,) = sentence.contains(word)`, forward
# scw.src = sentence
# scw.dst = word
# mode=forward
# sensor.src = sentence
# sensor.dst = word
This is the trivial tokenizer setting.
In `Tensor`:
word['ids'] = DummySensor(relation=scw, mode='forward')
tensor([[48, 97, 72, 9, 83]]) # shape = (1,5)
or `list`:
word['text'] = DummySensor(relation=scw, mode='forward')
[['John', 'works', 'for', 'IBM', '.']]
Example 2.2 `(pcw,) = phrase.contains(word)`, forward
# pcw.src = phrase
# pcw.dst = word
# mode=forward
# sensor.src = phrase
# sensor.dst = word
This is tokenizer over multiple instances.
In `Tensor`:
word['ids'] = DummySensor('ids', relation=pcw, mode='forward')
tensor([[48, 0],
[97, 72],
[9, 0],
[83, 0]]) # shape = (4,2)
# 0 is padding value
or `list`:
word['text'] = DummySensor('text', relation=pcw, mode='forward')
[['John'],
['works', 'for'],
['IBM'],
['.']]
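The padded tensor and the nested list above carry the same information; a minimal sketch of padding the nested list into the M x N tensor form, using the example's ids and 0 as the padding value (the `pad_nested` helper is hypothetical, not an existing API):

```python
import torch

# ids per phrase from the example: John | works for | IBM | .
nested_ids = [[48], [97, 72], [9], [83]]

def pad_nested(nested, pad_value=0):
    # Pad each inner list to the length of the longest one.
    width = max(len(row) for row in nested)
    return torch.tensor([row + [pad_value] * (width - len(row))
                         for row in nested])

padded = pad_nested(nested_ids)  # shape (4, 2)
```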
Example 2.3 `(pcw,) = phrase.contains(word)`, backward
# pcw.src = phrase
# pcw.dst = word
# mode=backward
# sensor.src = word
# sensor.dst = phrase
This is the case when using BIO tagging and creating the phrases from words.
In `Tensor`:
phrase['ids'] = DummySensor('ids', relation=pcw, mode='backward')
tensor([[48, 0, 0, 0],
[0, 97, 0, 0],
[0, 72, 0, 0],
[0, 0, 9, 0],
[0, 0, 0, 83]]) # shape = (5, 4)
# 0 is padding value
or `list`:
phrase['text'] = DummySensor('text', relation=pcw, mode='backward')
# 'text'
[['John', None, None, None],
[None, 'works', None, None],
[None, 'for', None, None],
[None, None, 'IBM', None],
[None, None, None, '.']]
# or the equivalent and more elegant dict
[{0: 'John'},
 {1: 'works'},
 {1: 'for'},
 {2: 'IBM'},
 {3: '.'}]
(discussion)
However, what we expect for 'ids' is a 4x... tensor,
tensor([[48, 0],
[97, 72],
[9, 0],
[83, 0]]) # shape = (4,2)
and 'text' should be a list of 4 strings
['John', 'works for', 'IBM', '.']
To achieve this, we would need to use `phrase`, which is the source of the relation and the destination of the sensor, as the first dimension; but then no concept is connected to the second dimension, and there is nothing to encode the words' association with the phrase.
The following is what we do now (adjusted from the equality example), just for reference:
phrase['match'] = DummySensor(relation=pcw, mode='backward')
tensor([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]) # shape = (5,4)
Then other features will use this 'match', following the 1 concept, no relation paradigm, and using the external property from word.
phrase['ids'] = DummySensor('match', word['ids'])
tensor([[48, 0],
[97, 72],
[9, 0],
[83, 0]]) # shape = (4,2)
or `list`:
phrase['text'] = DummySensor('match', word['text'])
['John', 'works for', 'IBM', '.']
Using this external property from `word['ids']` or `word['text']` looks dangerous, because the sensor cannot make any assumptions about which concept's property can be used.
(WIP)
When instantiating the example, even with one relation, it becomes hard to generalize over all the relations.
- `contains` tends to have M being the parent and N being the children, and only the parent's children count.
- `equal` needs M being all datanodes of the source and N being all datanodes of the destination.
- `is_a` needs no mapping, so M x ... is enough.
- `has_a` only happens in the scenario where multiple `has_a` should be used together.
Backward cases are all different again.
Should we define different rules of interpretation for different relations?
I am thinking maybe we should separate the relation association and property propagation.
Currently, we are kind of mixing them.
For example, in the tokenizer, a 1 x 5 x 300 feature matrix means 1 sentence contains 5 words, and each 300-dimensional vector is associated with one word. But if we consider the phrase-to-word tokenizer, it gets complicated: it is 4x2x300, where 4 means we have 4 phrases and 2 means there are at most 2 words in each phrase (and thus we need to handle padding).
If we want to transfer additional properties from phrases to words, we need to generate another 4x2x100 where the mask/padding should match the first tokenizer.
How about if we generate the relation association/mapping 1x5 and calculate feature for words 5x300 separately?
And the phrases-to-words case will be a 4x5 mapping matrix and a 5x300 feature matrix.
If we want to transfer other properties from phrase to words, the 4x5 mapping can be used repeatedly (instead of being generated repeatedly).
Proposal for separation of relation mapping and feature propagating:
- For relation mapping: I want to use the 'match' property mentioned above as a general way to maintain the mapping, but store it on the edge/relation. It will always be a big NxM matrix with {0,1} values. Or we can make specializations for 1-to-1 (vector), 1-to-many (vector), many-to-1 (vector), and many-to-many (matrix) relations. This is different from distinguishing the semantics of `has_a` and `is_a`, which leads to coding for each relation type.
- For feature propagating: all the properties should just follow N x ..., or a list of N elements, where N is the number of datanodes of the concept.
For example, a phrase-to-word tokenizer in the above example on the edge `pcw` should return a mapping matrix:
tensor([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]) # shape = (5,4)
or 1-to-many vector:
tensor([0, 1, 1, 2, 3]) # shape = (5,)
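The matrix and vector forms above are interchangeable; a small sketch of converting between them (using PyTorch's standard one-hot helper, which is not part of the proposal itself):

```python
import torch
import torch.nn.functional as F

# 1-to-many index vector: word i belongs to phrase index[i].
index = torch.tensor([0, 1, 1, 2, 3])

# Expand to the (5, 4) {0,1} mapping matrix...
matrix = F.one_hot(index, num_classes=4).float()

# ...and recover the index vector from the matrix.
recovered = matrix.argmax(dim=1)
```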
And features are collected separately. Here the features should go to word; the `ids` feature will just be
tensor([48, 97, 72, 9, 83]) # shape = (5,)
with which we don't need to care about the source (being phrase), and just respect the rule of word.
The question then is where to store the mapping matrix or vector, how to write it in the model declaration, and how can this mapping help transferring or transforming the feature automatically.
Example:
phrase[pcw] = DummySensor(relation=pcw) # resulting in the above (5,4) matrix
# assume phrase['emb'] =
tensor([[0.73, -1.02, ..., 0.23],
[0.85, 0.94, ..., -0.72],
[-1.9, 0.24, ..., 0.02],
[0.38, -1.40, ..., 0.01]]) # shape = (4,100)
word['emb'] = DummySensor(phrase['emb'], relation=pcw.forward)
The sensor can detect that it is using an external property from a concept that has a mapping relation provided. It can apply an aggregation automatically by matrix multiplication:
tensor([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]).matmul(
tensor([[0.73, -1.02, ..., 0.23],
[0.85, 0.94, ..., -0.72],
[-1.9, 0.24, ..., 0.02],
[0.38, -1.40, ..., 0.01]])
) =
tensor([[0.73, -1.02, ..., 0.23],
[0.85, 0.94, ..., -0.72],
[0.85, 0.94, ..., -0.72],
[-1.9, 0.24, ..., 0.02],
[0.38, -1.40, ..., 0.01]])
# (5,4) x (4,100) = (5,100)
With the 1-to-many vector, which contains the indices, there is an equivalent implementation with `torch.scatter()`.
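As a sketch of that equivalence in this 1-to-many direction, plain index-based gathering gives the same result as the matrix multiplication (the embeddings here are random placeholders with the example's shapes):

```python
import torch
import torch.nn.functional as F

phrase_emb = torch.randn(4, 100)                   # 4 phrases, 100-dim features
index = torch.tensor([0, 1, 1, 2, 3])              # word -> phrase index vector
mapping = F.one_hot(index, num_classes=4).float()  # the (5, 4) mapping matrix

via_matmul = mapping @ phrase_emb  # (5, 4) x (4, 100) = (5, 100)
via_index = phrase_emb[index]      # same result by direct indexing
```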
Matrix multiplication generally represents the relation mapping. We might allow customizing the reduction (instead of the summation in matrix multiplication) to make it more flexible.
For the case of using `pcw.backward` to get the phrase representation from the word representation, it works naturally.
For the case of the pair with two has_a relations, we can extend to the Cartesian product.
Following the above data example, a phrase tokenizer will be like the following:
Given
(pcw,) = phrase.contains(word)
phrase['text'] = DummySensor1()
['John', 'works for', 'IBM', '.']
phrase['emb'] = DummySensor2()
tensor([[0.73, -1.02, ..., 0.23],
[0.85, 0.94, ..., -0.72],
[-1.9, 0.24, ..., 0.02],
[0.38, -1.40, ..., 0.01]]) # shape = (4,100)
Then the tokenizer
word[pcw, 'text', 'ids'] = DummySensor3(phrase['text'])
# pcw =
tensor([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]) # shape = (5, 4)
# text =
['John', 'works', 'for', 'IBM', '.']
# ids =
tensor([48, 97, 72, 9, 83]) # shape = (5,)
Transforming other features will be
word['emb_p']=DummySensor4(pcw(phrase['emb']))
# internally phrase['emb'] is transformed by pcw, and the sensor will just copy the value, or do the customized reduction
# emb =
tensor([[0.73, -1.02, ..., 0.23],
[0.85, 0.94, ..., -0.72],
[0.85, 0.94, ..., -0.72],
[-1.9, 0.24, ..., 0.02],
[0.38, -1.40, ..., 0.01]])
New proposal: the multiple-relation case
Given
pa1, pa2 = pair.has_a(arg1=word, arg2=word)
word['text'] = DummySensor20()
['John', 'works', 'for', 'IBM', '.']
word['emb'] = DummySensor21()
tensor([[0.63, 1.12, ..., -0.83],
[0.05, -0.94, ..., 2.72],
[0.91, 0.24, ..., 0.12],
[0.84, -0.22, ..., -0.72],
[0.08, 1.10, ..., 0.01]]) # shape = (5,100)
pair[pa1.backward, pa2.backward] = DummySensor22(word['text'])
# for example, the sensor will filter out self-connected pairs of words
# pa1.backward
tensor([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
...,
[0, 0, 1, 0, 0],
...,
[0, 0, 0, 1, 0],
...,
[0, 0, 0, 0, 1],
...]) # shape = (20,5)
# pa2.backward
tensor([[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
[1, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
...]) # shape = (20,5)
Internally, to identify, for example, the 7th pair's arguments, it is as simple as getting the 7th row from `pa1.backward`
[0, 1, 0, 0, 0] # indicating the second word
and the 7th row from `pa2.backward`
[0, 0, 0, 1, 0] # indicating the fourth word
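A sketch of how the two backward mapping matrices could be built as a Cartesian product with self-pairs filtered out (the row-major pair ordering is an assumption that matches the matrices shown above):

```python
import torch

n_words = 5
# All ordered word pairs excluding self-pairs: 5 * 4 = 20 pairs.
pairs = [(i, j) for i in range(n_words) for j in range(n_words) if i != j]

pa1_backward = torch.zeros(len(pairs), n_words)
pa2_backward = torch.zeros(len(pairs), n_words)
for k, (i, j) in enumerate(pairs):
    pa1_backward[k, i] = 1  # pair k's arg1 is word i
    pa2_backward[k, j] = 1  # pair k's arg2 is word j

# The 7th pair (row index 6) has the second word as arg1
# and the fourth word as arg2, matching the walkthrough.
row_arg1, row_arg2 = pa1_backward[6], pa2_backward[6]
```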
Automatic transformation of other features also comes naturally:
pair['emb_w'] = DummySensor(pa1.backward(word['emb']), pa2.backward(word['emb']))
# to get something (20,200)
Internally, the first argument is calculated by
# pa1.backward x word['emb']
tensor([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
...,
[0, 0, 1, 0, 0],
...,
[0, 0, 0, 1, 0],
...,
[0, 0, 0, 0, 1],
...]).matmul(
tensor([[0.63, 1.12, ..., -0.83],
[0.05, -0.94, ..., 2.72],
[0.91, 0.24, ..., 0.12],
[0.84, -0.22, ..., -0.72],
[0.08, 1.10, ..., 0.01]])) =
tensor([[0.63, 1.12, ..., -0.83],
[0.63, 1.12, ..., -0.83],
[0.63, 1.12, ..., -0.83],
[0.63, 1.12, ..., -0.83],
[0.05, -0.94, ..., 2.72],
...,
[0.91, 0.24, ..., 0.12],
...,
[0.84, -0.22, ..., -0.72],
...,
[0.08, 1.10, ..., 0.01],
...])
# (20,5) x (5x100) = (20,100)
It will select and collect the identified words' embeddings. The second argument is calculated by `pa2.backward x word['emb'] = (20,5) x (5x100) = (20,100)`.
Then the user just needs to apply a concatenation, or whatever op for merging the two embeddings, to calculate the pair's embedding; for example, concatenating the two (20,100) matrices into a (20,200) matrix.
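A minimal sketch of that merging step (the two argument embeddings are random placeholders standing in for the propagated (20,100) matrices):

```python
import torch

arg1_emb = torch.randn(20, 100)  # stands in for pa1.backward @ word['emb']
arg2_emb = torch.randn(20, 100)  # stands in for pa2.backward @ word['emb']

# Concatenate along the feature dimension to get the pair embedding.
pair_emb = torch.cat([arg1_emb, arg2_emb], dim=1)  # (20, 200)
```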
Having `pa2.backward`, we automatically have `pa2` as the transpose of the matrix. So we can also transform pairs' features to words with `pa2`.
Here, for example, assume we have bits that somehow encode a pair's type.
Given
pair['type'] = DummySensor30()
tensor([[0, 0, 0],
[0, 0, 1],
...,
[0, 1, 1]]) # shape = (20, 3)
word['type_pa2'] = DummySensor31(pa2(pair['type']))
# internally, pa2 x pair['type'] =
tensor([[0, 0, 0, 0, 1, 0, ...],
[1, 0, 0, 0, 0, 0, ...],
[0, 1, 0, 0, 0, 1, ...],
[0, 0, 1, 0, 0, 0, ...],
[0, 0, 0, 1, 0, 0, ...]]) # shape = (5,20)
.matmul(
tensor([[0, 0, 0],
[0, 0, 1],
...,
[0, 1, 1]])
) =
tensor([[4, 0, 3],
[1, 3, 6],
[5, 10, 2],
[4, 5, 0],
[9, 1, 7]]) # shape = (5,3)
This internal result notes how many times each word participates as arg2 in pairs of each type (based on having summation as the reduction function in the matrix multiplication).
If we don't like the summation, we can allow customizing the reduction with some PyTorch tricks. In the worst case, we can do this with a `for`-loop-implemented op function.
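A minimal sketch of such a customizable reduction, implemented with a plain loop as suggested (the `propagate` helper and its `fn` signature are assumptions, not an existing API; empty rows are not handled):

```python
import torch

def propagate(mapping, features, fn=lambda t: t.sum(dim=0)):
    # mapping: (N, M) {0,1} matrix; features: (M, D); returns (N, D).
    # fn reduces the selected features over dim 0; sum reproduces the matmul.
    rows = [fn(features[row.bool()]) for row in mapping]
    return torch.stack(rows)

mapping = torch.tensor([[1., 0.], [1., 1.]])
feats = torch.tensor([[1., 0.], [0., 1.]])

summed = propagate(mapping, feats)  # identical to mapping @ feats
maxed = propagate(mapping, feats, fn=lambda t: t.max(dim=0).values)
```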
The interface will be like
word['type_pa2'] = DummySensor41(pa2(phrase['type'], fn=max))
tensor([[1, 0, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 0],
[1, 1, 1]]) # shape = (5,3)
or even keep the dimension
word['type_pa2'] = DummySensor41(pa2(phrase['type'], fn=torch.stack))
# some {0,1}'s shape = (5,3,20)
Another interesting potential usage is a nested query like
sentence['word_participant'] = DummySensor50(scw.backward(pa2(phrase['type'], fn=max), fn=sum))
# internally it gives (1,5) x (5,20) x (20,3) = (1,3)
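A quick dimension check of that chain (the mappings here are placeholders with the right shapes, not the example's actual values):

```python
import torch

scw_backward = torch.ones(1, 5)                   # sentence <- word, (1, 5)
pa2 = torch.zeros(5, 20)                          # transpose of pa2.backward
pair_type = torch.randint(0, 2, (20, 3)).float()  # pair type bits, (20, 3)

# (1,5) x (5,20) x (20,3) = (1,3)
result = scw_backward @ (pa2 @ pair_type)
```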