HLR / DomiKnowS

Phrase Word relation in ACE Event example

hfaghihi15 opened this issue · comments

This issue is to discuss how we want to model the relationship between phrases and words in the ACE Event example.
I do not see any connection between them in the graph yet. @guoquan, could you please share how you planned to connect them?

Currently, the graph contains the following definitions:

token = Concept(name='token')
span_candidate = Concept(name='span_candidate')
span = span_candidate(name='span')
document = Concept(name='document')
span_candidate.has_a(start=token, end=token)
span.has_a(start=token, end=token)
span.contains(token)
document.contains(token)
document.contains(span)

After tokenizing the sentence and enriching the tokens with token['emb'], I apply a CandidateSensor to generate span_candidate:
def token_to_span_candidate(spans, start, end):
    length = end.instanceID - start.instanceID
    if length > 0 and length < 10:
        return True
    else:
        return False
span_candidate['index'] = CandidateSensor(forward=token_to_span_candidate)

This basically filters span_candidate instances by length.
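
As a rough illustration of what this length filter produces, here is a minimal sketch in plain PyTorch. It does not use the DomiKnowS sensor API; `span_candidate_mask` is a hypothetical helper, and token positions stand in for `instanceID`.

```python
import torch

def span_candidate_mask(num_tokens: int, max_length: int = 10) -> torch.Tensor:
    """Boolean [num_tokens, num_tokens] mask; entry (i, j) is True when
    token i may start and token j may end a candidate span."""
    positions = torch.arange(num_tokens)
    # length[i, j] = j - i, mirroring end.instanceID - start.instanceID
    length = positions.unsqueeze(0) - positions.unsqueeze(1)
    return (length > 0) & (length < max_length)

mask = span_candidate_mask(7)
print(mask.shape)        # torch.Size([7, 7])
print(int(mask.sum()))   # 21
```

For 7 tokens every (start, end) pair with end > start passes the filter, which gives the 7 x 7 index tensor and the 21 candidates discussed in this thread.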
Then I collect span_candidate['emb'] with:
def span_candidate_emb(token_emb, span_index):
    embs = cartesian_concat(token_emb, token_emb)
    span_index = span_index.rename(None)
    span_index = span_index.unsqueeze(-1).repeat(1, 1, embs.shape[-1])
    selected = embs.masked_select(span_index).view(-1, embs.shape[-1])
    return selected
span_candidate['emb'] = FunctionalSensor(token['emb'], span_candidate['index'], forward=span_candidate_emb)
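
To make the shapes concrete, the following is a sketch under assumptions: `cartesian_concat` is not shown in this thread, so `cartesian_concat_sketch` is a hypothetical stand-in that pairs every token embedding with every other one. The embedding size of 768 is inferred from the `Linear(768*2, 2)` classifier, and the embeddings are random placeholders.

```python
import torch

def cartesian_concat_sketch(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for cartesian_concat: pairs every row of `a`
    with every row of `b`, giving a [len(a), len(b), Da + Db] tensor."""
    a_exp = a.unsqueeze(1).expand(-1, b.shape[0], -1)   # [n, m, Da]
    b_exp = b.unsqueeze(0).expand(a.shape[0], -1, -1)   # [n, m, Db]
    return torch.cat((a_exp, b_exp), dim=-1)

num_tokens, dim = 7, 768
token_emb = torch.randn(num_tokens, dim)   # placeholder embeddings

# Same length filter as above: keep pairs with 0 < end - start < 10.
pos = torch.arange(num_tokens)
length = pos.unsqueeze(0) - pos.unsqueeze(1)
span_index = (length > 0) & (length < 10)

pair_emb = cartesian_concat_sketch(token_emb, token_emb)   # [7, 7, 1536]
# Boolean indexing over the first two dimensions keeps one row per selected
# pair, in the same order as masked_select(...).view(...) in the function above.
selected = pair_emb[span_index]                            # [21, 1536]
logits = torch.nn.Linear(dim * 2, 2)(selected)             # [21, 2]
print(pair_emb.shape, selected.shape, logits.shape)
```

The compact result has one row per candidate (21 rows), so the start/end grid structure is no longer visible in the tensor shape.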

Next, I classify each span_candidate as span (or not) with:
span_candidate[span] = ModuleLearner('emb', module=torch.nn.Linear(768*2, 2))

Then I generate spans from span_candidate:
def span_candidate_to_span(spans, span_candidate, _):
    span_candidate
    # filter based on span_candidate.getAttribute('<span>')
    return True
span['index'] = CandidateRelationSensor(span_candidate[span], relations=(span_is_span_candidate,), forward=span_candidate_to_span)

The problem I am facing now is that span_candidate.getAttribute('<span>') is not there.
span_candidate[span] is required later on, so the corresponding classifier should already have been triggered to run.

Actually, I can find builder['global/linguistic/span_candidate/<span>'], which means the classifier did run.
Maybe @auszok can look into why the attribute is not created in the datanode.

Here is the relevant part of datanode.log:

2020-08-31 01:32:48,000 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/index/candidatesensor,  value - <class 'torch.Tensor'>, shape torch.Size([7, 7])
2020-08-31 01:32:48,000 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,000 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,000 - INFO - dataNodeBuilder:__buildRelationLink - Processing relation link dataNode for span_candidate, found 0 existing dataNode of this type - provided value has length 7
2020-08-31 01:32:48,002 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/index,  value - <class 'torch.Tensor'>, shape torch.Size([7, 7])
2020-08-31 01:32:48,002 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,002 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,002 - INFO - dataNodeBuilder:__buildRelationLink - Processing relation link dataNode for span_candidate, found 21 existing dataNode of this type - provided value has length 7
2020-08-31 01:32:48,002 - INFO - dataNodeBuilder:__buildRelationLink - Updating attribute index in relation link dataNodes
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/emb/functionalsensor,  value - <class 'torch.Tensor'>, shape torch.Size([21, 1536])
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,005 - ERROR - dataNodeBuilder:__buildRelationLink - Wrong size of value for dimension 0; it is 21 not equal to the number of relation attributes 7 - abandon processing relation link dataNode value for span_candidate
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/emb,  value - <class 'torch.Tensor'>, shape torch.Size([21, 1536])
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,005 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,005 - ERROR - dataNodeBuilder:__buildRelationLink - Wrong size of value for dimension 0; it is 21 not equal to the number of relation attributes 7 - abandon processing relation link dataNode value for span_candidate
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/<span>/modulelearner-1,  value - <class 'torch.Tensor'>, shape torch.Size([21, 2])
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,006 - ERROR - dataNodeBuilder:__buildRelationLink - Wrong size of value for dimension 0; it is 21 not equal to the number of relation attributes 7 - abandon processing relation link dataNode value for span_candidate
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__setitem__ - key - global/linguistic/span_candidate/<span>,  value - <class 'torch.Tensor'>, shape torch.Size([21, 2])
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute start concept token
2020-08-31 01:32:48,006 - INFO - dataNodeBuilder:__buildRelationLink - Found 7 dataNodes of the attribute end concept token
2020-08-31 01:32:48,006 - ERROR - dataNodeBuilder:__buildRelationLink - Wrong size of value for dimension 0; it is 21 not equal to the number of relation attributes 7 - abandon processing relation link dataNode value for span_candidate

There are 7 tokens, so there should be 21 span_candidates (6+5+4+3+2+1=21).
Why is it trying to align span_candidates to 7?

It turns out that the builder expects a 7x7xN tensor for span_candidates (one entry per start/end token pair), not a 21xN one.
This is the dimension issue with nested relations in the current setting; see #175.
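
If the builder does expect a dense 7x7xN value, one possible workaround is to scatter the compact per-candidate rows back into a dense tensor using the index mask. This is only a sketch in plain PyTorch: `scatter_to_dense` is a hypothetical helper, the fill value for non-candidate pairs is an assumption, and nothing in this thread confirms that DomiKnowS handles it this way.

```python
import torch

def scatter_to_dense(values: torch.Tensor, mask: torch.Tensor, fill: float = 0.0) -> torch.Tensor:
    """Place one row of `values` at every True position of `mask`.

    values: [num_selected, N]; mask: boolean [n, n] with num_selected True entries.
    Returns a dense [n, n, N] tensor; unselected positions hold `fill`.
    """
    dense = values.new_full((*mask.shape, values.shape[-1]), fill)
    dense[mask] = values   # rows are assigned in row-major order of the mask
    return dense

pos = torch.arange(7)
mask = pos.unsqueeze(0) > pos.unsqueeze(1)   # 21 (start, end) pairs with end > start
scores = torch.randn(21, 2)                  # placeholder for the [21, 2] classifier output

dense_scores = scatter_to_dense(scores, mask)
print(dense_scores.shape)                    # torch.Size([7, 7, 2])
```

Indexing the dense tensor with the same mask (`dense_scores[mask]`) recovers the original 21 rows, so no information is lost in either direction.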

Now "span contains tokens" is used, and "span has tokens" has been removed from the graph to avoid nested relations (#175).
However, we are also considering using span_candidates and keeping "span_candidates has tokens".
We might also build some kind of "equality" (#158) between span (as a concept) and span_candidates (as a relation) and transfer information between them.