Using GraphMatcher with directed graphs is wrong
cuihaoleo opened this issue · comments
I find that PoliCheck's output is not consistent across Python versions, and sometimes not even across runs. Many things seem to depend on the internal ordering of unordered structures (e.g. set/dict/nx.DiGraph).
One particular bug I found is that PatternExtractionNotebook.py uses GraphMatcher to test isomorphism between depGraph and the pre-defined patterns. Both depGraph and p are DiGraph instances.
According to the networkx manual, DiGraphMatcher should be used with directed graphs instead of GraphMatcher. Using GraphMatcher with a DiGraph is basically wrong, and I believe the behaviour is not what you expected: it finds extra subgraph isomorphisms that should not be there. How wrong it goes depends on the internal storage of the DiGraph, which is not predictable.
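To illustrate why edge direction matters here, below is a minimal sketch with toy graphs (the node names are made up and are not from PoliCheck). It does not call GraphMatcher on a DiGraph directly, since that behaviour is version-dependent; instead it compares DiGraphMatcher on the directed graphs with GraphMatcher on undirected copies, which is only an approximation of "ignoring direction".

```python
import networkx as nx
from networkx.algorithms import isomorphism as iso

# Toy host graph: two edges pointing INTO "b" (a -> b <- c).
host = nx.DiGraph([("a", "b"), ("c", "b")])

# Toy pattern: a directed chain x -> y -> z.
pattern = nx.DiGraph([("x", "y"), ("y", "z")])

# Direction-aware matching: the chain does not occur in the host,
# because no pattern node has two incoming edges.
directed_matches = list(
    iso.DiGraphMatcher(host, pattern).subgraph_isomorphisms_iter()
)

# Direction-blind matching on undirected copies: both graphs are
# plain 3-node paths, so matches appear.
undirected_matches = list(
    iso.GraphMatcher(
        host.to_undirected(), pattern.to_undirected()
    ).subgraph_isomorphisms_iter()
)

print("directed matches:", directed_matches)
print("undirected matches:", undirected_matches)
```

The directed matcher reports nothing, while the undirected one does, which is the same kind of spurious match described above.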
Check https://gist.github.com/cuihaoleo/edc94df39ad53fac11ea07a24bf7e548 for a minimal reproducible example. pattern_1.gml and pattern_2.gml are basically the same DiGraph with different internal node IDs. They are real ones I got from PatternDiscover.train() after running PoliCheck on the same data multiple times. There should be no subgraph isomorphism because no edge in depgraph is xcomp.
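The expectation that an xcomp pattern cannot match a graph without xcomp edges can be sketched as follows. This is a toy example, not the graphs from the gist: the node names and the "label" edge attribute are assumptions made for illustration, and PoliCheck may store the dependency relation under a different key.

```python
import networkx as nx
from networkx.algorithms import isomorphism as iso

# Toy dependency graph with no "xcomp" edge.
# The "label" attribute name is hypothetical.
depgraph = nx.DiGraph()
depgraph.add_edge("collect", "We", label="nsubj")
depgraph.add_edge("collect", "data", label="dobj")

# Toy pattern consisting of a single "xcomp" edge.
pattern = nx.DiGraph()
pattern.add_edge("verb", "child", label="xcomp")

# Edges are compatible only if their "label" attributes are equal.
same_label = iso.categorical_edge_match("label", None)

# Structure only: each single edge of depgraph matches the pattern.
structural = list(
    iso.DiGraphMatcher(depgraph, pattern).subgraph_isomorphisms_iter()
)

# Structure plus edge labels: nothing matches, since no edge is "xcomp".
labelled = list(
    iso.DiGraphMatcher(
        depgraph, pattern, edge_match=same_label
    ).subgraph_isomorphisms_iter()
)

print("structural matches:", structural)
print("label-aware matches:", labelled)
```

With DiGraphMatcher and an edge-label check, the match list stays empty regardless of node ordering.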
Depending on your Python and networkx versions, you might see different results when running the minimal reproducible example. Here is what I got:
Python 3 + networkx 2.5.1:
Are pattern1 and pattern2 isomorphic? True
GraphMatcher(depgraph, pattern1):
GraphMatcher(depgraph, pattern2):
{'(1, measure, <AnnotationType.NONE: 0>)': '(2, choose, <AnnotationType.NONE: 0>)', '(0, We, <AnnotationType.ENTITY: 5>)': '(0, We, <AnnotationType.ENTITY: 5>)', '(3, collect, <AnnotationType.COLLECT_VERB: 3>)': '(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', '(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': '(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}
Python 2.7.18 + networkx 2.2:
Are pattern1 and pattern2 isomorphic? True
GraphMatcher(depgraph, pattern1):
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(0, We, <AnnotationType.ENTITY: 5>)'}
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(0, We, <AnnotationType.ENTITY: 5>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}
GraphMatcher(depgraph, pattern2):
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(0, We, <AnnotationType.ENTITY: 5>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}