benandow / PrivacyPolicyAnalysis

This repository holds the code for PolicyLint and PoliCheck, which identify internal contradictions within privacy policies and analyze the consistency between data flows and privacy policies.

Using GraphMatcher with directed graphs is wrong

cuihaoleo opened this issue

I find that the output of PoliCheck is not consistent across different Python versions, and sometimes not even across different runs. Many things seem to depend on the internal ordering of unordered structures (e.g. set/dict/nx.DiGraph).
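One way to take insertion order out of the picture while debugging is to rebuild each graph in a fixed order before comparing runs. The sketch below is only an illustration: `canonical_copy` is a hypothetical helper, not part of PoliCheck, and it assumes node labels can be ordered by their string form. It does not fix the matcher problem described below; it only removes one source of run-to-run variation.

```python
import networkx as nx


def canonical_copy(graph):
    """Return a DiGraph with the same content, inserted in sorted order.

    Hypothetical helper for debugging; sorting by str() is an assumption
    that suits string-labelled nodes like the ones in this issue.
    """
    ordered = nx.DiGraph()
    for node in sorted(graph.nodes, key=str):
        ordered.add_node(node, **graph.nodes[node])
    for u, v in sorted(graph.edges, key=lambda e: (str(e[0]), str(e[1]))):
        ordered.add_edge(u, v, **graph.edges[u, v])
    return ordered


# Two graphs with identical content but different insertion order.
first = nx.DiGraph()
first.add_edge("collect", "We", dep="nsubj")
first.add_edge("collect", "data", dep="dobj")

second = nx.DiGraph()
second.add_edge("collect", "data", dep="dobj")
second.add_edge("collect", "We", dep="nsubj")

first_canonical = canonical_copy(first)
second_canonical = canonical_copy(second)
```

After this step, iterating over nodes or edges of either copy visits them in the same order, so any remaining difference between runs comes from somewhere else.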

One particular bug I found is that PatternExtractionNotebook.py uses GraphMatcher to test for isomorphism between depGraph and the pre-defined patterns. Both depGraph and p are DiGraph instances:

GM = nx.algorithms.isomorphism.GraphMatcher(depGraph, p, node_match=GraphCompare.nmatchCallback, edge_match=GraphCompare.ematchCallback)

According to the networkx documentation, DiGraphMatcher should be used for directed graphs instead of GraphMatcher. Using GraphMatcher with a DiGraph is incorrect, and I believe the behaviour is not what you expected: it can report extra subgraph isomorphisms that should not exist. How wrong the result is depends on the internal storage order of the DiGraph, which is not predictable.
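To illustrate why direction matters, here is a minimal sketch using toy graphs rather than the real PoliCheck data. The node names and the `dep` edge attribute are made up for this example, and `categorical_edge_match` stands in for the project's own `GraphCompare` callbacks. The undirected comparison is done on explicit `to_undirected()` copies, so it shows what is lost when edge direction is ignored; it is not a reproduction of what GraphMatcher does when handed a DiGraph directly.

```python
import networkx as nx
from networkx.algorithms import isomorphism as iso

# Toy dependency graph: the verb points at its subject and object.
dep_graph = nx.DiGraph()
dep_graph.add_edge("collect", "We", dep="nsubj")
dep_graph.add_edge("collect", "data", dep="dobj")

# Pattern with the same shape and the same edge directions.
pattern = nx.DiGraph()
pattern.add_edge("obtain", "We", dep="nsubj")
pattern.add_edge("obtain", "information", dep="dobj")

# Same labels, but every edge points the other way.
reversed_pattern = nx.DiGraph()
reversed_pattern.add_edge("We", "obtain", dep="nsubj")
reversed_pattern.add_edge("information", "obtain", dep="dobj")

# Edges match only when their "dep" labels are equal.
edge_match = iso.categorical_edge_match("dep", None)


def directed_match(graph, pat):
    """Subgraph isomorphism test that respects edge direction."""
    matcher = iso.DiGraphMatcher(graph, pat, edge_match=edge_match)
    return matcher.subgraph_is_isomorphic()


def undirected_match(graph, pat):
    """The same test after discarding edge direction."""
    matcher = iso.GraphMatcher(
        graph.to_undirected(), pat.to_undirected(), edge_match=edge_match
    )
    return matcher.subgraph_is_isomorphic()


same_direction = directed_match(dep_graph, pattern)
opposite_direction = directed_match(dep_graph, reversed_pattern)
direction_ignored = undirected_match(dep_graph, reversed_pattern)
```

Here `same_direction` is True and `opposite_direction` is False, while `direction_ignored` is True: once direction is dropped, the reversed pattern matches even though no directed match exists. For the call in PatternExtractionNotebook.py, the likely fix is to replace `GraphMatcher` with `DiGraphMatcher` and keep the existing callbacks, assuming they do not rely on undirected behaviour.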

See https://gist.github.com/cuihaoleo/edc94df39ad53fac11ea07a24bf7e548 for a minimal reproducible example. pattern_1.gml and pattern_2.gml are essentially the same DiGraph with different internal node IDs. They are real patterns I got from PatternDiscover.train() after running PoliCheck on the same data multiple times. There should be no subgraph isomorphism, because no edge in depgraph is an xcomp edge.

Depending on your Python and networkx versions, you might see different results when running the minimal reproducible example. Here is what I got:

Python 3 + networkx 2.5.1:

Are pattern1 and pattern2 isomorphic? True
GraphMatcher(depgraph, pattern1):
GraphMatcher(depgraph, pattern2):
{'(1, measure, <AnnotationType.NONE: 0>)': '(2, choose, <AnnotationType.NONE: 0>)', '(0, We, <AnnotationType.ENTITY: 5>)': '(0, We, <AnnotationType.ENTITY: 5>)', '(3, collect, <AnnotationType.COLLECT_VERB: 3>)': '(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', '(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': '(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}

Python 2.7.18 + networkx 2.2:

Are pattern1 and pattern2 isomorphic? True
GraphMatcher(depgraph, pattern1):
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(0, We, <AnnotationType.ENTITY: 5>)'}
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(0, We, <AnnotationType.ENTITY: 5>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}
GraphMatcher(depgraph, pattern2):
{u'(0, We, <AnnotationType.ENTITY: 5>)': u'(0, We, <AnnotationType.ENTITY: 5>)', u'(3, collect, <AnnotationType.COLLECT_VERB: 3>)': u'(4, obtain, <AnnotationType.COLLECT_VERB: 3>)', u'(1, measure, <AnnotationType.NONE: 0>)': u'(2, choose, <AnnotationType.NONE: 0>)', u'(6, your usage data, <AnnotationType.DATA_OBJ: 1>)': u'(6, personal information, <AnnotationType.DATA_OBJ: 1>)'}