WittleWolfie / PyGram

An efficient approximation for tree edit-distance.

How to get information on differences between trees?

sashahart opened this issue

This is really just a request for information and please forgive me if I'm asking a stupid question!

I understand that PyGram's API is primarily oriented toward providing an edit distance between two trees (and it's impressively efficient for that). Now I am hoping to get something closer to a summary of changes, e.g. for shallow trees with branches [2, 3, 4, 5] vs. [2, 3, 6, 8, 5] I might expect [([2], [2]), ([3], [3]), ([4], [6, 8]), ([5], [5])]. A heuristic is fine. Can you give me a little clue on how you would approach this?

Hey,

Great to see interest in PyGram! Ever since working on it to help with a
friend's research project, I've been amazed that the PQ-Gram algorithm
isn't more widely used.

As far as your question is concerned, I'm not sure I can think of a good
solution based on PQ-Gram. The problem with using PQ-Gram (PyGram or any
other implementation) for something like this is that PQ-Gram never
actually compares the trees node by node: it builds a profile of pq-grams
for each tree independently and then compares the two profiles. So there
is no point in the computation where you can extract data that directly
relates the nodes of one tree to the nodes of the other.
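To make this concrete, here is a minimal, self-contained sketch of the pq-gram distance, written from the published pq-gram definition rather than taken from PyGram. The `Node` class, the `profile` and `pq_gram_distance` names, the root label "r", and the defaults p=2, q=3 are illustrative assumptions, not PyGram's API. Each tree is reduced on its own to a bag of label tuples, and the distance only counts how many tuples the two bags share, so the result is a single number with no record of which node matched which.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, List

NULL = "*"  # filler label used to pad the registers, as in the pq-gram definition


@dataclass
class Node:
    """Hypothetical minimal tree node: a label plus ordered children."""
    label: Any
    children: List["Node"] = field(default_factory=list)


def profile(root: Node, p: int = 2, q: int = 3) -> Counter:
    """Return the pq-gram profile of a tree as a bag (Counter) of label tuples."""
    bag: Counter = Counter()

    def visit(node: Node, anc: tuple) -> None:
        # Shift this node's label into the p-long ancestor register.
        anc = anc[1:] + (node.label,)
        sib = (NULL,) * q
        if not node.children:
            bag[anc + sib] += 1
            return
        for child in node.children:
            # Shift each child's label into the q-long sibling register.
            sib = sib[1:] + (child.label,)
            bag[anc + sib] += 1
            visit(child, anc)
        # Flush the sibling register with filler labels.
        for _ in range(q - 1):
            sib = sib[1:] + (NULL,)
            bag[anc + sib] += 1

    visit(root, (NULL,) * p)
    return bag


def pq_gram_distance(t1: Node, t2: Node, p: int = 2, q: int = 3) -> float:
    """1 - 2 * |P1 bag-intersect P2| / (|P1| + |P2|)."""
    p1, p2 = profile(t1, p, q), profile(t2, p, q)
    shared = sum((p1 & p2).values())  # Counter & Counter is bag intersection
    return 1.0 - 2.0 * shared / (sum(p1.values()) + sum(p2.values()))


old = Node("r", [Node(x) for x in (2, 3, 4, 5)])
new = Node("r", [Node(x) for x in (2, 3, 6, 8, 5)])
print(pq_gram_distance(old, new))  # about 0.4545 (5/11)
```

On the shallow trees from the question the two profiles contain 10 and 12 pq-grams, 6 of which are common to both, giving 5/11. Nothing in that computation says that 4 was replaced by 6 and 8.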

In a simple case like the one you described, a walk-through comparison of
the two trees could be done in linear time. To summarize the changes,
though, what you're actually looking for are the edit steps, and recovering
those gets much more complicated.
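For the flat, one-level case in the question, Python's standard `difflib.SequenceMatcher` can recover edit steps over a single list of child labels. This is a swapped-in technique (general sequence diffing, not PQ-Gram), `summarize_flat` is a hypothetical helper, and `difflib` does not promise a minimal edit script; it also says nothing about deeper levels of a tree.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def summarize_flat(old: list, new: list) -> List[Tuple[list, list]]:
    """Pair up a flat list of child labels using difflib's edit opcodes.

    Unchanged items are reported one pair per item; a changed run is
    reported as a single (old_run, new_run) pair, with [] on one side for
    a pure insertion or deletion.
    """
    pairs = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs.extend(([x], [x]) for x in old[i1:i2])
        else:  # "replace", "delete" or "insert"
            pairs.append((old[i1:i2], new[j1:j2]))
    return pairs


print(summarize_flat([2, 3, 4, 5], [2, 3, 6, 8, 5]))
# [([2], [2]), ([3], [3]), ([4], [6, 8]), ([5], [5])]
```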

Are you concerned with an absolute summary of changes, or simply with
determining the differences at branching points or leaf nodes? If you are
concerned with branching points/leaf nodes, you could start with a simple
greedy solution: traverse both trees together and save pairs of nodes.
When two nodes differ, mark those two branches as divergent and skip the
remainder of that branch, continuing until no matching nodes remain.
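A minimal sketch of that greedy walk, assuming children are paired purely by position (one possible reading of the description). `Node` and `greedy_pairs` are hypothetical names, and each result row is (status, path of child indices, old label, new label).

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node: a label plus ordered children."""
    label: Any
    children: List["Node"] = field(default_factory=list)


def greedy_pairs(a: Optional[Node], b: Optional[Node], path: tuple = ()) -> list:
    """Walk both trees in step, pairing children by position (an assumption)."""
    old = a.label if a is not None else None
    new = b.label if b is not None else None
    if a is None or b is None or a.label != b.label:
        # Divergent branches: record the pair and skip everything below them.
        return [("diverge", path, old, new)]
    result = [("match", path, old, new)]
    # zip_longest pads the shorter child list with None, so extra children
    # on either side show up as divergent pairs.
    for i, (ca, cb) in enumerate(zip_longest(a.children, b.children)):
        result.extend(greedy_pairs(ca, cb, path + (i,)))
    return result


old = Node("r", [Node(x) for x in (2, 3, 4, 5)])
new = Node("r", [Node(x) for x in (2, 3, 6, 8, 5)])
for row in greedy_pairs(old, new):
    print(row)
```

On the question's example this reports r, 2 and 3 as matching and then marks child positions 2, 3 and 4 as divergent (4 vs 6, 5 vs 8, nothing vs 5), because positional pairing cannot see that 6 and 8 were inserted before 5. Aligning each child list with a sequence diff instead of by position is one way to refine it.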

If you're concerned with an absolute summary that matches the two trees as
closely as possible, you are likely going to want to look at exact
tree-edit distance algorithms like Zhang and Shasha
(https://github.com/timtadh/zhang-shasha). That implementation was actually
developed as a point of comparison at the same time I wrote PyGram. Timtadh
has done more work in this area and may also be able to help you out with
this.
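For a handful of small trees, an exact edit distance together with an edit script can be computed directly from the forest recurrence that Zhang and Shasha's algorithm evaluates efficiently. The sketch below is a naive memoized version of that recurrence with unit costs for delete, insert and rename; it is not the zhang-shasha package's code or API, makes no attempt at that algorithm's efficiency (so treat it as suitable for small trees only), and the tuple tree encoding and the `forest_edit`, `tree_edit` and `leaf` names are hypothetical.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of
# trees; a forest is a tuple of trees. Tuples keep everything hashable so
# lru_cache can memoize on (forest, forest) pairs.


def leaf(label):
    return (label, ())


@lru_cache(maxsize=None)
def forest_edit(f: tuple, g: tuple) -> tuple:
    """Return (cost, ops) turning forest f into forest g with unit costs."""
    if not f and not g:
        return 0, ()
    options = []
    if f:
        # Delete the root v of f's rightmost tree; its children take its place.
        lv, kv = f[-1]
        cost, ops = forest_edit(f[:-1] + kv, g)
        options.append((cost + 1, ops + (("delete", lv),)))
    if g:
        # Insert the root w of g's rightmost tree.
        lw, kw = g[-1]
        cost, ops = forest_edit(f, g[:-1] + kw)
        options.append((cost + 1, ops + (("insert", lw),)))
    if f and g:
        # Map v to w (free if the labels agree, a rename otherwise), then
        # solve their child forests and the remaining forests separately.
        (lv, kv), (lw, kw) = f[-1], g[-1]
        c_kids, o_kids = forest_edit(kv, kw)
        c_rest, o_rest = forest_edit(f[:-1], g[:-1])
        step = ("match", lv) if lv == lw else ("rename", lv, lw)
        options.append((c_rest + c_kids + int(lv != lw), o_rest + o_kids + (step,)))
    # Compare on cost only, so ties never compare the op tuples themselves.
    return min(options, key=lambda option: option[0])


def tree_edit(t1: tuple, t2: tuple) -> tuple:
    return forest_edit((t1,), (t2,))


old = ("r", tuple(leaf(x) for x in (2, 3, 4, 5)))
new = ("r", tuple(leaf(x) for x in (2, 3, 6, 8, 5)))
cost, ops = tree_edit(old, new)
print(cost)  # 2
for op in ops:
    print(op)
```

Ties between equally cheap scripts are broken arbitrarily, so the listed operations may be any cost-2 script for the example, such as renaming 4 to 6 and inserting 8. The "match" entries in the script are the node pairs you would read the summary of changes from.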

-Tyler

In reply to Sasha Hart's message of Mon, Apr 1, 2013 at 6:02 PM.

This is really helpful input. Thanks so much.