CADbloke / daisydiff

Automatically exported from code.google.com/p/daisydiff

Allow html diffs to be interrupted / cancelled

GoogleCodeExporter opened this issue · comments

DaisyDiff's HTML diff can be extremely resource hungry. E.g. our wiki recently
encountered a page diff (admittedly a very large one) that caused the system to
run out of memory. When I unit tested the offending diff, I gave it 900MB and it
still threw an OutOfMemoryError.

There are numerous ways to limit the resource use, e.g. modifying the
LCSSettings' limits, or implementing skipRangeComparison. But I'd like to
just make it cancellable externally (because it's easier).

The Eclipse RangeDifferencer at the heart of the HTMLDiffer can be passed an
IProgressMonitor, whose isCanceled method it checks frequently. When the
IProgressMonitor is cancelled, that diffing operation terminates.
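
The polling pattern described above can be sketched without any Eclipse or DaisyDiff classes. This is only an illustration under stated assumptions: `CancellableLcsSketch`, `DiffCancelledException` and `lcsLength` are hypothetical names, a plain `BooleanSupplier` stands in for the monitor's `isCanceled()` check, and the loop is a textbook longest-common-subsequence table rather than the RangeDifferencer algorithm.

```java
import java.util.function.BooleanSupplier;

final class CancellableLcsSketch {

    /** Hypothetical exception; not part of DaisyDiff or Eclipse. */
    static final class DiffCancelledException extends RuntimeException {
        DiffCancelledException(String message) {
            super(message);
        }
    }

    /**
     * Computes the length of the longest common subsequence of two token
     * arrays, polling the cancel check once per row of the table.
     * The BooleanSupplier is a stand-in for a monitor's isCanceled() method.
     */
    static int lcsLength(String[] left, String[] right, BooleanSupplier isCanceled) {
        int[] prev = new int[right.length + 1];
        int[] curr = new int[right.length + 1];
        for (int i = 1; i <= left.length; i++) {
            // Cooperative cancellation: give up as soon as the caller asks.
            if (isCanceled.getAsBoolean()) {
                throw new DiffCancelledException("cancelled at row " + i);
            }
            for (int j = 1; j <= right.length; j++) {
                if (left[i - 1].equals(right[j - 1])) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // Reuse the two rows instead of allocating a full table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[right.length];
    }

    public static void main(String[] args) {
        String[] left = {"a", "b", "c", "d"};
        String[] right = {"b", "d", "e"};
        System.out.println(lcsLength(left, right, () -> false));
        try {
            lcsLength(left, right, () -> true);
        } catch (DiffCancelledException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

The check sits in the outer loop so that a cancel request is noticed after at most one row of work, which is roughly what frequent polling of a monitor buys.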

I'm working on a simple change to DaisyDiff to add a diff(IProgressMonitor
progressMonitor, TextNodeComparator leftComparator, TextNodeComparator
rightComparator) method that passes that progressMonitor through to the various
calls to RangeDifferencer.

That's the most straightforward way to implement the functionality I need, but
it's not necessarily the best. E.g. people might think we shouldn't leak the
reliance on Eclipse classes to client code, which would be pretty reasonable.
Also, I haven't done anything about the rest of the IProgressMonitor interface
yet, so it's a bit misleading to accept one.

Original issue reported on code.google.com by don.jp.w...@gmail.com on 17 May 2011 at 12:04

Perhaps it would be better to find out if there are any memory leaks and fix 
them first?

I am not sure DaisyDiff was designed to handle gigantic files.
See also issue 21 and issue 23.

Original comment by kkape...@gmail.com on 19 May 2011 at 8:36

There's no memory leaked beyond the end of the operation. It's just that as
DaisyDiff and the RangeDifferencer go through their machinations they slowly
build up a giant set of result definitions. This may be the result of a bug that
occurs for my specific data, but I don't really think so. Doubtless there are
improvements to be made to DaisyDiff's memory usage. But it may never be
achievable to make it handle absolutely any data in limited time and memory.

However, as long as a running diff can be cancelled externally, mitigation for 
impossible diffs is feasible.
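
One form that mitigation could take is a caller-side time limit. The sketch below assumes the diff polls a cancel flag, as discussed earlier in this issue. `DiffWatchdogSketch` and `runWithTimeout` are hypothetical names, and the work is passed in as a plain function rather than a real DaisyDiff call.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

final class DiffWatchdogSketch {

    /**
     * Runs the work on a worker thread. If it has not finished within the
     * time limit, raises the cancel flag, waits for the work to unwind and
     * returns an empty Optional. Work that never polls the flag would block
     * this method indefinitely.
     */
    static <T> Optional<T> runWithTimeout(Function<BooleanSupplier, T> work,
                                          long timeoutMillis) {
        AtomicBoolean cancelRequested = new AtomicBoolean(false);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<T> task = () -> work.apply(cancelRequested::get);
            Future<T> future = executor.submit(task);
            try {
                return Optional.ofNullable(
                        future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException timedOut) {
                // Ask the work to stop at its next cancel check.
                cancelRequested.set(true);
                try {
                    future.get();
                } catch (ExecutionException ignored) {
                    // The work unwound by throwing after seeing the flag.
                }
                return Optional.empty();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("work failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Finishes well inside the limit.
        System.out.println(runWithTimeout(isCanceled -> "done", 1000).isPresent());
        // Spins until the flag is raised, standing in for an "impossible" diff.
        Optional<Integer> slow = runWithTimeout(isCanceled -> {
            int rows = 0;
            while (!isCanceled.getAsBoolean()) {
                rows++;
            }
            return rows;
        }, 50);
        System.out.println(slow.isPresent());
    }
}
```

A time limit is only a proxy for memory use. It relies on the observation in this thread that the diff runs for a long time before memory is exhausted.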

I don't consider this issue to be a defect report but an enhancement request 
(but I can't set that).

Think of it as a solution to
http://code.google.com/p/daisydiff/issues/detail?id=21#c5 that is based on the
observation that DaisyDiff takes considerable time to exhaust memory, which
leaves a window in which to cancel it.

Original comment by don.jp.w...@gmail.com on 20 May 2011 at 12:29

Original comment by kkape...@gmail.com on 20 May 2011 at 7:33

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect