java-diff-utils / java-diff-utils

Diff Utils is an open-source library for performing comparison / diff operations between texts or other kinds of data: computing diffs, applying patches, generating or parsing unified diffs, generating diff output for later display (such as a side-by-side view), and so on.

Home Page: https://java-diff-utils.github.io/java-diff-utils/

Severe performance problem

annafoglio opened this issue · comments

A severe performance problem occurs when I try to compare two huge files.

To reproduce, try comparing two files of about 300 MB in size.

I'd like the process to finish in a couple of minutes; currently it runs for hours and hours.

Look into WithMeyersDiffWithLinearSpacePatchTest. There is a new implementation of the Myers algorithm. The first one was in fact a non-optimized version taken from Myers' paper. The new one is the linear-space version, with much lower bounds. You use it like this:

DiffUtils.diff(insertTest_from, insertTest_to, new MeyersDiffWithLinearSpace<String>());

Does this finish faster?

Since this version is still somewhat beta, it is not the standard version. However, with DiffUtils' new algorithm factory you can replace the standard algorithm using:

DiffUtils.withDefaultDiffAlgorithmFactory(factory);

Each algorithm class has its own factory method, for example MeyersDiffWithLinearSpace.factory().

Unfortunately, after 12 hours the process was still running and I had to kill it. Is there any other option I can try?

Could you provide an example file? Since these are text files, they should be highly compressible. Or is there a rule for creating one of them dynamically?
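
If sharing the real files is not possible, one option is to generate a comparable pair from a fixed seed, so that anyone can recreate the same input. The following is only a rough sketch of that idea, using the plain JDK. The class name SyntheticDiffInput, the line format, the seed and the "change every 1,000th line" rule are all assumptions made for illustration; they are not part of java-diff-utils and almost certainly do not match the structure of the files from the original report.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of java-diff-utils) that builds reproducible diff input. */
final class SyntheticDiffInput {

    /** Generates lineCount pseudo-random lines; the same seed always yields the same lines. */
    static List<String> generateLines(int lineCount, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>(lineCount);
        for (int i = 0; i < lineCount; i++) {
            lines.add("record-" + i + ";" + Long.toHexString(random.nextLong()));
        }
        return lines;
    }

    /** Returns a copy in which every `every`-th line (1-based) has been altered. */
    static List<String> mutate(List<String> original, int every) {
        List<String> copy = new ArrayList<>(original);
        for (int i = every - 1; i < copy.size(); i += every) {
            copy.set(i, copy.get(i) + ";changed");
        }
        return copy;
    }

    /** Writes the lines to a UTF-8 text file, one per line. */
    static void write(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default size; raise the line count to approach the file size from the report.
        int lineCount = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        List<String> original = generateLines(lineCount, 42L);
        List<String> revised = mutate(original, 1_000);
        write(Paths.get("original.txt"), original);
        write(Paths.get("revised.txt"), revised);
    }
}
```

Posting only the line count, the seed and the mutation rule is then enough for someone else to rebuild the same two files. Whether such synthetic input triggers the same slowdown depends on how much the real files differ from each other, so the mutation rule would likely need adjusting.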