Wilfred / difftastic

a structural diff that understands syntax 🟥🟩

Home Page: https://difftastic.wilfred.me.uk/

"memory allocation" error

jasongibson opened this issue

Diffing a 13 KB C++ source file fails with the error "memory allocation of 9223372036854775800 bytes failed". My laptop doesn't have 9,223 petabytes of memory, so difftastic can't produce a diff.

Releases from 0.53 onward fail like this; version 0.52 works.
The latest release tested is Difftastic 0.55.0 (7fda26d 2024-01-30, built with rustc 1.65.0).

Unfortunately, the source is proprietary. I'll see if it can be reproduced with something that can be shared.

How many lines are in the files, and roughly how many lines are in common (e.g. how big is a patch created by GNU diff)?
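For anyone who cannot share the files themselves, the numbers asked for above can be gathered with a few lines of Python. This is only a rough sketch: `line_stats` is a hypothetical helper, and it uses the standard library's `difflib.SequenceMatcher`, which is neither GNU diff nor difftastic's own line matcher, so the "common lines" count is an approximation.

```python
import difflib


def line_stats(old_text, new_text):
    """Count lines in each input and the lines difflib matches between them."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # autojunk=False disables difflib's "popular line" heuristic, which can
    # otherwise ignore frequently repeated lines in larger inputs.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # The final matching block is a zero-size sentinel, so it adds nothing.
    common = sum(block.size for block in matcher.get_matching_blocks())
    return {
        "old_lines": len(old_lines),
        "new_lines": len(new_lines),
        "common_lines": common,
    }


if __name__ == "__main__":
    print(line_stats("a\nb\nc\n", "a\nc\n"))
```

To run it on real files, read them first, for example with `pathlib.Path("a").read_text()` and `pathlib.Path("b").read_text()` (the file names here are placeholders).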

You're probably hitting the scaling limitations of the textual diffing logic, which uses O(N^2) memory with dynamic programming (the Myers Diff algorithm). I know that GNU diff / BSD diff have tricks to handle larger files well, at the cost of a less optimal diff (more lines shown as changed). I think difftastic needs to have similar heuristics for large text diffs.
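To make the quadratic-memory concern concrete, here is a minimal sketch of a textbook longest-common-subsequence computation that keeps a full table over both inputs. It is an illustration of O(N^2) table growth in general, not difftastic's actual code; the function names and the 4-bytes-per-cell figure are assumptions chosen for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two line lists.

    Keeps the whole (len(a) + 1) x (len(b) + 1) table in memory, which is
    what makes the memory use quadratic in the input size.
    """
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def table_bytes(n_lines_a, n_lines_b, bytes_per_cell=4):
    """Estimated size of that table; bytes_per_cell is an assumed value."""
    return (n_lines_a + 1) * (n_lines_b + 1) * bytes_per_cell


if __name__ == "__main__":
    print(lcs_length(["a", "b", "c"], ["a", "c"]))
    # Two hypothetical 10,000-line files need roughly 400 MB at 4 bytes per cell.
    print(table_bytes(10_000, 10_000))
```

Doubling the line count of both files quadruples the table, which is why tools aimed at large inputs tend to give up optimality in exchange for bounded memory.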

I can reproduce with "Difftastic 0.55.0 (built with rustc 1.75.0)" (latest homebrew version) with these files:

GNU diff's output is 306 lines.

@simonmichael are you sure that's the same issue? I can't reproduce with your files.

$ /bin/time -v difft a.txt b.txt >/dev/null
	Command being timed: "difft a.txt b.txt"
	User time (seconds): 0.02
	System time (seconds): 0.01
	Percent of CPU this job got: 93%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.04
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 26312
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 338
	Voluntary context switches: 0
	Involuntary context switches: 4
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

That memory amount seems sensible.

Indeed. Unfortunately, today I'm not able to reproduce it either! I do expect to see it again eventually; I'll update here when that happens.

Apologies for the delay in getting some files for this.

Here's a repro (with different files than the initial message in this report) using 'Difftastic 0.56.1 (d9d6401 2024-03-05, built with rustc 1.65.0)':

$ wget https://raw.githubusercontent.com/llvm/llvm-project/5d6304f01742a0a7c628fe6850e921c745eaea08/compiler-rt/lib/asan/asan_allocator.cpp -O a
$ wget https://raw.githubusercontent.com/llvm/llvm-project/9e68b7e0e04f89cd3810102016ddf34fd3a33b3d/compiler-rt/lib/asan/asan_allocator.cpp -O b
$ ./difft a b
b --- 1/10 --- Text
16 16 
17 17 #include "asan_allocator.h"
18 18 
19 .. #include "asan_internal.h"
20 19 #include "asan_mapping.h"
21 20 #include "asan_poisoning.h"
22 21 #include "asan_report.h"

b --- 2/10 --- Text
25 24 #include "lsan/lsan_common.h"
26 25 #include "sanitizer_common/sanitizer_allocator_checks.h"
27 26 #include "sanitizer_common/sanitizer_allocator_interface.h"
28 .. #include "sanitizer_common/sanitizer_common.h"
29 27 #include "sanitizer_common/sanitizer_errno.h"
30 28 #include "sanitizer_common/sanitizer_flags.h"
31 29 #include "sanitizer_common/sanitizer_internal_defs.h"

b --- 3/10 --- Text
memory allocation of 9223372036854775803 bytes failed
aborted (core dumped)
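A quick arithmetic check on the two reported sizes may help narrow this down. Both values sit just below 2^63, the largest signed 64-bit integer plus one, which is far beyond what a quadratic table for files of this size would need. That pattern is often a sign of an overflowed or underflowed size calculation rather than genuine memory demand, although nothing in this thread confirms the cause. The sketch below only does the arithmetic; `distance_below_i64_max` is a hypothetical helper name.

```python
I64_MAX = 2**63 - 1  # 9223372036854775807, the largest signed 64-bit integer

# Sizes copied from the two error messages in this thread.
REPORTED_SIZES = {
    "original report (0.55.0)": 9223372036854775800,
    "asan_allocator.cpp repro (0.56.1)": 9223372036854775803,
}


def distance_below_i64_max(size):
    """How far a requested allocation size is below the signed 64-bit maximum."""
    return I64_MAX - size


if __name__ == "__main__":
    for label, size in REPORTED_SIZES.items():
        print(f"{label}: {size} bytes, {distance_below_i64_max(size)} below i64 max")
```

The distances come out as 7 and 4 bytes respectively, so both requests are within a handful of bytes of the signed 64-bit limit.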