Wilfred / difftastic

a structural diff that understands syntax 🟥🟩

Home Page: https://difftastic.wilfred.me.uk/

[2; .txt, .js] Outputs entirety of generated file despite there being only two changes; tries to allocate 16.3 TiB

berrymot opened this issue

Buy one issue, get another free!

(1) A description of the issue. A screenshot is often helpful too.

This occurred when running git diff on this before commit.

image

This continues for all 25,109 lines, excluding the empty final one.

It diffs data/data.txt fine, but then tries to allocate ~18 trillion bytes for the diff of data/jbo.js, probably at least in part because it's 4.54 MiB squished onto one line. I admit I don't know what to do about this.

image

(2) A copy of what you're diffing. If you're diffing files, include
the before and after files. If you're using difftastic with a VCS
repository (e.g. git), include the URL and commit hash.

Link above; hash is 2190a95d50cfe15039df80f53a310f66a40008e3.

(3) The version of difftastic you're using (see difft --version) and
your operating system.

0.55, Windows 11

lojbo .ui

Thanks for the report.

I think the first issue is probably difftastic comparing lines ending in \r\n against lines ending in just \n and concluding that they're different.

The second issue is probably due to line diffing trying to highlight words within the same line.
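A quick way to check whether a file is effectively one giant line, which is the case that within-line highlighting seems to struggle with here, is to measure its longest line. This is only a minimal sketch: `longest_line` is a hypothetical helper name, not something difftastic provides, and whether the length is counted in bytes or characters depends on the awk implementation and locale.

```shell
# Hypothetical helper: print the length of the longest line in a file.
# Prints 0 for an empty file.
longest_line() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}
```

Running `longest_line data/jbo.js` on a 4.54 MiB single-line file would be expected to report a value in the millions, which is a hint that a line-oriented diff has very little to work with.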

I can repro the second issue, but not the first. Can you reproduce the first issue with local copies of the files? If so, could you attach them here?

OK, I can reproduce the first issue if I have two files that differ by line ending, but I'm not convinced that difftastic's behaviour is wrong here.

$ echo "one\ntwo" > a.txt                            
$ echo "one\nfoo\ntwo" > b.txt
$ unix2dos b.txt       

$ difft a.txt b.txt
b.txt --- Text
1 one                             1 one
.                                 2 foo
2 two                             3 two

$ diff a.txt b.txt 
1,2c1,3
< one
< two
---
> one
> foo
> two

Plain GNU diff also considers these files to be completely different. Do you have any special crlf settings in your Windows git setup?
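To test whether two files differ only by line endings, it can help to count the carriage returns in each and to compare them with every CR stripped. The following is a rough sketch under that assumption; `count_cr` and `same_ignoring_cr` are hypothetical helper names, and stripping every CR (not just those before a newline) is a simplification.

```shell
# Hypothetical helper: count carriage-return bytes in a file.
# A non-zero result suggests CRLF (or mixed) line endings.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: compare two files while ignoring CR bytes.
# Exits 0 when the files are identical once every CR is removed.
same_ignoring_cr() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    tr -d '\r' < "$1" > "$a"
    tr -d '\r' < "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

With the files from the example above, `count_cr b.txt` should report a non-zero count after `unix2dos`, while `count_cr a.txt` should report 0. Inside a git checkout, `git config --get core.autocrlf` prints the line-ending conversion setting if one has been configured.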

I can repro the second issue, but not the first. Can you reproduce the first issue with local copies of the files? If so, could you attach them here?

Unfortunately I'll be away from my computer until next week, sorry. I don't think it's the line endings though? Pretty sure allwords.txt has been CRLF all along. When I get back I'll try adding a test entry to the database and regenerate allwords.txt and see if this still happens.

As for

Do you have any special crlf settings in your Windows git setup?

I don't remember ever messing with that; the one thing I know I've done to etc/config is using difftastic rather than the default diff lol

ni'o lu «lojbo .ui» li'u zo'u .ue xu do jbopre

The test entry has been made, it happens to mean 'line terminator' lol; I'll rerun the parsing stuff when I can

Reran the script with the new word.

allwords.txt still printed the entire file:
image

data.txt still didn't:
image

Replaced every \n with \r\n in the script, reran.

This fixed allwords.txt, but data.txt's changes were too big.
image

Reverted the changes to those two, manually replaced each LF with CRLF, committed, reran.

image

wheeeeeee

Updated to 0.56.1, turns out printing such a giant diff is REALLY bad for my terminal lol