Issue127

Title hg diff -w does not ignore white space in hunks containing other changes
Priority bug Status chatting
Superseder Nosy List ThomasAH, abuehl, djc, haps, jglick, mason, mpm, rawagajah, tksoh, tlynn
Assigned To Topics diff

Created on 2006-02-15.18:36:56 by vadim, last changed 2008-09-05.09:13:17 by rawagajah.

Files
File name Uploaded Type Edit Remove
broken.diff vadim, 2006-02-15.18:36:53 text/x-patch
Messages
msg6952 (view) Author: rawagajah Date: 2008-09-05.09:13:17
(I meant "hg diff -b 1" at the end there, of course, which gives the same poor diff)
msg6931 (view) Author: rawagajah Date: 2008-09-03.17:33:26
Here's another demonstration of a poor diff caused by this bug, which standalone
"diff -ub" handles much better, giving a diff which is much easier for the user
to understand...

$ echo -e "X\n  A\n  B\n  D\n  E" > 1
$ echo -e "X\n A\n B\n C\n D\n E" > 2
$ diff -ub 1 2
--- 1	2008-09-03 18:24:35.000000000 +0100
+++ 2	2008-09-03 18:24:37.000000000 +0100
@@ -1,5 +1,6 @@
 X
   A
   B
+ C
   D
   E
$ hg init
$ hg add 1 2
$ hg commit -m 'Check in'
$ mv 2 1
$ hg diff 1
diff -r 34bb3cd02d65 1
--- a/1	Wed Sep 03 18:24:47 2008 +0100
+++ b/1	Wed Sep 03 18:25:01 2008 +0100
@@ -1,5 +1,6 @@
 X
-  A
-  B
-  D
-  E
+ A
+ B
+ C
+ D
+ E
msg6774 (view) Author: haps Date: 2008-08-14.15:49:02
Well, you're absolutely correct on that one.  I was positive that diff -w
ignored newline differences but some simple tests show me that I was way off on
that one.  Maybe I was thinking of -wB, but even that will say that 'abc\ndef'
differs from 'ab\ncd\nef'.  So please ignore my thought that -bB should act like -w.
msg6773 (view) Author: djc Date: 2008-08-14.15:04:43
There are of course also the -b and -B options for newline stuff.
msg6772 (view) Author: ThomasAH Date: 2008-08-14.15:02:43
Shouldn't the behaviour match that of the "normal" diff command?

diff (GNU diffutils) 2.8.1 considers with -w:

"foo bar\n" and "foo  bar\n" to be equal, but "foo bar\n" and "foo\nbar\n" not,
so XML's idea of whitespace doesn't help here.

"foo bar\n" and "foo\rbar\r" to be equal, so \r seems to be viewed as whitespace.
msg6771 (view) Author: haps Date: 2008-08-14.14:38:15
I agree, \n is sort of limiting to the non-windows world (who uses that anyway?:).
So, what about \s?  I think though the most important thing is that it behaves
similarly to other diffs.  At the moment it does not.  Most definately, if -w is
changed to ignore \s+ then -b should be suitably updated (and if \r is included
in -w then -B should also be updated).

My key point is that -bB should be identical to -w.  I hope it's clear and is
everyone in agreement with this?  Can we move forward to fixing this issue?
msg6769 (view) Author: tlynn Date: 2008-08-14.13:33:17
The definition of whitespace should include at least \r (' \t\r\n' is XML's idea
of whitespace). Alternatively you could use the '\s' regexp ('\t\n\v\f\r ',
usually the same as string.whitespace) or whatever the locale says. But at least \r.
msg6649 (view) Author: haps Date: 2008-07-31.15:44:00
Hi;
I think that diff -w should be the same as diff -bB.  When I use diff -bB it
acts like diff -w (no exhaustive test cases).  The problem with the diff code
now I believe is that this old version
mercurial/mdiff.py, beginning at line 52 (version 0.9.5), old:

def wsclean(opts, text):
    if opts.ignorews:
        text = re.sub('[ \t]+', '', text)
    elif opts.ignorewsamount:
        text = re.sub('[ \t]+', ' ', text)
        text = re.sub('[ \t]+\n', '\n', text)
    if opts.ignoreblanklines:
        text = re.sub('\n+', '', text)
    return text

could be changed to something like:

def wsclean(opts, text):
    if opts.ignorews:
        text = re.sub('[ \t\n]+', '', text)
    else:
        if opts.ignorewsamount:
            text = re.sub('[ \t]+', ' ', text)
            text = re.sub('[ \t]+\n', '\n', text)
        if opts.ignoreblanklines:
            text = re.sub('\n+', '', text)
    return text
msg6051 (view) Author: jglick Date: 2008-05-20.18:32:37
A simple test case:

$ hg init
$ (echo 'doStuff'; echo 'doMoreStuff') > f
$ hg ci -A -m 1
adding f
$ (echo 'block {'; echo '  doStuff'; echo '  doMoreStuff'; echo '} end') > f
$ hg ci -m 2
$ hg cat -r 0 f > f_0
$ diff -uw f_0 f
--- f_0	2008-05-20 14:29:16.000000000 -0400
+++ f	2008-05-20 14:28:16.000000000 -0400
@@ -1,2 +1,4 @@
+block {
 doStuff
 doMoreStuff
+} end
$ hg --config diff.ignorews=1 di -r0:1
diff --git a/f b/f
--- a/f
+++ b/f
@@ -1,2 +1,4 @@
-doStuff
-doMoreStuff
+block {
+  doStuff
+  doMoreStuff
+} end
$
msg4588 (view) Author: djc Date: 2007-12-13.12:48:26
I'll look into fixing this soon. Transcript from chat with pmezard:

13:35 < pmezard> djc: i think the lines should be normalized before sending
                 them to bdiff.blocks()
13:37 < djc> pmezard: indeed
13:37 < pmezard> "if wsclean(opts, "".join(old)) == wsclean(opts,
                 "".join(new)):"  looks a bit simplistic
13:39 < pmezard> i don't know how expensive it would be to duplicate both line
                 lines. probably not too expensive, especially if done only for
                 ws cases.
13:39 < djc> pmezard: my concern was with dedenting a block of code
13:39 < djc> the diff becomes a bit unreadable if you do that
13:39 < djc> especially if the block is large
13:39 < djc> does this approach fix that problem?
13:40 < pmezard> only dedenting or dedenting + changes
13:41 < djc> well, I'd like to get a diff that ignores all the dedenting and
             shows the other changes
13:41 < djc> and I figure diff -b should do that
13:41 < djc> but it doesn't
13:41 < pmezard> yes i think this case is currently ignore by the line i quote
                 above
13:41 < pmezard> it only checks for pure whitespaces changes in non-matching
                 blocks
13:42 < djc> right
13:42 < pmezard> problem is that any other changes make the whole block to be
                 emitted
13:43 < djc> exactly
13:43 < pmezard> so, you can create two copies of l1/l2 with every lines
                 normalized by wsclean, pass that to bdiff.blocks, then match
                 again with the original input
msg4528 (view) Author: ThomasAH Date: 2007-12-08.09:58:05
Not fixed yet (tested with 1db1c7c1eb5e)
msg4486 (view) Author: mpm Date: 2007-12-07.21:48:18
Did we fix this one?
msg530 (view) Author: ThomasAH Date: 2006-02-21.13:49:07
changed title to match problem.
msg485 (view) Author: mason Date: 2006-02-15.19:39:33
On Wednesday 15 February 2006 13:36, Vadim Gelfer wrote:
> New submission from Vadim Gelfer <vadim.gelfer@gmail.com>:
>
> i am trying to fix issue 126. when my fix does not work, i think it is my
> fault, but diff code is generating wrong output instead. i have attached
> broken diff. look for hunk at end. line with "-I" is only different with
> white space. but shows up in diff output as changed, even with -w option to
> diff.

It came as part of a larger hunk.  The current ws handling only ignores hunks 
where nothing but ws have changed.

-chris
msg479 (view) Author: vadim Date: 2006-02-15.18:36:53
i am trying to fix issue 126. when my fix does not work, i think it is my fault,
but diff code is generating wrong output instead. i have attached broken diff.
look for hunk at end. line with "-I" is only different with white space. but
shows up in diff output as changed, even with -w option to diff.
History
Date User Action Args
2008-09-05 09:13:17rawagajahsetnosy: mpm, ThomasAH, mason, tksoh, jglick, djc, abuehl, tlynn, haps, rawagajah
messages: + msg6952
2008-09-03 17:33:27rawagajahsetnosy: + rawagajah
messages: + msg6931
2008-08-14 15:49:03hapssetnosy: mpm, ThomasAH, mason, tksoh, jglick, djc, abuehl, tlynn, haps
messages: + msg6774
2008-08-14 15:04:43djcsetnosy: mpm, ThomasAH, mason, tksoh, jglick, djc, abuehl, tlynn, haps
messages: + msg6773
2008-08-14 15:02:44ThomasAHsetnosy: mpm, ThomasAH, mason, tksoh, jglick, djc, abuehl, tlynn, haps
messages: + msg6772
2008-08-14 14:38:22hapssetnosy: mpm, ThomasAH, mason, tksoh, jglick, djc, abuehl, tlynn, haps
messages: + msg6771
2008-08-14 13:33:20tlynnsetnosy: + tlynn
messages: + msg6769
2008-07-31 15:44:01hapssetnosy: + haps
messages: + msg6649
2008-06-06 05:12:14tksohsetnosy: + tksoh
2008-06-05 08:26:39abuehlsetnosy: + abuehl
2008-05-20 18:32:37jglicksetnosy: + jglick
messages: + msg6051
2007-12-13 12:48:31djcsetnosy: + djc
messages: + msg4588
2007-12-08 09:58:05ThomasAHsetnosy: mpm, ThomasAH, mason
messages: + msg4528
2007-12-07 21:48:18mpmsetnosy: + mpm, - vadim
messages: + msg4486
assignedto: mason ->
2006-07-09 08:19:02tonfasettopic: + diff
nosy: vadim, ThomasAH, mason
2006-02-21 13:51:40ThomasAHlinkissue126 superseder
2006-02-21 13:51:25ThomasAHsetnosy: vadim, ThomasAH, mason
title: diff does not ignore white space in hunks containing other changes -> hg diff -w does not ignore white space in hunks containing other changes
2006-02-21 13:49:09ThomasAHsetassignedto: mason
title: diff does not ignore white space -> diff does not ignore white space in hunks containing other changes
messages: + msg530
nosy: + mason, ThomasAH
2006-02-15 19:39:39masonsetstatus: unread -> chatting
messages: + msg485
2006-02-15 18:36:56vadimcreate