Issue567

Title diff takes too long
Priority bug Status resolved
Superseder Nosy List abuehl, alexis, cedric, djc, ostkamp, tonfa
Assigned To Topics

Created on 2007-05-21.18:39:31 by ostkamp, last changed 2008-10-09.08:35:16 by djc.

Messages
msg7350 (view) Author: djc Date: 2008-10-09.08:35:16
No response, resolving.
msg7127 (view) Author: djc Date: 2008-09-15.08:07:34
So is there still a problem here, or was the slowdown fixed after all?
msg6914 (view) Author: tonfa Date: 2008-09-02.13:34:35
pushed in crew b3bc518a73c3 and 63d1d3e489f8.
msg6911 (view) Author: tonfa Date: 2008-09-02.09:50:34
Right, there's a typo.

But I'm experiencing a slowdown with this patch (with the hg kernel tree).
msg6909 (view) Author: cedric Date: 2008-09-02.07:18:05
> +            idgnore = util.always

But, isn't there a typo here?
msg6908 (view) Author: cedric Date: 2008-09-02.06:26:25
> Works for me:

For me also. Especially on a cold start, the speedup is very noticeable.
Without the patch:
  # echo 3 > /proc/sys/vm/drop_caches
  $ time hg diff
  hg diff  0,86s user 0,53s system 11% cpu 12,188 total

With the patch:
  # echo 3 > /proc/sys/vm/drop_caches
  $ time hg diff
  hg diff  0,16s user 0,04s system 10% cpu 1,926 total

  $ strace -c hg diff 2>&1 | awk '/stat/ {s+=$4} END {print s}'
  561

Thanks!
msg6907 (view) Author: tonfa Date: 2008-09-01.23:25:43
Works for me:

diff -r 91f6370c433c mercurial/dirstate.py
--- a/mercurial/dirstate.py	Tue Sep 02 01:19:12 2008 +0200
+++ b/mercurial/dirstate.py	Tue Sep 02 01:19:50 2008 +0200
@@ -434,12 +434,15 @@
             self._ui.warn(_('%s: unsupported file type (type is %s)\n')
                           % (self.pathto(f), kind))
 
-        # TODO: don't walk unknown directories if unknown and ignored are False
         ignore = self._ignore
         dirignore = self._dirignore
         if ignored:
             ignore = util.never
             dirignore = util.never
+        elif not unknown:
+            # if unknown and ignored are False, skip step 2
+            idgnore = util.always
+            dirignore = util.always
 
         matchfn = match.matchfn
         dmap = self._map
msg6905 (view) Author: cedric Date: 2008-09-01.21:31:58
> Do you have a script to reproduce it ?

Well, nothing more sophisticated than
  cd linux-2.6.26
  hg init
  strace -c hg diff

  $ strace -c hg diff 2>&1 | awk '/stat/{s+=$4} END{print s}'
  27716
  $ strace -c hg diff 2>&1 | awk '/newfstatat/ {print $4}'
  25713
  $ find | wc -l
  25725
msg6904 (view) Author: tonfa Date: 2008-09-01.20:58:39
Do you have a script to reproduce it ?
msg6903 (view) Author: cedric Date: 2008-09-01.19:46:49
Hi,

> Benoit Boissinot <bboissin@gmail.com> added the comment:
> Does this still happen ?

When I posted this report (e160f2312815):
  1922  stat64
  2923  getdents
  25713 newfstatat

And today, with crew's tip (a04d8cadb6af):
  1967  stat64
  2923  getdents
  25713 newfstatat

So for making a diff, mercurial still stats every single file in
the source tree, even if they are not under its control.

I am testing on a local filesystem today, and it just takes one
second for the diff, which is perfectly bearable. By the time I
posted the report, though, on a networked filesystem, it was not
usable.

I tried a few other revisions mentionned in this bug report:
dccb83241dd0 ("which should be bad"), b4eaa68dea1b ("which
should be ok"), b79cdb7f0597 ("should fix this"); and they
all yield similar figures (but different from the ones above):
  27564 stat64
  2923  getdents
  0     newfstatat

So some time ago the stat calls have been replaced with newfstatat
calls. Out of curiosity, I bisected to find when this change did
occur: 5105b119edd2 ("summary: Add osutil module, containing a
listdir function")
msg6902 (view) Author: tonfa Date: 2008-09-01.17:23:11
Does this still happen ?
msg5031 (view) Author: cedric Date: 2008-01-30.17:58:54
Hi,

I am posting a followup on this bug marked as resolved, as I am having the same
issue. Tested with today's snapshot e160f2312815.

On a freshly 'hg qinited' Linux kernel repository, just doing a 'hg diff'
triggers 26370 lstat64() calls, recursing in subtrees that are not under
mercurial's control (no files added).

The filesystem where the repository resides being network mounted, those
"spurious" lstat make mercurial unusable for me (approximately 50 seconds for
this begign diff) on such a big repository.
msg3149 (view) Author: ostkamp Date: 2007-06-04.11:08:15
Ok, I checked again and found the following:

As we now, revision dccb83241dd0 (Mon Dec 25 17:43:49 2006) introduced the problem.

However, while doing the bisect, I found that with revision 2216535f07d7 (Thu
Mar 15 18:55:53 2007) the problem was no longer present (probably even with
earlier revisions).

But with revision 6cb5be6bd70f (Wed Mar 14 10:45:18 2007) it was re-introduced
with even more stat calls.

As far as I can see these are different branches which have been merged somehow
later.

Can you investigate?

I also can confirm that with revision 22ebd6ee5672 (from crew) the problem is fixed.
msg3146 (view) Author: alexis Date: 2007-06-03.20:57:04
Changeset b79cdb7f0597 in the crew repo should fix this.

I'm still not sure why you were getting many more stat calls with 8fa54b9c6c5a
compared to dccb83241dd0. If you're still seeing this, can you try to use hg
bisect to determine when this was introduced?
msg3117 (view) Author: ostkamp Date: 2007-05-22.18:20:51
Add-on to my last note:

You can forget my comment regarding the Mercurial vs. Git numbers. The 475 calls
for Mercurial are mainly caused by environment and Python file lookups. I
checked and there are just 8 stats on files in my repositories file tree.
msg3116 (view) Author: ostkamp Date: 2007-05-22.18:05:12
I tested it again as requested (strace with same 'hg diff' call), the results
are as follows

With hg revision b4eaa68dea1b:
475 stat calls, 2 getdent calls

With hg revision dccb83241dd0:
25523 stat calls, 4633 getdent calls

With most current revision 8fa54b9c6c5a:
31084 stat calls, 4644 getdent calls

So there is a significant increase with the version you mentioned, however the
most current version is even more worse.

Even with the best results (475 calls) this is still a lot compared to Git (12).

I hope this has helped, if you need more info please contact me anytime.

Thanks!
msg3111 (view) Author: alexis Date: 2007-05-22.17:09:11
This is probably a problem that was introduced by the contextification of
patch.diff back in changeset dccb83241dd0.

In addition to the (restricted) working dir walk required to figure out what has
changed, we've started doing a full walk to generate a full manifest for the
working dir.

Can you do a quick test with hg revisions dccb83241dd0 (which should be bad) and
b4eaa68dea1b (which should be ok) to confirm this?
msg3105 (view) Author: ostkamp Date: 2007-05-21.18:39:30
(This has also been posted on the List on 2007-05-16 and was accepted as bug by
Matt Mackall)

The mercurial version I used for testing was ba22e867cb23+20070510.

I changed a file in my large repository, then I tried a 
'hg diff pathname' on it. This simple call took a very long time.

I investigated using 'strace' and it appears Mercurial
is using tons of lstat() and getdents() calls. In my case I 
counted 29009 lstat64() and 4634 getdents64() calls. lstat
is called for tons of files that have nothing to do with
my change.

For comparision: I have the same repository in Git. There I get the response
in almost no time with just 12 lstat() calls and 2 getdents().

When a fix gets available, I can test it anytime.
History
Date User Action Args
2008-10-09 08:35:16djcsetstatus: testing -> resolved
nosy: tonfa, alexis, ostkamp, djc, abuehl, cedric
messages: + msg7350
2008-09-15 08:07:35djcsetnosy: + djc
messages: + msg7127
2008-09-02 13:34:35tonfasetstatus: chatting -> testing
nosy: tonfa, alexis, ostkamp, abuehl, cedric
messages: + msg6914
2008-09-02 11:46:42abuehlsetnosy: + abuehl
2008-09-02 09:50:34tonfasetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6911
2008-09-02 07:18:06cedricsetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6909
2008-09-02 06:26:26cedricsetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6908
2008-09-01 23:25:43tonfasetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6907
2008-09-01 21:31:58cedricsetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6905
2008-09-01 20:58:39tonfasetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6904
2008-09-01 19:46:49cedricsetnosy: tonfa, alexis, ostkamp, cedric
messages: + msg6903
2008-09-01 17:23:12tonfasetnosy: + tonfa
messages: + msg6902
2008-01-30 17:58:54cedricsetstatus: resolved -> chatting
nosy: + cedric
messages: + msg5031
2007-06-23 17:53:29mpmsetstatus: testing -> resolved
nosy: alexis, ostkamp
title: Mercurial Performance Problem -> diff takes too long
2007-06-04 11:08:15ostkampsetnosy: alexis, ostkamp
messages: + msg3149
2007-06-03 20:57:05alexissetstatus: chatting -> testing
nosy: alexis, ostkamp
messages: + msg3146
2007-05-22 18:20:51ostkampsetnosy: alexis, ostkamp
messages: + msg3117
2007-05-22 18:05:12ostkampsetnosy: alexis, ostkamp
messages: + msg3116
2007-05-22 17:09:11alexissetstatus: unread -> chatting
nosy: + alexis
messages: + msg3111
2007-05-21 18:39:31ostkampcreate