[PATCH 6 of 6] revset: fix a crash with 'roots(wdir())'

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Jul 3 11:51:31 CDT 2015



On 07/01/2015 06:28 PM, Matt Harbison wrote:
> On Wed, 01 Jul 2015 09:01:17 -0400, Yuya Nishihara <yuya at tcha.org> wrote:
>
>> On Tue, 30 Jun 2015 23:31:32 -0400, Matt Harbison wrote:
>>> On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David
>>> <pierre-yves.david at ens-lyon.org> wrote:
>>> > On 06/30/2015 07:56 PM, Matt Harbison wrote:
>>> >> # HG changeset patch
>>> >> # User Matt Harbison <matt_harbison at yahoo.com>
>>> >> # Date 1435715342 14400
>>> >> #      Tue Jun 30 21:49:02 2015 -0400
>>> >> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
>>> >> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
>>> >> revset: fix a crash with 'roots(wdir())'
>>> >
>>> > This series worries me about two things:
>>> >
>>> > 1) performance impact, can we get than benchmarked
>>>
>>> I've seen you do benchmarking patches before, but admittedly didn't pay
>>> much attention to what you were doing or how to benchmark.  Can you
>>> point
>>> me to some specific steps to perform?
>>>
>>> > 2) this is a lot of churn using None, could we settle for another
>>> value
>>> > (max int) and move toward that instead?
>>>
>>> https://www.selenic.com/pipermail/mercurial-devel/2015-June/071656.html
>>
>> It means all "None"s in this series will have to be replaced by wdirrev.
>> Perhaps I should send my wdirrev patches soon.
>
> I'll set this aside for now then- they aren't currently an issue for
> me.  I just seemed worth fixing all of the changelog related errors
> after stumbling on it in heads().
>
>> I want to add one more thing:
>>
>> 3) can we have a function that is as fast as cl.parentrevs(r) and can
>> process
>>    wdir() ?
>
> Something like this in revset.py?  It looks like an optimization to not
> go through context.
>
> def _parentrevs(repo, cl, r):
>      if r is not None:
>          return cl.parentrevs(r)
>      else:
>          return [p.rev() for p in repo[r].parents()]
>
> The only reason I didn't do it was because with the local changelog
> caching optimization, I assumed calling a function over and over would
> be unacceptable overhead.

About performance
-----------------

1) Going through repo[r] is super bad. Between the multiple functions 
involved, the complex algorithm and the creation of python object, the 
performance are abyssal. I did a quick test and using repo[r].parents() 
in the existing code makes it 31x time slower.

At this point usage or repo[xxx] in any revsets loop should be banned.

2) Accessing `repo.changelog` have a small but significant cost (because 
we check if the changelog filtering changed. It should alway be accessed 
once, before any loops, with its value cached.

3) Function call is hurting us hard, so adding any function layer in 
time sensitive code should be avoided.
For example, dropping the changelog.parentrevs → revlog.parentrevs → 
index[x] layer to directly access repo.changelog.index speed the 
"parents" revset by a factor of 2.

hum, that email content could maybe end up in a wiki page about writing 
performant code and revset.


About the API
-------------

I'm still about bit worried about this whole move toward wdir support. 
(This is not about "should we do it" but more about "how do we do it")

For the reason listed above revset (and other parts of the code) will 
rarely use repo[id] to access the data they need to get their job done. 
direct access to the changelog object (or similar low level API) will be 
the norm in a lot of place. With what I've seen so far, it becomes easy 
to specify 'wdir' revision but all this code still have to be upgraded 
by hand. To detect if you are in the wdir case and not use the same API 
in this case. Given how many different user we have in core and the vast 
amount of extension out there, I expect a long trail of bug and 
mis-usage from the API. This is what worries me.

So I think we should have the API not panic when handed and 
'wdirrev'/'wdirid' (the same way we probably want to detect 'nullrev'/….

I see possibles path for this:

1) Have the low level API support the 'wdirrev' by providing them some 
access to the dirstate data. That would be the "best" as it would ensure 
transparent support for 'wdir' everywhere (mostly). However, this would 
probably requires a massive number of layer violation and mudding.

2) Have low level API detect 'wdirrev' case and "ignore" it, issue a 
devel-warn for developer to spot such case. This will provide a middle 
ground between full support (1) and plain crash (what we have now).

3) As you guys have been working on it more closely than I did, you will 
likely comes with another better solution.

4) I got convinced that my worries are groundless.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list