<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jun 9, 2015 at 12:58 PM Pierre-Yves David <<a href="mailto:pierre-yves.david@ens-lyon.org">pierre-yves.david@ens-lyon.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

On 06/03/2015 10:32 AM, FUJIWARA Katsunori wrote:<br>

><br>

> At Tue, 02 Jun 2015 19:40:26 +0000,<br>

> Martin von Zweigbergk wrote:<br>

><br>

>> On Tue, Jun 2, 2015 at 9:27 AM FUJIWARA Katsunori <<a href="mailto:foozy@lares.dti.ne.jp" target="_blank">foozy@lares.dti.ne.jp</a>><br>

>> wrote:<br>

><br>

>>> At Mon, 01 Jun 2015 18:57:16 +0000,<br>

>>> Martin von Zweigbergk wrote:<br>

><br>

>>>> On Mon, Jun 1, 2015 at 11:07 AM FUJIWARA Katsunori <<br>

>>> <a href="mailto:foozy@lares.dti.ne.jp" target="_blank">foozy@lares.dti.ne.jp</a>><br>

>>>> wrote:<br>

>>>><br>

>>>>> At Mon, 01 Jun 2015 17:02:10 +0000,<br>

>>>>> Martin von Zweigbergk wrote:<br>

><br>

> (snip)<br>

><br>

>>>>>    (2) execute `hg merge`, but it is aborted before "update dirstate"<br>

>>>>>        for example:<br>

>>>>>        - invalid updating in subrepositories<br>

>>>>>        - keyboard interruption by user, file I/O error and so on<br>

>>>>>> $ hg --config extensions.abort=$TESTTMP/abort.py merge 5<br>

>>>>><br>

>>>>>    (3) (re-)set timestamp of 'b' on the filesystem for mis-leading<br>

>>>>>        this emulates that `hg merge` update 'b' before<br>

>>> 200001010000(00)+1sec<br>

>>>>>> $ touch -t 200001010000 b<br>

>>>>><br>

>>>><br>

>>>> Since 'hg status' above was run at 200001010000(00)+1sec (or later), how<br>

>>>> can this (interrupted) merge be run at 200001010000(00)?<br>

>>><br>

>>> Oh, your wondering is natural. My explanation was wrong :-<<br>

>>><br>

>>> In real world, steps below reproduce this issue:<br>

>>><br>

>><br>

>> Great explanation! Thanks!<br>

>><br>

>> So the problem is that we mark the file clean in dirstate as a result of<br>

>> checking its status, then we update it on disk, abort the update, write the<br>

>> clean dirstate to disk and stay on the old revision. It seems a little<br>

>> strange to me that we write the dirstate even on failure. Do you know if<br>

>> there is a good reason for that?<br>

><br>

> I don't know actual reason. It may be:<br>

><br>

>    Invalidating in-memory changes at aborting has subsequent hg<br>

>    commands pay cost of "checking dirty-ness" again.<br>

><br>

>    The more "checking dirty-ness" costs, the more and larger possibly<br>

>    dirty files there are.<br>

><br>

> On the other hand, Pierre-Yves suggested me to discard all dirstate<br>

> changes inside transaction at failure in the article about making<br>

> in-memory dirstate changes visible to external process (in this case,<br>

> important point is "dirstate may refer rollback-ed revisions").<br>

><br>

> So, I'll re-examine what we should treat in-memory dirstate changes at<br>

> failure widely (maybe after fixing this issue :-))<br>

><br>

><br>

>> Perhaps another option is to mark all files in the merge normallookup()<br>

>> before applyupdates()? That should mean that an abort later on would be<br>

>> safe, no?<br>

><br>

> `dirstate.normallookup()` always discards all of mode, size and<br>

> timestamp. This forces subsequent status examinations to compare file<br>

> contents until it marked as clean, because lack of "mode and size"<br>

> marks such files as "unsure" until it is marked as "clean".<br>

><br>

> This costs so much with large "possibly dirty" files.<br>

><br>

><br>

> BTW, I hit an idea of dirstate functionality to discard timestamp but<br>

> keep expected mode/size (current `normallookup` discards all of them).<br>

><br>

> When at least one of mode/size of file is changed, this functionality<br>

> can avoid comparison of file content at subsequent status checking.<br>

><br>

><br>

>>>    == @200001010000(00)<br>

>>><br>

>>>    (A) 'b' should be modified at this point, but<br>

>>><br>

>>>    (B) it is assumed that dirstate timestamp of 'b' is -1 at this point<br>

>>><br>

>>>    (C) invoke `hg merge`<br>

>>><br>

>>>      (C-1) get wlock<br>

>>><br>

>>>      (C-2) check uncommitted changes in `merge.update()` by `wc.files()`:<br>

>>><br>

>>>            this indirectly sets dirstate timestamp of 'b' as<br>

>>> 200001010000(00)<br>

>>><br>

>>>              wc.files() => wc._status() => repo.status() => ... =><br>

>>>              wc._dirstatestatus() => wc._checklookup() => dirstate.normal()<br>

>>><br>

>>><br>

>> Would an alternative fix be to write the dirstate here (and after<br>

>> recordupdates(), of course)? Would that be unacceptably slow, perhaps?<br>

><br>

> It should be cheaper than merge/update itself.<br>

><br>

> But we may have to put explicit `dirstate.write()` into many code<br>

> paths for safety. Today, we know that putting it before<br>

> `applyupdates()` avoids this issue. Then, tomorrow, we may have to put<br>

> it into other places to avoid similar issues.<br>

><br>

> So, I want to find generalized fixing.<br>

<br>

Wait, is this series directly related with the dirstate<br>

guard/transaction thing ? (as in, the next step we must do) or is it an<br>

unrelated fix to dirstate.<br></blockquote><div><br></div><div>I think it is related, but I'll let Fujiwara confirm.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

If it is unrelated, we should probably focus on fixing the dirstate<br>

write scheduling before working on this issue.<br></blockquote><div><br></div><div>And what do you recommend if it is related?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

--<br>

Pierre-Yves David<br>

</blockquote></div></div>