<br><br><div class="gmail_quote">On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David <<a href="mailto:pierre-yves.david@ens-lyon.org">pierre-yves.david@ens-lyon.org</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>

# User Pierre-Yves David <<a href="mailto:pierre-yves.david@fb.com" target="_blank">pierre-yves.david@fb.com</a>><br>

# Date 1409847572 -7200<br>

#      Thu Sep 04 18:19:32 2014 +0200<br>

# Node ID 06767ea27aca31cd1e65ccd32c22c4<u></u>5c1ebab83d<br>

# Parent  e63941631a3f61b3323dbcc2545689<u></u>b1eb34e308<br>

graphmod: add a function for topological iteration<br>

<br>

This changesets introduce a function to perform topological (one branch after<br>

the other) iteration over a set of changeset. This first version have a lot of<br>

limitation but the approach should be flexible enough to allow all kind of<br>

improvement in the future. This changeset aims at setting the first stone more<br>

than providing a final solution.<br>

<br>

The algorithm works does not needs to know the whole set of nodes involved<br>

before emitting revision. So make it a good candidate for usage in place like<br>

`hg log` or graphical tools that need a fast first result time.<br>

<br>

diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py<br>

--- a/mercurial/graphmod.py<br>

+++ b/mercurial/graphmod.py<br>

@@ -20,10 +20,160 @@ Data depends on type.<br>

 from mercurial.node import nullrev<br>

 import util<br>

<br>

 CHANGESET = 'C'<br>

<br>

+def topoiter(revs, parentsfunc):<br>

+    """topologically iter over a set of revision, one branch at a time.<br>

+<br>

+    Currently does not handle non contigious <revs> input.<br>

+<br>

+    Currently consider every changeset under a merge to be on the same branch<br>

+    using revision number to sort them.<br></blockquote><div><br></div><div>Any plans for how to extend the algorithm for that case? Not asking you to code it up, just checking that it's not too hard to extend for that case.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+<br>

+    Could be easily extend to give priority to an initial branch.<br>

+<br>

+    The revision are emitted heads to roots."""<br>

+    ### Quick summary of the algorithm<br>

+    #<br>

+    # This function is based around a "retention" principle. We keep revision<br>

+    # in memory until we are ready to emit a whole branch that immediately<br>

+    # "merge" into an existing one. This reduce the number of branch "ongoing"<br>

+    # at the same time.<br>

+    #<br>

+    # During iteration revs are split into two groups:<br>

+    # A) revision already emitted<br>

+    # B) revision in "retention". They are stored as different subgroup.<br>

+    #<br>

+    # for each REV, we do the follow logic:<br>

+    #<br>

+    #   if REV is a parent of (A), we will emit it. But before emitting it,<br>

+    #   we'll "free" all the revs from subgroup in (B) that were waiting for<br>

+    #   REV to be available.<br>

+    #<br>

+    #   else, we'll search for a subgroup if (B) awaiting for this revision to<br>

+    #   be available, if such group exist, we add REV to it and the subgroup is<br>

+    #   now awaiting for REV.parents() to be available.<br>

+    #<br>

+    #   finally if no such group existed in (B), we create a new subgroup.<br>

+    #<br>

+    #<br>

+    # To bootstrap the algorithm, we display the first revision we saw.<br></blockquote><div><br></div><div>What does this line refer to? There is no explicit bootstrapping here. It seems like it flows from the "if no unblocked revisions exist, pick arbitrary group", but I'm not sure.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+<br>

+    revs.sort(reverse=True)<br></blockquote><div><br></div><div>For someone not familiar with the revset (?) implementation, this seems to go against the goal of incrementally producing the revisions. Is "sort(reverse=True)" simply not doing anything to a lazy revset that is not already explicitly sorted in some other order?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+<br>

+    # set of parents of revision that have been yield. They can be considered<br>

+    # unblocked as the graph generator is already aware of them so there is no<br>

+    # need to delay the one that reference them.<br>

+    unblocked = set()<br>

+<br>

+    # list of group waiting to be displayed, each group is defined by:<br>

+    #<br>

+    #   (revs:    lists of revs waiting to be displayed,<br>

+    #    blocked: set of that cannot be displayed before those in the sets)<br>

+    #<br>

+    # The second value  correspond to parents of any revision in the group<br>

+    # that is not itself contained in the group. The main idea of this<br>

+    # algorithm is to delay as much as possible the emission of any revision.<br>

+    # This means waiting for the moment we are about to display theses<br>

+    # parents to display the revs in a group.<br>

+    #<br>

+    # This first implementation is smart until it meet a merge: it will<br>

+    # emit revs as soon as any parents is about to be emitted and can<br>

+    # grow an arbitrary number of revs in `blocked`. In practice this mean<br>

+    # we properly retains new branches but does not any special ordering<br>

+    # for ancestors of merges. The implementation can be improved to handle<br>

+    # this better.<br>

+    #<br>

+    # the first group is a special group. It correspond to all the revision<br>

+    # that were already emitted. the <revs> lists is expected to be empty<br>

+    # and the <blocked> set contains the parents revisions of already emitted<br>

+    # revision.<br>

+    #<br>

+    # You could pre-seed the <parents> set of groups[0] to a specific<br>

+    # changesets to select what the first emitted branch should be.<br>

+    #<br>

+    # We do not support revisions will hole yet, but adding such support<br>

+    # would be easy. The iteration will have to be done using both input<br>

+    # revision and parents (see cl.ancestors function + a few tweaks) but<br>

+    # only revisions parts of the initial set should be emitted.<br>

+    groups = [([], unblocked)]<br>

+    for current in revs:<br>

+        # Look for a group awaiting on the current revision.<br>

+        matching = [i for i, g in enumerate(groups) if current in g[1]]<br>

+<br>

+        if matching:<br>

+            # The main idea is to gather together all set that await on the<br>

+            # same revision.<br>

+            #<br>

+            # this merging is done at the time we are about to add this common<br>

+            # awaited to the group for simplicity purpose. Such merge could<br>

+            # happen sooner when we update the "blocked" set of revision.<br>

+            #<br>

+            # We also always keep the oldest group first. We can<br>

+            # probably improve the behavior by having the longuest set<br>

+            # first. That way, graph algorythms could minimise the<br>

+            # length of parallele lines their draw. This is currently<br>

+            # not done.<br>

+            targetidx = matching.pop(0)<br>

+            trevs, tparents = groups[targetidx]<br>

+            for i in matching:<br>

+                gr = groups[i]<br>

+                trevs.extend(gr[0])<br>

+                tparents |= gr[1]<br>

+            # delete all merged groups (but the one we keep)<br>

+            # (starting from the last group for performance and sanity reason)<br>

+            for i in reversed(matching):<br>

+                del groups[i]<br>

+        else:<br>

+            # his is a new head we create a new group for it.<br>

+            targetidx = len(groups)<br>

+            groups.append(([], set([current])))<br>

+<br>

+        gr = groups[targetidx]<br>

+<br>

+        # We now adds the current nodes to this groups. This is done after<br>

+        # the group merging because all elements from a group that relied on<br>

+        # this rev must preceed it.<br>

+        #<br>

+        # we also update the <parents> set to includes the parents on the<br>

+        # new nodes.<br>

+        gr[0].append(current)<br>

+        gr[1].remove(current)<br>

+        gr[1].update([p for p in parentsfunc(current) if p > nullrev])<br>

+<br>

+        # Look for a group to display<br>

+        #<br>

+        # When unblocked is empty (if clause), We are not waiting over any<br>

+        # revision during the first iteration (if no priority was given) or if<br>

+        # we outputed a whole disconnected sets of the graph (reached a root).<br>

+        # In that case we arbitrarily takes the oldest known group. The<br>

+        # heuristique could probably be better.<br>

+        #<br>

+        # Otherwise (elif clause) this mean we have some emitted revision.<br>

+        # if the group awaits on the same revision that the outputed ones,<br>

+        # we can safely output it.<br>

+        if not unblocked:<br>

+            if len(groups) > 1:  # display other subset<br>

+                targetidx = 1<br>

+                gr = groups[1]<br>

+        elif not gr[1] & unblocked:<br>

+            gr = None<br>

+<br>

+        if gr is not None:<br>

+            # update the set of awaited revisions with the one from the group<br>

+            unblocked |= gr[1]<br>

+            # output all revisions in the group<br>

+            for r in gr[0]:<br>

+                yield r<br>

+            # deleted the group that you just outputed.<br>

+            # unless it is group[0] in which case you just empty it.<br>

+            if targetidx:<br>

+                del groups[targetidx]<br>

+            else:<br>

+                gr[0][:] = []<br>

+<br>

 def dagwalker(repo, revs):<br>

     """cset DAG generator yielding (id, CHANGESET, ctx, [parentids]) tuples<br>

<br>

     This generator function walks through revisions (which should be ordered<br>

     from bigger to lower). It returns a tuple for each node. The node and parent<br>

______________________________<u></u>_________________<br>

Mercurial-devel mailing list<br>

<a href="mailto:Mercurial-devel@selenic.com" target="_blank">Mercurial-devel@selenic.com</a><br>

<a href="http://selenic.com/mailman/listinfo/mercurial-devel" target="_blank">http://selenic.com/mailman/<u></u>listinfo/mercurial-devel</a><br>

</blockquote></div>