Call for testing: generaldelta

Matt Mackall mpm at selenic.com
Mon Jul 25 13:40:32 CDT 2011


Mercurial 1.9 includes an experimental feature called 'generaldelta'
that should improve compression in repositories with lots of branching.
Please help us test it so that we can work towards making it the
default.

Things we'd like to evaluate:

- how significant the compression improvements are
- how much overhead there is when communicating with older clients
- (advanced) which compression window size gives the best trade-off

== Evaluating compression: ==

Do two clones:

$ hg clone -U --pull proj proj-normal
$ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta

Then compare their sizes:

(Unix) $ du -sh proj-normal proj-gdelta
31M	proj-normal
26M	proj-gdelta

And compare their manifest sizes:

$ ls -l proj-normal/.hg/store/00manifest.*
-rw-r--r-- 1 1000 1000 6043911 Jul 25 13:18 proj-normal/.hg/store/00manifest.d
-rw-r--r-- 1 1000 1000  955648 Jul 25 13:18 proj-normal/.hg/store/00manifest.i
$ ls -l proj-gdelta/.hg/store/00manifest.*
-rw-r--r-- 1 1000 1000 3197528 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.d
-rw-r--r-- 1 1000 1000  955648 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.i
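
If you'd like to report the difference as a percentage, a quick Python
sketch like the following may help. It is not part of Mercurial, and the
helper name percent_saved is made up for illustration; the two sizes are
the 00manifest.d sizes from the listing above.

```python
def percent_saved(normal_bytes, gdelta_bytes):
    """Return how much smaller the generaldelta file is, in percent."""
    if normal_bytes <= 0:
        raise ValueError("normal_bytes must be positive")
    return 100.0 * (normal_bytes - gdelta_bytes) / normal_bytes

# Sizes of 00manifest.d copied from the sample ls output.
manifest_saving = percent_saved(6043911, 3197528)
print("manifest data: %.1f%% smaller" % manifest_saving)
```

For the sample repository this comes out at roughly 47% for the manifest
data alone; your numbers will depend on how branchy your history is.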

The output of 'hg debugrevlog -m' (here run in proj-gdelta) may also be
valuable:

$ hg debugrevlog -m
format : 1
flags  : generaldelta

revisions     :   14932
    merges    :    1763 (11.81%)
    normal    :   13169 (88.19%)
revisions     :   14932
    full      :      61 ( 0.41%)
    deltas    :   14871 (99.59%)
revision size : 3197528
    full      :  744577 (23.29%)
    deltas    : 2452951 (76.71%)

avg chain length  : 172
compression ratio : 229

uncompressed data size (min/max/avg) : 125 / 80917 / 49156
full revision size (min/max/avg)     : 113 / 37284 / 12206
delta size (min/max/avg)             : 0 / 27029 / 164

deltas against prev  : 13770 (92.60%)
    where prev = p1  : 13707     (99.54%)
    where prev = p2  :     8     ( 0.06%)
    other            :    55     ( 0.40%)
deltas against p1    :  1097 ( 7.38%)
deltas against p2    :     4 ( 0.03%)
deltas against other :     0 ( 0.00%)
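
Before posting your own numbers, it can be worth checking that the
breakdowns add up. The sketch below does that for the sample output
above, with the figures copied in by hand. The variable names are
invented here, and the comment about classic revlogs is an assumption
about how the non-generaldelta format behaves.

```python
# Figures copied from the sample 'hg debugrevlog -m' output.
revisions = 14932
merges, normal = 1763, 13169
full_revs, delta_revs = 61, 14871
total_size, full_size, delta_size = 3197528, 744577, 2452951
against_prev, against_p1, against_p2, against_other = 13770, 1097, 4, 0

# Each breakdown should add up to its total.
assert merges + normal == revisions
assert full_revs + delta_revs == revisions
assert full_size + delta_size == total_size
assert against_prev + against_p1 + against_p2 + against_other == delta_revs

# Deltas whose base is not the previous revision in the revlog.
# Assumption: a classic (non-generaldelta) revlog always deltas against
# the previous revision, so these are the deltas the new format adds.
non_prev = against_p1 + against_p2 + against_other
non_prev_pct = 100.0 * non_prev / delta_revs
print("deltas not against prev: %d (%.2f%%)" % (non_prev, non_prev_pct))
```

A higher share of deltas against p1 or p2 suggests the repository has
more interleaved branches for generaldelta to exploit.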

== Evaluating performance: ==

Servers serving generaldelta repositories reorder changesets on the fly
to improve compression and streaming performance over the existing wire
protocol, so we'd like to see three results:

- cloning from old to old (baseline):
$ hg clone --time -U --pull proj-normal proj-normal-normal
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)

- cloning from new to old:
$ hg clone --time -U --pull proj-gdelta proj-gdelta-normal
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)

- cloning from new to new:
$ hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)

And then, compare the sizes again:

$ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
31M	proj-normal-normal
27M	proj-gdelta-normal
26M	proj-gdelta-gdelta
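
To compare the three clone timings, something like the following sketch
can pull the 'real' figure out of each 'Time:' line and express it
relative to the baseline. The function name real_seconds and the labels
are made up for this example; the three lines are the sample timings
from the clones above.

```python
import re

def real_seconds(time_line):
    """Extract the 'real' wall-clock seconds from an 'hg --time' line."""
    match = re.search(r"real ([0-9.]+) secs", time_line)
    if match is None:
        raise ValueError("no 'real ... secs' field in %r" % time_line)
    return float(match.group(1))

# 'Time:' lines copied from the three sample clones.
timings = {
    "normal -> normal": "Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)",
    "gdelta -> normal": "Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)",
    "gdelta -> gdelta": "Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)",
}

baseline = real_seconds(timings["normal -> normal"])
overhead = {}
for label in sorted(timings):
    secs = real_seconds(timings[label])
    # Extra wall-clock time relative to the old-to-old clone, in percent.
    overhead[label] = 100.0 * (secs - baseline) / baseline
    print("%-17s %7.3f s  %+6.1f%% vs baseline" % (label, secs, overhead[label]))
```

On the sample numbers this works out to roughly 25% extra time when
cloning from new to old and roughly 60% when cloning from new to new.
A single run is noisy, so repeating each clone a few times is a good
idea before drawing conclusions.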

== Evaluating window size: ==

Tweaking the compression window size can potentially have a large impact
on resulting size, but right now tuning it requires hacking the source.
Around line 1051 in mercurial/revlog.py is the following magic constant:

        if d is None or dist > textlen * 2:
            text = buildtext()
            data = compress(text)

Changing that "* 2" to values between 3 and 10 will change the
compression/performance trade-off and may result in large improvements
in generaldelta compression in repositories with lots of branching. If
that's you, give it a try and tell us what you find. Again, the output
of 'hg debugrevlog -m' may be valuable.
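
To get a feel for what the constant does, here is a standalone sketch of
the same test. It is not the actual revlog code: should_store_full and
its factor parameter are hypothetical names, and it assumes that dist is
the number of bytes spanned by the delta chain and textlen the size of
the full text.

```python
def should_store_full(delta, dist, textlen, factor=2):
    """Mimic the quoted check: store a full revision when no delta is
    available or when dist exceeds factor * textlen.

    'factor' is a hypothetical stand-in for the hard-coded 2.
    """
    return delta is None or dist > textlen * factor

# Made-up numbers: a 50000-byte text whose delta chain spans 120000 bytes.
for factor in (2, 3, 10):
    print(factor, should_store_full(b"some delta", 120000, 50000, factor))
```

With these made-up numbers, a factor of 2 forces a full revision while 3
and 10 keep the delta. Larger factors mean fewer full revisions and
better compression, at the cost of reading more data to reconstruct a
revision.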

-- 
Mathematics is the supreme nostalgia of our time.



