Call for testing: generaldelta
Matt Mackall
mpm at selenic.com
Mon Jul 25 13:40:32 CDT 2011
Mercurial 1.9 includes an experimental feature called 'generaldelta'
that should improve compression in repositories with lots of branching.
Please help us test it so that we can work towards making it the
default.
Things we'd like to evaluate:
- how significant the compression improvements are
- how much overhead there is when communicating with older clients
- (advanced) which compression window size gives the best trade-off
== Evaluating compression ==
Do two clones:
$ hg clone -U --pull proj proj-normal
$ hg clone -U --pull --config format.generaldelta=1 proj proj-gdelta
Then compare their sizes:
(Unix) $ du -sh proj-normal proj-gdelta
31M proj-normal
26M proj-gdelta
And compare their manifest sizes:
$ ls -l proj-normal/.hg/store/00manifest.*
-rw-r--r-- 1 1000 1000 6043911 Jul 25 13:18 proj-normal/.hg/store/00manifest.d
-rw-r--r-- 1 1000 1000 955648 Jul 25 13:18 proj-normal/.hg/store/00manifest.i
$ ls -l proj-gdelta/.hg/store/00manifest.*
-rw-r--r-- 1 1000 1000 3197528 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.d
-rw-r--r-- 1 1000 1000 955648 Jul 25 13:15 proj-gdelta/.hg/store/00manifest.i
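If you'd rather report a single number than paste listings, something like the following may help. It's only a sketch: it assumes the two clones are named proj-normal and proj-gdelta as above, and the helper names manifest_sizes and percent_saved are made up for illustration.

```python
import os


def manifest_sizes(repo):
    """Return {filename: size in bytes} for the manifest revlog files in repo."""
    store = os.path.join(repo, ".hg", "store")
    sizes = {}
    for name in ("00manifest.d", "00manifest.i"):
        path = os.path.join(store, name)
        # The .d file is absent when the revlog is small enough to be inline.
        if os.path.exists(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def percent_saved(before, after):
    """Percentage by which `after` is smaller than `before`."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Clone names are assumed to match the commands above.
    normal = manifest_sizes("proj-normal")
    gdelta = manifest_sizes("proj-gdelta")
    for name in sorted(normal):
        if name in gdelta:
            print("%s: %d -> %d bytes (%.1f%% smaller)" % (
                name, normal[name], gdelta[name],
                percent_saved(normal[name], gdelta[name])))
```

With the sizes listed above, 00manifest.d comes out roughly 47% smaller, while 00manifest.i is unchanged.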
The manifest revlog statistics may also be valuable (shown here for the
generaldelta clone):
$ hg debugrevlog -m
format : 1
flags  : generaldelta

revisions     :    14932
    merges    :     1763 (11.81%)
    normal    :    13169 (88.19%)
revisions     :    14932
    full      :       61 ( 0.41%)
    deltas    :    14871 (99.59%)
revision size :  3197528
    full      :   744577 (23.29%)
    deltas    :  2452951 (76.71%)

avg chain length  : 172
compression ratio : 229

uncompressed data size (min/max/avg) : 125 / 80917 / 49156
full revision size (min/max/avg)     : 113 / 37284 / 12206
delta size (min/max/avg)             : 0 / 27029 / 164

deltas against prev  : 13770 (92.60%)
    where prev = p1  : 13707 (99.54%)
    where prev = p2  :     8 ( 0.06%)
    other            :    55 ( 0.40%)
deltas against p1    :  1097 ( 7.38%)
deltas against p2    :     4 ( 0.03%)
deltas against other :     0 ( 0.00%)
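For comparing several repositories, it can be handy to pull a couple of headline numbers out of that output. Here's a minimal sketch, assuming the output keeps the "label : value" layout shown above; parse_debugrevlog is a made-up helper name, and it deliberately looks only at labels that occur once, since "revisions", "full" and "deltas" appear more than once.

```python
import re


def parse_debugrevlog(text, keys=("avg chain length", "compression ratio")):
    """Pick selected 'label : integer' lines out of debugrevlog output."""
    found = {}
    for line in text.splitlines():
        # Split on the first colon only; the label is everything before it.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in keys:
            m = re.match(r"\s*(\d+)", value)
            if m:
                found[label] = int(m.group(1))
    return found


if __name__ == "__main__":
    sample = "avg chain length : 172\ncompression ratio : 229\n"
    print(parse_debugrevlog(sample))
```

Feed it the captured output of 'hg debugrevlog -m' from each clone and compare the resulting values side by side.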
== Evaluating performance ==
Servers hosting generaldelta repositories will reorder changesets on
the fly to improve compression and streaming performance over the
existing wire protocol. So we'd like to see three results:
- cloning from old to old (baseline):
$ hg clone --time -U --pull proj-normal proj-normal-normal
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)
- cloning from new to old:
$ hg clone --time -U --pull proj-gdelta proj-gdelta-normal
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)
- cloning from new to new:
$ hg clone --time -U --pull --config format.generaldelta=1 proj-gdelta proj-gdelta-gdelta
requesting all changes
adding changesets
adding manifests
adding file changes
added 14938 changesets with 29187 changes to 2054 files
Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)
Then compare the sizes again:
$ du -sh proj-normal-normal proj-gdelta-normal proj-gdelta-gdelta
31M proj-normal-normal
27M proj-gdelta-normal
26M proj-gdelta-gdelta
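To express the clone timings as overhead relative to the baseline, a small script like this may be enough. It's a sketch that assumes the "Time: real ... secs" line format shown in the transcripts above; real_seconds and overhead_percent are hypothetical helper names.

```python
import re


def real_seconds(output):
    """Extract the wall-clock seconds from the 'Time: real ...' line of --time output."""
    m = re.search(r"Time: real ([0-9.]+) secs", output)
    if m is None:
        raise ValueError("no 'Time: real' line found")
    return float(m.group(1))


def overhead_percent(baseline, other):
    """How much slower `other` is than `baseline`, as a percentage."""
    return 100.0 * (other - baseline) / baseline


if __name__ == "__main__":
    # Timing lines copied from the three example clones above.
    base = real_seconds("Time: real 10.420 secs (user 10.060+0.000 sys 0.340+0.000)")
    new_old = real_seconds("Time: real 13.030 secs (user 12.560+0.000 sys 0.410+0.000)")
    new_new = real_seconds("Time: real 16.620 secs (user 16.160+0.000 sys 0.390+0.000)")
    print("new to old: %.1f%% slower" % overhead_percent(base, new_old))
    print("new to new: %.1f%% slower" % overhead_percent(base, new_new))
```

For the example runs, that works out to about 25% overhead for new-to-old and about 60% for new-to-new.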
== Evaluating window size ==
Tweaking the compression window size can potentially have a large impact
on resulting size, but right now tuning it requires hacking the source.
Around line 1051 in mercurial/revlog.py is the following magic constant:
    if d is None or dist > textlen * 2:
        text = buildtext()
        data = compress(text)
Changing that "* 2" to values between 3 and 10 will change the
compression/performance trade-off and may result in large improvements
in generaldelta compression in repositories with lots of branching. If
that's you, give it a try and tell us what you find. Again, the output
of 'hg debugrevlog -m' may be valuable.
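To get a feel for what the constant does before touching revlog.py, here is a standalone sketch of the quoted test. It is not Mercurial code: store_full and its factor parameter are invented for illustration, and it assumes dist is the number of bytes spanned on disk by the delta chain and textlen is the length of the uncompressed revision. The numbers are made up.

```python
def store_full(delta, dist, textlen, factor=2):
    """Return True when the quoted test would store a full revision.

    delta   -- the candidate delta, or None if none could be built
    dist    -- assumed: bytes spanned on disk by the delta chain
    textlen -- assumed: length of the uncompressed revision text
    factor  -- stands in for the hard-coded "* 2"
    """
    return delta is None or dist > textlen * factor


if __name__ == "__main__":
    # Made-up numbers: a 50 kB manifest whose chain spans 250 kB.
    for factor in (2, 3, 5, 10):
        full = store_full(b"delta", 250000, 50000, factor)
        print("factor %2d -> %s" % (factor, "full revision" if full else "delta"))
```

A larger factor lets chains span more data before a full revision is forced, which should mean fewer full revisions on disk at the cost of more data to read back when reconstructing a revision.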
--
Mathematics is the supreme nostalgia of our time.