State of Mercurial at Mozilla

Sun Nov 22 20:30:51 UTC 2015

Mozilla has been a long-time user of Mercurial. At the Munich sprint in
September 2014, I gave an overview of the state of Mercurial at Mozilla.
About a year has passed and I'd like to do something similar here since I
didn't do it at the London sprint last month.

This post is long and rambling. The tl;dr is Mercurial is currently capable
of being awesome [at Mozilla], but achieving nirvana is too difficult, as
it requires heavy customization and reliance on potentially unsupported 3rd
party tools and extensions.

Overall, I'd say the state of Mercurial at Mozilla has improved
substantially over the past year.

We were encountering numerous scaling pains on both the client and server a
year ago due to the size of the Firefox repository. The improvements to
tags and branch caches, the ability to seed clones from pre-generated
bundles, and various performance enhancements around revsets, phases,
discovery, etc. have done wonders for scaling hg.mozilla.org and for making
interactions with it more pleasant. Core improvements and Facebook's
hgwatchman extension have substantially improved the client-side experience
for `hg status` and other commands. The blackbox extension has proved
invaluable at tracking down performance issues. And a steady stream of bug
fixes, performance improvements, and new features in core and the bundled
extensions have resulted in people having a much more positive opinion
about Mercurial than a year ago.

While Mercurial has been moving in a positive direction on several fronts
(and I don't want to take away from this), there are still a number of
areas for improvement. I'd like to describe some of them here.

It's worth noting that most Mercurial use at Mozilla is related to Firefox
development. And most of that revolves around the mozilla-central
repository, which currently has ~273,000 changesets, ~273,000 manifests,
~227,000 filelogs, 1,555,000 file revisions, and ~130,000 files in the tip
manifest comprising ~910 MB checked out. A tar.gz of the working copy is
~254 MB. A gzip bundle is ~1,240 MB (wire transfer size) and on-disk file
size is ~1,700 MB.
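
For a rough sense of scale, the figures above can be turned into a few derived ratios. This is only back-of-the-envelope arithmetic on the approximate numbers quoted in this post; the variable names are mine and nothing here queries a real repository.

```python
# Approximate mozilla-central figures quoted above (November 2015).
changesets = 273_000
filelogs = 227_000
file_revisions = 1_555_000
tip_files = 130_000
checkout_mb = 910
gzip_bundle_mb = 1_240

# Average number of revisions stored per tracked path (filelog).
revisions_per_filelog = file_revisions / filelogs

# Average number of file revisions introduced per changeset.
file_revisions_per_changeset = file_revisions / changesets

# Share of all paths ever tracked that are still present in the tip manifest.
live_path_fraction = tip_files / filelogs

# Average size of a checked-out file in KB (taking 1 MB as 1000 KB).
avg_file_kb = checkout_mb * 1000 / tip_files

# How much larger the full-history gzip bundle is than a single checkout.
bundle_to_checkout = gzip_bundle_mb / checkout_mb

print(f"{revisions_per_filelog:.2f} revisions per filelog")
print(f"{file_revisions_per_changeset:.2f} file revisions per changeset")
print(f"{live_path_fraction:.0%} of tracked paths exist in tip")
print(f"{avg_file_kb:.1f} KB average checked-out file")
print(f"{bundle_to_checkout:.2f}x bundle size relative to checkout")
```

The takeaway is that the repository is wide (many small files) rather than deep (few revisions per file), which is part of why per-file operations and working copy updates dominate the performance discussion below.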

A few months ago, Mozilla conducted a survey of its engineers on developer
workflow matters. There was a comprehensive section on version control. 182
people left feedback about Mercurial-related questions. Much of the
information I present below is derived from their responses.

The following are what I perceive to be the most significant concerns about
Mercurial at Mozilla. In no particular order:

* Commit/Patch workflows (bookmarks vs branches vs MQ vs topics vs nameless
heads vs ...)
* Default settings insufficient / requires too much effort to configure for
optimal usage
* Performance on day to day operations (status, blame, diff, rebase,
histedit)
* Cloning/pulling/pushing large repositories (lack of usable shallow clone,
narrow clone, ability to resume partial clones)
* Windows support - especially performance
* Lack of popularity / "It's not Git/GitHub"

Many of these topics are related. For example, commit workflow is strongly
coupled with what extensions are enabled and different workflows have
different performance impacts.

In the aforementioned survey, we asked an open-ended question to describe
the "biggest complaint about Mercurial." Most responses can be summarized
as:

* Commit/patch workflows
* Performance

Many of the specific answers referenced Git. A lot of people believe that
Git is faster, better, etc.

I'd like to focus a bit on workflow issues. The aforementioned survey asked
which workflows people practiced:

65.7% MQ
29.8% Import/export from/to Git
21.5% Bookmarks
12.7% Evolve / Changeset evolution
12.2% Nameless heads
 7.2% Other
 5.5% Branches

A subsequent question asked for specific thoughts on MQ:

44.4% I use it regularly
39.9% Resolving conflicts is annoying
29.8% I tolerate using it, despite deficiencies
22.5% I used it previously but have stopped
15.7% I can't live without it
15.2% I love it
14.6% MQ is silly and should be avoided

So, we have somewhere between 44% and 66% of our Mercurial users using MQ.
I reckon a lot of this usage is historical: Mozilla was using Mercurial
before bookmarks and all the history rewriting facilities existed. So MQ
was the only game in town. We all know that MQ can result in a bad
experience, especially on large repositories like Firefox. `hg qpop` after
a `hg pull` of a thousand changesets is not pleasant. The lack of sane
merge conflict resolution is horrendous. Yet, a number of our users enjoy
the mental simplicity of MQ. And they have a point.

Assuming someone is sticking to the core supported workflows (bookmarks,
branches, or nameless heads), they will have a hard time with Firefox
development. Without obsolescence markers enabled, `hg rebase` or `hg
histedit` after a `hg pull` requires stripping the repo, reapplying data,
then applying the rewritten changesets. This is *worse* than MQ's similar
performance issue because at least with MQ you can pop everything before
pulling to avoid the expensive strip.

For people to enjoy the freedom and power of the core supported workflows
without the performance issues from stripping during history rewriting, you
need to install the experimental evolve extension. Of course, this
extension introduces completely new (and complicated) workflows that people
must learn. And it has an "experimental" label, making people even more
uncomfortable. For a lot of our users, the easiest choices are MQ or Git.

I would absolutely love it if there were a way to turn on obsolescence
markers / changeset hiding without introducing the new workflows from
evolve. This would provide strip-free history rewriting without the added
cognitive load. I know the "inhibit" extension does something like this. But
Pierre-Yves insists that it isn't appropriate for general deployment.

While many of our performance complaints stem from non-evolve users, there
are still other areas of performance concern. Because it isn't bundled with
Mercurial, a number of users don't have the amazing hgwatchman extension
installed. This includes our entire Windows developer base (although we
will be bundling hgwatchman as an experimental feature in our Windows
development environment soon). Speaking of Windows, there are a number of
performance pain points there. I've written previously about I/O issues
writing/closing thousands of files. `hg update` still uses a single
process. And Windows doesn't get the optimization that other platforms do.
We all know Python startup overhead hurts latency of hg commands. I think
the lack of "snappiness" running commands contributes to the perception
that Mercurial is slow[er than Git]. chg, of course, largely makes this
issue go away. I should also briefly mention blame/annotate. A number of
our users frequently perform blame as part of investigating changes made
over the course of several years, even over a decade. There are a number of
performance and usability issues with blame that make these workflows
harder than they could be. (See BlamePlan on the wiki for some ideas.)
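
To put a number on the startup overhead mentioned above, here is a minimal sketch that times how long a bare Python interpreter takes to start and exit. The helper name `median_startup_seconds` is mine, and the result will vary a lot by machine and platform, so treat it as a way to measure rather than as a benchmark.

```python
import subprocess
import sys
import time


def median_startup_seconds(argv, runs=5):
    """Return the median wall-clock time to run `argv` to completion.

    For an even number of runs this returns the upper of the two
    middle values, which is close enough for a rough comparison.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


# Baseline: an interpreter that starts and exits immediately.
baseline = median_startup_seconds([sys.executable, "-c", "pass"])
print(f"bare interpreter startup: {baseline * 1000:.1f} ms")
```

Passing `["hg", "version"]` instead (assuming `hg` is on the PATH) gives the figure for a trivial Mercurial command. The difference between the two is roughly what Mercurial itself adds on top of interpreter startup.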

Cloning can be problematic for a number of our users. We have community
volunteers all across the world that want to contribute to Firefox and
other Mozilla projects. Cloning 1+ GB of data over a slow or unreliable
network connection can be painful. The work I did around seeding clones
from pre-generated bundle files is helping: we put our bundles on a CDN and
people can use curl, wget, etc. to resume downloads if the transparent `hg
clone` via the CDN doesn't work. But it would be really nice if Mercurial
could do incremental clones or not lose data when a clone/pull is aborted
due to network hiccup. Even better would be native support for shallow and
narrow clone. Remotefilelog appears to be a non-starter for developers
because it requires a manual garbage collection mechanism on the client.
(It is probably suitable for automation and managed machines, however.) And
narrow clone is still in the works. While things here aren't terrific, I
think we're on the right trajectory for making cloning and pulling large
repositories more pleasant.
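
The manual fallback described above (resuming a bundle download with curl or wget) relies on HTTP range requests. Here is a minimal sketch of the same idea in Python, assuming the server hosting the bundle honors `Range` headers. The function names are hypothetical, and any URL passed in is up to the caller; this is not how `hg clone` itself fetches bundles.

```python
import os
import urllib.request


def resume_headers(partial_path):
    """Build the HTTP headers needed to continue a partial download."""
    try:
        offset = os.path.getsize(partial_path)
    except OSError:
        offset = 0  # nothing downloaded yet
    # "bytes=N-" asks the server for everything from byte N onward.
    return {"Range": f"bytes={offset}-"} if offset else {}


def fetch_bundle(url, dest, chunk_size=1 << 20):
    """Download `url` to `dest`, appending to any existing partial file.

    If `dest` is already complete, a server will typically answer 416 and
    urlopen raises HTTPError; this sketch leaves that to the caller.
    """
    request = urllib.request.Request(url, headers=resume_headers(dest))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honored the range. 200 means it sent the
        # whole file again, so overwrite instead of appending.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Once the bundle file is complete, it can be applied to an empty repository created by `hg init` using `hg unbundle`, followed by a normal `hg pull` to fetch whatever landed after the bundle was generated.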

Another major concern our users face is configuring Mercurial. With all the
configuration options and extensions, I feel Mercurial is awesome. However,
configuring it is a herculean task. My ~/.hgrc is 156 lines. I have
something like 30 extensions enabled (OK, some of them are developer
extensions). The settings are curated from years of constant tinkering.
Some of the settings I only know about because I follow this developer list
and 3rd party projects like Facebook's repositories on Bitbucket. New users
don't stand a chance of having what I consider a reasonable experience out
of the box, especially if you care about performance and moving fast. I
understand the need for KISS for first time VCS users and why hg is bare
bones out of the box. But the steps required for "Mercurial for power/Git
users" are not very well defined and are difficult to achieve. As much as we
take pride in the simplicity of the CLI, getting to a modern and expected
configuration is complex. Users with version control knowledge (especially
Git) think Mercurial doesn't have sufficient features or is too complicated
to configure. It doesn't "just work."
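
To give a flavor of what even the simplest part of that configuration looks like, here is a minimal sketch that renders an hgrc enabling a handful of extensions that ship with Mercurial. The list is illustrative only and is not my actual configuration or a Mozilla recommendation; `render_hgrc` and the example username are made up for this sketch. Third-party extensions would additionally need a path to wherever they are installed, which this does not attempt.

```python
import configparser
import io

# Extensions bundled with Mercurial; an empty value is enough to enable them.
BUNDLED_EXTENSIONS = ["blackbox", "color", "histedit", "pager", "rebase"]


def render_hgrc(extensions, username=None):
    """Return hgrc text enabling `extensions`, optionally setting ui.username."""
    # Disable interpolation so a literal "%" in a value is written as-is.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as given
    if username:
        config["ui"] = {"username": username}
    config["extensions"] = {name: "" for name in extensions}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_hgrc(BUNDLED_EXTENSIONS, username="Jane Doe <jane@example.com>"))
```

This covers only the `[extensions]` section. The per-extension settings, aliases, and performance knobs are where most of the 156 lines (and the years of tinkering) go.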

Contributing to the configuration problem is that many extensions are 3rd
party or marked as experimental. A lot of people are uneasy using something
that they feel may break on next upgrade. It's a reasonable concern. For
example, Facebook's hg-experimental repository contains a ton of useful,
generic, and mostly stable extensions. But, right there in the name it says
"experimental." And, Facebook has the luxury of controlling the dev
environments and dropping backwards compatibility more aggressively than I
can at Mozilla (we try to support Mercurial versions from the past year).
There are many good ideas and improvements in the land of 3rd party
extensions. However, using them can be risky and support can be a mess. I
cannot underscore this point enough.

In summary, I believe significant gains have been made to Mercurial the
core tool, Mercurial the open source project and ecosystem, and Mercurial
at Mozilla. I feel solutions to the remaining significant pains are either
available or are under development. However, for the solutions that exist,
it is difficult to employ them because achieving an optimal Mercurial
distribution/configuration is just too difficult. This is painfully obvious
in the open source realm, where there is no strong control over machine
environments or client configurations.

If I could request a single thing to improve the state of Mercurial at
Mozilla, it would be to make achieving an optimal client configuration
easier. This could be done by moving some extensions into core (pager and
color come to mind), more aggressively bundling successful 3rd party tools
and extensions (like chg, hgwatchman, evolve, remotefilelog, smartlog, etc.)
(even if they are marked as experimental and subject to change, having them
in tree would make the support situation much better), providing a
mechanism for servers to advertise recommended extensions and settings
(I've experimented with this previously and it is achievable with bundle2
now), and providing configuration wizards and prompts to help people obtain
the configuration they want.

I could write several more paragraphs with gobs more detail. But
I think I hit all the major points I wanted to hit. I hope others find this
useful. And hopefully it goes without saying, I want to help and I'm
committed to helping where I can. I'd like to close by reiterating that
Mercurial has been moving forward very rapidly over the past year and the
general sentiment towards Mercurial at Mozilla has improved over that time.
Keep up the great work!

Gregory