best way to clone repo on 'central' server

Thu Mar 10 04:34:49 CST 2011

On 2011-03-10 10:03, Felix Dorner wrote:
> Hi,
> 
> I hope no one is disturbed if I elaborate a bit.

I'm not :)

> On Tue, Mar 1, 2011 at 7:22 PM, Kevin Bullock
> <kbullock+mercurial at ringworld.org> wrote:
>> On Mar 1, 2011, at 11:54 AM, Ryan wrote:
>>
>>> b) clone the repository as a file clone:
>>>
>>> hg clone /var/hg/repos/proj1 /var/hg/repos/proj1-branch
>>
>> You should do it this way to take advantage of hardlinking. When cloning locally (on the same disk), Mercurial will use hardlinks for the repository (basically everything under .hg/) to save space. It will then automatically break these hardlinks when the clones diverge.
> 
> 
> Say I have restricted filesystem permissions on the origin repository
> /var/hg/stable: Only members of a certain group "restricted" have
> write access. Now someone wants a new repo cloned off this one. As has
> been previously explained, if the person cannot lock the repo, no
> hardlinks will be used. So instead of cloning the repo himself, he
> asks a member of the group that _can_ lock the repository to clone it
> for him:
> 
> hg clone /var/hg/stable /var/hg/feature
> 
> Additionally, /var/hg has the setgid bit set and belongs to the
> "developer" group, so _new_ files will get a group of which the user
> _is_ a member. Umasks are also set to 002, so the
> developer should have write access to the cloned repository. Only the
> hardlinks that are shared with the original are read-only for him
> (because they are still owned by the restricted group). Will this
> cause problems?

I'll try to give you a highly technical, wordy answer, including some
insight into the implementation details of Mercurial, in the hope that
it is useful for you (or maybe others).

I'm not a Linux expert (Windows 7 here, running Ubuntu Linux in a VM for
Mercurial testing), but I'd say it's *not* a problem.

Mercurial uses a special opener object under the hood when accessing
files. I have stared a lot at this tiny bit of code (see
mercurial/util.py, line 872). It's quite a hot spot in the Mercurial
codebase :-)

This opener creates objects that behave like Python file objects.

  http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects

Higher levels of the Mercurial code then use these objects to read and
write the files on disk in the repository (for both the history store
and the working directory).

Now, if Mercurial wants to write to a file on disk, it asks the
responsible opener (there are several) to open it (function __call__),
giving it the path of the file. That function's job is to return a
Python file object.

The opener then checks the kind of access that is requested. If it's an
access kind (parameter "mode") that wants to open the file for
*modification*, then the opener takes a look at how many hardlinks that
file has (nlink).

  (see also http://docs.python.org/library/functions.html#open )

If the nlink value is > 1 (or it detects hardlink blindness for that
directory -- but let's ignore this for now) then the opener assumes that
this file is shared via hardlinks with another directory (could be
another repository). It then breaks up the hardlink *before* creating
the file object and returning it.
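
The check described above can be sketched in a few lines of modern
Python 3. This is only an illustration, not Mercurial's actual code: the
function names wants_modification and needs_break are made up for this
example, and hardlink blindness is ignored here as well.

```python
import os
import tempfile


def wants_modification(mode):
    """Return True if an open() mode string can modify the file."""
    # "w", "a", "x" and "+" all allow writing; plain "r"/"rb" do not.
    return any(flag in mode for flag in "wax+")


def needs_break(path, mode):
    """Return True if the file should be un-hardlinked before opening it.

    Hypothetical helper: the link count comes from st_nlink, which is
    what the text above calls nlink.
    """
    if not wants_modification(mode):
        return False
    try:
        nlink = os.lstat(path).st_nlink
    except FileNotFoundError:
        # A file that does not exist yet cannot be shared with anyone.
        return False
    return nlink > 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.i")
        clone = os.path.join(d, "clone.i")
        with open(original, "wb") as f:
            f.write(b"some history data")
        print(needs_break(original, "ab"))  # single link so far
        os.link(original, clone)            # what a hardlinked clone does
        print(needs_break(original, "ab"))  # now shared
        print(needs_break(original, "rb"))  # reading never needs a break
```

Reading a shared file is always safe, so only modes that can modify the
file trigger the link count check.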

The procedure for "breaking up a hardlinked file F" is basically as
follows (ignoring some tricks for now, which we use to work around some
Windows weirdness [1]):

(1) create a full copy of F to a temporary file T in the same dir
(2) call os.unlink on F
(3) rename T to F

Step (2) works, since you have write access to the directory. Unlinking
just removes the name entry in the directory data structure of the file
system. The file *contents* are not touched.

So, the opener does COW when needed (copy on write), hiding the details
of how it's done from higher levels of the code.
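
To make the copy-on-write idea concrete, here is a minimal sketch in
modern Python 3 that follows steps (1) to (3). It is not Mercurial's
opener: the names break_hardlink and cow_open are invented for this
example, the Windows workarounds are left out, and permission handling
is simplified (the copy keeps the private mode that mkstemp gives it).

```python
import os
import shutil
import tempfile


def break_hardlink(path):
    """Replace path with a private copy, following steps (1) to (3)."""
    dirname = os.path.dirname(path) or "."
    # The temporary file T lives in the same directory as F, so the
    # final rename stays on one file system.
    fd, tmp = tempfile.mkstemp(prefix=".cow-", dir=dirname)
    os.close(fd)
    try:
        shutil.copyfile(path, tmp)   # (1) full copy of F to T
    except OSError:
        os.unlink(tmp)               # do not leave a stray T behind
        raise
    os.unlink(path)                  # (2) needs write access to the
                                     #     directory, not to F itself
    os.rename(tmp, path)             # (3) T takes over the name F


def cow_open(path, mode="r"):
    """Open path, breaking a hardlink first if the mode can modify it."""
    if any(flag in mode for flag in "wax+"):
        try:
            nlink = os.lstat(path).st_nlink
        except FileNotFoundError:
            nlink = 0                # new file, nothing to break
        if nlink > 1:
            break_hardlink(path)
    return open(path, mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        stable = os.path.join(d, "stable.i")
        feature = os.path.join(d, "feature.i")
        with open(stable, "w") as f:
            f.write("shared")
        os.link(stable, feature)     # simulate a hardlinked clone
        with cow_open(feature, "a") as f:
            f.write("+local")
        with open(stable) as f:
            print(f.read())          # contents of the untouched original
        with open(feature) as f:
            print(f.read())          # contents of the private copy
```

The caller only ever sees an ordinary file object. Whether a copy was
made first is decided inside cow_open, which is the point of hiding the
copy-on-write step in the opener.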

> Or are the hardlinked files immutable and should never
> change anyway?

Exactly. If a file is hardlinked, Mercurial does *not* touch its
contents (if it did, that would be a horrible bug, which hg verify
could detect for files in the store).

As I tried to explain above, the opener makes sure that hardlinked
files are not modified. If a file needs to be modified through one of
its links, that link is broken up before the file is modified.

> If this approach is problematic, then there are two
> alternatives:

No. I don't think the approach is problematic. Your worries are
ill-founded :)

> 1) Make the original repository less restricted, i.e. make everything
> owned by the developer group. (And prevent commits via mercurial
> hooks. Not as safe as using file system though, as hooks are
> overridable)
> 2) Don't use the hardlinking feature at all, and instead run clone --pull

I think that whatever tricks you play with file permissions, Mercurial
commands will either abort with an error message or do something
sensible with your repository, and that includes hardlinked clones.

  http://mercurial.selenic.com/wiki/HardlinkedClones

(Of course, the usual disclaimer applies. We cannot guarantee that
Mercurial is free of bugs. See the license for details.)

BTW, if you want to see the opener in action, then install and enable my
fsdebug debugging extension

  https://bitbucket.org/abuehl/fsdebug

and watch it make noise when you use Mercurial on the command line.

As another side note, there is an extension shipped with Mercurial
that provides a way to reestablish hardlinks in hardlinked clones.

See http://mercurial.selenic.com/wiki/RelinkExtension

(Now, I'll shut up :)

[1] http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows