What is the purpose of the "Use pull protocol to copy metadata" hg clone option?

Thu May 27 18:01:50 CDT 2010

On 28.05.2010 00:45, Steve Borho wrote:
> On Thu, May 27, 2010 at 5:20 PM, Steve Borho <steve at borho.org> wrote:
>> On Thu, May 27, 2010 at 4:09 PM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>>> On 27.05.2010 20:23, Martin Geisler wrote:
>>>> Adrian Buehlmann <adrian at cadifra.com> writes:
>>>>
>>>>> On 27.05.2010 17:24, Steve Borho wrote:
>>>>>> On Thu, May 27, 2010 at 9:51 AM, Didly Bom <didlybom at gmail.com> wrote:
>>>>>>> On Thu, May 27, 2010 at 4:24 PM, Matt Mackall <mpm at selenic.com> wrote:
>>>>>>>>
>>>>>>>> When you say SMB, do you actually mean the completely different CIFS
>>>>>>>> protocol?
>>>>>>>>
>>>>>>>
>>>>>>> Matt, I actually meant a regular Windows network share. If I recall
>>>>>>> correctly that used to be called a Samba or SMB share in the Linux world. My
>>>>>>> understanding was that CIFS is just an evolution of SMB.
>>>>>>> Sorry if my explanation was confusing. My point was that setting up a
>>>>>>> "central" repository on a shared network share is a very useful and probably
>>>>>>> common scenario.
>>>>>
>>>>> Talking about SMB if you don't mean it is indeed rather unhelpful.
>>>>
>>>> Did you not understand what he was talking about? I don't know what the
>>>> difference between CIFS and SMB is, but when I've called Windows
>>>> network shares "SMB", people have understood me just fine.
>>>
>>> It certainly matters to some extent if someone reports a problem
>>> involving network shares, if that share is served just by something like
>>> Samba or a full blown native Windows server.
>>>
>>> If I see "SMB share", I mostly infer "Samba server" or some NAS device,
>>> which are known to have some quirks. People using native Windows server
>>> shares usually don't refer to such shares by the term "SMB share".
>>>
>>> But yes, it might well be that I didn't understand some things. As always.
>>>
>>>>>> Especially in corporate environs that are heavily firewalled. Running
>>>>>> an ad-hoc web server is completely frowned upon (you must get
>>>>>> permission from IT), while setting up clones on a network share is
>>>>>> very much "do first, ask permission later".
>>>>>
>>>>> It's just that creating clones on the server while running Mercurial
>>>>> on a client isn't exactly that frequent.
>>>>
>>>> You need a place to share the code, right? When you cannot (must not)
>>>> start 'hg serve' and when there already is a network filesystem in use,
>>>> then using that filesystem for sharing seems very natural to me.
>>>>
>>>>> Most simple setups with a web server don't allow cloning on the server
>>>>> either.
>>>>>
>>>>> And I fail to see what the problem is if people have to issue --pull
>>>>> when they clone from a network share to their local disk.
>>>>
>>>> I think it's a problem if you need to specify a flag to make 'hg clone'
>>>> fast.
>>>
>>> I agree, after having learned that the problem reported apparently
>>> exists when cloning from a share served by a native Windows server.
>>>
>>> I'm wondering though why no one else has complained about it so far, if
>>> that use case occurs as frequently as Didly says.
>>>
>>
>> It's been reported to the TortoiseHg BTS, as an enhancement request to
>> always turn on the --pull flag when cloning from a UNC url.
>>
>> http://bitbucket.org/tortoisehg/stable/issue/1060
>>
>> What I don't understand is that Mercurial should be falling back to
>> --pull anyway, since the source is obviously not on the same device as
>> the destination.  The UNC path must be breaking that detection logic.
>>  Does mapping the network drive to a drive letter improve the
>> performance?
>>
>> hmm; poking around in hg.py makes me think this is probably the case.
> 
> Nah, strike that.  Looking further I think my first guess was correct
> and this is a latency problem.
> 
> A "local" clone, one that does not involve an http or ssh protocol,
> will copy files one by one from the source to the dest folder.  If
> this per-file transaction setup is expensive, it will be dog slow
> because it is all serialized.  A 'pull' clone streams the source
> repository in one (usually large) transaction.
> 
> But again, we'd need a network trace to know for sure.
> 
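
To make the latency argument above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names and all numbers are made up for illustration, it is not taken from the hg sources, and it ignores everything except per-transaction setup cost and raw bandwidth.

```python
def local_copy_seconds(n_files, total_bytes, per_file_latency_s, bandwidth_bytes_per_s):
    """Serialized file-by-file copy: every file pays the setup latency once."""
    return n_files * per_file_latency_s + total_bytes / bandwidth_bytes_per_s


def streamed_pull_seconds(total_bytes, setup_latency_s, bandwidth_bytes_per_s):
    """One large transfer: the setup latency is paid a single time."""
    return setup_latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical figures, chosen only to show the shape of the problem:
# 20,000 store files, 200 MB in total, 50 ms setup per transaction, 10 MB/s.
n_files = 20_000
total_bytes = 200 * 10**6
latency_s = 0.05
bandwidth = 10 * 10**6

copy_s = local_copy_seconds(n_files, total_bytes, latency_s, bandwidth)
pull_s = streamed_pull_seconds(total_bytes, latency_s, bandwidth)
print("file-by-file copy: %.1f s" % copy_s)
print("streamed pull:     %.1f s" % pull_s)
```

With these assumed values the per-file term dominates the copy entirely, which is the "dog slow because it is all serialized" effect. Whether real shares behave like this is exactly what a network trace would have to confirm.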

I'm beginning to wonder if this could be related to that WebDAV
weirdness we have seen (and fixed) for the .hg/dirstate-reading overlay
handler shell extension of TortoiseHg.

http://bitbucket.org/tortoisehg/stable/issue/508/tortoisehg-slows-down-access-to-unc-paths-without

IIRC I fixed that one by carefully making sure not to hit a specific
costly path when searching subpaths upwards towards the root.

I think I probably should take a look at the hg sources from that angle.

Didly: It might be interesting to know if you have the WebClient service
running on your client Windows computer (needed for WebDAV).

See
http://bitbucket.org/tortoisehg/stable/issue/508/tortoisehg-slows-down-access-to-unc-paths-without#comment-44054

Specifically, http://support.microsoft.com/kb/832161 says:

"One easy test that you can use is to turn off the client computer's
WebClient service"

If turning off the "WebClient service" makes clone without --pull
faster, it might be related.
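
For anyone who wants to run that comparison, a small timing helper along these lines might do. It is a sketch only: the helper names are mine, and the share path and destination names in the usage comment are placeholders, not anything from this thread. The only hg feature it relies on is the existing `hg clone --pull SOURCE DEST` form.

```python
import subprocess
import time


def build_clone_cmd(source, dest, use_pull):
    """Return the argv list for 'hg clone', optionally with --pull."""
    cmd = ["hg", "clone"]
    if use_pull:
        cmd.append("--pull")
    cmd += [source, dest]
    return cmd


def time_clone(source, dest, use_pull):
    """Run the clone and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_clone_cmd(source, dest, use_pull), check=True)
    return time.perf_counter() - start


# Example use (placeholder paths; each destination must not exist yet):
#   src = r"\\server\share\repo"
#   print("plain:", time_clone(src, "clone-plain", use_pull=False))
#   print("pull: ", time_clone(src, "clone-pull", use_pull=True))
```

Running it once with the WebClient service started and once with it stopped should show whether the service makes any difference to the clone without --pull.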