Current py3k stage and next steps

Matt Mackall mpm at selenic.com
Wed Jun 30 10:08:47 CDT 2010


(looks like this might not have made it to the list)

On Tue, 2010-06-29 at 10:51 +0900, Nicolas Dumazet wrote:
> What kind of solution do _you_ foresee for the encoding problems?

Having thought about it a bit more this evening, I think the most
straightforward approach is:

a) teach 2to3 to change all strings in the source into bytestrings
b) fix up the annoying b"A"[0] == 65 behavior
c) make the minimum amount of other source changes to get it working
under 3.x 
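Here is a minimal sketch of the behavior item (b) refers to, assuming Python 3. Indexing bytes returns an int, while slicing returns bytes. Slicing a Python 2 str also returns a str, so slices behave the same on both. The helper name `byteat` is hypothetical.

```python
# Python 3: indexing bytes gives an int, unlike indexing a Python 2 str.
data = b"A"
print(data[0])      # 65
print(data[0:1])    # b'A'


def byteat(buf, i):
    """Return the byte at non-negative index i as a 1-byte bytes object.

    Hypothetical helper: slicing behaves the same on Python 2 str and
    Python 3 bytes, so call sites written this way work on both.
    Negative indices are not handled (buf[-1:0] would be empty).
    """
    return buf[i:i + 1]


print(byteat(b"ABC", 1))  # b'B'
```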

I don't think we can actually get to the point where we ditch 2to3 and
build from one source base until we've dropped compatibility for 2.4 and
2.5. That might be 5 years off and there's really very little pressure
to move to 3.x right now.

> It seems that the only major problem is to find a decent solution for f) ?

(f) is just a proxy for any operation on two or more strings, including
%, +, join, etc. There's a decent amount of string handling in
Mercurial, so it's a BIG problem.
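To make the scope of (f) concrete, here is a small sketch assuming Python 3 and illustrative variable names. The + operator and join refuse to mix bytes and str. The % operator does not fail; it silently inserts the repr of a bytes argument.

```python
# Python 3: mixing bytes and str in the operations listed under (f).
name = b"foo.txt"   # e.g. a filename kept as raw bytes
label = "file: "    # a str literal, as unconverted source would produce


def raises_typeerror(op):
    """Return True if calling op() raises TypeError."""
    try:
        op()
    except TypeError:
        return True
    return False


print(raises_typeerror(lambda: label + name))              # True
print(raises_typeerror(lambda: b", ".join([name, "x"])))   # True
# % does not fail: it silently inserts the repr of the bytes object.
print("%s" % name)                                         # b'foo.txt'
```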

> What about extending _ , or wrapping it, to pass arguments around?
> 
> Something like
> 
> def __(gettextkey, *args):
>     def encodeifunicode(arg):
>         if isinstance(arg, unicode):
>             return arg.encode("utf8")
>         else:
>             # by default, str objects we have around should be
>             # utf8-encoded strings

We actually want things in the local encoding, but...

>             return arg
> 
>     # assumption: _() returns utf8-encoded strings
>     return _(gettextkey) % tuple(map(encodeifunicode, args))
> 
> So we would use
>    __("foo %s bar %s", rawbytes, mayberawbytes)
> instead of
>    _("foo %s bar %s") % (rawbytes, mayberawbytes)

Hmm, not completely horrible. It solves the % problem reasonably well
(though note your implementation is only correct for %s).
But that still leaves the + problem.
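For comparison, here is a hedged sketch of how the proposed __() helper might look on Python 3. It assumes bytes templates and bytes %-formatting, which is only available from Python 3.5 (PEP 461). The name `bformat` and the fixed utf-8 default are assumptions. Per the note above, Mercurial would want the local encoding instead. The gettext lookup through _() is left out.

```python
def bformat(template, *args, encoding="utf-8"):
    """Hypothetical Python 3 analogue of the proposed __() helper.

    template is a bytes format string (bytes % needs Python 3.5+).
    str arguments are encoded first, since bytes % rejects str for %s;
    bytes and ints pass through unchanged.
    Assumption: utf-8 stands in for Mercurial's local encoding.
    """
    def tobytes(arg):
        if isinstance(arg, str):
            return arg.encode(encoding)
        return arg

    return template % tuple(tobytes(a) for a in args)


print(bformat(b"foo %s bar %s", b"raw", "text"))  # b'foo raw bar text'
print(bformat(b"%d files", 3))                     # b'3 files'
```

Such a helper does nothing for concatenation: b"foo" + "bar" still raises TypeError, which is the + problem.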

One thing we want to avoid is freely intermixing different string
classes in the code, so down the road, we're going to need to either
switch everything to bytestrings (and enforce it with something like
check-code) or do a lot of careful wrapping of filenames and file data.
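A check-code-style rule could enforce "bytestrings everywhere" by flagging every string literal that lacks a b prefix. The sketch below is hypothetical and uses Python's standard `tokenize` module. The function name `unprefixed_strings` is made up, and the code is not part of Mercurial's check-code.

```python
import io
import tokenize


def unprefixed_strings(source):
    """Yield (line number, literal) for string literals without a b/B prefix.

    Hypothetical lint rule. Docstrings are reported too. On Python 3.12+,
    f-strings tokenize as FSTRING_START rather than STRING, so they are
    not reported here.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix is everything before the opening quote, e.g. "b", "rb", "u".
        body = tok.string.lstrip("bBrRuUfF")
        prefix = tok.string[:len(tok.string) - len(body)]
        if "b" not in prefix.lower():
            yield tok.start[0], tok.string


src = 'x = "text"\ny = b"raw"\nz = rb"x"\n'
print(list(unprefixed_strings(src)))  # [(1, '"text"')]
```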

-- 
Mathematics is the supreme nostalgia of our time.





More information about the Mercurial-devel mailing list