Issue1211

Title hg convert (cvs) loses log entries -- due to a problem of cvsps
Priority critical Status resolved
Superseder Nosy List frank, makarius, pmezard
Assigned To Topics convert

Created on 2008-07-03.15:11:27 by makarius, last changed 2008-10-18.16:06:37 by pmezard.

Files
File name Uploaded Type
repos.tar.gz makarius, 2008-07-03.15:11:25 application/x-gzip
Messages
msg7490 (view) Author: pmezard Date: 2008-10-18.16:06:37
In main, marking as resolved
msg7462 (view) Author: pmezard Date: 2008-10-15.21:20:55
OK, so you say everything is fine.

Marking as testing, please reopen if I misunderstood.
msg7459 (view) Author: frank Date: 2008-10-15.21:11:03
The issue makarius found is not present in cvsps.py. If the user wants external
cvsps then he gets what he deserves^H^H^H^H^H^H^H asked for.

The problem I mentioned in msg6466 is hard to fix, and also quite unlikely to be
encountered, and I'm happy to leave this. No CVS log output parser will ever
cope with what a determined user could do to wreck it, for example using a log
message that contains the output of CVS log.
msg7458 (view) Author: pmezard Date: 2008-10-15.21:02:49
FYI, builtin cvsps is now used by default in crew.

What do we do about the current issue?
msg6538 (view) Author: frank Date: 2008-07-20.10:13:40
When several commits make up one changeset, cvsps.py takes the date of the most
recent one, while cvsps-2.1 uses the oldest one. This was done deliberately to
help in identifying changesets on very large files, which might take more than
'fuzz' seconds to commit.
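
To make the date difference concrete, here is a minimal sketch, assuming a simplified grouping rule: consecutive commits with the same author and log message join one changeset when the gap to the previous commit is at most `fuzz` seconds. The names `group_commits` and `changeset_date` are hypothetical and are not taken from cvsps.py or cvsps-2.1; the real tools do considerably more.

```python
def group_commits(commits, fuzz=300):
    """Group (timestamp, author, message) tuples into changesets.

    Simplified rule (an assumption for illustration): a commit joins the
    previous changeset if author and message match and it follows the
    changeset's latest commit by at most `fuzz` seconds.
    """
    changesets = []
    for timestamp, author, message in sorted(commits):
        last = changesets[-1] if changesets else None
        if (
            last is not None
            and last["key"] == (author, message)
            and timestamp - last["dates"][-1] <= fuzz
        ):
            last["dates"].append(timestamp)
        else:
            changesets.append({"key": (author, message), "dates": [timestamp]})
    return changesets


def changeset_date(changeset, newest=True):
    """Pick the newest commit date (as described for cvsps.py) or the oldest (cvsps-2.1)."""
    return max(changeset["dates"]) if newest else min(changeset["dates"])
```

With commits at 0, 200 and 450 seconds, all three land in one changeset under this rule, and the two date policies differ by 450 seconds, which is the kind of "couple of minutes" deviation reported in msg6529.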
msg6529 (view) Author: makarius Date: 2008-07-18.21:57:54
I have now checked with cvsps.py from the hg/crew repository.  Even when using
the same fuzz factor of 300, the result differs slightly: (1) changeset dates
often deviate by a couple of minutes, (2) the changeset boundaries are sometimes
a bit different (very rarely).

In general, the results of both the old cvsps-2.1 (after an ad hoc patch) and your
cvsps.py look reasonable.  So I wonder if everything is actually OK, and the new
script merely interprets cvs dates in a different way.

Is this the proper place to continue this discussion of testing cvsps.py?
msg6471 (view) Author: frank Date: 2008-07-05.09:57:09
A wrapper is available in mercurial's hgext/convert directory. If you set the
PYTHONPATH correctly then you can run hgext/convert/cvsps, which will work in
much the same way as cvsps.c does.

You should then be able to diff the output of the two tools with little effort.
msg6470 (view) Author: makarius Date: 2008-07-04.16:03:34
Yes, for the time being I am satisfied with the prospect of seeing cvsps.py in the
next official release -- at the moment it seems to be in the development repository
only.

In the (short!) time window between issuing the ticket and getting the first
response, I applied a quick-and-dirty fix to cvsps.c 2.1 and managed to
convert 150 MB / 15 years of CVS history!  The result
can be seen here: http://isabelle.in.tum.de/isabelle-bin/mercurial.cgi

Now we are in the process of validating the result -- the original CVS is still
active.  The final conversion will probably happen within the next 6 months.

If it helps to stabilize the tool chain, I could try to compare the outcome of
cvsps.c vs. cvsps.py at some point, if you give me some hints on how to do it.
msg6468 (view) Author: pmezard Date: 2008-07-04.15:49:58
makarius: does it solve your problem?

cvsps.py seems to be working very well.
msg6467 (view) Author: makarius Date: 2008-07-03.19:55:41
OK, the builtin cvsps.py is much more than the workaround that I was hoping for.

So the actual problem seems to be more in documentation or the default setup (of
Mercurial 1.0.1).
msg6466 (view) Author: frank Date: 2008-07-03.19:16:59
This is not a problem for builtin cvsps.

However, there is a potential issue if a commit log message has a line in it
which matches re_31 or re_32 (see hgext/convert/cvsps.py). That is, a line
containing exactly 28 dashes or 77 equal signs, and nothing else. I have come
across one of those in one of the (Free?)BSD CVS trees.

If that were to happen, then "cvs admin -m" might be a simpler workaround than
trying to make cvsps.py backtrack around this kind of problem.
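
A minimal sketch of how one might scan log messages for such lines before converting. The two patterns below are written from the description above (exactly 28 dashes or 77 equal signs, and nothing else) and are an assumption, not a copy of re_31 and re_32 from cvsps.py; `risky_lines` is a hypothetical helper name.

```python
import re

# Assumed separator shapes, taken from the description in this message.
DASHES = re.compile(r"-{28}")
EQUALS = re.compile(r"={77}")


def risky_lines(message):
    """Return 1-based numbers of lines that consist only of a separator."""
    return [
        number
        for number, line in enumerate(message.splitlines(), start=1)
        if DASHES.fullmatch(line) or EQUALS.fullmatch(line)
    ]
```

Any message for which `risky_lines` returns a non-empty list is a candidate for rewording with "cvs admin -m" before running the conversion.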
msg6465 (view) Author: makarius Date: 2008-07-03.15:11:25
When using hg convert on a CVS repository, there is a standing danger of losing
log entries without any warning!  This is actually a problem of cvsps, which
ignores log entries reading like "foo: bar;" on the first line, because they are
mistaken for "revision meta data".
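
As a rough illustration of this failure mode (not the actual cvsps code), the sketch below contrasts a loose rule that treats any "key: value;" line as revision meta data with a stricter rule that requires the "date: ...; author: ...;" fields that cvs log prints. Both patterns and the helper `is_metadata` are assumptions made for this example.

```python
import re

# Loose rule (illustrative): anything shaped like "key: value;" is meta data.
LOOSE = re.compile(r"^\w+: .*;")
# Stricter rule (illustrative): require the leading "date:" and "author:" fields.
STRICT = re.compile(r"^date: [^;]+;\s+author: [^;]+;")


def is_metadata(line, pattern):
    """Report whether `line` would be classified as revision meta data."""
    return pattern.match(line) is not None
```

Under the loose rule a first log line such as "foo: bar;" is swallowed as meta data, which would leave the message empty; the stricter rule keeps it. A log message that itself begins with "date: ...; author: ...;" would still fool the stricter rule.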

Attached is a minimal example CVS repository where the effect shows up routinely -- many
empty log messages in the resulting hg repository.

Maybe there is a workaround on the hg convert side.  At least one could
consider adding some words of warning to the hg convert documentation etc.
History
Date User Action Args
2008-10-18 16:06:37 pmezard set status: testing -> resolved; nosy: pmezard, frank, makarius; messages: + msg7490
2008-10-15 21:20:55 pmezard set status: chatting -> testing; nosy: pmezard, frank, makarius; messages: + msg7462
2008-10-15 21:11:03 frank set nosy: pmezard, frank, makarius; messages: + msg7459
2008-10-15 21:02:49 pmezard set nosy: pmezard, frank, makarius; messages: + msg7458
2008-07-20 10:13:41 frank set nosy: pmezard, frank, makarius; messages: + msg6538
2008-07-18 21:57:55 makarius set nosy: pmezard, frank, makarius; messages: + msg6529
2008-07-05 09:57:09 frank set nosy: pmezard, frank, makarius; messages: + msg6471
2008-07-04 16:03:35 makarius set nosy: pmezard, frank, makarius; messages: + msg6470
2008-07-04 15:49:59 pmezard set nosy: + pmezard; messages: + msg6468
2008-07-03 19:55:42 makarius set messages: + msg6467
2008-07-03 19:17:06 frank set status: unread -> chatting; nosy: + frank; messages: + msg6466
2008-07-03 15:11:27 makarius create