<div dir="ltr">On Tue, Feb 19, 2013 at 8:29 PM, Bryan O'Sullivan <<a href="mailto:bos@serpentine.com">bos@serpentine.com</a>> wrote:<br>><br>> # HG changeset patch<br>> # User Bryan O'Sullivan <<a href="mailto:bryano@fb.com">bryano@fb.com</a>><br>
> # Date 1361297550 28800<br>> # Node ID 42c14cff887e20d033dbaa8f8c00100e807a1149<br>> # Parent 9ef52f0a93a0cba939742743ff59e4c2a2463fab<br>> worker: handle worker failures more aggressively<br>><br>> We now wait for worker processes in a separate thread, so that we can<br>
> spot failures in a timely way, without waiting for the progress pipe<br>> to drain.<br>><br>> If a worker fails, we recover the pre-parallel-update behaviour of<br>> failing early by killing its peers before propagating the failure.<br>
><br>> diff --git a/mercurial/worker.py b/mercurial/worker.py<br>> --- a/mercurial/worker.py<br>> +++ b/mercurial/worker.py<br>> @@ -6,7 +6,7 @@<br>> # GNU General Public License version 2 or any later version.<br>
><br>> from i18n import _<br>> -import os, signal, sys, util<br>> +import os, signal, sys, threading, util<br>><br>> def countcpus():<br>> '''try to count the number of CPUs on the system'''<br>
> @@ -77,6 +77,7 @@ def _posixworker(ui, func, staticargs, a<br>> workers = _numworkers(ui)<br>> oldhandler = signal.getsignal(signal.SIGINT)<br>> signal.signal(signal.SIGINT, signal.SIG_IGN)<br>
> + pids, problem = [], [0]<br>> for pargs in partition(args, workers):<br>> pid = os.fork()<br>> if pid == 0:<br>> @@ -88,25 +89,40 @@ def _posixworker(ui, func, staticargs, a<br>
> os._exit(0)<br>> except KeyboardInterrupt:<br>> os._exit(255)<br>> + pids.append(pid)<br>> + pids.reverse()<br><br>OK, so the last-created child will be first in pids.<br>
<br>> os.close(wfd)<br>> fp = os.fdopen(rfd, 'rb', 0)<br>> - def cleanup():<br>> - # python 2.4 is too dumb for try/yield/finally<br>> - signal.signal(signal.SIGINT, oldhandler)<br>
> - problem = None<br>> - for i in xrange(workers):<br>> + def killworkers():<br>> + # if one worker bails, there's no good reason to wait for the<br>> rest<br>> + for p in pids:<br>
> + try:<br>> + os.kill(p, signal.SIGTERM)<br>> + except OSError, err:<br>> + if err.errno != errno.ESRCH:<br>> + raise<br>> + def waitforworkers():<br>
> + for p in pids:<br>> pid, st = os.wait()<br><br>And here you're waiting for it to finish, but what happens<br>if for some reason one of the previous children fails first?<br><br>Why not use select on the children and also spare the thread?</div>
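To make the select suggestion concrete, here is a minimal sketch of one way it could work. It is not code from the patch and makes several assumptions: it targets Python 3 on POSIX, it gives each child its own pipe (the patch shares a single progress pipe), and `run_children` is a hypothetical helper name. The parent selects on the read ends; when a child exits, its write end closes, the parent sees EOF on that pipe, and it can reap exactly that child without a separate thread.

```python
import os
import select


def run_children(exit_codes):
    """Fork one child per exit code; return (pid, status) pairs in finish order.

    Hypothetical helper for illustration only; not part of mercurial/worker.py.
    """
    readers = {}  # read fd -> child pid
    for code in exit_codes:
        rfd, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: close every read end it inherited, keep its own write
            # end open until exit, then terminate with the requested code.
            os.close(rfd)
            for fd in readers:
                os.close(fd)
            os._exit(code)
        # Parent: close the write end so that EOF arrives once the child exits.
        os.close(wfd)
        readers[rfd] = pid

    finished = []
    while readers:
        ready, _, _ = select.select(list(readers), [], [])
        for fd in ready:
            if os.read(fd, 4096):
                continue  # real data rather than EOF; progress would be handled here
            pid = readers.pop(fd)
            os.close(fd)
            _, status = os.waitpid(pid, 0)  # reap the child whose pipe closed
            finished.append((pid, status))
    return finished
```

With this shape, the point where a nonzero status is appended to `finished` is where the parent could send SIGTERM to the pids still in `readers`, so a failure is noticed as soon as any child exits, whatever order the children were created in.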