2010-04-27 Eric Seidel <eric@webkit.org>
authoreric@webkit.org <eric@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 27 Apr 2010 17:50:03 +0000 (17:50 +0000)
committereric@webkit.org <eric@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 27 Apr 2010 17:50:03 +0000 (17:50 +0000)
        Reviewed by Adam Barth.

        [chromium] new-run-webkit-tests hangs on Chromium Bots (OS X and Linux)
        https://bugs.webkit.org/show_bug.cgi?id=37987

        After further research, I believe the hang is caused by:
        http://bugs.python.org/issue2320
        Basically Popen() is not reentrant.
        The workaround is to pass close_fds=True to Popen() on Mac/Linux.

        I fixed our main Popen wrapper "Executive.run_command" to use close_fds=True
        when appropriate.

        I audited all places we call Popen() and either moved them to run_command
        or left a FIXME that they are not thread safe.  A few places I added the
        close_fds workaround there and left an explanitory note.

        * Scripts/webkitpy/common/checkout/scm_unittest.py:
         - Added note that this Popen use is not threadsafe.
        * Scripts/webkitpy/common/system/executive.py:
         - Fixed our Executive.run_* to workaround python bug 2320.
        * Scripts/webkitpy/common/system/user.py:
         _ Added note that this Popen use is not threadsafe.
        * Scripts/webkitpy/layout_tests/layout_package/json_results_generator.py: ditto.
        * Scripts/webkitpy/layout_tests/port/apache_http_server.py: ditto.
        * Scripts/webkitpy/layout_tests/port/base.py:
         - Change wdiff back to using run_command now that we believe it
           to be threadsafe.
        * Scripts/webkitpy/layout_tests/port/chromium.py:
         - Fix to use Executive in places.
         - Pass self._executive down to the Driver for easier unit testing.
        * Scripts/webkitpy/layout_tests/port/chromium_win.py:
         - Re-factor to use a _kill_all method.
         - Made the _kill_all method use run_command to be threadsafe.
        * Scripts/webkitpy/layout_tests/port/http_server.py:
         - Add FIXME about using Executive.
        * Scripts/webkitpy/layout_tests/port/server_process.py:
         - Use Executive to be threadsafe.
        * Scripts/webkitpy/layout_tests/port/webkit.py:
         - Pass self._executive down to the Driver.
        * Scripts/webkitpy/layout_tests/port/websocket_server.py:
         - Add note about Popen not being threadsafe.
        * Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests.py:
         - Move one caller to run_command add notes about moving others.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@58314 268f45cc-cd09-0410-ab3c-d52691b4dbfc

14 files changed:
WebKitTools/ChangeLog
WebKitTools/Scripts/webkitpy/common/checkout/scm_unittest.py
WebKitTools/Scripts/webkitpy/common/system/executive.py
WebKitTools/Scripts/webkitpy/common/system/user.py
WebKitTools/Scripts/webkitpy/layout_tests/layout_package/json_results_generator.py
WebKitTools/Scripts/webkitpy/layout_tests/port/apache_http_server.py
WebKitTools/Scripts/webkitpy/layout_tests/port/base.py
WebKitTools/Scripts/webkitpy/layout_tests/port/chromium.py
WebKitTools/Scripts/webkitpy/layout_tests/port/chromium_win.py
WebKitTools/Scripts/webkitpy/layout_tests/port/http_server.py
WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py
WebKitTools/Scripts/webkitpy/layout_tests/port/websocket_server.py
WebKitTools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests.py

index 8204a7c..f69e56d 100644 (file)
@@ -1,3 +1,50 @@
+2010-04-27  Eric Seidel  <eric@webkit.org>
+
+        Reviewed by Adam Barth.
+
+        [chromium] new-run-webkit-tests hangs on Chromium Bots (OS X and Linux)
+        https://bugs.webkit.org/show_bug.cgi?id=37987
+
+        After further research, I believe the hang is caused by:
+        http://bugs.python.org/issue2320
+        Basically Popen() is not reentrant.
+        The workaround is to pass close_fds=True to Popen() on Mac/Linux.
+
+        I fixed our main Popen wrapper "Executive.run_command" to use close_fds=True
+        when appropriate.
+
+        I audited all places we call Popen() and either moved them to run_command
+        or left a FIXME that they are not thread safe.  A few places I added the
+        close_fds workaround there and left an explanitory note.
+
+        * Scripts/webkitpy/common/checkout/scm_unittest.py:
+         - Added note that this Popen use is not threadsafe.
+        * Scripts/webkitpy/common/system/executive.py:
+         - Fixed our Executive.run_* to workaround python bug 2320.
+        * Scripts/webkitpy/common/system/user.py:
+         _ Added note that this Popen use is not threadsafe.
+        * Scripts/webkitpy/layout_tests/layout_package/json_results_generator.py: ditto.
+        * Scripts/webkitpy/layout_tests/port/apache_http_server.py: ditto.
+        * Scripts/webkitpy/layout_tests/port/base.py:
+         - Change wdiff back to using run_command now that we believe it
+           to be threadsafe.
+        * Scripts/webkitpy/layout_tests/port/chromium.py:
+         - Fix to use Executive in places.
+         - Pass self._executive down to the Driver for easier unit testing.
+        * Scripts/webkitpy/layout_tests/port/chromium_win.py:
+         - Re-factor to use a _kill_all method.
+         - Made the _kill_all method use run_command to be threadsafe.
+        * Scripts/webkitpy/layout_tests/port/http_server.py:
+         - Add FIXME about using Executive.
+        * Scripts/webkitpy/layout_tests/port/server_process.py:
+         - Use Executive to be threadsafe.
+        * Scripts/webkitpy/layout_tests/port/webkit.py:
+         - Pass self._executive down to the Driver.
+        * Scripts/webkitpy/layout_tests/port/websocket_server.py:
+         - Add note about Popen not being threadsafe.
+        * Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests.py:
+         - Move one caller to run_command add notes about moving others.
+
 2010-04-27  Adam Barth  <abarth@webkit.org>
 
         Reviewed by Maciej Stachowiak.
index b4be6bb..a4e6f16 100644 (file)
@@ -54,6 +54,7 @@ from webkitpy.common.system.executive import Executive, run_command, ScriptError
 # FIXME: This should be unified into one of the executive.py commands!
 # Callers could use run_and_throw_if_fail(args, cwd=cwd, quiet=True)
 def run_silent(args, cwd=None):
+    # Note: Not thread safe: http://bugs.python.org/issue2320
     process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=cwd)
     process.communicate() # ignore output
     exit_code = process.wait()
index 99bcf24..5c1d28c 100644 (file)
@@ -87,11 +87,20 @@ def run_command(*args, **kwargs):
 
 class Executive(object):
 
+    def _should_close_fds(self):
+        # We need to pass close_fds=True to work around Python bug #2320
+        # (otherwise we can hang when we kill DumpRenderTree when we are running
+        # multiple threads). See http://bugs.python.org/issue2320 .
+        # Note that close_fds isn't supported on Windows, but this bug only
+        # shows up on Mac and Linux.
+        return sys.platform not in ('win32', 'cygwin')
+
     def _run_command_with_teed_output(self, args, teed_output):
         args = map(unicode, args)  # Popen will throw an exception if args are non-strings (like int())
         child_process = subprocess.Popen(args,
                                          stdout=subprocess.PIPE,
-                                         stderr=subprocess.STDOUT)
+                                         stderr=subprocess.STDOUT,
+                                         close_fds=self._should_close_fds())
 
         # Use our own custom wait loop because Popen ignores a tee'd
         # stderr/stdout.
@@ -177,6 +186,9 @@ class Executive(object):
 
     def _compute_stdin(self, input):
         """Returns (stdin, string_to_communicate)"""
+        # FIXME: We should be returning /dev/null for stdin
+        # or closing stdin after process creation to prevent
+        # child processes from getting input from the user.
         if not input:
             return (None, None)
         if hasattr(input, "read"):  # Check if the input is a file.
@@ -200,6 +212,7 @@ class Executive(object):
                     return_exit_code=False,
                     return_stderr=True,
                     decode_output=True):
+        """Popen wrapper for convenience and to work around python bugs."""
         args = map(unicode, args)  # Popen will throw an exception if args are non-strings (like int())
         stdin, string_to_communicate = self._compute_stdin(input)
         stderr = subprocess.STDOUT if return_stderr else None
@@ -208,7 +221,8 @@ class Executive(object):
                                    stdin=stdin,
                                    stdout=subprocess.PIPE,
                                    stderr=stderr,
-                                   cwd=cwd)
+                                   cwd=cwd,
+                                   close_fds=self._should_close_fds())
         output = process.communicate(string_to_communicate)[0]
         # run_command automatically decodes to unicode() unless explicitly told not to.
         if decode_output:
index 076f965..12cc524 100644 (file)
@@ -67,6 +67,7 @@ class User(object):
     def page(self, message):
         pager = os.environ.get("PAGER") or "less"
         try:
+            # Note: Not thread safe: http://bugs.python.org/issue2320
             child_process = subprocess.Popen([pager], stdin=subprocess.PIPE)
             child_process.communicate(input=message)
         except IOError, e:
index a3086d9..0993cbd 100644 (file)
@@ -125,6 +125,7 @@ class JSONResultsGenerator(object):
             results_file.write(json)
             results_file.close()
 
+    # FIXME: Callers should use scm.py instead.
     def _get_svn_revision(self, in_directory):
         """Returns the svn revision for the given directory.
 
@@ -132,6 +133,7 @@ class JSONResultsGenerator(object):
           in_directory: The directory where svn is to be run.
         """
         if os.path.exists(os.path.join(in_directory, '.svn')):
+            # Note: Not thread safe: http://bugs.python.org/issue2320
             output = subprocess.Popen(["svn", "info", "--xml"],
                                       cwd=in_directory,
                                       shell=(sys.platform == 'win32'),
index 4c78f77..46617f6 100644 (file)
@@ -191,6 +191,9 @@ class LayoutTestApacheHttpd(http_server_base.HttpServerBase):
         # Use shell=True because we join the arguments into a string for
         # the sake of Window/Cygwin and it needs quoting that breaks
         # shell=False.
+        # FIXME: We should not need to be joining shell arguments into strings.
+        # shell=True is a trail of tears.
+        # Note: Not thread safe: http://bugs.python.org/issue2320
         self._httpd_proc = subprocess.Popen(self._start_cmd,
                                             stderr=subprocess.PIPE,
             shell=True)
index ac7a545..1ca465c 100644 (file)
@@ -50,7 +50,19 @@ from webkitpy.common.system.executive import Executive, ScriptError
 _log = logutils.get_logger(__file__)
 
 
-# Python bug workaround.  See Port.wdiff_text() for an explanation.
+# Python's Popen has a bug that causes any pipes opened to a
+# process that can't be executed to be leaked.  Since this
+# code is specifically designed to tolerate exec failures
+# to gracefully handle cases where wdiff is not installed,
+# the bug results in a massive file descriptor leak. As a
+# workaround, if an exec failure is ever experienced for
+# wdiff, assume it's not available.  This will leak one
+# file descriptor but that's better than leaking each time
+# wdiff would be run.
+#
+# http://mail.python.org/pipermail/python-list/
+#    2008-August/505753.html
+# http://bugs.python.org/issue3210
 _wdiff_available = True
 _pretty_patch_available = True
 
@@ -535,36 +547,11 @@ class Port(object):
                '--end-insert=##WDIFF_END##',
                actual_filename,
                expected_filename]
-        # FIXME: Why not just check os.exists(executable) once?
-        global _wdiff_available
+        global _wdiff_available  # See explaination at top of file.
         result = ''
         try:
-            # Python's Popen has a bug that causes any pipes opened to a
-            # process that can't be executed to be leaked.  Since this
-            # code is specifically designed to tolerate exec failures
-            # to gracefully handle cases where wdiff is not installed,
-            # the bug results in a massive file descriptor leak. As a
-            # workaround, if an exec failure is ever experienced for
-            # wdiff, assume it's not available.  This will leak one
-            # file descriptor but that's better than leaking each time
-            # wdiff would be run.
-            #
-            # http://mail.python.org/pipermail/python-list/
-            #    2008-August/505753.html
-            # http://bugs.python.org/issue3210
-            #
-            # It also has a threading bug, so we don't output wdiff if
-            # the Popen raises a ValueError.
-            # http://bugs.python.org/issue1236
             if _wdiff_available:
-                try:
-                    # FIXME: Use Executive() here.
-                    wdiff = subprocess.Popen(cmd,
-                        stdout=subprocess.PIPE).communicate()[0]
-                except ValueError, e:
-                    # Working around a race in Python 2.4's implementation
-                    # of Popen().
-                    wdiff = ''
+                wdiff = self._executive.run_command(cmd, decode_output=False)
                 wdiff = cgi.escape(wdiff)
                 wdiff = wdiff.replace('##WDIFF_DEL##', '<span class=del>')
                 wdiff = wdiff.replace('##WDIFF_ADD##', '<span class=add>')
index bfd5b8c..137baf0 100644 (file)
@@ -44,6 +44,8 @@ import webbrowser
 import base
 import http_server
 
+from webkitpy.common.system.executive import Executive
+
 # FIXME: To use the DRT-based version of this file, we need to be able to
 # run the webkit code, which uses server_process, which requires UNIX-style
 # non-blocking I/O with selects(), which requires fcntl() which doesn't exist
@@ -80,8 +82,8 @@ def check_file_exists(path_to_file, file_description, override_step=None,
 class ChromiumPort(base.Port):
     """Abstract base class for Chromium implementations of the Port class."""
 
-    def __init__(self, port_name=None, options=None):
-        base.Port.__init__(self, port_name, options)
+    def __init__(self, port_name=None, options=None, **kwargs):
+        base.Port.__init__(self, port_name, options, **kwargs)
         self._chromium_base_dir = None
 
     def baseline_path(self):
@@ -118,10 +120,8 @@ class ChromiumPort(base.Port):
         return result
 
     def check_sys_deps(self, needs_http):
-        dump_render_tree_binary_path = self._path_to_driver()
-        proc = subprocess.Popen([dump_render_tree_binary_path,
-                                '--check-layout-test-sys-deps'])
-        if proc.wait():
+        cmd = [self._path_to_driver(), '--check-layout-test-sys-deps']
+        if self._executive.run_command(cmd, return_exit_code=True):
             _log.error('System dependencies check failed.')
             _log.error('To override, invoke with --nocheck-sys-deps')
             _log.error('')
@@ -176,18 +176,20 @@ class ChromiumPort(base.Port):
             # FIXME: This should use User.open_url
             webbrowser.open(uri, new=1)
         else:
+            # Note: Not thread safe: http://bugs.python.org/issue2320
             subprocess.Popen([self._path_to_driver(), uri])
 
     def create_driver(self, image_path, options):
         """Starts a new Driver and returns a handle to it."""
         if self._options.use_drt:
-            return webkit.WebKitDriver(self, image_path, options)
-        return ChromiumDriver(self, image_path, options)
+            return webkit.WebKitDriver(self, image_path, options, exectuive=self._executive)
+        return ChromiumDriver(self, image_path, options, exectuive=self._executive)
 
     def start_helper(self):
         helper_path = self._path_to_helper()
         if helper_path:
             _log.debug("Starting layout helper %s" % helper_path)
+            # Note: Not thread safe: http://bugs.python.org/issue2320
             self._helper = subprocess.Popen([helper_path],
                 stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=None)
             is_ready = self._helper.stdout.readline()
@@ -273,7 +275,7 @@ class ChromiumPort(base.Port):
 class ChromiumDriver(base.Driver):
     """Abstract interface for test_shell."""
 
-    def __init__(self, port, image_path, options):
+    def __init__(self, port, image_path, options, executive=Executive()):
         self._port = port
         self._configuration = port._options.configuration
         # FIXME: _options is very confusing, because it's not an Options() element.
@@ -281,6 +283,7 @@ class ChromiumDriver(base.Driver):
         # be passed into .start()
         self._options = options
         self._image_path = image_path
+        self._executive = executive
 
     def start(self):
         # FIXME: Should be an error to call this method twice.
@@ -402,8 +405,4 @@ class ChromiumDriver(base.Driver):
                 if self._proc.poll() is None:
                     _log.warning('stopping test driver timed out, '
                                  'killing it')
-                    # FIXME: This should use Executive.
-                    null = open(os.devnull, "w")  # Does this need an encoding?
-                    subprocess.Popen(["kill", "-9",
-                                     str(self._proc.pid)], stderr=null)
-                    null.close()
+                    self._executive.kill_process(self._proc.pid)
index 2e3de85..55e6896 100644 (file)
@@ -144,6 +144,10 @@ class ChromiumWinPort(chromium.ChromiumPort):
         return self.path_from_chromium_base('third_party', 'cygwin', 'bin',
                                             'wdiff.exe')
 
+    def _kill_all(self, process_name):
+        cmd = ['taskkill.exe', '/f', '/im', process_name]
+        self._executive.run_command(cmd)
+
     def _shut_down_http_server(self, server_pid):
         """Shut down the lighttpd web server. Blocks until it's fully
         shut down.
@@ -151,11 +155,7 @@ class ChromiumWinPort(chromium.ChromiumPort):
         Args:
             server_pid: The process ID of the running server.
         """
-        subprocess.Popen(('taskkill.exe', '/f', '/im', 'LightTPD.exe'),
-                        stdin=open(os.devnull, 'r'),
-                        stdout=subprocess.PIPE,
-                        stderr=subprocess.PIPE).wait()
-        subprocess.Popen(('taskkill.exe', '/f', '/im', 'httpd.exe'),
-                        stdin=open(os.devnull, 'r'),
-                        stdout=subprocess.PIPE,
-                        stderr=subprocess.PIPE).wait()
+        # FIXME: Why are we ignoring server_pid and calling
+        # _kill_all instead of Executive.kill_process(pid)?
+        self._kill_all("LightTPD.exe")
+        self._kill_all("httpd.exe")
index 5943c68..1479c61 100755 (executable)
@@ -210,9 +210,11 @@ class Lighttpd(http_server_base.HttpServerBase):
         if sys.platform == 'win32' and self._register_cygwin:
             setup_mount = self._port_obj.path_from_chromium_base('third_party',
                 'cygwin', 'setup_mount.bat')
+            # FIXME: Should use Executive.run_command
             subprocess.Popen(setup_mount).wait()
 
         _log.debug('Starting http server')
+        # FIXME: Should use Executive.run_command
         self._process = subprocess.Popen(start_cmd, env=env)
 
         # Wait for server to start.
index 70573a9..cf4f4a2 100644 (file)
@@ -38,6 +38,8 @@ import subprocess
 import sys
 import time
 
+from webkitpy.common.system.executive import Executive
+
 _log = logging.getLogger("webkitpy.layout_tests.port.server_process")
 
 
@@ -48,12 +50,13 @@ class ServerProcess:
     indefinitely. The class also handles transparently restarting processes
     as necessary to keep issuing commands."""
 
-    def __init__(self, port_obj, name, cmd, env=None):
+    def __init__(self, port_obj, name, cmd, env=None, executive=Executive()):
         self._port = port_obj
         self._name = name
         self._cmd = cmd
         self._env = env
         self._reset()
+        self._executive = executive
 
     def _reset(self):
         self._proc = None
@@ -66,6 +69,7 @@ class ServerProcess:
         if self._proc:
             raise ValueError("%s already running" % self._name)
         self._reset()
+        # close_fds is a workaround for http://bugs.python.org/issue2320
         close_fds = sys.platform not in ('win32', 'cygwin')
         self._proc = subprocess.Popen(self._cmd, stdin=subprocess.PIPE,
                                       stdout=subprocess.PIPE,
@@ -215,10 +219,6 @@ class ServerProcess:
             if self._proc.poll() is None:
                 _log.warning('stopping %s timed out, killing it' %
                              self._name)
-                # FIXME: This should use Executive.
-                null = open(os.devnull, "w")
-                subprocess.Popen(["kill", "-9",
-                                  str(self._proc.pid)], stderr=null)
-                null.close()
+                self._executive.kill_process(self._proc.pid)
                 _log.warning('killed')
         self._reset()
index f8d7921..c48e450 100644 (file)
@@ -45,6 +45,8 @@ import sys
 import time
 import webbrowser
 
+from webkitpy.common.system.executive import Executive
+
 import webkitpy.common.system.ospath as ospath
 import webkitpy.layout_tests.port.base as base
 import webkitpy.layout_tests.port.server_process as server_process
@@ -55,8 +57,8 @@ _log = logging.getLogger("webkitpy.layout_tests.port.webkit")
 class WebKitPort(base.Port):
     """WebKit implementation of the Port class."""
 
-    def __init__(self, port_name=None, options=None):
-        base.Port.__init__(self, port_name, options)
+    def __init__(self, port_name=None, options=None, **kwargs):
+        base.Port.__init__(self, port_name, options, **kwargs)
         self._cached_build_root = None
         self._cached_apache_path = None
 
@@ -195,7 +197,7 @@ class WebKitPort(base.Port):
         webbrowser.open(uri, new=1)
 
     def create_driver(self, image_path, options):
-        return WebKitDriver(self, image_path, options)
+        return WebKitDriver(self, image_path, options, executive=self._executive)
 
     def test_base_platform_names(self):
         # At the moment we don't use test platform names, but we have
@@ -347,7 +349,7 @@ class WebKitPort(base.Port):
 class WebKitDriver(base.Driver):
     """WebKit implementation of the DumpRenderTree interface."""
 
-    def __init__(self, port, image_path, driver_options):
+    def __init__(self, port, image_path, driver_options, executive=Executive()):
         self._port = port
         # FIXME: driver_options is never used.
         self._image_path = image_path
index ed6d871..29b7ca3 100644 (file)
@@ -207,6 +207,7 @@ class PyWebSocket(http_server.Lighttpd):
                    self._server_name, self._port))
         _log.debug('cmdline: %s' % ' '.join(start_cmd))
         # FIXME: We should direct this call through Executive for testing.
+        # Note: Not thread safe: http://bugs.python.org/issue2320
         self._process = subprocess.Popen(start_cmd,
                                          stdin=open(os.devnull, 'r'),
                                          stdout=self._wsout,
index 163354b..211ce93 100644 (file)
@@ -58,6 +58,8 @@ import urllib
 import webbrowser
 import zipfile
 
+from webkitpy.common.system.executive import run_command
+
 import port
 from layout_package import test_expectations
 from test_types import image_diff
@@ -96,7 +98,9 @@ def run_shell_with_return_code(command, print_output=False):
     """
 
     # Use a shell for subcommands on Windows to get a PATH search.
+    # FIXME: shell=True is a trail of tears, and should be removed.
     use_shell = sys.platform.startswith('win')
+    # Note: Not thread safe: http://bugs.python.org/issue2320
     p = subprocess.Popen(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, shell=use_shell)
     if print_output:
@@ -281,10 +285,10 @@ class Rebaseliner(object):
     def get_rebaselining_tests(self):
         return self._rebaselining_tests
 
+    # FIXME: Callers should use scm.py instead.
     def _get_repo_type(self):
         """Get the repository type that client is using."""
-        output, return_code = run_shell_with_return_code(['svn', 'info'],
-                                                         False)
+        return_code = run_command(['svn', 'info'], return_exit_code=True)
         if return_code == 0:
             return REPO_SVN
 
@@ -608,6 +612,7 @@ class Rebaseliner(object):
         else:
             _log.info('No test was rebaselined so nothing to remove.')
 
+    # FIXME: Callers should move to SCM.add instead.
     def _svn_add(self, filename):
         """Add the file to SVN repository.