Teach TestFailures to detect possibly flaky tests and list them separately
author aroben@apple.com <aroben@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 29 Jun 2011 22:24:48 +0000 (22:24 +0000)
committer aroben@apple.com <aroben@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 29 Jun 2011 22:24:48 +0000 (22:24 +0000)
Fixes <http://webkit.org/b/61061> <rdar://problem/9452796> TestFailures page blames
arbitrary revisions for breaking flaky tests

Reviewed by Dan Bates.

* BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js: Added.
(FlakyLayoutTestDetector): This class identifies flaky tests when given the test results
from various builds (in reverse-chronological order).
(FlakyLayoutTestDetector.prototype.incorporateTestResults): Detects flaky tests. Each test is
tracked from its first recorded failure and moves monotonically through three states:
LastSeenFailing, LastSeenPassing, and PossiblyFlaky.
(FlakyLayoutTestDetector.prototype.flakinessExamples): Finds examples of flakiness for the
given test. Essentially, finds all the transitions from passing to failing (or vice-versa)
and puts them in an array in reverse-chronological order.
(FlakyLayoutTestDetector.prototype.get possiblyFlakyTests): Returns all tests we've detected
to be possibly flaky.
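
The state progression described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: the `State` object and the `classify` function are hypothetical names, and the sketch assumes results for a single test arrive newest build first, with any value other than 'pass' counting as a failure.

```javascript
// Hypothetical stand-alone sketch; State and classify are illustrative names.
const State = { LastSeenFailing: 0, LastSeenPassing: 1, PossiblyFlaky: 2 };

// results: results for one test, newest build first ('pass' or a failure type).
// Returns the final state, or null if the test was never seen failing.
function classify(results) {
    let state = null;
    for (const result of results) {
        const passing = result === 'pass';
        if (state === null) {
            // Tracking only starts at the first failure; earlier passes are ignored.
            if (!passing)
                state = State.LastSeenFailing;
            continue;
        }
        if (state === State.LastSeenFailing && passing)
            state = State.LastSeenPassing;
        else if (state === State.LastSeenPassing && !passing)
            state = State.PossiblyFlaky;
        // PossiblyFlaky is terminal: states never move backwards.
    }
    return state;
}
```

Under these assumptions, classify(['fail', 'pass', 'crash']) ends in State.PossiblyFlaky, because a failure was seen again in an older build after the test had been seen passing.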

* BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:
(LayoutTestHistoryAnalyzer): Initialize new members.
(LayoutTestHistoryAnalyzer.prototype.start): Now passes the callback an object with two
properties: history and possiblyFlaky. history holds the data this function used to pass to
the callback, while possiblyFlaky lists all tests that might be flaky and examples of their
flakiness. Updated documentation comment to match.
(LayoutTestHistoryAnalyzer.prototype._incorporateBuildHistory): Now uses a
FlakyLayoutTestDetector to identify possibly flaky tests. Any possibly flaky tests are
removed from the failure history, since when they started failing is no longer meaningful.
We tell our caller to keep calling until all current failures have been explained and we've
gone through 5 builds without any new flaky tests being identified.
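
The stopping rule in the last sentence can be read as a single predicate. The sketch below is a simplified restatement under assumed names (`shouldKeepFetching` and `minimumQuietRuns` do not appear in the patch); it takes the count of still-unexplained failures and the number of test runs since the last interesting change.

```javascript
// Hypothetical sketch of the "keep calling" decision; names are illustrative.
const minimumQuietRuns = 5;

function shouldKeepFetching(unexplainedFailuresCount, runsSinceLastInterestingChange) {
    // Keep going while failures remain unexplained, or while fewer than
    // minimumQuietRuns runs have passed without an interesting change.
    return unexplainedFailuresCount > 0 || runsSinceLastInterestingChange < minimumQuietRuns;
}
```

With this reading, fetching stops only when both conditions hold at once: no unexplained failures remain and at least five consecutive runs produced nothing new.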

* BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/Utilities.js:
(sorted): New helper function to return a sorted copy of an array.
(Array.prototype.findLast): New helper function. Like findFirst, but finds the last item
that matches the predicate.
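
A rough illustration of how these two helpers behave, written as free functions so the sketch does not modify Array.prototype. The names `sortedCopy` and `findLastMatch` are hypothetical stand-ins for the helpers in the patch. Note that current JavaScript engines provide a native Array.prototype.findLast that returns undefined rather than null when nothing matches.

```javascript
// Hypothetical stand-ins for the Utilities.js helpers; names are illustrative.

// Returns a sorted copy and leaves the input array untouched.
// Uses the default sort order, which compares elements as strings.
function sortedCopy(array) {
    const copy = array.slice();
    copy.sort();
    return copy;
}

// Returns the last element satisfying predicate, or null if none does.
function findLastMatch(array, predicate) {
    for (let i = array.length - 1; i >= 0; --i) {
        if (predicate(array[i]))
            return array[i];
    }
    return null;
}

const names = ['fast/b.html', 'css1/a.html'];
const ordered = sortedCopy(names); // names itself keeps its original order
const lastOdd = findLastMatch([1, 2, 3, 4], function(n) { return n % 2 === 1; });
```

Returning a copy matters for callers such as the flaky test list, which sorts test names for display without disturbing the order of the underlying data.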

* BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/ViewController.js:
(ViewController.prototype._displayBuilder): Updated for change in the object passed to us by
the analyzer. Now puts the list of possibly flaky tests after the failure history.
(ViewController.prototype._domForFailedTest): Moved some code from here...
(ViewController.prototype._domForFailureDiagnosis): ...to here.
(ViewController.prototype._domForPossiblyFlakyTests): New function, builds up a list of
possibly flaky tests and examples of their flakiness and returns it.

* BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/index.html: Pull in
FlakyLayoutTestDetector.js.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@90054 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js [new file with mode: 0644]
Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js
Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/Utilities.js
Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/ViewController.js
Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/index.html
Tools/ChangeLog

diff --git a/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js b/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js
new file mode 100644
index 0000000..e35dd19
--- /dev/null
@@ -0,0 +1,104 @@
+/*
+ * Copyright (C) 2011 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+function FlakyLayoutTestDetector() {
+    this._tests = {};
+}
+
+FlakyLayoutTestDetector.prototype = {
+    incorporateTestResults: function(buildName, failingTests, tooManyFailures) {
+        var newFlakyTests = [];
+
+        if (tooManyFailures) {
+            // Something was going horribly wrong during this test run. We shouldn't assume that any
+            // passes/failures are due to flakiness.
+            return newFlakyTests;
+        }
+
+        // Record failing tests.
+        for (var testName in failingTests) {
+            if (!(testName in this._tests)) {
+                this._tests[testName] = {
+                    state: this._states.LastSeenFailing,
+                    history: [],
+                };
+            }
+
+            var testData = this._tests[testName];
+            testData.history.push({ build: buildName, result: failingTests[testName] });
+
+            if (testData.state === this._states.LastSeenPassing) {
+                testData.state = this._states.PossiblyFlaky;
+                newFlakyTests.push(testName);
+            }
+        }
+
+        // Record passing tests.
+        for (var testName in this._tests) {
+            if (testName in failingTests)
+                continue;
+
+            var testData = this._tests[testName];
+            testData.history.push({ build: buildName, result: 'pass' });
+
+            if (testData.state === this._states.LastSeenFailing)
+                testData.state = this._states.LastSeenPassing;
+        }
+
+        return newFlakyTests;
+    },
+
+    flakinessExamples: function(testName) {
+        if (!(testName in this._tests) || this._tests[testName].state !== this._states.PossiblyFlaky)
+            return null;
+
+        var history = this._tests[testName].history;
+
+        var examples = [];
+        for (var i = 0; i < history.length - 1; ++i) {
+            var thisIsPassing = history[i].result === 'pass';
+            var nextIsPassing = history[i + 1].result === 'pass';
+            if (thisIsPassing === nextIsPassing)
+                continue;
+            var last = examples.last();
+            if (!last || last.build !== history[i].build)
+                examples.push(history[i]);
+            examples.push(history[i + 1]);
+        }
+
+        return examples;
+    },
+
+    get possiblyFlakyTests() {
+        var self = this;
+        return Object.keys(self._tests).filter(function(testName) { return self._tests[testName].state === self._states.PossiblyFlaky });
+    },
+
+    _states: {
+        LastSeenFailing: 0,
+        LastSeenPassing: 1,
+        PossiblyFlaky: 2,
+    },
+};
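diff --git a/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js b/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js
index 96bf594..26634a9 100644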
 
 function LayoutTestHistoryAnalyzer(builder) {
     this._builder = builder;
+    this._flakinessDetector = new FlakyLayoutTestDetector();
     this._history = {};
     this._loader = new LayoutTestResultsLoader(builder);
+    this._testRunsSinceLastInterestingChange = 0;
 }
 
 LayoutTestHistoryAnalyzer.prototype = {
     /*
-     * Preiodically calls callback until all current failures have been explained. Callback is
+     * Periodically calls callback until all current failures have been explained. Callback is
      * passed an object like the following:
      * {
-     *     'r12347 (681)': {
-     *         'tooManyFailures': false,
-     *         'tests': {
-     *             'css1/basic/class_as_selector2.html': 'fail',
+     *     'history': {
+     *         'r12347 (681)': {
+     *             'tooManyFailures': false,
+     *             'tests': {
+     *                 'css1/basic/class_as_selector2.html': 'fail',
+     *             },
      *         },
-     *     },
-     *     'r12346 (680)': {
-     *         'tooManyFailures': false,
-     *         'tests': {},
-     *     },
-     *     'r12345 (679)': {
-     *         'tooManyFailures': false,
-     *         'tests': {
-     *             'css1/basic/class_as_selector.html': 'crash',
+     *         'r12346 (680)': {
+     *             'tooManyFailures': false,
+     *             'tests': {},
+     *         },
+     *         'r12345 (679)': {
+     *             'tooManyFailures': false,
+     *             'tests': {
+     *                 'css1/basic/class_as_selector.html': 'crash',
+     *             },
      *         },
      *     },
-     * },
-     * Each build contains just the failures that a) are still occuring on the bots, and b) were new
+     *     'possiblyFlaky': {
+     *         'fast/workers/worker-test.html': [
+     *             { 'build': 'r12345 (679)', 'result': 'pass' },
+     *             { 'build': 'r12344 (678)', 'result': 'fail' },
+     *             { 'build': 'r12340 (676)', 'result': 'fail' },
+     *             { 'build': 'r12338 (675)', 'result': 'pass' },
+     *         ],
+     *     },
+     * }
+     * Each build contains just the failures that a) are still occurring on the bots, and b) were new
      * in that build.
      */
     start: function(callback) {
@@ -62,7 +74,14 @@ LayoutTestHistoryAnalyzer.prototype = {
                     var nextIndex = buildIndex + 1;
                     if (nextIndex >= buildNames.length)
                         callAgain = false;
-                    callback(self._history, callAgain);
+                    var data = {
+                        history: self._history,
+                        possiblyFlaky: {},
+                    };
+                    self._flakinessDetector.possiblyFlakyTests.forEach(function(testName) {
+                        data.possiblyFlaky[testName] = self._flakinessDetector.flakinessExamples(testName);
+                    });
+                    callback(data, callAgain);
                     if (!callAgain)
                         return;
                     setTimeout(function() { inner(nextIndex) }, 0);
@@ -78,11 +97,24 @@ LayoutTestHistoryAnalyzer.prototype = {
 
         var self = this;
         self._loader.start(nextBuildName, function(tests, tooManyFailures) {
+            ++self._testRunsSinceLastInterestingChange;
+
             self._history[nextBuildName] = {
                 tooManyFailures: tooManyFailures,
                 tests: {},
             };
 
+            var newFlakyTests = self._flakinessDetector.incorporateTestResults(nextBuildName, tests, tooManyFailures);
+            if (newFlakyTests.length) {
+                self._testRunsSinceLastInterestingChange = 0;
+                // Remove all possibly flaky tests from the failure history, since when they failed
+                // is no longer meaningful.
+                newFlakyTests.forEach(function(testName) {
+                    for (var buildName in self._history)
+                        delete self._history[buildName].tests[testName];
+                });
+            }
+
             for (var testName in tests) {
                 if (previousBuildName) {
                     if (!(testName in self._history[previousBuildName].tests))
@@ -92,7 +124,14 @@ LayoutTestHistoryAnalyzer.prototype = {
                 self._history[nextBuildName].tests[testName] = tests[testName];
             }
 
-            callback(Object.keys(self._history[nextBuildName].tests).length);
+            var previousUnexplainedFailuresCount = previousBuildName ? Object.keys(self._history[previousBuildName].tests).length : 0;
+            var unexplainedFailuresCount = Object.keys(self._history[nextBuildName].tests).length;
+
+            if (previousUnexplainedFailuresCount && !unexplainedFailuresCount)
+                self._testRunsSinceLastInterestingChange = 0;
+
+            const minimumRequiredTestRunsWithoutInterestingChanges = 5;
+            callback(unexplainedFailuresCount || self._testRunsSinceLastInterestingChange < minimumRequiredTestRunsWithoutInterestingChanges);
         },
         function(tests) {
             // Some tests failed, but we couldn't fetch results.html (perhaps because the test
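diff --git a/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/Utilities.js b/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/Utilities.js
index ff7356a..90b03be 100644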
@@ -103,6 +103,12 @@ function longestCommonPathPrefix(paths) {
     return result.join(separator);
 }
 
+function sorted(array) {
+    var newArray = array.slice();
+    newArray.sort();
+    return newArray;
+}
+
 Array.prototype.findFirst = function(predicate) {
     for (var i = 0; i < this.length; ++i) {
         if (predicate(this[i]))
@@ -111,6 +117,14 @@ Array.prototype.findFirst = function(predicate) {
     return null;
 }
 
+Array.prototype.findLast = function(predicate) {
+    for (var i = this.length - 1; i >= 0; --i) {
+        if (predicate(this[i]))
+            return this[i];
+    }
+    return null;
+}
+
 Array.prototype.last = function() {
     if (!this.length)
         return undefined;
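diff --git a/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/ViewController.js b/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/ViewController.js
index 15b9fee..354c72b 100644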
@@ -48,11 +48,12 @@ ViewController.prototype = {
 
     _displayBuilder: function(builder) {
         var self = this;
-        (new LayoutTestHistoryAnalyzer(builder)).start(function(history, stillFetchingData) {
+        var lastDisplay = 0;
+        (new LayoutTestHistoryAnalyzer(builder)).start(function(data, stillFetchingData) {
             var list = document.createElement('ol');
             list.id = 'failure-history';
-            Object.keys(history).forEach(function(buildName, buildIndex, buildNameArray) {
-                var failingTestNames = Object.keys(history[buildName].tests);
+            Object.keys(data.history).forEach(function(buildName, buildIndex, buildNameArray) {
+                var failingTestNames = Object.keys(data.history[buildName].tests);
                 if (!failingTestNames.length)
                     return;
 
@@ -63,13 +64,13 @@ ViewController.prototype = {
                 item.appendChild(testList);
 
                 testList.className = 'test-list';
-                for (var testName in history[buildName].tests) {
+                for (var testName in data.history[buildName].tests) {
                     var testItem = document.createElement('li');
-                    testItem.appendChild(self._domForFailedTest(builder, buildName, testName, history[buildName].tests[testName]));
+                    testItem.appendChild(self._domForFailedTest(builder, buildName, testName, data.history[buildName].tests[testName]));
                     testList.appendChild(testItem);
                 }
 
-                if (history[buildName].tooManyFailures) {
+                if (data.history[buildName].tooManyFailures) {
                     var p = document.createElement('p');
                     p.className = 'info';
                     p.appendChild(document.createTextNode('run-webkit-tests exited early due to too many failures/crashes/timeouts'));
@@ -92,6 +93,7 @@ ViewController.prototype = {
             document.title = builder.name;
             document.body.appendChild(header);
             document.body.appendChild(list);
+            document.body.appendChild(self._domForPossiblyFlakyTests(builder, data.possiblyFlaky));
 
             if (!stillFetchingData)
                 PersistentCache.prune();
@@ -196,24 +198,27 @@ ViewController.prototype = {
     },
 
     _domForFailedTest: function(builder, buildName, testName, failureType) {
-        var diagnosticInfo = builder.failureDiagnosisTextAndURL(buildName, testName, failureType);
-
         var result = document.createDocumentFragment();
         result.appendChild(document.createTextNode(testName));
         result.appendChild(document.createTextNode(' ('));
+        result.appendChild(this._domForFailureDiagnosis(builder, buildName, testName, failureType));
+        result.appendChild(document.createTextNode(')'));
+        return result;
+    },
 
-        var textNode = document.createTextNode(diagnosticInfo.text);
-        if ('url' in diagnosticInfo) {
-            var link = document.createElement('a');
-            link.href = diagnosticInfo.url;
-            link.appendChild(textNode);
-            result.appendChild(link);
-        } else
-            result.appendChild(textNode);
+    _domForFailureDiagnosis: function(builder, buildName, testName, failureType) {
+        var diagnosticInfo = builder.failureDiagnosisTextAndURL(buildName, testName, failureType);
+        if (!diagnosticInfo)
+            return document.createTextNode(failureType);
 
-        result.appendChild(document.createTextNode(')'));
+        var textNode = document.createTextNode(diagnosticInfo.text);
+        if (!('url' in diagnosticInfo))
+            return textNode;
 
-        return result;
+        var link = document.createElement('a');
+        link.href = diagnosticInfo.url;
+        link.appendChild(textNode);
+        return link;
     },
 
     _domForNewAndExistingBugs: function(tester, failingBuildName, passingBuildName, failingTests) {
@@ -382,4 +387,36 @@ ViewController.prototype = {
 
         return result;
     },
+
+    _domForPossiblyFlakyTests: function(builder, possiblyFlakyTestData) {
+        var result = document.createDocumentFragment();
+        var flakyTests = Object.keys(possiblyFlakyTestData);
+        if (!flakyTests.length)
+            return result;
+
+        var flakyHeader = document.createElement('h2');
+        result.appendChild(flakyHeader);
+        flakyHeader.appendChild(document.createTextNode('Possibly Flaky Tests'));
+
+        var flakyList = document.createElement('ol');
+        result.appendChild(flakyList);
+
+        var self = this;
+        flakyList.appendChildren(sorted(flakyTests).map(function(testName) {
+            var item = document.createElement('li');
+            item.appendChild(document.createTextNode(testName));
+            var historyList = document.createElement('ol');
+            item.appendChild(historyList);
+            historyList.appendChildren(possiblyFlakyTestData[testName].map(function(historyItem) {
+                var item = document.createElement('li');
+                item.appendChild(self._domForBuildName(builder, historyItem.build));
+                item.appendChild(document.createTextNode(': '));
+                item.appendChild(self._domForFailureDiagnosis(builder, historyItem.build, testName, historyItem.result));
+                return item;
+            }));
+            return item;
+        }));
+
+        return result;
+    },
 };
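diff --git a/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/index.html b/Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/index.html
index eccbffa..1cb59ab 100644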
@@ -30,6 +30,7 @@ THE POSSIBILITY OF SUCH DAMAGE.
     <script src="Bugzilla.js"></script>
     <script src="Buildbot.js"></script>
     <script src="Builder.js"></script>
+    <script src="FlakyLayoutTestDetector.js"></script>
     <script src="LayoutTestHistoryAnalyzer.js"></script>
     <script src="LayoutTestResultsLoader.js"></script>
     <script src="PersistentCache.js"></script>
diff --git a/Tools/ChangeLog b/Tools/ChangeLog
index 9b37028..b57ddd2 100644
@@ -1,3 +1,51 @@
+2011-06-29  Adam Roben  <aroben@apple.com>
+
+        Teach TestFailures to detect possibly flaky tests and list them separately
+
+        Fixes <http://webkit.org/b/61061> <rdar://problem/9452796> TestFailures page blames
+        arbitrary revisions for breaking flaky tests
+
+        Reviewed by Dan Bates.
+
+        * BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js: Added.
+        (FlakyLayoutTestDetector): This class identifies flaky tests when given the test results
+        from various builds (in reverse-chronological order).
+        (FlakyLayoutTestDetector.prototype.incorporateTestResults): Detects flaky tests. Tests move
+        monotonically through three states: LastSeenFailing, LastSeenPassing, and PossiblyFlaky.
+        (FlakyLayoutTestDetector.prototype.flakinessExamples): Finds examples of flakiness for the
+        given test. Essentially, finds all the transitions from passing to failing (or vice-versa)
+        and puts them in an array in reverse-chronological order.
+        (FlakyLayoutTestDetector.prototype.get possiblyFlakyTests): Returns all tests we've detected
+        to be possibly flaky.
+
+        * BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:
+        (LayoutTestHistoryAnalyzer): Initialize new members.
+        (LayoutTestHistoryAnalyzer.prototype.start): Now passes the callback an object with two
+        properties: history and possiblyFlaky. history holds the data this function used to pass to
+        the callback, while possiblyFlaky lists all tests that might be flaky and examples of their
+        flakiness. Updated documentation comment to match.
+        (LayoutTestHistoryAnalyzer.prototype._incorporateBuildHistory): Now uses a
+        FlakyLayoutTestDetector to identify possibly flaky tests. Any possibly flaky tests are
+        removed from the failure history, since when they started failing is no longer meaningful.
+        We tell our caller to keep calling until all current failures have been explained and we've
+        gone through 5 builds without any new flaky tests being identified.
+
+        * BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/Utilities.js:
+        (sorted): New helper function to return a sorted copy of an array.
+        (Array.prototype.findLast): New helper function. Like findFirst, but finds the last item
+        that matches the predicate.
+
+        * BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/ViewController.js:
+        (ViewController.prototype._displayBuilder): Updated for change in the object passed to us by
+        the analyzer. Now puts the list of possibly flaky tests after the failure history.
+        (ViewController.prototype._domForFailedTest): Moved some code from here...
+        (ViewController.prototype._domForFailureDiagnosis): ...to here.
+        (ViewController.prototype._domForPossiblyFlakyTests): New function, builds up a list of
+        possibly flaky tests and examples of their flakiness and returns it.
+
+        * BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/index.html: Pull in
+        FlakyLayoutTestDetector.js.
+
 2011-06-29  Eric Seidel  <eric@webkit.org>
 
         Adam says cowboys don't review (or unit test).