B3 should use associativity to optimize expression trees
authorrmorisset@apple.com <rmorisset@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 4 Apr 2019 03:37:23 +0000 (03:37 +0000)
committerrmorisset@apple.com <rmorisset@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 4 Apr 2019 03:37:23 +0000 (03:37 +0000)
https://bugs.webkit.org/show_bug.cgi?id=194081

Reviewed by Filip Pizlo.

JSTests:

Added three microbenchmarks:
- add-tree should be the ideal case, but there is no speedup because we are currently unable to prove that the CheckAdd won't overflow
- bit-xor-tree most closely matches the situation where the optimization triggers on the JetStream2 subtests where it triggers:
  an unbalanced expression tree of size 8 that can be balanced, with no other optimizations being unlocked. 16% speedup
- bit-or-tree is an ideal case, where the reassociation also enables a ton of further simplifications. 42% speedup

* microbenchmarks/add-tree.js: Added.
* microbenchmarks/bit-or-tree.js: Added.
* microbenchmarks/bit-xor-tree.js: Added.

Source/JavaScriptCore:

This patch adds a new B3 pass, that tries to find and optimize expression trees made purely of any one associative and commutative operator (Add/Mul/BitOr/BitAnd/BitXor).
The pass only runs in O2, and runs once, after lowerMacros and just before a run of B3ReduceStrength (which helps clean up the dead code it tends to leave behind).
I had to separate killDeadCode out of B3ReduceStrength (as a new B3EliminateDeadCode pass) to run it before B3OptimizeAssociativeExpressionTrees, as otherwise it is stopped by high use counts
inherited from CSE.
This extra run of DCE is by itself a win, most notably on microbenchmarks/instanceof-always-hit-two (1.5x faster), and on microbenchmarks/licm-dragons(-out-of-bounds) (both get 1.16x speedup).
I suspect it is because it runs between CSE and tail-dedup, and as a result allows a lot more tail-dedup to occur.

The pass is currently extremely conservative, not trying anything if it would cause _any_ code duplication.
For this purpose, it starts by computing use counts for the potentially interesting nodes (those with the right opcodes), and segregate them into expression trees.
The root of an expression tree is a node that is either used in multiple places, or is used by a value with a different opcode.
The leaves of an expression tree are nodes that are either used in multiple places, or have a different opcode.
All constant leaves of a tree are combined, as well as all leaves that are identical. What remains is then laid out into a balanced binary tree, hopefully maximizing ILP.

This optimization was implemented as a stand-alone pass and not as part of B3ReduceStrength mostly because it needs use counts to avoid code duplication.
It also benefits from finding all tree roots first, and not trying to repeatedly optimize subtrees.

I added several tests to testB3 with varying patterns of trees. It is also tested in a less focused way by lots of older tests.

In the future this pass could be expanded to allow some bounded amount of code duplication, and merging more leaves (e.g. Mul(a, 3) and a in an Add tree, into Mul(a, 4))
The latter will need exposing the peephole optimizations out of B3ReduceStrength to avoid duplicating code.

* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* b3/B3Common.cpp:
(JSC::B3::shouldDumpIR):
(JSC::B3::shouldDumpIRAtEachPhase):
* b3/B3Common.h:
* b3/B3EliminateDeadCode.cpp: Added.
(JSC::B3::EliminateDeadCode::run):
(JSC::B3::eliminateDeadCode):
* b3/B3EliminateDeadCode.h: Added.
(JSC::B3::EliminateDeadCode::EliminateDeadCode):
* b3/B3Generate.cpp:
(JSC::B3::generateToAir):
* b3/B3OptimizeAssociativeExpressionTrees.cpp: Added.
(JSC::B3::OptimizeAssociativeExpressionTrees::OptimizeAssociativeExpressionTrees):
(JSC::B3::OptimizeAssociativeExpressionTrees::neutralElement):
(JSC::B3::OptimizeAssociativeExpressionTrees::isAbsorbingElement):
(JSC::B3::OptimizeAssociativeExpressionTrees::combineConstants):
(JSC::B3::OptimizeAssociativeExpressionTrees::emitValue):
(JSC::B3::OptimizeAssociativeExpressionTrees::optimizeRootedTree):
(JSC::B3::OptimizeAssociativeExpressionTrees::run):
(JSC::B3::optimizeAssociativeExpressionTrees):
* b3/B3OptimizeAssociativeExpressionTrees.h: Added.
* b3/B3ReduceStrength.cpp:
* b3/B3Value.cpp:
(JSC::B3::Value::replaceWithIdentity):
* b3/testb3.cpp:
(JSC::B3::testBitXorTreeArgs):
(JSC::B3::testBitXorTreeArgsEven):
(JSC::B3::testBitXorTreeArgImm):
(JSC::B3::testAddTreeArg32):
(JSC::B3::testMulTreeArg32):
(JSC::B3::testBitAndTreeArg32):
(JSC::B3::testBitOrTreeArg32):
(JSC::B3::run):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@243851 268f45cc-cd09-0410-ab3c-d52691b4dbfc

17 files changed:
JSTests/ChangeLog
JSTests/microbenchmarks/add-tree.js [new file with mode: 0644]
JSTests/microbenchmarks/bit-or-tree.js [new file with mode: 0644]
JSTests/microbenchmarks/bit-xor-tree.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
Source/JavaScriptCore/Sources.txt
Source/JavaScriptCore/b3/B3Common.cpp
Source/JavaScriptCore/b3/B3Common.h
Source/JavaScriptCore/b3/B3EliminateDeadCode.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/B3EliminateDeadCode.h [new file with mode: 0644]
Source/JavaScriptCore/b3/B3Generate.cpp
Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.h [new file with mode: 0644]
Source/JavaScriptCore/b3/B3ReduceStrength.cpp
Source/JavaScriptCore/b3/B3Value.cpp
Source/JavaScriptCore/b3/testb3.cpp

index 975e830..7ab9550 100644 (file)
@@ -1,3 +1,20 @@
+2019-04-03  Robin Morisset  <rmorisset@apple.com>
+
+        B3 should use associativity to optimize expression trees
+        https://bugs.webkit.org/show_bug.cgi?id=194081
+
+        Reviewed by Filip Pizlo.
+
+        Added three microbenchmarks:
+        - add-tree should be the ideal case, but there is no speedup because we are currently unable to prove that the CheckAdd won't overflow
+        - bit-xor-tree most closely matches the situation where the optimization triggers on the JetStream2 subtests where it triggers:
+          an unbalanced expression tree of size 8 that can be balanced, with no other optimizations being unlocked. 16% speedup
+        - bit-or-tree is an ideal case, where the reassociation also enables a ton of further simplifications. 42% speedup
+
+        * microbenchmarks/add-tree.js: Added.
+        * microbenchmarks/bit-or-tree.js: Added.
+        * microbenchmarks/bit-xor-tree.js: Added.
+
 2019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
 
         [JSC] Exception verification crash on operationArrayIndexOfValueInt32OrContiguous
diff --git a/JSTests/microbenchmarks/add-tree.js b/JSTests/microbenchmarks/add-tree.js
new file mode 100644 (file)
index 0000000..63eb446
--- /dev/null
@@ -0,0 +1,12 @@
+for (var i = 0; i < 20000000; ++i) {
+    var result = i + 1;
+    result += i + 2;
+    result += i + 3;
+    result += i + 4;
+    result += i + 5;
+    result += i + 6;
+    result += i + 7;
+    if (result != i * 7 + 28)
+        throw "Error: bad result at iteration " + i + " : " + result;
+
+}
diff --git a/JSTests/microbenchmarks/bit-or-tree.js b/JSTests/microbenchmarks/bit-or-tree.js
new file mode 100644 (file)
index 0000000..3fc0c35
--- /dev/null
@@ -0,0 +1,13 @@
+for (var i = 0; i < 20000000; ++i) {
+    var result = i;
+    result |= i | 1;
+    result |= i | 2;
+    result |= i | 3;
+    result |= i | 4;
+    result |= i | 5;
+    result |= i | 6;
+    result |= i | 7;
+    if (result !== (i | 7))
+        throw "Error: bad result at iteration " + i + " : " + result;
+
+}
diff --git a/JSTests/microbenchmarks/bit-xor-tree.js b/JSTests/microbenchmarks/bit-xor-tree.js
new file mode 100644 (file)
index 0000000..d4baf43
--- /dev/null
@@ -0,0 +1,13 @@
+var result = 0;
+for (var i = 0; i < 20000000; ++i) {
+    result ^= i;
+    result ^= i + 1;
+    result ^= i + 2;
+    result ^= i + 3;
+    result ^= i + 4;
+    result ^= i + 5;
+    result ^= i + 6;
+    result ^= i + 7;
+}
+if (result !== 0)
+    throw "Error: bad result: " + result;
index 7dfe278..0509933 100644 (file)
@@ -1,3 +1,67 @@
+2019-04-03  Robin Morisset  <rmorisset@apple.com>
+
+        B3 should use associativity to optimize expression trees
+        https://bugs.webkit.org/show_bug.cgi?id=194081
+
+        Reviewed by Filip Pizlo.
+
+        This patch adds a new B3 pass, that tries to find and optimize expression trees made purely of any one associative and commutative operator (Add/Mul/BitOr/BitAnd/BitXor).
+        The pass only runs in O2, and runs once, after lowerMacros and just before a run of B3ReduceStrength (which helps clean up the dead code it tends to leave behind).
+        I had to separate killDeadCode out of B3ReduceStrength (as a new B3EliminateDeadCode pass) to run it before B3OptimizeAssociativeExpressionTrees, as otherwise it is stopped by high use counts
+        inherited from CSE.
+        This extra run of DCE is by itself a win, most notably on microbenchmarks/instanceof-always-hit-two (1.5x faster), and on microbenchmarks/licm-dragons(-out-of-bounds) (both get 1.16x speedup).
+        I suspect it is because it runs between CSE and tail-dedup, and as a result allows a lot more tail-dedup to occur.
+
+        The pass is currently extremely conservative, not trying anything if it would cause _any_ code duplication.
+        For this purpose, it starts by computing use counts for the potentially interesting nodes (those with the right opcodes), and segregate them into expression trees.
+        The root of an expression tree is a node that is either used in multiple places, or is used by a value with a different opcode.
+        The leaves of an expression tree are nodes that are either used in multiple places, or have a different opcode.
+        All constant leaves of a tree are combined, as well as all leaves that are identical. What remains is then laid out into a balanced binary tree, hopefully maximizing ILP.
+
+        This optimization was implemented as a stand-alone pass and not as part of B3ReduceStrength mostly because it needs use counts to avoid code duplication.
+        It also benefits from finding all tree roots first, and not trying to repeatedly optimize subtrees.
+
+        I added several tests to testB3 with varying patterns of trees. It is also tested in a less focused way by lots of older tests.
+
+        In the future this pass could be expanded to allow some bounded amount of code duplication, and merging more leaves (e.g. Mul(a, 3) and a in an Add tree, into Mul(a, 4))
+        The latter will need exposing the peephole optimizations out of B3ReduceStrength to avoid duplicating code.
+
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * Sources.txt:
+        * b3/B3Common.cpp:
+        (JSC::B3::shouldDumpIR):
+        (JSC::B3::shouldDumpIRAtEachPhase):
+        * b3/B3Common.h:
+        * b3/B3EliminateDeadCode.cpp: Added.
+        (JSC::B3::EliminateDeadCode::run):
+        (JSC::B3::eliminateDeadCode):
+        * b3/B3EliminateDeadCode.h: Added.
+        (JSC::B3::EliminateDeadCode::EliminateDeadCode):
+        * b3/B3Generate.cpp:
+        (JSC::B3::generateToAir):
+        * b3/B3OptimizeAssociativeExpressionTrees.cpp: Added.
+        (JSC::B3::OptimizeAssociativeExpressionTrees::OptimizeAssociativeExpressionTrees):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::neutralElement):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::isAbsorbingElement):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::combineConstants):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::emitValue):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::optimizeRootedTree):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::run):
+        (JSC::B3::optimizeAssociativeExpressionTrees):
+        * b3/B3OptimizeAssociativeExpressionTrees.h: Added.
+        * b3/B3ReduceStrength.cpp:
+        * b3/B3Value.cpp:
+        (JSC::B3::Value::replaceWithIdentity):
+        * b3/testb3.cpp:
+        (JSC::B3::testBitXorTreeArgs):
+        (JSC::B3::testBitXorTreeArgsEven):
+        (JSC::B3::testBitXorTreeArgImm):
+        (JSC::B3::testAddTreeArg32):
+        (JSC::B3::testMulTreeArg32):
+        (JSC::B3::testBitAndTreeArg32):
+        (JSC::B3::testBitOrTreeArg32):
+        (JSC::B3::run):
+
 2019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
 
         [JSC] Add dump feature for RandomizingFuzzerAgent
index 4c3000d..f44d036 100644 (file)
                2AD8932B17E3868F00668276 /* HeapIterationScope.h in Headers */ = {isa = PBXBuildFile; fileRef = 2AD8932917E3868F00668276 /* HeapIterationScope.h */; };
                2AF7382D18BBBF92008A5A37 /* StructureIDTable.h in Headers */ = {isa = PBXBuildFile; fileRef = 2AF7382B18BBBF92008A5A37 /* StructureIDTable.h */; settings = {ATTRIBUTES = (Private, ); }; };
                2D342F36F7244096804ADB24 /* SourceOrigin.h in Headers */ = {isa = PBXBuildFile; fileRef = 425BA1337E4344E1B269A671 /* SourceOrigin.h */; settings = {ATTRIBUTES = (Private, ); }; };
+               3395C70722555F6D00BDBFAD /* B3EliminateDeadCode.h in Headers */ = {isa = PBXBuildFile; fileRef = 3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */; };
                371D842D17C98B6E00ECF994 /* libz.dylib in Frameworks */ = {isa = PBXBuildFile; fileRef = 371D842C17C98B6E00ECF994 /* libz.dylib */; };
                37C738D21EDB56E4003F2B0B /* ParseInt.h in Headers */ = {isa = PBXBuildFile; fileRef = 37C738D11EDB5672003F2B0B /* ParseInt.h */; settings = {ATTRIBUTES = (Private, ); }; };
                412952771D2CF6BC00E78B89 /* builtins_generate_internals_wrapper_header.py in Headers */ = {isa = PBXBuildFile; fileRef = 412952731D2CF6AC00E78B89 /* builtins_generate_internals_wrapper_header.py */; settings = {ATTRIBUTES = (Private, ); }; };
                2AF7382B18BBBF92008A5A37 /* StructureIDTable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = StructureIDTable.h; sourceTree = "<group>"; };
                3032175DF1AD47D8998B34E1 /* JSSourceCode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSSourceCode.h; sourceTree = "<group>"; };
                30A5F403F11C4F599CD596D5 /* WasmSignatureInlines.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = WasmSignatureInlines.h; sourceTree = "<group>"; };
+               33743649224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.cpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.cpp; name = B3OptimizeAssociativeExpressionTrees.cpp; path = b3/B3OptimizeAssociativeExpressionTrees.cpp; sourceTree = "<group>"; };
+               3374364A224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = B3OptimizeAssociativeExpressionTrees.h; path = b3/B3OptimizeAssociativeExpressionTrees.h; sourceTree = "<group>"; };
+               3395C70422555F6C00BDBFAD /* B3EliminateDeadCode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = B3EliminateDeadCode.cpp; path = b3/B3EliminateDeadCode.cpp; sourceTree = "<group>"; };
+               3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = B3EliminateDeadCode.h; path = b3/B3EliminateDeadCode.h; sourceTree = "<group>"; };
                37119A7720CCB5DC002C6DC9 /* WebKitTargetConditionals.xcconfig */ = {isa = PBXFileReference; lastKnownFileType = text.xcconfig; path = WebKitTargetConditionals.xcconfig; sourceTree = "<group>"; };
                371D842C17C98B6E00ECF994 /* libz.dylib */ = {isa = PBXFileReference; lastKnownFileType = "compiled.mach-o.dylib"; name = libz.dylib; path = usr/lib/libz.dylib; sourceTree = SDKROOT; };
                37C738D11EDB5672003F2B0B /* ParseInt.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ParseInt.h; sourceTree = "<group>"; };
                                0FEC85BE1BE167A00080FF74 /* B3Effects.h */,
                                0F725CA31C503DED00AD943A /* B3EliminateCommonSubexpressions.cpp */,
                                0F725CA41C503DED00AD943A /* B3EliminateCommonSubexpressions.h */,
+                               3395C70422555F6C00BDBFAD /* B3EliminateDeadCode.cpp */,
+                               3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */,
                                0F5BF16E1F23A5A10029D91D /* B3EnsureLoopPreHeaders.cpp */,
                                0F5BF16F1F23A5A10029D91D /* B3EnsureLoopPreHeaders.h */,
                                0F6971E81D92F42100BA02A5 /* B3FenceValue.cpp */,
                                0F338E071BF0276C0013C88F /* B3OpaqueByproducts.h */,
                                0FEC84D71BDACDAC0080FF74 /* B3Opcode.cpp */,
                                0FEC84D81BDACDAC0080FF74 /* B3Opcode.h */,
+                               33743649224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.cpp */,
+                               3374364A224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.h */,
                                0FEC84D91BDACDAC0080FF74 /* B3Origin.cpp */,
                                0FEC84DA1BDACDAC0080FF74 /* B3Origin.h */,
                                0F4DE1D01C4D764B004D6C11 /* B3OriginDump.cpp */,
                                0F48532A187DFDEC0083B687 /* FTLRecoveryOpcode.h in Headers */,
                                0FCEFAAC1804C13E00472CE4 /* FTLSaveRestore.h in Headers */,
                                0F25F1B2181635F300522F39 /* FTLSlowPathCall.h in Headers */,
+                               3395C70722555F6D00BDBFAD /* B3EliminateDeadCode.h in Headers */,
                                0F25F1B4181635F300522F39 /* FTLSlowPathCallKey.h in Headers */,
                                E322E5A71DA644A8006E7709 /* FTLSnippetParams.h in Headers */,
                                0F235BD717178E1C00690C7F /* FTLStackmapArgumentList.h in Headers */,
index daadc06..d9cd467 100644 (file)
@@ -124,6 +124,7 @@ b3/B3DataSection.cpp
 b3/B3DuplicateTails.cpp
 b3/B3Effects.cpp
 b3/B3EliminateCommonSubexpressions.cpp
+b3/B3EliminateDeadCode.cpp
 b3/B3EnsureLoopPreHeaders.cpp
 b3/B3FenceValue.cpp
 b3/B3FixSSA.cpp
@@ -143,6 +144,7 @@ b3/B3MemoryValue.cpp
 b3/B3MoveConstants.cpp
 b3/B3OpaqueByproducts.cpp
 b3/B3Opcode.cpp
+b3/B3OptimizeAssociativeExpressionTrees.cpp
 b3/B3Origin.cpp
 b3/B3OriginDump.cpp
 b3/B3PatchpointSpecial.cpp
index 40bb32b..0e2ce3a 100644 (file)
@@ -35,7 +35,7 @@
 
 namespace JSC { namespace B3 {
 
-bool shouldDumpIR(B3ComplitationMode mode)
+bool shouldDumpIR(B3CompilationMode mode)
 {
 #if ENABLE(FTL_JIT)
     return FTL::verboseCompilationEnabled() || FTL::shouldDumpDisassembly() || shouldDumpIRAtEachPhase(mode);
@@ -44,7 +44,7 @@ bool shouldDumpIR(B3ComplitationMode mode)
 #endif
 }
 
-bool shouldDumpIRAtEachPhase(B3ComplitationMode mode)
+bool shouldDumpIRAtEachPhase(B3CompilationMode mode)
 {
     if (mode == B3Mode)
         return Options::dumpGraphAtEachPhase() || Options::dumpB3GraphAtEachPhase();
index 75cc092..ed1d02d 100644 (file)
 
 namespace JSC { namespace B3 {
 
-enum B3ComplitationMode {
+enum B3CompilationMode {
     B3Mode,
     AirMode
 };
 
-JS_EXPORT_PRIVATE bool shouldDumpIR(B3ComplitationMode);
-bool shouldDumpIRAtEachPhase(B3ComplitationMode);
+JS_EXPORT_PRIVATE bool shouldDumpIR(B3CompilationMode);
+bool shouldDumpIRAtEachPhase(B3CompilationMode);
 bool shouldValidateIR();
 bool shouldValidateIRAtEachPhase();
 bool shouldSaveIRBeforePhase();
diff --git a/Source/JavaScriptCore/b3/B3EliminateDeadCode.cpp b/Source/JavaScriptCore/b3/B3EliminateDeadCode.cpp
new file mode 100644 (file)
index 0000000..117ea09
--- /dev/null
@@ -0,0 +1,118 @@
+/*
+ * Copyright (C) 2015-2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "B3EliminateDeadCode.h"
+
+#if ENABLE(B3_JIT)
+
+#include "B3PhaseScope.h"
+#include "B3ProcedureInlines.h"
+#include "B3ValueInlines.h"
+#include "B3Variable.h"
+#include "B3VariableValue.h"
+#include <wtf/GraphNodeWorklist.h>
+#include <wtf/IndexSet.h>
+#include <wtf/Vector.h>
+
+namespace JSC { namespace B3 {
+
+// FIXME: this pass currently only eliminates values and variables, it does not seem to eliminate dead blocks.
+
+bool eliminateDeadCodeImpl(Procedure& proc)
+{
+    bool changed = false;
+    GraphNodeWorklist<Value*, IndexSet<Value*>> worklist;
+    Vector<UpsilonValue*, 64> upsilons;
+    for (BasicBlock* block : proc) {
+        for (Value* value : *block) {
+            Effects effects;
+            // We don't care about effects of SSA operations, since we model them more
+            // accurately than the effects() method does.
+            if (value->opcode() != Phi && value->opcode() != Upsilon)
+                effects = value->effects();
+            
+            if (effects.mustExecute())
+                worklist.push(value);
+            
+            if (UpsilonValue* upsilon = value->as<UpsilonValue>())
+                upsilons.append(upsilon);
+        }
+    }
+    for (;;) {
+        while (Value* value = worklist.pop()) {
+            for (Value* child : value->children())
+                worklist.push(child);
+        }
+        
+        bool didPush = false;
+        for (size_t upsilonIndex = 0; upsilonIndex < upsilons.size(); ++upsilonIndex) {
+            UpsilonValue* upsilon = upsilons[upsilonIndex];
+            if (worklist.saw(upsilon->phi())) {
+                worklist.push(upsilon);
+                upsilons[upsilonIndex--] = upsilons.last();
+                upsilons.takeLast();
+                didPush = true;
+            }
+        }
+        if (!didPush)
+            break;
+    }
+
+    IndexSet<Variable*> liveVariables;
+    
+    for (BasicBlock* block : proc) {
+        size_t sourceIndex = 0;
+        size_t targetIndex = 0;
+        while (sourceIndex < block->size()) {
+            Value* value = block->at(sourceIndex++);
+            if (worklist.saw(value)) {
+                if (VariableValue* variableValue = value->as<VariableValue>())
+                    liveVariables.add(variableValue->variable());
+                block->at(targetIndex++) = value;
+            } else {
+                proc.deleteValue(value);
+                changed = true;
+            }
+        }
+        block->values().resize(targetIndex);
+    }
+
+    for (Variable* variable : proc.variables()) {
+        if (!liveVariables.contains(variable))
+            proc.deleteVariable(variable);
+    }
+    return changed;
+}
+
+bool eliminateDeadCode(Procedure& proc)
+{
+    PhaseScope phaseScope(proc, "eliminateDeadCode");
+    return eliminateDeadCodeImpl(proc);
+}
+
+} } // namespace JSC::B3
+
+#endif // ENABLE(B3_JIT)
diff --git a/Source/JavaScriptCore/b3/B3EliminateDeadCode.h b/Source/JavaScriptCore/b3/B3EliminateDeadCode.h
new file mode 100644 (file)
index 0000000..3be74ab
--- /dev/null
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#pragma once
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 {
+
+class Procedure;
+
+// The 'Impl' version of this function does not have a scope phase. It is public so that B3ReduceStrength can use it in its fixpoint.
+bool eliminateDeadCodeImpl(Procedure&);
+JS_EXPORT_PRIVATE bool eliminateDeadCode(Procedure&);
+
+} } // namespace JSC::B3
+
+#endif // ENABLE(B3_JIT)
+
index a13854f..e370e21 100644 (file)
@@ -34,6 +34,7 @@
 #include "B3Common.h"
 #include "B3DuplicateTails.h"
 #include "B3EliminateCommonSubexpressions.h"
+#include "B3EliminateDeadCode.h"
 #include "B3FixSSA.h"
 #include "B3FoldPathConstants.h"
 #include "B3HoistLoopInvariantValues.h"
@@ -43,6 +44,7 @@
 #include "B3LowerMacrosAfterOptimizations.h"
 #include "B3LowerToAir.h"
 #include "B3MoveConstants.h"
+#include "B3OptimizeAssociativeExpressionTrees.h"
 #include "B3Procedure.h"
 #include "B3PureCSE.h"
 #include "B3ReduceDoubleToFloat.h"
@@ -87,12 +89,12 @@ void generateToAir(Procedure& procedure)
         hoistLoopInvariantValues(procedure);
         if (eliminateCommonSubexpressions(procedure))
             eliminateCommonSubexpressions(procedure);
+        eliminateDeadCode(procedure);
         inferSwitches(procedure);
         if (Options::useB3TailDup())
             duplicateTails(procedure);
         fixSSA(procedure);
         foldPathConstants(procedure);
-        
         // FIXME: Add more optimizations here.
         // https://bugs.webkit.org/show_bug.cgi?id=150507
     } else if (procedure.optLevel() >= 1) {
@@ -104,6 +106,7 @@ void generateToAir(Procedure& procedure)
     lowerMacros(procedure);
 
     if (procedure.optLevel() >= 2) {
+        optimizeAssociativeExpressionTrees(procedure);
         reduceStrength(procedure);
 
         // FIXME: Add more optimizations here.
diff --git a/Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp b/Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.cpp
new file mode 100644 (file)
index 0000000..8125a3c
--- /dev/null
@@ -0,0 +1,302 @@
+/*
+ * Copyright (C) 2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "B3OptimizeAssociativeExpressionTrees.h"
+
+#if ENABLE(B3_JIT)
+
+#include "B3BasicBlock.h"
+#include "B3Const32Value.h"
+#include "B3Const64Value.h"
+#include "B3InsertionSet.h"
+#include "B3Opcode.h"
+#include "B3PhaseScope.h"
+#include "B3Procedure.h"
+#include "B3Value.h"
+#include "B3ValueInlines.h"
+
+namespace JSC { namespace B3 {
+
+class OptimizeAssociativeExpressionTrees {
+public:
+    OptimizeAssociativeExpressionTrees(Procedure& proc)
+        : m_proc(proc)
+    {
+    }
+
+    bool run();
+
+private:
+    int64_t neutralElement(Opcode);
+    bool isAbsorbingElement(Opcode, int64_t);
+    void combineConstants(Opcode, int64_t&, int64_t);
+    void emitValue(Opcode, Value*, unsigned numSeen, InsertionSet&, size_t indexInBlock, Vector<Value*, 4>& results);
+    bool optimizeRootedTree(Value* root, InsertionSet&, size_t indexInBlock, const Vector<unsigned>& useCounts);
+
+    Procedure& m_proc;
+    bool verbose { false };
+};
+
+int64_t OptimizeAssociativeExpressionTrees::neutralElement(Opcode op)
+{
+    switch (op) {
+    case Add:
+    case BitOr:
+    case BitXor:
+        return 0;
+    case Mul:
+        return 1;
+    case BitAnd:
+        return -1;
+    default:
+        RELEASE_ASSERT_NOT_REACHED();
+    }
+}
+
+bool OptimizeAssociativeExpressionTrees::isAbsorbingElement(Opcode op, int64_t constant)
+{
+    switch (op) {
+    case Add:
+    case BitXor:
+        return false;
+    case Mul:
+    case BitAnd:
+        return !constant;
+    case BitOr:
+        return constant == -1;
+    default:
+        RELEASE_ASSERT_NOT_REACHED();
+    }
+}
+
+void OptimizeAssociativeExpressionTrees::combineConstants(Opcode op, int64_t& const1, int64_t const2)
+{
+    switch (op) {
+    case Add:
+        const1 += const2;
+        break;
+    case Mul:
+        const1 *= const2;
+        break;
+    case BitAnd:
+        const1 &= const2;
+        break;
+    case BitOr:
+        const1 |= const2;
+        break;
+    case BitXor:
+        const1 ^= const2;
+        break;
+    default:
+        RELEASE_ASSERT_NOT_REACHED();
+    }
+}
+
+void OptimizeAssociativeExpressionTrees::emitValue(Opcode op, Value* value, unsigned numSeen, InsertionSet& insertionSet, size_t indexInBlock, Vector<Value*, 4>& results)
+{
+    switch (op) {
+    case Add:
+        if (numSeen > 1) {
+            Value* constNumSeen;
+            if (value->type() == Int32)
+                constNumSeen = insertionSet.insert<Const32Value>(indexInBlock, value->origin(), numSeen);
+            else
+                constNumSeen = insertionSet.insert<Const64Value>(indexInBlock, value->origin(), static_cast<int64_t>(numSeen));
+            results.append(insertionSet.insert<Value>(indexInBlock, Mul, value->origin(), value, constNumSeen));
+        } else
+            results.append(value);
+        break;
+    case Mul:
+        for (unsigned i = 0; i < numSeen; ++i)
+            results.append(value);
+        break;
+    case BitAnd:
+    case BitOr:
+        results.append(value);
+        break;
+    case BitXor:
+        if (numSeen % 2)
+            results.append(value);
+        break;
+    default:
+        RELEASE_ASSERT_NOT_REACHED();
+    }
+}
+
+bool OptimizeAssociativeExpressionTrees::optimizeRootedTree(Value* root, InsertionSet& insertionSet, size_t indexInBlock, const Vector<unsigned>& useCounts)
+{
+    Opcode op = root->opcode();
+    if ((root->child(0)->opcode() != op || useCounts[root->child(0)->index()] > 1)
+        && (root->child(1)->opcode() != op || useCounts[root->child(1)->index()] > 1)) {
+        // This is a trivial expression tree of size two, we have nothing to do here that B3ReduceStrength cannot do better than us.
+        return false;
+    }
+
+    // We proceed in three steps:
+    // - gather all the leaves of the expression tree
+    // - sort them, and combine as many as possible
+    // - make a balanced binary tree of them
+
+    Vector<Value*, 4> leaves;
+    Vector<Value*, 3> worklist = { root->child(0), root->child(1) };
+    int64_t constant = neutralElement(op);
+    unsigned numVisited = 0;
+    while (!worklist.isEmpty()) {
+        Value* val = worklist.takeLast();
+        if (val->opcode() == op && useCounts[val->index()] < 2) {
+            worklist.append(val->child(0));
+            worklist.append(val->child(1));
+        } else if (val->hasInt()) {
+            combineConstants(op, constant, val->asInt());
+            numVisited++;
+        } else {
+            numVisited++;
+            leaves.append(val);
+        }
+    }
+    if (isAbsorbingElement(op, constant)) {
+        Value* newRoot;
+        if (root->type() == Int32)
+            newRoot = insertionSet.insert<Const32Value>(indexInBlock, root->origin(), static_cast<int32_t>(constant));
+        else
+            newRoot = insertionSet.insert<Const64Value>(indexInBlock, root->origin(), constant);
+        root->replaceWithIdentity(newRoot);
+        return true;
+    }
+    if (numVisited < 4) {
+        // This is a nearly-trivial expression of size 3. B3ReduceStrength is still able to deal with such expressions competently, and there is no possible win from balancing them.
+        return false;
+    }
+
+    std::sort(leaves.begin(), leaves.end(), [](Value* x, Value* y) {
+        return x->index() < y->index();
+    });
+    Vector<Value*, 4> optLeaves;
+    Value* lastValue = nullptr;
+    unsigned numSeen = 0;
+    for (Value* value : leaves) {
+        if (lastValue == value)
+            numSeen++;
+        else {
+            if (lastValue)
+                emitValue(op, lastValue, numSeen, insertionSet, indexInBlock, optLeaves);
+            lastValue = value;
+            numSeen = 1;
+        }
+    }
+    if (lastValue)
+        emitValue(op, lastValue, numSeen, insertionSet, indexInBlock, optLeaves);
+
+    // optLeaves can be empty for trees of BitXor where all leaves happen an even number of times.
+    // In that case, we make the whole tree equivalent to the neutral element (which is 0 for BitXor).
+    if (constant != neutralElement(op) || optLeaves.isEmpty()) {
+        if (root->type() == Int32)
+            optLeaves.append(insertionSet.insert<Const32Value>(indexInBlock, root->origin(), static_cast<int32_t>(constant)));
+        else
+            optLeaves.append(insertionSet.insert<Const64Value>(indexInBlock, root->origin(), constant));
+    }
+
+    if (verbose) {
+        dataLog("      Expression tree rooted at ", *root, " (", root->opcode(), ") with leaves (numVisited = ", numVisited, ") ");
+        for (Value* leaf : leaves)
+            dataLog(" ", *leaf);
+        dataLog(" =>");
+        for (Value* leaf : optLeaves)
+            dataLog(" ", *leaf);
+        dataLog("\n");
+    }
+
+    // Finally we can build the balanced binary tree
+    unsigned leafIndex = 0;
+    while (leafIndex + 1 < optLeaves.size()) {
+        optLeaves.append(insertionSet.insert<Value>(indexInBlock, op, root->origin(), optLeaves[leafIndex], optLeaves[leafIndex + 1]));
+        leafIndex += 2;
+    }
+    ASSERT(leafIndex == optLeaves.size() - 1);
+    root->replaceWithIdentity(optLeaves[leafIndex]);
+    return true;
+}
+
+bool OptimizeAssociativeExpressionTrees::run()
+{
+    bool changed = false;
+
+    // We proceed in two phases.
+    // In the first one we compute the use counts of each value (of an interesting opcode), and find potential roots of interesting expression trees.
+    // In the second one we optimize each such expression tree in turn.
+    // We need the use counts to avoid duplicating code.
+
+    Vector<unsigned> useCounts(m_proc.values().size(), 0); // Mapping from Value::m_index to use counts.
+    HashSet<Value*> expressionTreeRoots;
+    HashSet<BasicBlock*> rootOwners;
+
+    for (BasicBlock* block : m_proc) {
+        for (Value* value : *block) {
+            for (Value* child : value->children()) {
+                if (!child->isInteger())
+                    continue;
+                switch (child->opcode()) {
+                case Mul:
+                case Add:
+                case BitAnd:
+                case BitOr:
+                case BitXor:
+                    useCounts[child->index()]++;
+                    if (child->opcode() != value->opcode() || useCounts[child->index()] > 1) {
+                        expressionTreeRoots.add(child);
+                        rootOwners.add(child->owner);
+                    }
+                    break;
+                default:
+                    break;
+                }
+            }
+        }
+    }
+
+    InsertionSet insertionSet = InsertionSet(m_proc);
+    for (BasicBlock* block : rootOwners) {
+        for (unsigned index = 0; index < block->size(); ++index) {
+            Value* value = block->at(index);
+            if (expressionTreeRoots.contains(value))
+                changed |= optimizeRootedTree(value, insertionSet, index, useCounts);
+        }
+        insertionSet.execute(block);
+    }
+
+    return changed;
+}
+
+bool optimizeAssociativeExpressionTrees(Procedure& proc)
+{
+    PhaseScope phaseScope(proc, "optimizeAssociativeExpressionTrees");
+    OptimizeAssociativeExpressionTrees optimizeAssociativeExpressionTrees(proc);
+    return optimizeAssociativeExpressionTrees.run();
+}
+
+} } // namespace JSC::B3
+
+#endif // ENABLE(B3_JIT)
diff --git a/Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.h b/Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.h
new file mode 100644 (file)
index 0000000..58602e2
--- /dev/null
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#pragma once
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 {
+
+class Procedure;
+
+// Finds large expression trees using one of the associative and commutative opcodes (Add/Mul/BitAnd/BitOr/BitXor),
+// then reorder them to eliminate as many operations as possible, then make balanced binary trees out of them
+// to maximize ILP and minimize the total latency.
+bool optimizeAssociativeExpressionTrees(Procedure&);
+
+} } // namespace JSC::B3
+
+#endif // ENABLE(B3_JIT)
index 3cb9d14..9e62159 100644 (file)
@@ -33,6 +33,7 @@
 #include "B3BlockInsertionSet.h"
 #include "B3ComputeDivisionMagic.h"
 #include "B3Dominators.h"
+#include "B3EliminateDeadCode.h"
 #include "B3InsertionSetInlines.h"
 #include "B3MemoryValueInlines.h"
 #include "B3PhaseScope.h"
@@ -440,7 +441,7 @@ public:
             // "hoist" @thing. On the other hand, if we run DCE before CSE, we will kill @dead and
             // keep @thing. That's better, since we usually want things to stay wherever the client
             // put them. We're not actually smart enough to move things around at random.
-            killDeadCode();
+            m_changed |= eliminateDeadCodeImpl(m_proc);
             
             simplifySSA();
             
@@ -2732,70 +2733,6 @@ private:
         }
     }
 
-    void killDeadCode()
-    {
-        GraphNodeWorklist<Value*, IndexSet<Value*>> worklist;
-        Vector<UpsilonValue*, 64> upsilons;
-        for (BasicBlock* block : m_proc) {
-            for (Value* value : *block) {
-                Effects effects;
-                // We don't care about effects of SSA operations, since we model them more
-                // accurately than the effects() method does.
-                if (value->opcode() != Phi && value->opcode() != Upsilon)
-                    effects = value->effects();
-                
-                if (effects.mustExecute())
-                    worklist.push(value);
-                
-                if (UpsilonValue* upsilon = value->as<UpsilonValue>())
-                    upsilons.append(upsilon);
-            }
-        }
-        for (;;) {
-            while (Value* value = worklist.pop()) {
-                for (Value* child : value->children())
-                    worklist.push(child);
-            }
-            
-            bool didPush = false;
-            for (size_t upsilonIndex = 0; upsilonIndex < upsilons.size(); ++upsilonIndex) {
-                UpsilonValue* upsilon = upsilons[upsilonIndex];
-                if (worklist.saw(upsilon->phi())) {
-                    worklist.push(upsilon);
-                    upsilons[upsilonIndex--] = upsilons.last();
-                    upsilons.takeLast();
-                    didPush = true;
-                }
-            }
-            if (!didPush)
-                break;
-        }
-
-        IndexSet<Variable*> liveVariables;
-        
-        for (BasicBlock* block : m_proc) {
-            size_t sourceIndex = 0;
-            size_t targetIndex = 0;
-            while (sourceIndex < block->size()) {
-                Value* value = block->at(sourceIndex++);
-                if (worklist.saw(value)) {
-                    if (VariableValue* variableValue = value->as<VariableValue>())
-                        liveVariables.add(variableValue->variable());
-                    block->at(targetIndex++) = value;
-                } else {
-                    m_proc.deleteValue(value);
-                    m_changed = true;
-                }
-            }
-            block->values().resize(targetIndex);
-        }
-
-        for (Variable* variable : m_proc.variables()) {
-            if (!liveVariables.contains(variable))
-                m_proc.deleteVariable(variable);
-        }
-    }
-
     void simplifySSA()
     {
         // This runs Aycock and Horspool's algorithm on our Phi functions [1]. For most CFG patterns,
index d6704ca..dfcc9e4 100644 (file)
@@ -63,6 +63,7 @@ void Value::replaceWithIdentity(Value* value)
     // previous value in place, and then we construct the Identity Value in place.
 
     ASSERT(m_type == value->m_type);
+    ASSERT(value != this);
 
     if (m_type == Void) {
         replaceWithNopIgnoringType();
index ff62d36..aa8e499 100644 (file)
@@ -38,6 +38,7 @@
 #include "B3Compile.h"
 #include "B3ComputeDivisionMagic.h"
 #include "B3Const32Value.h"
+#include "B3Const64Value.h"
 #include "B3ConstPtrValue.h"
 #include "B3Effects.h"
 #include "B3FenceValue.h"
@@ -398,6 +399,132 @@ void testLoadOffsetScaledUnsignedOverImm12Max()
     testLoadWithOffsetImpl(32768, 16384);
 }
 
+void testBitXorTreeArgs(int64_t a, int64_t b)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* argB = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
+    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, argB);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argA);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int64_t>(proc, a, b), (((a ^ b) ^ b) ^ a) ^ b);
+}
+
+void testBitXorTreeArgsEven(int64_t a, int64_t b)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* argB = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
+    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, argB);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argA);
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int64_t>(proc, a, b), ((a ^ b) ^ b) ^ a);
+}
+
+void testBitXorTreeArgImm(int64_t a, int64_t b)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* immB = root->appendNew<Const64Value>(proc, Origin(), b);
+    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, immB);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), argA, node);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), argA, node);
+    node = root->appendNew<Value>(proc, BitXor, Origin(), immB, node);
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int64_t>(proc, a), b ^ (a ^ (a ^ (a ^ b))));
+}
+
+void testAddTreeArg32(int32_t a)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
+    Value* node = argA;
+    int32_t expectedResult = a;
+    for (unsigned i = 0; i < 20; ++i) {
+        Value* otherNode;
+        if (!(i % 3)) {
+            otherNode = root->appendNew<Const32Value>(proc, Origin(), i);
+            expectedResult += i;
+        } else {
+            otherNode = argA;
+            expectedResult += a;
+        }
+        node = root->appendNew<Value>(proc, Add, Origin(), node, otherNode);
+    }
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int32_t>(proc, a), expectedResult);
+}
+
+void testMulTreeArg32(int32_t a)
+{
+    // Fibonacci-like expression tree with multiplication instead of addition.
+    // Verifies that we don't explode on heavily factored graphs.
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
+    Value* nodeA = argA;
+    Value* nodeB = argA;
+    int32_t expectedA = a, expectedResult = a;
+    for (unsigned i = 0; i < 20; ++i) {
+        Value* newNodeB = root->appendNew<Value>(proc, Mul, Origin(), nodeA, nodeB);
+        nodeA = nodeB;
+        nodeB = newNodeB;
+        int32_t newExpectedResult = expectedA * expectedResult;
+        expectedA = expectedResult;
+        expectedResult = newExpectedResult;
+    }
+    root->appendNew<Value>(proc, Return, Origin(), nodeB);
+
+    CHECK_EQ(compileAndRun<int32_t>(proc, a), expectedResult);
+}
+
+void testBitAndTreeArg32(int32_t a)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
+    Value* node = argA;
+    for (unsigned i = 0; i < 8; ++i) {
+        Value* constI = root->appendNew<Const32Value>(proc, Origin(), i | 42);
+        Value* newBitAnd = root->appendNew<Value>(proc, BitAnd, Origin(), argA, constI);
+        node = root->appendNew<Value>(proc, BitAnd, Origin(), node, newBitAnd);
+    }
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int32_t>(proc, a), a & 42);
+}
+
+void testBitOrTreeArg32(int32_t a)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
+    Value* node = argA;
+    for (unsigned i = 0; i < 8; ++i) {
+        Value* constI = root->appendNew<Const32Value>(proc, Origin(), i);
+        Value* newBitAnd = root->appendNew<Value>(proc, BitOr, Origin(), argA, constI);
+        node = root->appendNew<Value>(proc, BitOr, Origin(), node, newBitAnd);
+    }
+    root->appendNew<Value>(proc, Return, Origin(), node);
+
+    CHECK_EQ(compileAndRun<int32_t>(proc, a), a | 7);
+}
+
 void testArg(int argument)
 {
     Procedure proc;
@@ -17048,6 +17175,14 @@ void run(const char* filter)
     RUN(testReturnConst64(-42));
     RUN(testReturnVoid());
 
+    RUN_BINARY(testBitXorTreeArgs, int64Operands(), int64Operands());
+    RUN_BINARY(testBitXorTreeArgsEven, int64Operands(), int64Operands());
+    RUN_BINARY(testBitXorTreeArgImm, int64Operands(), int64Operands());
+    RUN_UNARY(testAddTreeArg32, int32Operands());
+    RUN_UNARY(testMulTreeArg32, int32Operands());
+    RUN_UNARY(testBitAndTreeArg32, int32Operands());
+    RUN_UNARY(testBitOrTreeArg32, int32Operands());
+
     RUN(testAddArg(111));
     RUN(testAddArgs(1, 1));
     RUN(testAddArgs(1, 2));