CheckAdd/Mul should have commutativity optimizations in B3->Air lowering
author    fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Tue, 17 Nov 2015 22:31:40 +0000 (22:31 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Tue, 17 Nov 2015 22:31:40 +0000 (22:31 +0000)
https://bugs.webkit.org/show_bug.cgi?id=151214

Reviewed by Geoffrey Garen.

Source/JavaScriptCore:

This is an overhaul of CheckAdd/CheckSub/CheckMul that fixes bugs, improves codegen, and
simplifies the contract between B3 and its client.

Previously, the idea was that the operands to the Air BranchAdd/Sub/Mul matched the children of
the B3 CheckAdd/Sub/Mul, or at least, that's what the B3::CheckSpecial would make you believe.
This meant that B3/Air had to avoid optimizations that would break this protocol. This prevented
commutativity optimizations on CheckAdd/Mul and it also prevented strength reduction from
CheckMul(x, 2) to CheckAdd(x, x), for example. Those optimizations would break things because the
client's Stackmap generation callback was allowed to assume that the first entry in params.reps
was the first child. Also, there was a contract between B3 and its client that, for CheckAdd/Sub,
the client would undo the operation by performing the opposite operation, with params.reps[0] as
the source and params.reps[1] as the destination.

This not only prevented commutativity optimizations, it also led to bugs. Here are two bugs that
we had:

- Add(x, x) would not work. The client would be told to undo the add using %x as both the source
  and destination. The client would use a sub() instruction. The result would not be x - it would
  be zero.

- Mul where the result got coalesced with one of the sources. You can't undo a multiplication, so
  you need to keep the inputs alive until after the result is computed - i.e. the inputs are late
  uses. I initially thought I worked around this by using a three-operand form of Mul, but of
  course that doesn't actually fix the issue.

This patch fixes these issues comprehensively by introducing the following changes:

The params.reps corresponding to the first two children of CheckAdd/Sub/Mul and the first child of
Check are always garbage: if you want to know the values of those children in the slow path, pass
them as additional stackmap children. This makes it clear to the compiler which values you
actually care about, and it frees the compiler to reorder and commute the non-stackmap children.

The "undo" of an Add or Sub is handled internally by B3: the client doesn't have to know anything
about undoing. B3 does it. The contract is simply that if a B3::Value* is a stackmap child of a
CheckXYZ, then the corresponding entry in params.reps inside the stackmap generator callback will
contain the value of that Value*. For Add and Sub, B3 undoes the operation. For Add(x, x), the
undo uses the carry flag and some shift magic. For Mul, B3 uses LateUse:

A new kind of Arg::Role, called Arg::LateUse, is introduced. This kind of use means that the use
happens at the start of the execution of the next instruction. None of the built-in Air opcodes
use LateUse, but B3::CheckSpecial uses it to implement CheckMul.
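A toy backwards liveness walk (hypothetical names, not the Air implementation) shows what LateUse buys us: because a late use is accounted for at the start of the next instruction, the used tmps stay live across the instruction's own def, so the register allocator cannot give the result the same register as an input. That is exactly what CheckMul needs, since a multiply cannot be undone.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Inst {
    std::vector<std::string> earlyUses, lateUses, defs;
};

int main()
{
    // r = Mul(x, y) with x, y as late uses; then a use of r.
    std::vector<Inst> code = {
        { {}, { "x", "y" }, { "r" } },
        { { "r" }, {}, {} },
    };

    std::set<std::string> live; // live-out of the last instruction: empty
    std::vector<std::set<std::string>> liveAtDef(code.size());
    for (size_t i = code.size(); i--;) {
        // Late uses take effect at the start of the NEXT instruction, so
        // they join the live set before this instruction's defs are killed.
        live.insert(code[i].lateUses.begin(), code[i].lateUses.end());
        liveAtDef[i] = live; // interferes with this instruction's defs
        for (auto& d : code[i].defs)
            live.erase(d);
        live.insert(code[i].earlyUses.begin(), code[i].earlyUses.end());
    }

    // x and y interfere with r's def: the allocator keeps them apart.
    assert(liveAtDef[0].count("x") && liveAtDef[0].count("y"));
    return 0;
}
```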

Finally, this change fixes testComplex to actually create many live variables. This revealed a
really dumb pathology in allocateStack(), which this patch also fixes. Basically, it's a bad idea
to build interference graphs by repeatedly recreating the same cliques.

* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::test32):
(JSC::MacroAssemblerX86Common::setCarry):
(JSC::MacroAssemblerX86Common::invert):
* b3/B3CheckSpecial.cpp:
(JSC::B3::Air::numB3Args):
(JSC::B3::CheckSpecial::Key::Key):
(JSC::B3::CheckSpecial::Key::dump):
(JSC::B3::CheckSpecial::CheckSpecial):
(JSC::B3::CheckSpecial::forEachArg):
(JSC::B3::CheckSpecial::isValid):
(JSC::B3::CheckSpecial::generate):
(JSC::B3::CheckSpecial::dumpImpl):
(JSC::B3::CheckSpecial::deepDumpImpl):
* b3/B3CheckSpecial.h:
(JSC::B3::CheckSpecial::Key::Key):
(JSC::B3::CheckSpecial::Key::operator==):
(JSC::B3::CheckSpecial::Key::operator!=):
(JSC::B3::CheckSpecial::Key::opcode):
(JSC::B3::CheckSpecial::Key::numArgs):
(JSC::B3::CheckSpecial::Key::stackmapRole):
(JSC::B3::CheckSpecial::Key::hash):
* b3/B3CheckValue.cpp:
(JSC::B3::CheckValue::~CheckValue):
(JSC::B3::CheckValue::convertToAdd):
(JSC::B3::CheckValue::CheckValue):
* b3/B3CheckValue.h:
* b3/B3Const32Value.cpp:
(JSC::B3::Const32Value::checkMulConstant):
(JSC::B3::Const32Value::checkNegConstant):
(JSC::B3::Const32Value::divConstant):
* b3/B3Const32Value.h:
* b3/B3Const64Value.cpp:
(JSC::B3::Const64Value::checkMulConstant):
(JSC::B3::Const64Value::checkNegConstant):
(JSC::B3::Const64Value::divConstant):
* b3/B3Const64Value.h:
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::appendBinOp):
(JSC::B3::Air::LowerToAir::lower):
* b3/B3Opcode.h:
* b3/B3PatchpointSpecial.cpp:
(JSC::B3::PatchpointSpecial::~PatchpointSpecial):
(JSC::B3::PatchpointSpecial::forEachArg):
(JSC::B3::PatchpointSpecial::isValid):
* b3/B3ReduceStrength.cpp:
* b3/B3StackmapSpecial.cpp:
(JSC::B3::StackmapSpecial::forEachArgImpl):
* b3/B3StackmapSpecial.h:
* b3/B3StackmapValue.cpp:
(JSC::B3::StackmapValue::append):
(JSC::B3::StackmapValue::appendSomeRegister):
(JSC::B3::StackmapValue::setConstrainedChild):
* b3/B3StackmapValue.h:
* b3/B3Validate.cpp:
* b3/B3Value.cpp:
(JSC::B3::Value::checkMulConstant):
(JSC::B3::Value::checkNegConstant):
(JSC::B3::Value::divConstant):
* b3/B3Value.h:
* b3/air/AirAllocateStack.cpp:
(JSC::B3::Air::allocateStack):
* b3/air/AirArg.cpp:
(WTF::printInternal):
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::isAnyUse):
(JSC::B3::Air::Arg::isEarlyUse):
(JSC::B3::Air::Arg::isLateUse):
(JSC::B3::Air::Arg::isDef):
(JSC::B3::Air::Arg::forEachTmp):
(JSC::B3::Air::Arg::isUse): Deleted.
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::generate):
* b3/air/AirIteratedRegisterCoalescing.cpp:
(JSC::B3::Air::IteratedRegisterCoalescingAllocator::build):
(JSC::B3::Air::IteratedRegisterCoalescingAllocator::allocate):
(JSC::B3::Air::IteratedRegisterCoalescingAllocator::InterferenceEdge::hash):
(JSC::B3::Air::IteratedRegisterCoalescingAllocator::InterferenceEdge::dump):
(JSC::B3::Air::addSpillAndFillToProgram):
(JSC::B3::Air::iteratedRegisterCoalescingOnType):
(JSC::B3::Air::iteratedRegisterCoalescing):
* b3/air/AirLiveness.h:
(JSC::B3::Air::Liveness::Liveness):
(JSC::B3::Air::Liveness::LocalCalc::LocalCalc):
(JSC::B3::Air::Liveness::LocalCalc::live):
(JSC::B3::Air::Liveness::LocalCalc::takeLive):
(JSC::B3::Air::Liveness::LocalCalc::execute):
* b3/air/AirOpcode.opcodes:
* b3/air/AirReportUsedRegisters.cpp:
(JSC::B3::Air::reportUsedRegisters):
* b3/air/AirSpillEverything.cpp:
(JSC::B3::Air::spillEverything):
* b3/testb3.cpp:
(JSC::B3::testMulArg):
(JSC::B3::testMulArgStore):
(JSC::B3::testMulAddArg):
(JSC::B3::testMulArgs):
(JSC::B3::testComplex):
(JSC::B3::testSimpleCheck):
(JSC::B3::testCheckLessThan):
(JSC::B3::testCheckMegaCombo):
(JSC::B3::testCheckAddImm):
(JSC::B3::testCheckAddImmCommute):
(JSC::B3::testCheckAddImmSomeRegister):
(JSC::B3::testCheckAdd):
(JSC::B3::testCheckAdd64):
(JSC::B3::testCheckSubImm):
(JSC::B3::testCheckSubBadImm):
(JSC::B3::testCheckSub):
(JSC::B3::testCheckSub64):
(JSC::B3::testCheckNeg):
(JSC::B3::testCheckNeg64):
(JSC::B3::testCheckMul):
(JSC::B3::testCheckMulMemory):
(JSC::B3::testCheckMul2):
(JSC::B3::testCheckMul64):
(JSC::B3::run):

Source/WTF:

Disable my failed attempts at perfect forwarding, since they were incorrect and caused compile
errors if you tried to pass an argument that bound to an lvalue. This shouldn't affect the
performance of anything we care about, and performance tests seem to confirm that it's all good.

* wtf/ScopedLambda.h:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@192540 268f45cc-cd09-0410-ab3c-d52691b4dbfc

34 files changed:
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
Source/JavaScriptCore/b3/B3CheckSpecial.cpp
Source/JavaScriptCore/b3/B3CheckSpecial.h
Source/JavaScriptCore/b3/B3CheckValue.cpp
Source/JavaScriptCore/b3/B3CheckValue.h
Source/JavaScriptCore/b3/B3Const32Value.cpp
Source/JavaScriptCore/b3/B3Const32Value.h
Source/JavaScriptCore/b3/B3Const64Value.cpp
Source/JavaScriptCore/b3/B3Const64Value.h
Source/JavaScriptCore/b3/B3LowerToAir.cpp
Source/JavaScriptCore/b3/B3Opcode.h
Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp
Source/JavaScriptCore/b3/B3ReduceStrength.cpp
Source/JavaScriptCore/b3/B3StackmapSpecial.cpp
Source/JavaScriptCore/b3/B3StackmapSpecial.h
Source/JavaScriptCore/b3/B3StackmapValue.cpp
Source/JavaScriptCore/b3/B3StackmapValue.h
Source/JavaScriptCore/b3/B3Validate.cpp
Source/JavaScriptCore/b3/B3Value.cpp
Source/JavaScriptCore/b3/B3Value.h
Source/JavaScriptCore/b3/air/AirAllocateStack.cpp
Source/JavaScriptCore/b3/air/AirArg.cpp
Source/JavaScriptCore/b3/air/AirArg.h
Source/JavaScriptCore/b3/air/AirGenerate.cpp
Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp
Source/JavaScriptCore/b3/air/AirLiveness.h
Source/JavaScriptCore/b3/air/AirOpcode.opcodes
Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp
Source/JavaScriptCore/b3/air/AirSpillEverything.cpp
Source/JavaScriptCore/b3/testb3.cpp
Source/WTF/ChangeLog
Source/WTF/wtf/ScopedLambda.h
Tools/ReducedFTL/ComplexTest.cpp

index 74f85ce..bc8465b 100644 (file)
@@ -1611,6 +1611,11 @@ public:
         set32(x86Condition(cond), dest);
     }
 
+    void setCarry(RegisterID dest)
+    {
+        set32(X86Assembler::ConditionC, dest);
+    }
+
     // Invert a relational condition, e.g. == becomes !=, < becomes >=, etc.
     static RelationalCondition invert(RelationalCondition cond)
     {
index 25f6577..e5d02f6 100644 (file)
@@ -39,9 +39,9 @@ using namespace Air;
 
 namespace {
 
-unsigned numB3Args(Inst& inst)
+unsigned numB3Args(B3::Opcode opcode)
 {
-    switch (inst.origin->opcode()) {
+    switch (opcode) {
     case CheckAdd:
     case CheckSub:
     case CheckMul:
@@ -54,28 +54,40 @@ unsigned numB3Args(Inst& inst)
     }
 }
 
+unsigned numB3Args(Value* value)
+{
+    return numB3Args(value->opcode());
+}
+
+unsigned numB3Args(Inst& inst)
+{
+    return numB3Args(inst.origin);
+}
+
 } // anonymous namespace
 
 CheckSpecial::Key::Key(const Inst& inst)
 {
     m_opcode = inst.opcode;
     m_numArgs = inst.args.size();
+    m_stackmapRole = Arg::Use;
 }
 
 void CheckSpecial::Key::dump(PrintStream& out) const
 {
-    out.print(m_opcode, "(", m_numArgs, ")");
+    out.print(m_opcode, "(", m_numArgs, ",", m_stackmapRole, ")");
 }
 
-CheckSpecial::CheckSpecial(Air::Opcode opcode, unsigned numArgs)
+CheckSpecial::CheckSpecial(Air::Opcode opcode, unsigned numArgs, Arg::Role stackmapRole)
     : m_checkOpcode(opcode)
+    , m_stackmapRole(stackmapRole)
     , m_numCheckArgs(numArgs)
 {
     ASSERT(isTerminal(opcode));
 }
 
 CheckSpecial::CheckSpecial(const CheckSpecial::Key& key)
-    : CheckSpecial(key.opcode(), key.numArgs())
+    : CheckSpecial(key.opcode(), key.numArgs(), key.stackmapRole())
 {
 }
 
@@ -105,7 +117,7 @@ void CheckSpecial::forEachArg(Inst& inst, const ScopedLambda<Inst::EachArgCallba
     Inst hidden = hiddenBranch(inst);
     hidden.forEachArg(callback);
     commitHiddenBranch(inst, hidden);
-    forEachArgImpl(numB3Args(inst), m_numCheckArgs + 1, inst, callback);
+    forEachArgImpl(numB3Args(inst), m_numCheckArgs + 1, inst, m_stackmapRole, callback);
 }
 
 bool CheckSpecial::isValid(Inst& inst)
@@ -130,28 +142,71 @@ CCallHelpers::Jump CheckSpecial::generate(Inst& inst, CCallHelpers& jit, Generat
     ASSERT(value);
 
     Vector<ValueRep> reps;
-    if (isCheckMath(value->opcode())) {
-        if (value->opcode() == CheckMul) {
-            reps.append(repForArg(*context.code, inst.args[2]));
-            reps.append(repForArg(*context.code, inst.args[3]));
-        } else {
-            if (value->opcode() == CheckSub && value->child(0)->isInt(0))
-                reps.append(ValueRep::constant(0));
-            else
-                reps.append(repForArg(*context.code, inst.args[3]));
-            reps.append(repForArg(*context.code, inst.args[2]));
-        }
-    } else {
-        ASSERT(value->opcode() == Check);
-        reps.append(ValueRep::constant(1));
-    }
+    for (unsigned i = numB3Args(value); i--;)
+        reps.append(ValueRep());
 
     appendRepsImpl(context, m_numCheckArgs + 1, inst, reps);
-    
+
+    // Set aside the args that are relevant to undoing the operation. This is because we don't want to
+    // capture all of inst in the closure below.
+    Vector<Arg, 3> args;
+    for (unsigned i = 0; i < m_numCheckArgs; ++i)
+        args.append(inst.args[1 + i]);
+
     context.latePaths.append(
         createSharedTask<GenerationContext::LatePathFunction>(
-            [=] (CCallHelpers& jit, GenerationContext&) {
+            [=] (CCallHelpers& jit, GenerationContext& context) {
                 fail.link(&jit);
+
+                // If necessary, undo the operation.
+                switch (m_checkOpcode) {
+                case BranchAdd32:
+                    if (args[1] == args[2]) {
+                        // This is ugly, but that's fine - we won't have to do this very often.
+                        ASSERT(args[1].isGPR());
+                        GPRReg valueGPR = args[1].gpr();
+                        GPRReg scratchGPR = CCallHelpers::selectScratchGPR(valueGPR);
+                        jit.pushToSave(scratchGPR);
+                        jit.setCarry(scratchGPR);
+                        jit.lshift32(CCallHelpers::TrustedImm32(31), scratchGPR);
+                        jit.urshift32(CCallHelpers::TrustedImm32(1), valueGPR);
+                        jit.or32(scratchGPR, valueGPR);
+                        jit.popToRestore(scratchGPR);
+                        break;
+                    }
+                    Inst(Sub32, nullptr, args[1], args[2]).generate(jit, context);
+                    break;
+                case BranchAdd64:
+                    if (args[1] == args[2]) {
+                        // This is ugly, but that's fine - we won't have to do this very often.
+                        ASSERT(args[1].isGPR());
+                        GPRReg valueGPR = args[1].gpr();
+                        GPRReg scratchGPR = CCallHelpers::selectScratchGPR(valueGPR);
+                        jit.pushToSave(scratchGPR);
+                        jit.setCarry(scratchGPR);
+                        jit.lshift64(CCallHelpers::TrustedImm32(63), scratchGPR);
+                        jit.urshift64(CCallHelpers::TrustedImm32(1), valueGPR);
+                        jit.or64(scratchGPR, valueGPR);
+                        jit.popToRestore(scratchGPR);
+                        break;
+                    }
+                    Inst(Sub64, nullptr, args[1], args[2]).generate(jit, context);
+                    break;
+                case BranchSub32:
+                    Inst(Add32, nullptr, args[1], args[2]).generate(jit, context);
+                    break;
+                case BranchSub64:
+                    Inst(Add64, nullptr, args[1], args[2]).generate(jit, context);
+                    break;
+                case BranchNeg32:
+                    Inst(Neg32, nullptr, args[1]).generate(jit, context);
+                    break;
+                case BranchNeg64:
+                    Inst(Neg64, nullptr, args[1]).generate(jit, context);
+                    break;
+                default:
+                    break;
+                }
                 
                 StackmapGenerationParams params;
                 params.value = value;
@@ -166,7 +221,7 @@ CCallHelpers::Jump CheckSpecial::generate(Inst& inst, CCallHelpers& jit, Generat
 
 void CheckSpecial::dumpImpl(PrintStream& out) const
 {
-    out.print(m_checkOpcode, "(", m_numCheckArgs, ")");
+    out.print(m_checkOpcode, "(", m_numCheckArgs, ",", m_stackmapRole, ")");
 }
 
 void CheckSpecial::deepDumpImpl(PrintStream& out) const
index 5f305fd..2f9747e 100644 (file)
@@ -28,6 +28,7 @@
 
 #if ENABLE(B3_JIT)
 
+#include "AirArg.h"
 #include "AirOpcode.h"
 #include "B3StackmapSpecial.h"
 #include <wtf/HashMap.h>
@@ -56,12 +57,14 @@ public:
     public:
         Key()
             : m_opcode(Air::Nop)
+            , m_stackmapRole(Air::Arg::Use)
             , m_numArgs(0)
         {
         }
         
-        Key(Air::Opcode opcode, unsigned numArgs)
+        Key(Air::Opcode opcode, unsigned numArgs, Air::Arg::Role stackmapRole = Air::Arg::Use)
             : m_opcode(opcode)
+            , m_stackmapRole(stackmapRole)
             , m_numArgs(numArgs)
         {
         }
@@ -71,7 +74,8 @@ public:
         bool operator==(const Key& other) const
         {
             return m_opcode == other.m_opcode
-                && m_numArgs == other.m_numArgs;
+                && m_numArgs == other.m_numArgs
+                && m_stackmapRole == other.m_stackmapRole;
         }
 
         bool operator!=(const Key& other) const
@@ -83,11 +87,13 @@ public:
 
         Air::Opcode opcode() const { return m_opcode; }
         unsigned numArgs() const { return m_numArgs; }
+        Air::Arg::Role stackmapRole() const { return m_stackmapRole; }
 
         void dump(PrintStream& out) const;
 
         Key(WTF::HashTableDeletedValueType)
             : m_opcode(Air::Nop)
+            , m_stackmapRole(Air::Arg::Use)
             , m_numArgs(1)
         {
         }
@@ -100,15 +106,16 @@ public:
         unsigned hash() const
         {
             // Seriously, we don't need to be smart here. It just doesn't matter.
-            return m_opcode + m_numArgs;
+            return m_opcode + m_numArgs + m_stackmapRole;
         }
         
     private:
         Air::Opcode m_opcode;
+        Air::Arg::Role m_stackmapRole;
         unsigned m_numArgs;
     };
     
-    CheckSpecial(Air::Opcode, unsigned numArgs);
+    CheckSpecial(Air::Opcode, unsigned numArgs, Air::Arg::Role stackmapRole = Air::Arg::Use);
     CheckSpecial(const Key&);
     ~CheckSpecial();
 
@@ -133,6 +140,7 @@ protected:
 
 private:
     Air::Opcode m_checkOpcode;
+    Air::Arg::Role m_stackmapRole;
     unsigned m_numCheckArgs;
 };
 
index 09a9e7f..55b791a 100644 (file)
@@ -34,6 +34,12 @@ CheckValue::~CheckValue()
 {
 }
 
+void CheckValue::convertToAdd()
+{
+    RELEASE_ASSERT(opcode() == CheckAdd || opcode() == CheckSub || opcode() == CheckMul);
+    m_opcode = CheckAdd;
+}
+
 // Use this form for CheckAdd, CheckSub, and CheckMul.
 CheckValue::CheckValue(unsigned index, Opcode opcode, Origin origin, Value* left, Value* right)
     : StackmapValue(index, CheckedOpcode, opcode, left->type(), origin)
@@ -41,8 +47,8 @@ CheckValue::CheckValue(unsigned index, Opcode opcode, Origin origin, Value* left
     ASSERT(B3::isInt(type()));
     ASSERT(left->type() == right->type());
     ASSERT(opcode == CheckAdd || opcode == CheckSub || opcode == CheckMul);
-    append(ConstrainedValue(left, ValueRep::SomeRegister));
-    append(ConstrainedValue(right, ValueRep::SomeRegister));
+    append(ConstrainedValue(left, ValueRep::Any));
+    append(ConstrainedValue(right, ValueRep::Any));
 }
 
 // Use this form for Check.
@@ -50,7 +56,7 @@ CheckValue::CheckValue(unsigned index, Opcode opcode, Origin origin, Value* pred
     : StackmapValue(index, CheckedOpcode, opcode, Void, origin)
 {
     ASSERT(opcode == Check);
-    append(predicate);
+    append(ConstrainedValue(predicate, ValueRep::Any));
 }
 
 } } // namespace JSC::B3
index 781eed6..b02a56b 100644 (file)
@@ -49,6 +49,8 @@ public:
 
     ~CheckValue();
 
+    void convertToAdd();
+
 private:
     friend class Procedure;
 
index 195e2f1..286ba12 100644 (file)
@@ -98,6 +98,13 @@ Value* Const32Value::checkMulConstant(Procedure& proc, const Value* other) const
     return proc.add<Const32Value>(origin(), result.unsafeGet());
 }
 
+Value* Const32Value::checkNegConstant(Procedure& proc) const
+{
+    if (m_value == -m_value)
+        return nullptr;
+    return negConstant(proc);
+}
+
 Value* Const32Value::divConstant(Procedure& proc, const Value* other) const
 {
     if (!other->hasInt32())
index 2818b49..a1eee9f 100644 (file)
@@ -48,6 +48,7 @@ public:
     Value* checkAddConstant(Procedure&, const Value* other) const override;
     Value* checkSubConstant(Procedure&, const Value* other) const override;
     Value* checkMulConstant(Procedure&, const Value* other) const override;
+    Value* checkNegConstant(Procedure&) const override;
     Value* divConstant(Procedure&, const Value* other) const override;
     Value* bitAndConstant(Procedure&, const Value* other) const override;
     Value* bitOrConstant(Procedure&, const Value* other) const override;
index 406f6b5..663570e 100644 (file)
@@ -98,6 +98,13 @@ Value* Const64Value::checkMulConstant(Procedure& proc, const Value* other) const
     return proc.add<Const64Value>(origin(), result.unsafeGet());
 }
 
+Value* Const64Value::checkNegConstant(Procedure& proc) const
+{
+    if (m_value == -m_value)
+        return nullptr;
+    return negConstant(proc);
+}
+
 Value* Const64Value::divConstant(Procedure& proc, const Value* other) const
 {
     if (!other->hasInt64())
index 1bc21a9..b6e9342 100644 (file)
@@ -48,6 +48,7 @@ public:
     Value* checkAddConstant(Procedure&, const Value* other) const override;
     Value* checkSubConstant(Procedure&, const Value* other) const override;
     Value* checkMulConstant(Procedure&, const Value* other) const override;
+    Value* checkNegConstant(Procedure&) const override;
     Value* divConstant(Procedure&, const Value* other) const override;
     Value* bitAndConstant(Procedure&, const Value* other) const override;
     Value* bitOrConstant(Procedure&, const Value* other) const override;
index 055c122..6503e20 100644 (file)
@@ -598,6 +598,10 @@ private:
             return;
         }
 
+        // FIXME: If we're going to use a two-operand instruction, and the operand is commutative, we
+        // should coalesce the result with the operand that is killed.
+        // https://bugs.webkit.org/show_bug.cgi?id=151321
+        
         append(relaxedMoveForType(m_value->type()), tmp(left), result);
         append(opcode, tmp(right), result);
     }
@@ -1530,10 +1534,6 @@ private:
 
         case CheckAdd:
         case CheckSub: {
-            // FIXME: Make this support commutativity. That will let us leverage more instruction forms
-            // and it let us commute to maximize coalescing.
-            // https://bugs.webkit.org/show_bug.cgi?id=151214
-
             CheckValue* checkValue = m_value->as<CheckValue>();
 
             Value* left = checkValue->child(0);
@@ -1543,7 +1543,7 @@ private:
 
             // Handle checked negation.
             if (checkValue->opcode() == CheckSub && left->isInt(0)) {
-                append(relaxedMoveForType(checkValue->type()), tmp(right), result);
+                append(Move, tmp(right), result);
 
                 Air::Opcode opcode =
                     opcodeForType(BranchNeg32, BranchNeg64, Air::Oops, checkValue->type());
@@ -1559,39 +1559,67 @@ private:
                 return;
             }
 
-            append(relaxedMoveForType(m_value->type()), tmp(left), result);
+            // FIXME: Use commutativity of CheckAdd to increase the likelihood of coalescing.
+            // https://bugs.webkit.org/show_bug.cgi?id=151321
+
+            append(Move, tmp(left), result);
             
             Air::Opcode opcode = Air::Oops;
-            CheckSpecial* special = nullptr;
             switch (m_value->opcode()) {
             case CheckAdd:
                 opcode = opcodeForType(BranchAdd32, BranchAdd64, Air::Oops, m_value->type());
-                special = ensureCheckSpecial(opcode, 3);
                 break;
             case CheckSub:
                 opcode = opcodeForType(BranchSub32, BranchSub64, Air::Oops, m_value->type());
-                special = ensureCheckSpecial(opcode, 3);
                 break;
             default:
                 RELEASE_ASSERT_NOT_REACHED();
                 break;
             }
 
-            Inst inst(Patch, checkValue, Arg::special(special));
-
-            inst.args.append(Arg::resCond(MacroAssembler::Overflow));
-
             // FIXME: It would be great to fuse Loads into these. We currently don't do it because the
             // rule for stackmaps is that all addresses are just stack addresses. Maybe we could relax
             // this rule here.
             // https://bugs.webkit.org/show_bug.cgi?id=151228
-            
+
+            Arg source;
             if (imm(right) && isValidForm(opcode, Arg::ResCond, Arg::Imm, Arg::Tmp))
-                inst.args.append(imm(right));
+                source = imm(right);
             else
-                inst.args.append(tmp(right));
-            inst.args.append(result);
+                source = tmp(right);
+
+            // There is a really hilarious case that arises when we do BranchAdd32(%x, %x). We won't emit
+            // such code, but the coalescing in our register allocator also does copy propagation, so
+            // although we emit:
+            //
+            //     Move %tmp1, %tmp2
+            //     BranchAdd32 %tmp1, %tmp2
+            //
+            // The register allocator may turn this into:
+            //
+            //     BranchAdd32 %rax, %rax
+            //
+            // Currently we handle this by ensuring that even this kind of addition can be undone. We can
+            // undo it by using the carry flag. It's tempting to get rid of that code and just "fix" this
+            // here by forcing LateUse on the stackmap. If we did that unconditionally, we'd lose a lot of
+            // performance. So it's tempting to do it only if left == right. But that creates an awkward
+            // constraint on Air: it means that Air would not be allowed to do any copy propagation.
+            // Notice that the %rax,%rax situation happened after Air copy-propagated the Move we are
+            // emitting. We know that copy-propagating over that Move causes add-to-self. But we might
+            // emit something like a Move - or even do other kinds of copy-propagation on tmp's -
+            // somewhere else in this code. The add-to-self situation may only emerge after some other Air
+            // optimizations remove other Move's or identity-like operations. That's why we don't use
+            // LateUse here to take care of add-to-self.
+            
+            CheckSpecial* special = ensureCheckSpecial(opcode, 3);
             
+            Inst inst(Patch, checkValue, Arg::special(special));
+
+            inst.args.append(Arg::resCond(MacroAssembler::Overflow));
+
+            inst.args.append(source);
+            inst.args.append(result);
+
             fillStackmap(inst, checkValue, 2);
 
             m_insts.last().append(WTF::move(inst));
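The comment above says the client can undo even an add-to-self by using the carry flag. A standalone sketch of both recoveries (plain C++, not WebKit code; the function names are illustrative) shows why a plain subtraction works in the distinct-register case but would yield zero for add-to-self:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch, not WebKit code: how a slow path can recover the
// original operand of a checked 32-bit add.

// Distinct source and destination: a subtraction undoes the add.
uint32_t undoAdd(uint32_t dest, uint32_t src)
{
    return dest - src; // dest held a + b; subtracting b restores a.
}

// Add-to-self ("BranchAdd32 %x, %x"): dest holds 2a mod 2^32 and the
// carry flag holds bit 32, so shifting right through the carry restores
// the bit pattern of a. Subtracting dest from itself would yield zero.
uint32_t undoAddToSelf(uint32_t dest, bool carry)
{
    return (static_cast<uint32_t>(carry) << 31) | (dest >> 1);
}
```

Because doubling just shifts the operand's bit pattern left through the carry, this recovery works regardless of whether the operand is interpreted as signed or unsigned.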
@@ -1599,10 +1627,6 @@ private:
         }
 
         case CheckMul: {
-            // Handle multiplication separately. Multiplication is hard because we have to preserve
-            // both inputs. This requires using three-operand multiplication, even on platforms where
-            // this requires an additional Move.
-
             CheckValue* checkValue = m_value->as<CheckValue>();
 
             Value* left = checkValue->child(0);
@@ -1612,14 +1636,15 @@ private:
 
             Air::Opcode opcode =
                 opcodeForType(BranchMul32, BranchMul64, Air::Oops, checkValue->type());
-            CheckSpecial* special = ensureCheckSpecial(opcode, 4);
+            CheckSpecial* special = ensureCheckSpecial(opcode, 3, Arg::LateUse);
 
             // FIXME: Handle immediates.
             // https://bugs.webkit.org/show_bug.cgi?id=151230
 
+            append(Move, tmp(left), result);
+
             Inst inst(Patch, checkValue, Arg::special(special));
             inst.args.append(Arg::resCond(MacroAssembler::Overflow));
-            inst.args.append(tmp(left));
             inst.args.append(tmp(right));
             inst.args.append(result);
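The LateUse constraint above exists because a multiply, unlike an add, cannot be undone: the wrapped product does not determine its operands, so the slow path must still be able to see the original inputs after the result is computed. A minimal illustration (plain C++, not the WebKit code):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: the truncated 32-bit product that BranchMul32
// leaves behind on overflow. Two different input pairs can produce the
// same truncated result, so no computation on the result can recover
// the operands - hence the LateUse constraint keeping them alive.
int32_t truncatedProduct(int32_t a, int32_t b)
{
    return static_cast<int32_t>(
        static_cast<uint32_t>(a) * static_cast<uint32_t>(b));
}
```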
 
index 4ab2e46..0dbbfe3 100644 (file)
@@ -163,6 +163,10 @@ enum Opcode : int16_t {
     // callbacks. Yet, this transformation is valid even if they are different because we know that
     // after the first CheckAdd executes, the second CheckAdd could not possibly have taken the slow
     // path. Therefore, the second CheckAdd's callback is irrelevant.
+    //
+    // Note that the first two children of these operations have ValueRep's, both as input constraints and
+    // in the reps provided to the generator. The output constraints could be anything, and should not be
+    // inspected for meaning. If you want to capture the values of the inputs, use stackmap arguments.
     CheckAdd,
     CheckSub,
     CheckMul,
@@ -170,7 +174,8 @@ enum Opcode : int16_t {
     // Check that side-exits. Use the CheckValue class. Like CheckAdd and friends, this has a
     // stackmap with a generation callback. This takes an int argument that this branches on, with
     // full branch fusion in the instruction selector. A true value jumps to the generator's slow
-    // path.
+    // path. Note that the predicate child has both an input and output ValueRep. The input constraint
+    // must be Any, and the output could be anything.
     Check,
 
     // SSA support, in the style of DFG SSA.
index 97108ea..f779bb7 100644 (file)
@@ -43,16 +43,18 @@ PatchpointSpecial::~PatchpointSpecial()
 {
 }
 
-void PatchpointSpecial::forEachArg(
-    Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
+void PatchpointSpecial::forEachArg(Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
 {
+    // FIXME: Allow B3 Patchpoints to specify LateUse.
+    // https://bugs.webkit.org/show_bug.cgi?id=151335
+    
     if (inst.origin->type() == Void) {
-        forEachArgImpl(0, 1, inst, callback);
+        forEachArgImpl(0, 1, inst, Arg::Use, callback);
         return;
     }
 
     callback(inst.args[1], Arg::Def, inst.origin->airType());
-    forEachArgImpl(0, 2, inst, callback);
+    forEachArgImpl(0, 2, inst, Arg::Use, callback);
 }
 
 bool PatchpointSpecial::isValid(Inst& inst)
index bf8758b..a441048 100644 (file)
@@ -633,17 +633,10 @@ private:
             break;
 
         case CheckAdd:
-            // FIXME: Handle commutativity.
-            // https://bugs.webkit.org/show_bug.cgi?id=151214
-            
             if (replaceWithNewValue(m_value->child(0)->checkAddConstant(m_proc, m_value->child(1))))
                 break;
-            
-            if (m_value->child(0)->isInt(0)) {
-                m_value->replaceWithIdentity(m_value->child(1));
-                m_changed = true;
-                break;
-            }
+
+            handleCommutativity();
             
             if (m_value->child(1)->isInt(0)) {
                 m_value->replaceWithIdentity(m_value->child(0));
@@ -661,30 +654,43 @@ private:
                 m_changed = true;
                 break;
             }
+
+            if (Value* negatedConstant = m_value->child(1)->checkNegConstant(m_proc)) {
+                m_insertionSet.insertValue(m_index, negatedConstant);
+                m_value->as<CheckValue>()->convertToAdd();
+                m_value->child(1) = negatedConstant;
+                m_changed = true;
+                break;
+            }
             break;
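The new checkNegConstant() fold above turns CheckSub(x, c) into CheckAdd(x, -c), and returning nullptr is what lets it bail when negation is unsound. A sketch of the guard and why it matters (illustrative C++ using GCC/Clang overflow builtins; `negConstant` and the predicate names are not the B3 API):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative guard in the spirit of checkNegConstant(): refuse to
// negate a constant whose negation is unrepresentable.
std::optional<int32_t> negConstant(int32_t c)
{
    if (c == INT32_MIN)
        return std::nullopt; // -INT32_MIN overflows int32_t.
    return -c;
}

bool checkedAddOverflows(int32_t a, int32_t b)
{
    int32_t r;
    return __builtin_add_overflow(a, b, &r);
}

bool checkedSubOverflows(int32_t a, int32_t b)
{
    int32_t r;
    return __builtin_sub_overflow(a, b, &r);
}
```

For any c other than INT32_MIN, CheckSub(x, c) and CheckAdd(x, -c) trap on exactly the same inputs; for c == INT32_MIN they differ (x = 0 overflows the sub but not the add), which is why the fold must refuse that constant.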
 
         case CheckMul:
-            // FIXME: Handle commutativity.
-            // https://bugs.webkit.org/show_bug.cgi?id=151214
-            
             if (replaceWithNewValue(m_value->child(0)->checkMulConstant(m_proc, m_value->child(1))))
                 break;
 
-            if (m_value->child(0)->isInt(1)) {
-                m_value->replaceWithIdentity(m_value->child(1));
-                m_changed = true;
-                break;
-            }
-            
-            if (m_value->child(1)->isInt(1)) {
-                m_value->replaceWithIdentity(m_value->child(0));
-                m_changed = true;
-                break;
-            }
+            handleCommutativity();
 
-            if (m_value->child(0)->isInt(0) || m_value->child(1)->isInt(0)) {
-                replaceWithNewValue(m_proc.addIntConstant(m_value, 0));
-                break;
+            if (m_value->child(1)->hasInt()) {
+                bool modified = true;
+                switch (m_value->child(1)->asInt()) {
+                case 0:
+                    replaceWithNewValue(m_proc.addIntConstant(m_value, 0));
+                    break;
+                case 1:
+                    m_value->replaceWithIdentity(m_value->child(0));
+                    m_changed = true;
+                    break;
+                case 2:
+                    m_value->as<CheckValue>()->convertToAdd();
+                    m_value->child(1) = m_value->child(0);
+                    m_changed = true;
+                    break;
+                default:
+                    modified = false;
+                    break;
+                }
+                if (modified)
+                    break;
             }
             break;
 
@@ -761,6 +767,10 @@ private:
     // If we decide that value2 coming first is the canonical ordering.
     void handleCommutativity()
     {
+        // Note that we have commutative operations that take more than two children. Those operations may
+        // commute their first two children while leaving the rest unaffected.
+        ASSERT(m_value->numChildren() >= 2);
+        
         // Leave it alone if the right child is a constant.
         if (m_value->child(1)->isConstant())
             return;
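handleCommutativity() canonicalizes the first two children of a commutative operation; one payoff in this patch is that constants end up as the right child, where the instruction selector's imm() path can find them. A toy version of the ordering rule (illustrative only; the real reducer ranks non-constant children too):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy sketch: put the constant (if any) on the right, so later code
// only has to look for immediates in child(1).
void canonicalizeCommutative(std::string& left, std::string& right,
    bool leftIsConstant, bool rightIsConstant)
{
    if (rightIsConstant)
        return; // Already canonical; also avoids swapping two constants.
    if (leftIsConstant)
        std::swap(left, right);
}
```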
index 9d641a4..598ca12 100644 (file)
@@ -65,7 +65,7 @@ const RegisterSet& StackmapSpecial::extraClobberedRegs(Inst& inst)
 
 void StackmapSpecial::forEachArgImpl(
     unsigned numIgnoredB3Args, unsigned numIgnoredAirArgs,
-    Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
+    Inst& inst, Arg::Role role, const ScopedLambda<Inst::EachArgCallback>& callback)
 {
     Value* value = inst.origin;
 
@@ -78,7 +78,7 @@ void StackmapSpecial::forEachArgImpl(
         Arg& arg = inst.args[i + numIgnoredAirArgs];
         Value* child = value->child(i + numIgnoredB3Args);
 
-        callback(arg, Arg::Use, Arg::typeForB3Type(child->type()));
+        callback(arg, role, Arg::typeForB3Type(child->type()));
     }
 }
 
index c09d999..93f0e6b 100644 (file)
@@ -28,6 +28,7 @@
 
 #if ENABLE(B3_JIT)
 
+#include "AirArg.h"
 #include "AirSpecial.h"
 #include "B3ValueRep.h"
 
@@ -53,7 +54,7 @@ protected:
 
     void forEachArgImpl(
         unsigned numIgnoredB3Args, unsigned numIgnoredAirArgs,
-        Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&);
+        Air::Inst&, Air::Arg::Role role, const ScopedLambda<Air::Inst::EachArgCallback>&);
     bool isValidImpl(
         unsigned numIgnoredB3Args, unsigned numIgnoredAirArgs,
         Air::Inst&);
index 7372110..1a3658f 100644 (file)
@@ -48,6 +48,11 @@ void StackmapValue::append(const ConstrainedValue& constrainedValue)
     m_reps.append(constrainedValue.rep());
 }
 
+void StackmapValue::appendSomeRegister(Value* value)
+{
+    append(ConstrainedValue(value, ValueRep::SomeRegister));
+}
+
 void StackmapValue::setConstrainedChild(unsigned index, const ConstrainedValue& constrainedValue)
 {
     child(index) = constrainedValue.value();
index aaa9030..baf71a3 100644 (file)
@@ -77,6 +77,10 @@ public:
     // children().append(). That will work fine, but it's not recommended.
     void append(const ConstrainedValue&);
 
+    // This is a helper for something you might do a lot of: append a value that should be constrained
+    // to SomeRegister.
+    void appendSomeRegister(Value*);
+
     const Vector<ValueRep>& reps() const { return m_reps; }
 
     void clobber(const RegisterSet& set)
index b70ea02..79c9968 100644 (file)
@@ -278,6 +278,8 @@ public:
                 VALIDATE(value->numChildren() >= 2, ("At ", *value));
                 VALIDATE(isInt(value->child(0)->type()), ("At ", *value));
                 VALIDATE(isInt(value->child(1)->type()), ("At ", *value));
+                VALIDATE(value->as<StackmapValue>()->constrainedChild(0).rep() == ValueRep::Any, ("At ", *value));
+                VALIDATE(value->as<StackmapValue>()->constrainedChild(1).rep() == ValueRep::Any, ("At ", *value));
                 validateStackmap(value);
                 break;
             case Check:
index 5485813..f15e82f 100644 (file)
@@ -154,6 +154,11 @@ Value* Value::checkMulConstant(Procedure&, const Value*) const
     return nullptr;
 }
 
+Value* Value::checkNegConstant(Procedure&) const
+{
+    return nullptr;
+}
+
 Value* Value::divConstant(Procedure&, const Value*) const
 {
     return nullptr;
index 2dd34e7..aa99f5f 100644 (file)
@@ -41,6 +41,7 @@
 namespace JSC { namespace B3 {
 
 class BasicBlock;
+class CheckValue;
 class Procedure;
 
 class JS_EXPORT_PRIVATE Value {
@@ -118,6 +119,7 @@ public:
     virtual Value* checkAddConstant(Procedure&, const Value* other) const;
     virtual Value* checkSubConstant(Procedure&, const Value* other) const;
     virtual Value* checkMulConstant(Procedure&, const Value* other) const;
+    virtual Value* checkNegConstant(Procedure&) const;
     virtual Value* divConstant(Procedure&, const Value* other) const; // This chooses ChillDiv semantics for integers.
     virtual Value* bitAndConstant(Procedure&, const Value* other) const;
     virtual Value* bitOrConstant(Procedure&, const Value* other) const;
@@ -272,6 +274,8 @@ protected:
     }
 
 private:
+    friend class CheckValue; // CheckValue::convertToAdd() modifies m_opcode.
+    
     static Type typeFor(Opcode, Value* firstChild);
 
     // This group of fields is arranged to fit in 64 bits.
index d7ce132..205b585 100644 (file)
@@ -146,38 +146,22 @@ void allocateStack(Code& code)
         auto interfere = [&] (Inst& inst) {
             if (verbose)
                 dataLog("Interfering: ", pointerListDump(localCalc.live()), "\n");
-            
-            // Form a clique of stack slots that interfere. First find the list of stack slots
-            // that are live right now.
-            slots.resize(0);
-            for (StackSlot* slot : localCalc.live()) {
-                if (slot->kind() == StackSlotKind::Anonymous)
-                    slots.append(slot);
-            }
 
-            // We mustn't mandate that the input code is optimal. Therefore, it may have dead stores
-            // to the stack. We need to treat these as interfering.
             inst.forEachArg(
                 [&] (Arg& arg, Arg::Role role, Arg::Type) {
-                    if (Arg::isDef(role) && arg.isStack()) {
-                        StackSlot* slot = arg.stackSlot();
-                        if (slot->kind() == StackSlotKind::Anonymous
-                            && !localCalc.live().contains(slot))
-                            slots.append(slot);
+                    if (!Arg::isDef(role))
+                        return;
+                    if (!arg.isStack())
+                        return;
+                    StackSlot* slot = arg.stackSlot();
+                    if (slot->kind() != StackSlotKind::Anonymous)
+                        return;
+
+                    for (StackSlot* otherSlot : localCalc.live()) {
+                        interference[slot].add(otherSlot);
+                        interference[otherSlot].add(slot);
                     }
                 });
-            
-            if (verbose)
-                dataLog("    Slots: ", pointerListDump(slots), "\n");
-            for (unsigned i = 0; i < slots.size(); ++i) {
-                StackSlot* outer = slots[i];
-                for (unsigned j = i + 1; j < slots.size(); ++j) {
-                    StackSlot* inner = slots[j];
-
-                    interference[inner].add(outer);
-                    interference[outer].add(inner);
-                }
-            }
         };
 
         for (unsigned instIndex = block->size(); instIndex--;) {
@@ -185,7 +169,7 @@ void allocateStack(Code& code)
                 dataLog("Analyzing: ", block->at(instIndex), "\n");
             Inst& inst = block->at(instIndex);
             interfere(inst);
-            localCalc.execute(inst);
+            localCalc.execute(instIndex);
         }
         Inst nop;
         interfere(nop);
index 4b93ccd..d5c1ab2 100644 (file)
@@ -181,6 +181,9 @@ void printInternal(PrintStream& out, Arg::Role role)
     case Arg::UseAddr:
         out.print("UseAddr");
         return;
+    case Arg::LateUse:
+        out.print("LateUse");
+        return;
     }
 
     RELEASE_ASSERT_NOT_REACHED();
index 57ad5c0..15a8f0e 100644 (file)
@@ -83,6 +83,10 @@ public:
         // always escaping, and Stack is presumed to be always escaping if it's Locked.
         Use,
 
+        // LateUse means that the Inst will read from this value after doing its Def's. Note that LateUse
+        // on an Addr or Index still means Use on the internal temporaries.
+        LateUse,
+
         // Def means that the Inst will write to this value after doing everything else.
         //
         // For Tmp: The Inst will write to this Tmp.
@@ -123,12 +127,13 @@ public:
     };
 
     // Returns true if the Role implies that the Inst will Use the Arg. It's deliberately false for
-    // UseAddr, since isUse() for an Arg::addr means that we are loading from the address.
-    static bool isUse(Role role)
+    // UseAddr, since isAnyUse() for an Arg::addr means that we are loading from the address.
+    static bool isAnyUse(Role role)
     {
         switch (role) {
         case Use:
         case UseDef:
+        case LateUse:
             return true;
         case Def:
         case UseAddr:
@@ -136,12 +141,33 @@ public:
         }
     }
 
+    // Returns true if the Role implies that the Inst will Use the Arg before doing anything else.
+    static bool isEarlyUse(Role role)
+    {
+        switch (role) {
+        case Use:
+        case UseDef:
+            return true;
+        case Def:
+        case UseAddr:
+        case LateUse:
+            return false;
+        }
+    }
+
+    // Returns true if the Role implies that the Inst will Use the Arg after doing everything else.
+    static bool isLateUse(Role role)
+    {
+        return role == LateUse;
+    }
+
     // Returns true if the Role implies that the Inst will Def the Arg.
     static bool isDef(Role role)
     {
         switch (role) {
         case Use:
         case UseAddr:
+        case LateUse:
             return false;
         case Def:
         case UseDef:
@@ -666,7 +692,7 @@ public:
     {
         switch (m_kind) {
         case Tmp:
-            ASSERT(isUse(argRole) || isDef(argRole));
+            ASSERT(isAnyUse(argRole) || isDef(argRole));
             functor(m_base, argRole, argType);
             break;
         case Addr:
index bd655e8..2dad43b 100644 (file)
@@ -72,7 +72,10 @@ void generate(Code& code, CCallHelpers& jit)
     // After this phase, every Tmp has a reg.
     //
     // For debugging, you can use spillEverything() to put everything to the stack between each Inst.
-    iteratedRegisterCoalescing(code);
+    if (false)
+        spillEverything(code);
+    else
+        iteratedRegisterCoalescing(code);
 
     // Prior to this point the prologue and epilogue is implicit. This makes it explicit. It also
     // does things like identify which callee-saves we're using and saves them.
index 3ca2961..e99571c 100644 (file)
 #include "AirLiveness.h"
 #include "AirPhaseScope.h"
 #include "AirRegisterPriority.h"
+#include <wtf/ListDump.h>
 #include <wtf/ListHashSet.h>
 
 namespace JSC { namespace B3 { namespace Air {
 
 static bool debug = false;
 static bool traceDebug = false;
+static bool reportStats = false;
 
 template<Arg::Type type>
 struct MoveInstHelper;
@@ -168,7 +170,7 @@ public:
                 if (Arg::isDef(role))
                     defTmp = argTmp;
                 else {
-                    ASSERT(Arg::isUse(role));
+                    ASSERT(Arg::isEarlyUse(role));
                     useTmp = argTmp;
                 }
             });
@@ -204,6 +206,7 @@ public:
         makeWorkList();
 
         if (debug) {
+            dataLog("Interference: ", listDump(m_interferenceEdges), "\n");
             dumpInterferenceGraphInDot(WTF::dataFile());
             dataLog("Initial work list\n");
             dumpWorkLists(WTF::dataFile());
@@ -742,6 +745,11 @@ private:
             return WTF::IntHash<uint64_t>::hash(m_value);
         }
 
+        void dump(PrintStream& out) const
+        {
+            out.print(first(), "<=>", second());
+        }
+
     private:
         uint64_t m_value { 0 };
     };
@@ -951,7 +959,7 @@ static void addSpillAndFillToProgram(Code& code, const HashSet<Tmp>& spilledTmp,
                 Arg arg = Arg::stack(stackSlotEntry->value);
                 Opcode move = type == Arg::GP ? Move : MoveDouble;
 
-                if (Arg::isUse(role)) {
+                if (Arg::isAnyUse(role)) {
                     Tmp newTmp = code.newTmp(type);
                     insertionSet.insert(instIndex, move, inst.origin, arg, newTmp);
                     tmp = newTmp;
@@ -968,9 +976,10 @@ static void addSpillAndFillToProgram(Code& code, const HashSet<Tmp>& spilledTmp,
 }
 
 template<Arg::Type type>
-static void iteratedRegisterCoalescingOnType(Code& code, HashSet<Tmp>& unspillableTmps)
+static void iteratedRegisterCoalescingOnType(Code& code, HashSet<Tmp>& unspillableTmps, unsigned& numIterations)
 {
     while (true) {
+        numIterations++;
         IteratedRegisterCoalescingAllocator<type> allocator(code, unspillableTmps);
         Liveness<Tmp> liveness(code);
         for (BasicBlock* block : code) {
@@ -978,7 +987,7 @@ static void iteratedRegisterCoalescingOnType(Code& code, HashSet<Tmp>& unspillab
             for (unsigned instIndex = block->size(); instIndex--;) {
                 Inst& inst = block->at(instIndex);
                 allocator.build(inst, localCalc);
-                localCalc.execute(inst);
+                localCalc.execute(instIndex);
             }
         }
 
@@ -997,12 +1006,14 @@ void iteratedRegisterCoalescing(Code& code)
 
     bool gpIsColored = false;
     bool fpIsColored = false;
+    unsigned numIterations = 0;
 
     HashSet<Tmp> unspillableGPs;
     HashSet<Tmp> unspillableFPs;
 
     // First we run both allocator together as long as they both spill.
     while (!gpIsColored && !fpIsColored) {
+        numIterations++;
         IteratedRegisterCoalescingAllocator<Arg::GP> gpAllocator(code, unspillableGPs);
         IteratedRegisterCoalescingAllocator<Arg::FP> fpAllocator(code, unspillableFPs);
 
@@ -1017,7 +1028,7 @@ void iteratedRegisterCoalescing(Code& code)
                 gpAllocator.build(inst, localCalc);
                 fpAllocator.build(inst, localCalc);
 
-                localCalc.execute(inst);
+                localCalc.execute(instIndex);
             }
         }
 
@@ -1038,9 +1049,12 @@ void iteratedRegisterCoalescing(Code& code)
     };
 
     if (!gpIsColored)
-        iteratedRegisterCoalescingOnType<Arg::GP>(code, unspillableGPs);
+        iteratedRegisterCoalescingOnType<Arg::GP>(code, unspillableGPs, numIterations);
     if (!fpIsColored)
-        iteratedRegisterCoalescingOnType<Arg::FP>(code, unspillableFPs);
+        iteratedRegisterCoalescingOnType<Arg::FP>(code, unspillableFPs, numIterations);
+
+    if (reportStats)
+        dataLog("Num iterations = ", numIterations, "\n");
 }
 
 } } } // namespace JSC::B3::Air
index f8dc29c..a40c188 100644 (file)
@@ -49,6 +49,16 @@ public:
         m_liveAtHead.resize(code.size());
         m_liveAtTail.resize(code.size());
 
+        // The liveAtTail of each block automatically contains the LateUse's of the terminal.
+        for (BasicBlock* block : code) {
+            HashSet<Thing>& live = m_liveAtTail[block];
+            block->last().forEach<Thing>(
+                [&] (Thing& thing, Arg::Role role, Arg::Type) {
+                    if (Arg::isLateUse(role))
+                        live.add(thing);
+                });
+        }
+
         IndexSet<BasicBlock> seen;
 
         bool changed = true;
@@ -61,7 +71,7 @@ public:
                     continue;
                 LocalCalc localCalc(*this, block);
                 for (size_t instIndex = block->size(); instIndex--;)
-                    localCalc.execute(block->at(instIndex));
+                    localCalc.execute(instIndex);
                 bool firstTime = seen.add(block);
                 if (!firstTime && localCalc.live() == m_liveAtHead[block])
                     continue;
@@ -90,14 +100,17 @@ public:
     public:
         LocalCalc(Liveness& liveness, BasicBlock* block)
             : m_live(liveness.liveAtTail(block))
+            , m_block(block)
         {
         }
 
         const HashSet<Thing>& live() const { return m_live; }
         HashSet<Thing>&& takeLive() { return WTF::move(m_live); }
 
-        void execute(Inst& inst)
+        void execute(unsigned instIndex)
         {
+            Inst& inst = m_block->at(instIndex);
+            
             // First handle def's.
             inst.forEach<Thing>(
                 [this] (Thing& arg, Arg::Role role, Arg::Type) {
@@ -107,20 +120,34 @@ public:
                         m_live.remove(arg);
                 });
 
-            // Finally handle use's.
+            // Then handle use's.
             inst.forEach<Thing>(
                 [this] (Thing& arg, Arg::Role role, Arg::Type) {
                     if (!isAlive(arg))
                         return;
-                    if (Arg::isUse(role))
+                    if (Arg::isEarlyUse(role))
                         m_live.add(arg);
                 });
+
+            // And finally, handle the late use's of the previous instruction.
+            if (instIndex) {
+                Inst& prevInst = m_block->at(instIndex - 1);
+
+                prevInst.forEach<Thing>(
+                    [this] (Thing& arg, Arg::Role role, Arg::Type) {
+                        if (!Arg::isLateUse(role))
+                            return;
+                        if (isAlive(arg))
+                            m_live.add(arg);
+                    });
+            }
         }
 
     private:
         HashSet<Thing> m_live;
+        BasicBlock* m_block;
     };
-    
+
 private:
     IndexMap<BasicBlock, HashSet<Thing>> m_liveAtHead;
     IndexMap<BasicBlock, HashSet<Thing>> m_liveAtTail;
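The execute() logic above can be summarized outside of Air: walking a block backward, defs kill, early uses gen, and the late uses of instruction i - 1 are added while visiting instruction i, since they are read after i - 1's defs. A self-contained sketch (illustrative C++, not the templated Air code; it assumes the terminal's late uses were already folded into live-at-tail, as the loop at the top of this patch does, and it omits the isAlive() filtering):

```cpp
#include <assert.h>
#include <set>
#include <string>
#include <vector>

// Illustrative model of one backward LocalCalc pass over a block.
struct SketchInst {
    std::set<std::string> earlyUses, lateUses, defs;
};

std::set<std::string> liveAtHead(
    const std::vector<SketchInst>& block, std::set<std::string> live)
{
    // 'live' starts as live-at-tail, which already contains the
    // terminal's late uses.
    for (size_t i = block.size(); i--;) {
        for (const std::string& d : block[i].defs)
            live.erase(d);
        for (const std::string& u : block[i].earlyUses)
            live.insert(u);
        // Late uses of the previous instruction become live at this
        // point, because they are read after that instruction's defs.
        if (i) {
            for (const std::string& u : block[i - 1].lateUses)
                live.insert(u);
        }
    }
    return live;
}
```

In this model a tmp that is late-used by instruction i stays live across instruction i's defs, which is exactly the interference that forces the register allocator to keep BranchMul's inputs out of the result register.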
index 526caa7..cc98382 100644 (file)
@@ -331,15 +331,9 @@ BranchMul32 U:G, U:G, UD:G /branch
     ResCond, Tmp, Tmp
     ResCond, Addr, Tmp
 
-BranchMul32 U:G, U:G, U:G, D:G /branch
-    ResCond, Tmp, Tmp, Tmp
-
 BranchMul64 U:G, U:G, UD:G /branch
     ResCond, Tmp, Tmp
 
-BranchMul64 U:G, U:G, U:G, D:G /branch
-    ResCond, Tmp, Tmp, Tmp
-
 BranchSub32 U:G, U:G, UD:G /branch
     ResCond, Tmp, Tmp
     ResCond, Imm, Tmp
index 12b2033..d6fdf7d 100644 (file)
@@ -56,7 +56,7 @@ void reportUsedRegisters(Code& code)
                 }
                 inst.reportUsedRegisters(registerSet);
             }
-            localCalc.execute(inst);
+            localCalc.execute(instIndex);
         }
     }
 }
index 9737a66..3dbf00e 100644 (file)
@@ -69,7 +69,7 @@ void spillEverything(Code& code)
         for (unsigned instIndex = block->size(); instIndex--;) {
             Inst& inst = block->at(instIndex);
             setUsedRegisters(instIndex + 1, inst);
-            localCalc.execute(inst);
+            localCalc.execute(instIndex);
         }
 
         Inst nop;
@@ -140,6 +140,7 @@ void spillEverything(Code& code)
                         }
                         break;
                     case Arg::UseDef:
+                    case Arg::LateUse:
                         for (Reg reg : regsInPriorityOrder(type)) {
                             if (!setBefore.get(reg) && !setAfter.get(reg)) {
                                 setAfter.set(reg);
@@ -160,7 +161,7 @@ void spillEverything(Code& code)
 
                     Opcode move = type == Arg::GP ? Move : MoveDouble;
 
-                    if (Arg::isUse(role)) {
+                    if (Arg::isAnyUse(role)) {
                         insertionSet.insert(
                             instIndex, move, inst.origin, arg, tmp);
                     }
index 2a2cf0e..e8de623 100644 (file)
@@ -297,7 +297,8 @@ void testMulArg(int a)
 {
     Procedure proc;
     BasicBlock* root = proc.addBlock();
-    Value* value = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* value = root->appendNew<Value>(
+        proc, Trunc, Origin(), root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
     root->appendNew<ControlValue>(
         proc, Return, Origin(),
         root->appendNew<Value>(proc, Mul, Origin(), value, value));
@@ -305,6 +306,51 @@ void testMulArg(int a)
     CHECK(compileAndRun<int>(proc, a) == a * a);
 }
 
+void testMulArgStore(int a)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+
+    int mulSlot;
+    int valueSlot;
+    
+    Value* value = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    Value* mul = root->appendNew<Value>(proc, Mul, Origin(), value, value);
+
+    root->appendNew<MemoryValue>(
+        proc, Store, Origin(), value,
+        root->appendNew<ConstPtrValue>(proc, Origin(), &valueSlot));
+    root->appendNew<MemoryValue>(
+        proc, Store, Origin(), mul,
+        root->appendNew<ConstPtrValue>(proc, Origin(), &mulSlot));
+
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(), root->appendNew<Const32Value>(proc, Origin(), 0));
+
+    CHECK(!compileAndRun<int>(proc, a));
+    CHECK(mulSlot == a * a);
+    CHECK(valueSlot == a);
+}
+
+void testMulAddArg(int a)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* value = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(
+            proc, Add, Origin(),
+            root->appendNew<Value>(proc, Mul, Origin(), value, value),
+            value));
+
+    CHECK(compileAndRun<int>(proc, a) == a * a + a);
+}
+
 void testMulArgs(int a, int b)
 {
     Procedure proc;
@@ -2754,9 +2800,9 @@ void testComplex(unsigned numVars, unsigned numConstructs)
     for (unsigned i = 0; i < numConstructs; ++i) {
         if (i & 1) {
             // Control flow diamond.
-            unsigned predicateVarIndex = (i >> 1) % numVars;
-            unsigned thenIncVarIndex = ((i >> 1) + 1) % numVars;
-            unsigned elseIncVarIndex = ((i >> 1) + 2) % numVars;
+            unsigned predicateVarIndex = ((i >> 1) + 2) % numVars;
+            unsigned thenIncVarIndex = ((i >> 1) + 0) % numVars;
+            unsigned elseIncVarIndex = ((i >> 1) + 1) % numVars;
 
             BasicBlock* thenBlock = proc.addBlock();
             BasicBlock* elseBlock = proc.addBlock();
@@ -2801,7 +2847,7 @@ void testComplex(unsigned numVars, unsigned numConstructs)
             BasicBlock* loopSkip = proc.addBlock();
             BasicBlock* continuation = proc.addBlock();
             
-            Value* startIndex = vars[(i >> 1) % numVars];
+            Value* startIndex = vars[((i >> 1) + 1) % numVars];
             Value* startSum = current->appendNew<Const32Value>(proc, Origin(), 0);
             current->appendNew<ControlValue>(
                 proc, Branch, Origin(), startIndex,
@@ -2855,7 +2901,7 @@ void testComplex(unsigned numVars, unsigned numConstructs)
             skipSum->setPhi(finalSum);
 
             current = continuation;
-            vars[((i >> 1) + 1) % numVars] = finalSum;
+            vars[((i >> 1) + 0) % numVars] = finalSum;
         }
     }
 
@@ -3183,8 +3229,6 @@ void testSimpleCheck()
     check->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
             CHECK(params.reps.size() == 1);
-            CHECK(params.reps[0].isConstant());
-            CHECK(params.reps[0].value() == 1);
 
             // This should always work because a function this simple should never have callee
             // saves.
@@ -3216,8 +3260,6 @@ void testCheckLessThan()
     check->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
             CHECK(params.reps.size() == 1);
-            CHECK(params.reps[0].isConstant());
-            CHECK(params.reps[0].value() == 1);
 
             // This should always work because a function this simple should never have callee
             // saves.
@@ -3263,8 +3305,6 @@ void testCheckMegaCombo()
     check->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
             CHECK(params.reps.size() == 1);
-            CHECK(params.reps[0].isConstant());
-            CHECK(params.reps[0].value() == 1);
 
             // This should always work because a function this simple should never have callee
             // saves.
@@ -3299,14 +3339,15 @@ void testCheckAddImm()
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
     Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), 42);
     CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+    checkAdd->append(arg1);
+    checkAdd->append(arg2);
     checkAdd->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isConstant());
-            CHECK(params.reps[1].value() == 42);
-            jit.sub32(CCallHelpers::TrustedImm32(42), params.reps[0].gpr());
-            jit.convertInt32ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isConstant());
+            CHECK(params.reps[3].value() == 42);
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
             jit.convertInt32ToDouble(CCallHelpers::TrustedImm32(42), FPRInfo::fpRegT1);
             jit.addDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
@@ -3324,6 +3365,75 @@ void testCheckAddImm()
     CHECK(invoke<double>(*code, 2147483647) == 2147483689.0);
 }
 
+void testCheckAddImmCommute()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* arg1 = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), 42);
+    CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg2, arg1);
+    checkAdd->append(arg1);
+    checkAdd->append(arg2);
+    checkAdd->setGenerator(
+        [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isConstant());
+            CHECK(params.reps[3].value() == 42);
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(CCallHelpers::TrustedImm32(42), FPRInfo::fpRegT1);
+            jit.addDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(proc, IToD, Origin(), checkAdd));
+
+    auto code = compile(proc);
+
+    CHECK(invoke<double>(*code, 0) == 42.0);
+    CHECK(invoke<double>(*code, 1) == 43.0);
+    CHECK(invoke<double>(*code, 42) == 84.0);
+    CHECK(invoke<double>(*code, 2147483647) == 2147483689.0);
+}
+
+void testCheckAddImmSomeRegister()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* arg1 = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), 42);
+    CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+    checkAdd->appendSomeRegister(arg1);
+    checkAdd->appendSomeRegister(arg2);
+    checkAdd->setGenerator(
+        [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
+            jit.addDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(proc, IToD, Origin(), checkAdd));
+
+    auto code = compile(proc);
+
+    CHECK(invoke<double>(*code, 0) == 42.0);
+    CHECK(invoke<double>(*code, 1) == 43.0);
+    CHECK(invoke<double>(*code, 42) == 84.0);
+    CHECK(invoke<double>(*code, 2147483647) == 2147483689.0);
+}
+
 void testCheckAdd()
 {
     Procedure proc;
@@ -3335,14 +3445,15 @@ void testCheckAdd()
         proc, Trunc, Origin(),
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1));
     CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+    checkAdd->appendSomeRegister(arg1);
+    checkAdd->appendSomeRegister(arg2);
     checkAdd->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.sub32(params.reps[1].gpr(), params.reps[0].gpr());
-            jit.convertInt32ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt32ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.addDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3366,14 +3477,15 @@ void testCheckAdd64()
     Value* arg1 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
     CheckValue* checkAdd = root->appendNew<CheckValue>(proc, CheckAdd, Origin(), arg1, arg2);
+    checkAdd->appendSomeRegister(arg1);
+    checkAdd->appendSomeRegister(arg2);
     checkAdd->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.sub64(params.reps[1].gpr(), params.reps[0].gpr());
-            jit.convertInt64ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt64ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt64ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt64ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.addDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3437,14 +3549,15 @@ void testCheckSubImm()
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
     Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), 42);
     CheckValue* checkSub = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkSub->append(arg1);
+    checkSub->append(arg2);
     checkSub->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isConstant());
-            CHECK(params.reps[1].value() == 42);
-            jit.add32(CCallHelpers::TrustedImm32(42), params.reps[0].gpr());
-            jit.convertInt32ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isConstant());
+            CHECK(params.reps[3].value() == 42);
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
             jit.convertInt32ToDouble(CCallHelpers::TrustedImm32(42), FPRInfo::fpRegT1);
             jit.subDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
@@ -3462,6 +3575,42 @@ void testCheckSubImm()
     CHECK(invoke<double>(*code, -2147483647) == -2147483689.0);
 }
 
+void testCheckSubBadImm()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* arg1 = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    int32_t badImm = std::numeric_limits<int>::min();
+    Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), badImm);
+    CheckValue* checkSub = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkSub->append(arg1);
+    checkSub->append(arg2);
+    checkSub->setGenerator(
+        [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isConstant());
+            CHECK(params.reps[3].value() == badImm);
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(CCallHelpers::TrustedImm32(badImm), FPRInfo::fpRegT1);
+            jit.subDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(proc, IToD, Origin(), checkSub));
+
+    auto code = compile(proc);
+
+    CHECK(invoke<double>(*code, 0) == -static_cast<double>(badImm));
+    CHECK(invoke<double>(*code, -1) == -static_cast<double>(badImm) - 1);
+    CHECK(invoke<double>(*code, 1) == -static_cast<double>(badImm) + 1);
+    CHECK(invoke<double>(*code, 42) == -static_cast<double>(badImm) + 42);
+}
+
 void testCheckSub()
 {
     Procedure proc;
@@ -3473,14 +3622,15 @@ void testCheckSub()
         proc, Trunc, Origin(),
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1));
     CheckValue* checkSub = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkSub->append(arg1);
+    checkSub->append(arg2);
     checkSub->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.add32(params.reps[1].gpr(), params.reps[0].gpr());
-            jit.convertInt32ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt32ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.subDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3509,14 +3659,15 @@ void testCheckSub64()
     Value* arg1 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
     CheckValue* checkSub = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkSub->append(arg1);
+    checkSub->append(arg2);
     checkSub->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.add64(params.reps[1].gpr(), params.reps[0].gpr());
-            jit.convertInt64ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt64ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt64ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt64ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.subDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3580,13 +3731,12 @@ void testCheckNeg()
         proc, Trunc, Origin(),
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
     CheckValue* checkNeg = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkNeg->append(arg2);
     checkNeg->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0] == ValueRep::constant(0));
-            CHECK(params.reps[1].isGPR());
-            jit.neg32(params.reps[1].gpr());
-            jit.convertInt32ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 3);
+            CHECK(params.reps[2].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT1);
             jit.negateDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3610,13 +3760,12 @@ void testCheckNeg64()
     Value* arg1 = root->appendNew<Const64Value>(proc, Origin(), 0);
     Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     CheckValue* checkNeg = root->appendNew<CheckValue>(proc, CheckSub, Origin(), arg1, arg2);
+    checkNeg->append(arg2);
     checkNeg->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0] == ValueRep::constant(0));
-            CHECK(params.reps[1].isGPR());
-            jit.neg64(params.reps[1].gpr());
-            jit.convertInt64ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 3);
+            CHECK(params.reps[2].isGPR());
+            jit.convertInt64ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT1);
             jit.negateDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3644,13 +3793,15 @@ void testCheckMul()
         proc, Trunc, Origin(),
         root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1));
     CheckValue* checkMul = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+    checkMul->append(arg1);
+    checkMul->append(arg2);
     checkMul->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.convertInt32ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt32ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.mulDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -3667,6 +3818,92 @@ void testCheckMul()
     CHECK(invoke<double>(*code, 2147483647, 42) == 2147483647.0 * 42.0);
 }
 
+void testCheckMulMemory()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+
+    int left;
+    int right;
+    
+    Value* arg1 = root->appendNew<MemoryValue>(
+        proc, Load, Int32, Origin(),
+        root->appendNew<ConstPtrValue>(proc, Origin(), &left));
+    Value* arg2 = root->appendNew<MemoryValue>(
+        proc, Load, Int32, Origin(),
+        root->appendNew<ConstPtrValue>(proc, Origin(), &right));
+    CheckValue* checkMul = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+    checkMul->append(arg1);
+    checkMul->append(arg2);
+    checkMul->setGenerator(
+        [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
+            jit.mulDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(proc, IToD, Origin(), checkMul));
+
+    auto code = compile(proc);
+
+    left = 0;
+    right = 42;
+    CHECK(invoke<double>(*code) == 0.0);
+    
+    left = 1;
+    right = 42;
+    CHECK(invoke<double>(*code) == 42.0);
+
+    left = 42;
+    right = 42;
+    CHECK(invoke<double>(*code) == 42.0 * 42.0);
+
+    left = 2147483647;
+    right = 42;
+    CHECK(invoke<double>(*code) == 2147483647.0 * 42.0);
+}
+
+void testCheckMul2()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* arg1 = root->appendNew<Value>(
+        proc, Trunc, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
+    Value* arg2 = root->appendNew<Const32Value>(proc, Origin(), 2);
+    CheckValue* checkMul = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+    checkMul->append(arg1);
+    checkMul->append(arg2);
+    checkMul->setGenerator(
+        [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isConstant());
+            CHECK(params.reps[3].value() == 2);
+            jit.convertInt32ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt32ToDouble(CCallHelpers::TrustedImm32(2), FPRInfo::fpRegT1);
+            jit.mulDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        root->appendNew<Value>(proc, IToD, Origin(), checkMul));
+
+    auto code = compile(proc);
+
+    CHECK(invoke<double>(*code, 0) == 0.0);
+    CHECK(invoke<double>(*code, 1) == 2.0);
+    CHECK(invoke<double>(*code, 42) == 42.0 * 2.0);
+    CHECK(invoke<double>(*code, 2147483647) == 2147483647.0 * 2.0);
+}
+
 void testCheckMul64()
 {
     Procedure proc;
@@ -3674,13 +3911,15 @@ void testCheckMul64()
     Value* arg1 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     Value* arg2 = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
     CheckValue* checkMul = root->appendNew<CheckValue>(proc, CheckMul, Origin(), arg1, arg2);
+    checkMul->append(arg1);
+    checkMul->append(arg2);
     checkMul->setGenerator(
         [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
-            CHECK(params.reps.size() == 2);
-            CHECK(params.reps[0].isGPR());
-            CHECK(params.reps[1].isGPR());
-            jit.convertInt64ToDouble(params.reps[0].gpr(), FPRInfo::fpRegT0);
-            jit.convertInt64ToDouble(params.reps[1].gpr(), FPRInfo::fpRegT1);
+            CHECK(params.reps.size() == 4);
+            CHECK(params.reps[2].isGPR());
+            CHECK(params.reps[3].isGPR());
+            jit.convertInt64ToDouble(params.reps[2].gpr(), FPRInfo::fpRegT0);
+            jit.convertInt64ToDouble(params.reps[3].gpr(), FPRInfo::fpRegT1);
             jit.mulDouble(FPRInfo::fpRegT1, FPRInfo::fpRegT0);
             jit.emitFunctionEpilogue();
             jit.ret();
@@ -4425,6 +4664,10 @@ void run(const char* filter)
     RUN(testAddImmsDouble(negativeZero(), negativeZero()));
 
     RUN(testMulArg(5));
+    RUN(testMulAddArg(5));
+    RUN(testMulAddArg(85));
+    RUN(testMulArgStore(5));
+    RUN(testMulArgStore(85));
     RUN(testMulArgs(1, 1));
     RUN(testMulArgs(1, 2));
     RUN(testMulArgs(3, 3));
@@ -4880,11 +5123,14 @@ void run(const char* filter)
     RUN(testCheckLessThan());
     RUN(testCheckMegaCombo());
     RUN(testCheckAddImm());
+    RUN(testCheckAddImmCommute());
+    RUN(testCheckAddImmSomeRegister());
     RUN(testCheckAdd());
     RUN(testCheckAdd64());
     RUN(testCheckAddFold(100, 200));
     RUN(testCheckAddFoldFail(2147483647, 100));
     RUN(testCheckSubImm());
+    RUN(testCheckSubBadImm());
     RUN(testCheckSub());
     RUN(testCheckSub64());
     RUN(testCheckSubFold(100, 200));
@@ -4892,6 +5138,8 @@ void run(const char* filter)
     RUN(testCheckNeg());
     RUN(testCheckNeg64());
     RUN(testCheckMul());
+    RUN(testCheckMulMemory());
+    RUN(testCheckMul2());
     RUN(testCheckMul64());
     RUN(testCheckMulFold(100, 200));
     RUN(testCheckMulFoldFail(2147483647, 100));
index 8ff87bb..a3212b9 100644
@@ -1,5 +1,18 @@
 2015-11-17  Filip Pizlo  <fpizlo@apple.com>
 
+        CheckAdd/Mul should have commutativity optimizations in B3->Air lowering
+        https://bugs.webkit.org/show_bug.cgi?id=151214
+
+        Reviewed by Geoffrey Garen.
+
+        Disable my failed attempts at perfect forwarding, since they were incorrect and caused compile
+        errors if you tried to pass an argument that bound to an lvalue. This shouldn't affect the
+        performance of anything we care about, and performance tests seem to confirm that it's all good.
+
+        * wtf/ScopedLambda.h:
+
+2015-11-17  Filip Pizlo  <fpizlo@apple.com>
+
         Air should lay out code optimally
         https://bugs.webkit.org/show_bug.cgi?id=150478
 
index b37ca49..2d8dfd4 100644
@@ -44,19 +44,20 @@ template<typename FunctionType> class ScopedLambda;
 template<typename ResultType, typename... ArgumentTypes>
 class ScopedLambda<ResultType (ArgumentTypes...)> {
 public:
-    ScopedLambda(ResultType (*impl)(void* arg, ArgumentTypes&&...) = nullptr, void* arg = nullptr)
+    ScopedLambda(ResultType (*impl)(void* arg, ArgumentTypes...) = nullptr, void* arg = nullptr)
         : m_impl(impl)
         , m_arg(arg)
     {
     }
-    
-    ResultType operator()(ArgumentTypes&&... arguments) const
+
+    template<typename... PassedArgumentTypes>
+    ResultType operator()(PassedArgumentTypes&&... arguments) const
     {
-        return m_impl(m_arg, std::forward<ArgumentTypes>(arguments)...);
+        return m_impl(m_arg, std::forward<PassedArgumentTypes>(arguments)...);
     }
 
 private:
-    ResultType (*m_impl)(void* arg, ArgumentTypes&&...);
+    ResultType (*m_impl)(void* arg, ArgumentTypes...);
     void *m_arg;
 };
 
@@ -72,10 +73,9 @@ public:
     }
 
 private:
-    static ResultType implFunction(void* argument, ArgumentTypes&&... arguments)
+    static ResultType implFunction(void* argument, ArgumentTypes... arguments)
     {
-        return static_cast<ScopedLambdaFunctor*>(argument)->m_functor(
-            std::forward<ArgumentTypes>(arguments)...);
+        return static_cast<ScopedLambdaFunctor*>(argument)->m_functor(arguments...);
     }
 
     Functor m_functor;
index ff69b5a..e694333 100644
@@ -102,9 +102,9 @@ void run(unsigned numVars, unsigned numConstructs)
     for (unsigned i = 0; i < numConstructs; ++i) {
         if (i & 1) {
             // Control flow diamond.
-            unsigned predicateVarIndex = (i >> 1) % numVars;
-            unsigned thenIncVarIndex = ((i >> 1) + 1) % numVars;
-            unsigned elseIncVarIndex = ((i >> 1) + 2) % numVars;
+            unsigned predicateVarIndex = ((i >> 1) + 2) % numVars;
+            unsigned thenIncVarIndex = ((i >> 1) + 0) % numVars;
+            unsigned elseIncVarIndex = ((i >> 1) + 1) % numVars;
 
             BasicBlock* thenBlock = BasicBlock::Create(context, "", function);
             BasicBlock* elseBlock = BasicBlock::Create(context, "", function);
@@ -138,7 +138,7 @@ void run(unsigned numVars, unsigned numConstructs)
             BasicBlock* loopBody = BasicBlock::Create(context, "", function);
             BasicBlock* continuation = BasicBlock::Create(context, "", function);
 
-            Value* startIndex = vars[(i >> 1) % numVars];
+            Value* startIndex = vars[((i >> 1) + 1) % numVars];
             Value* startSum = builder->getInt32(0);
             builder->CreateCondBr(
                 builder->CreateICmpNE(startIndex, builder->getInt32(0)),
@@ -176,6 +176,7 @@ void run(unsigned numVars, unsigned numConstructs)
             finalSum->addIncoming(startSum, current);
             finalSum->addIncoming(newBodySum, loopBody);
             current = continuation;
+            vars[((i >> 1) + 0) % numVars] = finalSum;
         }
     }