FTL B3 should be just as fast as FTL LLVM on Octane/crypto
author    fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 19 Jan 2016 19:20:35 +0000 (19:20 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 19 Jan 2016 19:20:35 +0000 (19:20 +0000)
https://bugs.webkit.org/show_bug.cgi?id=153113

Reviewed by Saam Barati.

Source/JavaScriptCore:

This is the result of a hacking rampage to close the gap between FTL B3 and FTL LLVM on
Octane/crypto. It was a very successful rampage.

The biggest change in this patch is the introduction of a phase called fixObviousSpills()
that fixes patterns like:

Store register to stack slot and then use stack slot:
    Move %rcx, (stack42)
    Foo use:(stack42) // replace (stack42) with %rcx here.

Load stack slot into register and then use stack slot:
    Move (stack42), %rcx
    Foo use:(stack42) // replace (stack42) with %rcx here.

Store constant into stack slot and then use stack slot:
    Move $42, %rcx
    Move %rcx, (stack42)
    Bar def:%rcx // %rcx isn't available anymore, but we still know that (stack42) is $42
    Foo use:(stack42) // replace (stack42) with $42 here.

This phase does these fixups by doing a global forward flow that propagates sets of
must-aliases.
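As a sketch of the idea (not B3's actual data structures — the mini-IR below is invented for illustration), a single-block version of the fixup for the first two patterns looks like this: walk forward, remember which register each stack slot must currently equal, kill facts when the register is redefined, and rewrite slot uses to the aliased register.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-IR: each instruction either stores a register to a
// stack slot, defines (clobbers) a register, or uses a stack slot.
struct Inst {
    enum Kind { StoreToSlot, DefReg, UseSlot } kind;
    std::string reg;         // for StoreToSlot / DefReg
    int slot = -1;           // for StoreToSlot / UseSlot
    std::string rewrittenTo; // filled in by the pass for UseSlot
};

// Single-block version of the must-alias forward flow. The real phase is
// global (it meets these fact sets across control flow edges).
void fixObviousSpills(std::vector<Inst>& block)
{
    std::map<int, std::string> slotAlias; // slot -> register it must equal
    for (Inst& inst : block) {
        switch (inst.kind) {
        case Inst::StoreToSlot:
            slotAlias[inst.slot] = inst.reg; // new must-alias fact
            break;
        case Inst::DefReg:
            // The register's old value is gone; drop facts mentioning it.
            for (auto it = slotAlias.begin(); it != slotAlias.end();) {
                if (it->second == inst.reg)
                    it = slotAlias.erase(it);
                else
                    ++it;
            }
            break;
        case Inst::UseSlot: {
            auto it = slotAlias.find(inst.slot);
            if (it != slotAlias.end())
                inst.rewrittenTo = it->second; // use the register instead
            break;
        }
        }
    }
}
```

Running this over the first pattern above rewrites `Foo use:(stack42)` to use `%rcx`, while a use after `%rcx` is clobbered is left alone.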

Also added a phase to report register pressure. It pretty-prints code alongside the set of
in-use registers above each instruction. Using this phase, I found that our register
allocator is actually doing a pretty awesome job. I had previously feared that we'd have to
make substantial changes to register allocation. I don't have such a fear anymore, at least
for Octane/crypto. In the future, we can check how the regalloc is performing just by
enabling logAirRegisterPressure.
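The shape of that output can be sketched like so (the instruction and liveness model here is invented; the real phase derives live register sets from Air's liveness analysis):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Sketch of the pretty-printer's shape: for each instruction, print the
// set of registers live across it on a comment line above the
// instruction itself, so spikes in pressure are easy to spot by eye.
std::string dumpWithPressure(
    const std::vector<std::string>& insts,
    const std::vector<std::set<std::string>>& liveSets)
{
    std::string out;
    for (size_t i = 0; i < insts.size(); ++i) {
        out += "    // live:";
        for (const std::string& reg : liveSets[i])
            out += " " + reg;
        out += "\n    " + insts[i] + "\n";
    }
    return out;
}
```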

Also fixed some FTL codegen pathologies. We were using bitOr where we meant to use a
conditional or. LLVM likes to canonicalize boolean expressions this way. B3, on the other
hand, doesn't do this canonicalization and doesn't have logic to decompose it into sequences
of branches.
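The distinction at the source level, sketched with an invented `expensiveCheck` helper so the evaluations are observable: a bitOr of two booleans forces both operands to be computed, while the conditional form maps naturally onto a sequence of branches that skips the second test.

```cpp
#include <cassert>

static int evalCount = 0;

// A condition whose evaluation we can count.
static bool expensiveCheck(bool value)
{
    ++evalCount;
    return value;
}

// Bitwise-or form: both operands are always evaluated, which is what a
// bitOr node in the IR forces the backend to materialize.
static bool viaBitOr(bool a, bool b)
{
    return expensiveCheck(a) | expensiveCheck(b);
}

// Conditional-or form: the second operand is skipped when the first is
// true; this decomposes directly into branches.
static bool viaConditionalOr(bool a, bool b)
{
    return expensiveCheck(a) || expensiveCheck(b);
}
```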

Also added strength reductions for checked arithmetic. It turns out that LLVM learned how to
reduce checked multiply to unchecked multiply in some obvious cases that our existing DFG
optimizations lacked. Ideally, our DFG integer range optimization phase would cover this. But
the cases of interest were dead simple - the incoming values to the CheckMul were obviously
too small to cause overflow. I added such reasoning to B3's strength reduction.
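The flavor of that reasoning can be sketched as follows (this mirrors the idea, not B3's exact implementation — see the `IntRange` class in the diff below for the real machinery): if the operand ranges are narrow enough that every corner-case product fits in 32 bits, the overflow check is provably dead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified operand range, in the spirit of B3's IntRange.
struct Range {
    int64_t min;
    int64_t max;
};

// Can a 32-bit checked multiply of values drawn from these ranges ever
// overflow? For monotone interval arithmetic it suffices to test the four
// corner products. Assumes the ranges describe 32-bit values, so the
// products themselves cannot overflow int64_t.
bool mulCanOverflow32(Range a, Range b)
{
    int64_t candidates[4] = {
        a.min * b.min, a.min * b.max,
        a.max * b.min, a.max * b.max,
    };
    for (int64_t product : candidates) {
        if (product < std::numeric_limits<int32_t>::min()
            || product > std::numeric_limits<int32_t>::max())
            return true;
    }
    return false;
}
```

When this returns false, a CheckMul can be strength-reduced to a plain Mul.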

Finally, this fixes some bugs in how we were handling subwidth spill slots. The register
allocator was making two mistakes. First, it might emit a Width64 def or use of a 4-byte
spill slot. In that case, it now extends the size of the spill slot to ensure that the use
or def is safe. Second, it emulated ZDef-on-Tmp behavior by emitting a Move32 to initialize
the high bits of a spill slot. But this is unsound because of the liveness semantics of spill
slots: they cannot have more than one def to initialize their value. I fixed that by making
allocateStack() the thing that fixes ZDefs. That's a change to ZDef semantics: now, a ZDef
of an anonymous stack slot means that the high bits are zero-filled. I wasn't able to
construct a test for this. It might be a hypothetical bug, but still, I like how this
simplifies the register allocator.
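The new ZDef invariant on anonymous stack slots, stated as code (a sketch of the semantics, not of Air's implementation):

```cpp
#include <cassert>
#include <cstdint>

// Under the new semantics, a ZDef that writes the low 32 bits of an
// anonymous 8-byte stack slot also zero-fills the high 32 bits, so a
// later 64-bit load of the slot never observes stale garbage.
uint64_t zdefStore32(uint64_t slotBefore, uint32_t value)
{
    (void)slotBefore; // stale high bits are discarded, not preserved
    return static_cast<uint64_t>(value); // zero-extending 32-bit store
}
```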

This is a ~0.7% speed-up on Octane.

* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* b3/B3CheckSpecial.cpp:
(JSC::B3::CheckSpecial::hiddenBranch):
(JSC::B3::CheckSpecial::forEachArg):
(JSC::B3::CheckSpecial::commitHiddenBranch): Deleted.
* b3/B3CheckSpecial.h:
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::fillStackmap):
(JSC::B3::Air::LowerToAir::lower):
* b3/B3StackmapValue.h:
* b3/air/AirAllocateStack.cpp:
(JSC::B3::Air::allocateStack):
* b3/air/AirAllocateStack.h:
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::callArg):
(JSC::B3::Air::Arg::stackAddr):
(JSC::B3::Air::Arg::isValidScale):
* b3/air/AirBasicBlock.cpp:
(JSC::B3::Air::BasicBlock::deepDump):
(JSC::B3::Air::BasicBlock::dumpHeader):
(JSC::B3::Air::BasicBlock::dumpFooter):
* b3/air/AirBasicBlock.h:
* b3/air/AirCCallSpecial.cpp:
(JSC::B3::Air::CCallSpecial::CCallSpecial):
(JSC::B3::Air::CCallSpecial::~CCallSpecial):
* b3/air/AirCode.h:
(JSC::B3::Air::Code::lastPhaseName):
(JSC::B3::Air::Code::setEnableRCRS):
(JSC::B3::Air::Code::enableRCRS):
* b3/air/AirCustom.cpp:
(JSC::B3::Air::PatchCustom::isValidForm):
(JSC::B3::Air::CCallCustom::isValidForm):
* b3/air/AirCustom.h:
(JSC::B3::Air::PatchCustom::isValidFormStatic):
(JSC::B3::Air::PatchCustom::admitsStack):
(JSC::B3::Air::PatchCustom::isValidForm): Deleted.
* b3/air/AirEmitShuffle.cpp:
(JSC::B3::Air::ShufflePair::dump):
(JSC::B3::Air::createShuffle):
(JSC::B3::Air::emitShuffle):
* b3/air/AirEmitShuffle.h:
* b3/air/AirFixObviousSpills.cpp: Added.
(JSC::B3::Air::fixObviousSpills):
* b3/air/AirFixObviousSpills.h: Added.
* b3/air/AirFixSpillSlotZDef.h: Removed.
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
(JSC::B3::Air::generate):
* b3/air/AirHandleCalleeSaves.cpp:
(JSC::B3::Air::handleCalleeSaves):
* b3/air/AirInst.h:
* b3/air/AirInstInlines.h:
(JSC::B3::Air::Inst::reportUsedRegisters):
(JSC::B3::Air::Inst::admitsStack):
(JSC::B3::Air::isShiftValid):
* b3/air/AirIteratedRegisterCoalescing.cpp:
* b3/air/AirLiveness.h:
(JSC::B3::Air::AbstractLiveness::AbstractLiveness):
(JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::begin):
(JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::end):
(JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::contains):
(JSC::B3::Air::AbstractLiveness::LocalCalc::live):
(JSC::B3::Air::AbstractLiveness::LocalCalc::isLive):
(JSC::B3::Air::AbstractLiveness::LocalCalc::execute):
(JSC::B3::Air::AbstractLiveness::rawLiveAtHead):
(JSC::B3::Air::AbstractLiveness::Iterable::begin):
(JSC::B3::Air::AbstractLiveness::Iterable::end):
(JSC::B3::Air::AbstractLiveness::Iterable::contains):
(JSC::B3::Air::AbstractLiveness::liveAtTail):
(JSC::B3::Air::AbstractLiveness::workset):
* b3/air/AirLogRegisterPressure.cpp: Added.
(JSC::B3::Air::logRegisterPressure):
* b3/air/AirLogRegisterPressure.h: Added.
* b3/air/AirOptimizeBlockOrder.cpp:
(JSC::B3::Air::blocksInOptimizedOrder):
(JSC::B3::Air::optimizeBlockOrder):
* b3/air/AirOptimizeBlockOrder.h:
* b3/air/AirReportUsedRegisters.cpp:
(JSC::B3::Air::reportUsedRegisters):
* b3/air/AirReportUsedRegisters.h:
* b3/air/AirSpillEverything.cpp:
(JSC::B3::Air::spillEverything):
* b3/air/AirStackSlot.h:
(JSC::B3::Air::StackSlot::isLocked):
(JSC::B3::Air::StackSlot::index):
(JSC::B3::Air::StackSlot::ensureSize):
(JSC::B3::Air::StackSlot::alignment):
* b3/air/AirValidate.cpp:
* ftl/FTLB3Compile.cpp:
(JSC::FTL::compile):
* ftl/FTLLowerDFGToLLVM.cpp:
(JSC::FTL::DFG::LowerDFGToLLVM::compileArithMul):
(JSC::FTL::DFG::LowerDFGToLLVM::compileArithDiv):
(JSC::FTL::DFG::LowerDFGToLLVM::compileArithMod):
* jit/RegisterSet.h:
(JSC::RegisterSet::get):
(JSC::RegisterSet::setAll):
(JSC::RegisterSet::merge):
(JSC::RegisterSet::filter):
* runtime/Options.h:

Source/WTF:

* wtf/IndexSparseSet.h:
(WTF::IndexSparseSet<OverflowHandler>::IndexSparseSet):
(WTF::IndexSparseSet<OverflowHandler>::add):
(WTF::IndexSparseSet<OverflowHandler>::remove):
* wtf/StringPrintStream.h:
(WTF::StringPrintStream::length):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@195298 268f45cc-cd09-0410-ab3c-d52691b4dbfc

45 files changed:
Source/JavaScriptCore/CMakeLists.txt
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
Source/JavaScriptCore/b3/B3CheckSpecial.cpp
Source/JavaScriptCore/b3/B3CheckSpecial.h
Source/JavaScriptCore/b3/B3LowerToAir.cpp
Source/JavaScriptCore/b3/B3ReduceStrength.cpp
Source/JavaScriptCore/b3/B3StackmapValue.h
Source/JavaScriptCore/b3/air/AirAllocateStack.cpp
Source/JavaScriptCore/b3/air/AirAllocateStack.h
Source/JavaScriptCore/b3/air/AirArg.h
Source/JavaScriptCore/b3/air/AirBasicBlock.cpp
Source/JavaScriptCore/b3/air/AirBasicBlock.h
Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp
Source/JavaScriptCore/b3/air/AirCode.h
Source/JavaScriptCore/b3/air/AirCustom.cpp
Source/JavaScriptCore/b3/air/AirCustom.h
Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp
Source/JavaScriptCore/b3/air/AirEmitShuffle.h
Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirFixObviousSpills.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirFixSpillSlotZDef.h [deleted file]
Source/JavaScriptCore/b3/air/AirGenerate.cpp
Source/JavaScriptCore/b3/air/AirHandleCalleeSaves.cpp
Source/JavaScriptCore/b3/air/AirInst.h
Source/JavaScriptCore/b3/air/AirInstInlines.h
Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp
Source/JavaScriptCore/b3/air/AirLiveness.h
Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirLogRegisterPressure.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.cpp
Source/JavaScriptCore/b3/air/AirOptimizeBlockOrder.h
Source/JavaScriptCore/b3/air/AirReportUsedRegisters.cpp
Source/JavaScriptCore/b3/air/AirReportUsedRegisters.h
Source/JavaScriptCore/b3/air/AirSpillEverything.cpp
Source/JavaScriptCore/b3/air/AirStackSlot.h
Source/JavaScriptCore/b3/air/AirValidate.cpp
Source/JavaScriptCore/ftl/FTLB3Compile.cpp
Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp
Source/JavaScriptCore/jit/RegisterSet.h
Source/JavaScriptCore/runtime/Options.h
Source/WTF/ChangeLog
Source/WTF/wtf/CheckedArithmetic.h
Source/WTF/wtf/IndexSparseSet.h
Source/WTF/wtf/StringPrintStream.h

index 9916ad4..4178bea 100644 (file)
@@ -78,6 +78,7 @@ set(JavaScriptCore_SOURCES
     b3/air/AirCustom.cpp
     b3/air/AirEliminateDeadCode.cpp
     b3/air/AirEmitShuffle.cpp
+    b3/air/AirFixObviousSpills.cpp
     b3/air/AirFixPartialRegisterStalls.cpp
     b3/air/AirGenerate.cpp
     b3/air/AirGenerated.cpp
@@ -85,6 +86,7 @@ set(JavaScriptCore_SOURCES
     b3/air/AirInsertionSet.cpp
     b3/air/AirInst.cpp
     b3/air/AirIteratedRegisterCoalescing.cpp
+    b3/air/AirLogRegisterPressure.cpp
     b3/air/AirLowerAfterRegAlloc.cpp
     b3/air/AirLowerMacros.cpp
     b3/air/AirOptimizeBlockOrder.cpp
index 7c56b86..e90fb79 100644 (file)
@@ -1,3 +1,166 @@
+2016-01-17  Filip Pizlo  <fpizlo@apple.com>
+
+        FTL B3 should be just as fast as FTL LLVM on Octane/crypto
+        https://bugs.webkit.org/show_bug.cgi?id=153113
+
+        Reviewed by Saam Barati.
+
+        This is the result of a hacking rampage to close the gap between FTL B3 and FTL LLVM on
+        Octane/crypto. It was a very successful rampage.
+
+        The biggest change in this patch is the introduction of a phase called fixObviousSpills()
+        that fixes patterns like:
+
+        Store register to stack slot and then use stack slot:
+            Move %rcx, (stack42)
+            Foo use:(stack42) // replace (stack42) with %rcx here.
+
+        Load stack slot into register and then use stack slot:
+            Move (stack42), %rcx
+            Foo use:(stack42) // replace (stack42) with %rcx here.
+
+        Store constant into stack slot and then use stack slot:
+            Move $42, %rcx
+            Move %rcx, (stack42)
+            Bar def:%rcx // %rcx isn't available anymore, but we still know that (stack42) is $42
+            Foo use:(stack42) // replace (stack42) with $42 here.
+
+        This phase does these fixups by doing a global forward flow that propagates sets of
+        must-aliases.
+
+        Also added a phase to report register pressure. It pretty-prints code alongside the set of
+        in-use registers above each instruction. Using this phase, I found that our register
+        allocator is actually doing a pretty awesome job. I had previously feared that we'd have to
+        make substantial changes to register allocation. I don't have such a fear anymore, at least
+        for Octane/crypto. In the future, we can check how the regalloc is performing just by
+        enabling logAirRegisterPressure.
+
+        Also fixed some FTL codegen pathologies. We were using bitOr where we meant to use a
+        conditional or. LLVM likes to canonicalize boolean expressions this way. B3, on the other
+        hand, doesn't do this canonicalization and doesn't have logic to decompose it into sequences
+        of branches.
+
+        Also added strength reductions for checked arithmetic. It turns out that LLVM learned how to
+        reduce checked multiply to unchecked multiply in some obvious cases that our existing DFG
+        optimizations lacked. Ideally, our DFG integer range optimization phase would cover this. But
+        the cases of interest were dead simple - the incoming values to the CheckMul were obviously
+        too small to cause overflow. I added such reasoning to B3's strength reduction.
+
+        Finally, this fixes some bugs in how we were handling subwidth spill slots. The register
+        allocator was making two mistakes. First, it might emit a Width64 def or use of a 4-byte
+        spill slot. In that case, it now extends the size of the spill slot to ensure that the use
+        or def is safe. Second, it emulated ZDef-on-Tmp behavior by emitting a Move32 to initialize
+        the high bits of a spill slot. But this is unsound because of the liveness semantics of spill
+        slots: they cannot have more than one def to initialize their value. I fixed that by making
+        allocateStack() the thing that fixes ZDefs. That's a change to ZDef semantics: now, a ZDef
+        of an anonymous stack slot means that the high bits are zero-filled. I wasn't able to
+        construct a test for this. It might be a hypothetical bug, but still, I like how this
+        simplifies the register allocator.
+
+        This is a ~0.7% speed-up on Octane.
+
+        * CMakeLists.txt:
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * b3/B3CheckSpecial.cpp:
+        (JSC::B3::CheckSpecial::hiddenBranch):
+        (JSC::B3::CheckSpecial::forEachArg):
+        (JSC::B3::CheckSpecial::commitHiddenBranch): Deleted.
+        * b3/B3CheckSpecial.h:
+        * b3/B3LowerToAir.cpp:
+        (JSC::B3::Air::LowerToAir::fillStackmap):
+        (JSC::B3::Air::LowerToAir::lower):
+        * b3/B3StackmapValue.h:
+        * b3/air/AirAllocateStack.cpp:
+        (JSC::B3::Air::allocateStack):
+        * b3/air/AirAllocateStack.h:
+        * b3/air/AirArg.h:
+        (JSC::B3::Air::Arg::callArg):
+        (JSC::B3::Air::Arg::stackAddr):
+        (JSC::B3::Air::Arg::isValidScale):
+        * b3/air/AirBasicBlock.cpp:
+        (JSC::B3::Air::BasicBlock::deepDump):
+        (JSC::B3::Air::BasicBlock::dumpHeader):
+        (JSC::B3::Air::BasicBlock::dumpFooter):
+        * b3/air/AirBasicBlock.h:
+        * b3/air/AirCCallSpecial.cpp:
+        (JSC::B3::Air::CCallSpecial::CCallSpecial):
+        (JSC::B3::Air::CCallSpecial::~CCallSpecial):
+        * b3/air/AirCode.h:
+        (JSC::B3::Air::Code::lastPhaseName):
+        (JSC::B3::Air::Code::setEnableRCRS):
+        (JSC::B3::Air::Code::enableRCRS):
+        * b3/air/AirCustom.cpp:
+        (JSC::B3::Air::PatchCustom::isValidForm):
+        (JSC::B3::Air::CCallCustom::isValidForm):
+        * b3/air/AirCustom.h:
+        (JSC::B3::Air::PatchCustom::isValidFormStatic):
+        (JSC::B3::Air::PatchCustom::admitsStack):
+        (JSC::B3::Air::PatchCustom::isValidForm): Deleted.
+        * b3/air/AirEmitShuffle.cpp:
+        (JSC::B3::Air::ShufflePair::dump):
+        (JSC::B3::Air::createShuffle):
+        (JSC::B3::Air::emitShuffle):
+        * b3/air/AirEmitShuffle.h:
+        * b3/air/AirFixObviousSpills.cpp: Added.
+        (JSC::B3::Air::fixObviousSpills):
+        * b3/air/AirFixObviousSpills.h: Added.
+        * b3/air/AirFixSpillSlotZDef.h: Removed.
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::prepareForGeneration):
+        (JSC::B3::Air::generate):
+        * b3/air/AirHandleCalleeSaves.cpp:
+        (JSC::B3::Air::handleCalleeSaves):
+        * b3/air/AirInst.h:
+        * b3/air/AirInstInlines.h:
+        (JSC::B3::Air::Inst::reportUsedRegisters):
+        (JSC::B3::Air::Inst::admitsStack):
+        (JSC::B3::Air::isShiftValid):
+        * b3/air/AirIteratedRegisterCoalescing.cpp:
+        * b3/air/AirLiveness.h:
+        (JSC::B3::Air::AbstractLiveness::AbstractLiveness):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::begin):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::end):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::Iterable::contains):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::live):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::isLive):
+        (JSC::B3::Air::AbstractLiveness::LocalCalc::execute):
+        (JSC::B3::Air::AbstractLiveness::rawLiveAtHead):
+        (JSC::B3::Air::AbstractLiveness::Iterable::begin):
+        (JSC::B3::Air::AbstractLiveness::Iterable::end):
+        (JSC::B3::Air::AbstractLiveness::Iterable::contains):
+        (JSC::B3::Air::AbstractLiveness::liveAtTail):
+        (JSC::B3::Air::AbstractLiveness::workset):
+        * b3/air/AirLogRegisterPressure.cpp: Added.
+        (JSC::B3::Air::logRegisterPressure):
+        * b3/air/AirLogRegisterPressure.h: Added.
+        * b3/air/AirOptimizeBlockOrder.cpp:
+        (JSC::B3::Air::blocksInOptimizedOrder):
+        (JSC::B3::Air::optimizeBlockOrder):
+        * b3/air/AirOptimizeBlockOrder.h:
+        * b3/air/AirReportUsedRegisters.cpp:
+        (JSC::B3::Air::reportUsedRegisters):
+        * b3/air/AirReportUsedRegisters.h:
+        * b3/air/AirSpillEverything.cpp:
+        (JSC::B3::Air::spillEverything):
+        * b3/air/AirStackSlot.h:
+        (JSC::B3::Air::StackSlot::isLocked):
+        (JSC::B3::Air::StackSlot::index):
+        (JSC::B3::Air::StackSlot::ensureSize):
+        (JSC::B3::Air::StackSlot::alignment):
+        * b3/air/AirValidate.cpp:
+        * ftl/FTLB3Compile.cpp:
+        (JSC::FTL::compile):
+        * ftl/FTLLowerDFGToLLVM.cpp:
+        (JSC::FTL::DFG::LowerDFGToLLVM::compileArithMul):
+        (JSC::FTL::DFG::LowerDFGToLLVM::compileArithDiv):
+        (JSC::FTL::DFG::LowerDFGToLLVM::compileArithMod):
+        * jit/RegisterSet.h:
+        (JSC::RegisterSet::get):
+        (JSC::RegisterSet::setAll):
+        (JSC::RegisterSet::merge):
+        (JSC::RegisterSet::filter):
+        * runtime/Options.h:
+
 2016-01-19  Filip Pizlo  <fpizlo@apple.com>
 
         Unreviewed, undo unintended commit.
index a93a6aa..4b19632 100644 (file)
                0F493AFA16D0CAD30084508B /* SourceProvider.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F493AF816D0CAD10084508B /* SourceProvider.cpp */; };
                0F4B94DC17B9F07500DD03A4 /* TypedArrayInlines.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F4B94DB17B9F07500DD03A4 /* TypedArrayInlines.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F4C91661C29F4F2004341A6 /* B3OriginDump.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F4C91651C29F4F2004341A6 /* B3OriginDump.h */; };
-               0F4C91681C2B3D68004341A6 /* AirFixSpillSlotZDef.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F4C91671C2B3D68004341A6 /* AirFixSpillSlotZDef.h */; };
+               0F4DE1CE1C4C1B54004D6C11 /* AirFixObviousSpills.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F4DE1CC1C4C1B54004D6C11 /* AirFixObviousSpills.cpp */; };
+               0F4DE1CF1C4C1B54004D6C11 /* AirFixObviousSpills.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F4DE1CD1C4C1B54004D6C11 /* AirFixObviousSpills.h */; };
                0F4F29DF18B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F4F29DD18B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.cpp */; };
                0F4F29E018B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F4F29DE18B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.h */; };
                0F50AF3C193E8B3900674EE8 /* DFGStructureClobberState.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F50AF3B193E8B3900674EE8 /* DFGStructureClobberState.h */; };
                0FE228EE1436AB2C00196C48 /* Options.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE228EA1436AB2300196C48 /* Options.cpp */; };
                0FE254F61ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */; };
                0FE254F71ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE254F51ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.h */; };
+               0FE34C191C4B39AE0003A512 /* AirLogRegisterPressure.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE34C171C4B39AE0003A512 /* AirLogRegisterPressure.cpp */; };
+               0FE34C1A1C4B39AE0003A512 /* AirLogRegisterPressure.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE34C181C4B39AE0003A512 /* AirLogRegisterPressure.h */; };
                0FE7211D193B9C590031F6ED /* DFGTransition.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE7211B193B9C590031F6ED /* DFGTransition.cpp */; };
                0FE7211E193B9C590031F6ED /* DFGTransition.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE7211C193B9C590031F6ED /* DFGTransition.h */; };
                0FE834171A6EF97B00D04847 /* PolymorphicCallStubRoutine.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE834151A6EF97B00D04847 /* PolymorphicCallStubRoutine.cpp */; };
                0F493AF816D0CAD10084508B /* SourceProvider.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SourceProvider.cpp; sourceTree = "<group>"; };
                0F4B94DB17B9F07500DD03A4 /* TypedArrayInlines.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = TypedArrayInlines.h; sourceTree = "<group>"; };
                0F4C91651C29F4F2004341A6 /* B3OriginDump.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = B3OriginDump.h; path = b3/B3OriginDump.h; sourceTree = "<group>"; };
-               0F4C91671C2B3D68004341A6 /* AirFixSpillSlotZDef.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirFixSpillSlotZDef.h; path = b3/air/AirFixSpillSlotZDef.h; sourceTree = "<group>"; };
+               0F4DE1CC1C4C1B54004D6C11 /* AirFixObviousSpills.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirFixObviousSpills.cpp; path = b3/air/AirFixObviousSpills.cpp; sourceTree = "<group>"; };
+               0F4DE1CD1C4C1B54004D6C11 /* AirFixObviousSpills.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirFixObviousSpills.h; path = b3/air/AirFixObviousSpills.h; sourceTree = "<group>"; };
                0F4F29DD18B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGStaticExecutionCountEstimationPhase.cpp; path = dfg/DFGStaticExecutionCountEstimationPhase.cpp; sourceTree = "<group>"; };
                0F4F29DE18B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGStaticExecutionCountEstimationPhase.h; path = dfg/DFGStaticExecutionCountEstimationPhase.h; sourceTree = "<group>"; };
                0F50AF3B193E8B3900674EE8 /* DFGStructureClobberState.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGStructureClobberState.h; path = dfg/DFGStructureClobberState.h; sourceTree = "<group>"; };
                0FE228EB1436AB2300196C48 /* Options.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Options.h; sourceTree = "<group>"; };
                0FE254F41ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGVarargsForwardingPhase.cpp; path = dfg/DFGVarargsForwardingPhase.cpp; sourceTree = "<group>"; };
                0FE254F51ABDDD2200A7C6D2 /* DFGVarargsForwardingPhase.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGVarargsForwardingPhase.h; path = dfg/DFGVarargsForwardingPhase.h; sourceTree = "<group>"; };
+               0FE34C171C4B39AE0003A512 /* AirLogRegisterPressure.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLogRegisterPressure.cpp; path = b3/air/AirLogRegisterPressure.cpp; sourceTree = "<group>"; };
+               0FE34C181C4B39AE0003A512 /* AirLogRegisterPressure.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLogRegisterPressure.h; path = b3/air/AirLogRegisterPressure.h; sourceTree = "<group>"; };
                0FE7211B193B9C590031F6ED /* DFGTransition.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGTransition.cpp; path = dfg/DFGTransition.cpp; sourceTree = "<group>"; };
                0FE7211C193B9C590031F6ED /* DFGTransition.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGTransition.h; path = dfg/DFGTransition.h; sourceTree = "<group>"; };
                0FE834151A6EF97B00D04847 /* PolymorphicCallStubRoutine.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = PolymorphicCallStubRoutine.cpp; sourceTree = "<group>"; };
                                0F4570371BE44C910062A629 /* AirEliminateDeadCode.h */,
                                0F6183231C45BF070072450B /* AirEmitShuffle.cpp */,
                                0F6183241C45BF070072450B /* AirEmitShuffle.h */,
+                               0F4DE1CC1C4C1B54004D6C11 /* AirFixObviousSpills.cpp */,
+                               0F4DE1CD1C4C1B54004D6C11 /* AirFixObviousSpills.h */,
                                262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */,
                                262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */,
-                               0F4C91671C2B3D68004341A6 /* AirFixSpillSlotZDef.h */,
                                0FEC85521BDACDC70080FF74 /* AirFrequentedBlock.h */,
                                0FEC85531BDACDC70080FF74 /* AirGenerate.cpp */,
                                0FEC85541BDACDC70080FF74 /* AirGenerate.h */,
                                26718BA21BE99F780052017B /* AirIteratedRegisterCoalescing.cpp */,
                                26718BA31BE99F780052017B /* AirIteratedRegisterCoalescing.h */,
                                2684D4371C00161C0081D663 /* AirLiveness.h */,
+                               0FE34C171C4B39AE0003A512 /* AirLogRegisterPressure.cpp */,
+                               0FE34C181C4B39AE0003A512 /* AirLogRegisterPressure.h */,
                                0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */,
                                0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */,
                                0F6183271C45BF070072450B /* AirLowerMacros.cpp */,
                                DC00039319D8BE6F00023EB0 /* DFGPreciseLocalClobberize.h in Headers */,
                                0FBE0F7516C1DB0B0082C5E8 /* DFGPredictionInjectionPhase.h in Headers */,
                                0FFFC95E14EF90B700C72532 /* DFGPredictionPropagationPhase.h in Headers */,
-                               0F4C91681C2B3D68004341A6 /* AirFixSpillSlotZDef.h in Headers */,
                                0F3E01AB19D353A500F61B7F /* DFGPrePostNumbering.h in Headers */,
                                0F2B9CED19D0BA7D00B1D1B5 /* DFGPromotedHeapLocation.h in Headers */,
                                0FFC92161B94FB3E0071DD66 /* DFGPropertyTypeKey.h in Headers */,
                                70B791991C024A29002481E2 /* GeneratorPrototype.h in Headers */,
                                0FC097A2146B28CC00CF2442 /* DFGThunks.h in Headers */,
                                0FD8A32817D51F5700CA2C40 /* DFGTierUpCheckInjectionPhase.h in Headers */,
+                               0FE34C1A1C4B39AE0003A512 /* AirLogRegisterPressure.h in Headers */,
                                0FD8A32A17D51F5700CA2C40 /* DFGToFTLDeferredCompilationCallback.h in Headers */,
                                0FD8A32C17D51F5700CA2C40 /* DFGToFTLForOSREntryDeferredCompilationCallback.h in Headers */,
                                FE3A06B41C10CB9300390FDD /* JITBitXorGenerator.h in Headers */,
                                0F3AC754188E5EC80032029F /* ExitingJITType.h in Headers */,
                                0FB105861675481200F8AB6E /* ExitKind.h in Headers */,
                                0F0B83AB14BCF5BB00885B4F /* ExpressionRangeInfo.h in Headers */,
+                               0F4DE1CF1C4C1B54004D6C11 /* AirFixObviousSpills.h in Headers */,
                                A7A8AF3817ADB5F3005AB174 /* Float32Array.h in Headers */,
                                A7A8AF3917ADB5F3005AB174 /* Float64Array.h in Headers */,
                                0F24E54317EA9F5900ABB217 /* FPRInfo.h in Headers */,
                                0FEC856D1BDACDC70080FF74 /* AirAllocateStack.cpp in Sources */,
                                43422A661C16267500E2EB98 /* B3ReduceDoubleToFloat.cpp in Sources */,
                                0FEC856F1BDACDC70080FF74 /* AirArg.cpp in Sources */,
+                               0F4DE1CE1C4C1B54004D6C11 /* AirFixObviousSpills.cpp in Sources */,
                                0FEC85711BDACDC70080FF74 /* AirBasicBlock.cpp in Sources */,
                                0FEC85731BDACDC70080FF74 /* AirCCallSpecial.cpp in Sources */,
                                0FEC85751BDACDC70080FF74 /* AirCode.cpp in Sources */,
                                A593CF821840377100BFCE27 /* InspectorValues.cpp in Sources */,
                                147F39CF107EC37600427A48 /* InternalFunction.cpp in Sources */,
                                1429D7D40ED2128200B89619 /* Interpreter.cpp in Sources */,
+                               0FE34C191C4B39AE0003A512 /* AirLogRegisterPressure.cpp in Sources */,
                                A1B9E2391B4E0D6700BC7FED /* IntlCollator.cpp in Sources */,
                                A1B9E23B1B4E0D6700BC7FED /* IntlCollatorConstructor.cpp in Sources */,
                                A1B9E23D1B4E0D6700BC7FED /* IntlCollatorPrototype.cpp in Sources */,
index 94dd346..c4aa1a8 100644 (file)
@@ -105,19 +105,14 @@ Inst CheckSpecial::hiddenBranch(const Inst& inst) const
     return hiddenBranch;
 }
 
-void CheckSpecial::commitHiddenBranch(Inst& original, Inst& hiddenBranch)
-{
-    ASSERT(hiddenBranch.args.size() == m_numCheckArgs);
-    ASSERT(hiddenBranch.opcode = m_checkOpcode);
-    for (unsigned i = 0; i < m_numCheckArgs; ++i)
-        original.args[i + 1] = hiddenBranch.args[i];
-}
-
 void CheckSpecial::forEachArg(Inst& inst, const ScopedLambda<Inst::EachArgCallback>& callback)
 {
     Inst hidden = hiddenBranch(inst);
-    hidden.forEachArg(callback);
-    commitHiddenBranch(inst, hidden);
+    hidden.forEachArg(
+        [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+            unsigned index = &arg - &hidden.args[0];
+            callback(inst.args[1 + index], role, type, width);
+        });
     forEachArgImpl(numB3Args(inst), m_numCheckArgs + 1, inst, m_stackmapRole, callback);
 }
 
index c289895..61186be 100644 (file)
@@ -123,9 +123,6 @@ protected:
     // Constructs and returns the Inst representing the branch that this will use.
     Air::Inst hiddenBranch(const Air::Inst&) const;
 
-    // If we edited the hidden branch, this installs the edits into the given inst.
-    void commitHiddenBranch(Air::Inst& original, Air::Inst& hiddenBranch);
-
     void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) override;
     bool isValid(Air::Inst&) override;
     bool admitsStack(Air::Inst&, unsigned argIndex) override;
index 0f1f6da..623d34d 100644 (file)
@@ -871,6 +871,7 @@ private:
                 arg = tmp(value.value());
                 break;
             case ValueRep::Register:
+                stackmap->earlyClobbered().clear(value.rep().reg());
                 arg = Tmp(value.rep().reg());
                 append(Move, immOrTmp(value.value()), arg);
                 break;
@@ -1973,6 +1974,9 @@ private:
             }
             
             fillStackmap(inst, patchpointValue, 0);
+            
+            if (patchpointValue->resultConstraint.isReg())
+                patchpointValue->lateClobbered().clear(patchpointValue->resultConstraint.reg());
 
             for (unsigned i = patchpointValue->numGPScratchRegisters; i--;)
                 inst.args.append(m_code.newTmp(Arg::GP));
index 2ccdd1d..b8a3bc2 100644 (file)
@@ -84,6 +84,175 @@ namespace {
 
 bool verbose = false;
 
+class IntRange {
+public:
+    IntRange()
+    {
+    }
+
+    IntRange(int64_t min, int64_t max)
+        : m_min(min)
+        , m_max(max)
+    {
+    }
+
+    template<typename T>
+    static IntRange top()
+    {
+        return IntRange(std::numeric_limits<T>::min(), std::numeric_limits<T>::max());
+    }
+
+    static IntRange top(Type type)
+    {
+        switch (type) {
+        case Int32:
+            return top<int32_t>();
+        case Int64:
+            return top<int64_t>();
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+            return IntRange();
+        }
+    }
+
+    template<typename T>
+    static IntRange rangeForMask(T mask)
+    {
+        if (!(mask + 1))
+            return top<T>();
+        return IntRange(0, mask);
+    }
+
+    static IntRange rangeForMask(int64_t mask, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return rangeForMask<int32_t>(static_cast<int32_t>(mask));
+        case Int64:
+            return rangeForMask<int64_t>(mask);
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+            return IntRange();
+        }
+    }
+
+    template<typename T>
+    static IntRange rangeForZShr(int32_t shiftAmount)
+    {
+        typename std::make_unsigned<T>::type mask = 0;
+        mask--;
+        mask >>= shiftAmount;
+        return rangeForMask<T>(static_cast<T>(mask));
+    }
+
+    static IntRange rangeForZShr(int32_t shiftAmount, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return rangeForZShr<int32_t>(shiftAmount);
+        case Int64:
+            return rangeForZShr<int64_t>(shiftAmount);
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+            return IntRange();
+        }
+    }
+
+    template<typename T>
+    static IntRange rangeForSShr(int32_t shiftAmount)
+    {
+        return IntRange(top<T>().min() >> shiftAmount, top<T>().max() >> shiftAmount);
+    }
+
+    static IntRange rangeForSShr(int32_t shiftAmount, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return rangeForSShr<int32_t>(shiftAmount);
+        case Int64:
+            return rangeForSShr<int64_t>(shiftAmount);
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+            return IntRange();
+        }
+    }
+
+    int64_t min() const { return m_min; }
+    int64_t max() const { return m_max; }
+
+    void dump(PrintStream& out) const
+    {
+        out.print("[", m_min, ",", m_max, "]");
+    }
+
+    template<typename T>
+    bool couldOverflowAdd(const IntRange& other)
+    {
+        return sumOverflows<T>(m_min, other.m_min)
+            || sumOverflows<T>(m_min, other.m_max)
+            || sumOverflows<T>(m_max, other.m_min)
+            || sumOverflows<T>(m_max, other.m_max);
+    }
+
+    bool couldOverflowAdd(const IntRange& other, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return couldOverflowAdd<int32_t>(other);
+        case Int64:
+            return couldOverflowAdd<int64_t>(other);
+        default:
+            return true;
+        }
+    }
+
+    template<typename T>
+    bool couldOverflowSub(const IntRange& other)
+    {
+        return differenceOverflows<T>(m_min, other.m_min)
+            || differenceOverflows<T>(m_min, other.m_max)
+            || differenceOverflows<T>(m_max, other.m_min)
+            || differenceOverflows<T>(m_max, other.m_max);
+    }
+
+    bool couldOverflowSub(const IntRange& other, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return couldOverflowSub<int32_t>(other);
+        case Int64:
+            return couldOverflowSub<int64_t>(other);
+        default:
+            return true;
+        }
+    }
+
+    template<typename T>
+    bool couldOverflowMul(const IntRange& other)
+    {
+        return productOverflows<T>(m_min, other.m_min)
+            || productOverflows<T>(m_min, other.m_max)
+            || productOverflows<T>(m_max, other.m_min)
+            || productOverflows<T>(m_max, other.m_max);
+    }
+
+    bool couldOverflowMul(const IntRange& other, Type type)
+    {
+        switch (type) {
+        case Int32:
+            return couldOverflowMul<int32_t>(other);
+        case Int64:
+            return couldOverflowMul<int64_t>(other);
+        default:
+            return true;
+        }
+    }
+
+private:
+    int64_t m_min { 0 };
+    int64_t m_max { 0 };
+};
+
 class ReduceStrength {
 public:
     ReduceStrength(Procedure& proc)
@@ -1091,7 +1260,7 @@ private:
                     m_value->child(1)->equalOrUnorderedConstant(m_value->child(0))));
             break;
 
-        case CheckAdd:
+        case CheckAdd: {
             if (replaceWithNewValue(m_value->child(0)->checkAddConstant(m_proc, m_value->child(1))))
                 break;
 
@@ -1102,9 +1271,18 @@ private:
                 m_changed = true;
                 break;
             }
+
+            IntRange leftRange = rangeFor(m_value->child(0));
+            IntRange rightRange = rangeFor(m_value->child(1));
+            if (!leftRange.couldOverflowAdd(rightRange, m_value->type())) {
+                replaceWithNewValue(
+                    m_proc.add<Value>(Add, m_value->origin(), m_value->child(0), m_value->child(1)));
+                break;
+            }
             break;
+        }
 
-        case CheckSub:
+        case CheckSub: {
             if (replaceWithNewValue(m_value->child(0)->checkSubConstant(m_proc, m_value->child(1))))
                 break;
 
@@ -1121,9 +1299,18 @@ private:
                 m_changed = true;
                 break;
             }
+
+            IntRange leftRange = rangeFor(m_value->child(0));
+            IntRange rightRange = rangeFor(m_value->child(1));
+            if (!leftRange.couldOverflowSub(rightRange, m_value->type())) {
+                replaceWithNewValue(
+                    m_proc.add<Value>(Sub, m_value->origin(), m_value->child(0), m_value->child(1)));
+                break;
+            }
             break;
+        }
 
-        case CheckMul:
+        case CheckMul: {
             if (replaceWithNewValue(m_value->child(0)->checkMulConstant(m_proc, m_value->child(1))))
                 break;
 
@@ -1151,7 +1338,16 @@ private:
                 if (modified)
                     break;
             }
+
+            IntRange leftRange = rangeFor(m_value->child(0));
+            IntRange rightRange = rangeFor(m_value->child(1));
+            if (!leftRange.couldOverflowMul(rightRange, m_value->type())) {
+                replaceWithNewValue(
+                    m_proc.add<Value>(Mul, m_value->origin(), m_value->child(0), m_value->child(1)));
+                break;
+            }
             break;
+        }
 
         case Check: {
             CheckValue* checkValue = m_value->as<CheckValue>();
@@ -1294,6 +1490,37 @@ private:
         }
     }
 
+    IntRange rangeFor(Value* value)
+    {
+        switch (value->opcode()) {
+        case Const32:
+        case Const64: {
+            int64_t intValue = value->asInt();
+            return IntRange(intValue, intValue);
+        }
+
+        case BitAnd:
+            if (value->child(1)->hasInt())
+                return IntRange::rangeForMask(value->child(1)->asInt(), value->type());
+            break;
+
+        case SShr:
+            if (value->child(1)->hasInt32())
+                return IntRange::rangeForSShr(value->child(1)->asInt32(), value->type());
+            break;
+
+        case ZShr:
+            if (value->child(1)->hasInt32())
+                return IntRange::rangeForZShr(value->child(1)->asInt32(), value->type());
+            break;
+
+        default:
+            break;
+        }
+
+        return IntRange::top(value->type());
+    }
+
     template<typename ValueType, typename... Arguments>
     void replaceWithNew(Arguments... arguments)
     {
index 350c329..c8c84b2 100644 (file)
@@ -187,6 +187,8 @@ public:
         clobberLate(set);
     }
 
+    RegisterSet& earlyClobbered() { return m_earlyClobbered; }
+    RegisterSet& lateClobbered() { return m_lateClobbered; }
     const RegisterSet& earlyClobbered() const { return m_earlyClobbered; }
     const RegisterSet& lateClobbered() const { return m_lateClobbered; }
 
index d90f55d..6713a29 100644 (file)
@@ -29,6 +29,7 @@
 #if ENABLE(B3_JIT)
 
 #include "AirCode.h"
+#include "AirInsertionSet.h"
 #include "AirInstInlines.h"
 #include "AirLiveness.h"
 #include "AirPhaseScope.h"
@@ -166,10 +167,49 @@ void allocateStack(Code& code)
         for (unsigned instIndex = block->size(); instIndex--;) {
             if (verbose)
                 dataLog("Analyzing: ", block->at(instIndex), "\n");
+
+            // Kill dead stores. For simplicity, we say that a store is killable if it has only
+            // late defs and those late defs are to stack slots that are dead right now. We only
+            // handle that case because it's the only kind of dead stack store we will see here.
+            Inst& inst = block->at(instIndex);
+            if (!inst.hasNonArgEffects()) {
+                bool ok = true;
+                inst.forEachArg(
+                    [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
+                        if (Arg::isEarlyDef(role)) {
+                            ok = false;
+                            return;
+                        }
+                        if (!Arg::isLateDef(role))
+                            return;
+                        if (!arg.isStack()) {
+                            ok = false;
+                            return;
+                        }
+                        StackSlot* slot = arg.stackSlot();
+                        if (slot->kind() != StackSlotKind::Anonymous) {
+                            ok = false;
+                            return;
+                        }
+
+                        if (localCalc.isLive(slot)) {
+                            ok = false;
+                            return;
+                        }
+                    });
+                if (ok)
+                    inst = Inst();
+            }
+            
             interfere(instIndex);
             localCalc.execute(instIndex);
         }
         interfere(-1);
+        
+        block->insts().removeAllMatching(
+            [&] (const Inst& inst) -> bool {
+                return !inst;
+            });
     }
 
     if (verbose) {
@@ -232,33 +272,43 @@ void allocateStack(Code& code)
     // We would have to scavenge for temporaries if this happened. Fortunately, this case will be
     // extremely rare so we can do crazy things when it arises.
     // https://bugs.webkit.org/show_bug.cgi?id=152530
-    
+
+    InsertionSet insertionSet(code);
     for (BasicBlock* block : code) {
-        for (Inst& inst : *block) {
+        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
+            Inst& inst = block->at(instIndex);
             inst.forEachArg(
-                [&] (Arg& arg, Arg::Role, Arg::Type, Arg::Width width)
-                {
+                [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width width) {
+                    auto stackAddr = [&] (int32_t offset) -> Arg {
+                        return Arg::stackAddr(offset, code.frameSize(), width);
+                    };
+                    
                     switch (arg.kind()) {
                     case Arg::Stack: {
-                        arg = Arg::addr(
-                            Tmp(GPRInfo::callFrameRegister),
-                            arg.offset() + arg.stackSlot()->offsetFromFP());
-                        if (!arg.isValidForm(width)) {
-                            arg = Arg::addr(
-                                Tmp(MacroAssembler::stackPointerRegister),
-                                arg.offset() + code.frameSize());
+                        StackSlot* slot = arg.stackSlot();
+                        if (Arg::isZDef(role)
+                            && slot->kind() == StackSlotKind::Anonymous
+                            && slot->byteSize() > Arg::bytes(width)) {
+                            // Currently we only handle this simple case because it's the only one
+                            // that arises: ZDefs are only 32-bit right now. So, if these assertions
+                            // fire, it means that we need to implement the other kinds of zero
+                            // fills.
+                            RELEASE_ASSERT(slot->byteSize() == 8);
+                            RELEASE_ASSERT(width == Arg::Width32);
+
+                            // We rely on the fact that there must be some way to move zero to a
+                            // memory location without first burning a register. On ARM, we would do
+                            // this using zr.
+                            RELEASE_ASSERT(isValidForm(Move32, Arg::Imm, Arg::Addr));
+                            insertionSet.insert(
+                                instIndex + 1, Move32, inst.origin, Arg::imm(0),
+                                stackAddr(arg.offset() + 4 + slot->offsetFromFP()));
                         }
+                        arg = stackAddr(arg.offset() + slot->offsetFromFP());
                         break;
                     }
                     case Arg::CallArg:
-                        arg = Arg::addr(
-                            Tmp(GPRInfo::callFrameRegister),
-                            arg.offset() - code.frameSize());
-                        if (!arg.isValidForm(width)) {
-                            arg = Arg::addr(
-                                Tmp(MacroAssembler::stackPointerRegister),
-                                arg.offset() + code.frameSize());
-                        }
+                        arg = stackAddr(arg.offset() - code.frameSize());
                         break;
                     default:
                         break;
@@ -266,6 +316,7 @@ void allocateStack(Code& code)
                 }
             );
         }
+        insertionSet.execute(block);
     }
 }
 
index f73785b..74d347e 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -33,7 +33,9 @@ namespace JSC { namespace B3 { namespace Air {
 class Code;
 
 // This allocates StackSlots to places on the stack. It first allocates the pinned ones in index
-// order and then it allocates the rest using first fit.
+// order and then it allocates the rest using first fit. Takes the opportunity to kill dead
+// assignments to stack slots, since it knows which ones are live. Also fixes ZDefs to anonymous
+// stack slots.
 
 void allocateStack(Code&);
 
index 22d802a..4216161 100644 (file)
@@ -126,6 +126,7 @@ public:
         // depending on the kind of the argument:
         //
         // For register: the upper bits are zero-filled.
+        // For anonymous stack slot: the upper bits are zero-filled.
         // For address: the upper bits are not touched (i.e. we do a 32-bit store in our example).
         // For tmp: either the upper bits are not touched or they are zero-filled, and we won't know
         // which until we lower the tmp to either a StackSlot or a Reg.
@@ -513,6 +514,17 @@ public:
         return result;
     }
 
+    static Arg stackAddr(int32_t offsetFromFP, unsigned frameSize, Width width)
+    {
+        Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
+        if (!result.isValidForm(width)) {
+            result = Arg::addr(
+                Air::Tmp(MacroAssembler::stackPointerRegister),
+                offsetFromFP + frameSize);
+        }
+        return result;
+    }
+
     // If you don't pass a Width, this optimistically assumes that you're using the right width.
     static bool isValidScale(unsigned scale, Optional<Width> width = Nullopt)
     {
index 97790c8..fa3ad8e 100644 (file)
@@ -57,11 +57,21 @@ void BasicBlock::dump(PrintStream& out) const
 
 void BasicBlock::deepDump(PrintStream& out) const
 {
+    dumpHeader(out);
+    for (const Inst& inst : *this)
+        out.print("    ", inst, "\n");
+    dumpFooter(out);
+}
+
+void BasicBlock::dumpHeader(PrintStream& out) const
+{
     out.print("BB", *this, ": ; frequency = ", m_frequency, "\n");
     if (predecessors().size())
         out.print("  Predecessors: ", pointerListDump(predecessors()), "\n");
-    for (const Inst& inst : *this)
-        out.print("    ", inst, "\n");
+}
+
+void BasicBlock::dumpFooter(PrintStream& out) const
+{
     if (successors().size())
         out.print("  Successors: ", listDump(successors()), "\n");
 }
index e7fed3b..0f56e0b 100644 (file)
@@ -36,6 +36,7 @@
 
 namespace JSC { namespace B3 { namespace Air {
 
+class BlockInsertionSet;
 class Code;
 class InsertionSet;
 
@@ -126,7 +127,11 @@ public:
     void dump(PrintStream&) const;
     void deepDump(PrintStream&) const;
 
+    void dumpHeader(PrintStream&) const;
+    void dumpFooter(PrintStream&) const;
+
 private:
+    friend class BlockInsertionSet;
     friend class Code;
     friend class InsertionSet;
     
index e06db12..c7c2273 100644 (file)
@@ -36,6 +36,9 @@ CCallSpecial::CCallSpecial()
     m_clobberedRegs.exclude(RegisterSet::stackRegisters());
     m_clobberedRegs.exclude(RegisterSet::reservedHardwareRegisters());
     m_clobberedRegs.exclude(RegisterSet::calleeSaveRegisters());
+    m_clobberedRegs.clear(GPRInfo::returnValueGPR);
+    m_clobberedRegs.clear(GPRInfo::returnValueGPR2);
+    m_clobberedRegs.clear(FPRInfo::returnValueFPR);
 }
 
 CCallSpecial::~CCallSpecial()
index 694a9a6..ebb15e9 100644 (file)
@@ -46,6 +46,7 @@ class Procedure;
 
 namespace Air {
 
+class BlockInsertionSet;
 class CCallSpecial;
 
 // This is an IR that is very close to the bare metal. It requires about 40x more bytes than the
@@ -324,6 +325,7 @@ public:
 
 private:
     friend class ::JSC::B3::Procedure;
+    friend class BlockInsertionSet;
     
     Code(Procedure&);
 
index 220e495..f6eb5f3 100644 (file)
 
 #if ENABLE(B3_JIT)
 
+#include "AirInstInlines.h"
 #include "B3CCallValue.h"
 #include "B3ValueInlines.h"
 
 namespace JSC { namespace B3 { namespace Air {
 
+bool PatchCustom::isValidForm(Inst& inst)
+{
+    if (inst.args.size() < 1)
+        return false;
+    if (!inst.args[0].isSpecial())
+        return false;
+    if (!inst.args[0].special()->isValid(inst))
+        return false;
+    RegisterSet clobberedEarly = inst.extraEarlyClobberedRegs();
+    RegisterSet clobberedLate = inst.extraClobberedRegs();
+    bool ok = true;
+    inst.forEachTmp(
+        [&] (Tmp& tmp, Arg::Role role, Arg::Type, Arg::Width) {
+            if (!tmp.isReg())
+                return;
+            if (Arg::isLateDef(role) || Arg::isLateUse(role))
+                ok &= !clobberedLate.get(tmp.reg());
+            else
+                ok &= !clobberedEarly.get(tmp.reg());
+        });
+    return ok;
+}
+
 bool CCallCustom::isValidForm(Inst& inst)
 {
     CCallValue* value = inst.origin->as<CCallValue>();
index 25824e2..d7e9c9c 100644 (file)
@@ -71,14 +71,7 @@ struct PatchCustom {
         return false;
     }
 
-    static bool isValidForm(Inst& inst)
-    {
-        if (inst.args.size() < 1)
-            return false;
-        if (!inst.args[0].isSpecial())
-            return false;
-        return inst.args[0].special()->isValid(inst);
-    }
+    static bool isValidForm(Inst& inst);
 
     static bool admitsStack(Inst& inst, unsigned argIndex)
     {
index 84b9b75..32567b6 100644 (file)
@@ -70,6 +70,14 @@ void ShufflePair::dump(PrintStream& out) const
     out.print(width(), ":", src(), "=>", dst());
 }
 
+Inst createShuffle(Value* origin, const Vector<ShufflePair>& pairs)
+{
+    Inst result(Shuffle, origin);
+    for (const ShufflePair& pair : pairs)
+        result.append(pair.src(), pair.dst(), Arg::widthArg(pair.width()));
+    return result;
+}
+
 Vector<Inst> emitShuffle(
     Vector<ShufflePair> pairs, std::array<Arg, 2> scratches, Arg::Type type, Value* origin)
 {
index 1273702..f4ac0d2 100644 (file)
@@ -66,6 +66,9 @@ private:
     Arg::Width m_width { Arg::Width8 };
 };
 
+// Creates a Shuffle instruction that performs the given pairs of moves.
+Inst createShuffle(Value* origin, const Vector<ShufflePair>&);
+
 // Perform a shuffle of a given type. The scratch argument is mandatory. You should pass it as
 // follows: If you know that you have scratch registers or temporaries available - that is, they're
 // registers that are not mentioned in the shuffle, have the same type as the shuffle, and are not
diff --git a/Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp b/Source/JavaScriptCore/b3/air/AirFixObviousSpills.cpp
new file mode 100644 (file)
index 0000000..423fb56
--- /dev/null
@@ -0,0 +1,515 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "AirFixObviousSpills.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCode.h"
+#include "AirInstInlines.h"
+#include "AirPhaseScope.h"
+#include "B3IndexMap.h"
+#include <wtf/ListDump.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool verbose = false;
+
+class FixObviousSpills {
+public:
+    FixObviousSpills(Code& code)
+        : m_code(code)
+        , m_atHead(code.size())
+    {
+    }
+
+    void run()
+    {
+        if (verbose)
+            dataLog("Code before fixObviousSpills:\n", m_code);
+        
+        computeAliases();
+        fixCode();
+    }
+
+private:
+    void computeAliases()
+    {
+        m_atHead[m_code[0]].wasVisited = true;
+        
+        bool changed = true;
+        while (changed) {
+            changed = false;
+            
+            for (BasicBlock* block : m_code) {
+                m_block = block;
+                m_state = m_atHead[block];
+                if (!m_state.wasVisited)
+                    continue;
+
+                if (verbose)
+                    dataLog("Executing block ", *m_block, ": ", m_state, "\n");
+                
+                for (m_instIndex = 0; m_instIndex < block->size(); ++m_instIndex)
+                    executeInst();
+
+                for (BasicBlock* successor : block->successorBlocks()) {
+                    State& toState = m_atHead[successor];
+                    if (toState.wasVisited)
+                        changed |= toState.merge(m_state);
+                    else {
+                        toState = m_state;
+                        changed = true;
+                    }
+                }
+            }
+        }
+    }
+
+    void fixCode()
+    {
+        for (BasicBlock* block : m_code) {
+            m_block = block;
+            m_state = m_atHead[block];
+            RELEASE_ASSERT(m_state.wasVisited);
+
+            for (m_instIndex = 0; m_instIndex < block->size(); ++m_instIndex) {
+                fixInst();
+                executeInst();
+            }
+        }
+    }
+
+    void executeInst()
+    {
+        Inst& inst = m_block->at(m_instIndex);
+
+        if (verbose)
+            dataLog("    Executing ", inst, ": ", m_state, "\n");
+
+        Inst::forEachDefWithExtraClobberedRegs<Arg>(
+            &inst, &inst,
+            [&] (const Arg& arg, Arg::Role, Arg::Type, Arg::Width) {
+                if (verbose)
+                    dataLog("        Clobbering ", arg, "\n");
+                m_state.clobber(arg);
+            });
+
+        // FIXME: This needs a story for floats and doubles.
+        // https://bugs.webkit.org/show_bug.cgi?id=153197
+
+        switch (inst.opcode) {
+        case Move:
+            if (inst.args[0].isSomeImm()) {
+                if (inst.args[1].isReg())
+                    m_state.addAlias(RegConst(inst.args[1].reg(), inst.args[0].value()));
+                else if (isSpillSlot(inst.args[1]))
+                    m_state.addAlias(SlotConst(inst.args[1].stackSlot(), inst.args[0].value()));
+            } else if (isSpillSlot(inst.args[0]) && inst.args[1].isReg()) {
+                if (Optional<int64_t> constant = m_state.constantFor(inst.args[0]))
+                    m_state.addAlias(RegConst(inst.args[1].reg(), *constant));
+                m_state.addAlias(
+                    RegSlot(inst.args[1].reg(), inst.args[0].stackSlot(), RegSlot::AllBits));
+            } else if (inst.args[0].isReg() && isSpillSlot(inst.args[1])) {
+                if (Optional<int64_t> constant = m_state.constantFor(inst.args[0]))
+                    m_state.addAlias(SlotConst(inst.args[1].stackSlot(), *constant));
+                m_state.addAlias(
+                    RegSlot(inst.args[0].reg(), inst.args[1].stackSlot(), RegSlot::AllBits));
+            }
+            break;
+
+        case Move32:
+            if (inst.args[0].isSomeImm()) {
+                if (inst.args[1].isReg())
+                    m_state.addAlias(RegConst(inst.args[1].reg(), static_cast<uint32_t>(inst.args[0].value())));
+                else if (isSpillSlot(inst.args[1]))
+                    m_state.addAlias(SlotConst(inst.args[1].stackSlot(), static_cast<uint32_t>(inst.args[0].value())));
+            } else if (isSpillSlot(inst.args[0]) && inst.args[1].isReg()) {
+                if (Optional<int64_t> constant = m_state.constantFor(inst.args[0]))
+                    m_state.addAlias(RegConst(inst.args[1].reg(), static_cast<uint32_t>(*constant)));
+                m_state.addAlias(
+                    RegSlot(inst.args[1].reg(), inst.args[0].stackSlot(), RegSlot::ZExt32));
+            } else if (inst.args[0].isReg() && isSpillSlot(inst.args[1])) {
+                if (Optional<int64_t> constant = m_state.constantFor(inst.args[0]))
+                    m_state.addAlias(SlotConst(inst.args[1].stackSlot(), static_cast<int32_t>(*constant)));
+                m_state.addAlias(
+                    RegSlot(inst.args[0].reg(), inst.args[1].stackSlot(), RegSlot::Match32));
+            }
+            break;
+
+        default:
+            break;
+        }
+    }
+
+    void fixInst()
+    {
+        Inst& inst = m_block->at(m_instIndex);
+
+        if (verbose)
+            dataLog("Fixing inst ", inst, ": ", m_state, "\n");
+
+        // Create a copy in case we invalidate the instruction. That doesn't happen often.
+        Inst instCopy = inst;
+
+        // The goal is to replace stack-slot references with the register or constant they are
+        // known to alias. We only handle early uses; we can't handle UseDefs. We could teach this
+        // to handle UseDefs if we inserted a store instruction afterward and proved that the
+        // register aliased to the stack slot dies here; the liveness analysis can tell us that.
+        // We also can't handle late uses, because we don't look at late clobbers when doing this.
+        bool didThings = false;
+        auto handleArg = [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width width) {
+            if (!isSpillSlot(arg))
+                return;
+            if (!Arg::isEarlyUse(role))
+                return;
+            if (Arg::isAnyDef(role))
+                return;
+            
+            // Try to get a register if at all possible.
+            if (const RegSlot* alias = m_state.getRegSlot(arg.stackSlot())) {
+                switch (width) {
+                case Arg::Width64:
+                    if (alias->mode != RegSlot::AllBits)
+                        return;
+                    if (verbose)
+                        dataLog("    Replacing ", arg, " with ", alias->reg, "\n");
+                    arg = Tmp(alias->reg);
+                    didThings = true;
+                    return;
+                case Arg::Width32:
+                    if (verbose)
+                        dataLog("    Replacing ", arg, " with ", alias->reg, " (subwidth case)\n");
+                    arg = Tmp(alias->reg);
+                    didThings = true;
+                    return;
+                default:
+                    return;
+                }
+            }
+
+            // Fall back to an immediate if we couldn't find a register.
+            if (const SlotConst* alias = m_state.getSlotConst(arg.stackSlot())) {
+                if (verbose)
+                    dataLog("    Replacing ", arg, " with constant ", alias->constant, "\n");
+                if (Arg::isValidImmForm(alias->constant))
+                    arg = Arg::imm(alias->constant);
+                else
+                    arg = Arg::imm64(alias->constant);
+                didThings = true;
+                return;
+            }
+        };
+        
+        inst.forEachArg(handleArg);
+        if (!didThings || inst.isValidForm())
+            return;
+        
+        // We introduced something invalid along the way. Back up and carefully handle each argument.
+        inst = instCopy;
+        ASSERT(inst.isValidForm());
+        inst.forEachArg(
+            [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+                Arg argCopy = arg;
+                handleArg(arg, role, type, width);
+                if (!inst.isValidForm())
+                    arg = argCopy;
+            });
+    }
+    
+    static bool isSpillSlot(const Arg& arg)
+    {
+        return arg.isStack() && !arg.stackSlot()->isLocked();
+    }
+    
+    struct RegConst {
+        RegConst()
+        {
+        }
+        
+        RegConst(Reg reg, int64_t constant)
+            : reg(reg)
+            , constant(constant)
+        {
+        }
+
+        explicit operator bool() const
+        {
+            return !!reg;
+        }
+
+        void dump(PrintStream& out) const
+        {
+            out.print(reg, "->", constant);
+        }
+        
+        Reg reg;
+        int64_t constant { 0 };
+    };
+
+    struct RegSlot {
+        enum Mode : int8_t {
+            AllBits,
+            ZExt32, // Register contains zero-extended contents of stack slot.
+            Match32 // Low 32 bits of register match low 32 bits of stack slot.
+        };
+        
+        RegSlot()
+        {
+        }
+
+        RegSlot(Reg reg, StackSlot* slot, Mode mode)
+            : slot(slot)
+            , reg(reg)
+            , mode(mode)
+        {
+        }
+
+        explicit operator bool() const
+        {
+            return slot && reg;
+        }
+
+        void dump(PrintStream& out) const
+        {
+            out.print(pointerDump(slot), "->", reg);
+            switch (mode) {
+            case AllBits:
+                out.print("(AllBits)");
+                break;
+            case ZExt32:
+                out.print("(ZExt32)");
+                break;
+            case Match32:
+                out.print("(Match32)");
+                break;
+            }
+        }
+        
+        StackSlot* slot { nullptr };
+        Reg reg;
+        Mode mode { AllBits };
+    };
+
+    struct SlotConst {
+        SlotConst()
+        {
+        }
+
+        SlotConst(StackSlot* slot, int64_t constant)
+            : slot(slot)
+            , constant(constant)
+        {
+        }
+
+        explicit operator bool() const
+        {
+            return slot;
+        }
+
+        void dump(PrintStream& out) const
+        {
+            out.print(pointerDump(slot), "->", constant);
+        }
+        
+        StackSlot* slot { nullptr };
+        int64_t constant { 0 };
+    };
+
+    struct State {
+        void addAlias(const RegConst& newAlias)
+        {
+            regConst.append(newAlias);
+        }
+        void addAlias(const RegSlot& newAlias)
+        {
+            regSlot.append(newAlias);
+        }
+        void addAlias(const SlotConst& newAlias)
+        {
+            slotConst.append(newAlias);
+        }
+
+        const RegConst* getRegConst(Reg reg) const
+        {
+            for (const RegConst& alias : regConst) {
+                if (alias.reg == reg)
+                    return &alias;
+            }
+            return nullptr;
+        }
+
+        const RegSlot* getRegSlot(Reg reg) const
+        {
+            for (const RegSlot& alias : regSlot) {
+                if (alias.reg == reg)
+                    return &alias;
+            }
+            return nullptr;
+        }
+
+        const RegSlot* getRegSlot(StackSlot* slot) const
+        {
+            for (const RegSlot& alias : regSlot) {
+                if (alias.slot == slot)
+                    return &alias;
+            }
+            return nullptr;
+        }
+
+        const RegSlot* getRegSlot(Reg reg, StackSlot* slot) const
+        {
+            for (const RegSlot& alias : regSlot) {
+                if (alias.reg == reg && alias.slot == slot)
+                    return &alias;
+            }
+            return nullptr;
+        }
+
+        const SlotConst* getSlotConst(StackSlot* slot) const
+        {
+            for (const SlotConst& alias : slotConst) {
+                if (alias.slot == slot)
+                    return &alias;
+            }
+            return nullptr;
+        }
+
+        Optional<int64_t> constantFor(const Arg& arg)
+        {
+            if (arg.isReg()) {
+                if (const RegConst* alias = getRegConst(arg.reg()))
+                    return alias->constant;
+                return Nullopt;
+            }
+            if (arg.isStack()) {
+                if (const SlotConst* alias = getSlotConst(arg.stackSlot()))
+                    return alias->constant;
+                return Nullopt;
+            }
+            return Nullopt;
+        }
+
+        void clobber(const Arg& arg)
+        {
+            if (arg.isReg()) {
+                regConst.removeAllMatching(
+                    [&] (const RegConst& alias) -> bool {
+                        return alias.reg == arg.reg();
+                    });
+                regSlot.removeAllMatching(
+                    [&] (const RegSlot& alias) -> bool {
+                        return alias.reg == arg.reg();
+                    });
+                return;
+            }
+            if (arg.isStack()) {
+                slotConst.removeAllMatching(
+                    [&] (const SlotConst& alias) -> bool {
+                        return alias.slot == arg.stackSlot();
+                    });
+                regSlot.removeAllMatching(
+                    [&] (const RegSlot& alias) -> bool {
+                        return alias.slot == arg.stackSlot();
+                    });
+            }
+        }
+
+        bool merge(const State& other)
+        {
+            bool changed = false;
+            
+            changed |= !!regConst.removeAllMatching(
+                [&] (RegConst& alias) -> bool {
+                    const RegConst* otherAlias = other.getRegConst(alias.reg);
+                    if (!otherAlias)
+                        return true;
+                    if (alias.constant != otherAlias->constant)
+                        return true;
+                    return false;
+                });
+
+            changed |= !!slotConst.removeAllMatching(
+                [&] (SlotConst& alias) -> bool {
+                    const SlotConst* otherAlias = other.getSlotConst(alias.slot);
+                    if (!otherAlias)
+                        return true;
+                    if (alias.constant != otherAlias->constant)
+                        return true;
+                    return false;
+                });
+
+            changed |= !!regSlot.removeAllMatching(
+                [&] (RegSlot& alias) -> bool {
+                    const RegSlot* otherAlias = other.getRegSlot(alias.reg, alias.slot);
+                    if (!otherAlias)
+                        return true;
+                    if (alias.mode != RegSlot::Match32 && alias.mode != otherAlias->mode) {
+                        alias.mode = RegSlot::Match32;
+                        changed = true;
+                    }
+                    return false;
+                });
+
+            return changed;
+        }
+
+        void dump(PrintStream& out) const
+        {
+            out.print(
+                "{regConst = [", listDump(regConst), "], slotConst = [", listDump(slotConst),
+                "], regSlot = [", listDump(regSlot), "], wasVisited = ", wasVisited, "}");
+        }
+
+        Vector<RegConst> regConst;
+        Vector<SlotConst> slotConst;
+        Vector<RegSlot> regSlot;
+        bool wasVisited { false };
+    };
+
+    Code& m_code;
+    IndexMap<BasicBlock, State> m_atHead;
+    State m_state;
+    BasicBlock* m_block { nullptr };
+    unsigned m_instIndex { 0 };
+};
+
+} // anonymous namespace
+
+void fixObviousSpills(Code& code)
+{
+    PhaseScope phaseScope(code, "fixObviousSpills");
+
+    FixObviousSpills fixObviousSpills(code);
+    fixObviousSpills.run();
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
diff --git a/Source/JavaScriptCore/b3/air/AirFixObviousSpills.h b/Source/JavaScriptCore/b3/air/AirFixObviousSpills.h
new file mode 100644 (file)
index 0000000..92b1d2a
--- /dev/null
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AirFixObviousSpills_h
+#define AirFixObviousSpills_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// This is a forward flow phase that tracks equivalence between spill slots and registers. It
+// removes loads from spill slots in cases where the contents of the spill slot can be found in (or
+// computed from) a register.
+void fixObviousSpills(Code&);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirFixObviousSpills_h
+
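The merge rule described in the header above (a fact survives a control-flow join only if every predecessor agrees on it, and width modes weaken on disagreement) can be sketched in isolation. This is a hypothetical miniature, not WebKit's code: `Fact`, `mergeStates`, and the `fullWidth` flag are illustrative stand-ins for Air's `RegSlot` aliases and their `AllBits`/`Match32` modes.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Fact {
    int reg;        // stand-in for an Air Reg
    int slot;       // stand-in for a StackSlot*
    bool fullWidth; // true ~ AllBits/ZExt32, false ~ the weaker Match32

    bool operator==(const Fact& other) const
    {
        return reg == other.reg && slot == other.slot;
    }
};

// Must-alias merge at a control-flow join: keep a fact only if the other
// state has it too, and weaken the width mode when the two states disagree.
// Returns true if this state changed, mirroring the shape of State::merge().
bool mergeStates(std::vector<Fact>& state, const std::vector<Fact>& other)
{
    bool changed = false;
    std::vector<Fact> kept;
    for (Fact fact : state) {
        auto match = std::find(other.begin(), other.end(), fact);
        if (match == other.end()) {
            changed = true; // Not a must-alias on every path: drop it.
            continue;
        }
        if (fact.fullWidth && !match->fullWidth) {
            fact.fullWidth = false; // Demote to the Match32-style mode.
            changed = true;
        }
        kept.push_back(fact);
    }
    state = std::move(kept);
    return changed;
}
```

Because merging only ever removes or weakens facts, iterating to a fixpoint over the CFG terminates, which is what makes the global forward flow in the real phase converge.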
diff --git a/Source/JavaScriptCore/b3/air/AirFixSpillSlotZDef.h b/Source/JavaScriptCore/b3/air/AirFixSpillSlotZDef.h
deleted file mode 100644 (file)
index 2012d89..0000000
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
- * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
- * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
- * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
- * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
- * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
- * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */
-
-#ifndef AirFixSpillSlotZDef_h
-#define AirFixSpillSlotZDef_h
-
-#include "AirCode.h"
-#include "AirInsertionSet.h"
-#include "AirInstInlines.h"
-
-namespace JSC { namespace B3 { namespace Air {
-
-template<typename IsSpillSlot>
-void fixSpillSlotZDef(Code& code, const IsSpillSlot& isSpillSlot)
-{
-    // We could have introduced ZDef's to StackSlots that are wider than the def. In that case, we
-    // need to emit code to zero-fill the top bits of the StackSlot.
-    InsertionSet insertionSet(code);
-    for (BasicBlock* block : code) {
-        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
-            Inst& inst = block->at(instIndex);
-
-            inst.forEachArg(
-                [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width width) {
-                    if (!Arg::isZDef(role))
-                        return;
-                    if (!arg.isStack())
-                        return;
-                    if (!isSpillSlot(arg.stackSlot()))
-                        return;
-                    if (arg.stackSlot()->byteSize() == Arg::bytes(width))
-                        return;
-
-                    // Currently we only handle this simple case because it's the only one that
-                    // arises: ZDef's are only 32-bit right now. So, when we hit these
-                    // assertions it means that we need to implement those other kinds of zero
-                    // fills.
-                    RELEASE_ASSERT(arg.stackSlot()->byteSize() == 8);
-                    RELEASE_ASSERT(width == Arg::Width32);
-
-                    // We rely on the fact that there must be some way to move zero to a memory
-                    // location without first burning a register. On ARM, we would do this using
-                    // zr.
-                    RELEASE_ASSERT(isValidForm(Move32, Arg::Imm, Arg::Stack));
-                    insertionSet.insert(
-                        instIndex + 1, Move32, inst.origin, Arg::imm(0), arg.withOffset(4));
-                });
-        }
-        insertionSet.execute(block);
-    }
-}
-
-} } } // namespace JSC::B3::Air
-
-#endif // AirFixSpillSlotZDef_h
-
index 8e33c7f..ce7302a 100644 (file)
 #include "AirAllocateStack.h"
 #include "AirCode.h"
 #include "AirEliminateDeadCode.h"
+#include "AirFixObviousSpills.h"
 #include "AirFixPartialRegisterStalls.h"
 #include "AirGenerationContext.h"
 #include "AirHandleCalleeSaves.h"
 #include "AirIteratedRegisterCoalescing.h"
+#include "AirLogRegisterPressure.h"
 #include "AirLowerAfterRegAlloc.h"
 #include "AirLowerMacros.h"
 #include "AirOpcodeUtils.h"
@@ -84,6 +86,16 @@ void prepareForGeneration(Code& code)
     else
         iteratedRegisterCoalescing(code);
 
+    if (Options::logAirRegisterPressure()) {
+        dataLog("Register pressure after register allocation:\n");
+        logRegisterPressure(code);
+    }
+
+    // This replaces uses of spill slots with registers or constants if possible. It does this by
+    // minimizing the amount that we perturb the already-chosen register allocation. It may extend
+    // the live ranges of registers though.
+    fixObviousSpills(code);
+
     lowerAfterRegAlloc(code);
 
     // Prior to this point the prologue and epilogue is implicit. This makes it explicit. It also
@@ -103,13 +115,15 @@ void prepareForGeneration(Code& code)
     // frequency successor is also the fall-through target.
     optimizeBlockOrder(code);
 
-    // Attempt to remove false dependencies between instructions created by partial register changes.
-    // This must be executed as late as possible as it depends on the instructions order and register use.
-    fixPartialRegisterStalls(code);
-
     // This is needed to satisfy a requirement of B3::StackmapValue.
     reportUsedRegisters(code);
 
+    // Attempt to remove false dependencies between instructions created by partial register changes.
+    // This must be executed as late as possible as it depends on the instruction order and register
+    // use. We _must_ run this after reportUsedRegisters(), since that kills variable assignments
+    // that seem dead. Luckily, this phase does not change register liveness, so that's OK.
+    fixPartialRegisterStalls(code);
+
     if (shouldValidateIR())
         validate(code);
 
@@ -132,6 +146,17 @@ void generate(Code& code, CCallHelpers& jit)
     if (code.frameSize())
         jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()), MacroAssembler::stackPointerRegister);
 
+    auto argFor = [&] (const RegisterAtOffset& entry) -> CCallHelpers::Address {
+        return CCallHelpers::Address(GPRInfo::callFrameRegister, entry.offset());
+    };
+    
+    for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
+        if (entry.reg().isGPR())
+            jit.storePtr(entry.reg().gpr(), argFor(entry));
+        else
+            jit.storeDouble(entry.reg().fpr(), argFor(entry));
+    }
+
     GenerationContext context;
     context.code = &code;
     IndexMap<BasicBlock, CCallHelpers::Label> blockLabels(code.size());
@@ -162,9 +187,15 @@ void generate(Code& code, CCallHelpers& jit)
         if (isReturn(block->last().opcode)) {
             // We currently don't represent the full prologue/epilogue in Air, so we need to
             // have this override.
-            if (code.frameSize())
+            if (code.frameSize()) {
+                for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
+                    if (entry.reg().isGPR())
+                        jit.loadPtr(argFor(entry), entry.reg().gpr());
+                    else
+                        jit.loadDouble(argFor(entry), entry.reg().fpr());
+                }
                 jit.emitFunctionEpilogue();
-            else
+            } else
                 jit.emitFunctionEpilogueWithEmptyFrame();
             jit.ret();
             continue;
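The save/restore hunks above work because the prologue and epilogue agree on one invariant: each callee-save register is pinned to a fixed frame-pointer-relative offset, so the epilogue reloads from exactly the slot the prologue stored to. A hedged sketch of that round trip, with `RegisterAtOffset` reduced to a plain struct and the frame modeled as a map (all names here are simplified stand-ins, not the JIT's types):

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct RegisterAtOffset {
    int reg;    // stand-in for an Air Reg
    int offset; // byte offset from the frame pointer
};

using Frame = std::map<int, uint64_t>; // offset -> saved value

// Prologue: store each callee-save into its frame-pointer-relative slot.
void saveCalleeSaves(Frame& frame, const std::vector<RegisterAtOffset>& saves,
    const std::map<int, uint64_t>& registers)
{
    for (const RegisterAtOffset& entry : saves)
        frame[entry.offset] = registers.at(entry.reg);
}

// Epilogue: reload each register from the same slot it was saved to.
void restoreCalleeSaves(std::map<int, uint64_t>& registers,
    const std::vector<RegisterAtOffset>& saves, const Frame& frame)
{
    for (const RegisterAtOffset& entry : saves)
        registers[entry.reg] = frame.at(entry.offset);
}
```

Emitting these moves directly in `generate()` rather than as Air instructions (as the deleted `handleCalleeSaves` code did) keeps the saves out of the register allocator's view entirely.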
index a5470fe..90b2267 100644 (file)
@@ -29,7 +29,6 @@
 #if ENABLE(B3_JIT)
 
 #include "AirCode.h"
-#include "AirInsertionSet.h"
 #include "AirInstInlines.h"
 #include "AirPhaseScope.h"
 
@@ -71,34 +70,6 @@ void handleCalleeSaves(Code& code)
     // This is a bit weird since we could have already pinned a different stack slot to this
     // area. Also, our runtime does not require us to pin the saves area. Maybe we shouldn't pin it?
     savesArea->setOffsetFromFP(-byteSize);
-
-    auto argFor = [&] (const RegisterAtOffset& entry) -> Arg {
-        return Arg::stack(savesArea, entry.offset() + byteSize);
-    };
-
-    InsertionSet insertionSet(code);
-    
-    // First insert saving code in the prologue.
-    for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
-        insertionSet.insert(
-            0, entry.reg().isGPR() ? Move : MoveDouble, code[0]->at(0).origin,
-            Tmp(entry.reg()), argFor(entry));
-    }
-    insertionSet.execute(code[0]);
-
-    // Now insert restore code at epilogues.
-    for (BasicBlock* block : code) {
-        Inst& last = block->last();
-        if (!isReturn(last.opcode))
-            continue;
-
-        for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
-            insertionSet.insert(
-                block->size() - 1, entry.reg().isGPR() ? Move : MoveDouble, last.origin,
-                argFor(entry), Tmp(entry.reg()));
-        }
-        insertionSet.execute(block);
-    }
 }
 
 } } } // namespace JSC::B3::Air
index 797c292..7190ee8 100644 (file)
@@ -162,6 +162,7 @@ public:
     //
     // This function is auto-generated by opcode_generator.rb.
     bool admitsStack(unsigned argIndex);
+    bool admitsStack(Arg&);
 
     // Returns true if this instruction can have any effects other than control flow or arguments.
     bool hasNonArgNonControlEffects();
index 1e2f3e6..5ca9f2b 100644 (file)
@@ -158,6 +158,11 @@ inline void Inst::reportUsedRegisters(const RegisterSet& usedRegisters)
     args[0].special()->reportUsedRegisters(*this, usedRegisters);
 }
 
+inline bool Inst::admitsStack(Arg& arg)
+{
+    return admitsStack(&arg - &args[0]);
+}
+
 inline bool isShiftValid(const Inst& inst)
 {
 #if CPU(X86) || CPU(X86_64)
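The new `admitsStack(Arg&)` overload above recovers an argument's index by pointer subtraction: a reference to an element of a contiguous array is exactly `&args[0] + index` away from the start. A minimal sketch of that trick, with simplified stand-in types (this is illustrative, not the real `Inst`):

```cpp
#include <cstddef>
#include <vector>

struct Arg { int value; };

struct Inst {
    std::vector<Arg> args;

    // Mirrors `&arg - &args[0]` in Inst::admitsStack(Arg&). Only valid while
    // `arg` actually refers to an element of this Inst's args vector.
    size_t indexOf(const Arg& arg) const { return &arg - &args[0]; }
};
```

This is what lets `forEachArg` callbacks, which only see an `Arg&`, ask index-based questions like `admitsStack(unsigned)` without threading the index through.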
index 3b6886f..7e24a0d 100644 (file)
@@ -29,7 +29,6 @@
 #if ENABLE(B3_JIT)
 
 #include "AirCode.h"
-#include "AirFixSpillSlotZDef.h"
 #include "AirInsertionSet.h"
 #include "AirInstInlines.h"
 #include "AirLiveness.h"
@@ -1124,11 +1123,11 @@ private:
                 // complete register allocation. So, we record this before starting.
                 bool mayBeCoalescable = allocator.mayBeCoalescable(inst);
 
-                // On X86_64, Move32 is cheaper if we know that it's equivalent to a Move. It's
+                // Move32 is cheaper if we know that it's equivalent to a Move. It's
                 // equivalent if the destination's high bits are not observable or if the source's high
                 // bits are all zero. Note that we don't have the opposite optimization for other
                 // architectures, which may prefer Move over Move32, because Move is canonical already.
-                if (type == Arg::GP && optimizeForX86_64() && inst.opcode == Move
+                if (type == Arg::GP && inst.opcode == Move
                     && inst.args[0].isTmp() && inst.args[1].isTmp()) {
                     if (m_tmpWidth.useWidth(inst.args[1].tmp()) <= Arg::Width32
                         || m_tmpWidth.defWidth(inst.args[0].tmp()) <= Arg::Width32)
@@ -1168,7 +1167,6 @@ private:
     void addSpillAndFill(const ColoringAllocator<type>& allocator, HashSet<unsigned>& unspillableTmps)
     {
         HashMap<Tmp, StackSlot*> stackSlots;
-        unsigned newStackSlotThreshold = m_code.stackSlots().size();
         for (Tmp tmp : allocator.spilledTmps()) {
             // All the spilled values become unspillable.
             unspillableTmps.add(AbsoluteTmpMapper<type>::absoluteIndex(tmp));
@@ -1202,16 +1200,19 @@ private:
                 }
 
                 // Try to replace the register use by memory use when possible.
-                for (unsigned i = 0; i < inst.args.size(); ++i) {
-                    Arg& arg = inst.args[i];
-                    if (arg.isTmp() && arg.type() == type && !arg.isReg()) {
-                        auto stackSlotEntry = stackSlots.find(arg.tmp());
-                        if (stackSlotEntry != stackSlots.end() && inst.admitsStack(i)) {
-                            arg = Arg::stack(stackSlotEntry->value);
-                            didSpill = true;
+                inst.forEachArg(
+                    [&] (Arg& arg, Arg::Role, Arg::Type argType, Arg::Width width) {
+                        if (arg.isTmp() && argType == type && !arg.isReg()) {
+                            auto stackSlotEntry = stackSlots.find(arg.tmp());
+                            if (stackSlotEntry != stackSlots.end()
+                                && inst.admitsStack(arg)) {
+                                stackSlotEntry->value->ensureSize(
+                                    forceMove32IfDidSpill ? 4 : Arg::bytes(width));
+                                arg = Arg::stack(stackSlotEntry->value);
+                                didSpill = true;
+                            }
                         }
-                    }
-                }
+                    });
 
                 if (didSpill && forceMove32IfDidSpill)
                     inst.opcode = Move32;
@@ -1262,12 +1263,6 @@ private:
                 });
             }
         }
-
-        fixSpillSlotZDef(
-            m_code,
-            [&] (StackSlot* stackSlot) -> bool {
-                return stackSlot->index() >= newStackSlotThreshold;
-            });
     }
 
     Code& m_code;
index fb59770..2ba3f9e 100644 (file)
@@ -29,6 +29,7 @@
 #if ENABLE(B3_JIT)
 
 #include "AirBasicBlock.h"
+#include "AirCode.h"
 #include "AirInstInlines.h"
 #include "AirTmpInlines.h"
 #include "B3IndexMap.h"
@@ -93,9 +94,11 @@ struct RegLivenessAdapter {
 };
 
 template<typename Adapter>
-class AbstractLiveness : private Adapter {
+class AbstractLiveness : public Adapter {
     struct Workset;
 public:
+    typedef typename Adapter::Thing Thing;
+    
     AbstractLiveness(Code& code)
         : Adapter(code)
         , m_workset(Adapter::maxIndex(code))
@@ -222,6 +225,11 @@ public:
 
             Iterator begin() const { return Iterator(m_liveness, m_liveness.m_workset.begin()); }
             Iterator end() const { return Iterator(m_liveness, m_liveness.m_workset.end()); }
+            
+            bool contains(const typename Adapter::Thing& thing) const
+            {
+                return m_liveness.m_workset.contains(Adapter::valueToIndex(thing));
+            }
 
         private:
             AbstractLiveness& m_liveness;
@@ -232,6 +240,11 @@ public:
             return Iterable(m_liveness);
         }
 
+        bool isLive(const typename Adapter::Thing& thing) const
+        {
+            return live().contains(thing);
+        }
+
         void execute(unsigned instIndex)
         {
             Inst& inst = m_block->at(instIndex);
@@ -277,6 +290,11 @@ public:
         BasicBlock* m_block;
     };
 
+    const Vector<unsigned>& rawLiveAtHead(BasicBlock* block)
+    {
+        return m_liveAtHead[block];
+    }
+
     template<typename UnderlyingIterable>
     class Iterable {
     public:
@@ -330,6 +348,11 @@ public:
         iterator begin() const { return iterator(m_liveness, m_iterable.begin()); }
         iterator end() const { return iterator(m_liveness, m_iterable.end()); }
 
+        bool contains(const typename Adapter::Thing& thing) const
+        {
+            return m_liveness.m_workset.contains(Adapter::valueToIndex(thing));
+        }
+
     private:
         AbstractLiveness& m_liveness;
         const UnderlyingIterable& m_iterable;
@@ -345,6 +368,8 @@ public:
         return Iterable<typename Adapter::IndexSet>(*this, m_liveAtTail[block]);
     }
 
+    IndexSparseSet<UnsafeVectorOverflow>& workset() { return m_workset; }
+
 private:
     friend class LocalCalc;
     friend struct LocalCalc::Iterable;
diff --git a/Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp b/Source/JavaScriptCore/b3/air/AirLogRegisterPressure.cpp
new file mode 100644 (file)
index 0000000..ab323a1
--- /dev/null
@@ -0,0 +1,102 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "AirLogRegisterPressure.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCode.h"
+#include "AirInstInlines.h"
+#include "AirLiveness.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+void logRegisterPressure(Code& code)
+{
+    const unsigned totalColumns = 200;
+    const unsigned registerColumns = 100;
+    
+    RegLiveness liveness(code);
+
+    for (BasicBlock* block : code) {
+        RegLiveness::LocalCalc localCalc(liveness, block);
+
+        block->dumpHeader(WTF::dataFile());
+
+        Vector<CString> instDumps;
+        for (unsigned instIndex = block->size(); instIndex--;) {
+            Inst& inst = block->at(instIndex);
+            Inst* prevInst = block->get(instIndex - 1);
+
+            localCalc.execute(instIndex);
+
+            RegisterSet set;
+            set.setAll(localCalc.live());
+            Inst::forEachDefWithExtraClobberedRegs<Reg>(
+                prevInst, &inst,
+                [&] (Reg reg, Arg::Role, Arg::Type, Arg::Width) {
+                    set.set(reg);
+                });
+
+            StringPrintStream instOut;
+            StringPrintStream lineOut;
+            lineOut.print("   ");
+            if (set.numberOfSetRegisters()) {
+                set.forEach(
+                    [&] (Reg reg) {
+                        CString text = toCString(" ", reg);
+                        if (text.length() + lineOut.length() > totalColumns) {
+                            instOut.print(lineOut.toCString(), "\n");
+                            lineOut.reset();
+                            lineOut.print("       ");
+                        }
+                        lineOut.print(text);
+                    });
+                lineOut.print(":");
+            }
+            if (lineOut.length() > registerColumns) {
+                instOut.print(lineOut.toCString(), "\n");
+                lineOut.reset();
+            }
+            while (lineOut.length() < registerColumns)
+                lineOut.print(" ");
+            lineOut.print(" ");
+            lineOut.print(inst);
+            instOut.print(lineOut.toCString(), "\n");
+            instDumps.append(instOut.toCString());
+        }
+
+        for (unsigned i = instDumps.size(); i--;)
+            dataLog(instDumps[i]);
+        
+        block->dumpFooter(WTF::dataFile());
+    }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
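The pretty-printer above flushes the register line whenever appending the next register would blow past the column budget, then continues on an indented line. The wrapping logic can be sketched standalone; `wrapColumns` and its indents are illustrative names, not the phase's API:

```cpp
#include <string>
#include <vector>

// Append items to the current line until the column budget would be exceeded,
// then flush and continue on a deeper-indented continuation line.
std::vector<std::string> wrapColumns(
    const std::vector<std::string>& items, size_t totalColumns)
{
    std::vector<std::string> lines;
    std::string line = "   ";
    for (const std::string& item : items) {
        std::string text = " " + item;
        if (line.size() + text.size() > totalColumns) {
            lines.push_back(line);
            line = "       "; // Continuation lines get a deeper indent.
        }
        line += text;
    }
    lines.push_back(line);
    return lines;
}
```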
diff --git a/Source/JavaScriptCore/b3/air/AirLogRegisterPressure.h b/Source/JavaScriptCore/b3/air/AirLogRegisterPressure.h
new file mode 100644 (file)
index 0000000..6cc0323
--- /dev/null
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AirLogRegisterPressure_h
+#define AirLogRegisterPressure_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// Dumps the registers that are used at each instruction.
+void logRegisterPressure(Code&);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirLogRegisterPressure_h
+
index 696cdff..3e84a0b 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -73,10 +73,8 @@ private:
 
 } // anonymous namespace
 
-void optimizeBlockOrder(Code& code)
+Vector<BasicBlock*> blocksInOptimizedOrder(Code& code)
 {
-    PhaseScope phaseScope(code, "optimizeBlockOrder");
-
     Vector<BasicBlock*> blocksInOrder;
 
     BlockWorklist fastWorklist;
@@ -113,6 +111,15 @@ void optimizeBlockOrder(Code& code)
     ASSERT(fastWorklist.isEmpty());
     ASSERT(slowWorklist.isEmpty());
 
+    return blocksInOrder;
+}
+
+void optimizeBlockOrder(Code& code)
+{
+    PhaseScope phaseScope(code, "optimizeBlockOrder");
+
+    Vector<BasicBlock*> blocksInOrder = blocksInOptimizedOrder(code);
+    
     // Place blocks into Code's block list according to the ordering in blocksInOrder. We do this by leaking
     // all of the blocks and then readopting them.
     for (auto& entry : code.blockList())
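Factoring `blocksInOptimizedOrder()` out of `optimizeBlockOrder()` lets other phases consult the ordering without mutating the code. The heuristic itself uses two worklists: hot blocks drain first so frequently taken edges tend to become fall-throughs, while rarely taken blocks sink to the bottom. A simplified sketch with illustrative types (not the real Air CFG):

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Block {
    int index;
    bool isRare; // e.g. derived from frequency data like rareBlockPenalty
    std::vector<Block*> successors;
    bool visited = false;
};

// Visit reachable blocks, appending likely-hot blocks first and deferring
// rare ones to a second worklist so they end up at the bottom of the order.
// Each block is only ever pushed from an already-emitted predecessor (or is
// the root), so every block appears after at least one of its predecessors.
std::vector<Block*> blocksInOptimizedOrder(Block* root)
{
    std::vector<Block*> order;
    std::deque<Block*> fast, slow;
    auto push = [&] (Block* block) {
        if (block->visited)
            return;
        block->visited = true;
        (block->isRare ? slow : fast).push_back(block);
    };
    push(root);
    while (!fast.empty() || !slow.empty()) {
        Block* block;
        if (!fast.empty()) {
            block = fast.front();
            fast.pop_front();
        } else {
            block = slow.front();
            slow.pop_front();
        }
        order.push_back(block);
        for (Block* successor : block->successors)
            push(successor);
    }
    return order;
}
```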
index 03d19a0..647c541 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 
 #if ENABLE(B3_JIT)
 
+#include <wtf/Vector.h>
+
 namespace JSC { namespace B3 { namespace Air {
 
+class BasicBlock;
 class Code;
 
+// Returns a list of blocks sorted according to what would be the current optimal order. This shares
+// some properties with a pre-order traversal. In particular, each block will appear after at least
+// one of its predecessors.
+Vector<BasicBlock*> blocksInOptimizedOrder(Code&);
+
 // Reorders the basic blocks to keep hot blocks at the top, and maximize the likelihood that a frequently
 // taken edge is just a fall-through.
 
index c992303..edcc885 100644 (file)
@@ -46,6 +46,32 @@ void reportUsedRegisters(Code& code)
 
         for (unsigned instIndex = block->size(); instIndex--;) {
             Inst& inst = block->at(instIndex);
+
+            // Kill dead assignments to registers. For simplicity we say that a store is killable if
+            // it has only late defs and those late defs are to registers that are dead right now.
+            if (!inst.hasNonArgEffects()) {
+                bool canDelete = true;
+                inst.forEachArg(
+                    [&] (Arg& arg, Arg::Role role, Arg::Type, Arg::Width) {
+                        if (Arg::isEarlyDef(role)) {
+                            canDelete = false;
+                            return;
+                        }
+                        if (!Arg::isLateDef(role))
+                            return;
+                        if (!arg.isReg()) {
+                            canDelete = false;
+                            return;
+                        }
+                        if (localCalc.isLive(arg.reg())) {
+                            canDelete = false;
+                            return;
+                        }
+                    });
+                if (canDelete)
+                    inst = Inst();
+            }
+            
             if (inst.opcode == Patch) {
                 RegisterSet registerSet;
                 for (Reg reg : localCalc.live())
@@ -54,6 +80,11 @@ void reportUsedRegisters(Code& code)
             }
             localCalc.execute(instIndex);
         }
+        
+        block->insts().removeAllMatching(
+            [&] (const Inst& inst) -> bool {
+                return !inst;
+            });
     }
 }
 
index ee424b3..2f81198 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -32,7 +32,8 @@ namespace JSC { namespace B3 { namespace Air {
 
 class Code;
 
-// Performs a liveness analysis over registers and reports the live registers to every Special.
+// Performs a liveness analysis over registers and reports the live registers to every Special. Takes
+// the opportunity to kill dead assignments to registers, since it has access to register liveness.
 
 void reportUsedRegisters(Code&);
 
index 5ba6716..22d009c 100644 (file)
@@ -29,7 +29,6 @@
 #if ENABLE(B3_JIT)
 
 #include "AirCode.h"
-#include "AirFixSpillSlotZDef.h"
 #include "AirInsertionSet.h"
 #include "AirInstInlines.h"
 #include "AirLiveness.h"
@@ -84,7 +83,6 @@ void spillEverything(Code& code)
 
     // Allocate a stack slot for each tmp.
     Vector<StackSlot*> allStackSlots[Arg::numTypes];
-    unsigned newStackSlotThreshold = code.stackSlots().size();
     for (unsigned typeIndex = 0; typeIndex < Arg::numTypes; ++typeIndex) {
         Vector<StackSlot*>& stackSlots = allStackSlots[typeIndex];
         Arg::Type type = static_cast<Arg::Type>(typeIndex);
@@ -130,7 +128,6 @@ void spillEverything(Code& code)
                     switch (role) {
                     case Arg::Use:
                     case Arg::ColdUse:
-                    case Arg::EarlyDef:
                         for (Reg reg : regsInPriorityOrder(type)) {
                             if (!setBefore.get(reg)) {
                                 setBefore.set(reg);
@@ -154,6 +151,7 @@ void spillEverything(Code& code)
                     case Arg::LateUse:
                     case Arg::LateColdUse:
                     case Arg::Scratch:
+                    case Arg::EarlyDef:
                         for (Reg reg : regsInPriorityOrder(type)) {
                             if (!setBefore.get(reg) && !setAfter.get(reg)) {
                                 setAfter.set(reg);
@@ -182,12 +180,6 @@ void spillEverything(Code& code)
         }
         insertionSet.execute(block);
     }
-
-    fixSpillSlotZDef(
-        code,
-        [&] (StackSlot* stackSlot) -> bool {
-            return stackSlot->index() >= newStackSlotThreshold;
-        });
 }
 
 } } } // namespace JSC::B3::Air
index 9486730..dd02729 100644 (file)
@@ -48,6 +48,12 @@ public:
     bool isLocked() const { return m_kind == StackSlotKind::Locked; }
     unsigned index() const { return m_index; }
 
+    void ensureSize(unsigned requestedSize)
+    {
+        ASSERT(!m_offsetFromFP);
+        m_byteSize = std::max(m_byteSize, requestedSize);
+    }
+
     unsigned alignment() const
     {
         if (byteSize() <= 1)
index c7cf13d..bc81714 100644 (file)
@@ -82,6 +82,13 @@ public:
                     VALIDATE(isTerminal(inst.opcode), ("At ", inst, " in ", *block));
                 else
                     VALIDATE(!isTerminal(inst.opcode), ("At ", inst, " in ", *block));
+
+                // forEachArg must return Arg&'s that point into the args array.
+                inst.forEachArg(
+                    [&] (Arg& arg, Arg::Role, Arg::Type, Arg::Width) {
+                        VALIDATE(&arg >= &inst.args[0], ("At ", arg, " in ", inst, " in ", *block));
+                        VALIDATE(&arg <= &inst.args.last(), ("At ", arg, " in ", inst, " in ", *block));
+                    });
             }
             for (BasicBlock* successor : block->successorBlocks())
                 VALIDATE(validBlocks.contains(successor), ("In ", *block));
index 2cbbc35..97f34e0 100644 (file)
@@ -28,6 +28,7 @@
 
 #if ENABLE(FTL_JIT) && FTL_USES_B3
 
+#include "AirCode.h"
 #include "B3Generate.h"
 #include "B3ProcedureInlines.h"
 #include "B3StackSlotValue.h"
@@ -57,7 +58,7 @@ void compile(State& state, Safepoint::Result& safepointResult)
     Graph& graph = state.graph;
     CodeBlock* codeBlock = graph.m_codeBlock;
     VM& vm = graph.m_vm;
-    
+
     {
         GraphSafepoint safepoint(state.graph, safepointResult);
 
index 3c008fd..a5ab133 100644 (file)
@@ -1858,8 +1858,8 @@ private:
                     m_out.notZero32(result), usually(continuation), rarely(slowCase));
                 
                 LBasicBlock lastNext = m_out.appendTo(slowCase, continuation);
-                LValue cond = m_out.bitOr(m_out.lessThan(left, m_out.int32Zero), m_out.lessThan(right, m_out.int32Zero));
-                speculate(NegativeZero, noValue(), 0, cond);
+                speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(left, m_out.int32Zero));
+                speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(right, m_out.int32Zero));
                 m_out.jump(continuation);
                 m_out.appendTo(continuation, lastNext);
             }
@@ -1890,8 +1890,8 @@ private:
                     m_out.notZero64(result), usually(continuation), rarely(slowCase));
                 
                 LBasicBlock lastNext = m_out.appendTo(slowCase, continuation);
-                LValue cond = m_out.bitOr(m_out.lessThan(left, m_out.int64Zero), m_out.lessThan(right, m_out.int64Zero));
-                speculate(NegativeZero, noValue(), 0, cond);
+                speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(left, m_out.int64Zero));
+                speculate(NegativeZero, noValue(), nullptr, m_out.lessThan(right, m_out.int64Zero));
                 m_out.jump(continuation);
                 m_out.appendTo(continuation, lastNext);
             }
@@ -1957,8 +1957,8 @@ private:
 
                 LBasicBlock lastNext = m_out.appendTo(unsafeDenominator, continuation);
                 LValue neg2ToThe31 = m_out.constInt32(-2147483647-1);
-                LValue cond = m_out.bitOr(m_out.isZero32(denominator), m_out.equal(numerator, neg2ToThe31));
-                speculate(Overflow, noValue(), 0, cond);
+                speculate(Overflow, noValue(), nullptr, m_out.isZero32(denominator));
+                speculate(Overflow, noValue(), nullptr, m_out.equal(numerator, neg2ToThe31));
                 m_out.jump(continuation);
 
                 m_out.appendTo(continuation, lastNext);
@@ -2013,8 +2013,8 @@ private:
 
                 LBasicBlock lastNext = m_out.appendTo(unsafeDenominator, continuation);
                 LValue neg2ToThe31 = m_out.constInt32(-2147483647-1);
-                LValue cond = m_out.bitOr(m_out.isZero32(denominator), m_out.equal(numerator, neg2ToThe31));
-                speculate(Overflow, noValue(), 0, cond);
+                speculate(Overflow, noValue(), nullptr, m_out.isZero32(denominator));
+                speculate(Overflow, noValue(), nullptr, m_out.equal(numerator, neg2ToThe31));
                 m_out.jump(continuation);
 
                 m_out.appendTo(continuation, lastNext);
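The changes above replace a `bitOr` of two speculation conditions with two separate `speculate()` calls. LLVM knows how to decompose an `or` of comparisons back into control flow; B3 does not, so the fused form forced it to materialize a boolean and branch on it. Because these comparisons are side-effect-free, checking them separately traps on exactly the same inputs. A small illustration of the equivalence:

```cpp
#include <cassert>

// Speculation expressed as one fused check, the shape bitOr produced...
bool fusedSpeculationFires(int left, int right)
{
    return (left < 0) | (right < 0);
}

// ...and as two independent checks, the shape the patch emits. Each check
// can compile to its own flags-setting compare and branch, with no
// materialized boolean.
bool splitSpeculationFires(int left, int right)
{
    if (left < 0)
        return true;
    if (right < 0)
        return true;
    return false;
}
```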
index dc28b67..a95583e 100644 (file)
@@ -91,6 +91,13 @@ public:
         ASSERT(!!reg);
         return m_vector.get(reg.index());
     }
+
+    template<typename Iterable>
+    void setAll(const Iterable& iterable)
+    {
+        for (Reg reg : iterable)
+            set(reg);
+    }
     
     void merge(const RegisterSet& other) { m_vector.merge(other.m_vector); }
     void filter(const RegisterSet& other) { m_vector.filter(other.m_vector); }
index 533f5ed..5756e46 100644 (file)
@@ -343,6 +343,7 @@ typedef const char* optionString;
     v(bool, logB3PhaseTimes, false, nullptr) \
     v(double, rareBlockPenalty, 0.001, nullptr) \
     v(bool, airSpillsEverything, false, nullptr) \
+    v(bool, logAirRegisterPressure, false, nullptr) \
     \
     v(bool, useDollarVM, false, "installs the $vm debugging tool in global objects") \
     v(optionString, functionOverrides, nullptr, "file with debugging overrides for function bodies") \
index 298f350..81d4a84 100644 (file)
@@ -1,3 +1,17 @@
+2016-01-17  Filip Pizlo  <fpizlo@apple.com>
+
+        FTL B3 should be just as fast as FTL LLVM on Octane/crypto
+        https://bugs.webkit.org/show_bug.cgi?id=153113
+
+        Reviewed by Saam Barati.
+
+        * wtf/IndexSparseSet.h:
+        (WTF::IndexSparseSet<OverflowHandler>::IndexSparseSet):
+        (WTF::IndexSparseSet<OverflowHandler>::add):
+        (WTF::IndexSparseSet<OverflowHandler>::remove):
+        * wtf/StringPrintStream.h:
+        (WTF::StringPrintStream::length):
+
 2016-01-17  Michael Catanzaro  <mcatanzaro@igalia.com>
 
         [CMake] Do not build bmalloc when USE_SYSTEM_MALLOC is ON
index 0893a87..798d6f0 100644 (file)
@@ -806,6 +806,30 @@ template<typename T, typename... Args> bool sumOverflows(Args... args)
     return checkedSum<T>(args...).hasOverflowed();
 }
 
+template<typename T, typename U> bool differenceOverflows(U left, U right)
+{
+    return (Checked<T, RecordOverflow>(left) - Checked<T, RecordOverflow>(right)).hasOverflowed();
+}
+
+template<typename T, typename U>
+Checked<T, RecordOverflow> checkedProduct(U value)
+{
+    return Checked<T, RecordOverflow>(value);
+}
+template<typename T, typename U, typename... Args>
+Checked<T, RecordOverflow> checkedProduct(U value, Args... args)
+{
+    return Checked<T, RecordOverflow>(value) * checkedProduct<T>(args...);
+}
+
+// Sometimes, you just want to check if some math would overflow - the code to do the math is
+// already in place, and you want to guard it.
+
+template<typename T, typename... Args> bool productOverflows(Args... args)
+{
+    return checkedProduct<T>(args...).hasOverflowed();
+}
+
 }
 
 using WTF::Checked;
@@ -821,6 +845,8 @@ using WTF::CheckedInt64;
 using WTF::CheckedUint64;
 using WTF::CheckedSize;
 using WTF::checkedSum;
+using WTF::differenceOverflows;
+using WTF::productOverflows;
 using WTF::sumOverflows;
 
 #endif
index dc43656..381fa3f 100644 (file)
@@ -47,8 +47,8 @@ class IndexSparseSet {
 public:
     explicit IndexSparseSet(unsigned maxValue);
 
-    void add(unsigned);
-    void remove(unsigned);
+    bool add(unsigned);
+    bool remove(unsigned);
     void clear();
 
     unsigned size() const;
@@ -71,29 +71,33 @@ inline IndexSparseSet<OverflowHandler>::IndexSparseSet(unsigned maxValue)
 }
 
 template<typename OverflowHandler>
-inline void IndexSparseSet<OverflowHandler>::add(unsigned value)
+inline bool IndexSparseSet<OverflowHandler>::add(unsigned value)
 {
     if (contains(value))
-        return;
+        return false;
 
     unsigned newPosition = m_values.size();
     m_values.append(value);
     m_map[value] = newPosition;
+    return true;
 }
 
 template<typename OverflowHandler>
-inline void IndexSparseSet<OverflowHandler>::remove(unsigned value)
+inline bool IndexSparseSet<OverflowHandler>::remove(unsigned value)
 {
     unsigned position = m_map[value];
     if (position >= m_values.size())
-        return;
+        return false;
 
     if (m_values[position] == value) {
         unsigned lastValue = m_values.last();
         m_values[position] = lastValue;
         m_map[lastValue] = position;
         m_values.removeLast();
+        return true;
     }
+
+    return false;
 }
 
 template<typename OverflowHandler>
index 607b590..18eecb2 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2012 Apple Inc. All rights reserved.
+ * Copyright (C) 2012, 2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -38,6 +38,8 @@ public:
     WTF_EXPORT_PRIVATE virtual ~StringPrintStream();
     
     WTF_EXPORT_PRIVATE virtual void vprintf(const char* format, va_list) override WTF_ATTRIBUTE_PRINTF(2, 0);
+
+    size_t length() const { return m_next; }
     
     WTF_EXPORT_PRIVATE CString toCString();
     WTF_EXPORT_PRIVATE String toString();