Air needs a Shuffle instruction
author:    fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
           Fri, 15 Jan 2016 19:41:56 +0000 (19:41 +0000)
committer: fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
           Fri, 15 Jan 2016 19:41:56 +0000 (19:41 +0000)
https://bugs.webkit.org/show_bug.cgi?id=152952

Reviewed by Saam Barati.

This adds an instruction called Shuffle. Shuffle performs multiple moves simultaneously,
which lets it express arbitrary permutations over registers and memory; we call these
rotations. It also allows you to perform "shifts", like (a => b, b => c): after the shift,
c will have b's old value, b will have a's old value, and a will be unchanged. Shifts can
use immediates as their source.

Shuffle is added as a custom instruction, since it has a variable number of arguments. It
takes any number of triplets of arguments, where each triplet describes one mapping of the
shuffle. For example, to represent (a => b, b => c), we might say:

    Shuffle %a, %b, 64, %b, %c, 64

Note the "64"s, those are width arguments that describe how many bits of the register are
being moved. Each triplet is referred to as a "shuffle pair". We call it a pair because the
most relevant part of it is the pair of registers or memroy locations (i.e. %a, %b form one
of the pairs in the example). For GP arguments, the width follows ZDef semantics.
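
To make the triplet form concrete, here is a minimal sketch of how the
(a => b, b => c) shift might be appended to an Air basic block, in the style of
the tests this patch adds in testair.cpp. The register choices, the function
wrapper, and the include set are illustrative assumptions, not code from the
patch:

    #include "AirCode.h" // Air::BasicBlock and friends, per this patch's tree
    #include "GPRInfo.h" // JSC register names; regT0-regT2 are arbitrary here

    using namespace JSC;
    using namespace JSC::B3::Air;

    // Appends one Shuffle performing the shift (a => b, b => c), with
    // regT0/regT1/regT2 standing in for %a/%b/%c. Arguments come in
    // (src, dst, width) triplets, and all pairs move simultaneously.
    void appendExampleShift(BasicBlock* root)
    {
        root->append(
            Shuffle, nullptr,
            Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width64),
            Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width64));
    }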

In the future, we will be able to use Shuffle for a lot of things. This patch is modest about
how to use it:

- C calling convention argument marshalling. Previously we used move instructions. But that's
  problematic since it introduces artificial interference between the argument registers and
  the inputs. Using Shuffle removes that interference. This helps a bit.

- Cold C calls. This is what really motivated me to write this patch. If we have a C call on
  a cold path, then we want it to appear to the register allocator like it doesn't clobber
  any registers. Only after register allocation should we handle the clobbering, by simply
  saving all of the live volatile registers to the stack. If you consider the saving and the
  argument marshalling together, you can see that before the call we want a single Shuffle
  that does both. This is important: if argument marshalling were separate from the saving,
  then we'd still appear to clobber the argument registers. Doing them together as one
  Shuffle means that the cold call doesn't appear to clobber even the argument registers
  (see the sketch just below).
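
As a rough sketch of that idea (not the patch's exact lowering, which lives in
AirLowerAfterRegAlloc.cpp): after register allocation we know the live volatile
registers, so we can build one list of shuffle pairs that both saves them and
marshals the arguments. ShufflePair, computeCCallingConvention(), and
emitShuffle() come from this patch; the function wrapper, slotFor, and the
other parameter names are assumptions for illustration:

    // Illustrative only: builds the pair list for a single Shuffle that both
    // spills the live volatile registers and marshals the C call arguments.
    Vector<ShufflePair> buildColdCallPairs(
        const Vector<Reg>& liveVolatileRegisters,  // assumed computed by liveness
        const Vector<Arg>& argumentSources,
        const Vector<Arg>& argumentDestinations,   // e.g. from computeCCallingConvention()
        StackSlot* (*slotFor)(Reg))                // hypothetical spill-slot allocator
    {
        Vector<ShufflePair> pairs;
        // Save every live volatile register into its spill slot...
        for (Reg reg : liveVolatileRegisters)
            pairs.append(ShufflePair(Tmp(reg), Arg::stack(slotFor(reg)), Arg::Width64));
        // ...and, in the same shuffle, move each argument into place.
        for (unsigned i = 0; i < argumentSources.size(); ++i)
            pairs.append(ShufflePair(argumentSources[i], argumentDestinations[i], Arg::Width64));
        return pairs; // emitShuffle() lowers the simultaneous pairs to ordered moves,
                      // breaking cycles with swaps or a scratch register.
    }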

Unfortunately, I was wrong about cold C calls being the dominant problem with our register
allocator right now. Fixing this revealed other problems in my current tuning benchmark,
Octane/encrypt. Nonetheless, this is a small speed-up across the board, and gives us some
functionality we will need to implement other optimizations.

Relanding after fixing the production build.

* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* assembler/AbstractMacroAssembler.h:
(JSC::isX86_64):
(JSC::isIOS):
(JSC::optimizeForARMv7IDIVSupported):
* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::zeroExtend32ToPtr):
(JSC::MacroAssemblerX86Common::swap32):
(JSC::MacroAssemblerX86Common::moveConditionally32):
* assembler/MacroAssemblerX86_64.h:
(JSC::MacroAssemblerX86_64::store64WithAddressOffsetPatch):
(JSC::MacroAssemblerX86_64::swap64):
(JSC::MacroAssemblerX86_64::move64ToDouble):
* assembler/X86Assembler.h:
(JSC::X86Assembler::xchgl_rr):
(JSC::X86Assembler::xchgl_rm):
(JSC::X86Assembler::xchgq_rr):
(JSC::X86Assembler::xchgq_rm):
(JSC::X86Assembler::movl_rr):
* b3/B3CCallValue.h:
* b3/B3Compilation.cpp:
(JSC::B3::Compilation::Compilation):
(JSC::B3::Compilation::~Compilation):
* b3/B3Compilation.h:
(JSC::B3::Compilation::code):
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::run):
(JSC::B3::Air::LowerToAir::createSelect):
(JSC::B3::Air::LowerToAir::lower):
(JSC::B3::Air::LowerToAir::marshallCCallArgument): Deleted.
* b3/B3OpaqueByproducts.h:
(JSC::B3::OpaqueByproducts::count):
* b3/B3StackmapSpecial.cpp:
(JSC::B3::StackmapSpecial::isArgValidForValue):
(JSC::B3::StackmapSpecial::isArgValidForRep):
* b3/air/AirArg.cpp:
(JSC::B3::Air::Arg::isStackMemory):
(JSC::B3::Air::Arg::isRepresentableAs):
(JSC::B3::Air::Arg::usesTmp):
(JSC::B3::Air::Arg::canRepresent):
(JSC::B3::Air::Arg::isCompatibleType):
(JSC::B3::Air::Arg::dump):
(WTF::printInternal):
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::forEachType):
(JSC::B3::Air::Arg::isWarmUse):
(JSC::B3::Air::Arg::cooled):
(JSC::B3::Air::Arg::isEarlyUse):
(JSC::B3::Air::Arg::imm64):
(JSC::B3::Air::Arg::immPtr):
(JSC::B3::Air::Arg::addr):
(JSC::B3::Air::Arg::special):
(JSC::B3::Air::Arg::widthArg):
(JSC::B3::Air::Arg::operator==):
(JSC::B3::Air::Arg::isImm64):
(JSC::B3::Air::Arg::isSomeImm):
(JSC::B3::Air::Arg::isAddr):
(JSC::B3::Air::Arg::isIndex):
(JSC::B3::Air::Arg::isMemory):
(JSC::B3::Air::Arg::isRelCond):
(JSC::B3::Air::Arg::isSpecial):
(JSC::B3::Air::Arg::isWidthArg):
(JSC::B3::Air::Arg::isAlive):
(JSC::B3::Air::Arg::base):
(JSC::B3::Air::Arg::hasOffset):
(JSC::B3::Air::Arg::offset):
(JSC::B3::Air::Arg::width):
(JSC::B3::Air::Arg::isGPTmp):
(JSC::B3::Air::Arg::isGP):
(JSC::B3::Air::Arg::isFP):
(JSC::B3::Air::Arg::isType):
(JSC::B3::Air::Arg::isGPR):
(JSC::B3::Air::Arg::isValidForm):
(JSC::B3::Air::Arg::forEachTmpFast):
* b3/air/AirBasicBlock.h:
(JSC::B3::Air::BasicBlock::insts):
(JSC::B3::Air::BasicBlock::appendInst):
(JSC::B3::Air::BasicBlock::append):
* b3/air/AirCCallingConvention.cpp: Added.
(JSC::B3::Air::computeCCallingConvention):
(JSC::B3::Air::cCallResult):
(JSC::B3::Air::buildCCall):
* b3/air/AirCCallingConvention.h: Added.
* b3/air/AirCode.h:
(JSC::B3::Air::Code::proc):
* b3/air/AirCustom.cpp: Added.
(JSC::B3::Air::CCallCustom::isValidForm):
(JSC::B3::Air::CCallCustom::generate):
(JSC::B3::Air::ShuffleCustom::isValidForm):
(JSC::B3::Air::ShuffleCustom::generate):
* b3/air/AirCustom.h:
(JSC::B3::Air::PatchCustom::forEachArg):
(JSC::B3::Air::PatchCustom::generate):
(JSC::B3::Air::CCallCustom::forEachArg):
(JSC::B3::Air::CCallCustom::isValidFormStatic):
(JSC::B3::Air::CCallCustom::admitsStack):
(JSC::B3::Air::CCallCustom::hasNonArgNonControlEffects):
(JSC::B3::Air::ColdCCallCustom::forEachArg):
(JSC::B3::Air::ShuffleCustom::forEachArg):
(JSC::B3::Air::ShuffleCustom::isValidFormStatic):
(JSC::B3::Air::ShuffleCustom::admitsStack):
(JSC::B3::Air::ShuffleCustom::hasNonArgNonControlEffects):
* b3/air/AirEmitShuffle.cpp: Added.
(JSC::B3::Air::ShufflePair::dump):
(JSC::B3::Air::emitShuffle):
* b3/air/AirEmitShuffle.h: Added.
(JSC::B3::Air::ShufflePair::ShufflePair):
(JSC::B3::Air::ShufflePair::src):
(JSC::B3::Air::ShufflePair::dst):
(JSC::B3::Air::ShufflePair::width):
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
* b3/air/AirGenerate.h:
* b3/air/AirInsertionSet.cpp:
(JSC::B3::Air::InsertionSet::insertInsts):
(JSC::B3::Air::InsertionSet::execute):
* b3/air/AirInsertionSet.h:
(JSC::B3::Air::InsertionSet::insertInst):
(JSC::B3::Air::InsertionSet::insert):
* b3/air/AirInst.h:
(JSC::B3::Air::Inst::operator bool):
(JSC::B3::Air::Inst::append):
* b3/air/AirLowerAfterRegAlloc.cpp: Added.
(JSC::B3::Air::lowerAfterRegAlloc):
* b3/air/AirLowerAfterRegAlloc.h: Added.
* b3/air/AirLowerMacros.cpp: Added.
(JSC::B3::Air::lowerMacros):
* b3/air/AirLowerMacros.h: Added.
* b3/air/AirOpcode.opcodes:
* b3/air/AirRegisterPriority.h:
(JSC::B3::Air::regsInPriorityOrder):
* b3/air/testair.cpp: Added.
(hiddenTruthBecauseNoReturnIsStupid):
(usage):
(JSC::B3::Air::compile):
(JSC::B3::Air::invoke):
(JSC::B3::Air::compileAndRun):
(JSC::B3::Air::testSimple):
(JSC::B3::Air::loadConstantImpl):
(JSC::B3::Air::loadConstant):
(JSC::B3::Air::loadDoubleConstant):
(JSC::B3::Air::testShuffleSimpleSwap):
(JSC::B3::Air::testShuffleSimpleShift):
(JSC::B3::Air::testShuffleLongShift):
(JSC::B3::Air::testShuffleLongShiftBackwards):
(JSC::B3::Air::testShuffleSimpleRotate):
(JSC::B3::Air::testShuffleSimpleBroadcast):
(JSC::B3::Air::testShuffleBroadcastAllRegs):
(JSC::B3::Air::testShuffleTreeShift):
(JSC::B3::Air::testShuffleTreeShiftBackward):
(JSC::B3::Air::testShuffleTreeShiftOtherBackward):
(JSC::B3::Air::testShuffleMultipleShifts):
(JSC::B3::Air::testShuffleRotateWithFringe):
(JSC::B3::Air::testShuffleRotateWithLongFringe):
(JSC::B3::Air::testShuffleMultipleRotates):
(JSC::B3::Air::testShuffleShiftAndRotate):
(JSC::B3::Air::testShuffleShiftAllRegs):
(JSC::B3::Air::testShuffleRotateAllRegs):
(JSC::B3::Air::testShuffleSimpleSwap64):
(JSC::B3::Air::testShuffleSimpleShift64):
(JSC::B3::Air::testShuffleSwapMixedWidth):
(JSC::B3::Air::testShuffleShiftMixedWidth):
(JSC::B3::Air::testShuffleShiftMemory):
(JSC::B3::Air::testShuffleShiftMemoryLong):
(JSC::B3::Air::testShuffleShiftMemoryAllRegs):
(JSC::B3::Air::testShuffleShiftMemoryAllRegs64):
(JSC::B3::Air::combineHiLo):
(JSC::B3::Air::testShuffleShiftMemoryAllRegsMixedWidth):
(JSC::B3::Air::testShuffleRotateMemory):
(JSC::B3::Air::testShuffleRotateMemory64):
(JSC::B3::Air::testShuffleRotateMemoryMixedWidth):
(JSC::B3::Air::testShuffleRotateMemoryAllRegs64):
(JSC::B3::Air::testShuffleRotateMemoryAllRegsMixedWidth):
(JSC::B3::Air::testShuffleSwapDouble):
(JSC::B3::Air::testShuffleShiftDouble):
(JSC::B3::Air::run):
(run):
(main):
* b3/testb3.cpp:
(JSC::B3::testCallSimple):
(JSC::B3::testCallRare):
(JSC::B3::testCallRareLive):
(JSC::B3::testCallSimplePure):
(JSC::B3::run):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@195139 268f45cc-cd09-0410-ab3c-d52691b4dbfc

40 files changed:
Source/JavaScriptCore/CMakeLists.txt
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
Source/JavaScriptCore/assembler/AbstractMacroAssembler.h
Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h
Source/JavaScriptCore/assembler/X86Assembler.h
Source/JavaScriptCore/b3/B3CCallValue.h
Source/JavaScriptCore/b3/B3Compilation.cpp
Source/JavaScriptCore/b3/B3Compilation.h
Source/JavaScriptCore/b3/B3LowerToAir.cpp
Source/JavaScriptCore/b3/B3OpaqueByproducts.h
Source/JavaScriptCore/b3/B3StackmapSpecial.cpp
Source/JavaScriptCore/b3/air/AirArg.cpp
Source/JavaScriptCore/b3/air/AirArg.h
Source/JavaScriptCore/b3/air/AirBasicBlock.h
Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirCCallingConvention.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirCode.h
Source/JavaScriptCore/b3/air/AirCustom.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirCustom.h
Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirEmitShuffle.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirGenerate.cpp
Source/JavaScriptCore/b3/air/AirGenerate.h
Source/JavaScriptCore/b3/air/AirInsertionSet.cpp
Source/JavaScriptCore/b3/air/AirInsertionSet.h
Source/JavaScriptCore/b3/air/AirInst.h
Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirLowerMacros.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirLowerMacros.h [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirOpcode.opcodes
Source/JavaScriptCore/b3/air/AirRegisterPriority.h
Source/JavaScriptCore/b3/air/testair.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/testb3.cpp
Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp
Source/JavaScriptCore/ftl/FTLOSRExit.cpp
Source/JavaScriptCore/ftl/FTLOSRExitHandle.cpp
Source/JavaScriptCore/ftl/FTLOSRExitHandle.h

diff --git a/Source/JavaScriptCore/CMakeLists.txt b/Source/JavaScriptCore/CMakeLists.txt
index 8cca47f..9916ad4 100644
@@ -73,8 +73,11 @@ set(JavaScriptCore_SOURCES
     b3/air/AirArg.cpp
     b3/air/AirBasicBlock.cpp
     b3/air/AirCCallSpecial.cpp
+    b3/air/AirCCallingConvention.cpp
     b3/air/AirCode.cpp
+    b3/air/AirCustom.cpp
     b3/air/AirEliminateDeadCode.cpp
+    b3/air/AirEmitShuffle.cpp
     b3/air/AirFixPartialRegisterStalls.cpp
     b3/air/AirGenerate.cpp
     b3/air/AirGenerated.cpp
@@ -82,6 +85,8 @@ set(JavaScriptCore_SOURCES
     b3/air/AirInsertionSet.cpp
     b3/air/AirInst.cpp
     b3/air/AirIteratedRegisterCoalescing.cpp
+    b3/air/AirLowerAfterRegAlloc.cpp
+    b3/air/AirLowerMacros.cpp
     b3/air/AirOptimizeBlockOrder.cpp
     b3/air/AirPhaseScope.cpp
     b3/air/AirRegisterPriority.cpp
diff --git a/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj b/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
index cef1895..b71875a 100644
@@ -25,6 +25,7 @@
                        buildPhases = (
                        );
                        dependencies = (
+                               0F6183471C45F67A0072450B /* PBXTargetDependency */,
                                0F93275D1C20BF3A00CF6564 /* PBXTargetDependency */,
                                0FEC85B11BDB5D8F0080FF74 /* PBXTargetDependency */,
                                5D6B2A4F152B9E23005231DE /* PBXTargetDependency */,
                0F3C1F1B1B868E7900ABB08B /* DFGClobbersExitState.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F3C1F191B868E7900ABB08B /* DFGClobbersExitState.h */; };
                0F3E01AA19D353A500F61B7F /* DFGPrePostNumbering.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F3E01A819D353A500F61B7F /* DFGPrePostNumbering.cpp */; };
                0F3E01AB19D353A500F61B7F /* DFGPrePostNumbering.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F3E01A919D353A500F61B7F /* DFGPrePostNumbering.h */; };
+               0F40E4A71C497F7400A577FA /* AirOpcode.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183321C45F35C0072450B /* AirOpcode.h */; settings = {ATTRIBUTES = (Private, ); }; };
+               0F40E4A81C497F7400A577FA /* AirOpcodeGenerated.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */; settings = {ATTRIBUTES = (Private, ); }; };
+               0F40E4A91C497F7400A577FA /* AirOpcodeUtils.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183351C45F3B60072450B /* AirOpcodeUtils.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F426A481460CBB300131F8F /* ValueRecovery.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F426A451460CBAB00131F8F /* ValueRecovery.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F426A491460CBB700131F8F /* VirtualRegister.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F426A461460CBAB00131F8F /* VirtualRegister.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F426A4B1460CD6E00131F8F /* DataFormat.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F426A4A1460CD6B00131F8F /* DataFormat.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F5EF91E16878F7A003E5C25 /* JITThunks.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F5EF91B16878F78003E5C25 /* JITThunks.cpp */; };
                0F5EF91F16878F7D003E5C25 /* JITThunks.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5EF91C16878F78003E5C25 /* JITThunks.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F5F08CF146C7633000472A9 /* UnconditionalFinalizer.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5F08CE146C762F000472A9 /* UnconditionalFinalizer.h */; settings = {ATTRIBUTES = (Private, ); }; };
+               0F6183291C45BF070072450B /* AirCCallingConvention.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183201C45BF070072450B /* AirCCallingConvention.cpp */; };
+               0F61832A1C45BF070072450B /* AirCCallingConvention.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183211C45BF070072450B /* AirCCallingConvention.h */; };
+               0F61832B1C45BF070072450B /* AirCustom.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183221C45BF070072450B /* AirCustom.cpp */; };
+               0F61832C1C45BF070072450B /* AirEmitShuffle.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183231C45BF070072450B /* AirEmitShuffle.cpp */; };
+               0F61832D1C45BF070072450B /* AirEmitShuffle.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183241C45BF070072450B /* AirEmitShuffle.h */; };
+               0F61832E1C45BF070072450B /* AirLowerAfterRegAlloc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */; };
+               0F61832F1C45BF070072450B /* AirLowerAfterRegAlloc.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */; };
+               0F6183301C45BF070072450B /* AirLowerMacros.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183271C45BF070072450B /* AirLowerMacros.cpp */; };
+               0F6183311C45BF070072450B /* AirLowerMacros.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183281C45BF070072450B /* AirLowerMacros.h */; };
+               0F61833C1C45F62A0072450B /* Foundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 51F0EB6105C86C6B00E6DF1B /* Foundation.framework */; };
+               0F61833D1C45F62A0072450B /* JavaScriptCore.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 932F5BD90822A1C700736975 /* JavaScriptCore.framework */; };
+               0F6183451C45F6600072450B /* testair.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183441C45F6600072450B /* testair.cpp */; };
                0F620174143FCD330068B77C /* DFGVariableAccessData.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F620172143FCD2F0068B77C /* DFGVariableAccessData.h */; };
                0F620176143FCD3B0068B77C /* DFGBasicBlock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F620170143FCD2F0068B77C /* DFGBasicBlock.h */; };
                0F620177143FCD3F0068B77C /* DFGAbstractValue.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F62016F143FCD2F0068B77C /* DFGAbstractValue.h */; };
 /* End PBXBuildFile section */
 
 /* Begin PBXContainerItemProxy section */
+               0F6183461C45F67A0072450B /* PBXContainerItemProxy */ = {
+                       isa = PBXContainerItemProxy;
+                       containerPortal = 0867D690FE84028FC02AAC07 /* Project object */;
+                       proxyType = 1;
+                       remoteGlobalIDString = 0F6183381C45F62A0072450B;
+                       remoteInfo = testair;
+               };
                0F93275C1C20BF3A00CF6564 /* PBXContainerItemProxy */ = {
                        isa = PBXContainerItemProxy;
                        containerPortal = 0867D690FE84028FC02AAC07 /* Project object */;
                0F5EF91B16878F78003E5C25 /* JITThunks.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = JITThunks.cpp; sourceTree = "<group>"; };
                0F5EF91C16878F78003E5C25 /* JITThunks.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JITThunks.h; sourceTree = "<group>"; };
                0F5F08CE146C762F000472A9 /* UnconditionalFinalizer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = UnconditionalFinalizer.h; sourceTree = "<group>"; };
+               0F6183201C45BF070072450B /* AirCCallingConvention.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirCCallingConvention.cpp; path = b3/air/AirCCallingConvention.cpp; sourceTree = "<group>"; };
+               0F6183211C45BF070072450B /* AirCCallingConvention.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirCCallingConvention.h; path = b3/air/AirCCallingConvention.h; sourceTree = "<group>"; };
+               0F6183221C45BF070072450B /* AirCustom.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirCustom.cpp; path = b3/air/AirCustom.cpp; sourceTree = "<group>"; };
+               0F6183231C45BF070072450B /* AirEmitShuffle.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirEmitShuffle.cpp; path = b3/air/AirEmitShuffle.cpp; sourceTree = "<group>"; };
+               0F6183241C45BF070072450B /* AirEmitShuffle.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirEmitShuffle.h; path = b3/air/AirEmitShuffle.h; sourceTree = "<group>"; };
+               0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLowerAfterRegAlloc.cpp; path = b3/air/AirLowerAfterRegAlloc.cpp; sourceTree = "<group>"; };
+               0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLowerAfterRegAlloc.h; path = b3/air/AirLowerAfterRegAlloc.h; sourceTree = "<group>"; };
+               0F6183271C45BF070072450B /* AirLowerMacros.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLowerMacros.cpp; path = b3/air/AirLowerMacros.cpp; sourceTree = "<group>"; };
+               0F6183281C45BF070072450B /* AirLowerMacros.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLowerMacros.h; path = b3/air/AirLowerMacros.h; sourceTree = "<group>"; };
+               0F6183321C45F35C0072450B /* AirOpcode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcode.h; sourceTree = "<group>"; };
+               0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcodeGenerated.h; sourceTree = "<group>"; };
+               0F6183351C45F3B60072450B /* AirOpcodeUtils.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcodeUtils.h; sourceTree = "<group>"; };
+               0F6183431C45F62A0072450B /* testair */ = {isa = PBXFileReference; explicitFileType = "compiled.mach-o.executable"; includeInIndex = 0; path = testair; sourceTree = BUILT_PRODUCTS_DIR; };
+               0F6183441C45F6600072450B /* testair.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = testair.cpp; path = b3/air/testair.cpp; sourceTree = "<group>"; };
                0F62016F143FCD2F0068B77C /* DFGAbstractValue.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGAbstractValue.h; path = dfg/DFGAbstractValue.h; sourceTree = "<group>"; };
                0F620170143FCD2F0068B77C /* DFGBasicBlock.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGBasicBlock.h; path = dfg/DFGBasicBlock.h; sourceTree = "<group>"; };
                0F620172143FCD2F0068B77C /* DFGVariableAccessData.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGVariableAccessData.h; path = dfg/DFGVariableAccessData.h; sourceTree = "<group>"; };
                7013CA8A1B491A9400CAE613 /* JSJob.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSJob.h; sourceTree = "<group>"; };
                7035587C1C418419004BD7BF /* MapPrototype.js */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.javascript; path = MapPrototype.js; sourceTree = "<group>"; };
                7035587D1C418419004BD7BF /* SetPrototype.js */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.javascript; path = SetPrototype.js; sourceTree = "<group>"; };
-               7035587E1C418458004BD7BF /* MapPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = MapPrototype.lut.h; path = MapPrototype.lut.h; sourceTree = "<group>"; };
-               7035587F1C418458004BD7BF /* SetPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = SetPrototype.lut.h; path = SetPrototype.lut.h; sourceTree = "<group>"; };
+               7035587E1C418458004BD7BF /* MapPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = MapPrototype.lut.h; sourceTree = "<group>"; };
+               7035587F1C418458004BD7BF /* SetPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SetPrototype.lut.h; sourceTree = "<group>"; };
                704FD35305697E6D003DBED9 /* BooleanObject.h */ = {isa = PBXFileReference; fileEncoding = 30; indentWidth = 4; lastKnownFileType = sourcecode.c.h; path = BooleanObject.h; sourceTree = "<group>"; tabWidth = 8; };
                705B41A31A6E501E00716757 /* Symbol.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Symbol.cpp; sourceTree = "<group>"; };
                705B41A41A6E501E00716757 /* Symbol.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Symbol.h; sourceTree = "<group>"; };
 /* End PBXFileReference section */
 
 /* Begin PBXFrameworksBuildPhase section */
+               0F61833B1C45F62A0072450B /* Frameworks */ = {
+                       isa = PBXFrameworksBuildPhase;
+                       buildActionMask = 2147483647;
+                       files = (
+                               0F61833C1C45F62A0072450B /* Foundation.framework in Frameworks */,
+                               0F61833D1C45F62A0072450B /* JavaScriptCore.framework in Frameworks */,
+                       );
+                       runOnlyForDeploymentPostprocessing = 0;
+               };
                0F9327511C20BCBA00CF6564 /* Frameworks */ = {
                        isa = PBXFrameworksBuildPhase;
                        buildActionMask = 2147483647;
                                0FEC85AD1BDB5CF10080FF74 /* testb3 */,
                                6511230514046A4C002B101D /* testRegExp */,
                                0F9327591C20BCBA00CF6564 /* dynbench */,
+                               0F6183431C45F62A0072450B /* testair */,
                        );
                        name = Products;
                        sourceTree = "<group>";
                                0FEC854C1BDACDC70080FF74 /* AirBasicBlock.cpp */,
                                0FEC854D1BDACDC70080FF74 /* AirBasicBlock.h */,
                                0FB3878B1BFBC44D00E3AB1E /* AirBlockWorklist.h */,
+                               0F6183201C45BF070072450B /* AirCCallingConvention.cpp */,
+                               0F6183211C45BF070072450B /* AirCCallingConvention.h */,
                                0FEC854E1BDACDC70080FF74 /* AirCCallSpecial.cpp */,
                                0FEC854F1BDACDC70080FF74 /* AirCCallSpecial.h */,
                                0FEC85501BDACDC70080FF74 /* AirCode.cpp */,
                                0FEC85511BDACDC70080FF74 /* AirCode.h */,
+                               0F6183221C45BF070072450B /* AirCustom.cpp */,
                                0F10F1A21C420BF0001C07D2 /* AirCustom.h */,
                                0F4570361BE44C910062A629 /* AirEliminateDeadCode.cpp */,
                                0F4570371BE44C910062A629 /* AirEliminateDeadCode.h */,
+                               0F6183231C45BF070072450B /* AirEmitShuffle.cpp */,
+                               0F6183241C45BF070072450B /* AirEmitShuffle.h */,
                                262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */,
                                262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */,
                                0F4C91671C2B3D68004341A6 /* AirFixSpillSlotZDef.h */,
                                26718BA21BE99F780052017B /* AirIteratedRegisterCoalescing.cpp */,
                                26718BA31BE99F780052017B /* AirIteratedRegisterCoalescing.h */,
                                2684D4371C00161C0081D663 /* AirLiveness.h */,
+                               0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */,
+                               0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */,
+                               0F6183271C45BF070072450B /* AirLowerMacros.cpp */,
+                               0F6183281C45BF070072450B /* AirLowerMacros.h */,
                                264091FA1BE2FD4100684DB2 /* AirOpcode.opcodes */,
                                0FB3878C1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.cpp */,
                                0FB3878D1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.h */,
                                0F3730921C0D67EE00052BFA /* AirUseCounts.h */,
                                0FEC856B1BDACDC70080FF74 /* AirValidate.cpp */,
                                0FEC856C1BDACDC70080FF74 /* AirValidate.h */,
+                               0F6183441C45F6600072450B /* testair.cpp */,
                        );
                        name = air;
                        sourceTree = "<group>";
                650FDF8D09D0FCA700769E54 /* Derived Sources */ = {
                        isa = PBXGroup;
                        children = (
+                               0F6183321C45F35C0072450B /* AirOpcode.h */,
+                               0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */,
+                               0F6183351C45F3B60072450B /* AirOpcodeUtils.h */,
                                996B73151BDA05AA00331B84 /* ArrayConstructor.lut.h */,
                                996B73161BDA05AA00331B84 /* ArrayIteratorPrototype.lut.h */,
                                996B73071BD9FA2C00331B84 /* BooleanPrototype.lut.h */,
                                0F4570391BE44C910062A629 /* AirEliminateDeadCode.h in Headers */,
                                79CFC6F01C33B10000C768EA /* LLIntPCRanges.h in Headers */,
                                79D5CD5B1C1106A900CECA07 /* SamplingProfiler.h in Headers */,
+                               0F6183311C45BF070072450B /* AirLowerMacros.h in Headers */,
                                0FEC85771BDACDC70080FF74 /* AirFrequentedBlock.h in Headers */,
                                0FEC85791BDACDC70080FF74 /* AirGenerate.h in Headers */,
                                79DF66B11BF26A570001CF11 /* FTLExceptionHandlerManager.h in Headers */,
                                0FEC85141BDACDAC0080FF74 /* B3ControlValue.h in Headers */,
                                0FEC85C11BE167A00080FF74 /* B3Effects.h in Headers */,
                                0FEC85161BDACDAC0080FF74 /* B3FrequencyClass.h in Headers */,
+                               0F61832F1C45BF070072450B /* AirLowerAfterRegAlloc.h in Headers */,
                                0FEC85171BDACDAC0080FF74 /* B3FrequentedBlock.h in Headers */,
                                0FEC85191BDACDAC0080FF74 /* B3Generate.h in Headers */,
                                0FEC851A1BDACDAC0080FF74 /* B3GenericFrequentedBlock.h in Headers */,
                                A77A424017A0BBFD00A8DB81 /* DFGClobberize.h in Headers */,
                                0F37308D1C0BD29100052BFA /* B3PhiChildren.h in Headers */,
                                A77A424217A0BBFD00A8DB81 /* DFGClobberSet.h in Headers */,
+                               0F61832D1C45BF070072450B /* AirEmitShuffle.h in Headers */,
                                0F3C1F1B1B868E7900ABB08B /* DFGClobbersExitState.h in Headers */,
                                0F04396E1B03DC0B009598B7 /* DFGCombinedLiveness.h in Headers */,
                                0F7B294D14C3CD4C007C3DB1 /* DFGCommon.h in Headers */,
                                86EC9DD01328DF82002B2AD7 /* DFGOperations.h in Headers */,
                                A7D89CFE17A0B8CC00773AD8 /* DFGOSRAvailabilityAnalysisPhase.h in Headers */,
                                0FD82E57141DAF1000179C94 /* DFGOSREntry.h in Headers */,
+                               0F40E4A71C497F7400A577FA /* AirOpcode.h in Headers */,
                                0FD8A32617D51F5700CA2C40 /* DFGOSREntrypointCreationPhase.h in Headers */,
                                0FC0976A1468A6F700CF2442 /* DFGOSRExit.h in Headers */,
                                0F235BEC17178E7300690C7F /* DFGOSRExitBase.h in Headers */,
                                0F7025AA1714B0FC00382C0E /* DFGOSRExitCompilerCommon.h in Headers */,
                                0F392C8A1B46188400844728 /* DFGOSRExitFuzz.h in Headers */,
                                0FEFC9AB1681A3B600567F53 /* DFGOSRExitJumpPlaceholder.h in Headers */,
+                               0F40E4A81C497F7400A577FA /* AirOpcodeGenerated.h in Headers */,
                                0F235BEE17178E7300690C7F /* DFGOSRExitPreparation.h in Headers */,
                                0F6237981AE45CA700D402EA /* DFGPhantomInsertionPhase.h in Headers */,
                                0FFFC95C14EF90AF00C72532 /* DFGPhase.h in Headers */,
                                0F2B66F717B6B5AB00A7AE3F /* JSInt8Array.h in Headers */,
                                A76C51761182748D00715B05 /* JSInterfaceJIT.h in Headers */,
                                E33F50811B8429A400413856 /* JSInternalPromise.h in Headers */,
+                               0F61832A1C45BF070072450B /* AirCCallingConvention.h in Headers */,
                                E33F50791B84225700413856 /* JSInternalPromiseConstructor.h in Headers */,
                                E33F50871B8449EF00413856 /* JSInternalPromiseConstructor.lut.h in Headers */,
                                E33F50851B8437A000413856 /* JSInternalPromiseDeferred.h in Headers */,
                                7C184E1F17BEE22E007CB63A /* JSPromisePrototype.h in Headers */,
                                996B731F1BDA08EF00331B84 /* JSPromisePrototype.lut.h in Headers */,
                                2A05ABD61961DF2400341750 /* JSPropertyNameEnumerator.h in Headers */,
+                               0F40E4A91C497F7400A577FA /* AirOpcodeUtils.h in Headers */,
                                E3EF88751B66DF23003F26CB /* JSPropertyNameIterator.h in Headers */,
                                862553D216136E1A009F17D0 /* JSProxy.h in Headers */,
                                A552C3801ADDB8FE00139726 /* JSRemoteInspector.h in Headers */,
 /* End PBXHeadersBuildPhase section */
 
 /* Begin PBXNativeTarget section */
+               0F6183381C45F62A0072450B /* testair */ = {
+                       isa = PBXNativeTarget;
+                       buildConfigurationList = 0F61833E1C45F62A0072450B /* Build configuration list for PBXNativeTarget "testair" */;
+                       buildPhases = (
+                               0F6183391C45F62A0072450B /* Sources */,
+                               0F61833B1C45F62A0072450B /* Frameworks */,
+                       );
+                       buildRules = (
+                       );
+                       dependencies = (
+                       );
+                       name = testair;
+                       productName = testapi;
+                       productReference = 0F6183431C45F62A0072450B /* testair */;
+                       productType = "com.apple.product-type.tool";
+               };
                0F93274E1C20BCBA00CF6564 /* dynbench */ = {
                        isa = PBXNativeTarget;
                        buildConfigurationList = 0F9327541C20BCBA00CF6564 /* Build configuration list for PBXNativeTarget "dynbench" */;
                                0FEC85941BDB5CF10080FF74 /* testb3 */,
                                5D6B2A47152B9E17005231DE /* Test Tools */,
                                0F93274E1C20BCBA00CF6564 /* dynbench */,
+                               0F6183381C45F62A0072450B /* testair */,
                        );
                };
 /* End PBXProject section */
 /* End PBXShellScriptBuildPhase section */
 
 /* Begin PBXSourcesBuildPhase section */
+               0F6183391C45F62A0072450B /* Sources */ = {
+                       isa = PBXSourcesBuildPhase;
+                       buildActionMask = 2147483647;
+                       files = (
+                               0F6183451C45F6600072450B /* testair.cpp in Sources */,
+                       );
+                       runOnlyForDeploymentPostprocessing = 0;
+               };
                0F93274F1C20BCBA00CF6564 /* Sources */ = {
                        isa = PBXSourcesBuildPhase;
                        buildActionMask = 2147483647;
                                52B717B51A0597E1007AF4F3 /* ControlFlowProfiler.cpp in Sources */,
                                0FBADF541BD1F4B800E073C1 /* CopiedBlock.cpp in Sources */,
                                C240305514B404E60079EB64 /* CopiedSpace.cpp in Sources */,
+                               0F6183301C45BF070072450B /* AirLowerMacros.cpp in Sources */,
                                C2239D1716262BDD005AC5FD /* CopyVisitor.cpp in Sources */,
                                2A111245192FCE79005EE18D /* CustomGetterSetter.cpp in Sources */,
                                62E3D5F01B8D0B7300B868BB /* DataFormat.cpp in Sources */,
                                0F0981F71BC5E565004814F8 /* DFGCopyBarrierOptimizationPhase.cpp in Sources */,
                                0FBE0F7216C1DB030082C5E8 /* DFGCPSRethreadingPhase.cpp in Sources */,
                                A7D89CF517A0B8CC00773AD8 /* DFGCriticalEdgeBreakingPhase.cpp in Sources */,
+                               0F6183291C45BF070072450B /* AirCCallingConvention.cpp in Sources */,
                                0FFFC95914EF90A600C72532 /* DFGCSEPhase.cpp in Sources */,
                                0F2FC77216E12F710038D976 /* DFGDCEPhase.cpp in Sources */,
                                0F338E121BF0276C0013C88F /* B3OpaqueByproducts.cpp in Sources */,
                                A78A9774179738B8009DF744 /* DFGFailedFinalizer.cpp in Sources */,
                                A78A9776179738B8009DF744 /* DFGFinalizer.cpp in Sources */,
                                0F2BDC15151C5D4D00CD8910 /* DFGFixupPhase.cpp in Sources */,
+                               0F61832C1C45BF070072450B /* AirEmitShuffle.cpp in Sources */,
                                0F9D339617FFC4E60073C2BC /* DFGFlushedAt.cpp in Sources */,
                                A7D89CF717A0B8CC00773AD8 /* DFGFlushFormat.cpp in Sources */,
                                0F69CC88193AC60A0045759E /* DFGFrozenValue.cpp in Sources */,
                                E33F50841B8437A000413856 /* JSInternalPromiseDeferred.cpp in Sources */,
                                E33F50741B8421C000413856 /* JSInternalPromisePrototype.cpp in Sources */,
                                A503FA1B188E0FB000110F14 /* JSJavaScriptCallFrame.cpp in Sources */,
+                               0F61832E1C45BF070072450B /* AirLowerAfterRegAlloc.cpp in Sources */,
                                A503FA1D188E0FB000110F14 /* JSJavaScriptCallFramePrototype.cpp in Sources */,
                                7013CA8B1B491A9400CAE613 /* JSJob.cpp in Sources */,
                                140B7D1D0DC69AF7009C42B8 /* JSLexicalEnvironment.cpp in Sources */,
                                0F9D4C0C1C3E1C11006CD984 /* FTLExceptionTarget.cpp in Sources */,
                                0FB1058B1675483100F8AB6E /* ProfilerOSRExit.cpp in Sources */,
                                0FB1058D1675483700F8AB6E /* ProfilerOSRExitSite.cpp in Sources */,
+                               0F61832B1C45BF070072450B /* AirCustom.cpp in Sources */,
                                0F13912B16771C3A009CCB07 /* ProfilerProfiledBytecodes.cpp in Sources */,
                                0FD3E40D1B618B6600C80E1E /* PropertyCondition.cpp in Sources */,
                                A7FB60A4103F7DC20017A286 /* PropertyDescriptor.cpp in Sources */,
 /* End PBXSourcesBuildPhase section */
 
 /* Begin PBXTargetDependency section */
+               0F6183471C45F67A0072450B /* PBXTargetDependency */ = {
+                       isa = PBXTargetDependency;
+                       target = 0F6183381C45F62A0072450B /* testair */;
+                       targetProxy = 0F6183461C45F67A0072450B /* PBXContainerItemProxy */;
+               };
                0F93275D1C20BF3A00CF6564 /* PBXTargetDependency */ = {
                        isa = PBXTargetDependency;
                        target = 0F93274E1C20BCBA00CF6564 /* dynbench */;
                        };
                        name = Production;
                };
+               0F61833F1C45F62A0072450B /* Debug */ = {
+                       isa = XCBuildConfiguration;
+                       baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                       buildSettings = {
+                               CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                               PRODUCT_NAME = testair;
+                       };
+                       name = Debug;
+               };
+               0F6183401C45F62A0072450B /* Release */ = {
+                       isa = XCBuildConfiguration;
+                       baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                       buildSettings = {
+                               CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                               PRODUCT_NAME = testair;
+                       };
+                       name = Release;
+               };
+               0F6183411C45F62A0072450B /* Profiling */ = {
+                       isa = XCBuildConfiguration;
+                       baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                       buildSettings = {
+                               CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                               PRODUCT_NAME = testair;
+                       };
+                       name = Profiling;
+               };
+               0F6183421C45F62A0072450B /* Production */ = {
+                       isa = XCBuildConfiguration;
+                       baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                       buildSettings = {
+                               CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                               PRODUCT_NAME = testair;
+                       };
+                       name = Production;
+               };
                0F9327551C20BCBA00CF6564 /* Debug */ = {
                        isa = XCBuildConfiguration;
                        baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
                        defaultConfigurationIsVisible = 0;
                        defaultConfigurationName = Production;
                };
+               0F61833E1C45F62A0072450B /* Build configuration list for PBXNativeTarget "testair" */ = {
+                       isa = XCConfigurationList;
+                       buildConfigurations = (
+                               0F61833F1C45F62A0072450B /* Debug */,
+                               0F6183401C45F62A0072450B /* Release */,
+                               0F6183411C45F62A0072450B /* Profiling */,
+                               0F6183421C45F62A0072450B /* Production */,
+                       );
+                       defaultConfigurationIsVisible = 0;
+                       defaultConfigurationName = Production;
+               };
                0F9327541C20BCBA00CF6564 /* Build configuration list for PBXNativeTarget "dynbench" */ = {
                        isa = XCConfigurationList;
                        buildConfigurations = (
diff --git a/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h b/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h
index 848bac3..608bb45 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008, 2012, 2014, 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2008, 2012, 2014-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -76,6 +76,15 @@ inline bool isX86_64()
 #endif
 }
 
+inline bool isIOS()
+{
+#if PLATFORM(IOS)
+    return true;
+#else
+    return false;
+#endif
+}
+
 inline bool optimizeForARMv7IDIVSupported()
 {
     return isARMv7IDIVSupported() && Options::useArchitectureSpecificOptimizations();
diff --git a/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h b/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
index a8400e1..2637144 100644
@@ -1411,6 +1411,16 @@ public:
     }
 #endif
 
+    void swap32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.xchgl_rr(src, dest);
+    }
+
+    void swap32(RegisterID src, Address dest)
+    {
+        m_assembler.xchgl_rm(src, dest.offset, dest.base);
+    }
+
     void moveConditionally32(RelationalCondition cond, RegisterID left, RegisterID right, RegisterID src, RegisterID dest)
     {
         m_assembler.cmpl_rr(right, left);
diff --git a/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h b/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h
index 3448849..1e70c70 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008, 2012, 2014, 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2008, 2012, 2014-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -655,6 +655,16 @@ public:
         return DataLabel32(this);
     }
 
+    void swap64(RegisterID src, RegisterID dest)
+    {
+        m_assembler.xchgq_rr(src, dest);
+    }
+
+    void swap64(RegisterID src, Address dest)
+    {
+        m_assembler.xchgq_rm(src, dest.offset, dest.base);
+    }
+
     void move64ToDouble(RegisterID src, FPRegisterID dest)
     {
         m_assembler.movq_rr(src, dest);
diff --git a/Source/JavaScriptCore/assembler/X86Assembler.h b/Source/JavaScriptCore/assembler/X86Assembler.h
index 67702ff..e443de2 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008, 2012-2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2008, 2012-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -1431,6 +1431,11 @@ public:
             m_formatter.oneByteOp(OP_XCHG_EvGv, src, dst);
     }
 
+    void xchgl_rm(RegisterID src, int offset, RegisterID base)
+    {
+        m_formatter.oneByteOp(OP_XCHG_EvGv, src, base, offset);
+    }
+
 #if CPU(X86_64)
     void xchgq_rr(RegisterID src, RegisterID dst)
     {
@@ -1441,6 +1446,11 @@ public:
         else
             m_formatter.oneByteOp64(OP_XCHG_EvGv, src, dst);
     }
+
+    void xchgq_rm(RegisterID src, int offset, RegisterID base)
+    {
+        m_formatter.oneByteOp64(OP_XCHG_EvGv, src, base, offset);
+    }
 #endif
 
     void movl_rr(RegisterID src, RegisterID dst)
index 327f64b..6df6222 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -39,7 +39,7 @@ public:
 
     ~CCallValue();
 
-    Effects effects;
+    Effects effects { Effects::forCall() };
 
 private:
     friend class Procedure;
@@ -47,7 +47,6 @@ private:
     template<typename... Arguments>
     CCallValue(unsigned index, Type type, Origin origin, Arguments... arguments)
         : Value(index, CheckedOpcode, CCall, type, origin, arguments...)
-        , effects(Effects::forCall())
     {
         RELEASE_ASSERT(numChildren() >= 1);
     }
index f6fc30e..48d1f77 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -52,6 +52,12 @@ Compilation::Compilation(VM& vm, Procedure& proc, unsigned optLevel)
     m_byproducts = proc.releaseByproducts();
 }
 
+Compilation::Compilation(MacroAssemblerCodeRef codeRef, std::unique_ptr<OpaqueByproducts> byproducts)
+    : m_codeRef(codeRef)
+    , m_byproducts(WTFMove(byproducts))
+{
+}
+
 Compilation::~Compilation()
 {
 }
index 284927e..7c3b7fb 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -55,6 +55,11 @@ class Compilation {
 
 public:
     JS_EXPORT_PRIVATE Compilation(VM&, Procedure&, unsigned optLevel = 1);
+
+    // This constructor allows you to manually create a Compilation. It's currently only used by test
+    // code. Probably best to keep it that way.
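+    // A minimal hypothetical example, assuming you already have a code ref in hand:
+    //
+    //     Compilation compilation(codeRef, std::make_unique<OpaqueByproducts>());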
+    JS_EXPORT_PRIVATE Compilation(MacroAssemblerCodeRef, std::unique_ptr<OpaqueByproducts>);
+    
     JS_EXPORT_PRIVATE ~Compilation();
 
     MacroAssemblerCodePtr code() const { return m_codeRef.code(); }
index cee4c4f..9a2950e 100644
@@ -35,6 +35,7 @@
 #include "AirStackSlot.h"
 #include "B3ArgumentRegValue.h"
 #include "B3BasicBlockInlines.h"
+#include "B3BlockWorklist.h"
 #include "B3CCallValue.h"
 #include "B3CheckSpecial.h"
 #include "B3Commutativity.h"
@@ -99,6 +100,15 @@ public:
             }
         }
 
+        // Figure out which blocks are not rare.
+        m_fastWorklist.push(m_procedure[0]);
+        while (B3::BasicBlock* block = m_fastWorklist.pop()) {
+            for (B3::FrequentedBlock& successor : block->successors()) {
+                if (!successor.isRare())
+                    m_fastWorklist.push(successor.block());
+            }
+        }
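+        // After this loop, m_fastWorklist.saw(block) is true exactly for the blocks that are
+        // reachable from the root without crossing a rare control flow edge.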
+
         m_procedure.resetValueOwners(); // Used by crossesInterference().
 
         // Lower defs before uses on a global level. This is a good heuristic to lock down a
@@ -108,6 +118,8 @@ public:
             // Reset some state.
             m_insts.resize(0);
 
+            m_isRare = !m_fastWorklist.saw(block);
+
             if (verbose)
                 dataLog("Lowering Block ", *block, ":\n");
             
@@ -1552,37 +1564,6 @@ private:
             inverted);
     }
 
-    template<typename BankInfo>
-    Arg marshallCCallArgument(unsigned& argumentCount, unsigned& stackOffset, Value* child)
-    {
-        unsigned argumentIndex = argumentCount++;
-        if (argumentIndex < BankInfo::numberOfArgumentRegisters) {
-            Tmp result = Tmp(BankInfo::toArgumentRegister(argumentIndex));
-            append(relaxedMoveForType(child->type()), immOrTmp(child), result);
-            return result;
-        }
-
-#if CPU(ARM64) && PLATFORM(IOS)
-        // iOS does not follow the ARM64 ABI regarding function calls.
-        // Arguments must be packed.
-        unsigned slotSize = sizeofType(child->type());
-        stackOffset = WTF::roundUpToMultipleOf(slotSize, stackOffset);
-#else
-        unsigned slotSize = sizeof(void*);
-#endif
-        Arg result = Arg::callArg(stackOffset);
-        stackOffset += slotSize;
-        
-        // Put the code for storing the argument before anything else. This significantly eases the
-        // burden on the register allocator. If we could, we'd hoist these stores as far as
-        // possible.
-        // FIXME: Add a phase to hoist stores as high as possible to relieve register pressure.
-        // https://bugs.webkit.org/show_bug.cgi?id=151063
-        m_insts.last().insert(0, createStore(child, result));
-        
-        return result;
-    }
-
     void lower()
     {
         switch (m_value->opcode()) {
@@ -1934,14 +1915,10 @@ private:
             return;
         }
 
-        case CCall: {
+        case B3::CCall: {
             CCallValue* cCall = m_value->as<CCallValue>();
-            Inst inst(Patch, cCall, Arg::special(m_code.cCallSpecial()));
 
-            // This is a bit weird - we have a super intense contract with Arg::CCallSpecial. It might
-            // be better if we factored Air::CCallSpecial out of the Air namespace and made it a B3
-            // thing.
-            // FIXME: https://bugs.webkit.org/show_bug.cgi?id=151045
+            Inst inst(m_isRare ? Air::ColdCCall : Air::CCall, cCall);
 
             // We have a ton of flexibility regarding the callee argument, but currently, we don't
             // use it yet. It gets weird for a few reasons:
@@ -1954,48 +1931,13 @@ private:
             // FIXME: https://bugs.webkit.org/show_bug.cgi?id=151052
             inst.args.append(tmp(cCall->child(0)));
 
-            // We need to tell Air what registers this defines.
-            inst.args.append(Tmp(GPRInfo::returnValueGPR));
-            inst.args.append(Tmp(GPRInfo::returnValueGPR2));
-            inst.args.append(Tmp(FPRInfo::returnValueFPR));
-
-            // Now marshall the arguments. This is where we implement the C calling convention. After
-            // this, Air does not know what the convention is; it just takes our word for it.
-            unsigned gpArgumentCount = 0;
-            unsigned fpArgumentCount = 0;
-            unsigned stackOffset = 0;
-            for (unsigned i = 1; i < cCall->numChildren(); ++i) {
-                Value* argChild = cCall->child(i);
-                Arg arg;
-                
-                switch (Arg::typeForB3Type(argChild->type())) {
-                case Arg::GP:
-                    arg = marshallCCallArgument<GPRInfo>(gpArgumentCount, stackOffset, argChild);
-                    break;
+            if (cCall->type() != Void)
+                inst.args.append(tmp(cCall));
 
-                case Arg::FP:
-                    arg = marshallCCallArgument<FPRInfo>(fpArgumentCount, stackOffset, argChild);
-                    break;
-                }
+            for (unsigned i = 1; i < cCall->numChildren(); ++i)
+                inst.args.append(immOrTmp(cCall->child(i)));
 
-                if (arg.isTmp())
-                    inst.args.append(arg);
-            }
-            
             m_insts.last().append(WTFMove(inst));
-
-            switch (cCall->type()) {
-            case Void:
-                break;
-            case Int32:
-            case Int64:
-                append(Move, Tmp(GPRInfo::returnValueGPR), tmp(cCall));
-                break;
-            case Float:
-            case Double:
-                append(MoveDouble, Tmp(FPRInfo::returnValueFPR), tmp(cCall));
-                break;
-            }
             return;
         }
 
@@ -2287,11 +2229,13 @@ private:
 
     UseCounts m_useCounts;
     PhiChildren m_phiChildren;
+    BlockWorklist m_fastWorklist;
 
     Vector<Vector<Inst, 4>> m_insts;
     Vector<Inst> m_prologue;
 
     B3::BasicBlock* m_block;
+    bool m_isRare;
     unsigned m_index;
     Value* m_value;
 
index 93caa11..8556e9e 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -39,7 +39,7 @@ class OpaqueByproducts {
     WTF_MAKE_FAST_ALLOCATED;
 public:
     OpaqueByproducts();
-    ~OpaqueByproducts();
+    JS_EXPORT_PRIVATE ~OpaqueByproducts();
 
     size_t count() const { return m_byproducts.size(); }
     
index 6984f6d..5d554a7 100644
@@ -200,21 +200,14 @@ bool StackmapSpecial::isArgValidForValue(const Air::Arg& arg, Value* value)
     case Arg::Tmp:
     case Arg::Imm:
     case Arg::Imm64:
-    case Arg::Stack:
-    case Arg::CallArg:
-        break; // OK
-    case Arg::Addr:
-        if (arg.base() != Tmp(GPRInfo::callFrameRegister)
-            && arg.base() != Tmp(MacroAssembler::stackPointerRegister))
-            return false;
         break;
     default:
-        return false;
+        if (!arg.isStackMemory())
+            return false;
+        break;
     }
-    
-    Arg::Type type = Arg::typeForB3Type(value->type());
 
-    return arg.isType(type);
+    return arg.canRepresent(value);
 }
 
 bool StackmapSpecial::isArgValidForRep(Air::Code& code, const Air::Arg& arg, const ValueRep& rep)
index 8d83bcb..033c972 100644
@@ -30,6 +30,9 @@
 
 #include "AirSpecial.h"
 #include "AirStackSlot.h"
+#include "B3Value.h"
+#include "FPRInfo.h"
+#include "GPRInfo.h"
 
 #if COMPILER(GCC) && ASSERT_DISABLED
 #pragma GCC diagnostic push
 
 namespace JSC { namespace B3 { namespace Air {
 
+bool Arg::isStackMemory() const
+{
+    switch (kind()) {
+    case Addr:
+        return base() == Air::Tmp(GPRInfo::callFrameRegister)
+            || base() == Air::Tmp(MacroAssembler::stackPointerRegister);
+    case Stack:
+    case CallArg:
+        return true;
+    default:
+        return false;
+    }
+}
+
 bool Arg::isRepresentableAs(Width width, Signedness signedness) const
 {
     switch (signedness) {
@@ -67,6 +84,31 @@ bool Arg::isRepresentableAs(Width width, Signedness signedness) const
     ASSERT_NOT_REACHED();
 }
 
+bool Arg::usesTmp(Air::Tmp tmp) const
+{
+    bool uses = false;
+    const_cast<Arg*>(this)->forEachTmpFast(
+        [&] (Air::Tmp otherTmp) {
+            if (otherTmp == tmp)
+                uses = true;
+        });
+    return uses;
+}
+
+bool Arg::canRepresent(Value* value) const
+{
+    return isType(typeForB3Type(value->type()));
+}
+
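+// Two args have compatible types if neither has a definite type, or if each typed arg's type is
+// acceptable to the other. For example, a GP tmp is compatible with another GP tmp, but not with
+// an FP tmp.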
+bool Arg::isCompatibleType(const Arg& other) const
+{
+    if (hasType())
+        return other.isType(type());
+    if (other.hasType())
+        return isType(other.type());
+    return true;
+}
+
 void Arg::dump(PrintStream& out) const
 {
     switch (m_kind) {
@@ -117,6 +159,9 @@ void Arg::dump(PrintStream& out) const
     case Special:
         out.print(pointerDump(special()));
         return;
+    case WidthArg:
+        out.print(width());
+        return;
     }
 
     RELEASE_ASSERT_NOT_REACHED();
@@ -167,6 +212,9 @@ void printInternal(PrintStream& out, Arg::Kind kind)
     case Arg::Special:
         out.print("Special");
         return;
+    case Arg::WidthArg:
+        out.print("WidthArg");
+        return;
     }
 
     RELEASE_ASSERT_NOT_REACHED();
@@ -231,16 +279,16 @@ void printInternal(PrintStream& out, Arg::Width width)
 {
     switch (width) {
     case Arg::Width8:
-        out.print("Width8");
+        out.print("8");
         return;
     case Arg::Width16:
-        out.print("Width16");
+        out.print("16");
         return;
     case Arg::Width32:
-        out.print("Width32");
+        out.print("32");
         return;
     case Arg::Width64:
-        out.print("Width64");
+        out.print("64");
         return;
     }
 
index 8f3d115..22d802a 100644
 #pragma GCC diagnostic ignored "-Wreturn-type"
 #endif // COMPILER(GCC) && ASSERT_DISABLED
 
-namespace JSC { namespace B3 { namespace Air {
+namespace JSC { namespace B3 {
+
+class Value;
+
+namespace Air {
 
 class Special;
 class StackSlot;
@@ -74,7 +78,8 @@ public:
         RelCond,
         ResCond,
         DoubleCond,
-        Special
+        Special,
+        WidthArg
     };
 
     enum Role : int8_t {
@@ -161,6 +166,13 @@ public:
 
     static const unsigned numTypes = 2;
 
+    template<typename Functor>
+    static void forEachType(const Functor& functor)
+    {
+        functor(GP);
+        functor(FP);
+    }
+
     enum Width : int8_t {
         Width8,
         Width16,
@@ -227,6 +239,26 @@ public:
         return isAnyUse(role) && !isColdUse(role);
     }
 
+    static Role cooled(Role role)
+    {
+        switch (role) {
+        case ColdUse:
+        case LateColdUse:
+        case UseDef:
+        case UseZDef:
+        case Def:
+        case ZDef:
+        case UseAddr:
+        case Scratch:
+        case EarlyDef:
+            return role;
+        case Use:
+            return ColdUse;
+        case LateUse:
+            return LateColdUse;
+        }
+    }
+
     // Returns true if the Role implies that the Inst will Use the Arg before doing anything else.
     static bool isEarlyUse(Role role)
     {
@@ -449,6 +481,11 @@ public:
         return result;
     }
 
+    static Arg immPtr(const void* address)
+    {
+        return imm64(bitwise_cast<intptr_t>(address));
+    }
+
     static Arg addr(Air::Tmp base, int32_t offset = 0)
     {
         ASSERT(base.isGP());
@@ -563,6 +600,14 @@ public:
         return result;
     }
 
+    static Arg widthArg(Width width)
+    {
+        Arg result;
+        result.m_kind = WidthArg;
+        result.m_offset = width;
+        return result;
+    }
+
     bool operator==(const Arg& other) const
     {
         return m_offset == other.m_offset
@@ -599,6 +644,11 @@ public:
         return kind() == Imm64;
     }
 
+    bool isSomeImm() const
+    {
+        return isImm() || isImm64();
+    }
+
     bool isAddr() const
     {
         return kind() == Addr;
@@ -619,6 +669,21 @@ public:
         return kind() == Index;
     }
 
+    bool isMemory() const
+    {
+        switch (kind()) {
+        case Addr:
+        case Stack:
+        case CallArg:
+        case Index:
+            return true;
+        default:
+            return false;
+        }
+    }
+
+    bool isStackMemory() const;
+
     bool isRelCond() const
     {
         return kind() == RelCond;
@@ -651,6 +716,11 @@ public:
         return kind() == Special;
     }
 
+    bool isWidthArg() const
+    {
+        return kind() == WidthArg;
+    }
+
     bool isAlive() const
     {
         return isTmp() || isStack();
@@ -694,18 +764,7 @@ public:
         return m_base;
     }
 
-    bool hasOffset() const
-    {
-        switch (kind()) {
-        case Addr:
-        case Stack:
-        case CallArg:
-        case Index:
-            return true;
-        default:
-            return false;
-        }
-    }
+    bool hasOffset() const { return isMemory(); }
     
     int32_t offset() const
     {
@@ -744,6 +803,12 @@ public:
         return bitwise_cast<Air::Special*>(m_offset);
     }
 
+    Width width() const
+    {
+        ASSERT(kind() == WidthArg);
+        return static_cast<Width>(m_offset);
+    }
+
     bool isGPTmp() const
     {
         return isTmp() && tmp().isGP();
@@ -768,6 +833,7 @@ public:
         case ResCond:
         case DoubleCond:
         case Special:
+        case WidthArg:
             return true;
         case Tmp:
             return isGPTmp();
@@ -786,6 +852,7 @@ public:
         case ResCond:
         case DoubleCond:
         case Special:
+        case WidthArg:
         case Invalid:
             return false;
         case Addr:
@@ -829,6 +896,10 @@ public:
         ASSERT_NOT_REACHED();
     }
 
+    bool canRepresent(Value* value) const;
+
+    bool isCompatibleType(const Arg& other) const;
+
     bool isGPR() const
     {
         return isTmp() && tmp().isGPR();
@@ -970,6 +1041,7 @@ public:
         case ResCond:
         case DoubleCond:
         case Special:
+        case WidthArg:
             return true;
         }
         ASSERT_NOT_REACHED();
@@ -992,6 +1064,8 @@ public:
         }
     }
 
+    bool usesTmp(Air::Tmp tmp) const;
+
     // This is smart enough to know that an address arg in a Def or UseDef rule will use its
     // tmps and never def them. For example, this:
     //
index 738c3ee..e7fed3b 100644
@@ -78,15 +78,17 @@ public:
     InstList& insts() { return m_insts; }
 
     template<typename Inst>
-    void appendInst(Inst&& inst)
+    Inst& appendInst(Inst&& inst)
     {
         m_insts.append(std::forward<Inst>(inst));
+        return m_insts.last();
     }
 
     template<typename... Arguments>
-    void append(Arguments&&... arguments)
+    Inst& append(Arguments&&... arguments)
     {
         m_insts.append(Inst(std::forward<Arguments>(arguments)...));
+        return m_insts.last();
     }
 
     // The "0" case is the case to which the branch jumps, so the "then" case. The "1" case is the
diff --git a/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp b/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp
new file mode 100644
index 0000000..63128a7
--- /dev/null
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirCCallingConvention.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCCallSpecial.h"
+#include "AirCode.h"
+#include "B3CCallValue.h"
+#include "B3ValueInlines.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
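+// Marshals one C call argument in the given register bank: the first few arguments of each bank
+// get argument registers, and the rest get stack slots. Note that a stack-passed Int32 occupies
+// a packed 4-byte slot on ARM64 iOS but a full 8-byte slot elsewhere.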
+template<typename BankInfo>
+Arg marshallCCallArgumentImpl(unsigned& argumentCount, unsigned& stackOffset, Value* child)
+{
+    unsigned argumentIndex = argumentCount++;
+    if (argumentIndex < BankInfo::numberOfArgumentRegisters)
+        return Tmp(BankInfo::toArgumentRegister(argumentIndex));
+
+    unsigned slotSize;
+    if (isARM64() && isIOS()) {
+        // Arguments are packed.
+        slotSize = sizeofType(child->type());
+    } else {
+        // Arguments are aligned.
+        slotSize = 8;
+    }
+
+    stackOffset = WTF::roundUpToMultipleOf(slotSize, stackOffset);
+    Arg result = Arg::callArg(stackOffset);
+    stackOffset += slotSize;
+    return result;
+}
+
+Arg marshallCCallArgument(
+    unsigned& gpArgumentCount, unsigned& fpArgumentCount, unsigned& stackOffset, Value* child)
+{
+    switch (Arg::typeForB3Type(child->type())) {
+    case Arg::GP:
+        return marshallCCallArgumentImpl<GPRInfo>(gpArgumentCount, stackOffset, child);
+    case Arg::FP:
+        return marshallCCallArgumentImpl<FPRInfo>(fpArgumentCount, stackOffset, child);
+    }
+    RELEASE_ASSERT_NOT_REACHED();
+    return Arg();
+}
+
+} // anonymous namespace
+
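+// Computes the Arg that each part of the C call should be shuffled into: entry 0 is where the
+// callee goes, and entry i is where child i goes.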
+Vector<Arg> computeCCallingConvention(Code& code, CCallValue* value)
+{
+    Vector<Arg> result;
+    result.append(Tmp(CCallSpecial::scratchRegister));
+    unsigned gpArgumentCount = 0;
+    unsigned fpArgumentCount = 0;
+    unsigned stackOffset = 0;
+    for (unsigned i = 1; i < value->numChildren(); ++i) {
+        result.append(
+            marshallCCallArgument(gpArgumentCount, fpArgumentCount, stackOffset, value->child(i)));
+    }
+    code.requestCallArgAreaSize(WTF::roundUpToMultipleOf(stackAlignmentBytes(), stackOffset));
+    return result;
+}
+
+Tmp cCallResult(Type type)
+{
+    switch (type) {
+    case Void:
+        return Tmp();
+    case Int32:
+    case Int64:
+        return Tmp(GPRInfo::returnValueGPR);
+    case Float:
+    case Double:
+        return Tmp(FPRInfo::returnValueFPR);
+    }
+
+    RELEASE_ASSERT_NOT_REACHED();
+    return Tmp();
+}
+
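+// Builds the Patch instruction that performs the call. It lists the callee, the possible return
+// value registers (which the call defs), and the register arguments; stack arguments are assumed
+// to have already been stored by separate instructions.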
+Inst buildCCall(Code& code, Value* origin, const Vector<Arg>& arguments)
+{
+    Inst inst(Patch, origin, Arg::special(code.cCallSpecial()));
+    inst.args.append(arguments[0]);
+    inst.args.append(Tmp(GPRInfo::returnValueGPR));
+    inst.args.append(Tmp(GPRInfo::returnValueGPR2));
+    inst.args.append(Tmp(FPRInfo::returnValueFPR));
+    for (unsigned i = 1; i < arguments.size(); ++i) {
+        Arg arg = arguments[i];
+        if (arg.isTmp())
+            inst.args.append(arg);
+    }
+    return inst;
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
diff --git a/Source/JavaScriptCore/b3/air/AirCCallingConvention.h b/Source/JavaScriptCore/b3/air/AirCCallingConvention.h
new file mode 100644
index 0000000..11005e8
--- /dev/null
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirCCallingConvention_h
+#define AirCCallingConvention_h
+
+#if ENABLE(B3_JIT)
+
+#include "AirArg.h"
+#include "AirInst.h"
+#include "B3Type.h"
+#include <wtf/Vector.h>
+
+namespace JSC { namespace B3 {
+
+class CCallValue;
+
+namespace Air {
+
+class Code;
+
+Vector<Arg> computeCCallingConvention(Code&, CCallValue*);
+
+Tmp cCallResult(Type);
+
+Inst buildCCall(Code&, Value* origin, const Vector<Arg>&);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirCCallingConvention_h
+
index da9a592..694a9a6 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -60,12 +60,13 @@ public:
 
     Procedure& proc() { return m_proc; }
 
-    BasicBlock* addBlock(double frequency = 1);
+    JS_EXPORT_PRIVATE BasicBlock* addBlock(double frequency = 1);
 
     // Note that you can rely on stack slots always getting indices that are larger than the index
     // of any prior stack slot. In fact, all stack slots you create in the future will have an index
     // that is >= stackSlots().size().
-    StackSlot* addStackSlot(unsigned byteSize, StackSlotKind, StackSlotValue* = nullptr);
+    JS_EXPORT_PRIVATE StackSlot* addStackSlot(
+        unsigned byteSize, StackSlotKind, StackSlotValue* = nullptr);
     StackSlot* addStackSlot(StackSlotValue*);
 
     Special* addSpecial(std::unique_ptr<Special>);
diff --git a/Source/JavaScriptCore/b3/air/AirCustom.cpp b/Source/JavaScriptCore/b3/air/AirCustom.cpp
new file mode 100644
index 0000000..220e495
--- /dev/null
@@ -0,0 +1,160 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirCustom.h"
+
+#if ENABLE(B3_JIT)
+
+#include "B3CCallValue.h"
+#include "B3ValueInlines.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+bool CCallCustom::isValidForm(Inst& inst)
+{
+    CCallValue* value = inst.origin->as<CCallValue>();
+    if (!value)
+        return false;
+
+    if (inst.args.size() != (value->type() == Void ? 0 : 1) + value->numChildren())
+        return false;
+
+    // The arguments can only refer to the stack, tmps, or immediates.
+    for (Arg& arg : inst.args) {
+        if (!arg.isTmp() && !arg.isStackMemory() && !arg.isSomeImm())
+            return false;
+    }
+
+    unsigned offset = 0;
+
+    if (!inst.args[0].isGP())
+        return false;
+
+    // If there is a result then it cannot be an immediate.
+    if (value->type() != Void) {
+        if (inst.args[1].isSomeImm())
+            return false;
+        if (!inst.args[1].canRepresent(value))
+            return false;
+        offset++;
+    }
+
+    for (unsigned i = value->numChildren(); i-- > 1;) {
+        Value* child = value->child(i);
+        Arg arg = inst.args[offset + i];
+        if (!arg.canRepresent(child))
+            return false;
+    }
+
+    return true;
+}
+
+CCallHelpers::Jump CCallCustom::generate(Inst& inst, CCallHelpers&, GenerationContext&)
+{
+    dataLog("FATAL: Unlowered C call: ", inst, "\n");
+    UNREACHABLE_FOR_PLATFORM();
+    return CCallHelpers::Jump();
+}
+
+bool ShuffleCustom::isValidForm(Inst& inst)
+{
+    if (inst.args.size() % 3)
+        return false;
+
+    // A destination may only appear once. This requirement allows us to avoid the undefined behavior
+    // of having a destination that is supposed to get multiple inputs simultaneously. It also
+    // imposes some interesting constraints on the "shape" of the shuffle. If we treat a shuffle pair
+    // as an edge and the Args as nodes, then the single-destination requirement means that the
+    // shuffle graph consists of two kinds of subgraphs:
+    //
+    // - Spanning trees. We call these shifts. They can be executed as a sequence of Move
+    //   instructions and don't usually require scratch registers.
+    //
+    // - Closed loops. These loops consist of nodes that have one successor and one predecessor, so
+    //   there is no way to "get into" the loop from outside of it. These can be executed using swaps
+    //   or by saving one of the Args to a scratch register and executing it as a shift.
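+    //
+    // For example, (a => b, b => c) forms a shift, while (a => b, b => a) forms a closed loop
+    // that has to be executed using a swap or a scratch location.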
+    HashSet<Arg> dsts;
+
+    for (unsigned i = 0; i < inst.args.size(); ++i) {
+        Arg arg = inst.args[i];
+        unsigned mode = i % 3;
+
+        if (mode == 2) {
+            // It's the width.
+            if (!arg.isWidthArg())
+                return false;
+            continue;
+        }
+
+        // The source can be an immediate.
+        if (!mode) {
+            if (arg.isSomeImm())
+                continue;
+
+            if (!arg.isCompatibleType(inst.args[i + 1]))
+                return false;
+        } else {
+            ASSERT(mode == 1);
+            if (!dsts.add(arg).isNewEntry)
+                return false;
+        }
+
+        if (arg.isTmp() || arg.isMemory())
+            continue;
+
+        return false;
+    }
+
+    // No destination register may appear in any address expressions. The lowering can't handle it,
+    // and it's not useful for the way we end up using Shuffles. Normally, Shuffles are only used
+    // for stack addresses and non-stack registers.
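+    // For example, a shuffle containing the pairs ((%a) => %b, %c => %a) would be invalid, since
+    // the destination %a also appears in the address (%a).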
+    for (Arg& arg : inst.args) {
+        if (!arg.isMemory())
+            continue;
+        bool ok = true;
+        arg.forEachTmpFast(
+            [&] (Tmp tmp) {
+                if (dsts.contains(tmp))
+                    ok = false;
+            });
+        if (!ok)
+            return false;
+    }
+
+    return true;
+}
+
+CCallHelpers::Jump ShuffleCustom::generate(Inst& inst, CCallHelpers&, GenerationContext&)
+{
+    dataLog("FATAL: Unlowered shuffle: ", inst, "\n");
+    UNREACHABLE_FOR_PLATFORM();
+    return CCallHelpers::Jump();
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
index a4002ae..25824e2 100644
@@ -30,6 +30,7 @@
 
 #include "AirInst.h"
 #include "AirSpecial.h"
+#include "B3Value.h"
 
 namespace JSC { namespace B3 { namespace Air {
 
@@ -51,6 +52,8 @@ namespace JSC { namespace B3 { namespace Air {
 // you need to carry extra state around with the instruction. Also, Specials mean that you
 // always have access to Code& even in methods that don't take a GenerationContext.
 
+// Definition of Patch instruction. Patch is used to delegate the behavior of the instruction to the
+// Special object, which will be the first argument to the instruction.
 struct PatchCustom {
     template<typename Functor>
     static void forEachArg(Inst& inst, const Functor& functor)
@@ -96,6 +99,114 @@ struct PatchCustom {
     }
 };
 
+// Definition of CCall instruction. CCall is used for hot path C function calls. It's lowered to a
+// Patch with an Air CCallSpecial along with code to marshal the arguments. The lowering happens
+// before register allocation, so that the register allocator sees the clobbers.
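+//
+// The argument layout is: the callee, then the result (unless the type is Void), then the
+// arguments in order. For example, a call that returns Int32 and takes two arguments looks
+// roughly like:
+//
+//     CCall %callee, %result, %arg1, %arg2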
+struct CCallCustom {
+    template<typename Functor>
+    static void forEachArg(Inst& inst, const Functor& functor)
+    {
+        Value* value = inst.origin;
+
+        unsigned index = 0;
+
+        functor(inst.args[index++], Arg::Use, Arg::GP, Arg::pointerWidth()); // callee
+        
+        if (value->type() != Void) {
+            functor(
+                inst.args[index++], Arg::Def,
+                Arg::typeForB3Type(value->type()),
+                Arg::widthForB3Type(value->type()));
+        }
+
+        for (unsigned i = 1; i < value->numChildren(); ++i) {
+            Value* child = value->child(i);
+            functor(
+                inst.args[index++], Arg::Use,
+                Arg::typeForB3Type(child->type()),
+                Arg::widthForB3Type(child->type()));
+        }
+    }
+
+    template<typename... Arguments>
+    static bool isValidFormStatic(Arguments...)
+    {
+        return false;
+    }
+
+    static bool isValidForm(Inst&);
+
+    static bool admitsStack(Inst&, unsigned)
+    {
+        return true;
+    }
+
+    static bool hasNonArgNonControlEffects(Inst&)
+    {
+        return true;
+    }
+
+    // This just crashes, since we expect C calls to be lowered before generation.
+    static CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&);
+};
+
+struct ColdCCallCustom : CCallCustom {
+    template<typename Functor>
+    static void forEachArg(Inst& inst, const Functor& functor)
+    {
+        // This is just like a call, but uses become cold.
+        CCallCustom::forEachArg(
+            inst,
+            [&] (Arg& arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+                functor(arg, Arg::cooled(role), type, width);
+            });
+    }
+};
+
+struct ShuffleCustom {
+    template<typename Functor>
+    static void forEachArg(Inst& inst, const Functor& functor)
+    {
+        unsigned limit = inst.args.size() / 3 * 3;
+        for (unsigned i = 0; i < limit; i += 3) {
+            Arg& src = inst.args[i + 0];
+            Arg& dst = inst.args[i + 1];
+            Arg& widthArg = inst.args[i + 2];
+            Arg::Width width = widthArg.width();
+            Arg::Type type = src.isGP() && dst.isGP() ? Arg::GP : Arg::FP;
+            functor(src, Arg::Use, type, width);
+            functor(dst, Arg::Def, type, width);
+            functor(widthArg, Arg::Use, Arg::GP, Arg::Width8);
+        }
+    }
+
+    template<typename... Arguments>
+    static bool isValidFormStatic(Arguments...)
+    {
+        return false;
+    }
+
+    static bool isValidForm(Inst&);
+    
+    static bool admitsStack(Inst&, unsigned index)
+    {
+        switch (index % 3) {
+        case 0:
+        case 1:
+            return true;
+        default:
+            return false;
+        }
+    }
+
+    static bool hasNonArgNonControlEffects(Inst&)
+    {
+        return false;
+    }
+
+    static CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&);
+};
+
 } } } // namespace JSC::B3::Air
 
 #endif // ENABLE(B3_JIT)
diff --git a/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp b/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp
new file mode 100644
index 0000000..11ef6d2
--- /dev/null
@@ -0,0 +1,520 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirEmitShuffle.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirInstInlines.h"
+#include "AirRegisterPriority.h"
+#include <wtf/GraphNodeWorklist.h>
+#include <wtf/ListDump.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool verbose = false;
+
+template<typename Functor>
+Tmp findPossibleScratch(Arg::Type type, const Functor& functor)
+{
+    for (Reg reg : regsInPriorityOrder(type)) {
+        Tmp tmp(reg);
+        if (functor(tmp))
+            return tmp;
+    }
+    return Tmp();
+}
+
+Tmp findPossibleScratch(Arg::Type type, const Arg& arg1, const Arg& arg2)
+{
+    return findPossibleScratch(
+        type,
+        [&] (Tmp tmp) -> bool {
+            return !arg1.usesTmp(tmp) && !arg2.usesTmp(tmp);
+        });
+}
+
+// Example: (a => b, b => a, a => c, b => d)
+struct Rotate {
+    Vector<ShufflePair> loop; // in the example, this is the loop: (a => b, b => a)
+    Vector<ShufflePair> fringe; // in the example, these are the associated shifts: (a => c, b => d)
+};
+
+} // anonymous namespace
+
+void ShufflePair::dump(PrintStream& out) const
+{
+    out.print(width(), ":", src(), "=>", dst());
+}
+
+Vector<Inst> emitShuffle(
+    Vector<ShufflePair> pairs, std::array<Arg, 2> scratches, Arg::Type type, Value* origin)
+{
+    pairs.removeAllMatching(
+        [&] (const ShufflePair& pair) -> bool {
+            return pair.src() == pair.dst();
+        });
+    
+    // First validate that this is the kind of shuffle that we know how to deal with.
+#if !ASSERT_DISABLED
+    for (const ShufflePair& pair : pairs) {
+        ASSERT(pair.src().isType(type));
+        ASSERT(pair.dst().isType(type));
+        ASSERT(pair.dst().isTmp() || pair.dst().isMemory());
+    }
+#endif // !ASSERT_DISABLED
+
+    // There are two possible kinds of operations that we will do:
+    //
+    // - Shift. Example: (a => b, b => c). We emit this as "Move b, c; Move a, b". This only requires
+    //   scratch registers if there are memory->memory moves. We want to find as many of these as
+    //   possible because they are cheaper. Note that a shift can mention the same source
+    //   multiple times. Example: (a => b, a => c, b => d, b => e).
+    //
+    // - Rotate. Example: (a => b, b => a). We want to emit this as "Swap a, b", but that instruction
+    //   may not be available, in which case we may need a scratch register or a scratch memory
+    //   location. A gnarlier example is (a => b, b => c, c => a). We can emit this as "Swap b, c;
+    //   Swap a, b". Note that swapping has to be careful about differing widths.
+    //
+    // Note that a rotate can have "fringe". For example, we might have (a => b, b => a, a => c,
+    // b => d). This has a rotate loop (a => b, b => a) and some fringe (a => c, b => d). We treat
+    // the whole thing as a single rotate.
+    //
+    // We will find multiple disjoint such operations. We can execute them in any order.
+
+    // We interpret these as Moves that should be executed backwards. All shifts are keyed by their
+    // starting source.
+    HashMap<Arg, Vector<ShufflePair>> shifts;
+
+    // We interpret these as Swaps over src()'s that should be executed backwards, i.e. for a list
+    // of size 3 we would do "Swap list[1].src(), list[2].src(); Swap list[0].src(), list[1].src()".
+    // Note that we actually can't do that if the widths don't match or other bad things happen.
+    // But, prior to executing all of that, we need to execute the fringe: the shifts coming off the
+    // rotate.
+    Vector<Rotate> rotates;
+
+    {
+        HashMap<Arg, Vector<ShufflePair>> mapping;
+        for (const ShufflePair& pair : pairs)
+            mapping.add(pair.src(), Vector<ShufflePair>()).iterator->value.append(pair);
+
+        Vector<ShufflePair> currentPairs;
+
+        while (!mapping.isEmpty()) {
+            ASSERT(currentPairs.isEmpty());
+            Arg originalSrc = mapping.begin()->key;
+            ASSERT(!shifts.contains(originalSrc));
+            if (verbose)
+                dataLog("Processing from ", originalSrc, "\n");
+            
+            GraphNodeWorklist<Arg> worklist;
+            worklist.push(originalSrc);
+            while (Arg src = worklist.pop()) {
+                HashMap<Arg, Vector<ShufflePair>>::iterator iter = mapping.find(src);
+                if (iter == mapping.end()) {
+                    // With a shift it's possible that we previously built the tail of this shift.
+                    // See if that's the case now.
+                    if (verbose)
+                        dataLog("Trying to append shift at ", src, "\n");
+                    currentPairs.appendVector(shifts.take(src));
+                    continue;
+                }
+                Vector<ShufflePair> pairs = WTFMove(iter->value);
+                mapping.remove(iter);
+
+                for (const ShufflePair& pair : pairs) {
+                    currentPairs.append(pair);
+                    ASSERT(pair.src() == src);
+                    worklist.push(pair.dst());
+                }
+            }
+
+            ASSERT(currentPairs.size());
+            ASSERT(currentPairs[0].src() == originalSrc);
+
+            if (verbose)
+                dataLog("currentPairs = ", listDump(currentPairs), "\n");
+
+            bool isRotate = false;
+            for (const ShufflePair& pair : currentPairs) {
+                if (pair.dst() == originalSrc) {
+                    isRotate = true;
+                    break;
+                }
+            }
+
+            if (isRotate) {
+                if (verbose)
+                    dataLog("It's a rotate.\n");
+                Rotate rotate;
+                
+                // The common case is that the rotate does not have fringe. When this happens, the
+                // last destination is the first source.
+                if (currentPairs.last().dst() == originalSrc)
+                    rotate.loop = WTFMove(currentPairs);
+                else {
+                    // This is the slow path. The rotate has fringe.
+                    
+                    HashMap<Arg, ShufflePair> dstMapping;
+                    for (const ShufflePair& pair : currentPairs)
+                        dstMapping.add(pair.dst(), pair);
+
+                    ShufflePair pair = dstMapping.take(originalSrc);
+                    for (;;) {
+                        rotate.loop.append(pair);
+
+                        auto iter = dstMapping.find(pair.src());
+                        if (iter == dstMapping.end())
+                            break;
+                        pair = iter->value;
+                        dstMapping.remove(iter);
+                    }
+
+                    rotate.loop.reverse();
+
+                    // Make sure that the fringe appears in the same order as it appeared in
+                    // currentPairs, since that's the DFS order.
+                    for (const ShufflePair& pair : currentPairs) {
+                        // But of course we only include it if it's not in the loop.
+                        if (dstMapping.contains(pair.dst()))
+                            rotate.fringe.append(pair);
+                    }
+                }
+                
+                // If the graph search terminates because we returned to the first source, then the
+                // pair list has to have a very particular shape.
+                for (unsigned i = rotate.loop.size() - 1; i--;)
+                    ASSERT(rotate.loop[i].dst() == rotate.loop[i + 1].src());
+                rotates.append(WTFMove(rotate));
+                currentPairs.resize(0);
+            } else {
+                if (verbose)
+                    dataLog("It's a shift.\n");
+                shifts.add(originalSrc, WTFMove(currentPairs));
+            }
+        }
+    }
+
+    if (verbose) {
+        dataLog("Shifts:\n");
+        for (auto& entry : shifts)
+            dataLog("    ", entry.key, ": ", listDump(entry.value), "\n");
+        dataLog("Rotates:\n");
+        for (auto& rotate : rotates)
+            dataLog("    loop = ", listDump(rotate.loop), ", fringe = ", listDump(rotate.fringe), "\n");
+    }
+
+    // In the worst case, we need two scratch registers. The way we do this is that the client
+    // passes us whatever scratch registers it happens to have lying around. We will need scratch
+    // registers in the following cases:
+    //
+    // - Shuffle pairs where both src and dst refer to memory.
+    // - Rotate when no Swap instruction is available.
+    //
+    // Lucky for us, we are guaranteed to have extra scratch registers anytime we have a Shift that
+    // ends with a register. We search for such a register right now.
+
+    auto moveForWidth = [&] (Arg::Width width) -> Opcode {
+        switch (width) {
+        case Arg::Width32:
+            return type == Arg::GP ? Move32 : MoveFloat;
+        case Arg::Width64:
+            return type == Arg::GP ? Move : MoveDouble;
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+        }
+    };
+
+    Opcode conservativeMove = moveForWidth(Arg::conservativeWidth(type));
+
+    // We will emit things in reverse. We maintain a list of packs of instructions, and then we
+    // append the packs together in reverse (for example, the pack at the end of resultPacks is
+    // placed first). This is useful because the last thing we emit frees up its destination
+    // registers, so it affects how we emit things before it.
+    Vector<Vector<Inst>> resultPacks;
+    Vector<Inst> result;
+
+    auto commitResult = [&] () {
+        resultPacks.append(WTFMove(result));
+    };
+
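+    // getScratch() materializes scratch slot |index| as a register: if the client gave us a
+    // register, we use it directly; otherwise we save possibleScratch into the scratch slot so
+    // that we can borrow that register, and returnScratch() restores it afterwards.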
+    auto getScratch = [&] (unsigned index, Tmp possibleScratch) -> Tmp {
+        if (scratches[index].isTmp())
+            return scratches[index].tmp();
+
+        if (!possibleScratch)
+            return Tmp();
+        result.append(Inst(conservativeMove, origin, possibleScratch, scratches[index]));
+        return possibleScratch;
+    };
+
+    auto returnScratch = [&] (unsigned index, Tmp tmp) {
+        if (Arg(tmp) != scratches[index])
+            result.append(Inst(conservativeMove, origin, scratches[index], tmp));
+    };
+
+    auto handleShiftPair = [&] (const ShufflePair& pair, unsigned scratchIndex) {
+        Opcode move = moveForWidth(pair.width());
+        
+        if (!isValidForm(move, pair.src().kind(), pair.dst().kind())) {
+            Tmp scratch =
+                getScratch(scratchIndex, findPossibleScratch(type, pair.src(), pair.dst()));
+            RELEASE_ASSERT(scratch);
+            if (isValidForm(move, pair.src().kind(), Arg::Tmp))
+                result.append(Inst(moveForWidth(pair.width()), origin, pair.src(), scratch));
+            else {
+                ASSERT(pair.src().isSomeImm());
+                ASSERT(move == Move32);
+                result.append(Inst(Move, origin, Arg::imm64(pair.src().value()), scratch));
+            }
+            result.append(Inst(moveForWidth(pair.width()), origin, scratch, pair.dst()));
+            returnScratch(scratchIndex, scratch);
+            return;
+        }
+        
+        result.append(Inst(move, origin, pair.src(), pair.dst()));
+    };
+
+    auto handleShift = [&] (Vector<ShufflePair>& shift) {
+        // FIXME: We could optimize the spill behavior of the shifter by checking if any of the
+        // shifts need spills. If they do, then we could try to get a register out here. Note that
+        // this may fail where the current strategy succeeds: out here we need a register that does
+        // not interfere with any of the shifts, while the current strategy only needs to find a
+        // scratch register that does not interfere with a particular shift. So, this optimization
+        // will be opportunistic: if it succeeds, then the individual shifts can use that scratch,
+        // otherwise they will do what they do now.
+        
+        for (unsigned i = shift.size(); i--;)
+            handleShiftPair(shift[i], 0);
+
+        Arg lastDst = shift.last().dst();
+        if (lastDst.isTmp()) {
+            for (Arg& scratch : scratches) {
+                ASSERT(scratch != lastDst);
+                if (!scratch.isTmp()) {
+                    scratch = lastDst;
+                    break;
+                }
+            }
+        }
+    };
+
+    // First handle shifts whose last destination is a tmp because these free up scratch registers.
+    // These end up last in the final sequence, so the final destination of these shifts will be
+    // available as a scratch location for anything emitted prior (so, after, since we're emitting in
+    // reverse).
+    for (auto& entry : shifts) {
+        Vector<ShufflePair>& shift = entry.value;
+        if (shift.last().dst().isTmp())
+            handleShift(shift);
+        commitResult();
+    }
+
+    // Now handle the rest of the shifts.
+    for (auto& entry : shifts) {
+        Vector<ShufflePair>& shift = entry.value;
+        if (!shift.last().dst().isTmp())
+            handleShift(shift);
+        commitResult();
+    }
+
+    for (Rotate& rotate : rotates) {
+        if (!rotate.fringe.isEmpty()) {
+            // Make sure we do the fringe first! This won't clobber any of the registers that are
+            // part of the rotation.
+            handleShift(rotate.fringe);
+        }
+        
+        bool canSwap = false;
+        Opcode swap = Oops;
+        Arg::Width swapWidth = Arg::Width8; // bogus value
+
+        // Currently, the swap instruction is not available for floating point on any architecture we
+        // support.
+        if (type == Arg::GP) {
+            // Figure out whether we will be doing 64-bit swaps or 32-bit swaps. If we have a mix of
+            // widths we handle that by fixing up the relevant register with zero-extends.
+            swap = Swap32;
+            swapWidth = Arg::Width32;
+            bool hasMemory = false;
+            bool hasIndex = false;
+            for (ShufflePair& pair : rotate.loop) {
+                switch (pair.width()) {
+                case Arg::Width32:
+                    break;
+                case Arg::Width64:
+                    swap = Swap64;
+                    swapWidth = Arg::Width64;
+                    break;
+                default:
+                    RELEASE_ASSERT_NOT_REACHED();
+                    break;
+                }
+
+                hasMemory |= pair.src().isMemory() || pair.dst().isMemory();
+                hasIndex |= pair.src().isIndex() || pair.dst().isIndex();
+            }
+            
+            canSwap = isValidForm(swap, Arg::Tmp, Arg::Tmp);
+
+            // We can totally use swaps even if there are shuffles involving memory. But, we play it
+            // safe in that case. There are corner cases we don't handle, and our ability to do it is
+            // contingent upon swap form availability.
+            
+            if (hasMemory) {
+                canSwap &= isValidForm(swap, Arg::Tmp, Arg::Addr);
+                
+                // We don't take the swapping path if there is a mix of widths and some of the
+                // shuffles involve memory. That gets too confusing. We might be able to relax this
+                // to only bail if there are subwidth pairs involving memory, but I haven't thought
+                // about it very hard. Anyway, this case is not common: rotates involving memory
+                // don't arise for function calls, and they will only happen for rotates in user code
+                // if some of the variables get spilled. It's hard to imagine a program that rotates
+                // data around in variables while also doing a combination of uint32->uint64 and
+                // int64->int32 casts.
+                for (ShufflePair& pair : rotate.loop)
+                    canSwap &= pair.width() == swapWidth;
+            }
+
+            if (hasIndex)
+                canSwap &= isValidForm(swap, Arg::Tmp, Arg::Index);
+        }
+
+        if (canSwap) {
+            for (unsigned i = rotate.loop.size() - 1; i--;) {
+                Arg left = rotate.loop[i].src();
+                Arg right = rotate.loop[i + 1].src();
+
+                if (left.isMemory() && right.isMemory()) {
+                    // Note that this is a super rare outcome. Rotates are rare. Spills are rare.
+                    // Moving data between two spills is rare. To get here a lot of rare stuff has to
+                    // all happen at once.
+                    
+                    Tmp scratch = getScratch(0, findPossibleScratch(type, left, right));
+                    RELEASE_ASSERT(scratch);
+                    result.append(Inst(moveForWidth(swapWidth), origin, left, scratch));
+                    result.append(Inst(swap, origin, scratch, right));
+                    result.append(Inst(moveForWidth(swapWidth), origin, scratch, left));
+                    returnScratch(0, scratch);
+                    continue;
+                }
+
+                if (left.isMemory())
+                    std::swap(left, right);
+                
+                result.append(Inst(swap, origin, left, right));
+            }
+
+            for (ShufflePair pair : rotate.loop) {
+                if (pair.width() == swapWidth)
+                    continue;
+
+                RELEASE_ASSERT(pair.width() == Arg::Width32);
+                RELEASE_ASSERT(swapWidth == Arg::Width64);
+                RELEASE_ASSERT(pair.dst().isTmp());
+
+                // Need to do an extra zero extension.
+                result.append(Inst(Move32, origin, pair.dst(), pair.dst()));
+            }
+        } else {
+            // We can treat this as a shift so long as we take the last destination (i.e. first
+            // source) and save it first. Then we handle the first entry in the pair in the rotate
+            // specially, after we restore the last destination. This requires some special care to
+            // find a scratch register. It's possible that we have a rotate that uses the entire
+            // available register file.
+
+            Tmp scratch = findPossibleScratch(
+                type,
+                [&] (Tmp tmp) -> bool {
+                    for (ShufflePair pair : rotate.loop) {
+                        if (pair.src().usesTmp(tmp))
+                            return false;
+                        if (pair.dst().usesTmp(tmp))
+                            return false;
+                    }
+                    return true;
+                });
+
+            // NOTE: This is the most likely use of scratch registers.
+            scratch = getScratch(0, scratch);
+
+            // We may not have found a scratch register. When this happens, we can just use the spill
+            // slot directly.
+            Arg rotateSave = scratch ? Arg(scratch) : scratches[0];
+            
+            handleShiftPair(
+                ShufflePair(rotate.loop.last().dst(), rotateSave, rotate.loop[0].width()), 1);
+
+            for (unsigned i = rotate.loop.size(); i-- > 1;)
+                handleShiftPair(rotate.loop[i], 1);
+
+            handleShiftPair(
+                ShufflePair(rotateSave, rotate.loop[0].dst(), rotate.loop[0].width()), 1);
+
+            if (scratch)
+                returnScratch(0, scratch);
+        }
+
+        commitResult();
+    }
+
+    ASSERT(result.isEmpty());
+
+    for (unsigned i = resultPacks.size(); i--;)
+        result.appendVector(resultPacks[i]);
+
+    return result;
+}
+
+Vector<Inst> emitShuffle(
+    const Vector<ShufflePair>& pairs,
+    const std::array<Arg, 2>& gpScratch, const std::array<Arg, 2>& fpScratch,
+    Value* origin)
+{
+    Vector<ShufflePair> gpPairs;
+    Vector<ShufflePair> fpPairs;
+    for (const ShufflePair& pair : pairs) {
+        if (pair.src().isMemory() && pair.dst().isMemory() && pair.width() > Arg::pointerWidth()) {
+            // 8-byte memory-to-memory moves on a 32-bit platform are best handled as float moves.
+            fpPairs.append(pair);
+        } else if (pair.src().isGP() && pair.dst().isGP()) {
+            // This means that gpPairs gets memory-to-memory shuffles. The assumption is that we
+            // can do that more efficiently using GPRs, except in the special case above.
+            gpPairs.append(pair);
+        } else
+            fpPairs.append(pair);
+    }
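+
+    // Illustrative example (a and b are hypothetical StackSlot*s): on a 32-bit target, a pair like
+    //     ShufflePair(Arg::stack(a), Arg::stack(b), Arg::Width64)
+    // lands in fpPairs and is emitted as a double move, while a GP-to-GP pair of either width
+    // stays in gpPairs.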
+
+    Vector<Inst> result;
+    result.appendVector(emitShuffle(gpPairs, gpScratch, Arg::GP, origin));
+    result.appendVector(emitShuffle(fpPairs, fpScratch, Arg::FP, origin));
+    return result;
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
diff --git a/Source/JavaScriptCore/b3/air/AirEmitShuffle.h b/Source/JavaScriptCore/b3/air/AirEmitShuffle.h
new file mode 100644 (file)
index 0000000..1273702
--- /dev/null
@@ -0,0 +1,115 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirEmitShuffle_h
+#define AirEmitShuffle_h
+
+#if ENABLE(B3_JIT)
+
+#include "AirArg.h"
+#include "AirInst.h"
+#include <wtf/Vector.h>
+
+namespace JSC { namespace B3 {
+
+class Value;
+
+namespace Air {
+
+class ShufflePair {
+public:
+    ShufflePair()
+    {
+    }
+    
+    ShufflePair(const Arg& src, const Arg& dst, Arg::Width width)
+        : m_src(src)
+        , m_dst(dst)
+        , m_width(width)
+    {
+    }
+
+    const Arg& src() const { return m_src; }
+    const Arg& dst() const { return m_dst; }
+
+    // The width determines the kind of move we do. You can only choose Width32 or Width64 right now.
+    // For GP, it picks between Move32 and Move. For FP, it picks between MoveFloat and MoveDouble.
+    Arg::Width width() const { return m_width; }
+
+    void dump(PrintStream&) const;
+    
+private:
+    Arg m_src;
+    Arg m_dst;
+    Arg::Width m_width { Arg::Width8 };
+};
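+
+// For example (a sketch; the choice of registers is arbitrary):
+//
+//     ShufflePair(Arg(Tmp(GPRInfo::regT0)), Arg(Tmp(GPRInfo::regT1)), Arg::Width64)
+//
+// describes moving 64 bits from regT0 to regT1 as one part of a shuffle.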
+
+// Perform a shuffle of a given type. The scratch argument is mandatory. You should pass it as
+// follows: If you know that you have scratch registers or temporaries available - that is, they're
+// registers that are not mentioned in the shuffle, have the same type as the shuffle, and are not
+// live at the shuffle - then you can pass them. If you don't have scratch registers available or if
+// you don't feel like looking for them, you can pass memory locations. It's always safe to pass a
+// pair of memory locations, and replacing either memory location with a register can be viewed as an
+// optimization. It's a pretty important optimization. Some more notes:
+//
+// - We define scratch registers as things that are not live before the shuffle and are not one of
+//   the destinations of the shuffle. Not being live before the shuffle also means that they cannot
+//   be used for any of the sources of the shuffle.
+//
+// - A second scratch location is only needed when you have shuffle pairs where memory is used both
+//   as source and destination.
+//
+// - You're guaranteed not to need any scratch locations if there is a Swap instruction available for
+//   the type and you don't have any memory locations that are both the source and the destination of
+//   some pairs. GP supports Swap on x86 while FP never supports Swap.
+//
+// - Passing memory locations as scratch when running emitShuffle() before register allocation is
+//   silly, since that will cause emitShuffle() to pick some specific registers when it does need
+//   scratch. One easy way to avoid that predicament is to ensure that you call emitShuffle() after
+//   register allocation. This is one reason we have the Shuffle instruction: it lets us defer the
+//   shuffling until after regalloc.
+//
+// - Shuffles with memory=>memory pairs are not very well tuned. You should avoid them if you want
+//   performance. If you need to do them, then making sure that you reserve a temporary is one way to
+//   get acceptable performance.
+//
+// NOTE: Use this method (and its friend below) to emit shuffles after register allocation. Before
+// register allocation it is much better to simply use the Shuffle instruction.
+Vector<Inst> emitShuffle(
+    Vector<ShufflePair>, std::array<Arg, 2> scratch, Arg::Type, Value* origin);
+
+// Perform a shuffle that involves any number of types. Pass scratch registers or memory locations
+// for each type according to the rules above.
+Vector<Inst> emitShuffle(
+    const Vector<ShufflePair>&,
+    const std::array<Arg, 2>& gpScratch, const std::array<Arg, 2>& fpScratch,
+    Value* origin);
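+
+// Usage sketch (hypothetical caller; assumes regT0 and regT1 are not mentioned in the pairs and
+// are dead at the shuffle):
+//
+//     Vector<ShufflePair> pairs;
+//     pairs.append(ShufflePair(src, dst, Arg::Width64));
+//     std::array<Arg, 2> scratch { { Arg(Tmp(GPRInfo::regT0)), Arg(Tmp(GPRInfo::regT1)) } };
+//     insts.appendVector(emitShuffle(pairs, scratch, Arg::GP, origin));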
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirEmitShuffle_h
+
index 07a3c10..8e33c7f 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -35,6 +35,8 @@
 #include "AirGenerationContext.h"
 #include "AirHandleCalleeSaves.h"
 #include "AirIteratedRegisterCoalescing.h"
+#include "AirLowerAfterRegAlloc.h"
+#include "AirLowerMacros.h"
 #include "AirOpcodeUtils.h"
 #include "AirOptimizeBlockOrder.h"
 #include "AirReportUsedRegisters.h"
@@ -65,6 +67,8 @@ void prepareForGeneration(Code& code)
         dataLog(code);
     }
 
+    lowerMacros(code);
+
     // This is where we run our optimizations and transformations.
     // FIXME: Add Air optimizations.
     // https://bugs.webkit.org/show_bug.cgi?id=150456
@@ -80,6 +84,8 @@ void prepareForGeneration(Code& code)
     else
         iteratedRegisterCoalescing(code);
 
+    lowerAfterRegAlloc(code);
+
     // Prior to this point the prologue and epilogue is implicit. This makes it explicit. It also
     // does things like identify which callee-saves we're using and saves them.
     handleCalleeSaves(code);
index 71b7a88..e4c2aee 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -38,11 +38,11 @@ class Code;
 
 // This takes an Air::Code that hasn't had any stack allocation and optionally hasn't had any
 // register allocation and does both of those things.
-void prepareForGeneration(Code&);
+JS_EXPORT_PRIVATE void prepareForGeneration(Code&);
 
 // This generates the code using the given CCallHelpers instance. Note that this may call callbacks
 // in the supplied code as it is generating.
-void generate(Code&, CCallHelpers&);
+JS_EXPORT_PRIVATE void generate(Code&, CCallHelpers&);
 
 } } } // namespace JSC::B3::Air
 
index 4533417..90891e3 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 
 namespace JSC { namespace B3 { namespace Air {
 
+void InsertionSet::insertInsts(size_t index, const Vector<Inst>& insts)
+{
+    for (const Inst& inst : insts)
+        insertInst(index, inst);
+}
+
+void InsertionSet::insertInsts(size_t index, Vector<Inst>&& insts)
+{
+    for (Inst& inst : insts)
+        insertInst(index, WTFMove(inst));
+}
+
 void InsertionSet::execute(BasicBlock* block)
 {
     bubbleSort(m_insertions.begin(), m_insertions.end());
index 3c74659..cabafb7 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -59,6 +59,9 @@ public:
     {
         appendInsertion(Insertion(index, std::forward<Inst>(inst)));
     }
+
+    void insertInsts(size_t index, const Vector<Inst>&);
+    void insertInsts(size_t index, Vector<Inst>&&);
     
     template<typename... Arguments>
     void insert(size_t index, Arguments&&... arguments)
index 1472fb6..797c292 100644 (file)
@@ -85,6 +85,15 @@ public:
 
     explicit operator bool() const { return origin || opcode != Nop || args.size(); }
 
+    void append() { }
+    
+    template<typename... Arguments>
+    void append(Arg arg, Arguments... arguments)
+    {
+        args.append(arg);
+        append(arguments...);
+    }
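+
+    // For example, a Shuffle's src/dst/width triplets can be accumulated one at a time
+    // (a sketch; src and dst are hypothetical Args):
+    //
+    //     inst.append(src, dst, Arg::widthArg(Arg::Width64));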
+
     // Note that these functors all avoid using "const" because we want to use them for things that
     // edit IR. IR is meant to be edited; if you're carrying around a "const Inst&" then you're
     // probably doing it wrong.
diff --git a/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp b/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp
new file mode 100644 (file)
index 0000000..a9bbcb0
--- /dev/null
@@ -0,0 +1,242 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirLowerAfterRegAlloc.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCCallingConvention.h"
+#include "AirCode.h"
+#include "AirEmitShuffle.h"
+#include "AirInsertionSet.h"
+#include "AirInstInlines.h"
+#include "AirLiveness.h"
+#include "AirPhaseScope.h"
+#include "AirRegisterPriority.h"
+#include "B3CCallValue.h"
+#include "B3ValueInlines.h"
+#include "RegisterSet.h"
+#include <wtf/HashMap.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool verbose = false;
+    
+} // anonymous namespace
+
+void lowerAfterRegAlloc(Code& code)
+{
+    PhaseScope phaseScope(code, "lowerAfterRegAlloc");
+
+    if (verbose)
+        dataLog("Code before lowerAfterRegAlloc:\n", code);
+
+    HashMap<Inst*, RegisterSet> usedRegisters;
+
+    RegLiveness liveness(code);
+    for (BasicBlock* block : code) {
+        RegLiveness::LocalCalc localCalc(liveness, block);
+
+        for (unsigned instIndex = block->size(); instIndex--;) {
+            Inst& inst = block->at(instIndex);
+            
+            RegisterSet set;
+
+            bool isRelevant = inst.opcode == Shuffle || inst.opcode == ColdCCall;
+            
+            if (isRelevant) {
+                for (Reg reg : localCalc.live())
+                    set.set(reg);
+            }
+            
+            localCalc.execute(instIndex);
+
+            if (isRelevant)
+                usedRegisters.add(&inst, set);
+        }
+    }
+
+    auto getScratches = [&] (RegisterSet set, Arg::Type type) -> std::array<Arg, 2> {
+        std::array<Arg, 2> result;
+        for (unsigned i = 0; i < 2; ++i) {
+            bool found = false;
+            for (Reg reg : regsInPriorityOrder(type)) {
+                if (!set.get(reg)) {
+                    result[i] = Tmp(reg);
+                    set.set(reg);
+                    found = true;
+                    break;
+                }
+            }
+            if (!found) {
+                result[i] = Arg::stack(
+                    code.addStackSlot(
+                        Arg::bytes(Arg::conservativeWidth(type)),
+                        StackSlotKind::Anonymous));
+            }
+        }
+        return result;
+    };
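+
+    // Note that if every register of the requested type is already taken, both scratches fall
+    // back to fresh anonymous stack slots, which emitShuffle() accepts.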
+
+    // Now transform the code.
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : code) {
+        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
+            Inst& inst = block->at(instIndex);
+
+            switch (inst.opcode) {
+            case Shuffle: {
+                RegisterSet set = usedRegisters.get(&inst);
+                Vector<ShufflePair> pairs;
+                for (unsigned i = 0; i < inst.args.size(); i += 3) {
+                    Arg src = inst.args[i + 0];
+                    Arg dst = inst.args[i + 1];
+                    Arg::Width width = inst.args[i + 2].width();
+
+                    // The used register set contains the registers that are live after the
+                    // shuffle. But emitShuffle() wants a scratch register that is not just dead
+                    // but also does not interfere with the sources or destinations.
+                    auto excludeRegisters = [&] (Tmp tmp) {
+                        if (tmp.isReg())
+                            set.set(tmp.reg());
+                    };
+                    src.forEachTmpFast(excludeRegisters);
+                    dst.forEachTmpFast(excludeRegisters);
+                    
+                    pairs.append(ShufflePair(src, dst, width));
+                }
+                std::array<Arg, 2> gpScratch = getScratches(set, Arg::GP);
+                std::array<Arg, 2> fpScratch = getScratches(set, Arg::FP);
+                insertionSet.insertInsts(
+                    instIndex, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+                inst = Inst();
+                break;
+            }
+
+            case ColdCCall: {
+                CCallValue* value = inst.origin->as<CCallValue>();
+
+                RegisterSet liveRegs = usedRegisters.get(&inst);
+                RegisterSet regsToSave = liveRegs;
+                regsToSave.exclude(RegisterSet::calleeSaveRegisters());
+                regsToSave.exclude(RegisterSet::stackRegisters());
+                regsToSave.exclude(RegisterSet::reservedHardwareRegisters());
+
+                RegisterSet preUsed = regsToSave;
+                Vector<Arg> destinations = computeCCallingConvention(code, value);
+                Tmp result = cCallResult(value->type());
+                Arg originalResult = result ? inst.args[1] : Arg();
+                
+                Vector<ShufflePair> pairs;
+                for (unsigned i = 0; i < destinations.size(); ++i) {
+                    Value* child = value->child(i);
+                    Arg src = inst.args[result ? (i >= 1 ? i + 1 : i) : i];
+                    Arg dst = destinations[i];
+                    Arg::Width width = Arg::widthForB3Type(child->type());
+                    pairs.append(ShufflePair(src, dst, width));
+
+                    auto excludeRegisters = [&] (Tmp tmp) {
+                        if (tmp.isReg())
+                            preUsed.set(tmp.reg());
+                    };
+                    src.forEachTmpFast(excludeRegisters);
+                    dst.forEachTmpFast(excludeRegisters);
+                }
+
+                std::array<Arg, 2> gpScratch = getScratches(preUsed, Arg::GP);
+                std::array<Arg, 2> fpScratch = getScratches(preUsed, Arg::FP);
+                
+                // We also need to save all of the live registers, though we don't need to worry
+                // about the result register.
+                if (originalResult.isReg())
+                    regsToSave.clear(originalResult.reg());
+                Vector<StackSlot*> stackSlots;
+                regsToSave.forEach(
+                    [&] (Reg reg) {
+                        Tmp tmp(reg);
+                        Arg arg(tmp);
+                        Arg::Width width = Arg::conservativeWidth(arg.type());
+                        StackSlot* stackSlot =
+                            code.addStackSlot(Arg::bytes(width), StackSlotKind::Anonymous);
+                        pairs.append(ShufflePair(arg, Arg::stack(stackSlot), width));
+                        stackSlots.append(stackSlot);
+                    });
+
+                if (verbose)
+                    dataLog("Pre-call pairs for ", inst, ": ", listDump(pairs), "\n");
+                
+                insertionSet.insertInsts(
+                    instIndex, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+
+                inst = buildCCall(code, inst.origin, destinations);
+
+                // Now we need to emit code to restore registers.
+                pairs.resize(0);
+                unsigned stackSlotIndex = 0;
+                regsToSave.forEach(
+                    [&] (Reg reg) {
+                        Tmp tmp(reg);
+                        Arg arg(tmp);
+                        Arg::Width width = Arg::conservativeWidth(arg.type());
+                        StackSlot* stackSlot = stackSlots[stackSlotIndex++];
+                        pairs.append(ShufflePair(Arg::stack(stackSlot), arg, width));
+                    });
+                if (result) {
+                    ShufflePair pair(result, originalResult, Arg::widthForB3Type(value->type()));
+                    pairs.append(pair);
+                }
+
+                gpScratch = getScratches(liveRegs, Arg::GP);
+                fpScratch = getScratches(liveRegs, Arg::FP);
+                
+                insertionSet.insertInsts(
+                    instIndex + 1, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+                break;
+            }
+
+            default:
+                break;
+            }
+        }
+
+        insertionSet.execute(block);
+
+        block->insts().removeAllMatching(
+            [&] (Inst& inst) -> bool {
+                return !inst;
+            });
+    }
+
+    if (verbose)
+        dataLog("Code after lowerAfterRegAlloc:\n", code);
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
diff --git a/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h b/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h
new file mode 100644 (file)
index 0000000..ae29b74
--- /dev/null
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirLowerAfterRegAlloc_h
+#define AirLowerAfterRegAlloc_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// This lowers Shuffle and ColdCCall instructions. This phase is designed to be run after register
+// allocation.
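+//
+// For example, lowering a ColdCCall emits a Shuffle before the call (argument marshalling plus
+// saves of the live volatile registers to anonymous stack slots) and a second Shuffle after it
+// (restores of those registers plus placement of the result, if any).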
+
+void lowerAfterRegAlloc(Code&);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirLowerAfterRegAlloc_h
diff --git a/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp b/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp
new file mode 100644 (file)
index 0000000..12bd89d
--- /dev/null
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirLowerMacros.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirCCallingConvention.h"
+#include "AirCode.h"
+#include "AirInsertionSet.h"
+#include "AirInstInlines.h"
+#include "AirPhaseScope.h"
+#include "B3CCallValue.h"
+#include "B3ValueInlines.h"
+
+namespace JSC { namespace B3 { namespace Air {
+
+void lowerMacros(Code& code)
+{
+    PhaseScope phaseScope(code, "lowerMacros");
+
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : code) {
+        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
+            Inst& inst = block->at(instIndex);
+
+            switch (inst.opcode) {
+            case CCall: {
+                CCallValue* value = inst.origin->as<CCallValue>();
+
+                Vector<Arg> destinations = computeCCallingConvention(code, value);
+
+                Inst shuffleArguments(Shuffle, value);
+                unsigned offset = value->type() == Void ? 0 : 1;
+                for (unsigned i = 1; i < destinations.size(); ++i) {
+                    Value* child = value->child(i);
+                    shuffleArguments.args.append(inst.args[offset + i]);
+                    shuffleArguments.args.append(destinations[i]);
+                    shuffleArguments.args.append(Arg::widthArg(Arg::widthForB3Type(child->type())));
+                }
+                insertionSet.insertInst(instIndex, WTFMove(shuffleArguments));
+
+                // Indicate that we're using our original callee argument.
+                destinations[0] = inst.args[0];
+
+                // Save where the original instruction put its result.
+                Arg resultDst = value->type() == Void ? Arg() : inst.args[1];
+                
+                inst = buildCCall(code, inst.origin, destinations);
+
+                Tmp result = cCallResult(value->type());
+                switch (value->type()) {
+                case Void:
+                    break;
+                case Float:
+                    insertionSet.insert(instIndex + 1, MoveFloat, value, result, resultDst);
+                    break;
+                case Double:
+                    insertionSet.insert(instIndex + 1, MoveDouble, value, result, resultDst);
+                    break;
+                case Int32:
+                    insertionSet.insert(instIndex + 1, Move32, value, result, resultDst);
+                    break;
+                case Int64:
+                    insertionSet.insert(instIndex + 1, Move, value, result, resultDst);
+                    break;
+                }
+                break;
+            }
+
+            default:
+                break;
+            }
+        }
+        insertionSet.execute(block);
+    }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
diff --git a/Source/JavaScriptCore/b3/air/AirLowerMacros.h b/Source/JavaScriptCore/b3/air/AirLowerMacros.h
new file mode 100644 (file)
index 0000000..fa2d11d
--- /dev/null
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirLowerMacros_h
+#define AirLowerMacros_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// Air has some opcodes that are very high-level and are meant to reduce the amount of low-level
+// knowledge in the B3->Air lowering. The current example is CCall.
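+//
+// For example, lowering a CCall emits a Shuffle that marshals the arguments into their
+// calling-convention destinations, then the call itself, then a move of the result (if any) back
+// to where the original CCall put it.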
+
+void lowerMacros(Code&);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirLowerMacros_h
+
index 5e7ca07..6010b2a 100644 (file)
@@ -417,6 +417,14 @@ Move U:G:Ptr, D:G:Ptr
     Tmp, Index as storePtr
     x86: Imm, Addr as storePtr
 
+x86: Swap32 UD:G:32, UD:G:32
+    Tmp, Tmp
+    Tmp, Addr
+
+x86_64: Swap64 UD:G:64, UD:G:64
+    Tmp, Tmp
+    Tmp, Addr
+
 Move32 U:G:32, ZD:G:32
     Tmp, Tmp as zeroExtend32ToPtr
     Addr, Tmp as load32
@@ -682,7 +690,19 @@ RetDouble U:F:64 /return
 
 Oops /terminal
 
+# A Shuffle is a multi-source, multi-destination move that performs all of its moves simultaneously.
+# The moves are specified as triplets of src, dst, and width. For example, you can request a swap this
+# way:
+#     Shuffle %tmp1, %tmp2, 64, %tmp2, %tmp1, 64
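+# Multiple pairs may name the same source; for example, this broadcasts %tmp1 into two other
+# registers:
+#     Shuffle %tmp1, %tmp2, 32, %tmp1, %tmp3, 32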
+custom Shuffle
+
 # Air allows for exotic behavior. A Patch's behavior is determined entirely by the Special operand,
 # which must be the first operand.
 custom Patch
 
+# Instructions used for lowering C calls. These don't make it to Air generation. They get lowered to
+# something else first. The origin Value must be a CCallValue.
+custom CCall
+custom ColdCCall
+
+
index c46314d..f0ef8a0 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -52,7 +52,7 @@ const Vector<typename Bank::RegisterType>& regsInPriorityOrder()
     return RegistersInPriorityOrder<Bank>::inPriorityOrder();
 }
 
-const Vector<Reg>& regsInPriorityOrder(Arg::Type);
+JS_EXPORT_PRIVATE const Vector<Reg>& regsInPriorityOrder(Arg::Type);
 
 } } } // namespace JSC::B3::Air
 
diff --git a/Source/JavaScriptCore/b3/air/testair.cpp b/Source/JavaScriptCore/b3/air/testair.cpp
new file mode 100644 (file)
index 0000000..e261e00
--- /dev/null
@@ -0,0 +1,1707 @@
+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+
+#include "AirCode.h"
+#include "AirGenerate.h"
+#include "AirInstInlines.h"
+#include "AirRegisterPriority.h"
+#include "AllowMacroScratchRegisterUsage.h"
+#include "B3Compilation.h"
+#include "B3Procedure.h"
+#include "CCallHelpers.h"
+#include "InitializeThreading.h"
+#include "JSCInlines.h"
+#include "LinkBuffer.h"
+#include "PureNaN.h"
+#include "VM.h"
+#include <cmath>
+#include <map>
+#include <string>
+#include <wtf/Lock.h>
+#include <wtf/NumberOfCores.h>
+#include <wtf/Threading.h>
+
+// We don't have a NO_RETURN_DUE_TO_EXIT, nor should we. That's ridiculous.
+static bool hiddenTruthBecauseNoReturnIsStupid() { return true; }
+
+static void usage()
+{
+    dataLog("Usage: testb3 [<filter>]\n");
+    if (hiddenTruthBecauseNoReturnIsStupid())
+        exit(1);
+}
+
+#if ENABLE(B3_JIT)
+
+using namespace JSC;
+using namespace JSC::B3::Air;
+
+namespace {
+
+StaticLock crashLock;
+
+// Nothing fancy for now; we just use the existing WTF assertion machinery.
+#define CHECK(x) do {                                                   \
+        if (!!(x))                                                      \
+            break;                                                      \
+        crashLock.lock();                                               \
+        WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, #x); \
+        CRASH();                                                        \
+    } while (false)
+
+VM* vm;
+
+std::unique_ptr<B3::Compilation> compile(B3::Procedure& proc)
+{
+    prepareForGeneration(proc.code());
+    CCallHelpers jit(vm);
+    generate(proc.code(), jit);
+    LinkBuffer linkBuffer(*vm, jit, nullptr);
+
+    return std::make_unique<B3::Compilation>(
+        FINALIZE_CODE(linkBuffer, ("testair compilation")), proc.releaseByproducts());
+}
+
+template<typename T, typename... Arguments>
+T invoke(const B3::Compilation& code, Arguments... arguments)
+{
+    T (*function)(Arguments...) = bitwise_cast<T(*)(Arguments...)>(code.code().executableAddress());
+    return function(arguments...);
+}
+
+template<typename T, typename... Arguments>
+T compileAndRun(B3::Procedure& procedure, Arguments... arguments)
+{
+    return invoke<T>(*compile(procedure), arguments...);
+}
+
+void testSimple()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root->append(Move, nullptr, Arg::imm(42), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(compileAndRun<int>(proc) == 42);
+}
+
+// Use this to put a constant into a register without Air being able to see the constant.
+template<typename T>
+void loadConstantImpl(BasicBlock* block, T value, B3::Air::Opcode move, Tmp tmp, Tmp scratch)
+{
+    static StaticLock lock;
+    static std::map<T, T*>* map; // I'm not messing with HashMap's problems with integers.
+
+    LockHolder locker(lock);
+    if (!map)
+        map = new std::map<T, T*>();
+
+    if (!map->count(value))
+        (*map)[value] = new T(value);
+
+    T* ptr = (*map)[value];
+    block->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(ptr)), scratch);
+    block->append(move, nullptr, Arg::addr(scratch), tmp);
+}
+
+void loadConstant(BasicBlock* block, intptr_t value, Tmp tmp)
+{
+    loadConstantImpl<intptr_t>(block, value, Move, tmp, tmp);
+}
+
+void loadDoubleConstant(BasicBlock* block, double value, Tmp tmp, Tmp scratch)
+{
+    loadConstantImpl<double>(block, value, MoveDouble, tmp, scratch);
+}
+
+void testShuffleSimpleSwap()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 4);
+    CHECK(things[3] == 3);
+}
+
+void testShuffleSimpleShift()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32));
+
+    int32_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+}
+
+void testShuffleLongShift()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 7);
+}
+
+void testShuffleLongShiftBackwards()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 7);
+}
+
+void testShuffleSimpleRotate()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 4);
+}
+
+void testShuffleSimpleBroadcast()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 1);
+}
+
+void testShuffleBroadcastAllRegs()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    const Vector<Reg>& regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    root->append(Move, nullptr, Arg::imm(35), Tmp(GPRInfo::regT0));
+    unsigned count = 1;
+    for (Reg reg : regs) {
+        if (reg != Reg(GPRInfo::regT0))
+            loadConstant(root, count++, Tmp(reg));
+    }
+    Inst& shuffle = root->append(Shuffle, nullptr);
+    for (Reg reg : regs) {
+        if (reg != Reg(GPRInfo::regT0))
+            shuffle.append(Tmp(GPRInfo::regT0), Tmp(reg), Arg::widthArg(Arg::Width32));
+    }
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i < regs.size(); ++i)
+        root->append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector<int32_t> things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), base);
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    for (int32_t thing : things)
+        CHECK(thing == 35);
+}
+
+void testShuffleTreeShift()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 2);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+    CHECK(things[6] == 3);
+    CHECK(things[7] == 4);
+}
+
+void testShuffleTreeShiftBackward()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 2);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+    CHECK(things[6] == 3);
+    CHECK(things[7] == 4);
+}
+
+void testShuffleTreeShiftOtherBackward()
+{
+    // NOTE: This test was my original attempt at TreeShiftBackward but mistakes were made. So, this
+    // ends up being just a weird test. But weird tests are useful, so I kept it.
+    
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT7), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT7), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 8);
+    CHECK(things[2] == 8);
+    CHECK(things[3] == 7);
+    CHECK(things[4] == 7);
+    CHECK(things[5] == 6);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 5);
+}
+
+void testShuffleMultipleShifts()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 3);
+    CHECK(things[5] == 1);
+}
+
+void testShuffleRotateWithFringe()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
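+    // A rotation regT0 => regT1 => regT2 => regT0, plus fringe shifts that copy the old
+    // regT0, regT1, regT2 into regT3, regT4, regT5.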
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 1);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+}
+
+void testShuffleRotateWithLongFringe()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
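+    // The same rotation, but the fringe is a single long shift: regT0 => regT3 => regT4 => regT5.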
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 1);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleMultipleRotates()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
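+    // Two disjoint rotations: (regT0, regT1, regT2) and (regT3, regT4, regT5).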
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 6);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleShiftAndRotate()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
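+    // A rotation over (regT0, regT1, regT2) combined with an unrelated shift regT3 => regT4 => regT5.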
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 4);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleShiftAllRegs()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    const Vector<Reg>& regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, 35 + i, Tmp(regs[i]));
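+    // Build one long shift chain covering every GP register: regs[0] => regs[1] => ... => regs.last().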
+    Inst& shuffle = root->append(Shuffle, nullptr);
+    for (unsigned i = 1; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i < regs.size(); ++i)
+        root->append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector<int32_t> things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), base);
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 35);
+    for (unsigned i = 1; i < regs.size(); ++i)
+        CHECK(things[i] == 35 + static_cast<int32_t>(i) - 1);
+}
+
+void testShuffleRotateAllRegs()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    const Vector<Reg>& regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, 35 + i, Tmp(regs[i]));
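+    // Same chain as testShuffleShiftAllRegs, but the regs.last() => regs[0] pair below closes it
+    // into a full rotation.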
+    Inst& shuffle = root->append(Shuffle, nullptr);
+    for (unsigned i = 1; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+    shuffle.append(Tmp(regs.last()), Tmp(regs[0]), Arg::widthArg(Arg::Width32));
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i < regs.size(); ++i)
+        root->append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector<int32_t> things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), base);
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 35 + static_cast<int32_t>(regs.size()) - 1);
+    for (unsigned i = 1; i < regs.size(); ++i)
+        CHECK(things[i] == 35 + static_cast<int32_t>(i) - 1);
+}
+
+void testShuffleSimpleSwap64()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width64));
+
+    int64_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 40000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+}
+
+void testShuffleSimpleShift64()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    loadConstant(root, 50000000000000000ll, Tmp(GPRInfo::regT4));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width64));
+
+    int64_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 30000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+    CHECK(things[4] == 40000000000000000ll);
+}
+
+void testShuffleSwapMixedWidth()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
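+    // A swap in which the regT2 => regT3 half is only 32 bits wide, so regT3 receives just the
+    // low 32 bits of regT2's old value, zero-extended per ZDef semantics.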
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width64));
+
+    int64_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 40000000000000000ll);
+    CHECK(things[3] == static_cast<uint32_t>(30000000000000000ll));
+}
+
+void testShuffleShiftMixedWidth()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    loadConstant(root, 50000000000000000ll, Tmp(GPRInfo::regT4));
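+    // A shift whose regT3 => regT4 pair is 32 bits wide, so regT4 receives only the low 32 bits
+    // of regT3's old value.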
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32));
+
+    int64_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 30000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+    CHECK(things[4] == static_cast<uint32_t>(40000000000000000ll));
+}
+
+void testShuffleShiftMemory()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT2));
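+    // Shuffle pairs may name memory locations: alongside regT0 => regT1, this shifts memory[0]
+    // into memory[1].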
+    root->append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32));
+
+    int32_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(memory[0] == 35);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryLong()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT3));
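+    // One shift chain that threads registers through memory:
+    // regT0 => regT1 => memory[0] => memory[1] => regT2.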
+    root->append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT3), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT3), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT3), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT3), 1 * sizeof(int32_t)), Tmp(GPRInfo::regT2),
+        Arg::widthArg(Arg::Width32));
+
+    int32_t things[3];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 36);
+    CHECK(memory[0] == 2);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryAllRegs()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    Vector<Reg> regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, i + 1, Tmp(regs[i]));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT0));
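+    // A shift that starts at regs[0], passes through both memory slots, and then runs down the
+    // remaining registers.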
+    Inst& shuffle = root->append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int32_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+
+    Vector<int32_t> things(regs.size(), 666);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(
+            Move32, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int32_t)));
+    }
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 36);
+    for (unsigned i = 2; i < regs.size(); ++i)
+        CHECK(things[i] == static_cast<int32_t>(i));
+    CHECK(memory[0] == 1);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryAllRegs64()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector<Reg> regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT0));
+    Inst& shuffle = root->append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width64));
+
+    for (unsigned i = 2; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector<int64_t> things(regs.size(), 666);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1000000000000ll);
+    CHECK(things[1] == 36000000000000ll);
+    for (unsigned i = 2; i < regs.size(); ++i)
+        CHECK(things[i] == static_cast<int64_t>(i) * 1000000000000ll);
+    CHECK(memory[0] == 1000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
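+// Returns 'high' with its low 32 bits replaced by the low 32 bits of 'low'. The union overlay
+// relies on the little-endian layout of the targets these tests run on.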
+int64_t combineHiLo(int64_t high, int64_t low)
+{
+    union {
+        int64_t value;
+        int32_t halves[2];
+    } u;
+    u.value = high;
+    u.halves[0] = static_cast<int32_t>(low);
+    return u.value;
+}
+
+void testShuffleShiftMemoryAllRegsMixedWidth()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector<Reg> regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT0));
+    Inst& shuffle = root->append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i < regs.size(); ++i) {
+        shuffle.append(
+            Tmp(regs[i - 1]), Tmp(regs[i]),
+            (i & 1) ? Arg::widthArg(Arg::Width32) : Arg::widthArg(Arg::Width64));
+    }
+
+    Vector<int64_t> things(regs.size(), 666);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1000000000000ll);
+    CHECK(things[1] == static_cast<uint32_t>(36000000000000ll));
+    for (unsigned i = 2; i < regs.size(); ++i) {
+        int64_t value = static_cast<int64_t>(i) * 1000000000000ll;
+        CHECK(things[i] == ((i & 1) ? static_cast<uint32_t>(value) : value));
+    }
+    CHECK(memory[0] == combineHiLo(35000000000000ll, 1000000000000ll));
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemory()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT2));
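+    // A rotation that passes through memory: regT0 => regT1 => memory[0] => memory[1] => regT0.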
+    root->append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width32));
+
+    int32_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root->append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 36);
+    CHECK(things[1] == 1);
+    CHECK(memory[0] == 2);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleRotateMemory64()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2000000000000ll, Tmp(GPRInfo::regT1));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT2));
+    root->append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width64),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width64));
+
+    int64_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 36000000000000ll);
+    CHECK(things[1] == 1000000000000ll);
+    CHECK(memory[0] == 2000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemoryMixedWidth()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2000000000000ll, Tmp(GPRInfo::regT1));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT2));
+    root->append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width64));
+
+    int64_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root->append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 36000000000000ll);
+    CHECK(things[1] == static_cast<uint32_t>(1000000000000ll));
+    CHECK(memory[0] == 2000000000000ll);
+    CHECK(memory[1] == combineHiLo(36000000000000ll, 35000000000000ll));
+}
+
+void testShuffleRotateMemoryAllRegs64()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector<Reg> regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT0));
+    Inst& shuffle = root->append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width64),
+
+        Tmp(regs.last()), Tmp(regs[0]), Arg::widthArg(Arg::Width64));
+
+    for (unsigned i = 2; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector<int64_t> things(regs.size(), 666);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == static_cast<int64_t>(regs.size()) * 1000000000000ll);
+    CHECK(things[1] == 36000000000000ll);
+    for (unsigned i = 2; i < regs.size(); ++i)
+        CHECK(things[i] == static_cast<int64_t>(i) * 1000000000000ll);
+    CHECK(memory[0] == 1000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemoryAllRegsMixedWidth()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector<Reg> regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i < regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root->append(Move, nullptr, Arg::immPtr(&memory), Tmp(GPRInfo::regT0));
+    Inst& shuffle = root->append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32),
+
+        Tmp(regs.last()), Tmp(regs[0]), Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i < regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector<int64_t> things(regs.size(), 666);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i < regs.size(); ++i) {
+        root->append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == static_cast<uint32_t>(static_cast<int64_t>(regs.size()) * 1000000000000ll));
+    CHECK(things[1] == static_cast<uint32_t>(36000000000000ll));
+    for (unsigned i = 2; i < regs.size(); ++i)
+        CHECK(things[i] == static_cast<int64_t>(i) * 1000000000000ll);
+    CHECK(memory[0] == combineHiLo(35000000000000ll, 1000000000000ll));
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleSwapDouble()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadDoubleConstant(root, 1, Tmp(FPRInfo::fpRegT0), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 2, Tmp(FPRInfo::fpRegT1), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 3, Tmp(FPRInfo::fpRegT2), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 4, Tmp(FPRInfo::fpRegT3), Tmp(GPRInfo::regT0));
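+    // Shuffle works on FP tmps as well: this swaps fpRegT2 and fpRegT3 as 64-bit values.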
+    root->append(
+        Shuffle, nullptr,
+        Tmp(FPRInfo::fpRegT2), Tmp(FPRInfo::fpRegT3), Arg::widthArg(Arg::Width64),
+        Tmp(FPRInfo::fpRegT3), Tmp(FPRInfo::fpRegT2), Arg::widthArg(Arg::Width64));
+
+    double things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT0), Arg::addr(base, 0 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT1), Arg::addr(base, 1 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT2), Arg::addr(base, 2 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT3), Arg::addr(base, 3 * sizeof(double)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 4);
+    CHECK(things[3] == 3);
+}
+
+void testShuffleShiftDouble()
+{
+    B3::Procedure proc;
+    Code& code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadDoubleConstant(root, 1, Tmp(FPRInfo::fpRegT0), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 2, Tmp(FPRInfo::fpRegT1), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 3, Tmp(FPRInfo::fpRegT2), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 4, Tmp(FPRInfo::fpRegT3), Tmp(GPRInfo::regT0));
+    root->append(
+        Shuffle, nullptr,
+        Tmp(FPRInfo::fpRegT2), Tmp(FPRInfo::fpRegT3), Arg::widthArg(Arg::Width64));
+
+    double things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root->append(Move, nullptr, Arg::imm64(bitwise_cast<intptr_t>(&things)), base);
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT0), Arg::addr(base, 0 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT1), Arg::addr(base, 1 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT2), Arg::addr(base, 2 * sizeof(double)));
+    root->append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT3), Arg::addr(base, 3 * sizeof(double)));
+    root->append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root->append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun<int>(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+}
+
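+// Queues a test as a shared task so that the worker threads spawned in run() can execute tests in
+// parallel. The do/while (false) wrapper lets 'break' skip tests rejected by the filter.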
+#define RUN(test) do {                          \
+        if (!shouldRun(#test))                  \
+            break;                              \
+        tasks.append(                           \
+            createSharedTask<void()>(           \
+                [&] () {                        \
+                    dataLog(#test "...\n");     \
+                    test;                       \
+                    dataLog(#test ": OK!\n");   \
+                }));                            \
+    } while (false);
+
+void run(const char* filter)
+{
+    JSC::initializeThreading();
+    vm = &VM::create(LargeHeap).leakRef();
+
+    Deque<RefPtr<SharedTask<void()>>> tasks;
+
+    auto shouldRun = [&] (const char* testName) -> bool {
+        return !filter || !!strcasestr(testName, filter);
+    };
+
+    RUN(testSimple());
+    
+    RUN(testShuffleSimpleSwap());
+    RUN(testShuffleSimpleShift());
+    RUN(testShuffleLongShift());
+    RUN(testShuffleLongShiftBackwards());
+    RUN(testShuffleSimpleRotate());
+    RUN(testShuffleSimpleBroadcast());
+    RUN(testShuffleBroadcastAllRegs());
+    RUN(testShuffleTreeShift());
+    RUN(testShuffleTreeShiftBackward());
+    RUN(testShuffleTreeShiftOtherBackward());
+    RUN(testShuffleMultipleShifts());
+    RUN(testShuffleRotateWithFringe());
+    RUN(testShuffleRotateWithLongFringe());
+    RUN(testShuffleMultipleRotates());
+    RUN(testShuffleShiftAndRotate());
+    RUN(testShuffleShiftAllRegs());
+    RUN(testShuffleRotateAllRegs());
+    RUN(testShuffleSimpleSwap64());
+    RUN(testShuffleSimpleShift64());
+    RUN(testShuffleSwapMixedWidth());
+    RUN(testShuffleShiftMixedWidth());
+    RUN(testShuffleShiftMemory());
+    RUN(testShuffleShiftMemoryLong());
+    RUN(testShuffleShiftMemoryAllRegs());
+    RUN(testShuffleShiftMemoryAllRegs64());
+    RUN(testShuffleShiftMemoryAllRegsMixedWidth());
+    RUN(testShuffleRotateMemory());
+    RUN(testShuffleRotateMemory64());
+    RUN(testShuffleRotateMemoryMixedWidth());
+    RUN(testShuffleRotateMemoryAllRegs64());
+    RUN(testShuffleRotateMemoryAllRegsMixedWidth());
+    RUN(testShuffleSwapDouble());
+    RUN(testShuffleShiftDouble());
+
+    if (tasks.isEmpty())
+        usage();
+
+    Lock lock;
+
+    Vector<ThreadIdentifier> threads;
+    for (unsigned i = filter ? 1 : WTF::numberOfProcessorCores(); i--;) {
+        threads.append(
+            createThread(
+                "testb3 thread",
+                [&] () {
+                    for (;;) {
+                        RefPtr<SharedTask<void()>> task;
+                        {
+                            LockHolder locker(lock);
+                            if (tasks.isEmpty())
+                                return;
+                            task = tasks.takeFirst();
+                        }
+
+                        task->run();
+                    }
+                }));
+    }
+
+    for (ThreadIdentifier thread : threads)
+        waitForThreadCompletion(thread);
+    crashLock.lock();
+}
+
+} // anonymous namespace
+
+#else // ENABLE(B3_JIT)
+
+static void run(const char*)
+{
+    dataLog("B3 JIT is not enabled.\n");
+}
+
+#endif // ENABLE(B3_JIT)
+
+int main(int argc, char** argv)
+{
+    const char* filter = nullptr;
+    switch (argc) {
+    case 1:
+        break;
+    case 2:
+        filter = argv[1];
+        break;
+    default:
+        usage();
+        break;
+    }
+    
+    run(filter);
+    return 0;
+}
index 26f8a76..f595991 100644
@@ -7775,6 +7775,68 @@ void testCallSimple(int a, int b)
     CHECK(compileAndRun<int>(proc, a, b) == a + b);
 }
 
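+// Performs a C call only on a block branched to with FrequencyClass::Rare, i.e. on a cold path.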
+void testCallRare(int a, int b)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* common = proc.addBlock();
+    BasicBlock* rare = proc.addBlock();
+
+    root->appendNew<ControlValue>(
+        proc, Branch, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0),
+        FrequentedBlock(rare, FrequencyClass::Rare),
+        FrequentedBlock(common));
+
+    common->appendNew<ControlValue>(
+        proc, Return, Origin(), common->appendNew<Const32Value>(proc, Origin(), 0));
+    
+    rare->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        rare->appendNew<CCallValue>(
+            proc, Int32, Origin(),
+            rare->appendNew<ConstPtrValue>(proc, Origin(), bitwise_cast<void*>(simpleFunction)),
+            rare->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1),
+            rare->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2)));
+
+    CHECK(compileAndRun<int>(proc, true, a, b) == a + b);
+}
+
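+// Like testCallRare, but the result is combined with argumentGPR3, keeping that register live
+// across the rare C call.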
+void testCallRareLive(int a, int b, int c)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* common = proc.addBlock();
+    BasicBlock* rare = proc.addBlock();
+
+    root->appendNew<ControlValue>(
+        proc, Branch, Origin(),
+        root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0),
+        FrequentedBlock(rare, FrequencyClass::Rare),
+        FrequentedBlock(common));
+
+    common->appendNew<ControlValue>(
+        proc, Return, Origin(), common->appendNew<Const32Value>(proc, Origin(), 0));
+    
+    rare->appendNew<ControlValue>(
+        proc, Return, Origin(),
+        rare->appendNew<Value>(
+            proc, Add, Origin(),
+            rare->appendNew<CCallValue>(
+                proc, Int32, Origin(),
+                rare->appendNew<ConstPtrValue>(proc, Origin(), bitwise_cast<void*>(simpleFunction)),
+                rare->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1),
+                rare->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR2)),
+            rare->appendNew<Value>(
+                proc, Trunc, Origin(),
+                rare->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR3))));
+
+    CHECK(compileAndRun<int>(proc, true, a, b, c) == a + b + c);
+}
+
 void testCallSimplePure(int a, int b)
 {
     Procedure proc;
@@ -10069,6 +10128,8 @@ void run(const char* filter)
     RUN(testInt32ToDoublePartialRegisterWithoutStall());
 
     RUN(testCallSimple(1, 2));
+    RUN(testCallRare(1, 2));
+    RUN(testCallRareLive(1, 2, 3));
     RUN(testCallSimplePure(1, 2));
     RUN(testCallFunctionWithHellaArguments());
 
index 37cae5c..3c008fd 100644
@@ -3540,7 +3540,7 @@ private:
             m_out.branch(
                 m_out.aboveOrEqual(
                     prevLength, m_out.load32(storage, m_heaps.Butterfly_vectorLength)),
-                rarely(slowPath), usually(fastPath));
+                unsure(slowPath), unsure(fastPath));
             
             LBasicBlock lastNext = m_out.appendTo(fastPath, slowPath);
             m_out.store(
@@ -8225,7 +8225,7 @@ private:
                 LBasicBlock holeCase =
                     FTL_NEW_BLOCK(m_out, ("PutByVal hole case"));
                     
-                m_out.branch(isOutOfBounds, unsure(outOfBoundsCase), unsure(holeCase));
+                m_out.branch(isOutOfBounds, rarely(outOfBoundsCase), usually(holeCase));
                     
                 LBasicBlock innerLastNext = m_out.appendTo(outOfBoundsCase, holeCase);
                     
index 9327f86..59e1f44 100644
@@ -73,7 +73,7 @@ RefPtr<OSRExitHandle> OSRExitDescriptor::emitOSRExit(
 {
     RefPtr<OSRExitHandle> handle =
         prepareOSRExitHandle(state, exitKind, nodeOrigin, params, offset, isExceptionHandler);
-    handle->emitExitThunk(jit);
+    handle->emitExitThunk(state, jit);
     return handle;
 }
 
@@ -84,8 +84,8 @@ RefPtr<OSRExitHandle> OSRExitDescriptor::emitOSRExitLater(
     RefPtr<OSRExitHandle> handle =
         prepareOSRExitHandle(state, exitKind, nodeOrigin, params, offset, isExceptionHandler);
     params.addLatePath(
-        [handle] (CCallHelpers& jit) {
-            handle->emitExitThunk(jit);
+        [handle, &state] (CCallHelpers& jit) {
+            handle->emitExitThunk(state, jit);
         });
     return handle;
 }
index 4d654a5..c895725 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 #if ENABLE(FTL_JIT) && FTL_USES_B3
 
 #include "FTLOSRExit.h"
+#include "FTLState.h"
 #include "FTLThunks.h"
 #include "LinkBuffer.h"
+#include "ProfilerCompilation.h"
 
 namespace JSC { namespace FTL {
 
-void OSRExitHandle::emitExitThunk(CCallHelpers& jit)
+void OSRExitHandle::emitExitThunk(State& state, CCallHelpers& jit)
 {
-    label = jit.label();
+    Profiler::Compilation* compilation = state.graph.compilation();
+    CCallHelpers::Label myLabel = jit.label();
+    label = myLabel;
     jit.pushToSaveImmediateWithoutTouchingRegisters(CCallHelpers::TrustedImm32(index));
     CCallHelpers::PatchableJump jump = jit.patchableJump();
     RefPtr<OSRExitHandle> self = this;
     jit.addLinkTask(
-        [self, jump] (LinkBuffer& linkBuffer) {
+        [self, jump, myLabel, compilation] (LinkBuffer& linkBuffer) {
             self->exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
 
             linkBuffer.link(
                 jump.m_jump,
                 CodeLocationLabel(linkBuffer.vm().getCTIStub(osrExitGenerationThunkGenerator).code()));
+            if (compilation)
+                compilation->addOSRExitSite({ linkBuffer.locationOf(myLabel).executableAddress() });
         });
 }
 
index 2725fa4..0f66bb7 100644
@@ -35,6 +35,7 @@
 
 namespace JSC { namespace FTL {
 
+class State;
 struct OSRExit;
 
 // This is an object that stores some interesting data about an OSR exit. It's expected that you will
@@ -55,7 +56,7 @@ struct OSRExitHandle : public ThreadSafeRefCounted<OSRExitHandle> {
     CCallHelpers::Label label;
 
     // This emits the exit thunk and populates 'label'.
-    void emitExitThunk(CCallHelpers&);
+    void emitExitThunk(State&, CCallHelpers&);
 };
 
 } } // namespace JSC::FTL