B3 -O1 should not allocateStackByGraphColoring
author    fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Wed, 12 Apr 2017 21:22:14 +0000 (21:22 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Wed, 12 Apr 2017 21:22:14 +0000 (21:22 +0000)
https://bugs.webkit.org/show_bug.cgi?id=170742

Reviewed by Keith Miller.

One of B3 -O1's longest-running phases is allocateStackByGraphColoring. One approach to
this problem would be to make that phase cheaper. But it's weird that this phase reruns
liveness after register allocation already ran liveness. If only it could reuse the
liveness computed by register allocation then it would run a lot faster. At -O2, we do
not want this, since we run phases between register allocation and stack allocation,
and those phases are free to change the liveness of spill slots (in fact,
fixObviousSpills will both shorten and lengthen live ranges because of load and store
elimination, respectively). But at -O1, we don't really need to run any phases between
register and stack allocation.
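
For concreteness, here is a simplified sketch of the two pipelines after this change (the
real dispatch lives in prepareForGeneration() in AirGenerate.cpp):

    if (code.optLevel() <= 1) {
        // One liveness computation feeds both register and stack allocation.
        allocateRegistersAndStackByLinearScan(code);
        lowerAfterRegAlloc(code); // now runs after stack allocation
    } else {
        allocateRegistersByGraphColoring(code);
        fixObviousSpills(code);              // may shorten or lengthen spill-slot live ranges
        lowerAfterRegAlloc(code);
        allocateStackByGraphColoring(code);  // reruns liveness over stack slots
    }
    lowerStackArgs(code); // runs at every optimization level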

This changes Air's backend in the following ways:

- Linear scan does stack allocation. This means that we don't need to run
  allocateStackByGraphColoring at all. In reality, we reuse some of its innards, but
  we don't run the expensive part of it (liveness->interference->coalescing->coloring).
  This is a speed-up because we only run liveness once and reuse it for both register
  and stack allocation.

- Phases that previously ran between register and stack allocation are taken care of,
  each in its own special way:

  -> handleCalleeSaves: this is now a utility function called by both
     allocateStackByGraphColoring and allocateRegistersAndStackByLinearScan.

  -> fixObviousSpills: we didn't run this at -O1, so nothing needs to be done.

  -> lowerAfterRegAlloc: this needed to be able to run before stack allocation because
     it could change register usage (vis-à-vis callee saves) and it could introduce
     spill slots. I changed this phase to have a secondary mode for when it runs after
     stack allocation.

- The part of allocateStackByGraphColoring that lowered stack addresses and took care
  of the call arg area is now a separate phase called lowerStackArgs. We run this phase
  regardless of optimization level. It's a cheap and general lowering.
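
As a rough sketch, the lowering that moves into lowerStackArgs looks something like this per
Arg (simplified from the code that used to sit at the end of allocateStackByGraphColoring;
the ZDef zero-fill of partially written spill slots and the call arg area accounting are
elided here):

    inst.forEachArg([&] (Arg& arg, Arg::Role, Bank, Width width) {
        switch (arg.kind()) {
        case Arg::Stack:
            // Stack slots become FP-relative addresses (or SP-relative if the
            // FP-relative form is not valid for this width).
            arg = Arg::stackAddr(
                arg.offset() + arg.stackSlot()->offsetFromFP(), code.frameSize(), width);
            break;
        case Arg::CallArg:
            // The call arg area sits at the bottom of the frame.
            arg = Arg::stackAddr(arg.offset() - code.frameSize(), code.frameSize(), width);
            break;
        default:
            break;
        }
    });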

This also removes spillEverything, because we never use that phase, we never test it,
and it got in the way of this refactoring.

This is a 21% speed-up on wasm -O1 compile times. This does not significantly change
-O1 throughput. We had already disabled allocateStack's most important optimization
(spill coalescing). This probably regresses average stack frame size, but I didn't
measure by how much. Stack frame size is really not that important. The algorithm in
allocateStackByGraphColoring is about much more than optimal frame size; it also
tries to avoid having to zero-extend 32-bit spills, it kills dead code, and of course
it coalesces.
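
For reference, the -O1 stack allocation that replaces it is just a first-fit scan over the
same intervals that register allocation already computed. A sketch (mirroring scanForStack
in the new linear scan phase, assuming 8-byte spill slots; expireOldIntervals stands in for
the inline expiry loop):

    for (Tmp tmp : m_tmps) {
        TmpData& entry = m_map[tmp];
        if (!entry.spilled)
            continue;
        // Free the slots of intervals that ended before this one begins.
        expireOldIntervals(entry.interval.begin());
        // First fit: take the lowest spill slot index not currently in use.
        entry.spillIndex = m_usedSpillSlots.findBit(0, false);
        m_usedSpillSlots.set(entry.spillIndex);
        entry.spilled->setOffsetFromFP(
            -static_cast<ptrdiff_t>(m_code.frameSize())
            - static_cast<ptrdiff_t>(entry.spillIndex) * 8 - 8);
        m_active.append(tmp);
    }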

* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* b3/B3Procedure.cpp:
(JSC::B3::Procedure::calleeSaveRegisterAtOffsetList):
(JSC::B3::Procedure::calleeSaveRegisters): Deleted.
* b3/B3Procedure.h:
* b3/B3StackmapGenerationParams.cpp:
(JSC::B3::StackmapGenerationParams::unavailableRegisters):
* b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: Copied from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp.
(JSC::B3::Air::allocateRegistersAndStackByLinearScan):
(JSC::B3::Air::allocateRegistersByLinearScan): Deleted.
* b3/air/AirAllocateRegistersAndStackByLinearScan.h: Copied from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.h.
* b3/air/AirAllocateRegistersByLinearScan.cpp: Removed.
* b3/air/AirAllocateRegistersByLinearScan.h: Removed.
* b3/air/AirAllocateStackByGraphColoring.cpp:
(JSC::B3::Air::allocateEscapedStackSlots):
(JSC::B3::Air::updateFrameSizeBasedOnStackSlots):
(JSC::B3::Air::allocateStackByGraphColoring):
* b3/air/AirAllocateStackByGraphColoring.h:
* b3/air/AirArg.cpp:
(JSC::B3::Air::Arg::stackAddr):
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::stackAddr): Deleted.
* b3/air/AirCode.cpp:
(JSC::B3::Air::Code::addStackSlot):
(JSC::B3::Air::Code::setCalleeSaveRegisterAtOffsetList):
(JSC::B3::Air::Code::calleeSaveRegisterAtOffsetList):
(JSC::B3::Air::Code::dump):
* b3/air/AirCode.h:
(JSC::B3::Air::Code::setStackIsAllocated):
(JSC::B3::Air::Code::stackIsAllocated):
(JSC::B3::Air::Code::calleeSaveRegisters):
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
(JSC::B3::Air::generate):
* b3/air/AirHandleCalleeSaves.cpp:
(JSC::B3::Air::handleCalleeSaves):
* b3/air/AirHandleCalleeSaves.h:
* b3/air/AirLowerAfterRegAlloc.cpp:
(JSC::B3::Air::lowerAfterRegAlloc):
* b3/air/AirLowerStackArgs.cpp: Added.
(JSC::B3::Air::lowerStackArgs):
* b3/air/AirLowerStackArgs.h: Added.
* b3/testb3.cpp:
(JSC::B3::testPinRegisters):
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* jit/RegisterAtOffsetList.h:
* wasm/WasmB3IRGenerator.cpp:
(JSC::Wasm::parseAndCompile):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@215292 268f45cc-cd09-0410-ab3c-d52691b4dbfc

27 files changed:
Source/JavaScriptCore/CMakeLists.txt
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
Source/JavaScriptCore/b3/B3Procedure.cpp
Source/JavaScriptCore/b3/B3Procedure.h
Source/JavaScriptCore/b3/B3StackmapGenerationParams.cpp
Source/JavaScriptCore/b3/air/AirAllocateRegistersAndStackByLinearScan.cpp [moved from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp with 87% similarity]
Source/JavaScriptCore/b3/air/AirAllocateRegistersAndStackByLinearScan.h [moved from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.h with 78% similarity]
Source/JavaScriptCore/b3/air/AirAllocateStackByGraphColoring.cpp
Source/JavaScriptCore/b3/air/AirAllocateStackByGraphColoring.h
Source/JavaScriptCore/b3/air/AirArg.cpp
Source/JavaScriptCore/b3/air/AirArg.h
Source/JavaScriptCore/b3/air/AirCode.cpp
Source/JavaScriptCore/b3/air/AirCode.h
Source/JavaScriptCore/b3/air/AirGenerate.cpp
Source/JavaScriptCore/b3/air/AirHandleCalleeSaves.cpp
Source/JavaScriptCore/b3/air/AirHandleCalleeSaves.h
Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp
Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp [new file with mode: 0644]
Source/JavaScriptCore/b3/air/AirLowerStackArgs.h [moved from Source/JavaScriptCore/b3/air/AirSpillEverything.h with 63% similarity]
Source/JavaScriptCore/b3/air/AirSpillEverything.cpp [deleted file]
Source/JavaScriptCore/b3/testb3.cpp
Source/JavaScriptCore/ftl/FTLCompile.cpp
Source/JavaScriptCore/jit/RegisterAtOffsetList.h
Source/JavaScriptCore/runtime/Options.h
Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp
Tools/Scripts/run-jsc-stress-tests

index f6888cd..0ead9fe 100644 (file)
@@ -73,8 +73,8 @@ set(JavaScriptCore_SOURCES
     assembler/MacroAssemblerPrinter.cpp
     assembler/MacroAssemblerX86Common.cpp
 
+    b3/air/AirAllocateRegistersAndStackByLinearScan.cpp
     b3/air/AirAllocateRegistersByGraphColoring.cpp
-    b3/air/AirAllocateRegistersByLinearScan.cpp
     b3/air/AirAllocateStackByGraphColoring.cpp
     b3/air/AirArg.cpp
     b3/air/AirBasicBlock.cpp
@@ -100,6 +100,7 @@ set(JavaScriptCore_SOURCES
     b3/air/AirLowerAfterRegAlloc.cpp
     b3/air/AirLowerEntrySwitch.cpp
     b3/air/AirLowerMacros.cpp
+    b3/air/AirLowerStackArgs.cpp
     b3/air/AirOptimizeBlockOrder.cpp
     b3/air/AirPadInterference.cpp
     b3/air/AirPhaseInsertionSet.cpp
@@ -108,7 +109,6 @@ set(JavaScriptCore_SOURCES
     b3/air/AirReportUsedRegisters.cpp
     b3/air/AirSimplifyCFG.cpp
     b3/air/AirSpecial.cpp
-    b3/air/AirSpillEverything.cpp
     b3/air/AirStackSlot.cpp
     b3/air/AirStackSlotKind.cpp
     b3/air/AirTmp.cpp
index b36eed3..a33d3cd 100644 (file)
@@ -1,3 +1,107 @@
+2017-04-12  Filip Pizlo  <fpizlo@apple.com>
+
+        B3 -O1 should not allocateStackByGraphColoring
+        https://bugs.webkit.org/show_bug.cgi?id=170742
+
+        Reviewed by Keith Miller.
+        
+        One of B3 -O1's longest-running phases is allocateStackByGraphColoring. One approach to
+        this problem would be to make that phase cheaper. But it's weird that this phase reruns
+        liveness after register allocation already ran liveness. If only it could reuse the
+        liveness computed by register allocation then it would run a lot faster. At -O2, we do
+        not want this, since we run phases between register allocation and stack allocation,
+        and those phases are free to change the liveness of spill slots (in fact,
+        fixObviousSpills will both shorten and lengthen live ranges because of load and store
+        elimination, respectively). But at -O1, we don't really need to run any phases between
+        register and stack allocation.
+        
+        This changes Air's backend in the following ways:
+        
+        - Linear scan does stack allocation. This means that we don't need to run
+          allocateStackByGraphColoring at all. In reality, we reuse some of its innards, but
+          we don't run the expensive part of it (liveness->interference->coalescing->coloring).
+          This is a speed-up because we only run liveness once and reuse it for both register
+          and stack allocation.
+        
+        - Phases that previously ran between register and stack allocation are taken care of,
+          each in its own special way:
+          
+          -> handleCalleeSaves: this is now a utility function called by both
+             allocateStackByGraphColoring and allocateRegistersAndStackByLinearScan.
+          
+          -> fixObviousSpills: we didn't run this at -O1, so nothing needs to be done.
+          
+          -> lowerAfterRegAlloc: this needed to be able to run before stack allocation because
+             it could change register usage (vis-à-vis callee saves) and it could introduce
+             spill slots. I changed this phase to have a secondary mode for when it runs after
+             stack allocation.
+        
+        - The part of allocateStackByGraphColoring that lowered stack addresses and took care
+          of the call arg area is now a separate phase called lowerStackArgs. We run this phase
+          regardless of optimization level. It's a cheap and general lowering.
+        
+        This also removes spillEverything, because we never use that phase, we never test it,
+        and it got in the way of this refactoring.
+        
+        This is a 21% speed-up on wasm -O1 compile times. This does not significantly change
+        -O1 throughput. We had already disabled allocateStack's most important optimization
+        (spill coalescing). This probably regresses average stack frame size, but I didn't
+        measure by how much. Stack frame size is really not that important. The algorithm in
+        allocateStackByGraphColoring is about much more than optimal frame size; it also
+        tries to avoid having to zero-extend 32-bit spills, it kills dead code, and of course
+        it coalesces.
+
+        * CMakeLists.txt:
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * b3/B3Procedure.cpp:
+        (JSC::B3::Procedure::calleeSaveRegisterAtOffsetList):
+        (JSC::B3::Procedure::calleeSaveRegisters): Deleted.
+        * b3/B3Procedure.h:
+        * b3/B3StackmapGenerationParams.cpp:
+        (JSC::B3::StackmapGenerationParams::unavailableRegisters):
+        * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: Copied from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.cpp.
+        (JSC::B3::Air::allocateRegistersAndStackByLinearScan):
+        (JSC::B3::Air::allocateRegistersByLinearScan): Deleted.
+        * b3/air/AirAllocateRegistersAndStackByLinearScan.h: Copied from Source/JavaScriptCore/b3/air/AirAllocateRegistersByLinearScan.h.
+        * b3/air/AirAllocateRegistersByLinearScan.cpp: Removed.
+        * b3/air/AirAllocateRegistersByLinearScan.h: Removed.
+        * b3/air/AirAllocateStackByGraphColoring.cpp:
+        (JSC::B3::Air::allocateEscapedStackSlots):
+        (JSC::B3::Air::updateFrameSizeBasedOnStackSlots):
+        (JSC::B3::Air::allocateStackByGraphColoring):
+        * b3/air/AirAllocateStackByGraphColoring.h:
+        * b3/air/AirArg.cpp:
+        (JSC::B3::Air::Arg::stackAddr):
+        * b3/air/AirArg.h:
+        (JSC::B3::Air::Arg::stackAddr): Deleted.
+        * b3/air/AirCode.cpp:
+        (JSC::B3::Air::Code::addStackSlot):
+        (JSC::B3::Air::Code::setCalleeSaveRegisterAtOffsetList):
+        (JSC::B3::Air::Code::calleeSaveRegisterAtOffsetList):
+        (JSC::B3::Air::Code::dump):
+        * b3/air/AirCode.h:
+        (JSC::B3::Air::Code::setStackIsAllocated):
+        (JSC::B3::Air::Code::stackIsAllocated):
+        (JSC::B3::Air::Code::calleeSaveRegisters):
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::prepareForGeneration):
+        (JSC::B3::Air::generate):
+        * b3/air/AirHandleCalleeSaves.cpp:
+        (JSC::B3::Air::handleCalleeSaves):
+        * b3/air/AirHandleCalleeSaves.h:
+        * b3/air/AirLowerAfterRegAlloc.cpp:
+        (JSC::B3::Air::lowerAfterRegAlloc):
+        * b3/air/AirLowerStackArgs.cpp: Added.
+        (JSC::B3::Air::lowerStackArgs):
+        * b3/air/AirLowerStackArgs.h: Added.
+        * b3/testb3.cpp:
+        (JSC::B3::testPinRegisters):
+        * ftl/FTLCompile.cpp:
+        (JSC::FTL::compile):
+        * jit/RegisterAtOffsetList.h:
+        * wasm/WasmB3IRGenerator.cpp:
+        (JSC::Wasm::parseAndCompile):
+
 2017-04-12  Michael Saboff  <msaboff@apple.com>
 
         Implement Object.isFrozen() and Object.isSealed() per ECMA spec
index 9b94f30..9f451bc 100644 (file)
                0F25F1B4181635F300522F39 /* FTLSlowPathCallKey.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F25F1AD181635F300522F39 /* FTLSlowPathCallKey.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F2AC5661E8A0B770001EE3F /* AirFixSpillsAfterTerminals.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F2AC5641E8A0B760001EE3F /* AirFixSpillsAfterTerminals.cpp */; };
                0F2AC5671E8A0B790001EE3F /* AirFixSpillsAfterTerminals.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F2AC5651E8A0B760001EE3F /* AirFixSpillsAfterTerminals.h */; };
-               0F2AC56A1E8A0BD30001EE3F /* AirAllocateRegistersByLinearScan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.cpp */; };
-               0F2AC56B1E8A0BD50001EE3F /* AirAllocateRegistersByLinearScan.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.h */; };
+               0F2AC56A1E8A0BD30001EE3F /* AirAllocateRegistersAndStackByLinearScan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.cpp */; };
+               0F2AC56B1E8A0BD50001EE3F /* AirAllocateRegistersAndStackByLinearScan.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.h */; };
                0F2AC56E1E8D7B000001EE3F /* AirPhaseInsertionSet.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F2AC56C1E8D7AFF0001EE3F /* AirPhaseInsertionSet.cpp */; };
                0F2AC56F1E8D7B030001EE3F /* AirPhaseInsertionSet.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F2AC56D1E8D7AFF0001EE3F /* AirPhaseInsertionSet.h */; };
                0F2AC5711E8EE4540001EE3F /* AirFormTable.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F2AC5701E8EE4520001EE3F /* AirFormTable.h */; };
                0F5AE2C41DF4F2800066EFE1 /* VMInlines.h in Headers */ = {isa = PBXBuildFile; fileRef = FE90BB3A1B7CF64E006B3F03 /* VMInlines.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F5B4A331C84F0D600F1B17E /* SlowPathReturnType.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5B4A321C84F0D600F1B17E /* SlowPathReturnType.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0F5CF9811E96F17F00C18692 /* AirTmpMap.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5CF9801E96F17D00C18692 /* AirTmpMap.h */; };
+               0F5CF9841E9D537700C18692 /* AirLowerStackArgs.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5CF9831E9D537500C18692 /* AirLowerStackArgs.h */; };
+               0F5CF9851E9D537A00C18692 /* AirLowerStackArgs.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F5CF9821E9D537500C18692 /* AirLowerStackArgs.cpp */; };
                0F5D085D1B8CF99D001143B4 /* DFGNodeOrigin.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F5D085C1B8CF99D001143B4 /* DFGNodeOrigin.cpp */; };
                0F5EF91E16878F7A003E5C25 /* JITThunks.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F5EF91B16878F78003E5C25 /* JITThunks.cpp */; };
                0F5EF91F16878F7D003E5C25 /* JITThunks.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5EF91C16878F78003E5C25 /* JITThunks.h */; settings = {ATTRIBUTES = (Private, ); }; };
                0FEC85841BDACDC70080FF74 /* AirPhaseScope.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FEC855F1BDACDC70080FF74 /* AirPhaseScope.h */; };
                0FEC85871BDACDC70080FF74 /* AirSpecial.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FEC85621BDACDC70080FF74 /* AirSpecial.cpp */; };
                0FEC85881BDACDC70080FF74 /* AirSpecial.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FEC85631BDACDC70080FF74 /* AirSpecial.h */; };
-               0FEC85891BDACDC70080FF74 /* AirSpillEverything.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FEC85641BDACDC70080FF74 /* AirSpillEverything.cpp */; };
-               0FEC858A1BDACDC70080FF74 /* AirSpillEverything.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FEC85651BDACDC70080FF74 /* AirSpillEverything.h */; };
                0FEC858B1BDACDC70080FF74 /* AirStackSlot.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FEC85661BDACDC70080FF74 /* AirStackSlot.cpp */; };
                0FEC858C1BDACDC70080FF74 /* AirStackSlot.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FEC85671BDACDC70080FF74 /* AirStackSlot.h */; };
                0FEC858D1BDACDC70080FF74 /* AirTmp.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FEC85681BDACDC70080FF74 /* AirTmp.cpp */; };
                0F25F1AD181635F300522F39 /* FTLSlowPathCallKey.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = FTLSlowPathCallKey.h; path = ftl/FTLSlowPathCallKey.h; sourceTree = "<group>"; };
                0F2AC5641E8A0B760001EE3F /* AirFixSpillsAfterTerminals.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirFixSpillsAfterTerminals.cpp; path = b3/air/AirFixSpillsAfterTerminals.cpp; sourceTree = "<group>"; };
                0F2AC5651E8A0B760001EE3F /* AirFixSpillsAfterTerminals.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirFixSpillsAfterTerminals.h; path = b3/air/AirFixSpillsAfterTerminals.h; sourceTree = "<group>"; };
-               0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirAllocateRegistersByLinearScan.cpp; path = b3/air/AirAllocateRegistersByLinearScan.cpp; sourceTree = "<group>"; };
-               0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirAllocateRegistersByLinearScan.h; path = b3/air/AirAllocateRegistersByLinearScan.h; sourceTree = "<group>"; };
+               0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirAllocateRegistersAndStackByLinearScan.cpp; path = b3/air/AirAllocateRegistersAndStackByLinearScan.cpp; sourceTree = "<group>"; };
+               0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirAllocateRegistersAndStackByLinearScan.h; path = b3/air/AirAllocateRegistersAndStackByLinearScan.h; sourceTree = "<group>"; };
                0F2AC56C1E8D7AFF0001EE3F /* AirPhaseInsertionSet.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirPhaseInsertionSet.cpp; path = b3/air/AirPhaseInsertionSet.cpp; sourceTree = "<group>"; };
                0F2AC56D1E8D7AFF0001EE3F /* AirPhaseInsertionSet.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirPhaseInsertionSet.h; path = b3/air/AirPhaseInsertionSet.h; sourceTree = "<group>"; };
                0F2AC5701E8EE4520001EE3F /* AirFormTable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirFormTable.h; path = b3/air/AirFormTable.h; sourceTree = "<group>"; };
                0F5A6282188C98D40072C9DF /* FTLValueRange.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = FTLValueRange.h; path = ftl/FTLValueRange.h; sourceTree = "<group>"; };
                0F5B4A321C84F0D600F1B17E /* SlowPathReturnType.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SlowPathReturnType.h; sourceTree = "<group>"; };
                0F5CF9801E96F17D00C18692 /* AirTmpMap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirTmpMap.h; path = b3/air/AirTmpMap.h; sourceTree = "<group>"; };
+               0F5CF9821E9D537500C18692 /* AirLowerStackArgs.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLowerStackArgs.cpp; path = b3/air/AirLowerStackArgs.cpp; sourceTree = "<group>"; };
+               0F5CF9831E9D537500C18692 /* AirLowerStackArgs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLowerStackArgs.h; path = b3/air/AirLowerStackArgs.h; sourceTree = "<group>"; };
                0F5D085C1B8CF99D001143B4 /* DFGNodeOrigin.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGNodeOrigin.cpp; path = dfg/DFGNodeOrigin.cpp; sourceTree = "<group>"; };
                0F5EF91B16878F78003E5C25 /* JITThunks.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = JITThunks.cpp; sourceTree = "<group>"; };
                0F5EF91C16878F78003E5C25 /* JITThunks.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JITThunks.h; sourceTree = "<group>"; };
                0FEC855F1BDACDC70080FF74 /* AirPhaseScope.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirPhaseScope.h; path = b3/air/AirPhaseScope.h; sourceTree = "<group>"; };
                0FEC85621BDACDC70080FF74 /* AirSpecial.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirSpecial.cpp; path = b3/air/AirSpecial.cpp; sourceTree = "<group>"; };
                0FEC85631BDACDC70080FF74 /* AirSpecial.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirSpecial.h; path = b3/air/AirSpecial.h; sourceTree = "<group>"; };
-               0FEC85641BDACDC70080FF74 /* AirSpillEverything.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirSpillEverything.cpp; path = b3/air/AirSpillEverything.cpp; sourceTree = "<group>"; };
-               0FEC85651BDACDC70080FF74 /* AirSpillEverything.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirSpillEverything.h; path = b3/air/AirSpillEverything.h; sourceTree = "<group>"; };
                0FEC85661BDACDC70080FF74 /* AirStackSlot.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirStackSlot.cpp; path = b3/air/AirStackSlot.cpp; sourceTree = "<group>"; };
                0FEC85671BDACDC70080FF74 /* AirStackSlot.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirStackSlot.h; path = b3/air/AirStackSlot.h; sourceTree = "<group>"; };
                0FEC85681BDACDC70080FF74 /* AirTmp.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirTmp.cpp; path = b3/air/AirTmp.cpp; sourceTree = "<group>"; };
                0FEC84B31BDACD880080FF74 /* air */ = {
                        isa = PBXGroup;
                        children = (
+                               0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.cpp */,
+                               0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersAndStackByLinearScan.h */,
                                7965C2141E5D799600B7591D /* AirAllocateRegistersByGraphColoring.cpp */,
                                7965C2151E5D799600B7591D /* AirAllocateRegistersByGraphColoring.h */,
-                               0F2AC5681E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.cpp */,
-                               0F2AC5691E8A0BD10001EE3F /* AirAllocateRegistersByLinearScan.h */,
                                0FEC85481BDACDC70080FF74 /* AirAllocateStackByGraphColoring.cpp */,
                                0FEC85491BDACDC70080FF74 /* AirAllocateStackByGraphColoring.h */,
                                0FEC854A1BDACDC70080FF74 /* AirArg.cpp */,
                                0FDF70841D3F2C1F00927449 /* AirLowerEntrySwitch.h */,
                                0F6183271C45BF070072450B /* AirLowerMacros.cpp */,
                                0F6183281C45BF070072450B /* AirLowerMacros.h */,
+                               0F5CF9821E9D537500C18692 /* AirLowerStackArgs.cpp */,
+                               0F5CF9831E9D537500C18692 /* AirLowerStackArgs.h */,
                                264091FA1BE2FD4100684DB2 /* AirOpcode.opcodes */,
                                0FB3878C1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.cpp */,
                                0FB3878D1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.h */,
                                0F338DFC1BED51270013C88F /* AirSimplifyCFG.h */,
                                0FEC85621BDACDC70080FF74 /* AirSpecial.cpp */,
                                0FEC85631BDACDC70080FF74 /* AirSpecial.h */,
-                               0FEC85641BDACDC70080FF74 /* AirSpillEverything.cpp */,
-                               0FEC85651BDACDC70080FF74 /* AirSpillEverything.h */,
                                0FEC85661BDACDC70080FF74 /* AirStackSlot.cpp */,
                                0FEC85671BDACDC70080FF74 /* AirStackSlot.h */,
                                0F2BBD9B1C5FF4050023EF23 /* AirStackSlotKind.cpp */,
                                0F45703D1BE45F0A0062A629 /* AirReportUsedRegisters.h in Headers */,
                                0F338DFE1BED51270013C88F /* AirSimplifyCFG.h in Headers */,
                                0FEC85881BDACDC70080FF74 /* AirSpecial.h in Headers */,
-                               0FEC858A1BDACDC70080FF74 /* AirSpillEverything.h in Headers */,
                                0FEC858C1BDACDC70080FF74 /* AirStackSlot.h in Headers */,
                                0F2BBD9E1C5FF4050023EF23 /* AirStackSlotKind.h in Headers */,
                                0FEC858E1BDACDC70080FF74 /* AirTmp.h in Headers */,
                                0FEC85101BDACDAC0080FF74 /* B3Const64Value.h in Headers */,
                                0FEC85121BDACDAC0080FF74 /* B3ConstDoubleValue.h in Headers */,
                                43422A631C158E6D00E2EB98 /* B3ConstFloatValue.h in Headers */,
-                               0F2AC56B1E8A0BD50001EE3F /* AirAllocateRegistersByLinearScan.h in Headers */,
+                               0F2AC56B1E8A0BD50001EE3F /* AirAllocateRegistersAndStackByLinearScan.h in Headers */,
                                0FEC85B31BDED9570080FF74 /* B3ConstPtrValue.h in Headers */,
                                0F338DF61BE93D550013C88F /* B3ConstrainedValue.h in Headers */,
                                0F2C63B81E6343F700C13839 /* B3GenericBlockInsertionSet.h in Headers */,
                                A51007C1187CC3C600B38879 /* JSGlobalObjectInspectorController.h in Headers */,
                                A50E4B6418809DD50068A46D /* JSGlobalObjectRuntimeAgent.h in Headers */,
                                0F2C63C41E69EF9400C13839 /* B3MemoryValueInlines.h in Headers */,
+                               0F5CF9841E9D537700C18692 /* AirLowerStackArgs.h in Headers */,
                                A503FA2A188F105900110F14 /* JSGlobalObjectScriptDebugServer.h in Headers */,
                                A513E5C0185BFACC007E95AD /* JSInjectedScriptHost.h in Headers */,
                                A513E5C2185BFACC007E95AD /* JSInjectedScriptHostPrototype.h in Headers */,
                                0F45703C1BE45F0A0062A629 /* AirReportUsedRegisters.cpp in Sources */,
                                0F338DFD1BED51270013C88F /* AirSimplifyCFG.cpp in Sources */,
                                0FEC85871BDACDC70080FF74 /* AirSpecial.cpp in Sources */,
-                               0FEC85891BDACDC70080FF74 /* AirSpillEverything.cpp in Sources */,
                                0FEC858B1BDACDC70080FF74 /* AirStackSlot.cpp in Sources */,
                                0F2BBD9D1C5FF4050023EF23 /* AirStackSlotKind.cpp in Sources */,
                                0FEC858D1BDACDC70080FF74 /* AirTmp.cpp in Sources */,
                                A78A9776179738B8009DF744 /* DFGFinalizer.cpp in Sources */,
                                0F2BDC15151C5D4D00CD8910 /* DFGFixupPhase.cpp in Sources */,
                                0F20177F1DCADC3300EA5950 /* DFGFlowIndexing.cpp in Sources */,
-                               0F2AC56A1E8A0BD30001EE3F /* AirAllocateRegistersByLinearScan.cpp in Sources */,
+                               0F2AC56A1E8A0BD30001EE3F /* AirAllocateRegistersAndStackByLinearScan.cpp in Sources */,
                                0F9D339617FFC4E60073C2BC /* DFGFlushedAt.cpp in Sources */,
                                A7D89CF717A0B8CC00773AD8 /* DFGFlushFormat.cpp in Sources */,
                                0F69CC88193AC60A0045759E /* DFGFrozenValue.cpp in Sources */,
                                14280842107EC0930013E7B2 /* RegExpConstructor.cpp in Sources */,
                                8642C512151C083D0046D4EF /* RegExpMatchesArray.cpp in Sources */,
                                14280843107EC0930013E7B2 /* RegExpObject.cpp in Sources */,
+                               0F5CF9851E9D537A00C18692 /* AirLowerStackArgs.cpp in Sources */,
                                14280844107EC0930013E7B2 /* RegExpPrototype.cpp in Sources */,
                                6540C7A01B82E1C3000F6B79 /* RegisterAtOffset.cpp in Sources */,
                                6540C7A11B82E1C3000F6B79 /* RegisterAtOffsetList.cpp in Sources */,
index 2069ab3..1c894e9 100644 (file)
@@ -345,9 +345,9 @@ unsigned Procedure::frameSize() const
     return code().frameSize();
 }
 
-const RegisterAtOffsetList& Procedure::calleeSaveRegisters() const
+RegisterAtOffsetList Procedure::calleeSaveRegisterAtOffsetList() const
 {
-    return code().calleeSaveRegisters();
+    return code().calleeSaveRegisterAtOffsetList();
 }
 
 Value* Procedure::addValueImpl(Value* value)
index 2ca579d..48ede1b 100644 (file)
@@ -243,7 +243,7 @@ public:
     bool needsUsedRegisters() const { return m_needsUsedRegisters; }
 
     JS_EXPORT_PRIVATE unsigned frameSize() const;
-    JS_EXPORT_PRIVATE const RegisterAtOffsetList& calleeSaveRegisters() const;
+    JS_EXPORT_PRIVATE RegisterAtOffsetList calleeSaveRegisterAtOffsetList() const;
 
     PCToOriginMap& pcToOriginMap() { return m_pcToOriginMap; }
     PCToOriginMap releasePCToOriginMap() { return WTFMove(m_pcToOriginMap); }
index 1d653c5..bba84ca 100644 (file)
@@ -48,8 +48,7 @@ RegisterSet StackmapGenerationParams::unavailableRegisters() const
     RegisterSet result = usedRegisters();
     
     RegisterSet unsavedCalleeSaves = RegisterSet::vmCalleeSaveRegisters();
-    for (const RegisterAtOffset& regAtOffset : m_context.code->calleeSaveRegisters())
-        unsavedCalleeSaves.clear(regAtOffset.reg());
+    unsavedCalleeSaves.exclude(m_context.code->calleeSaveRegisters());
 
     result.merge(unsavedCalleeSaves);
 
  */
 
 #include "config.h"
-#include "AirAllocateRegistersByLinearScan.h"
+#include "AirAllocateRegistersAndStackByLinearScan.h"
 
 #if ENABLE(B3_JIT)
 
+#include "AirAllocateStackByGraphColoring.h"
 #include "AirArgInlines.h"
 #include "AirCode.h"
 #include "AirFixSpillsAfterTerminals.h"
+#include "AirHandleCalleeSaves.h"
 #include "AirPhaseInsertionSet.h"
 #include "AirInstInlines.h"
 #include "AirLiveness.h"
@@ -84,6 +86,7 @@ struct TmpData {
     Reg assigned;
     bool isUnspillable { false };
     bool didBuildPossibleRegs { false };
+    unsigned spillIndex { 0 };
 };
 
 struct Clobber {
@@ -118,6 +121,7 @@ public:
     
     void run()
     {
+        padInterference(m_code);
         buildRegisterSet();
         buildIndices();
         buildIntervals();
@@ -126,11 +130,11 @@ public:
             emitSpillCode();
         }
         for (;;) {
-            prepareIntervals();
+            prepareIntervalsForScanForRegisters();
             m_didSpill = false;
             forEachBank(
                 [&] (Bank bank) {
-                    attemptScan(bank);
+                    attemptScanForRegisters(bank);
                 });
             if (!m_didSpill)
                 break;
@@ -138,6 +142,14 @@ public:
         }
         insertSpillCode();
         assignRegisters();
+        fixSpillsAfterTerminals(m_code);
+
+        handleCalleeSaves(m_code);
+        allocateEscapedStackSlots(m_code);
+        prepareIntervalsForScanForStack();
+        scanForStack();
+        updateFrameSizeBasedOnStackSlots(m_code);
+        m_code.setStackIsAllocated(true);
     }
     
 private:
@@ -186,6 +198,7 @@ private:
     
     void buildIntervals()
     {
+        TimingScope timingScope("LinearScan::buildIntervals");
         UnifiedTmpLiveness liveness(m_code);
         
         for (BasicBlock* block : m_code) {
@@ -266,6 +279,7 @@ private:
                 [&] (Tmp tmp) {
                     dataLog("    ", tmp, ": ", m_map[tmp], "\n");
                 });
+            dataLog("Clobbers: ", listDump(m_clobbers), "\n");
         }
     }
     
@@ -288,17 +302,37 @@ private:
             });
     }
     
-    void prepareIntervals()
+    void prepareIntervalsForScanForRegisters()
+    {
+        prepareIntervals(
+            [&] (TmpData& data) -> bool {
+                if (data.spilled)
+                    return false;
+                
+                data.assigned = Reg();
+                return true;
+            });
+    }
+    
+    void prepareIntervalsForScanForStack()
+    {
+        prepareIntervals(
+            [&] (TmpData& data) -> bool {
+                return data.spilled;
+            });
+    }
+    
+    template<typename SelectFunc>
+    void prepareIntervals(const SelectFunc& selectFunc)
     {
         m_tmps.resize(0);
         
         m_code.forEachTmp(
             [&] (Tmp tmp) {
                 TmpData& data = m_map[tmp];
-                if (data.spilled)
+                if (!selectFunc(data))
                     return;
                 
-                data.assigned = Reg();
                 m_tmps.append(tmp);
             });
         
@@ -308,10 +342,8 @@ private:
                 return m_map[a].interval.begin() < m_map[b].interval.begin();
             });
         
-        if (verbose()) {
+        if (verbose())
             dataLog("Tmps: ", listDump(m_tmps), "\n");
-            dataLog("Clobbers: ", listDump(m_clobbers), "\n");
-        }
     }
     
     Tmp addSpillTmpWithInterval(Bank bank, Interval interval)
@@ -325,7 +357,7 @@ private:
         return tmp;
     }
     
-    void attemptScan(Bank bank)
+    void attemptScanForRegisters(Bank bank)
     {
         // This is modeled after LinearScanRegisterAllocation in Fig. 1 in
         // http://dl.acm.org/citation.cfm?id=330250.
@@ -520,6 +552,45 @@ private:
         }
     }
     
+    void scanForStack()
+    {
+        // This is loosely modeled after LinearScanRegisterAllocation in Fig. 1 in
+        // http://dl.acm.org/citation.cfm?id=330250.
+
+        m_active.clear();
+        m_usedSpillSlots.clearAll();
+        
+        for (Tmp& tmp : m_tmps) {
+            TmpData& entry = m_map[tmp];
+            if (!entry.spilled)
+                continue;
+            
+            size_t index = entry.interval.begin();
+            
+            // This is ExpireOldIntervals in Fig. 1.
+            while (!m_active.isEmpty()) {
+                Tmp tmp = m_active.first();
+                TmpData& entry = m_map[tmp];
+                
+                bool expired = entry.interval.end() <= index;
+                
+                if (!expired)
+                    break;
+                
+                m_active.removeFirst();
+                m_usedSpillSlots.clear(entry.spillIndex);
+            }
+            
+            entry.spillIndex = m_usedSpillSlots.findBit(0, false);
+            ptrdiff_t offset = -static_cast<ptrdiff_t>(m_code.frameSize()) - static_cast<ptrdiff_t>(entry.spillIndex) * 8 - 8;
+            if (verbose())
+                dataLog("  Assigning offset = ", offset, " to spill ", pointerDump(entry.spilled), " for ", tmp, "\n");
+            entry.spilled->setOffsetFromFP(offset);
+            m_usedSpillSlots.set(entry.spillIndex);
+            m_active.append(tmp);
+        }
+    }
+    
     void insertSpillCode()
     {
         for (BasicBlock* block : m_code)
@@ -569,11 +640,15 @@ private:
     Vector<Tmp> m_tmps;
     Deque<Tmp> m_active;
     RegisterSet m_activeRegs;
+    BitVector m_usedSpillSlots;
     bool m_didSpill { false };
 };
 
-void runLinearScan(Code& code)
+} // anonymous namespace
+
+void allocateRegistersAndStackByLinearScan(Code& code)
 {
+    PhaseScope phaseScope(code, "allocateRegistersAndStackByLinearScan");
     if (verbose())
         dataLog("Air before linear scan:\n", code);
     LinearScan linearScan(code);
@@ -582,16 +657,6 @@ void runLinearScan(Code& code)
         dataLog("Air after linear scan:\n", code);
 }
 
-} // anonymous namespace
-
-void allocateRegistersByLinearScan(Code& code)
-{
-    PhaseScope phaseScope(code, "allocateRegistersByLinearScan");
-    padInterference(code);
-    runLinearScan(code);
-    fixSpillsAfterTerminals(code);
-}
-
 } } } // namespace JSC::B3::Air
 
 #endif // ENABLE(B3_JIT)
@@ -37,13 +37,16 @@ class Code;
 // This implements the Poletto and Sarkar register allocator called "linear scan":
 // http://dl.acm.org/citation.cfm?id=330250
 //
-// This is not Air's primary register allocator. We use it only when running at optLevel<2. That's not
-// the default level. This register allocator is optimized primarily for running quickly. It's expected
-// that improvements to this register allocator should focus on improving its execution time without much
-// regard for the quality of generated code. If you want good code, use graph coloring.
+// This is not Air's primary register allocator. We use it only when running at optLevel<2.
+// That's not the default level. This register allocator is optimized primarily for running
+// quickly. It's expected that improvements to this register allocator should focus on improving
+// its execution time without much regard for the quality of generated code. If you want good
+// code, use graph coloring.
 //
 // For Air's primary register allocator, see AirAllocateRegistersByGraphColoring.h|cpp.
-void allocateRegistersByLinearScan(Code&);
+//
+// This also does stack allocation as an afterthought. It does not do any spill coalescing.
+void allocateRegistersAndStackByLinearScan(Code&);
 
 } } } // namespace JSC::B3::Air
 
index 1bd1f6c..79cb260 100644 (file)
@@ -30,7 +30,7 @@
 
 #include "AirArgInlines.h"
 #include "AirCode.h"
-#include "AirInsertionSet.h"
+#include "AirHandleCalleeSaves.h"
 #include "AirInstInlines.h"
 #include "AirLiveness.h"
 #include "AirPhaseScope.h"
@@ -129,15 +129,11 @@ struct CoalescableMove {
     double frequency { PNaN };
 };
 
-} // anonymous namespace
-
-void allocateStackByGraphColoring(Code& code)
+Vector<StackSlot*> allocateEscapedStackSlotsImpl(Code& code)
 {
-    PhaseScope phaseScope(code, "allocateStackByGraphColoring");
-
     // Allocate all of the escaped slots in order. This is kind of a crazy algorithm to allow for
     // the possibility of stack slots being assigned frame offsets before we even get here.
-    ASSERT(!code.frameSize());
+    RELEASE_ASSERT(!code.frameSize());
     Vector<StackSlot*> assignedEscapedStackSlots;
     Vector<StackSlot*> escapedStackSlotsWorklist;
     for (StackSlot* slot : code.stackSlots()) {
@@ -158,6 +154,37 @@ void allocateStackByGraphColoring(Code& code)
         assign(slot, assignedEscapedStackSlots);
         assignedEscapedStackSlots.append(slot);
     }
+    return assignedEscapedStackSlots;
+}
+
+template<typename Collection>
+void updateFrameSizeBasedOnStackSlotsImpl(Code& code, const Collection& collection)
+{
+    unsigned frameSize = 0;
+    for (StackSlot* slot : collection)
+        frameSize = std::max(frameSize, static_cast<unsigned>(-slot->offsetFromFP()));
+    code.setFrameSize(WTF::roundUpToMultipleOf(stackAlignmentBytes(), frameSize));
+}
+
+} // anonymous namespace
+
+void allocateEscapedStackSlots(Code& code)
+{
+    updateFrameSizeBasedOnStackSlotsImpl(code, allocateEscapedStackSlotsImpl(code));
+}
+
+void updateFrameSizeBasedOnStackSlots(Code& code)
+{
+    updateFrameSizeBasedOnStackSlotsImpl(code, code.stackSlots());
+}
+
+void allocateStackByGraphColoring(Code& code)
+{
+    PhaseScope phaseScope(code, "allocateStackByGraphColoring");
+    
+    handleCalleeSaves(code);
+    
+    Vector<StackSlot*> assignedEscapedStackSlots = allocateEscapedStackSlotsImpl(code);
 
     // Now we handle the spill slots.
     StackSlotLiveness liveness(code);
@@ -377,88 +404,11 @@ void allocateStackByGraphColoring(Code& code)
         assign(slot, otherSlots);
     }
 
-    // Figure out how much stack we're using for stack slots.
-    unsigned frameSizeForStackSlots = 0;
-    for (StackSlot* slot : code.stackSlots()) {
-        frameSizeForStackSlots = std::max(
-            frameSizeForStackSlots,
-            static_cast<unsigned>(-slot->offsetFromFP()));
-    }
-
-    frameSizeForStackSlots = WTF::roundUpToMultipleOf(stackAlignmentBytes(), frameSizeForStackSlots);
-
-    // Now we need to deduce how much argument area we need.
-    for (BasicBlock* block : code) {
-        for (Inst& inst : *block) {
-            for (Arg& arg : inst.args) {
-                if (arg.isCallArg()) {
-                    // For now, we assume that we use 8 bytes of the call arg. But that's not
-                    // such an awesome assumption.
-                    // FIXME: https://bugs.webkit.org/show_bug.cgi?id=150454
-                    ASSERT(arg.offset() >= 0);
-                    code.requestCallArgAreaSizeInBytes(arg.offset() + 8);
-                }
-            }
-        }
-    }
-
-    code.setFrameSize(frameSizeForStackSlots + code.callArgAreaSizeInBytes());
-
-    // Finally, transform the code to use Addr's instead of StackSlot's. This is a lossless
-    // transformation since we can search the StackSlots array to figure out which StackSlot any
-    // offset-from-FP refers to.
-
-    // FIXME: This may produce addresses that aren't valid if we end up with a ginormous stack frame.
-    // We would have to scavenge for temporaries if this happened. Fortunately, this case will be
-    // extremely rare so we can do crazy things when it arises.
-    // https://bugs.webkit.org/show_bug.cgi?id=152530
-
-    InsertionSet insertionSet(code);
-    for (BasicBlock* block : code) {
-        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
-            Inst& inst = block->at(instIndex);
-            inst.forEachArg(
-                [&] (Arg& arg, Arg::Role role, Bank, Width width) {
-                    auto stackAddr = [&] (int32_t offset) -> Arg {
-                        return Arg::stackAddr(offset, code.frameSize(), width);
-                    };
-                    
-                    switch (arg.kind()) {
-                    case Arg::Stack: {
-                        StackSlot* slot = arg.stackSlot();
-                        if (Arg::isZDef(role)
-                            && slot->kind() == StackSlotKind::Spill
-                            && slot->byteSize() > bytes(width)) {
-                            // Currently we only handle this simple case because it's the only one
-                            // that arises: ZDef's are only 32-bit right now. So, when we hit these
-                            // assertions it means that we need to implement those other kinds of
-                            // zero fills.
-                            RELEASE_ASSERT(slot->byteSize() == 8);
-                            RELEASE_ASSERT(width == Width32);
-
-                            RELEASE_ASSERT(isValidForm(StoreZero32, Arg::Stack));
-                            insertionSet.insert(
-                                instIndex + 1, StoreZero32, inst.origin,
-                                stackAddr(arg.offset() + 4 + slot->offsetFromFP()));
-                        }
-                        arg = stackAddr(arg.offset() + slot->offsetFromFP());
-                        break;
-                    }
-                    case Arg::CallArg:
-                        arg = stackAddr(arg.offset() - code.frameSize());
-                        break;
-                    default:
-                        break;
-                    }
-                }
-            );
-        }
-        insertionSet.execute(block);
-    }
+    updateFrameSizeBasedOnStackSlots(code);
+    code.setStackIsAllocated(true);
 }
 
 } } } // namespace JSC::B3::Air
 
 #endif // ENABLE(B3_JIT)
 
-
index 4c9e0da..752ba12 100644 (file)
@@ -33,14 +33,18 @@ class Code;
 
 // This allocates StackSlots to places on the stack. It first allocates the pinned ones in index
 // order and then it allocates the rest using first fit. Takes the opportunity to kill dead
-// assignments to stack slots, since it knows which ones are live. Also fixes ZDefs to anonymous
-// stack slots. Also coalesces spill slots whenever possible.
+// assignments to stack slots, since it knows which ones are live. Also coalesces spill slots
+// whenever possible.
 //
 // This is meant to be an optimal stack allocator, focused on generating great code. It's not
 // particularly fast, though.
 
 void allocateStackByGraphColoring(Code&);
 
+// These are utilities shared by this phase and allocateRegistersAndStackByLinearScan().
+void allocateEscapedStackSlots(Code&);
+void updateFrameSizeBasedOnStackSlots(Code&);
+
 } } } // namespace JSC::B3::Air
 
 #endif // ENABLE(B3_JIT)
index 5889cbf..53c52e1 100644 (file)
 
 namespace JSC { namespace B3 { namespace Air {
 
+Arg Arg::stackAddr(int32_t offsetFromFP, unsigned frameSize, Width width)
+{
+    Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
+    if (!result.isValidForm(width)) {
+        result = Arg::addr(
+            Air::Tmp(MacroAssembler::stackPointerRegister),
+            offsetFromFP + frameSize);
+    }
+    return result;
+}
+
 bool Arg::isStackMemory() const
 {
     switch (kind()) {
index b1b7479..aa11be5 100644 (file)
@@ -560,17 +560,8 @@ public:
         result.m_offset = offset;
         return result;
     }
-
-    static Arg stackAddr(int32_t offsetFromFP, unsigned frameSize, Width width)
-    {
-        Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
-        if (!result.isValidForm(width)) {
-            result = Arg::addr(
-                Air::Tmp(MacroAssembler::stackPointerRegister),
-                offsetFromFP + frameSize);
-        }
-        return result;
-    }
+    
+    static Arg stackAddr(int32_t offsetFromFP, unsigned frameSize, Width);
 
     // If you don't pass a Width, this optimistically assumes that you're using the right width.
     static bool isValidScale(unsigned scale, std::optional<Width> width = std::nullopt)
index 0aaa518..7cb8e70 100644 (file)
@@ -102,7 +102,14 @@ BasicBlock* Code::addBlock(double frequency)
 
 StackSlot* Code::addStackSlot(unsigned byteSize, StackSlotKind kind, B3::StackSlot* b3Slot)
 {
-    return m_stackSlots.addNew(byteSize, kind, b3Slot);
+    StackSlot* result = m_stackSlots.addNew(byteSize, kind, b3Slot);
+    if (m_stackIsAllocated) {
+        // FIXME: This is unnecessarily awful. Fortunately, it doesn't run often.
+        unsigned extent = WTF::roundUpToMultipleOf(result->alignment(), frameSize() + byteSize);
+        result->setOffsetFromFP(-static_cast<ptrdiff_t>(extent));
+        setFrameSize(WTF::roundUpToMultipleOf(stackAlignmentBytes(), extent));
+    }
+    return result;
 }
 
 StackSlot* Code::addStackSlot(B3::StackSlot* b3Slot)
@@ -138,6 +145,28 @@ bool Code::isEntrypoint(BasicBlock* block) const
     return false;
 }
 
+void Code::setCalleeSaveRegisterAtOffsetList(RegisterAtOffsetList&& registerAtOffsetList, StackSlot* slot)
+{
+    m_uncorrectedCalleeSaveRegisterAtOffsetList = WTFMove(registerAtOffsetList);
+    for (const RegisterAtOffset& registerAtOffset : m_uncorrectedCalleeSaveRegisterAtOffsetList)
+        m_calleeSaveRegisters.set(registerAtOffset.reg());
+    m_calleeSaveStackSlot = slot;
+}
+
+RegisterAtOffsetList Code::calleeSaveRegisterAtOffsetList() const
+{
+    RegisterAtOffsetList result = m_uncorrectedCalleeSaveRegisterAtOffsetList;
+    if (StackSlot* slot = m_calleeSaveStackSlot) {
+        ptrdiff_t offset = slot->byteSize() + slot->offsetFromFP();
+        for (size_t i = result.size(); i--;) {
+            result.at(i) = RegisterAtOffset(
+                result.at(i).reg(),
+                result.at(i).offset() + offset);
+        }
+    }
+    return result;
+}
+
 void Code::resetReachability()
 {
     clearPredecessors(m_blocks);
@@ -170,12 +199,13 @@ void Code::dump(PrintStream& out) const
         for (Special* special : specials())
             out.print("    ", deepDump(special), "\n");
     }
-    if (m_frameSize)
-        out.print("Frame size: ", m_frameSize, "\n");
+    if (m_frameSize || m_stackIsAllocated)
+        out.print("Frame size: ", m_frameSize, m_stackIsAllocated ? " (Allocated)" : "", "\n");
     if (m_callArgAreaSize)
         out.print("Call arg area size: ", m_callArgAreaSize, "\n");
-    if (m_calleeSaveRegisters.size())
-        out.print("Callee saves: ", m_calleeSaveRegisters, "\n");
+    RegisterAtOffsetList calleeSaveRegisters = this->calleeSaveRegisterAtOffsetList();
+    if (calleeSaveRegisters.size())
+        out.print("Callee saves: ", calleeSaveRegisters, "\n");
 }
 
 unsigned Code::findFirstBlockIndex(unsigned index) const
index 2df6232..066195c 100644 (file)
@@ -186,9 +186,22 @@ public:
     {
         m_entrypointLabels = std::forward<Vector>(vector);
     }
+    
+    void setStackIsAllocated(bool value)
+    {
+        m_stackIsAllocated = value;
+    }
+    
+    bool stackIsAllocated() const { return m_stackIsAllocated; }
+    
+    // This sets the callee save registers.
+    void setCalleeSaveRegisterAtOffsetList(RegisterAtOffsetList&&, StackSlot*);
 
-    const RegisterAtOffsetList& calleeSaveRegisters() const { return m_calleeSaveRegisters; }
-    RegisterAtOffsetList& calleeSaveRegisters() { return m_calleeSaveRegisters; }
+    // This returns the correctly offset list of callee save registers.
+    RegisterAtOffsetList calleeSaveRegisterAtOffsetList() const;
+    
+    // This just tells you what the callee saves are.
+    RegisterSet calleeSaveRegisters() const { return m_calleeSaveRegisters; }
 
     // Recomputes predecessors and deletes unreachable blocks.
     void resetReachability();
@@ -329,7 +342,10 @@ private:
     unsigned m_numFPTmps { 0 };
     unsigned m_frameSize { 0 };
     unsigned m_callArgAreaSize { 0 };
-    RegisterAtOffsetList m_calleeSaveRegisters;
+    bool m_stackIsAllocated { false };
+    RegisterAtOffsetList m_uncorrectedCalleeSaveRegisterAtOffsetList;
+    RegisterSet m_calleeSaveRegisters;
+    StackSlot* m_calleeSaveStackSlot { nullptr };
     Vector<FrequentedBlock> m_entrypoints; // This is empty until after lowerEntrySwitch().
     Vector<CCallHelpers::Label> m_entrypointLabels; // This is empty until code generation.
     RefPtr<WasmBoundsCheckGenerator> m_wasmBoundsCheckGenerator;
index 205b6c6..9a01827 100644 (file)
 
 #if ENABLE(B3_JIT)
 
+#include "AirAllocateRegistersAndStackByLinearScan.h"
 #include "AirAllocateRegistersByGraphColoring.h"
-#include "AirAllocateRegistersByLinearScan.h"
 #include "AirAllocateStackByGraphColoring.h"
 #include "AirCode.h"
 #include "AirEliminateDeadCode.h"
 #include "AirFixObviousSpills.h"
 #include "AirFixPartialRegisterStalls.h"
 #include "AirGenerationContext.h"
-#include "AirHandleCalleeSaves.h"
 #include "AirLogRegisterPressure.h"
 #include "AirLowerAfterRegAlloc.h"
 #include "AirLowerEntrySwitch.h"
 #include "AirLowerMacros.h"
+#include "AirLowerStackArgs.h"
 #include "AirOpcodeUtils.h"
 #include "AirOptimizeBlockOrder.h"
 #include "AirReportUsedRegisters.h"
 #include "AirSimplifyCFG.h"
-#include "AirSpillEverything.h"
 #include "AirValidate.h"
 #include "B3Common.h"
 #include "B3Procedure.h"
@@ -84,39 +83,47 @@ void prepareForGeneration(Code& code)
     
     eliminateDeadCode(code);
 
-    // Register allocation for all the Tmps that do not have a corresponding machine register.
-    // After this phase, every Tmp has a reg.
-    //
-    // For debugging, you can use spillEverything() to put everything to the stack between each Inst.
-    if (Options::airSpillsEverything())
-        spillEverything(code);
-    else if (code.optLevel() >= 2)
+    if (code.optLevel() <= 1) {
+        // When we're compiling quickly, we do register and stack allocation in one linear scan
+        // phase. It's fast because it computes liveness only once.
+        allocateRegistersAndStackByLinearScan(code);
+        
+        if (Options::logAirRegisterPressure()) {
+            dataLog("Register pressure after register allocation:\n");
+            logRegisterPressure(code);
+        }
+        
+        // We may still need to do post-allocation lowering. Doing it after both register and
+        // stack allocation is less optimal, but it works fine.
+        lowerAfterRegAlloc(code);
+    } else {
+        // NOTE: B3 -O2 generates code that runs 1.5x-2x faster than code generated by -O1.
+        // Most of this performance benefit is from -O2's graph coloring register allocation
+        // and stack allocation pipeline, which you see below.
+        
+        // Register allocation for all the Tmps that do not have a corresponding machine
+        // register. After this phase, every Tmp has a reg.
         allocateRegistersByGraphColoring(code);
-    else
-        allocateRegistersByLinearScan(code);
-
-    if (Options::logAirRegisterPressure()) {
-        dataLog("Register pressure after register allocation:\n");
-        logRegisterPressure(code);
-    }
-    
-    if (code.optLevel() >= 2) {
-        // This replaces uses of spill slots with registers or constants if possible. It does this by
-        // minimizing the amount that we perturb the already-chosen register allocation. It may extend
-        // the live ranges of registers though.
+        
+        if (Options::logAirRegisterPressure()) {
+            dataLog("Register pressure after register allocation:\n");
+            logRegisterPressure(code);
+        }
+        
+        // This replaces uses of spill slots with registers or constants if possible. It
+        // does this by minimizing the amount that we perturb the already-chosen register
+        // allocation. It may extend the live ranges of registers though.
         fixObviousSpills(code);
+        
+        lowerAfterRegAlloc(code);
+        
+        // This does first-fit allocation of stack slots using an interference graph plus a
+        // bunch of other optimizations.
+        allocateStackByGraphColoring(code);
     }
-
-    lowerAfterRegAlloc(code);
-
-    // Prior to this point the prologue and epilogue is implicit. This makes it explicit. It also
-    // does things like identify which callee-saves we're using and saves them.
-    handleCalleeSaves(code);
     
-    // This turns all Stack and CallArg Args into Addr args that use the frame pointer. It does
-    // this by first-fit allocating stack slots. It should be pretty darn close to optimal, so we
-    // shouldn't have to worry about this very much.
-    allocateStackByGraphColoring(code);
+    // This turns all Stack and CallArg Args into Addr args that use the frame pointer.
+    lowerStackArgs(code);
     
     // If we coalesced moves then we can unbreak critical edges. This is the main reason for this
     // phase.
@@ -212,7 +219,7 @@ void generate(Code& code, CCallHelpers& jit)
             if (code.frameSize())
                 jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()), MacroAssembler::stackPointerRegister);
             
-            for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
+            for (const RegisterAtOffset& entry : code.calleeSaveRegisterAtOffsetList()) {
                 if (entry.reg().isGPR())
                     jit.storePtr(entry.reg().gpr(), argFor(entry));
                 else
@@ -249,7 +256,7 @@ void generate(Code& code, CCallHelpers& jit)
             // have this override.
             auto start = jit.labelIgnoringWatchpoints();
             if (code.frameSize()) {
-                for (const RegisterAtOffset& entry : code.calleeSaveRegisters()) {
+                for (const RegisterAtOffset& entry : code.calleeSaveRegisterAtOffsetList()) {
                     if (entry.reg().isGPR())
                         jit.loadPtr(argFor(entry), entry.reg().gpr());
                     else
index 97cdfa1..965957a 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2017 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 
 #include "AirCode.h"
 #include "AirInstInlines.h"
-#include "AirPhaseScope.h"
 
 namespace JSC { namespace B3 { namespace Air {
 
 void handleCalleeSaves(Code& code)
 {
-    PhaseScope phaseScope(code, "handleCalleeSaves");
-
     RegisterSet usedCalleeSaves;
 
     for (BasicBlock* block : code) {
@@ -61,16 +58,15 @@ void handleCalleeSaves(Code& code)
     if (!usedCalleeSaves.numberOfSetRegisters())
         return;
 
-    code.calleeSaveRegisters() = RegisterAtOffsetList(usedCalleeSaves);
+    RegisterAtOffsetList calleeSaveRegisters = RegisterAtOffsetList(usedCalleeSaves);
 
     size_t byteSize = 0;
-    for (const RegisterAtOffset& entry : code.calleeSaveRegisters())
+    for (const RegisterAtOffset& entry : calleeSaveRegisters)
         byteSize = std::max(static_cast<size_t>(-entry.offset()), byteSize);
 
-    StackSlot* savesArea = code.addStackSlot(byteSize, StackSlotKind::Locked);
-    // This is a bit weird since we could have already pinned a different stack slot to this
-    // area. Also, our runtime does not require us to pin the saves area. Maybe we shouldn't pin it?
-    savesArea->setOffsetFromFP(-byteSize);
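+    // Tell Code which callee saves we use and hand it a locked stack slot big enough to hold
+    // them.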
+    code.setCalleeSaveRegisterAtOffsetList(
+        WTFMove(calleeSaveRegisters),
+        code.addStackSlot(byteSize, StackSlotKind::Locked));
 }
 
 } } } // namespace JSC::B3::Air
index b4b78a3..0965836 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2017 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -31,8 +31,9 @@ namespace JSC { namespace B3 { namespace Air {
 
 class Code;
 
-// This phase identifies callee-save registers and adds code to save/restore them in the
-// prologue/epilogue to the code. It's a mandatory phase.
+// This utility identifies the callee-save registers that the code uses and tells Code about
+// them. It's called from the phases that do stack allocation. We don't do it at the end of
+// register allocation because register usage isn't final until just before stack allocation.
 
 // FIXME: It would be cool to make this more interactive with the Air client and also more
 // powerful.
index 826fee2..5109864 100644 (file)
@@ -75,7 +75,7 @@ void lowerAfterRegAlloc(Code& code)
         return;
 
     HashMap<Inst*, RegisterSet> usedRegisters;
-
+    
     RegLiveness liveness(code);
     for (BasicBlock* block : code) {
         RegLiveness::LocalCalc localCalc(liveness, block);
@@ -96,13 +96,29 @@ void lowerAfterRegAlloc(Code& code)
                 usedRegisters.add(&inst, set);
         }
     }
-
+    
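+    // Scratch spill slots, at most two per bank, allocated lazily and reused across
+    // instructions rather than creating a fresh slot per use.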
+    std::array<std::array<StackSlot*, 2>, numBanks> slots;
+    forEachBank(
+        [&] (Bank bank) {
+            for (unsigned i = 0; i < 2; ++i)
+                slots[bank][i] = nullptr;
+        });
+
+    // If we run after stack allocation then we cannot use callee saves that aren't already in
+    // the callee save list. Note that this phase only runs after stack allocation at -O1, so
+    // this kind of slop is OK.
+    RegisterSet blacklistedCalleeSaves;
+    if (code.stackIsAllocated()) {
+        blacklistedCalleeSaves = RegisterSet::calleeSaveRegisters();
+        blacklistedCalleeSaves.exclude(code.calleeSaveRegisters());
+    }
+    
     auto getScratches = [&] (RegisterSet set, Bank bank) -> std::array<Arg, 2> {
         std::array<Arg, 2> result;
         for (unsigned i = 0; i < 2; ++i) {
             bool found = false;
             for (Reg reg : code.regsInPriorityOrder(bank)) {
-                if (!set.get(reg)) {
+                if (!set.get(reg) && !blacklistedCalleeSaves.get(reg)) {
                     result[i] = Tmp(reg);
                     set.set(reg);
                     found = true;
@@ -110,10 +126,10 @@ void lowerAfterRegAlloc(Code& code)
                 }
             }
             if (!found) {
-                result[i] = Arg::stack(
-                    code.addStackSlot(
-                        bytes(conservativeWidth(bank)),
-                        StackSlotKind::Spill));
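+                // No free register was found, so fall back to the cached scratch spill slot
+                // for this bank and scratch index.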
+                StackSlot*& slot = slots[bank][i];
+                if (!slot)
+                    slot = code.addStackSlot(bytes(conservativeWidth(bank)), StackSlotKind::Spill);
+                result[i] = Arg::stack(slots[bank][i]);
             }
         }
         return result;
diff --git a/Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp b/Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp
new file mode 100644 (file)
index 0000000..6aaf992
--- /dev/null
@@ -0,0 +1,126 @@
+/*
+ * Copyright (C) 2017 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "AirLowerStackArgs.h"
+
+#if ENABLE(B3_JIT)
+
+#include "AirArgInlines.h"
+#include "AirCode.h"
+#include "AirInsertionSet.h"
+#include "AirInstInlines.h"
+#include "AirPhaseScope.h"
+#include "StackAlignment.h"
+#include <wtf/ListDump.h>
+
+namespace JSC { namespace B3 { namespace Air {
+
+void lowerStackArgs(Code& code)
+{
+    PhaseScope phaseScope(code, "lowerStackArgs");
+    
+    // First, deduce how much call arg area we need.
+    for (BasicBlock* block : code) {
+        for (Inst& inst : *block) {
+            for (Arg& arg : inst.args) {
+                if (arg.isCallArg()) {
+                    // For now, we assume that we use 8 bytes of the call arg. But that's not
+                    // such an awesome assumption.
+                    // FIXME: https://bugs.webkit.org/show_bug.cgi?id=150454
+                    ASSERT(arg.offset() >= 0);
+                    code.requestCallArgAreaSizeInBytes(arg.offset() + 8);
+                }
+            }
+        }
+    }
+
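+    // Grow the frame to make room for the outgoing call arg area, which sits at the bottom of
+    // the frame, i.e. at SP (assuming the usual Air frame layout with FP at the top).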
+    code.setFrameSize(code.frameSize() + code.callArgAreaSizeInBytes());
+
+    // Finally, transform the code to use Addr's instead of StackSlot's. This is a lossless
+    // transformation since we can search the StackSlots array to figure out which StackSlot any
+    // offset-from-FP refers to.
+
+    // FIXME: This may produce addresses that aren't valid if we end up with a ginormous stack frame.
+    // We would have to scavenge for temporaries if this happened. Fortunately, this case will be
+    // extremely rare so we can do crazy things when it arises.
+    // https://bugs.webkit.org/show_bug.cgi?id=152530
+
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : code) {
+        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
+            Inst& inst = block->at(instIndex);
+            inst.forEachArg(
+                [&] (Arg& arg, Arg::Role role, Bank, Width width) {
+                    auto stackAddr = [&] (int32_t offset) -> Arg {
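+                        // Arg::stackAddr picks an FP- or SP-relative form that can encode this
+                        // offset at this width, or returns an invalid Arg if neither fits
+                        // (hence the FATAL path below).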
+                        Arg result = Arg::stackAddr(offset, code.frameSize(), width);
+                        if (!result) {
+                            dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
+                            dataLog("Code:\n");
+                            dataLog(code);
+                            dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
+                            RELEASE_ASSERT_NOT_REACHED();
+                        }
+                        return result;
+                    };
+                    
+                    switch (arg.kind()) {
+                    case Arg::Stack: {
+                        StackSlot* slot = arg.stackSlot();
+                        if (Arg::isZDef(role)
+                            && slot->kind() == StackSlotKind::Spill
+                            && slot->byteSize() > bytes(width)) {
+                            // Currently we only handle this simple case because it's the only one
+                            // that arises: ZDef's are only 32-bit right now. So, when we hit these
+                            // assertions it means that we need to implement those other kinds of
+                            // zero fills.
+                            RELEASE_ASSERT(slot->byteSize() == 8);
+                            RELEASE_ASSERT(width == Width32);
+
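+                            // Zero the upper 4 bytes of the 8-byte spill slot to honor the
+                            // ZDef contract.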
+                            RELEASE_ASSERT(isValidForm(StoreZero32, Arg::Stack));
+                            insertionSet.insert(
+                                instIndex + 1, StoreZero32, inst.origin,
+                                stackAddr(arg.offset() + 4 + slot->offsetFromFP()));
+                        }
+                        arg = stackAddr(arg.offset() + slot->offsetFromFP());
+                        break;
+                    }
+                    case Arg::CallArg:
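+                        // CallArg offsets are relative to the bottom of the frame, so
+                        // subtracting frameSize yields the equivalent FP-relative offset.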
+                        arg = stackAddr(arg.offset() - code.frameSize());
+                        break;
+                    default:
+                        break;
+                    }
+                }
+            );
+        }
+        insertionSet.execute(block);
+    }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2017 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -31,18 +31,10 @@ namespace JSC { namespace B3 { namespace Air {
 
 class Code;
 
-// This is a phase for testing. It behaves like a register allocator in the sense that it
-// eliminates temporaries from the program. It accomplishes this by always spilling all
-// temporaries. The resulting code is going to be very inefficient. This phase is great if you
-// think that there is a bug in the register allocator. You can confirm this by running this
-// phase instead of the register allocator.
-//
-// Note that even though this phase does the cheapest thing possible, it's not even written in a
-// particularly efficient way. So, don't get any ideas about using this phase to reduce compiler
-// latency. If you wanted to do that, you should come up with a clever algorithm instead of using
-// this silly thing.
-
-void spillEverything(Code&);
+// This turns Stack and CallArg references into actual FP-relative or SP-relative addresses.
+// It also fixes up ZDefs to anonymous (spill) stack slots by zero-filling the rest of the slot.
+
+void lowerStackArgs(Code&);
 
 } } } // namespace JSC::B3::Air
 
diff --git a/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp b/Source/JavaScriptCore/b3/air/AirSpillEverything.cpp
deleted file mode 100644 (file)
index 7e9af7b..0000000
+++ /dev/null
@@ -1,197 +0,0 @@
-/*
- * Copyright (C) 2015-2017 Apple Inc. All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
- * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
- * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
- * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
- * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
- * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
- * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- */
-
-#include "config.h"
-#include "AirSpillEverything.h"
-
-#if ENABLE(B3_JIT)
-
-#include "AirArgInlines.h"
-#include "AirCode.h"
-#include "AirFixSpillsAfterTerminals.h"
-#include "AirInsertionSet.h"
-#include "AirInstInlines.h"
-#include "AirLiveness.h"
-#include "AirPadInterference.h"
-#include "AirPhaseScope.h"
-#include <wtf/IndexMap.h>
-
-namespace JSC { namespace B3 { namespace Air {
-
-void spillEverything(Code& code)
-{
-    PhaseScope phaseScope(code, "spillEverything");
-    
-    padInterference(code);
-
-    // We want to know the set of registers used at every point in every basic block.
-    IndexMap<BasicBlock*, Vector<RegisterSet>> usedRegisters(code.size());
-    GPLiveness gpLiveness(code);
-    FPLiveness fpLiveness(code);
-    for (BasicBlock* block : code) {
-        GPLiveness::LocalCalc gpLocalCalc(gpLiveness, block);
-        FPLiveness::LocalCalc fpLocalCalc(fpLiveness, block);
-
-        usedRegisters[block].resize(block->size() + 1);
-
-        auto setUsedRegisters = [&] (unsigned index) {
-            RegisterSet& registerSet = usedRegisters[block][index];
-            for (Tmp tmp : gpLocalCalc.live()) {
-                if (tmp.isReg())
-                    registerSet.set(tmp.reg());
-            }
-            for (Tmp tmp : fpLocalCalc.live()) {
-                if (tmp.isReg())
-                    registerSet.set(tmp.reg());
-            }
-
-            // Gotta account for dead assignments to registers. These may happen because the input
-            // code is suboptimal.
-            Inst::forEachDefWithExtraClobberedRegs<Tmp>(
-                block->get(index - 1), block->get(index),
-                [&] (const Tmp& tmp, Arg::Role, Bank, Width) {
-                    if (tmp.isReg())
-                        registerSet.set(tmp.reg());
-                });
-        };
-
-        for (unsigned instIndex = block->size(); instIndex--;) {
-            setUsedRegisters(instIndex + 1);
-            gpLocalCalc.execute(instIndex);
-            fpLocalCalc.execute(instIndex);
-        }
-        setUsedRegisters(0);
-    }
-
-    // Allocate a stack slot for each tmp.
-    Vector<StackSlot*> allStackSlots[numBanks];
-    for (unsigned bankIndex = 0; bankIndex < numBanks; ++bankIndex) {
-        Vector<StackSlot*>& stackSlots = allStackSlots[bankIndex];
-        Bank bank = static_cast<Bank>(bankIndex);
-        stackSlots.resize(code.numTmps(bank));
-        for (unsigned tmpIndex = code.numTmps(bank); tmpIndex--;)
-            stackSlots[tmpIndex] = code.addStackSlot(8, StackSlotKind::Spill);
-    }
-
-    InsertionSet insertionSet(code);
-    for (BasicBlock* block : code) {
-        for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
-            RegisterSet& setBefore = usedRegisters[block][instIndex];
-            RegisterSet& setAfter = usedRegisters[block][instIndex + 1];
-            Inst& inst = block->at(instIndex);
-
-            // First try to spill directly.
-            for (unsigned i = 0; i < inst.args.size(); ++i) {
-                Arg& arg = inst.args[i];
-
-                if (arg.isTmp()) {
-                    if (arg.isReg())
-                        continue;
-
-                    if (inst.admitsStack(i)) {
-                        StackSlot* stackSlot = allStackSlots[arg.bank()][arg.tmpIndex()];
-                        arg = Arg::stack(stackSlot);
-                        continue;
-                    }
-                }
-            }
-
-            // Now fall back on spilling using separate Move's to load/store the tmp.
-            inst.forEachTmp(
-                [&] (Tmp& tmp, Arg::Role role, Bank bank, Width) {
-                    if (tmp.isReg())
-                        return;
-                    
-                    StackSlot* stackSlot = allStackSlots[bank][tmp.tmpIndex()];
-                    Arg arg = Arg::stack(stackSlot);
-
-                    // Need to figure out a register to use. How we do that depends on the role.
-                    Reg chosenReg;
-                    switch (role) {
-                    case Arg::Use:
-                    case Arg::ColdUse:
-                        for (Reg reg : code.regsInPriorityOrder(bank)) {
-                            if (!setBefore.get(reg)) {
-                                setBefore.set(reg);
-                                chosenReg = reg;
-                                break;
-                            }
-                        }
-                        break;
-                    case Arg::Def:
-                    case Arg::ZDef:
-                        for (Reg reg : code.regsInPriorityOrder(bank)) {
-                            if (!setAfter.get(reg)) {
-                                setAfter.set(reg);
-                                chosenReg = reg;
-                                break;
-                            }
-                        }
-                        break;
-                    case Arg::UseDef:
-                    case Arg::UseZDef:
-                    case Arg::LateUse:
-                    case Arg::LateColdUse:
-                    case Arg::Scratch:
-                    case Arg::EarlyDef:
-                    case Arg::EarlyZDef:
-                        for (Reg reg : code.regsInPriorityOrder(bank)) {
-                            if (!setBefore.get(reg) && !setAfter.get(reg)) {
-                                setAfter.set(reg);
-                                setBefore.set(reg);
-                                chosenReg = reg;
-                                break;
-                            }
-                        }
-                        break;
-                    case Arg::UseAddr:
-                        // We will never UseAddr a Tmp, that doesn't make sense.
-                        RELEASE_ASSERT_NOT_REACHED();
-                        break;
-                    }
-                    RELEASE_ASSERT(chosenReg);
-
-                    tmp = Tmp(chosenReg);
-                    
-                    if (role == Arg::Scratch)
-                        return;
-
-                    Opcode move = bank == GP ? Move : MoveDouble;
-
-                    if (Arg::isAnyUse(role))
-                        insertionSet.insert(instIndex, move, inst.origin, arg, tmp);
-                    if (Arg::isAnyDef(role))
-                        insertionSet.insert(instIndex + 1, move, inst.origin, tmp, arg);
-                });
-        }
-        insertionSet.execute(block);
-    }
-    
-    fixSpillsAfterTerminals(code);
-}
-
-} } } // namespace JSC::B3::Air
-
-#endif // ENABLE(B3_JIT)
index 9046564..d78ae97 100644 (file)
@@ -14102,7 +14102,7 @@ void testPinRegisters()
                     });
             }
         }
-        for (const RegisterAtOffset& regAtOffset : proc.calleeSaveRegisters())
+        for (const RegisterAtOffset& regAtOffset : proc.calleeSaveRegisterAtOffsetList())
             usesCSRs |= csrs.get(regAtOffset.reg());
         CHECK_EQ(usesCSRs, !pin);
     };
index d906405..fd09145 100644 (file)
@@ -77,7 +77,7 @@ void compile(State& state, Safepoint::Result& safepointResult)
         return;
     
     std::unique_ptr<RegisterAtOffsetList> registerOffsets =
-        std::make_unique<RegisterAtOffsetList>(state.proc->calleeSaveRegisters());
+        std::make_unique<RegisterAtOffsetList>(state.proc->calleeSaveRegisterAtOffsetList());
     if (shouldDumpDisassembly())
         dataLog("Unwind info for ", CodeBlockWithJITType(state.graph.m_codeBlock, JITCode::FTLJIT), ": ", *registerOffsets, "\n");
     state.graph.m_codeBlock->setCalleeSaveRegisters(WTFMove(registerOffsets));
index 875a591..e0b2541 100644 (file)
@@ -38,7 +38,7 @@ public:
     enum OffsetBaseType { FramePointerBased, ZeroBased };
 
     RegisterAtOffsetList();
-    RegisterAtOffsetList(RegisterSet, OffsetBaseType = FramePointerBased);
+    explicit RegisterAtOffsetList(RegisterSet, OffsetBaseType = FramePointerBased);
 
     void dump(PrintStream&) const;
 
index 60caae0..6376b0e 100644 (file)
@@ -398,7 +398,6 @@ typedef const char* optionString;
     \
     v(bool, logB3PhaseTimes, false, Normal, nullptr) \
     v(double, rareBlockPenalty, 0.001, Normal, nullptr) \
-    v(bool, airSpillsEverything, false, Normal, nullptr) \
     v(bool, airLinearScanVerbose, false, Normal, nullptr) \
     v(bool, airLinearScanSpillsEverything, false, Normal, nullptr) \
     v(bool, airForceBriggsAllocator, false, Normal, nullptr) \
index 30462a9..5ac8639 100644 (file)
@@ -1328,7 +1328,7 @@ Expected<std::unique_ptr<WasmInternalFunction>, String> parseAndCompile(Compilat
         B3::prepareForGeneration(procedure);
         B3::generate(procedure, *compilationContext.wasmEntrypointJIT);
         compilationContext.wasmEntrypointByproducts = procedure.releaseByproducts();
-        result->wasmEntrypoint.calleeSaveRegisters = procedure.calleeSaveRegisters();
+        result->wasmEntrypoint.calleeSaveRegisters = procedure.calleeSaveRegisterAtOffsetList();
     }
 
     createJSToWasmWrapper(compilationContext, *result, signature, info, mode);
index 1c8adf4..1d49f4a 100755 (executable)
@@ -893,7 +893,7 @@ def runFTLEagerNoCJITValidate(*optionalTestSpecificOptions)
 end
 
 def runFTLEagerNoCJITB3O1(*optionalTestSpecificOptions)
-    run("ftl-eager-no-cjit", "--validateGraph=true", "--airForceIRCAllocator=true", *(FTL_OPTIONS + NO_CJIT_OPTIONS + EAGER_OPTIONS + B3O1_OPTIONS + optionalTestSpecificOptions))
+    run("ftl-eager-no-cjit-b3o1", "--validateGraph=true", *(FTL_OPTIONS + NO_CJIT_OPTIONS + EAGER_OPTIONS + B3O1_OPTIONS + optionalTestSpecificOptions))
 end
 
 def runFTLEagerNoCJITOSRValidation(*optionalTestSpecificOptions)