WebAssembly: Air::Inst::generate crashes on large binary on A64
author     jfbastien@apple.com <jfbastien@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 6 May 2017 03:57:42 +0000 (03:57 +0000)
committer  jfbastien@apple.com <jfbastien@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 6 May 2017 03:57:42 +0000 (03:57 +0000)
https://bugs.webkit.org/show_bug.cgi?id=170215

Reviewed by Filip Pizlo.

ARM can't encode all offsets in a single instruction. We usually
handle this type of detail early, or the macro assembler uses a
scratch register to take care of the large immediate. After
register allocation we assumed that we would never get large
offsets, and asserted this was the case. That was a fine
assumption with JavaScript, but WebAssembly ends up generating
stack frames which are too big to encode.
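
For context: an ARM64 load/store can only encode a small immediate
offset directly (a signed 9-bit unscaled offset, or an unsigned
12-bit offset scaled by the access size); anything larger has to be
materialized into a register first. A rough, illustrative check,
assuming an 8-byte access (the real logic lives in
Arg::isValidAddrForm and the macro assembler):

    // Hedged sketch only, not actual WebKit code.
    static bool fitsARM64AddressImmediate(int64_t offset)
    {
        if (offset >= -256 && offset <= 255)
            return true; // signed 9-bit unscaled form (e.g. LDUR)
        if (!(offset & 7) && offset >= 0 && offset <= 4095 * 8)
            return true; // unsigned 12-bit scaled form (e.g. LDR)
        return false;
    }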

There are two places that needed to be fixed:
    1. AirGenerate
    2. AirLowerStackArgs

We now unconditionally pin the dataTempRegister on ARM64, and use
it when immediates don't fit.
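
The per-CPU pinning choice is kept in B3Common.cpp (the new
function is reproduced here); Air::Code's constructor pins the
register when one is returned:

    std::optional<GPRReg> pinnedExtendedOffsetAddrRegister()
    {
    #if CPU(ARM64)
        return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
    #elif CPU(X86_64)
        return std::nullopt;
    #else
    #error Unhandled architecture.
    #endif
    }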

Number 1 is easy: we're just incrementing SP, so we make sure we
can use a scratch register when that happens.
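
Concretely, the prologue emission in AirGenerate now allows the
macro assembler's scratch register on ARM64 while SP is adjusted
(this is the shape of the change in AirGenerate.cpp):

    if (code.frameSize()) {
        AllowMacroScratchRegisterUsageIf allowScratch(jit, isARM64());
        jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()),
            MacroAssembler::stackPointerRegister);
    }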

Number 2 is more complex: not every Inst can receive a stack
argument whose base register isn't SP or FP. Specifically,
Patchpoints and Stackmaps get very sad because they just want to
know the offset value. When we materialize the offset, this:

    Move (spill337), (spill201), %r0, @8735

becomes (where %r16 is dataTempRegister):
    Move $1404, %r16, @8736
    Add64 %sp, %r16, @8736
    Move (%r16), 2032(%sp), %r0, @8736

The code currently doesn't see through our little dance. To work
around this issue we introduce a new Air Arg kind:
ExtendedOffsetAddr. This is the same as a regular Addr, but with
an offset which may be too big to encode. Opcodes then declare
whether their arguments can handle such inputs; if so we generate
them, otherwise we generate an Addr as shown above.
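
Condensed, the stackAddr lowering in AirLowerStackArgs now reads
roughly as follows (sketch of the patch; the x86 path and the
imm-vs-bigImm selection are omitted):

    // Patchpoints/Stackmaps that admit it get the raw FP-relative
    // offset; the stackmap machinery only needs the offset value.
    if (isPatch && inst.admitsExtendedOffsetAddr(arg))
        return Arg::extendedOffsetAddr(offsetFromFP);

    // Otherwise try FP- then SP-relative Addr forms, and fall back
    // to materializing the offset in the pinned dataTempRegister.
    Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
    if (result.isValidForm(width))
        return result;
    result = Arg::addr(Air::Tmp(MacroAssembler::stackPointerRegister), offsetFromSP);
    if (result.isValidForm(width))
        return result;
    Air::Tmp tmp = Air::Tmp(*pinnedExtendedOffsetAddrRegister());
    insertionSet.insert(instIndex, Move, inst.origin, Arg::bigImm(offsetFromSP), tmp);
    insertionSet.insert(instIndex, Add64, inst.origin,
        Air::Tmp(MacroAssembler::stackPointerRegister), tmp);
    return Arg::addr(tmp, 0);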

None of this affects x86 because it can always encode large
immediates.

This patch also drive-by converts some uses of `override` to
`final`. It makes the code easier to grok, and maybe helps the
optimizer sometimes, but that doesn't really matter.

* assembler/MacroAssembler.h:
* assembler/MacroAssemblerARM64.h:
* b3/B3CheckSpecial.cpp:
(JSC::B3::CheckSpecial::admitsExtendedOffsetAddr):
* b3/B3CheckSpecial.h:
* b3/B3Common.cpp:
(JSC::B3::pinnedExtendedOffsetAddrRegister): keep the CPU-specific
pinning information in a cpp file
* b3/B3Common.h:
* b3/B3PatchpointSpecial.cpp:
(JSC::B3::PatchpointSpecial::admitsExtendedOffsetAddr):
* b3/B3PatchpointSpecial.h:
* b3/B3StackmapSpecial.cpp:
(JSC::B3::StackmapSpecial::isArgValidForRep):
(JSC::B3::StackmapSpecial::repForArg):
* b3/B3StackmapSpecial.h:
* b3/air/AirArg.cpp:
(JSC::B3::Air::Arg::isStackMemory):
(JSC::B3::Air::Arg::jsHash):
(JSC::B3::Air::Arg::dump):
(WTF::printInternal):
(JSC::B3::Air::Arg::stackAddrImpl): Deleted. There was only one
use of this (in AirLowerStackArgs), and it would now be confusing
to split the logic up between the two files. The code that used to
be here is instead inlined into its one use point.
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::extendedOffsetAddr):
(JSC::B3::Air::Arg::isExtendedOffsetAddr):
(JSC::B3::Air::Arg::isMemory):
(JSC::B3::Air::Arg::base):
(JSC::B3::Air::Arg::offset):
(JSC::B3::Air::Arg::isGP):
(JSC::B3::Air::Arg::isFP):
(JSC::B3::Air::Arg::isValidForm):
(JSC::B3::Air::Arg::forEachTmpFast):
(JSC::B3::Air::Arg::forEachTmp):
(JSC::B3::Air::Arg::asAddress):
(JSC::B3::Air::Arg::stackAddr): Deleted.
* b3/air/AirCCallSpecial.cpp:
(JSC::B3::Air::CCallSpecial::isValid):
(JSC::B3::Air::CCallSpecial::admitsExtendedOffsetAddr):
(JSC::B3::Air::CCallSpecial::generate):
* b3/air/AirCCallSpecial.h:
* b3/air/AirCode.cpp:
(JSC::B3::Air::Code::Code):
(JSC::B3::Air::Code::pinRegister): Check that the register hasn't
already been pinned before pinning it. Pinning the same register
twice is likely a bug.
* b3/air/AirCustom.h:
(JSC::B3::Air::PatchCustom::admitsExtendedOffsetAddr):
(JSC::B3::Air::CCallCustom::admitsExtendedOffsetAddr):
(JSC::B3::Air::ShuffleCustom::admitsExtendedOffsetAddr):
(JSC::B3::Air::EntrySwitchCustom::admitsExtendedOffsetAddr):
(JSC::B3::Air::WasmBoundsCheckCustom::admitsExtendedOffsetAddr):
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::generate):
* b3/air/AirInst.h:
* b3/air/AirInstInlines.h:
(JSC::B3::Air::Inst::admitsExtendedOffsetAddr):
* b3/air/AirLowerStackArgs.cpp:
(JSC::B3::Air::lowerStackArgs):
* b3/air/AirPrintSpecial.cpp:
(JSC::B3::Air::PrintSpecial::admitsExtendedOffsetAddr):
(JSC::B3::Air::PrintSpecial::generate):
* b3/air/AirPrintSpecial.h:
* b3/air/AirSpecial.h:
* b3/air/opcode_generator.rb:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@216306 268f45cc-cd09-0410-ab3c-d52691b4dbfc

25 files changed:
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/assembler/MacroAssembler.h
Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
Source/JavaScriptCore/b3/B3CheckSpecial.cpp
Source/JavaScriptCore/b3/B3CheckSpecial.h
Source/JavaScriptCore/b3/B3Common.cpp
Source/JavaScriptCore/b3/B3Common.h
Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp
Source/JavaScriptCore/b3/B3PatchpointSpecial.h
Source/JavaScriptCore/b3/B3StackmapSpecial.cpp
Source/JavaScriptCore/b3/B3StackmapSpecial.h
Source/JavaScriptCore/b3/air/AirArg.cpp
Source/JavaScriptCore/b3/air/AirArg.h
Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp
Source/JavaScriptCore/b3/air/AirCCallSpecial.h
Source/JavaScriptCore/b3/air/AirCode.cpp
Source/JavaScriptCore/b3/air/AirCustom.h
Source/JavaScriptCore/b3/air/AirGenerate.cpp
Source/JavaScriptCore/b3/air/AirInst.h
Source/JavaScriptCore/b3/air/AirInstInlines.h
Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp
Source/JavaScriptCore/b3/air/AirPrintSpecial.cpp
Source/JavaScriptCore/b3/air/AirPrintSpecial.h
Source/JavaScriptCore/b3/air/AirSpecial.h
Source/JavaScriptCore/b3/air/opcode_generator.rb

index 517e8ae..1077c82 100644 (file)
@@ -1,3 +1,123 @@
+2017-05-05  JF Bastien  <jfbastien@apple.com>
+
+        WebAssembly: Air::Inst::generate crashes on large binary on A64
+        https://bugs.webkit.org/show_bug.cgi?id=170215
+
+        Reviewed by Filip Pizlo.
+
+        ARM can't encode all offsets in a single instruction. We usually
+        handle this type of detail early, or the macro assembler uses a
+        scratch register to take care of the large immediate. After
+        register allocation we assumed that we would never get large
+        offsets, and asserted this was the case. That was a fine
+        assumption with JavaScript, but WebAssembly ends up generating
+        stack frames which are too big to encode.
+
+        There are two places that needed to be fixed:
+            1. AirGenerate
+            2. AirLowerStackArgs
+
+        We now unconditionally pin the dataTempRegister on ARM64, and use
+        it when immediates don't fit.
+
+        Number 1 is easy: we're just incrementing SP, so we make sure we
+        can use a scratch register when that happens.
+
+        Number 2 is more complex: not every Inst can receive a stack
+        argument whose base register isn't SP or FP. Specifically,
+        Patchpoints and Stackmaps get very sad because they just want to
+        know the offset value. When we materialize the offset, this:
+
+            Move (spill337), (spill201), %r0, @8735
+
+        becomes (where %r16 is dataTempRegister):
+            Move $1404, %r16, @8736
+            Add64 %sp, %r16, @8736
+            Move (%r16), 2032(%sp), %r0, @8736
+
+        The code currently doesn't see through our little dance. To work
+        around this issue we introduce a new Air Arg kind:
+        ExtendedOffsetAddr. This is the same as a regular Addr, but with
+        an offset which may be too big to encode. Opcodes then declare
+        whether their arguments can handle such inputs; if so we generate
+        them, otherwise we generate an Addr as shown above.
+
+        None of this affects x86 because it can always encode large
+        immediates.
+
+        This patch also drive-by converts some uses of `override` to
+        `final`. It makes the code easier to grok, and maybe helps the
+        optimizer sometimes, but that doesn't really matter.
+
+        * assembler/MacroAssembler.h:
+        * assembler/MacroAssemblerARM64.h:
+        * b3/B3CheckSpecial.cpp:
+        (JSC::B3::CheckSpecial::admitsExtendedOffsetAddr):
+        * b3/B3CheckSpecial.h:
+        * b3/B3Common.cpp:
+        (JSC::B3::pinnedExtendedOffsetAddrRegister): keep the CPU-specific
+        pinning information in a cpp file
+        * b3/B3Common.h:
+        * b3/B3PatchpointSpecial.cpp:
+        (JSC::B3::PatchpointSpecial::admitsExtendedOffsetAddr):
+        * b3/B3PatchpointSpecial.h:
+        * b3/B3StackmapSpecial.cpp:
+        (JSC::B3::StackmapSpecial::isArgValidForRep):
+        (JSC::B3::StackmapSpecial::repForArg):
+        * b3/B3StackmapSpecial.h:
+        * b3/air/AirArg.cpp:
+        (JSC::B3::Air::Arg::isStackMemory):
+        (JSC::B3::Air::Arg::jsHash):
+        (JSC::B3::Air::Arg::dump):
+        (WTF::printInternal):
+        (JSC::B3::Air::Arg::stackAddrImpl): Deleted. There was only one
+        use of this (in AirLowerStackArgs), and it would now be confusing
+        to split the logic up between the two files. The code that used to
+        be here is instead inlined into its one use point.
+        * b3/air/AirArg.h:
+        (JSC::B3::Air::Arg::extendedOffsetAddr):
+        (JSC::B3::Air::Arg::isExtendedOffsetAddr):
+        (JSC::B3::Air::Arg::isMemory):
+        (JSC::B3::Air::Arg::base):
+        (JSC::B3::Air::Arg::offset):
+        (JSC::B3::Air::Arg::isGP):
+        (JSC::B3::Air::Arg::isFP):
+        (JSC::B3::Air::Arg::isValidForm):
+        (JSC::B3::Air::Arg::forEachTmpFast):
+        (JSC::B3::Air::Arg::forEachTmp):
+        (JSC::B3::Air::Arg::asAddress):
+        (JSC::B3::Air::Arg::stackAddr): Deleted.
+        * b3/air/AirCCallSpecial.cpp:
+        (JSC::B3::Air::CCallSpecial::isValid):
+        (JSC::B3::Air::CCallSpecial::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::CCallSpecial::generate):
+        * b3/air/AirCCallSpecial.h:
+        * b3/air/AirCode.cpp:
+        (JSC::B3::Air::Code::Code):
+        (JSC::B3::Air::Code::pinRegister): Check that the register hasn't
+        already been pinned before pinning it. Pinning the same register
+        twice is likely a bug.
+        * b3/air/AirCustom.h:
+        (JSC::B3::Air::PatchCustom::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::CCallCustom::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::ShuffleCustom::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::EntrySwitchCustom::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::WasmBoundsCheckCustom::admitsExtendedOffsetAddr):
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::generate):
+        * b3/air/AirInst.h:
+        * b3/air/AirInstInlines.h:
+        (JSC::B3::Air::Inst::admitsExtendedOffsetAddr):
+        * b3/air/AirLowerStackArgs.cpp:
+        (JSC::B3::Air::lowerStackArgs):
+        * b3/air/AirPrintSpecial.cpp:
+        (JSC::B3::Air::PrintSpecial::admitsExtendedOffsetAddr):
+        (JSC::B3::Air::PrintSpecial::generate):
+        * b3/air/AirPrintSpecial.h:
+        * b3/air/AirSpecial.h:
+        * b3/air/opcode_generator.rb:
+
 2017-05-05  Oliver Hunt  <oliver@apple.com>
 
         Move trivial String prototype functions to JS builtins
index 0050275..7558fe3 100644 (file)
@@ -821,7 +821,7 @@ public:
         return MacroAssemblerBase::branchTest8(cond, Address(address.base, address.offset), mask);
     }
 
-#else // !CPU(X86_64)
+#else // !CPU(X86_64) && !CPU(ARM64)
 
     void addPtr(RegisterID src, RegisterID dest)
     {
index f764a94..24a5317 100644 (file)
@@ -39,8 +39,8 @@ public:
     static const unsigned numGPRs = 32;
     static const unsigned numFPRs = 32;
     
-    static const RegisterID dataTempRegister = ARM64Registers::ip0;
-    static const RegisterID memoryTempRegister = ARM64Registers::ip1;
+    static constexpr RegisterID dataTempRegister = ARM64Registers::ip0;
+    static constexpr RegisterID memoryTempRegister = ARM64Registers::ip1;
 
     RegisterID scratchRegister()
     {
index b6baa8a..6875191 100644 (file)
@@ -140,6 +140,13 @@ bool CheckSpecial::admitsStack(Inst& inst, unsigned argIndex)
     return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex);
 }
 
+bool CheckSpecial::admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex)
+{
+    if (argIndex >= 1 && argIndex < 1 + m_numCheckArgs)
+        return false;
+    return admitsStack(inst, argIndex);
+}
+
 std::optional<unsigned> CheckSpecial::shouldTryAliasingDef(Inst& inst)
 {
     if (std::optional<unsigned> branchDef = hiddenBranch(inst).shouldTryAliasingDef())
index aa7f2fe..b9183fa 100644 (file)
@@ -120,18 +120,19 @@ protected:
     // Constructs and returns the Inst representing the branch that this will use.
     Air::Inst hiddenBranch(const Air::Inst&) const;
 
-    void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) override;
-    bool isValid(Air::Inst&) override;
-    bool admitsStack(Air::Inst&, unsigned argIndex) override;
-    std::optional<unsigned> shouldTryAliasingDef(Air::Inst&) override;
+    void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) final;
+    bool isValid(Air::Inst&) final;
+    bool admitsStack(Air::Inst&, unsigned argIndex) final;
+    bool admitsExtendedOffsetAddr(Air::Inst&, unsigned) final;
+    std::optional<unsigned> shouldTryAliasingDef(Air::Inst&) final;
 
     // NOTE: the generate method will generate the hidden branch and then register a LatePath that
     // generates the stackmap. Super crazy dude!
 
-    CCallHelpers::Jump generate(Air::Inst&, CCallHelpers&, Air::GenerationContext&) override;
+    CCallHelpers::Jump generate(Air::Inst&, CCallHelpers&, Air::GenerationContext&) final;
 
-    void dumpImpl(PrintStream&) const override;
-    void deepDumpImpl(PrintStream&) const override;
+    void dumpImpl(PrintStream&) const final;
+    void deepDumpImpl(PrintStream&) const final;
 
 private:
     Air::Kind m_checkKind;
index 60da362..06639c9 100644 (file)
@@ -70,6 +70,17 @@ bool shouldMeasurePhaseTiming()
     return Options::logB3PhaseTimes();
 }
 
+std::optional<GPRReg> pinnedExtendedOffsetAddrRegister()
+{
+#if CPU(ARM64)
+    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
+#elif CPU(X86_64)
+    return std::nullopt;
+#else
+#error Unhandled architecture.
+#endif
+}
+
 } } // namespace JSC::B3
 
 #endif // ENABLE(B3_JIT)
index e3b0f20..d17a204 100644 (file)
@@ -28,6 +28,7 @@
 #if ENABLE(B3_JIT)
 
 #include "CPU.h"
+#include "GPRInfo.h"
 #include "JSExportMacros.h"
 #include "Options.h"
 #include <wtf/Optional.h>
@@ -182,6 +183,8 @@ inline unsigned defaultOptLevel()
     return Options::defaultB3OptLevel();
 }
 
+std::optional<GPRReg> pinnedExtendedOffsetAddrRegister();
+
 } } // namespace JSC::B3
 
 #endif // ENABLE(B3_JIT)
index 4ceb1b9..47f0abd 100644 (file)
@@ -130,6 +130,11 @@ bool PatchpointSpecial::admitsStack(Inst& inst, unsigned argIndex)
     return admitsStackImpl(0, 2, inst, argIndex);
 }
 
+bool PatchpointSpecial::admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex)
+{
+    return admitsStack(inst, argIndex);
+}
+
 CCallHelpers::Jump PatchpointSpecial::generate(
     Inst& inst, CCallHelpers& jit, GenerationContext& context)
 {
index 4e1b2a3..de0409b 100644 (file)
@@ -47,19 +47,20 @@ public:
     virtual ~PatchpointSpecial();
 
 protected:
-    void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) override;
-    bool isValid(Air::Inst&) override;
-    bool admitsStack(Air::Inst&, unsigned argIndex) override;
+    void forEachArg(Air::Inst&, const ScopedLambda<Air::Inst::EachArgCallback>&) final;
+    bool isValid(Air::Inst&) final;
+    bool admitsStack(Air::Inst&, unsigned argIndex) final;
+    bool admitsExtendedOffsetAddr(Air::Inst&, unsigned) final;
 
     // NOTE: the generate method will generate the hidden branch and then register a LatePath that
     // generates the stackmap. Super crazy dude!
 
-    CCallHelpers::Jump generate(Air::Inst&, CCallHelpers&, Air::GenerationContext&) override;
+    CCallHelpers::Jump generate(Air::Inst&, CCallHelpers&, Air::GenerationContext&) final;
     
-    bool isTerminal(Air::Inst&) override;
+    bool isTerminal(Air::Inst&) final;
 
-    void dumpImpl(PrintStream&) const override;
-    void deepDumpImpl(PrintStream&) const override;
+    void dumpImpl(PrintStream&) const final;
+    void deepDumpImpl(PrintStream&) const final;
 };
 
 } } // namespace JSC::B3
index 87dd951..60370c5 100644 (file)
@@ -252,7 +252,7 @@ bool StackmapSpecial::isArgValidForRep(Air::Code& code, const Air::Arg& arg, con
     case ValueRep::StackArgument:
         if (arg == Arg::callArg(rep.offsetFromSP()))
             return true;
-        if (arg.isAddr() && code.frameSize()) {
+        if ((arg.isAddr() || arg.isExtendedOffsetAddr()) && code.frameSize()) {
             if (arg.base() == Tmp(GPRInfo::callFrameRegister)
                 && arg.offset() == rep.offsetFromSP() - code.frameSize())
                 return true;
@@ -277,6 +277,9 @@ ValueRep StackmapSpecial::repForArg(Code& code, const Arg& arg)
     case Arg::BigImm:
         return ValueRep::constant(arg.value());
         break;
+    case Arg::ExtendedOffsetAddr:
+        ASSERT(arg.base() == Tmp(GPRInfo::callFrameRegister));
+        FALLTHROUGH;
     case Arg::Addr:
         if (arg.base() == Tmp(GPRInfo::callFrameRegister))
             return ValueRep::stack(arg.offset());
index 6742a63..6e8eac3 100644 (file)
@@ -51,9 +51,9 @@ public:
     };
 
 protected:
-    void reportUsedRegisters(Air::Inst&, const RegisterSet&) override;
-    RegisterSet extraEarlyClobberedRegs(Air::Inst&) override;
-    RegisterSet extraClobberedRegs(Air::Inst&) override;
+    void reportUsedRegisters(Air::Inst&, const RegisterSet&) final;
+    RegisterSet extraEarlyClobberedRegs(Air::Inst&) final;
+    RegisterSet extraClobberedRegs(Air::Inst&) final;
 
     // Note that this does not override generate() or dumpImpl()/deepDumpImpl(). We have many some
     // subclasses that implement that.
index 6e8a5dc..f9e469c 100644 (file)
 
 namespace JSC { namespace B3 { namespace Air {
 
-Arg Arg::stackAddrImpl(int32_t offsetFromFP, unsigned frameSize, Width width)
-{
-    Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
-    if (!result.isValidForm(width)) {
-        result = Arg::addr(
-            Air::Tmp(MacroAssembler::stackPointerRegister),
-            offsetFromFP + frameSize);
-        if (!result.isValidForm(width))
-            result = Arg();
-    }
-    return result;
-}
-
 bool Arg::isStackMemory() const
 {
     switch (kind()) {
     case Addr:
         return base() == Air::Tmp(GPRInfo::callFrameRegister)
             || base() == Air::Tmp(MacroAssembler::stackPointerRegister);
+    case ExtendedOffsetAddr:
     case Stack:
     case CallArg:
         return true;
@@ -128,6 +116,7 @@ unsigned Arg::jsHash() const
         result += m_base.internalValue();
         break;
     case Addr:
+    case ExtendedOffsetAddr:
         result += m_offset;
         result += m_base.internalValue();
         break;
@@ -171,6 +160,7 @@ void Arg::dump(PrintStream& out) const
         out.print("(", base(), ")");
         return;
     case Addr:
+    case ExtendedOffsetAddr:
         if (offset())
             out.print(offset());
         out.print("(", base(), ")");
@@ -249,6 +239,9 @@ void printInternal(PrintStream& out, Arg::Kind kind)
     case Arg::Addr:
         out.print("Addr");
         return;
+    case Arg::ExtendedOffsetAddr:
+        out.print("ExtendedOffsetAddr");
+        return;
     case Arg::Stack:
         out.print("Stack");
         return;
index c2ed4fc..7a38d10 100644 (file)
@@ -75,6 +75,7 @@ public:
         // (UseAddr) addresses.
         SimpleAddr,
         Addr,
+        ExtendedOffsetAddr,
         Stack,
         CallArg,
         Index,
@@ -546,6 +547,16 @@ public:
         return result;
     }
 
+    template<typename Int, typename = Value::IsLegalOffset<Int>>
+    static Arg extendedOffsetAddr(Int offsetFromFP)
+    {
+        Arg result;
+        result.m_kind = ExtendedOffsetAddr;
+        result.m_base = Air::Tmp(MacroAssembler::framePointerRegister);
+        result.m_offset = offsetFromFP;
+        return result;
+    }
+
     static Arg addr(Air::Tmp base)
     {
         return addr(base, 0);
@@ -575,12 +586,6 @@ public:
         return result;
     }
 
-    template<typename Int, typename = Value::IsLegalOffset<Int>>
-    static Arg stackAddr(Int offsetFromFP, unsigned frameSize, Width width)
-    {
-        return stackAddrImpl(offsetFromFP, frameSize, width);
-    }
-
     // If you don't pass a Width, this optimistically assumes that you're using the right width.
     static bool isValidScale(unsigned scale, std::optional<Width> width = std::nullopt)
     {
@@ -759,6 +764,11 @@ public:
         return kind() == Addr;
     }
 
+    bool isExtendedOffsetAddr() const
+    {
+        return kind() == ExtendedOffsetAddr;
+    }
+
     bool isStack() const
     {
         return kind() == Stack;
@@ -779,6 +789,7 @@ public:
         switch (kind()) {
         case SimpleAddr:
         case Addr:
+        case ExtendedOffsetAddr:
         case Stack:
         case CallArg:
         case Index:
@@ -792,6 +803,7 @@ public:
     // stack references. The following idioms are recognized:
     // - the Stack kind
     // - the CallArg kind
+    // - the ExtendedOffsetAddr kind
     // - the Addr kind with the base being either SP or FP
     // Callers of this function are allowed to expect that if it returns true, then it must be one of
     // these easy-to-recognize kinds. So, making this function recognize more kinds could break things.
@@ -943,7 +955,7 @@ public:
 
     Air::Tmp base() const
     {
-        ASSERT(kind() == SimpleAddr || kind() == Addr || kind() == Index);
+        ASSERT(kind() == SimpleAddr || kind() == Addr || kind() == ExtendedOffsetAddr || kind() == Index);
         return m_base;
     }
 
@@ -953,7 +965,7 @@ public:
     {
         if (kind() == Stack)
             return static_cast<Value::OffsetType>(m_scale);
-        ASSERT(kind() == Addr || kind() == CallArg || kind() == Index);
+        ASSERT(kind() == Addr || kind() == ExtendedOffsetAddr || kind() == CallArg || kind() == Index);
         return static_cast<Value::OffsetType>(m_offset);
     }
 
@@ -1012,6 +1024,7 @@ public:
         case BitImm64:
         case SimpleAddr:
         case Addr:
+        case ExtendedOffsetAddr:
         case Index:
         case Stack:
         case CallArg:
@@ -1047,6 +1060,7 @@ public:
             return false;
         case SimpleAddr:
         case Addr:
+        case ExtendedOffsetAddr:
         case Index:
         case Stack:
         case CallArg:
@@ -1222,6 +1236,7 @@ public:
         case BitImm64:
             return isValidBitImm64Form(value());
         case SimpleAddr:
+        case ExtendedOffsetAddr:
             return true;
         case Addr:
         case Stack:
@@ -1247,6 +1262,7 @@ public:
         case Tmp:
         case SimpleAddr:
         case Addr:
+        case ExtendedOffsetAddr:
             functor(m_base);
             break;
         case Index:
@@ -1288,6 +1304,7 @@ public:
             break;
         case SimpleAddr:
         case Addr:
+        case ExtendedOffsetAddr:
             functor(m_base, Use, GP, argRole == UseAddr ? argWidth : pointerWidth());
             break;
         case Index:
@@ -1326,7 +1343,7 @@ public:
     {
         if (isSimpleAddr())
             return MacroAssembler::Address(m_base.gpr());
-        ASSERT(isAddr());
+        ASSERT(isAddr() || isExtendedOffsetAddr());
         return MacroAssembler::Address(m_base.gpr(), static_cast<Value::OffsetType>(m_offset));
     }
 
@@ -1438,8 +1455,6 @@ public:
     }
 
 private:
-    static Arg stackAddrImpl(int32_t, unsigned, Width);
-
     int64_t m_offset { 0 };
     Kind m_kind { Invalid };
     int32_t m_scale { 1 };
index 2471fbb..def0f4c 100644 (file)
@@ -82,6 +82,7 @@ bool CCallSpecial::isValid(Inst& inst)
             return false;
         case Arg::Tmp:
         case Arg::Addr:
+        case Arg::ExtendedOffsetAddr:
         case Arg::Stack:
         case Arg::CallArg:
             break;
@@ -118,6 +119,11 @@ bool CCallSpecial::admitsStack(Inst&, unsigned argIndex)
     return false;
 }
 
+bool CCallSpecial::admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex)
+{
+    return admitsStack(inst, argIndex);
+}
+
 void CCallSpecial::reportUsedRegisters(Inst&, const RegisterSet&)
 {
 }
@@ -134,6 +140,7 @@ CCallHelpers::Jump CCallSpecial::generate(Inst& inst, CCallHelpers& jit, Generat
         jit.call(inst.args[calleeArgOffset].gpr());
         break;
     case Arg::Addr:
+    case Arg::ExtendedOffsetAddr:
         jit.call(inst.args[calleeArgOffset].asAddress());
         break;
     default:
index ec909b9..a5d210d 100644 (file)
@@ -52,16 +52,17 @@ public:
     static const GPRReg scratchRegister = GPRInfo::nonArgGPR0;
 
 protected:
-    void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) override;
-    bool isValid(Inst&) override;
-    bool admitsStack(Inst&, unsigned argIndex) override;
-    void reportUsedRegisters(Inst&, const RegisterSet&) override;
-    CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&) override;
-    RegisterSet extraEarlyClobberedRegs(Inst&) override;
-    RegisterSet extraClobberedRegs(Inst&) override;
+    void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) final;
+    bool isValid(Inst&) final;
+    bool admitsStack(Inst&, unsigned argIndex) final;
+    bool admitsExtendedOffsetAddr(Inst&, unsigned) final;
+    void reportUsedRegisters(Inst&, const RegisterSet&) final;
+    CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&) final;
+    RegisterSet extraEarlyClobberedRegs(Inst&) final;
+    RegisterSet extraClobberedRegs(Inst&) final;
 
-    void dumpImpl(PrintStream&) const override;
-    void deepDumpImpl(PrintStream&) const override;
+    void dumpImpl(PrintStream&) const final;
+    void deepDumpImpl(PrintStream&) const final;
 
 private:
     static const unsigned specialArgOffset = 0;
index 7cb8e70..710af43 100644 (file)
@@ -62,6 +62,9 @@ Code::Code(Procedure& proc)
                 });
             setRegsInPriorityOrder(bank, result);
         });
+
+    if (auto reg = pinnedExtendedOffsetAddrRegister())
+        pinRegister(*reg);
 }
 
 Code::~Code()
@@ -82,6 +85,7 @@ void Code::setRegsInPriorityOrder(Bank bank, const Vector<Reg>& regs)
 void Code::pinRegister(Reg reg)
 {
     Vector<Reg>& regs = regsInPriorityOrderImpl(Arg(Tmp(reg)).bank());
+    ASSERT(regs.contains(reg));
     regs.removeFirst(reg);
     m_mutableRegs.clear(reg);
     ASSERT(!regs.contains(reg));
index f8f3dff..c6eb9c1 100644 (file)
@@ -81,6 +81,13 @@ struct PatchCustom {
         return inst.args[0].special()->admitsStack(inst, argIndex);
     }
 
+    static bool admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex)
+    {
+        if (!argIndex)
+            return false;
+        return inst.args[0].special()->admitsExtendedOffsetAddr(inst, argIndex);
+    }
+
     static std::optional<unsigned> shouldTryAliasingDef(Inst& inst)
     {
         return inst.args[0].special()->shouldTryAliasingDef(inst);
@@ -157,6 +164,11 @@ struct CCallCustom : public CommonCustomBase<CCallCustom> {
     {
         return true;
     }
+
+    static bool admitsExtendedOffsetAddr(Inst&, unsigned)
+    {
+        return false;
+    }
     
     static bool isTerminal(Inst&)
     {
@@ -221,6 +233,11 @@ struct ShuffleCustom : public CommonCustomBase<ShuffleCustom> {
         }
     }
 
+    static bool admitsExtendedOffsetAddr(Inst&, unsigned)
+    {
+        return false;
+    }
+
     static bool isTerminal(Inst&)
     {
         return false;
@@ -255,6 +272,11 @@ struct EntrySwitchCustom : public CommonCustomBase<EntrySwitchCustom> {
     {
         return false;
     }
+
+    static bool admitsExtendedOffsetAddr(Inst&, unsigned)
+    {
+        return false;
+    }
     
     static bool isTerminal(Inst&)
     {
@@ -296,6 +318,11 @@ struct WasmBoundsCheckCustom : public CommonCustomBase<WasmBoundsCheckCustom> {
         return false;
     }
 
+    static bool admitsExtendedOffsetAddr(Inst&, unsigned)
+    {
+        return false;
+    }
+
     static bool isTerminal(Inst&)
     {
         return false;
index 9a01827..2a687de 100644 (file)
@@ -46,6 +46,7 @@
 #include "AirReportUsedRegisters.h"
 #include "AirSimplifyCFG.h"
 #include "AirValidate.h"
+#include "AllowMacroScratchRegisterUsageIf.h"
 #include "B3Common.h"
 #include "B3Procedure.h"
 #include "B3TimingScope.h"
@@ -216,8 +217,10 @@ void generate(Code& code, CCallHelpers& jit)
                 disassembler->startEntrypoint(jit); 
 
             jit.emitFunctionPrologue();
-            if (code.frameSize())
+            if (code.frameSize()) {
+                AllowMacroScratchRegisterUsageIf allowScratch(jit, isARM64());
                 jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()), MacroAssembler::stackPointerRegister);
+            }
             
             for (const RegisterAtOffset& entry : code.calleeSaveRegisterAtOffsetList()) {
                 if (entry.reg().isGPR())
index 5815359..f2c0f02 100644 (file)
@@ -167,7 +167,10 @@ struct Inst {
     // This function is auto-generated by opcode_generator.rb.
     bool admitsStack(unsigned argIndex);
     bool admitsStack(Arg&);
-    
+
+    bool admitsExtendedOffsetAddr(unsigned argIndex);
+    bool admitsExtendedOffsetAddr(Arg&);
+
     // Defined by opcode_generator.rb.
     bool isTerminal();
 
index 306d904..835b105 100644 (file)
@@ -111,6 +111,11 @@ inline bool Inst::admitsStack(Arg& arg)
     return admitsStack(&arg - &args[0]);
 }
 
+inline bool Inst::admitsExtendedOffsetAddr(Arg& arg)
+{
+    return admitsExtendedOffsetAddr(&arg - &args[0]);
+}
+
 inline std::optional<unsigned> Inst::shouldTryAliasingDef()
 {
     if (!isX86())
index 837c6f1..07d0d78 100644 (file)
@@ -63,27 +63,49 @@ void lowerStackArgs(Code& code)
     // transformation since we can search the StackSlots array to figure out which StackSlot any
     // offset-from-FP refers to.
 
-    // FIXME: This may produce addresses that aren't valid if we end up with a ginormous stack frame.
-    // We would have to scavenge for temporaries if this happened. Fortunately, this case will be
-    // extremely rare so we can do crazy things when it arises.
-    // https://bugs.webkit.org/show_bug.cgi?id=152530
-
     InsertionSet insertionSet(code);
     for (BasicBlock* block : code) {
+        // FIXME: We can keep track of the last large offset which was materialized in this block, and reuse the register
+        // if it hasn't been clobbered, instead of regenerating imm+add+addr every time. https://bugs.webkit.org/show_bug.cgi?id=171387
+
         for (unsigned instIndex = 0; instIndex < block->size(); ++instIndex) {
             Inst& inst = block->at(instIndex);
+            bool isPatch = inst.kind.opcode == Patch;
+
             inst.forEachArg(
                 [&] (Arg& arg, Arg::Role role, Bank, Width width) {
-                    auto stackAddr = [&] (Value::OffsetType offset) -> Arg {
-                        Arg result = Arg::stackAddr(offset, code.frameSize(), width);
-                        if (!result) {
-                            dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
-                            dataLog("Code:\n");
-                            dataLog(code);
-                            dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
-                            RELEASE_ASSERT_NOT_REACHED();
+                    auto stackAddr = [&] (Value::OffsetType offsetFromFP) -> Arg {
+                        int32_t offsetFromSP = offsetFromFP + code.frameSize();
+
+                        if (isPatch && inst.admitsExtendedOffsetAddr(arg)) {
+                            // Stackmaps and patchpoints expect addr inputs relative to SP or FP only. We might as well
+                            // not even bother generating an addr with valid form for these opcodes since extended offset
+                            // addr is always valid.
+                            return Arg::extendedOffsetAddr(offsetFromFP);
                         }
+
+                        Arg result = Arg::addr(Air::Tmp(GPRInfo::callFrameRegister), offsetFromFP);
+                        if (result.isValidForm(width))
+                            return result;
+
+                        result = Arg::addr(Air::Tmp(MacroAssembler::stackPointerRegister), offsetFromSP);
+                        if (result.isValidForm(width))
+                            return result;
+#if CPU(ARM64)
+                        ASSERT(pinnedExtendedOffsetAddrRegister());
+                        Air::Tmp tmp = Air::Tmp(*pinnedExtendedOffsetAddrRegister());
+
+                        Arg largeOffset = Arg::isValidImmForm(offsetFromSP) ? Arg::imm(offsetFromSP) : Arg::bigImm(offsetFromSP);
+                        insertionSet.insert(instIndex, Move, inst.origin, largeOffset, tmp);
+                        insertionSet.insert(instIndex, Add64, inst.origin, Air::Tmp(MacroAssembler::stackPointerRegister), tmp);
+                        result = Arg::addr(tmp, 0);
                         return result;
+#elif CPU(X86_64)
+                        // Can't happen on x86: immediates are always big enough for frame size.
+                        RELEASE_ASSERT_NOT_REACHED();
+#else
+#error Unhandled architecture.
+#endif
                     };
                     
                     switch (arg.kind()) {
index 56a26cc..c9989b4 100644 (file)
@@ -56,6 +56,11 @@ bool PrintSpecial::admitsStack(Inst&, unsigned)
     return false;
 }
 
+bool PrintSpecial::admitsExtendedOffsetAddr(Inst&, unsigned)
+{
+    return false;
+}
+
 void PrintSpecial::reportUsedRegisters(Inst&, const RegisterSet&)
 {
 }
@@ -71,6 +76,7 @@ CCallHelpers::Jump PrintSpecial::generate(Inst& inst, CCallHelpers& jit, Generat
                 term = Printer::Printer<MacroAssembler::RegisterID>(arg.gpr());
                 break;
             case Arg::Addr:
+            case Arg::ExtendedOffsetAddr:
                 term = Printer::Printer<MacroAssembler::Address>(arg.asAddress());
                 break;
             default:
index 74cb573..b9a4d7b 100644 (file)
@@ -100,16 +100,17 @@ public:
     static const GPRReg scratchRegister = GPRInfo::nonArgGPR0;
     
 protected:
-    void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) override;
-    bool isValid(Inst&) override;
-    bool admitsStack(Inst&, unsigned argIndex) override;
-    void reportUsedRegisters(Inst&, const RegisterSet&) override;
-    CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&) override;
-    RegisterSet extraEarlyClobberedRegs(Inst&) override;
-    RegisterSet extraClobberedRegs(Inst&) override;
+    void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) final;
+    bool isValid(Inst&) final;
+    bool admitsStack(Inst&, unsigned argIndex) final;
+    bool admitsExtendedOffsetAddr(Inst&, unsigned) final;
+    void reportUsedRegisters(Inst&, const RegisterSet&) final;
+    CCallHelpers::Jump generate(Inst&, CCallHelpers&, GenerationContext&) final;
+    RegisterSet extraEarlyClobberedRegs(Inst&) final;
+    RegisterSet extraClobberedRegs(Inst&) final;
     
-    void dumpImpl(PrintStream&) const override;
-    void deepDumpImpl(PrintStream&) const override;
+    void dumpImpl(PrintStream&) const final;
+    void deepDumpImpl(PrintStream&) const final;
     
 private:
     static const unsigned specialArgOffset = 0;
index 480cbfc..268de8a 100644 (file)
@@ -55,6 +55,7 @@ public:
     virtual void forEachArg(Inst&, const ScopedLambda<Inst::EachArgCallback>&) = 0;
     virtual bool isValid(Inst&) = 0;
     virtual bool admitsStack(Inst&, unsigned argIndex) = 0;
+    virtual bool admitsExtendedOffsetAddr(Inst&, unsigned argIndex) = 0;
     virtual std::optional<unsigned> shouldTryAliasingDef(Inst&);
 
     // This gets called on for each Inst that uses this Special. Note that there is no way to
index 5152c86..4feb050 100644 (file)
@@ -229,7 +229,7 @@ def isGF(token)
 end
 
 def isKind(token)
-    token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/
+    token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(ExtendedOffsetAddr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/
 end
 
 def isArch(token)
@@ -303,7 +303,7 @@ class Parser
 
     def consumeKind
         result = token.string
-        parseError("Expected kind (Imm, BigImm, BitImm, BitImm64, Tmp, SimpleAddr, Addr, Index, RelCond, ResCond, DoubleCond, or StatusCond)") unless isKind(result)
+        parseError("Expected kind (Imm, BigImm, BitImm, BitImm64, Tmp, SimpleAddr, Addr, ExtendedOffsetAddr, Index, RelCond, ResCond, DoubleCond, or StatusCond)") unless isKind(result)
         advance
         result
     end
@@ -908,9 +908,13 @@ writeH("OpcodeGenerated") {
                         outp.puts "if (args[#{index}].isStack() && args[#{index}].stackSlot()->isSpill())"
                         outp.puts "OPGEN_RETURN(false);"
                     end
-                    
                     outp.puts "if (!Arg::isValidAddrForm(args[#{index}].offset()))"
                     outp.puts "OPGEN_RETURN(false);"
+                when "ExtendedOffsetAddr"
+                    if arg.role == "UA"
+                        outp.puts "if (args[#{index}].isStack() && args[#{index}].stackSlot()->isSpill())"
+                        outp.puts "OPGEN_RETURN(false);"
+                    end
                 when "Index"
                     outp.puts "if (!Arg::isValidIndexForm(args[#{index}].scale(), args[#{index}].offset(), #{arg.widthCode}))"
                     outp.puts "OPGEN_RETURN(false);"
@@ -1076,6 +1080,24 @@ writeH("OpcodeGenerated") {
     outp.puts "return false;"
     outp.puts "}"
 
+    outp.puts "bool Inst::admitsExtendedOffsetAddr(unsigned argIndex)"
+    outp.puts "{"
+    outp.puts "switch (kind.opcode) {"
+    $opcodes.values.each {
+        | opcode |
+        if opcode.custom
+            outp.puts "case Opcode::#{opcode.name}:"
+            outp.puts "OPGEN_RETURN(#{opcode.name}Custom::admitsExtendedOffsetAddr(*this, argIndex));"
+            outp.puts "break;"
+        end
+    }
+    outp.puts "default:"
+    outp.puts "break;"
+    outp.puts "}"
+    outp.puts "return false;"
+    outp.puts "}"
+
+
     outp.puts "bool Inst::isTerminal()"
     outp.puts "{"
     outp.puts "switch (kind.opcode) {"
@@ -1195,7 +1217,7 @@ writeH("OpcodeGenerated") {
                     outp.print "args[#{index}].asTrustedImm32()"
                 when "BigImm", "BitImm64"
                     outp.print "args[#{index}].asTrustedImm64()"
-                when "SimpleAddr", "Addr"
+                when "SimpleAddr", "Addr", "ExtendedOffsetAddr"
                     outp.print "args[#{index}].asAddress()"
                 when "Index"
                     outp.print "args[#{index}].asBaseIndex()"