Improve some other cases of context-sensitive inlining
authorfpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 6 Apr 2016 04:46:12 +0000 (04:46 +0000)
committerfpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 6 Apr 2016 04:46:12 +0000 (04:46 +0000)
https://bugs.webkit.org/show_bug.cgi?id=156277

Reviewed by Benjamin Poulain.

This implements some improvements for inlining:

- We no longer do guarded inlining when the profiling doesn't come from a stub. Doing so would have
  been risky, and according to benchmarks, it wasn't common enough to matter. I think it's better to
  err on the side of not inlining.

- The jneq_ptr pattern for variadic calls no longer breaks the basic block. Not breaking the block
  increases the chances of the parser seeing the callee constant. While inlining doesn't require a
  callee constant, sometimes it makes a difference. Note that we were previously breaking the block
  for no reason at all: if the boundary after jneq_ptr is a jump target from some other jump, then
  the parser will automatically break the block for us. There is no reason to add any block breaking
  ourselves since we implement jneq_ptr by ignoring the affirmative jump destination and inserting a
  check and falling through.

- get_by_id handling now tries to apply some common sense to its status object. In particular, if
  the source is a NewObject and there was no interfering operation that could clobber the structure,
  then we know which case of a polymorphic GetByIdStatus we would take. This arises in some
  constructor patterns.

Long term, we should address all of these cases comprehensively by having a late inliner. The inliner
being part of the bytecode parser means that there is a lot of complexity in the parser and it
prevents us from inlining upon learning new information from static analysis. But for now, I think
it's fine to experiment with one-off hacks, if only to learn what the possibilities are.

This is a 14% speed-up on Octane/raytrace.

* bytecode/CallLinkStatus.cpp:
(JSC::CallLinkStatus::dump):
* bytecode/CallLinkStatus.h:
(JSC::CallLinkStatus::couldTakeSlowPath):
(JSC::CallLinkStatus::setCouldTakeSlowPath):
(JSC::CallLinkStatus::variants):
(JSC::CallLinkStatus::size):
(JSC::CallLinkStatus::at):
* bytecode/GetByIdStatus.cpp:
(JSC::GetByIdStatus::makesCalls):
(JSC::GetByIdStatus::filter):
(JSC::GetByIdStatus::dump):
* bytecode/GetByIdStatus.h:
(JSC::GetByIdStatus::wasSeenInJIT):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::handleCall):
(JSC::DFG::ByteCodeParser::refineStatically):
(JSC::DFG::ByteCodeParser::handleVarargsCall):
(JSC::DFG::ByteCodeParser::handleInlining):
(JSC::DFG::ByteCodeParser::handleGetById):
(JSC::DFG::ByteCodeParser::parseBlock):
* runtime/Options.h:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@199093 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/bytecode/CallLinkStatus.cpp
Source/JavaScriptCore/bytecode/CallLinkStatus.h
Source/JavaScriptCore/bytecode/GetByIdStatus.cpp
Source/JavaScriptCore/bytecode/GetByIdStatus.h
Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp
Source/JavaScriptCore/runtime/Options.h

index bb1243e..128dabd 100644 (file)
@@ -1,3 +1,59 @@
+2016-04-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Improve some other cases of context-sensitive inlining
+        https://bugs.webkit.org/show_bug.cgi?id=156277
+
+        Reviewed by Benjamin Poulain.
+        
+        This implements some improvements for inlining:
+
+        - We no longer do guarded inlining when the profiling doesn't come from a stub. Doing so would have
+          been risky, and according to benchmarks, it wasn't common enough to matter. I think it's better to
+          err on the side of not inlining.
+        
+        - The jneq_ptr pattern for variadic calls no longer breaks the basic block. Not breaking the block
+          increases the chances of the parser seeing the callee constant. While inlining doesn't require a
+          callee constant, sometimes it makes a difference. Note that we were previously breaking the block
+          for no reason at all: if the boundary after jneq_ptr is a jump target from some other jump, then
+          the parser will automatically break the block for us. There is no reason to add any block breaking
+          ourselves since we implement jneq_ptr by ignoring the affirmative jump destination and inserting a
+          check and falling through.
+        
+        - get_by_id handling now tries to apply some common sense to its status object. In particular, if
+          the source is a NewObject and there was no interfering operation that could clobber the structure,
+          then we know which case of a polymorphic GetByIdStatus we would take. This arises in some
+          constructor patterns.
+        
+        Long term, we should address all of these cases comprehensively by having a late inliner. The inliner
+        being part of the bytecode parser means that there is a lot of complexity in the parser and it
+        prevents us from inlining upon learning new information from static analysis. But for now, I think
+        it's fine to experiment with one-off hacks, if only to learn what the possibilities are.
+        
+        This is a 14% speed-up on Octane/raytrace.
+
+        * bytecode/CallLinkStatus.cpp:
+        (JSC::CallLinkStatus::dump):
+        * bytecode/CallLinkStatus.h:
+        (JSC::CallLinkStatus::couldTakeSlowPath):
+        (JSC::CallLinkStatus::setCouldTakeSlowPath):
+        (JSC::CallLinkStatus::variants):
+        (JSC::CallLinkStatus::size):
+        (JSC::CallLinkStatus::at):
+        * bytecode/GetByIdStatus.cpp:
+        (JSC::GetByIdStatus::makesCalls):
+        (JSC::GetByIdStatus::filter):
+        (JSC::GetByIdStatus::dump):
+        * bytecode/GetByIdStatus.h:
+        (JSC::GetByIdStatus::wasSeenInJIT):
+        * dfg/DFGByteCodeParser.cpp:
+        (JSC::DFG::ByteCodeParser::handleCall):
+        (JSC::DFG::ByteCodeParser::refineStatically):
+        (JSC::DFG::ByteCodeParser::handleVarargsCall):
+        (JSC::DFG::ByteCodeParser::handleInlining):
+        (JSC::DFG::ByteCodeParser::handleGetById):
+        (JSC::DFG::ByteCodeParser::parseBlock):
+        * runtime/Options.h:
+
 2016-04-05  Saam barati  <sbarati@apple.com>
 
         JSC SamplingProfiler: Use a thread + sleep loop instead of WTF::WorkQueue for taking samples
index 8ffc23d..7ce79ab 100644 (file)
@@ -342,6 +342,9 @@ void CallLinkStatus::dump(PrintStream& out) const
     if (m_couldTakeSlowPath)
         out.print(comma, "Could Take Slow Path");
     
+    if (m_isBasedOnStub)
+        out.print(comma, "Based On Stub");
+    
     if (!m_variants.isEmpty())
         out.print(comma, listDump(m_variants));
     
index d3c1eee..9b22387 100644 (file)
@@ -101,6 +101,8 @@ public:
     
     bool couldTakeSlowPath() const { return m_couldTakeSlowPath; }
     
+    void setCouldTakeSlowPath(bool value) { m_couldTakeSlowPath = value; }
+    
     CallVariantList variants() const { return m_variants; }
     unsigned size() const { return m_variants.size(); }
     CallVariant at(unsigned i) const { return m_variants[i]; }
index 66a4dd8..a55d53a 100644 (file)
@@ -350,6 +350,22 @@ bool GetByIdStatus::makesCalls() const
     return false;
 }
 
+void GetByIdStatus::filter(const StructureSet& set)
+{
+    if (m_state != Simple)
+        return;
+    
+    // FIXME: We could also filter the variants themselves.
+    
+    m_variants.removeAllMatching(
+        [&] (GetByIdVariant& variant) -> bool {
+            return !variant.structureSet().overlaps(set);
+        });
+    
+    if (m_variants.isEmpty())
+        m_state = NoInformation;
+}
+
 void GetByIdStatus::dump(PrintStream& out) const
 {
     out.print("(");
index 6afac54..bb0c9e0 100644 (file)
@@ -92,6 +92,9 @@ public:
     
     bool wasSeenInJIT() const { return m_wasSeenInJIT; }
     
+    // Attempts to reduce the set of variants to fit the given structure set. This may be approximate.
+    void filter(const StructureSet&);
+    
     void dump(PrintStream&) const;
     
 private:
index 843f1ee..0018f6f 100644 (file)
 #include "CallLinkStatus.h"
 #include "CodeBlock.h"
 #include "CodeBlockWithJITType.h"
+#include "DFGAbstractHeap.h"
 #include "DFGArrayMode.h"
 #include "DFGCapabilities.h"
+#include "DFGClobberize.h"
 #include "DFGClobbersExitState.h"
 #include "DFGGraph.h"
 #include "DFGJITCode.h"
@@ -177,6 +179,7 @@ private:
     template<typename ChecksFunctor>
     bool handleMinMax(int resultOperand, NodeType op, int registerOffset, int argumentCountIncludingThis, const ChecksFunctor& insertChecks);
 
+    void refineStatically(CallLinkStatus&, Node* callTarget);
     // Handle calls. This resolves issues surrounding inlining and intrinsics.
     enum Terminality { Terminal, NonTerminal };
     Terminality handleCall(
@@ -230,8 +233,7 @@ private:
     Node* store(Node* base, unsigned identifier, const PutByIdVariant&, Node* value);
         
     void handleGetById(
-        int destinationOperand, SpeculatedType, Node* base, unsigned identifierNumber,
-        const GetByIdStatus&);
+        int destinationOperand, SpeculatedType, Node* base, unsigned identifierNumber, GetByIdStatus);
     void emitPutById(
         Node* base, unsigned identifierNumber, Node* value,  const PutByIdStatus&, bool isDirect);
     void handlePutById(
@@ -1167,15 +1169,22 @@ ByteCodeParser::Terminality ByteCodeParser::handleCall(
         registerOffset, callLinkStatus, getPrediction());
 }
 
+void ByteCodeParser::refineStatically(CallLinkStatus& callLinkStatus, Node* callTarget)
+{
+    if (callTarget->isCellConstant()) {
+        callLinkStatus.setProvenConstantCallee(CallVariant(callTarget->asCell()));
+        return;
+    }
+}
+
 ByteCodeParser::Terminality ByteCodeParser::handleCall(
     int result, NodeType op, InlineCallFrame::Kind kind, unsigned instructionSize,
     Node* callTarget, int argumentCountIncludingThis, int registerOffset,
     CallLinkStatus callLinkStatus, SpeculatedType prediction)
 {
     ASSERT(registerOffset <= 0);
-    
-    if (callTarget->isCellConstant())
-        callLinkStatus.setProvenConstantCallee(CallVariant(callTarget->asCell()));
+
+    refineStatically(callLinkStatus, callTarget);
     
     if (Options::verboseDFGByteCodeParsing())
         dataLog("    Handling call at ", currentCodeOrigin(), ": ", callLinkStatus, "\n");
@@ -1227,8 +1236,7 @@ ByteCodeParser::Terminality ByteCodeParser::handleVarargsCall(Instruction* pc, N
     CallLinkStatus callLinkStatus = CallLinkStatus::computeFor(
         m_inlineStackTop->m_profiledBlock, currentCodeOrigin(),
         m_inlineStackTop->m_callLinkInfos, m_callContextMap);
-    if (callTarget->isCellConstant())
-        callLinkStatus.setProvenConstantCallee(CallVariant(callTarget->asCell()));
+    refineStatically(callLinkStatus, callTarget);
     
     if (Options::verboseDFGByteCodeParsing())
         dataLog("    Varargs call link status at ", currentCodeOrigin(), ": ", callLinkStatus, "\n");
@@ -1785,6 +1793,18 @@ bool ByteCodeParser::handleInlining(
         return false;
     }
     
+    // If the claim is that this did not originate from a stub, then we don't want to emit a switch
+    // statement. Whenever the non-stub profiling says that it could take slow path, it really means that
+    // it has no idea.
+    if (!Options::usePolymorphicCallInliningForNonStubStatus()
+        && !callLinkStatus.isBasedOnStub()) {
+        if (verbose) {
+            dataLog("Bailing inlining (non-stub polymorphism).\n");
+            dataLog("Stack: ", currentCodeOrigin(), "\n");
+        }
+        return false;
+    }
+    
     unsigned oldOffset = m_currentIndex;
     
     bool allAreClosureCalls = true;
@@ -2867,8 +2887,24 @@ Node* ByteCodeParser::store(Node* base, unsigned identifier, const PutByIdVarian
 
 void ByteCodeParser::handleGetById(
     int destinationOperand, SpeculatedType prediction, Node* base, unsigned identifierNumber,
-    const GetByIdStatus& getByIdStatus)
+    GetByIdStatus getByIdStatus)
 {
+    // Attempt to reduce the set of things in the GetByIdStatus.
+    if (base->op() == NewObject) {
+        bool ok = true;
+        for (unsigned i = m_currentBlock->size(); i--;) {
+            Node* node = m_currentBlock->at(i);
+            if (node == base)
+                break;
+            if (writesOverlap(m_graph, node, JSCell_structureID)) {
+                ok = false;
+                break;
+            }
+        }
+        if (ok)
+            getByIdStatus.filter(base->structure());
+    }
+    
     NodeType getById = getByIdStatus.makesCalls() ? GetByIdFlush : GetById;
     
     if (!getByIdStatus.isSimple() || !getByIdStatus.numVariants() || !Options::useAccessInlining()) {
@@ -4173,8 +4209,7 @@ bool ByteCodeParser::parseBlock(unsigned limit)
                 OpInfo(m_graph.freeze(static_cast<JSCell*>(actualPointerFor(
                     m_inlineStackTop->m_codeBlock, currentInstruction[2].u.specialPointer)))),
                 get(VirtualRegister(currentInstruction[1].u.operand)));
-            addToGraph(Jump, OpInfo(m_currentIndex + OPCODE_LENGTH(op_jneq_ptr)));
-            LAST_OPCODE(op_jneq_ptr);
+            NEXT_OPCODE(op_jneq_ptr);
 
         case op_resolve_scope: {
             int dst = currentInstruction[1].u.operand;
index a802022..f98192e 100644 (file)
@@ -195,6 +195,7 @@ typedef const char* optionString;
     v(bool, usePolyvariantDevirtualization, true, nullptr) \
     v(bool, usePolymorphicAccessInlining, true, nullptr) \
     v(bool, usePolymorphicCallInlining, true, nullptr) \
+    v(bool, usePolymorphicCallInliningForNonStubStatus, false, nullptr) \
     v(unsigned, maxPolymorphicCallVariantListSize, 15, nullptr) \
     v(unsigned, maxPolymorphicCallVariantListSizeForTopTier, 5, nullptr) \
     v(unsigned, maxPolymorphicCallVariantsForInlining, 5, nullptr) \