OSR entry: delay outer-loop compilation when at inner-loop
authorjfbastien@apple.com <jfbastien@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 4 Feb 2017 01:17:38 +0000 (01:17 +0000)
committerjfbastien@apple.com <jfbastien@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 4 Feb 2017 01:17:38 +0000 (01:17 +0000)
https://bugs.webkit.org/show_bug.cgi?id=167149

Reviewed by Filip Pizlo.

r211224 and r211461 were reverted because they caused massive
kraken/ai-astar regressions. This patch instead does the
minimally-disruptive change to fix the original bug as described
below, but omits extra tuning and refactoring which I had
before. I'll commit tuning and refactoring separately, if this
sticks. This patch is therefore very minimal, and layers carefully
on top of the complex spaghetti-logic. The only change it makes is
that it uses triggers to indicate to outer loops that they should
compile, which fixes the immediate bug and seems roughly perf
neutral (maybe a small gain on kraken sometimes, other times a
small regression as would be expected from slightly compiling
later). As opposed to r211461 this patch doesn't unconditionally
unset the trigger because it prevents further DFG executions from
entering. It therefore makes the trigger a tri-state enum class:
don't trigger, compilation done, start compilation. Only "start
compilation" gets reset to "don't trigger". "Compilation done"
does not (unless there's a problem compiling, then it gets set
back to "don't trigger").

As of https://bugs.webkit.org/show_bug.cgi?id=155217 OSR
compilation can be kicked off for an entry into an outer-loop,
while executing an inner-loop. This is desirable because often the
codegen from an inner-entry isn't as good as the codegen from an
outer-entry, but execution from an inner-loop is often pretty hot
and likely to kick off compilation. This approach provided nice
speedups on Kraken because we'd select to enter to the outer-loop
very reliably, which reduces variability (the inner-loop was
selected roughly 1/5 times from my unscientific measurements).

When compilation starts we take a snapshot of the JSValues at the
current execution state using OSR's recovery mechanism. These
values are passed to the compiler and are used as way to perform
type profiling, and could be used to observe cell types as well as
to perform predictions such as through constant propagation.

It's therefore desired to enter from the outer-loop when we can,
but we need to be executing from that location to capture the
right JSValues, otherwise we're confusing the compiler and giving
it inaccurate JSValues which can lead it to predict the wrong
things, leading to suboptimal code or recompilation due to
misprediction, or in super-corner-cases a crash.

DFG tier-up was added here:
https://bugs.webkit.org/show_bug.cgi?id=112838

* dfg/DFGJITCode.h:
* dfg/DFGJITCompiler.cpp:
(JSC::DFG::JITCompiler::JITCompiler):
* dfg/DFGOperations.cpp:
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
* dfg/DFGToFTLForOSREntryDeferredCompilationCallback.cpp:
(JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::ToFTLForOSREntryDeferredCompilationCallback):
(JSC::DFG::Ref<ToFTLForOSREntryDeferredCompilationCallback>ToFTLForOSREntryDeferredCompilationCallback::create):
(JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidBecomeReadyAsynchronously):
(JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidComplete):
* dfg/DFGToFTLForOSREntryDeferredCompilationCallback.h:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@211658 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/dfg/DFGJITCode.h
Source/JavaScriptCore/dfg/DFGJITCompiler.cpp
Source/JavaScriptCore/dfg/DFGOperations.cpp
Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp
Source/JavaScriptCore/dfg/DFGToFTLForOSREntryDeferredCompilationCallback.cpp
Source/JavaScriptCore/dfg/DFGToFTLForOSREntryDeferredCompilationCallback.h

index 4f7d1da..1b1a3ee 100644 (file)
@@ -1,3 +1,68 @@
+2017-02-03  JF Bastien  <jfbastien@apple.com>
+
+        OSR entry: delay outer-loop compilation when at inner-loop
+        https://bugs.webkit.org/show_bug.cgi?id=167149
+
+        Reviewed by Filip Pizlo.
+
+        r211224 and r211461 were reverted because they caused massive
+        kraken/ai-astar regressions. This patch instead does the
+        minimally-disruptive change to fix the original bug as described
+        below, but omits extra tuning and refactoring which I had
+        before. I'll commit tuning and refactoring separately, if this
+        sticks. This patch is therefore very minimal, and layers carefully
+        on top of the complex spaghetti-logic. The only change it makes is
+        that it uses triggers to indicate to outer loops that they should
+        compile, which fixes the immediate bug and seems roughly perf
+        neutral (maybe a small gain on kraken sometimes, other times a
+        small regression as would be expected from slightly compiling
+        later). As opposed to r211461 this patch doesn't unconditionally
+        unset the trigger because it prevents further DFG executions from
+        entering. It therefore makes the trigger a tri-state enum class:
+        don't trigger, compilation done, start compilation. Only "start
+        compilation" gets reset to "don't trigger". "Compilation done"
+        does not (unless there's a problem compiling, then it gets set
+        back to "don't trigger").
+
+        As of https://bugs.webkit.org/show_bug.cgi?id=155217 OSR
+        compilation can be kicked off for an entry into an outer-loop,
+        while executing an inner-loop. This is desirable because often the
+        codegen from an inner-entry isn't as good as the codegen from an
+        outer-entry, but execution from an inner-loop is often pretty hot
+        and likely to kick off compilation. This approach provided nice
+        speedups on Kraken because we'd select to enter to the outer-loop
+        very reliably, which reduces variability (the inner-loop was
+        selected roughly 1/5 times from my unscientific measurements).
+
+        When compilation starts we take a snapshot of the JSValues at the
+        current execution state using OSR's recovery mechanism. These
+        values are passed to the compiler and are used as way to perform
+        type profiling, and could be used to observe cell types as well as
+        to perform predictions such as through constant propagation.
+
+        It's therefore desired to enter from the outer-loop when we can,
+        but we need to be executing from that location to capture the
+        right JSValues, otherwise we're confusing the compiler and giving
+        it inaccurate JSValues which can lead it to predict the wrong
+        things, leading to suboptimal code or recompilation due to
+        misprediction, or in super-corner-cases a crash.
+
+        DFG tier-up was added here:
+        https://bugs.webkit.org/show_bug.cgi?id=112838
+
+        * dfg/DFGJITCode.h:
+        * dfg/DFGJITCompiler.cpp:
+        (JSC::DFG::JITCompiler::JITCompiler):
+        * dfg/DFGOperations.cpp:
+        * dfg/DFGSpeculativeJIT64.cpp:
+        (JSC::DFG::SpeculativeJIT::compile):
+        * dfg/DFGToFTLForOSREntryDeferredCompilationCallback.cpp:
+        (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::ToFTLForOSREntryDeferredCompilationCallback):
+        (JSC::DFG::Ref<ToFTLForOSREntryDeferredCompilationCallback>ToFTLForOSREntryDeferredCompilationCallback::create):
+        (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidBecomeReadyAsynchronously):
+        (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidComplete):
+        * dfg/DFGToFTLForOSREntryDeferredCompilationCallback.h:
+
 2017-02-03  Saam Barati  <sbarati@apple.com>
 
         When OSR entering to the baseline JIT from the LLInt for a ProgramCodeBlock we can skip compiling a lot of the program
index aee8742..e1f6c41 100644 (file)
@@ -151,10 +151,16 @@ public:
     // Map each bytecode of CheckTierUpAndOSREnter to its stream index.
     HashMap<unsigned, unsigned, WTF::IntHash<unsigned>, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> bytecodeIndexToStreamIndex;
 
+    enum class TriggerReason : uint8_t {
+        DontTrigger,
+        CompilationDone,
+        StartCompilation,
+    };
+
     // Map each bytecode of CheckTierUpAndOSREnter to its trigger forcing OSR Entry.
     // This can never be modified after it has been initialized since the addresses of the triggers
     // are used by the JIT.
-    HashMap<unsigned, uint8_t> tierUpEntryTriggers;
+    HashMap<unsigned, TriggerReason> tierUpEntryTriggers;
 
     // Set of bytecode that were the target of a TierUp operation.
     HashSet<unsigned, WTF::IntHash<unsigned>, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> tierUpEntrySeen;
index 29e9231..3513b5b 100644 (file)
@@ -60,7 +60,7 @@ JITCompiler::JITCompiler(Graph& dfg)
 #if ENABLE(FTL_JIT)
     m_jitCode->tierUpInLoopHierarchy = WTFMove(m_graph.m_plan.tierUpInLoopHierarchy);
     for (unsigned tierUpBytecode : m_graph.m_plan.tierUpAndOSREnterBytecodes)
-        m_jitCode->tierUpEntryTriggers.add(tierUpBytecode, 0);
+        m_jitCode->tierUpEntryTriggers.add(tierUpBytecode, JITCode::TriggerReason::DontTrigger);
 #endif
 }
 
index dc54aec..025a7cf 100644 (file)
@@ -2351,6 +2351,37 @@ static char* tierUpCommon(ExecState* exec, unsigned originBytecodeIndex, unsigne
         worklistState = Worklist::NotKnown;
 
     JITCode* jitCode = codeBlock->jitCode()->dfg();
+
+    // The following is only true for triggerTierUpNowInLoop, which can never
+    // be an OSR entry.
+    bool canOSRFromHere = originBytecodeIndex == osrEntryBytecodeIndex;
+
+    bool triggeredSlowPathToStartCompilation = false;
+    auto tierUpEntryTriggers = jitCode->tierUpEntryTriggers.find(originBytecodeIndex);
+    if (tierUpEntryTriggers != jitCode->tierUpEntryTriggers.end()) {
+        switch (tierUpEntryTriggers->value) {
+        case JITCode::TriggerReason::DontTrigger:
+            // The trigger isn't set, we entered because the counter reached its
+            // threshold.
+            break;
+
+        case JITCode::TriggerReason::CompilationDone:
+            // The trigger was set because compilation completed. Don't unset it
+            // so that further DFG executions OSR enters as well.
+            RELEASE_ASSERT(canOSRFromHere);
+            break;
+
+        case JITCode::TriggerReason::StartCompilation:
+            // We were asked to enter as soon as possible and start compiling an
+            // entry for the current bytecode location. Unset this trigger so we
+            // don't continually enter.
+            RELEASE_ASSERT(canOSRFromHere);
+            tierUpEntryTriggers->value = JITCode::TriggerReason::DontTrigger;
+            triggeredSlowPathToStartCompilation = true;
+            break;
+        }
+    }
+
     if (worklistState == Worklist::Compiling) {
         CODEBLOCK_LOG_EVENT(codeBlock, "delayFTLCompile", ("still compiling"));
         jitCode->setOptimizationThresholdBasedOnCompilationResult(
@@ -2367,7 +2398,7 @@ static char* tierUpCommon(ExecState* exec, unsigned originBytecodeIndex, unsigne
     }
 
     // If we can OSR Enter, do it right away.
-    if (originBytecodeIndex == osrEntryBytecodeIndex) {
+    if (canOSRFromHere) {
         unsigned streamIndex = jitCode->bytecodeIndexToStreamIndex.get(originBytecodeIndex);
         if (CodeBlock* entryBlock = jitCode->osrEntryBlock()) {
             if (void* address = FTL::prepareOSREntry(exec, codeBlock, entryBlock, originBytecodeIndex, streamIndex)) {
@@ -2381,10 +2412,10 @@ static char* tierUpCommon(ExecState* exec, unsigned originBytecodeIndex, unsigne
     // - If we do have an FTL code block, then try to enter for a while.
     // - If we couldn't enter for a while, then trigger OSR entry.
 
-    if (!shouldTriggerFTLCompile(codeBlock, jitCode))
+    if (!shouldTriggerFTLCompile(codeBlock, jitCode) && !triggeredSlowPathToStartCompilation)
         return nullptr;
 
-    if (!jitCode->neverExecutedEntry) {
+    if (!jitCode->neverExecutedEntry && !triggeredSlowPathToStartCompilation) {
         triggerFTLReplacementCompile(vm, codeBlock, jitCode);
 
         if (!codeBlock->hasOptimizedReplacement())
@@ -2424,19 +2455,36 @@ static char* tierUpCommon(ExecState* exec, unsigned originBytecodeIndex, unsigne
         unsigned osrEntryBytecode = entryBlock->jitCode()->ftlForOSREntry()->bytecodeIndex();
         jitCode->clearOSREntryBlock();
         jitCode->osrEntryRetry = 0;
-        jitCode->tierUpEntryTriggers.set(osrEntryBytecode, 0);
+        jitCode->tierUpEntryTriggers.set(osrEntryBytecode, JITCode::TriggerReason::DontTrigger);
         jitCode->setOptimizationThresholdBasedOnCompilationResult(
             codeBlock, CompilationDeferred);
         return nullptr;
     }
 
+    if (!canOSRFromHere) {
+        // We can't OSR from here, or even start a compilation because doing so
+        // calls jitCode->reconstruct which would get the wrong state.
+        if (Options::verboseOSR())
+            dataLog("Non-OSR-able bc#", originBytecodeIndex, " in ", *codeBlock, " setting parent loop bc#", osrEntryBytecodeIndex, "'s trigger and backing off.\n");
+        jitCode->tierUpEntryTriggers.set(osrEntryBytecodeIndex, JITCode::TriggerReason::StartCompilation);
+        jitCode->setOptimizationThresholdBasedOnCompilationResult(codeBlock, CompilationDeferred);
+        return nullptr;
+    }
+
     unsigned streamIndex = jitCode->bytecodeIndexToStreamIndex.get(osrEntryBytecodeIndex);
-    auto tierUpHierarchyEntry = jitCode->tierUpInLoopHierarchy.find(osrEntryBytecodeIndex);
-    if (tierUpHierarchyEntry != jitCode->tierUpInLoopHierarchy.end()) {
-        for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {
-            if (jitCode->tierUpEntrySeen.contains(osrEntryCandidate)) {
-                osrEntryBytecodeIndex = osrEntryCandidate;
-                streamIndex = jitCode->bytecodeIndexToStreamIndex.get(osrEntryBytecodeIndex);
+
+    if (!triggeredSlowPathToStartCompilation) {
+        auto tierUpHierarchyEntry = jitCode->tierUpInLoopHierarchy.find(osrEntryBytecodeIndex);
+        if (tierUpHierarchyEntry != jitCode->tierUpInLoopHierarchy.end()) {
+            for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {
+                if (jitCode->tierUpEntrySeen.contains(osrEntryCandidate)) {
+                    // Ask an enclosing loop to compile, instead of doing so here.
+                    if (Options::verboseOSR())
+                        dataLog("Inner-loop bc#", originBytecodeIndex, " in ", *codeBlock, " setting parent loop bc#", osrEntryCandidate, "'s trigger and backing off.\n");
+                    jitCode->tierUpEntryTriggers.set(osrEntryCandidate, JITCode::TriggerReason::StartCompilation);
+                    jitCode->setOptimizationThresholdBasedOnCompilationResult(codeBlock, CompilationDeferred);
+                    return nullptr;
+                }
             }
         }
     }
@@ -2445,7 +2493,7 @@ static char* tierUpCommon(ExecState* exec, unsigned originBytecodeIndex, unsigne
     // something.
     auto triggerIterator = jitCode->tierUpEntryTriggers.find(osrEntryBytecodeIndex);
     RELEASE_ASSERT(triggerIterator != jitCode->tierUpEntryTriggers.end());
-    uint8_t* triggerAddress = &(triggerIterator->value);
+    JITCode::TriggerReason* triggerAddress = &(triggerIterator->value);
 
     Operands<JSValue> mustHandleValues;
     jitCode->reconstruct(
index 1ca8832..bbff49c 100644 (file)
@@ -5781,7 +5781,9 @@ void SpeculativeJIT::compile(Node* node)
         unsigned bytecodeIndex = node->origin.semantic.bytecodeIndex;
         auto triggerIterator = m_jit.jitCode()->tierUpEntryTriggers.find(bytecodeIndex);
         DFG_ASSERT(m_jit.graph(), node, triggerIterator != m_jit.jitCode()->tierUpEntryTriggers.end());
-        uint8_t* forceEntryTrigger = &(m_jit.jitCode()->tierUpEntryTriggers.find(bytecodeIndex)->value);
+        JITCode::TriggerReason* forceEntryTrigger = &(m_jit.jitCode()->tierUpEntryTriggers.find(bytecodeIndex)->value);
+        static_assert(!static_cast<uint8_t>(JITCode::TriggerReason::DontTrigger), "the JIT code assumes non-zero means 'enter'");
+        static_assert(sizeof(JITCode::TriggerReason) == 1, "branchTest8 assumes this size");
 
         MacroAssembler::Jump forceOSREntry = m_jit.branchTest8(MacroAssembler::NonZero, MacroAssembler::AbsoluteAddress(forceEntryTrigger));
         MacroAssembler::Jump overflowedCounter = m_jit.branchAdd32(
index 1b125d3..ae81542 100644 (file)
@@ -35,7 +35,7 @@
 
 namespace JSC { namespace DFG {
 
-ToFTLForOSREntryDeferredCompilationCallback::ToFTLForOSREntryDeferredCompilationCallback(uint8_t* forcedOSREntryTrigger)
+ToFTLForOSREntryDeferredCompilationCallback::ToFTLForOSREntryDeferredCompilationCallback(JITCode::TriggerReason* forcedOSREntryTrigger)
     : m_forcedOSREntryTrigger(forcedOSREntryTrigger)
 {
 }
@@ -44,7 +44,7 @@ ToFTLForOSREntryDeferredCompilationCallback::~ToFTLForOSREntryDeferredCompilatio
 {
 }
 
-Ref<ToFTLForOSREntryDeferredCompilationCallback>ToFTLForOSREntryDeferredCompilationCallback::create(uint8_t* forcedOSREntryTrigger)
+Ref<ToFTLForOSREntryDeferredCompilationCallback>ToFTLForOSREntryDeferredCompilationCallback::create(JITCode::TriggerReason* forcedOSREntryTrigger)
 {
     return adoptRef(*new ToFTLForOSREntryDeferredCompilationCallback(forcedOSREntryTrigger));
 }
@@ -58,7 +58,7 @@ void ToFTLForOSREntryDeferredCompilationCallback::compilationDidBecomeReadyAsync
             ") did become ready.\n");
     }
 
-    *m_forcedOSREntryTrigger = 1;
+    *m_forcedOSREntryTrigger = JITCode::TriggerReason::CompilationDone;
 }
 
 void ToFTLForOSREntryDeferredCompilationCallback::compilationDidComplete(
@@ -76,7 +76,7 @@ void ToFTLForOSREntryDeferredCompilationCallback::compilationDidComplete(
     case CompilationSuccessful: {
         jitCode->setOSREntryBlock(*codeBlock->vm(), profiledDFGCodeBlock, codeBlock);
         unsigned osrEntryBytecode = codeBlock->jitCode()->ftlForOSREntry()->bytecodeIndex();
-        jitCode->tierUpEntryTriggers.set(osrEntryBytecode, 1);
+        jitCode->tierUpEntryTriggers.set(osrEntryBytecode, JITCode::TriggerReason::CompilationDone);
         break;
     }
     case CompilationFailed:
index 2fb9516..ffac65b 100644 (file)
@@ -27,6 +27,7 @@
 
 #if ENABLE(FTL_JIT)
 
+#include "DFGJITCode.h"
 #include "DeferredCompilationCallback.h"
 #include <wtf/RefPtr.h>
 
@@ -38,18 +39,18 @@ namespace DFG {
 
 class ToFTLForOSREntryDeferredCompilationCallback : public DeferredCompilationCallback {
 protected:
-    ToFTLForOSREntryDeferredCompilationCallback(uint8_t* forcedOSREntryTrigger);
+    ToFTLForOSREntryDeferredCompilationCallback(JITCode::TriggerReason* forcedOSREntryTrigger);
 
 public:
     virtual ~ToFTLForOSREntryDeferredCompilationCallback();
 
-    static Ref<ToFTLForOSREntryDeferredCompilationCallback> create(uint8_t* forcedOSREntryTrigger);
+    static Ref<ToFTLForOSREntryDeferredCompilationCallback> create(JITCode::TriggerReason* forcedOSREntryTrigger);
     
     virtual void compilationDidBecomeReadyAsynchronously(CodeBlock*, CodeBlock* profiledDFGCodeBlock);
     virtual void compilationDidComplete(CodeBlock*, CodeBlock* profiledDFGCodeBlock, CompilationResult);
 
 private:
-    uint8_t* m_forcedOSREntryTrigger;
+    JITCode::TriggerReason* m_forcedOSREntryTrigger;
 };
 
 } } // namespace JSC::DFG