Concurrent GC should be able to run splay in debug mode and earley/raytrace in release mode with no perf regression
author    fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Fri, 18 Nov 2016 22:11:51 +0000 (22:11 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
          Fri, 18 Nov 2016 22:11:51 +0000 (22:11 +0000)
https://bugs.webkit.org/show_bug.cgi?id=164282

Reviewed by Geoffrey Garen and Oliver Hunt.

PerformanceTests:

CDjs is a fun benchmark for stressing concurrent GCs, but to really give the GC a good
workout you need to increase the amount of work that the test does. This adds a second
configuration of the benchmark that has more aircraft. It uses much more memory and causes us
to do more GCs, and those GCs take longer.

* JetStream/cdjs/benchmark.js:
(benchmarkImpl):
(benchmark):
* JetStream/cdjs/large.js: Added.

Source/JavaScriptCore:

The three remaining bugs were:

- Improper ordering inside putDirectWithoutTransition() and friends. We need to make sure
  that the GC doesn't see the store to Structure::m_offset until we've resized the butterfly.
  That proved a bit tricky. On the other hand, this means that we could probably remove the
  requirement that the GC holds the Structure lock in some cases. I haven't removed that lock
  yet because I still think it might protect some weird cases, and it doesn't seem to cost us
  anything. (A minimal standalone sketch of this publication order follows this list.)

- CodeBlock's GC strategy needed to be made thread-safe (visitWeakly, visitChildren, and
  their friends now hold locks) and incremental-safe (we need to update predictions in the
  finalizer to make sure we clear anything that was put into a value profile towards the end
  of GC).

- The GC timeslicing scheduler needed to be made a bit more aggressive to deal with
  generational workloads like earley, raytrace, and CDjs. Once I got those benchmarks to run,
  I found that they would do many useless iterations of GC because they wouldn't pause long
  enough after rescanning weak references and roots. I added a bunch of knobs for forcing a
  pause. In the end, I realized that I could get the desired effect by putting a ceiling on
  mutator utilization. We want the GC to finish quickly if it is possible to do so, even if
  the amount of allocation that the mutator had done is low. Having a utilization ceiling
  seems to accomplish this for benchmarks with trivial heaps (earley and raytrace) as well as
  huge heaps (like CDjs in its "large" configuration). (The utilization computation is
  sketched after the summary paragraph below.)
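
The first item above boils down to a publication-order rule, shown below as a minimal
standalone sketch. This is illustrative C++ using C++11 atomics, not JSC code: the names
(Backing, addProperty, scanLikeAGC) are invented, release stores stand in for the patch's
storeStoreFence()/mutatorFence(), and the scanner thread models the collector reading
Structure::m_offset before walking the butterfly.

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    struct Backing { std::vector<int> slots; };          // plays the role of the butterfly

    std::atomic<Backing*> butterfly { new Backing { std::vector<int>(4) } };
    std::atomic<size_t> offset { 0 };                    // plays the role of Structure::m_offset

    void addProperty(int value)                          // single mutator thread
    {
        size_t newOffset = offset.load(std::memory_order_relaxed) + 1;
        Backing* current = butterfly.load(std::memory_order_relaxed);
        if (newOffset > current->slots.size()) {
            // Grow into a fresh allocation and publish it *before* the new offset becomes
            // visible (the analogue of allocateMoreOutOfLineStorage + fence + setButterfly).
            Backing* bigger = new Backing { current->slots };  // old backing left for "the GC"
            bigger->slots.resize(current->slots.size() * 2);
            butterfly.store(bigger, std::memory_order_release);
        }
        butterfly.load(std::memory_order_relaxed)->slots[newOffset - 1] = value;
        // Only now may the collector see the new offset (the mutatorFence + m_offset store).
        offset.store(newOffset, std::memory_order_release);
    }

    void scanLikeAGC()                                   // concurrent collector thread
    {
        size_t n = offset.load(std::memory_order_acquire);
        Backing* b = butterfly.load(std::memory_order_acquire);
        assert(b->slots.size() >= n);                    // never scans slots that don't exist
    }

    int main()
    {
        std::thread scanner([] { for (int i = 0; i < 100000; ++i) scanLikeAGC(); });
        for (int i = 0; i < 1000; ++i) addProperty(i);
        scanner.join();
    }

If the offset store could become visible before the butterfly store, the scanner might read
an offset pointing past the end of the backing store it observes; that is exactly the window
the fences in this patch close.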

This preserves splay performance, makes the concurrent GC more stable, and keeps the
concurrent GC from being a perf regression on earley or raytrace. It seems to give us great
CDjs performance as well, but that is still hard to tell because we crash a lot in that benchmark.
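
For concreteness, here is the utilization-ceiling computation from the third item above,
pulled out as a standalone function. It mirrors the targetMutatorUtilization lambda that this
patch adds to Heap::markToFixpoint, with the new default option values
(minimumMutatorUtilization=0, maximumMutatorUtilization=0.7) baked in; the plain double
parameters are stand-ins for the Heap's allocation counters.

    #include <cstdio>

    double targetMutatorUtilization(
        double allocatedAtStart, double allocatedNow, double allocatedAtEnd)
    {
        double headroomFullness =
            (allocatedNow - allocatedAtStart) / (allocatedAtEnd - allocatedAtStart);

        // Defend against NaN/infinity when the denominator is zero; !(x >= 0) is
        // true for NaN, so these clamps catch that case too.
        if (!(headroomFullness >= 0))
            headroomFullness = 0;
        if (!(headroomFullness <= 1))
            headroomFullness = 1;

        double mutatorUtilization = 1 - headroomFullness;

        // Scale into the permitted window [minimum, maximum]. The 0.7 ceiling means the
        // collector always gets at least 30% of each timeslicing period, even if the
        // mutator has barely allocated, so a collection can always finish quickly.
        const double minimum = 0, maximum = 0.7;
        return minimum + mutatorUtilization * (maximum - minimum);
    }

    int main()
    {
        // Fresh cycle, nothing allocated yet: the mutator is capped at the 0.7 ceiling.
        std::printf("%.2f\n", targetMutatorUtilization(0, 0, 1024 * 1024));            // 0.70
        // Headroom exhausted: the mutator is throttled down to the floor.
        std::printf("%.2f\n", targetMutatorUtilization(0, 1024 * 1024, 1024 * 1024));  // 0.00
    }

The collector's share of each timeslicing period is simply 1 minus this value, which is what
the new targetCollectorUtilization lambda returns.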

* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::CodeBlock):
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::visitChildren):
(JSC::CodeBlock::shouldVisitStrongly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
(JSC::CodeBlock::WeakReferenceHarvester::visitWeakReferences):
(JSC::CodeBlock::UnconditionalFinalizer::finalizeUnconditionally):
(JSC::CodeBlock::visitOSRExitTargets):
(JSC::CodeBlock::stronglyVisitStrongReferences):
(JSC::CodeBlock::stronglyVisitWeakReferences):
* bytecode/CodeBlock.h:
(JSC::CodeBlock::clearVisitWeaklyHasBeenCalled):
* heap/CodeBlockSet.cpp:
(JSC::CodeBlockSet::deleteUnmarkedAndUnreferenced):
* heap/Heap.cpp:
(JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
(JSC::Heap::markToFixpoint):
(JSC::Heap::beginMarking):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::collectInThread):
* heap/Heap.h:
* heap/HeapInlines.h:
(JSC::Heap::mutatorFence):
* heap/MarkedBlock.cpp:
* runtime/JSCellInlines.h:
(JSC::JSCell::finishCreation):
* runtime/JSObjectInlines.h:
(JSC::JSObject::putDirectWithoutTransition):
(JSC::JSObject::putDirectInternal):
* runtime/Options.h:
* runtime/Structure.cpp:
(JSC::Structure::add):
* runtime/Structure.h:
* runtime/StructureInlines.h:
(JSC::Structure::add):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208897 268f45cc-cd09-0410-ab3c-d52691b4dbfc

17 files changed:
PerformanceTests/ChangeLog
PerformanceTests/JetStream/cdjs/benchmark.js
PerformanceTests/JetStream/cdjs/large.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/bytecode/CodeBlock.cpp
Source/JavaScriptCore/bytecode/CodeBlock.h
Source/JavaScriptCore/heap/CodeBlockSet.cpp
Source/JavaScriptCore/heap/Heap.cpp
Source/JavaScriptCore/heap/Heap.h
Source/JavaScriptCore/heap/HeapInlines.h
Source/JavaScriptCore/heap/MarkedBlock.cpp
Source/JavaScriptCore/runtime/JSCellInlines.h
Source/JavaScriptCore/runtime/JSObjectInlines.h
Source/JavaScriptCore/runtime/Options.h
Source/JavaScriptCore/runtime/Structure.cpp
Source/JavaScriptCore/runtime/Structure.h
Source/JavaScriptCore/runtime/StructureInlines.h

diff --git a/PerformanceTests/ChangeLog b/PerformanceTests/ChangeLog
index c9279a138673e6b654900346e5e5acf089279f2e..8ee3482c245dd9193968a396b4d8d3f3b5908d7d 100644
@@ -1,3 +1,20 @@
+2016-11-18  Filip Pizlo  <fpizlo@apple.com>
+
+        Concurrent GC should be able to run splay in debug mode and earley/raytrace in release mode with no perf regression
+        https://bugs.webkit.org/show_bug.cgi?id=164282
+
+        Reviewed by Geoffrey Garen and Oliver Hunt.
+        
+        CDjs is a fun benchmark for stressing concurrent GCs, but to really give the GC a good
+        workout you need to increase the amount of work that the test does. This adds a second
+        configuration of the benchmark that has more aircraft. It uses much more memory and causes us
+        to do more GCs, and those GCs take longer.
+
+        * JetStream/cdjs/benchmark.js:
+        (benchmarkImpl):
+        (benchmark):
+        * JetStream/cdjs/large.js: Added.
+
 2016-11-14  Filip Pizlo  <fpizlo@apple.com>
 
         Unreviewed, revert unintended change.
diff --git a/PerformanceTests/JetStream/cdjs/benchmark.js b/PerformanceTests/JetStream/cdjs/benchmark.js
index 9f8d108b0856c6eba6296fff97f889515003039a..49b06be1643b9966382e1fdf0a43a84316cf9fb5 100644
@@ -1,5 +1,5 @@
 // Copyright (c) 2001-2010, Purdue University. All rights reserved.
-// Copyright (C) 2015 Apple Inc. All rights reserved.
+// Copyright (C) 2015-2016 Apple Inc. All rights reserved.
 // 
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are met:
 // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-function benchmark() {
-    var verbosity = 0;
-    var numAircraft = 1000;
-    var numFrames = 200;
-    var expectedCollisions = 14484;
-    var percentile = 95;
+function benchmarkImpl(configuration) {
+    var verbosity = configuration.verbosity;
+    var numAircraft = configuration.numAircraft;
+    var numFrames = configuration.numFrames;
+    var expectedCollisions = configuration.expectedCollisions;
+    var percentile = configuration.percentile;
 
     var simulator = new Simulator(numAircraft);
     var detector = new CollisionDetector();
@@ -78,3 +78,24 @@ function benchmark() {
     
     return averageAbovePercentile(times, percentile);
 }
+
+function benchmark() {
+    return benchmarkImpl({
+        verbosity: 0,
+        numAircraft: 1000,
+        numFrames: 200,
+        expectedCollisions: 14484,
+        percentile: 95
+    });
+}
+
+function largeBenchmark() {
+    return benchmarkImpl({
+        verbosity: 0,
+        numAircraft: 20000,
+        numFrames: 100,
+        expectedCollisions: 5827,
+        percentile: 95
+    });
+}
+
diff --git a/PerformanceTests/JetStream/cdjs/large.js b/PerformanceTests/JetStream/cdjs/large.js
new file mode 100644
index 0000000..8c1388c
--- /dev/null
+++ b/PerformanceTests/JetStream/cdjs/large.js
@@ -0,0 +1,45 @@
+// Copyright (c) 2001-2010, Purdue University. All rights reserved.
+// Copyright (C) 2016 Apple Inc. All rights reserved.
+// 
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//  * Redistributions of source code must retain the above copyright
+//    notice, this list of conditions and the following disclaimer.
+//  * Redistributions in binary form must reproduce the above copyright
+//    notice, this list of conditions and the following disclaimer in the
+//    documentation and/or other materials provided with the distribution.
+//  * Neither the name of the Purdue University nor the
+//    names of its contributors may be used to endorse or promote products
+//    derived from this software without specific prior written permission.
+// 
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY
+// DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+// ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// This is run as a JSC stress test. Let the harness know that this is a slow test.
+//@ slow!
+
+load("constants.js");
+load("util.js");
+load("red_black_tree.js");
+load("call_sign.js");
+load("vector_2d.js");
+load("vector_3d.js");
+load("motion.js");
+load("reduce_collision_set.js");
+load("simulator.js");
+load("collision.js");
+load("collision_detector.js");
+load("benchmark.js");
+
+var result = largeBenchmark();
+
+print("Average worst case: " + result + " ms.");
+
diff --git a/Source/JavaScriptCore/ChangeLog b/Source/JavaScriptCore/ChangeLog
index 3cefd6ad1a5c0a078701926500c6524e1f6722ee..64cfff778f5fffdc197c3cab95dd97d3a8136932 100644
@@ -1,3 +1,77 @@
+2016-11-18  Filip Pizlo  <fpizlo@apple.com>
+
+        Concurrent GC should be able to run splay in debug mode and earley/raytrace in release mode with no perf regression
+        https://bugs.webkit.org/show_bug.cgi?id=164282
+
+        Reviewed by Geoffrey Garen and Oliver Hunt.
+        
+        The three remaining bugs were:
+
+        - Improper ordering inside putDirectWithoutTransition() and friends. We need to make sure
+          that the GC doesn't see the store to Structure::m_offset until we've resized the butterfly.
+          That proved a bit tricky. On the other hand, this means that we could probably remove the
+          requirement that the GC holds the Structure lock in some cases. I haven't removed that lock
+          yet because I still think it might protect some weird cases, and it doesn't seem to cost us
+          anything.
+        
+        - CodeBlock's GC strategy needed to be made thread-safe (visitWeakly, visitChildren, and
+          their friends now hold locks) and incremental-safe (we need to update predictions in the
+          finalizer to make sure we clear anything that was put into a value profile towards the end
+          of GC).
+        
+        - The GC timeslicing scheduler needed to be made a bit more aggressive to deal with
+          generational workloads like earley, raytrace, and CDjs. Once I got those benchmarks to run,
+          I found that they would do many useless iterations of GC because they wouldn't pause long
+          enough after rescanning weak references and roots. I added a bunch of knobs for forcing a
+          pause. In the end, I realized that I could get the desired effect by putting a ceiling on
+          mutator utilization. We want the GC to finish quickly if it is possible to do so, even if
+          the amount of allocation that the mutator had done is low. Having a utilization ceiling
+          seems to accomplish this for benchmarks with trivial heaps (earley and raytrace) as well as
+          huge heaps (like CDjs in its "large" configuration).
+        
+        This preserves splay performance, makes the concurrent GC more stable, and keeps the
+        concurrent GC from being a perf regression on earley or raytrace. It seems to give us great
+        CDjs performance as well, but that is still hard to tell because we crash a lot in that benchmark.
+
+        * bytecode/CodeBlock.cpp:
+        (JSC::CodeBlock::CodeBlock):
+        (JSC::CodeBlock::visitWeakly):
+        (JSC::CodeBlock::visitChildren):
+        (JSC::CodeBlock::shouldVisitStrongly):
+        (JSC::CodeBlock::shouldJettisonDueToOldAge):
+        (JSC::CodeBlock::propagateTransitions):
+        (JSC::CodeBlock::determineLiveness):
+        (JSC::CodeBlock::WeakReferenceHarvester::visitWeakReferences):
+        (JSC::CodeBlock::UnconditionalFinalizer::finalizeUnconditionally):
+        (JSC::CodeBlock::visitOSRExitTargets):
+        (JSC::CodeBlock::stronglyVisitStrongReferences):
+        (JSC::CodeBlock::stronglyVisitWeakReferences):
+        * bytecode/CodeBlock.h:
+        (JSC::CodeBlock::clearVisitWeaklyHasBeenCalled):
+        * heap/CodeBlockSet.cpp:
+        (JSC::CodeBlockSet::deleteUnmarkedAndUnreferenced):
+        * heap/Heap.cpp:
+        (JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope):
+        (JSC::Heap::markToFixpoint):
+        (JSC::Heap::beginMarking):
+        (JSC::Heap::addToRememberedSet):
+        (JSC::Heap::collectInThread):
+        * heap/Heap.h:
+        * heap/HeapInlines.h:
+        (JSC::Heap::mutatorFence):
+        * heap/MarkedBlock.cpp:
+        * runtime/JSCellInlines.h:
+        (JSC::JSCell::finishCreation):
+        * runtime/JSObjectInlines.h:
+        (JSC::JSObject::putDirectWithoutTransition):
+        (JSC::JSObject::putDirectInternal):
+        * runtime/Options.h:
+        * runtime/Structure.cpp:
+        (JSC::Structure::add):
+        * runtime/Structure.h:
+        * runtime/StructureInlines.h:
+        (JSC::Structure::add):
+
 2016-11-18  Joseph Pecoraro  <pecoraro@apple.com>
 
         Web Inspector: Generator functions should have a displayable name when shown in stack traces
diff --git a/Source/JavaScriptCore/bytecode/CodeBlock.cpp b/Source/JavaScriptCore/bytecode/CodeBlock.cpp
index fbd8549cb277f60edb878a2039fc937a0b1ae181..6456340d6b5c4fc8e46696f3d7fbfc95e3b6d0d8 100644
@@ -1873,7 +1873,7 @@ CodeBlock::CodeBlock(VM* vm, Structure* structure, CopyParsedBlockTag, CodeBlock
     , m_reoptimizationRetryCounter(0)
     , m_creationTime(std::chrono::steady_clock::now())
 {
-    m_visitWeaklyHasBeenCalled.store(false, std::memory_order_relaxed);
+    m_visitWeaklyHasBeenCalled = false;
 
     ASSERT(heap()->isDeferred());
     ASSERT(m_scopeRegister.isLocal());
@@ -1932,7 +1932,7 @@ CodeBlock::CodeBlock(VM* vm, Structure* structure, ScriptExecutable* ownerExecut
     , m_reoptimizationRetryCounter(0)
     , m_creationTime(std::chrono::steady_clock::now())
 {
-    m_visitWeaklyHasBeenCalled.store(false, std::memory_order_relaxed);
+    m_visitWeaklyHasBeenCalled = false;
 
     ASSERT(heap()->isDeferred());
     ASSERT(m_scopeRegister.isLocal());
@@ -2508,18 +2508,20 @@ CodeBlock* CodeBlock::specialOSREntryBlockOrNull()
 
 void CodeBlock::visitWeakly(SlotVisitor& visitor)
 {
-    bool setByMe = !m_visitWeaklyHasBeenCalled.compareExchangeStrong(false, true);
-    if (!setByMe)
+    ConcurrentJSLocker locker(m_lock);
+    if (m_visitWeaklyHasBeenCalled)
         return;
+    
+    m_visitWeaklyHasBeenCalled = true;
 
     if (Heap::isMarkedConcurrently(this))
         return;
 
-    if (shouldVisitStrongly()) {
+    if (shouldVisitStrongly(locker)) {
         visitor.appendUnbarrieredReadOnlyPointer(this);
         return;
     }
-
+    
     // There are two things that may use unconditional finalizers: inline cache clearing
     // and jettisoning. The probability of us wanting to do at least one of those things
     // is probably quite close to 1. So we add one no matter what and when it runs, it
@@ -2549,10 +2551,10 @@ void CodeBlock::visitWeakly(SlotVisitor& visitor)
     // decision by calling harvestWeakReferences().
 
     m_allTransitionsHaveBeenMarked = false;
-    propagateTransitions(visitor);
+    propagateTransitions(locker, visitor);
 
     m_jitCode->dfgCommon()->livenessHasBeenProved = false;
-    determineLiveness(visitor);
+    determineLiveness(locker, visitor);
 #endif // ENABLE(DFG_JIT)
 }
 
@@ -2575,6 +2577,7 @@ void CodeBlock::visitChildren(JSCell* cell, SlotVisitor& visitor)
 
 void CodeBlock::visitChildren(SlotVisitor& visitor)
 {
+    ConcurrentJSLocker locker(m_lock);
     // There are two things that may use unconditional finalizers: inline cache clearing
     // and jettisoning. The probability of us wanting to do at least one of those things
     // is probably quite close to 1. So we add one no matter what and when it runs, it
@@ -2592,19 +2595,19 @@ void CodeBlock::visitChildren(SlotVisitor& visitor)
         visitor.reportExtraMemoryVisited(m_instructions.size() * sizeof(Instruction) / refCount);
     }
 
-    stronglyVisitStrongReferences(visitor);
-    stronglyVisitWeakReferences(visitor);
+    stronglyVisitStrongReferences(locker, visitor);
+    stronglyVisitWeakReferences(locker, visitor);
 
     m_allTransitionsHaveBeenMarked = false;
-    propagateTransitions(visitor);
+    propagateTransitions(locker, visitor);
 }
 
-bool CodeBlock::shouldVisitStrongly()
+bool CodeBlock::shouldVisitStrongly(const ConcurrentJSLocker& locker)
 {
     if (Options::forceCodeBlockLiveness())
         return true;
 
-    if (shouldJettisonDueToOldAge())
+    if (shouldJettisonDueToOldAge(locker))
         return false;
 
     // Interpreter and Baseline JIT CodeBlocks don't need to be jettisoned when
@@ -2656,7 +2659,7 @@ static std::chrono::milliseconds timeToLive(JITCode::JITType jitType)
     }
 }
 
-bool CodeBlock::shouldJettisonDueToOldAge()
+bool CodeBlock::shouldJettisonDueToOldAge(const ConcurrentJSLocker&)
 {
     if (Heap::isMarkedConcurrently(this))
         return false;
@@ -2683,7 +2686,7 @@ static bool shouldMarkTransition(DFG::WeakReferenceTransition& transition)
 }
 #endif // ENABLE(DFG_JIT)
 
-void CodeBlock::propagateTransitions(SlotVisitor& visitor)
+void CodeBlock::propagateTransitions(const ConcurrentJSLocker&, SlotVisitor& visitor)
 {
     UNUSED_PARAM(visitor);
 
@@ -2764,7 +2767,7 @@ void CodeBlock::propagateTransitions(SlotVisitor& visitor)
         m_allTransitionsHaveBeenMarked = true;
 }
 
-void CodeBlock::determineLiveness(SlotVisitor& visitor)
+void CodeBlock::determineLiveness(const ConcurrentJSLocker&, SlotVisitor& visitor)
 {
     UNUSED_PARAM(visitor);
     
@@ -2810,9 +2813,9 @@ void CodeBlock::WeakReferenceHarvester::visitWeakReferences(SlotVisitor& visitor
     CodeBlock* codeBlock =
         bitwise_cast<CodeBlock*>(
             bitwise_cast<char*>(this) - OBJECT_OFFSETOF(CodeBlock, m_weakReferenceHarvester));
-
-    codeBlock->propagateTransitions(visitor);
-    codeBlock->determineLiveness(visitor);
+    
+    codeBlock->propagateTransitions(NoLockingNecessary, visitor);
+    codeBlock->determineLiveness(NoLockingNecessary, visitor);
 }
 
 void CodeBlock::clearLLIntGetByIdCache(Instruction* instruction)
@@ -2951,7 +2954,9 @@ void CodeBlock::UnconditionalFinalizer::finalizeUnconditionally()
 {
     CodeBlock* codeBlock = bitwise_cast<CodeBlock*>(
         bitwise_cast<char*>(this) - OBJECT_OFFSETOF(CodeBlock, m_unconditionalFinalizer));
-
+    
+    codeBlock->updateAllPredictions();
+    
 #if ENABLE(DFG_JIT)
     if (codeBlock->shouldJettisonDueToWeakReference()) {
         codeBlock->jettison(Profiler::JettisonDueToWeakReference);
@@ -2959,7 +2964,7 @@ void CodeBlock::UnconditionalFinalizer::finalizeUnconditionally()
     }
 #endif // ENABLE(DFG_JIT)
 
-    if (codeBlock->shouldJettisonDueToOldAge()) {
+    if (codeBlock->shouldJettisonDueToOldAge(NoLockingNecessary)) {
         codeBlock->jettison(Profiler::JettisonDueToOldAge);
         return;
     }
@@ -3099,7 +3104,7 @@ void CodeBlock::resetJITData()
 }
 #endif
 
-void CodeBlock::visitOSRExitTargets(SlotVisitor& visitor)
+void CodeBlock::visitOSRExitTargets(const ConcurrentJSLocker&, SlotVisitor& visitor)
 {
     // We strongly visit OSR exits targets because we don't want to deal with
     // the complexity of generating an exit target CodeBlock on demand and
@@ -3119,7 +3124,7 @@ void CodeBlock::visitOSRExitTargets(SlotVisitor& visitor)
 #endif
 }
 
-void CodeBlock::stronglyVisitStrongReferences(SlotVisitor& visitor)
+void CodeBlock::stronglyVisitStrongReferences(const ConcurrentJSLocker& locker, SlotVisitor& visitor)
 {
     visitor.append(&m_globalObject);
     visitor.append(&m_ownerExecutable);
@@ -3141,13 +3146,11 @@ void CodeBlock::stronglyVisitStrongReferences(SlotVisitor& visitor)
 
 #if ENABLE(DFG_JIT)
     if (JITCode::isOptimizingJIT(jitType()))
-        visitOSRExitTargets(visitor);
+        visitOSRExitTargets(locker, visitor);
 #endif
-
-    updateAllPredictions();
 }
 
-void CodeBlock::stronglyVisitWeakReferences(SlotVisitor& visitor)
+void CodeBlock::stronglyVisitWeakReferences(const ConcurrentJSLocker&, SlotVisitor& visitor)
 {
     UNUSED_PARAM(visitor);
 
diff --git a/Source/JavaScriptCore/bytecode/CodeBlock.h b/Source/JavaScriptCore/bytecode/CodeBlock.h
index 819834bb591a229766aec748d80c4418f0951b51..6bb91db52aefed72fb2e7e530854e6b1f1efe2b2 100644
@@ -836,7 +836,7 @@ public:
     // concurrent compilation threads finish what they're doing.
     mutable ConcurrentJSLock m_lock;
 
-    Atomic<bool> m_visitWeaklyHasBeenCalled;
+    bool m_visitWeaklyHasBeenCalled;
 
     bool m_shouldAlwaysBeInlined; // Not a bitfield because the JIT wants to store to it.
 
@@ -947,16 +947,16 @@ private:
     void dumpRareCaseProfile(PrintStream&, const char* name, RareCaseProfile*, bool& hasPrintedProfiling);
     void dumpArithProfile(PrintStream&, ArithProfile*, bool& hasPrintedProfiling);
 
-    bool shouldVisitStrongly();
+    bool shouldVisitStrongly(const ConcurrentJSLocker&);
     bool shouldJettisonDueToWeakReference();
-    bool shouldJettisonDueToOldAge();
+    bool shouldJettisonDueToOldAge(const ConcurrentJSLocker&);
     
-    void propagateTransitions(SlotVisitor&);
-    void determineLiveness(SlotVisitor&);
+    void propagateTransitions(const ConcurrentJSLocker&, SlotVisitor&);
+    void determineLiveness(const ConcurrentJSLocker&, SlotVisitor&);
         
-    void stronglyVisitStrongReferences(SlotVisitor&);
-    void stronglyVisitWeakReferences(SlotVisitor&);
-    void visitOSRExitTargets(SlotVisitor&);
+    void stronglyVisitStrongReferences(const ConcurrentJSLocker&, SlotVisitor&);
+    void stronglyVisitWeakReferences(const ConcurrentJSLocker&, SlotVisitor&);
+    void visitOSRExitTargets(const ConcurrentJSLocker&, SlotVisitor&);
 
     std::chrono::milliseconds timeSinceCreation()
     {
@@ -1079,7 +1079,7 @@ inline Register& ExecState::uncheckedR(VirtualRegister reg)
 
 inline void CodeBlock::clearVisitWeaklyHasBeenCalled()
 {
-    m_visitWeaklyHasBeenCalled.store(false, std::memory_order_relaxed);
+    m_visitWeaklyHasBeenCalled = false;
 }
 
 template <typename ExecutableType>
diff --git a/Source/JavaScriptCore/heap/CodeBlockSet.cpp b/Source/JavaScriptCore/heap/CodeBlockSet.cpp
index ec07db2e0eb832360b8c819d647b4c588917b594..06c63b653123ceee095cf748632b03840b7d4180 100644
@@ -94,8 +94,7 @@ void CodeBlockSet::deleteUnmarkedAndUnreferenced(CollectionScope scope)
     }
 
     // Any remaining young CodeBlocks are live and need to be promoted to the set of old CodeBlocks.
-    if (scope == CollectionScope::Eden)
-        promoteYoungCodeBlocks(locker);
+    promoteYoungCodeBlocks(locker);
 }
 
 bool CodeBlockSet::contains(const LockHolder&, void* candidateCodeBlock)
diff --git a/Source/JavaScriptCore/heap/Heap.cpp b/Source/JavaScriptCore/heap/Heap.cpp
index 4540548d6a13dfcc3d2f182206a98032a275afae..124d36fa9c55237d90edce1eee47e8cad04569ba 100644
@@ -98,7 +98,7 @@ public:
         
         if (Options::logGC()) {
             double thisPauseMS = (MonotonicTime::now() - m_heap.m_stopTime).milliseconds();
-            dataLog(thisPauseMS, " ms (max ", maxPauseMS(thisPauseMS), ")...]\n");
+            dataLog("p=", thisPauseMS, " ms (max ", maxPauseMS(thisPauseMS), ")...]\n");
         }
         
         m_heap.resumeTheWorld();
@@ -542,19 +542,40 @@ void Heap::markToFixpoint(double gcStartTime)
     const Seconds period = Seconds::fromMilliseconds(Options::concurrentGCPeriodMS());
     
     const double bytesAllocatedThisCycleAtTheBeginning = m_bytesAllocatedThisCycle;
-    const double bytesAllocatedThisCycleAtTheEnd = bytesAllocatedThisCycleAtTheBeginning * Options::concurrentGCHeadroomRatio();
+    const double bytesAllocatedThisCycleAtTheEnd =
+        Options::concurrentGCMaxHeadroom() *
+        std::max(
+            bytesAllocatedThisCycleAtTheBeginning,
+            static_cast<double>(m_maxEdenSize));
     
-    auto targetCollectorUtilization = [&] () -> double {
+    auto targetMutatorUtilization = [&] () -> double {
         double headroomFullness =
             (m_bytesAllocatedThisCycle - bytesAllocatedThisCycleAtTheBeginning) /
             (bytesAllocatedThisCycleAtTheEnd - bytesAllocatedThisCycleAtTheBeginning);
         
+        // headroomFullness can be NaN and other interesting things if
+        // bytesAllocatedThisCycleAtTheBeginning is zero. We see that in debug tests. This code
+        // defends against all floating point dragons.
+        
         if (!(headroomFullness >= 0))
             headroomFullness = 0;
         if (!(headroomFullness <= 1))
             headroomFullness = 1;
         
-        return headroomFullness;
+        double mutatorUtilization = 1 - headroomFullness;
+        
+        // Scale the mutator utilization into the permitted window.
+        mutatorUtilization =
+            Options::minimumMutatorUtilization() +
+            mutatorUtilization * (
+                Options::maximumMutatorUtilization() -
+                Options::minimumMutatorUtilization());
+        
+        return mutatorUtilization;
+    };
+    
+    auto targetCollectorUtilization = [&] () -> double {
+        return 1 - targetMutatorUtilization();
     };
     
     auto elapsedInPeriod = [&] (MonotonicTime now) -> Seconds {
@@ -566,22 +587,41 @@ void Heap::markToFixpoint(double gcStartTime)
     };
     
     auto shouldBeResumed = [&] (MonotonicTime now) -> bool {
+        if (Options::collectorShouldResumeFirst())
+            return phase(now) <= targetMutatorUtilization();
         return phase(now) > targetCollectorUtilization();
     };
     
     auto timeToResume = [&] (MonotonicTime now) -> MonotonicTime {
         ASSERT(!shouldBeResumed(now));
+        if (Options::collectorShouldResumeFirst())
+            return now - elapsedInPeriod(now) + period;
         return now - elapsedInPeriod(now) + period * targetCollectorUtilization();
     };
     
     auto timeToStop = [&] (MonotonicTime now) -> MonotonicTime {
         ASSERT(shouldBeResumed(now));
-        return now - elapsedInPeriod(now) + period;
+        if (Options::collectorShouldResumeFirst())
+            return now - elapsedInPeriod(now) + period * targetMutatorUtilization();
+        return now - elapsedInPeriod(now) + period;
     };
     
+    // Adjust the target extra pause ratio as necessary.
+    double rateOfCollection =
+        (m_lastGCEndTime - m_lastGCStartTime) /
+        (m_currentGCStartTime - m_lastGCStartTime);
+    
+    if (Options::logGC())
+        dataLog("cr=", rateOfCollection, " ");
+    
+    // FIXME: Determine if this is useful or get rid of it.
+    // https://bugs.webkit.org/show_bug.cgi?id=164940
+    double extraPauseRatio = Options::initialExtraPauseRatio();
+    
     for (unsigned iteration = 1; ; ++iteration) {
         if (Options::logGC())
             dataLog("i#", iteration, " ");
+        MonotonicTime topOfLoop = MonotonicTime::now();
         {
             TimingScope preConvergenceTimingScope(*this, "Heap::markToFixpoint conservative scan");
             ConservativeRoots conservativeRoots(*this);
@@ -642,6 +682,9 @@ void Heap::markToFixpoint(double gcStartTime)
         DFG::markCodeBlocks(*m_vm, *m_collectorSlotVisitor);
         bool shouldTerminate = m_collectorSlotVisitor->isEmpty() && m_mutatorMarkStack->isEmpty();
         
+        if (Options::logGC())
+            dataLog(m_collectorSlotVisitor->collectorMarkStack().size(), "+", m_mutatorMarkStack->size() + m_collectorSlotVisitor->mutatorMarkStack().size(), ", a=", m_bytesAllocatedThisCycle / 1024, " kb, b=", m_barriersExecuted, ", mu=", targetMutatorUtilization(), " ");
+        
         // We want to do this to conservatively ensure that we rescan any code blocks that are
         // running right now. However, we need to be sure to do it *after* we mark the code block
         // so that we know for sure if it really needs a barrier. Also, this has to happen after the
@@ -661,23 +704,30 @@ void Heap::markToFixpoint(double gcStartTime)
         if (Options::logGC() == GCLogging::Verbose)
             dataLog("Live Weak Handles:\n", *m_collectorSlotVisitor);
         
-        if (Options::logGC())
-            dataLog(m_collectorSlotVisitor->collectorMarkStack().size(), "+", m_collectorSlotVisitor->mutatorMarkStack().size(), " mu=", 1 - targetCollectorUtilization(), " ");
+        MonotonicTime beforeConvergence = MonotonicTime::now();
         
         {
             TimingScope traceTimingScope(*this, "Heap::markToFixpoint tracing");
             ParallelModeEnabler enabler(*m_collectorSlotVisitor);
+            
             if (Options::useCollectorTimeslicing()) {
-                for (;;) {
+                // Before we yield to the mutator, we should do GC work proportional to the time we
+                // spent paused. We initialize the timeslicer to start after this "mandatory" pause
+                // completes.
+                
+                SlotVisitor::SharedDrainResult drainResult;
+                
+                Seconds extraPause = (beforeConvergence - topOfLoop) * extraPauseRatio;
+                initialTime = beforeConvergence + extraPause;
+                drainResult = m_collectorSlotVisitor->drainInParallel(initialTime);
+                
+                while (drainResult != SlotVisitor::SharedDrainResult::Done) {
                     MonotonicTime now = MonotonicTime::now();
-                    SlotVisitor::SharedDrainResult drainResult;
                     if (shouldBeResumed(now)) {
                         ResumeTheWorldScope resumeTheWorldScope(*this);
                         drainResult = m_collectorSlotVisitor->drainInParallel(timeToStop(now));
                     } else
                         drainResult = m_collectorSlotVisitor->drainInParallel(timeToResume(now));
-                    if (drainResult == SlotVisitor::SharedDrainResult::Done)
-                        break;
                 }
             } else {
                 // Disabling collector timeslicing is meant to be used together with
@@ -686,6 +736,8 @@ void Heap::markToFixpoint(double gcStartTime)
                 m_collectorSlotVisitor->drainInParallel();
             }
         }
+        
+        extraPauseRatio *= Options::extraPauseRatioIterationGrowthRate();
     }
 
     {
@@ -730,6 +782,7 @@ void Heap::beginMarking()
     m_objectSpace.beginMarking();
     m_mutatorShouldBeFenced = true;
     m_barrierThreshold = tautologicalThreshold;
+    m_barriersExecuted = 0;
 }
 
 void Heap::visitConservativeRoots(ConservativeRoots& roots)
@@ -986,6 +1039,7 @@ void Heap::addToRememberedSet(const JSCell* cell)
 {
     ASSERT(cell);
     ASSERT(!Options::useConcurrentJIT() || !isCompilationThread());
+    m_barriersExecuted++;
     if (!Heap::isMarkedConcurrently(cell)) {
         // During a full collection a store into an unmarked object that had survived past
         // collections will manifest as a store to an unmarked black object. If the object gets
@@ -1082,6 +1136,8 @@ bool Heap::shouldCollectInThread(const LockHolder&)
 
 void Heap::collectInThread()
 {
+    m_currentGCStartTime = MonotonicTime::now();
+    
     Optional<CollectionScope> scope;
     {
         LockHolder locker(*m_threadLock);
@@ -1166,7 +1222,7 @@ void Heap::collectInThread()
     if (Options::logGC()) {
         MonotonicTime after = MonotonicTime::now();
         double thisPauseMS = (after - m_stopTime).milliseconds();
-        dataLog(thisPauseMS, " ms (max ", maxPauseMS(thisPauseMS), "), cycle ", (after - before).milliseconds(), " ms END]\n");
+        dataLog("p=", thisPauseMS, " ms (max ", maxPauseMS(thisPauseMS), "), cycle ", (after - before).milliseconds(), " ms END]\n");
     }
     
     {
@@ -1179,6 +1235,9 @@ void Heap::collectInThread()
 
     setNeedFinalize();
     resumeTheWorld();
+    
+    m_lastGCStartTime = m_currentGCStartTime;
+    m_lastGCEndTime = MonotonicTime::now();
 }
 
 void Heap::stopTheWorld()
@@ -1698,7 +1757,7 @@ void Heap::updateAllocationLimits()
     m_bytesAllocatedThisCycle = 0;
 
     if (Options::logGC())
-        dataLog(currentHeapSize / 1024, " kb, ");
+        dataLog("=> ", currentHeapSize / 1024, " kb, ");
 }
 
 void Heap::didFinishCollection(double gcStartTime)
diff --git a/Source/JavaScriptCore/heap/Heap.h b/Source/JavaScriptCore/heap/Heap.h
index 355fcd87774bbcc3269cfcb7a7027947510c37cc..411f92a792d690843e1973a1e639934485a9fbe2 100644
@@ -116,6 +116,8 @@ public:
     
     void writeBarrierWithoutFence(const JSCell* from);
     
+    void mutatorFence();
+    
     // Take this if you know that from->cellState() < barrierThreshold.
     JS_EXPORT_PRIVATE void writeBarrierSlowPath(const JSCell* from);
 
@@ -605,6 +607,12 @@ private:
     Box<Lock> m_threadLock;
     RefPtr<AutomaticThreadCondition> m_threadCondition; // The mutator must not wait on this. It would cause a deadlock.
     RefPtr<AutomaticThread> m_thread;
+    
+    MonotonicTime m_lastGCStartTime;
+    MonotonicTime m_lastGCEndTime;
+    MonotonicTime m_currentGCStartTime;
+    
+    uintptr_t m_barriersExecuted { 0 };
 };
 
 } // namespace JSC
diff --git a/Source/JavaScriptCore/heap/HeapInlines.h b/Source/JavaScriptCore/heap/HeapInlines.h
index 1fa9d859a794b503ce6a0ba5555c7d4a49ad4241..6f27b30660e41732cffde1c91b77f566451a126f 100644
@@ -157,6 +157,12 @@ inline void Heap::writeBarrierWithoutFence(const JSCell* from)
         addToRememberedSet(from);
 }
 
+inline void Heap::mutatorFence()
+{
+    if (isX86() || UNLIKELY(mutatorShouldBeFenced()))
+        WTF::storeStoreFence();
+}
+
 template<typename Functor> inline void Heap::forEachCodeBlock(const Functor& func)
 {
     forEachCodeBlockImpl(scopedLambdaRef<bool(CodeBlock*)>(func));
diff --git a/Source/JavaScriptCore/heap/MarkedBlock.cpp b/Source/JavaScriptCore/heap/MarkedBlock.cpp
index 89d8fd1090ce9e1908172e2684773b11e70b277b..b04102c9e244b98bedccb0a4dabf39921d7ba6ba 100644
@@ -34,6 +34,8 @@
 
 namespace JSC {
 
+const size_t MarkedBlock::blockSize;
+
 static const bool computeBalance = false;
 static size_t balance;
 
diff --git a/Source/JavaScriptCore/runtime/JSCellInlines.h b/Source/JavaScriptCore/runtime/JSCellInlines.h
index 4ee8fbd0f4269cc8e6f7a5a642b6052de1033975..a9519041d66bdcaf65c3d99931519240bf013a52 100644
@@ -60,8 +60,7 @@ inline void JSCell::finishCreation(VM& vm)
 {
     // This object is ready to be escaped so the concurrent GC may see it at any time. We have
     // to make sure that none of our stores sink below here.
-    if (isX86() || UNLIKELY(vm.heap.mutatorShouldBeFenced()))
-        WTF::storeStoreFence();
+    vm.heap.mutatorFence();
 #if ENABLE(GC_VALIDATION)
     ASSERT(vm.isInitializingObject());
     vm.setInitializingObjectClass(0);
diff --git a/Source/JavaScriptCore/runtime/JSObjectInlines.h b/Source/JavaScriptCore/runtime/JSObjectInlines.h
index 179bcf841674dd4d2f6a27b777862abee6fa9548..613efc5410b3eb2d210ca07819c302bb2e8021bc 100644
@@ -167,9 +167,9 @@ inline void JSObject::putDirectWithoutTransition(VM& vm, PropertyName propertyNa
     unsigned oldOutOfLineCapacity = structure->outOfLineCapacity();
     structure->addPropertyWithoutTransition(
         vm, propertyName, attributes,
-        [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset) {
-            if (structure->outOfLineCapacity() != oldOutOfLineCapacity) {
-                butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, structure->outOfLineCapacity());
+        [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, unsigned newOutOfLineCapacity) {
+            if (newOutOfLineCapacity != oldOutOfLineCapacity) {
+                butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity);
                 WTF::storeStoreFence();
                 setButterfly(vm, butterfly);
             }
@@ -266,15 +266,14 @@ ALWAYS_INLINE bool JSObject::putDirectInternal(VM& vm, PropertyName propertyName
         unsigned oldOutOfLineCapacity = structure->outOfLineCapacity();
         offset = structure->addPropertyWithoutTransition(
             vm, propertyName, attributes,
-            [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset) {
+            [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, unsigned newOutOfLineCapacity) {
                 Butterfly* butterfly = this->butterfly();
-                if (structure->outOfLineCapacity() != oldOutOfLineCapacity) {
-                    butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, structure->outOfLineCapacity());
+                if (newOutOfLineCapacity != oldOutOfLineCapacity) {
+                    butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity);
                     WTF::storeStoreFence();
                     setButterfly(vm, butterfly);
                 }
                 validateOffset(offset);
-                ASSERT(structure->isValidOffset(offset));
                 putDirect(vm, offset, value);
             });
 
diff --git a/Source/JavaScriptCore/runtime/Options.h b/Source/JavaScriptCore/runtime/Options.h
index 7c3e9410d4183237cc889b570e9af5ef3d9e91d5..092a812de826da4309483537900a7c506006d035 100644
@@ -195,8 +195,13 @@ typedef const char* optionString;
     v(double, mediumHeapGrowthFactor, 1.5, Normal, nullptr) \
     v(double, largeHeapGrowthFactor, 1.24, Normal, nullptr) \
     v(bool, useCollectorTimeslicing, true, Normal, nullptr) \
-    v(double, concurrentGCHeadroomRatio, 1.5, Normal, nullptr) \
+    v(double, minimumMutatorUtilization, 0, Normal, nullptr) \
+    v(double, maximumMutatorUtilization, 0.7, Normal, nullptr) \
+    v(double, concurrentGCMaxHeadroom, 1.5, Normal, nullptr) \
     v(double, concurrentGCPeriodMS, 2, Normal, nullptr) \
+    v(double, initialExtraPauseRatio, 0, Normal, nullptr) \
+    v(double, extraPauseRatioIterationGrowthRate, 1.1, Normal, nullptr) \
+    v(bool, collectorShouldResumeFirst, false, Normal, nullptr) \
     v(bool, scribbleFreeCells, false, Normal, nullptr) \
     v(double, sizeClassProgression, 1.4, Normal, nullptr) \
     v(unsigned, largeAllocationCutoff, 100000, Normal, nullptr) \
diff --git a/Source/JavaScriptCore/runtime/Structure.cpp b/Source/JavaScriptCore/runtime/Structure.cpp
index 353d859ff69451b56b6e508766f8057b90c95088..279729a4d49c31527f3f66479f0db5da67d5fa66 100644
@@ -943,7 +943,7 @@ Vector<PropertyMapEntry> Structure::getPropertiesConcurrently()
 
 PropertyOffset Structure::add(VM& vm, PropertyName propertyName, unsigned attributes)
 {
-    return add(vm, propertyName, attributes, [] (const GCSafeConcurrentJSLocker&, PropertyOffset) { });
+    return add(vm, propertyName, attributes, [] (const GCSafeConcurrentJSLocker&, PropertyOffset, unsigned) { });
 }
 
 PropertyOffset Structure::remove(PropertyName propertyName)
diff --git a/Source/JavaScriptCore/runtime/Structure.h b/Source/JavaScriptCore/runtime/Structure.h
index 36fb34a023bd320a66bdfd44799c2220018b7d19..413d3eba12a9dcbe9e0bbbf106f694226104cb12 100644
@@ -294,26 +294,11 @@ public:
 
     unsigned outOfLineCapacity() const
     {
-        ASSERT(checkOffsetConsistency());
-            
-        unsigned outOfLineSize = this->outOfLineSize();
-
-        if (!outOfLineSize)
-            return 0;
-
-        if (outOfLineSize <= initialOutOfLineCapacity)
-            return initialOutOfLineCapacity;
-
-        ASSERT(outOfLineSize > initialOutOfLineCapacity);
-        COMPILE_ASSERT(outOfLineGrowthFactor == 2, outOfLineGrowthFactor_is_two);
-        return WTF::roundUpToPowerOfTwo(outOfLineSize);
+        return outOfLineCapacity(m_offset);
     }
     unsigned outOfLineSize() const
     {
-        ASSERT(checkOffsetConsistency());
-        ASSERT(structure()->classInfo() == info());
-            
-        return numberOfOutOfLineSlotsForLastOffset(m_offset);
+        return outOfLineSize(m_offset);
     }
     bool hasInlineStorage() const
     {
@@ -636,6 +621,33 @@ private:
     void findStructuresAndMapForMaterialization(Vector<Structure*, 8>& structures, Structure*&, PropertyTable*&);
     
     static Structure* toDictionaryTransition(VM&, Structure*, DictionaryKind, DeferredStructureTransitionWatchpointFire* = nullptr);
+    
+    unsigned outOfLineCapacity(PropertyOffset lastOffset) const
+    {
+        unsigned outOfLineSize = this->outOfLineSize(lastOffset);
+
+        // This algorithm completely determines the out-of-line property storage growth algorithm.
+        // The JSObject code will only trigger a resize if the value returned by this algorithm
+        // changed between the new and old structure. So, it's important to keep this simple because
+        // it's on a fast path.
+        
+        if (!outOfLineSize)
+            return 0;
+
+        if (outOfLineSize <= initialOutOfLineCapacity)
+            return initialOutOfLineCapacity;
+
+        ASSERT(outOfLineSize > initialOutOfLineCapacity);
+        COMPILE_ASSERT(outOfLineGrowthFactor == 2, outOfLineGrowthFactor_is_two);
+        return WTF::roundUpToPowerOfTwo(outOfLineSize);
+    }
+    
+    unsigned outOfLineSize(PropertyOffset lastOffset) const
+    {
+        ASSERT(structure()->classInfo() == info());
+            
+        return numberOfOutOfLineSlotsForLastOffset(lastOffset);
+    }
 
     template<typename Func>
     PropertyOffset add(VM&, PropertyName, unsigned attributes, const Func&);
diff --git a/Source/JavaScriptCore/runtime/StructureInlines.h b/Source/JavaScriptCore/runtime/StructureInlines.h
index 1dba9c762d8e441e1be541b163bd5ab94c86ede1..2a01c3a7cac2618de7a6d25031504a57aa6bd371 100644
@@ -304,11 +304,14 @@ inline PropertyOffset Structure::add(VM& vm, PropertyName propertyName, unsigned
 
     PropertyOffset newOffset = table->nextOffset(m_inlineCapacity);
     
-    table->add(PropertyMapEntry(rep, newOffset, attributes), m_offset, PropertyTable::PropertyOffsetMayChange);
+    PropertyOffset newLastOffset = m_offset;
+    table->add(PropertyMapEntry(rep, newOffset, attributes), newLastOffset, PropertyTable::PropertyOffsetMayChange);
     
-    checkConsistency();
+    func(locker, newOffset, outOfLineCapacity(newLastOffset));
+    vm.heap.mutatorFence();
+    m_offset = newLastOffset;
 
-    func(locker, newOffset);
+    checkConsistency();
     return newOffset;
 }