Lightweight locks should be adaptive
author	fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Fri, 7 Aug 2015 00:49:54 +0000 (00:49 +0000)
committer	fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Fri, 7 Aug 2015 00:49:54 +0000 (00:49 +0000)
https://bugs.webkit.org/show_bug.cgi?id=147545

Reviewed by Geoffrey Garen.

Source/JavaScriptCore:

* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* runtime/TypeProfilerLog.h:

Source/WebCore:

No new tests because no new behavior.

* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):

Source/WebKit2:

* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:

Source/WTF:

A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than that of system locks and because they take dramatically less space than
system locks. The speed and space advantages of spinlocks can be astonishing: an uncontended
spinlock acquire is up to 10x faster, and under microcontention - short critical sections with
two or more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte
or 4 bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
should continue to avoid system locks - they are just far too slow and far too big.

But there is a problem with this idiom. System lock implementations put a thread to sleep
when it attempts to acquire a lock that is held, while spinlocks cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
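
The yielding-spinlock idiom described above can be sketched as follows. This is a
simplified, portable illustration (using std::this_thread::yield() in place of a direct
sched_yield() call), not WebKit's actual SpinLock:

```cpp
#include <atomic>
#include <thread>

// Minimal illustration of a yielding spinlock, in the spirit of the WTF
// spinlock idiom described above (not the actual WebKit implementation).
class YieldingSpinLock {
public:
    void lock()
    {
        // Burn CPU until the flag is acquired: each failed attempt yields
        // the timeslice, which is cheap under microcontention but wasteful
        // when the lock holder is descheduled for a long time.
        while (m_flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }

    void unlock()
    {
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};
```

The pathology the text describes falls out directly: lock() never blocks in the kernel, so
a thread waiting on a long critical section keeps getting rescheduled just to fail again.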

The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
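
The three levels of adaptation can be sketched roughly as follows. This is a hedged
illustration only: the real WTF::Lock packs the wait queue into the lock's single
tagged-pointer word using statically allocated thread nodes, whereas this sketch
substitutes a mutex/condition-variable queue for brevity:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch of the three-level adaptive lock described above. Not WTF::Lock
// itself: the parking queue here is a plain mutex + condition variable.
class AdaptiveLock {
public:
    void lock()
    {
        // Level 1: inlined fast path - a single CAS on an uncontended lock.
        bool expected = false;
        if (m_isHeld.compare_exchange_strong(expected, true, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        m_isHeld.store(false, std::memory_order_release);
        // Wake one parked thread, if any. Taking the queue mutex first
        // avoids losing a wakeup against a thread about to park.
        std::lock_guard<std::mutex> queueLock(m_queueMutex);
        m_condition.notify_one();
    }

private:
    void lockSlow()
    {
        // Level 2: a short burst of spinning for microcontention.
        for (unsigned spin = 0; spin < 40; ++spin) {
            bool expected = false;
            if (m_isHeld.compare_exchange_weak(expected, true, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }
        // Level 3: park the thread until the lock is released.
        std::unique_lock<std::mutex> queueLock(m_queueMutex);
        for (;;) {
            bool expected = false;
            if (m_isHeld.compare_exchange_strong(expected, true, std::memory_order_acquire))
                return;
            m_condition.wait(queueLock);
        }
    }

    std::atomic<bool> m_isHeld { false };
    std::mutex m_queueMutex;
    std::condition_variable m_condition;
};
```

The structure mirrors the description: an inlined CAS fast path, a bounded spin loop, then
parking. The spin count of 40 is an arbitrary placeholder, not a tuned value from the patch.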

On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:

- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
  than a system mutex.

- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
  than a system mutex.

- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
  SpinLock.

- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
  a SpinLock.
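
A microbenchmark of the shape these numbers come from can be sketched like this (the real
harness is benchmarks/LockSpeedTest.cpp, added by this patch; the template below is a
minimal stand-in that works with any type exposing lock()/unlock()):

```cpp
#include <chrono>
#include <mutex>

// Measures the average cost of one lock+unlock pair around a no-op
// critical section, i.e. pure lock overhead with no contention.
template<typename LockType>
double secondsPerLockUnlock(unsigned iterations)
{
    LockType lock;
    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i) {
        lock.lock();   // Empty critical section: nothing but the lock itself.
        lock.unlock();
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

Instantiating it as secondsPerLockUnlock<std::mutex>(1000000) corresponds to the
uncontended no-op case in the first bullet above; plugging in a spinlock or WTF::Lock
gives the other columns of the comparison.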

This patch replaces all uses of SpinLock with Lock, since our critical sections are not
no-ops: if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
as it should be: the purpose of this change is to reduce wasted CPU time, not wallclock time.
This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
of having a lock that just uses a byte are still better than the CPU wastage benefits of
Lock. But, this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.

[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf

* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):

Tools:

* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188100 268f45cc-cd09-0410-ab3c-d52691b4dbfc

44 files changed:
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/dfg/DFGCommon.cpp
Source/JavaScriptCore/heap/CopiedBlock.h
Source/JavaScriptCore/heap/CopiedBlockInlines.h
Source/JavaScriptCore/heap/CopiedSpace.cpp
Source/JavaScriptCore/heap/CopiedSpace.h
Source/JavaScriptCore/heap/CopiedSpaceInlines.h
Source/JavaScriptCore/heap/GCThreadSharedData.cpp
Source/JavaScriptCore/heap/GCThreadSharedData.h
Source/JavaScriptCore/heap/ListableHandler.h
Source/JavaScriptCore/heap/MachineStackMarker.cpp
Source/JavaScriptCore/heap/SlotVisitorInlines.h
Source/JavaScriptCore/parser/SourceProvider.cpp
Source/JavaScriptCore/profiler/ProfilerDatabase.cpp
Source/JavaScriptCore/runtime/TypeProfilerLog.h
Source/WTF/ChangeLog
Source/WTF/WTF.vcxproj/WTF.vcxproj
Source/WTF/WTF.xcodeproj/project.pbxproj
Source/WTF/benchmarks/LockSpeedTest.cpp [new file with mode: 0644]
Source/WTF/wtf/CMakeLists.txt
Source/WTF/wtf/Lock.cpp [new file with mode: 0644]
Source/WTF/wtf/Lock.h [new file with mode: 0644]
Source/WTF/wtf/MetaAllocator.cpp
Source/WTF/wtf/MetaAllocator.h
Source/WTF/wtf/SpinLock.h
Source/WTF/wtf/ThreadingPthreads.cpp
Source/WTF/wtf/ThreadingWin.cpp
Source/WTF/wtf/text/AtomicString.cpp
Source/WTF/wtf/text/AtomicStringImpl.cpp
Source/WebCore/ChangeLog
Source/WebCore/bindings/objc/WebScriptObject.mm
Source/WebCore/platform/audio/mac/CARingBuffer.cpp
Source/WebCore/platform/audio/mac/CARingBuffer.h
Source/WebCore/platform/ios/wak/WAKWindow.mm
Source/WebKit2/ChangeLog
Source/WebKit2/WebProcess/WebPage/EventDispatcher.cpp
Source/WebKit2/WebProcess/WebPage/EventDispatcher.h
Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.cpp
Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h
Tools/ChangeLog
Tools/TestWebKitAPI/CMakeLists.txt
Tools/TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj
Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj
Tools/TestWebKitAPI/Tests/WTF/Lock.cpp [new file with mode: 0644]

index d9600cc02e7810fcedd441366c6b806093c29a2e..99548ef0c815fd1aa7e5e2c6af1532737d913e2e 100644 (file)
@@ -1,3 +1,28 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * heap/CopiedBlock.h:
+        (JSC::CopiedBlock::workListLock):
+        * heap/CopiedBlockInlines.h:
+        (JSC::CopiedBlock::shouldReportLiveBytes):
+        (JSC::CopiedBlock::reportLiveBytes):
+        * heap/CopiedSpace.h:
+        (JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
+        * heap/CopiedSpaceInlines.h:
+        (JSC::CopiedSpace::recycleEvacuatedBlock):
+        * heap/GCThreadSharedData.h:
+        (JSC::GCThreadSharedData::getNextBlocksToCopy):
+        * heap/ListableHandler.h:
+        (JSC::ListableHandler::List::addThreadSafe):
+        (JSC::ListableHandler::List::addNotThreadSafe):
+        * heap/SlotVisitorInlines.h:
+        (JSC::SlotVisitor::copyLater):
+        * runtime/TypeProfilerLog.h:
+
 2015-08-06  Sukolsak Sakshuwong  <sukolsak@gmail.com>
 
         Parse the entire WebAssembly modules
index 96ac252df6ff16d85620ac48b68e4b68d8cc1810..cd2a12c73c86fcb77f799b2332acbb8ec674f4ae 100644 (file)
@@ -34,7 +34,7 @@
 
 namespace JSC { namespace DFG {
 
-static StaticSpinLock crashLock;
+static StaticLock crashLock;
 
 void startCrashing()
 {
index f636529a43cb63addbfb7ad45188ec27d7c6284e..3be585c93936f4227dd104eda2e0ed9ba35cf175 100644 (file)
@@ -31,7 +31,7 @@
 #include "Options.h"
 #include <wtf/Atomics.h>
 #include <wtf/DoublyLinkedList.h>
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 
 namespace JSC {
 
@@ -54,8 +54,8 @@ public:
     void didPromote();
 
     unsigned liveBytes();
-    bool shouldReportLiveBytes(SpinLockHolder&, JSCell* owner);
-    void reportLiveBytes(SpinLockHolder&, JSCell*, CopyToken, unsigned);
+    bool shouldReportLiveBytes(LockHolder&, JSCell* owner);
+    void reportLiveBytes(LockHolder&, JSCell*, CopyToken, unsigned);
     void reportLiveBytesDuringCopying(unsigned);
     void didSurviveGC();
     void didEvacuateBytes(unsigned);
@@ -85,7 +85,7 @@ public:
 
     bool hasWorkList();
     CopyWorkList& workList();
-    SpinLock& workListLock() { return m_workListLock; }
+    Lock& workListLock() { return m_workListLock; }
 
 private:
     CopiedBlock(size_t);
@@ -98,7 +98,7 @@ private:
 
     size_t m_capacity;
 
-    SpinLock m_workListLock;
+    Lock m_workListLock;
     std::unique_ptr<CopyWorkList> m_workList;
 
     size_t m_remaining;
index 2db0cd12ac84bf128dc6b5d7123dbb326fedb011..3d1133413c81c1be67e7a86800f68e7f6088b471 100644 (file)
@@ -33,7 +33,7 @@
 
 namespace JSC {
     
-inline bool CopiedBlock::shouldReportLiveBytes(SpinLockHolder&, JSCell* owner)
+inline bool CopiedBlock::shouldReportLiveBytes(LockHolder&, JSCell* owner)
 {
     // We want to add to live bytes if the owner isn't part of the remembered set or
     // if this block was allocated during the last cycle. 
@@ -43,7 +43,7 @@ inline bool CopiedBlock::shouldReportLiveBytes(SpinLockHolder&, JSCell* owner)
     return !Heap::isRemembered(owner) || !m_isOld;
 }
 
-inline void CopiedBlock::reportLiveBytes(SpinLockHolder&, JSCell* owner, CopyToken token, unsigned bytes)
+inline void CopiedBlock::reportLiveBytes(LockHolder&, JSCell* owner, CopyToken token, unsigned bytes)
 {
     checkConsistency();
 #ifndef NDEBUG
index 9592eaab9a5837fe6fa4ae683bd3a23eed7953d9..3f265e7ec9d26af51064d7ddd764238903f302a6 100644 (file)
@@ -191,7 +191,7 @@ void CopiedSpace::doneFillingBlock(CopiedBlock* block, CopiedBlock** exchange)
 
     {
         // Always put the block into the old gen because it's being promoted!
-        SpinLockHolder locker(&m_toSpaceLock);
+        LockHolder locker(&m_toSpaceLock);
         m_oldGen.toSpace->push(block);
         m_blockSet.add(block);
         m_oldGen.blockFilter.add(reinterpret_cast<Bits>(block));
index db1be188f00af4d315e742e373a32dcbc453134e..d18d1fcc44faad7485a77e1db9e709c9ab644dab 100644 (file)
@@ -33,9 +33,9 @@
 #include <wtf/CheckedBoolean.h>
 #include <wtf/DoublyLinkedList.h>
 #include <wtf/HashSet.h>
+#include <wtf/Lock.h>
 #include <wtf/OSAllocator.h>
 #include <wtf/PageBlock.h>
-#include <wtf/SpinLock.h>
 #include <wtf/StdLibExtras.h>
 #include <wtf/ThreadingPrimitives.h>
 
@@ -113,7 +113,7 @@ private:
 
     HashSet<CopiedBlock*> m_blockSet;
 
-    SpinLock m_toSpaceLock;
+    Lock m_toSpaceLock;
 
     struct CopiedGeneration {
         CopiedGeneration()
index 3faf342434ab4dedfd404b34e4f3ae75c92ad9e6..8f4058b12b96edd5d4cff1e4cc44813eae13e7c8 100644 (file)
@@ -98,7 +98,7 @@ inline void CopiedSpace::recycleEvacuatedBlock(CopiedBlock* block, HeapOperation
     ASSERT(block->canBeRecycled());
     ASSERT(!block->m_isPinned);
     {
-        SpinLockHolder locker(&m_toSpaceLock);
+        LockHolder locker(&m_toSpaceLock);
         m_blockSet.remove(block);
         if (collectionType == EdenCollection)
             m_newGen.fromSpace->remove(block);
index 2286370f1dd67d2057877c3eefc5981e809ad720..824b1ac3a60e3ef902adee0e2939c1322de2d7c0 100644 (file)
@@ -178,7 +178,7 @@ void GCThreadSharedData::didFinishMarking()
 void GCThreadSharedData::didStartCopying()
 {
     {
-        SpinLockHolder locker(&m_copyLock);
+        LockHolder locker(&m_copyLock);
         if (m_vm->heap.operationInProgress() == EdenCollection) {
             // Reset the vector to be empty, but don't throw away the backing store.
             m_blocksToCopy.shrink(0);
index d6c960aa4a328de5cf9b583c971689a7e0a4b891..0c5ad1ded8f3e51222e7cf1a66aca4ef1b32a139 100644 (file)
@@ -33,7 +33,7 @@
 #include "WeakReferenceHarvester.h"
 #include <condition_variable>
 #include <wtf/HashSet.h>
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/Vector.h>
 
 namespace JSC {
@@ -97,7 +97,7 @@ private:
     std::mutex m_opaqueRootsMutex;
     HashSet<void*> m_opaqueRoots;
 
-    SpinLock m_copyLock;
+    Lock m_copyLock;
     Vector<CopiedBlock*> m_blocksToCopy;
     size_t m_copyIndex;
     static const size_t s_blockFragmentLength = 32;
@@ -115,7 +115,7 @@ private:
 
 inline void GCThreadSharedData::getNextBlocksToCopy(size_t& start, size_t& end)
 {
-    SpinLockHolder locker(&m_copyLock);
+    LockHolder locker(&m_copyLock);
     start = m_copyIndex;
     end = std::min(m_blocksToCopy.size(), m_copyIndex + s_blockFragmentLength);
     m_copyIndex = end;
index 560a3562b83d6f3a2b28237a97f617c2ce6aa0a1..d653f0513134d084d529bf3d99f6cbe64a1336da 100644 (file)
@@ -21,9 +21,9 @@
 #define ListableHandler_h
 
 #include <stdint.h>
+#include <wtf/Lock.h>
 #include <wtf/Locker.h>
 #include <wtf/Noncopyable.h>
-#include <wtf/SpinLock.h>
 #include <wtf/ThreadingPrimitives.h>
 
 namespace JSC {
@@ -65,7 +65,7 @@ private:
         
         void addThreadSafe(T* handler)
         {
-            SpinLockHolder locker(&m_lock);
+            LockHolder locker(&m_lock);
             addNotThreadSafe(handler);
         }
         
@@ -103,7 +103,7 @@ private:
             m_first = handler;
         }
         
-        SpinLock m_lock;
+        Lock m_lock;
         T* m_first;
     };
     
index 4ae16a9cad9091ea7e178574ae9061ece62bee65..168f82a4e582735597da1a018dc69aebe36aaf33 100644 (file)
@@ -568,8 +568,8 @@ bool MachineThreads::tryCopyOtherThreadStacks(MutexLocker&, void* buffer, size_t
 {
     // Prevent two VMs from suspending each other's threads at the same time,
     // which can cause deadlock: <rdar://problem/20300842>.
-    static StaticSpinLock mutex;
-    std::lock_guard<StaticSpinLock> lock(mutex);
+    static StaticLock mutex;
+    std::lock_guard<StaticLock> lock(mutex);
 
     *size = 0;
 
index 1d688f850ebde968effdd8f3fe04e93ef0aa2fab..548b99df21da8a872ae52ace9783072d7cc6fa43 100644 (file)
@@ -250,7 +250,7 @@ inline void SlotVisitor::copyLater(JSCell* owner, CopyToken token, void* ptr, si
 
     ASSERT(heap()->m_storageSpace.contains(block));
 
-    SpinLockHolder locker(&block->workListLock());
+    LockHolder locker(&block->workListLock());
     if (heap()->operationInProgress() == FullCollection || block->shouldReportLiveBytes(locker, owner)) {
         m_bytesCopied += bytes;
         block->reportLiveBytes(locker, owner, token, bytes);
index c9652459a117f9bda578c72f17c4d3426f69127d..19ffb3372906e0a75842f51fc0bddbac03bbe979 100644 (file)
@@ -27,7 +27,7 @@
 #include "SourceProvider.h"
 
 #include "JSCInlines.h"
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/StdLibExtras.h>
 
 namespace JSC {
@@ -44,11 +44,11 @@ SourceProvider::~SourceProvider()
 {
 }
 
-static StaticSpinLock providerIdLock;
+static StaticLock providerIdLock;
 
 void SourceProvider::getID()
 {
-    SpinLockHolder lock(&providerIdLock);
+    LockHolder lock(&providerIdLock);
     if (!m_id) {
         static intptr_t nextProviderID = 0;
         m_id = ++nextProviderID;
index 236a5c8c8189d9ceb9292b5e87dc5737f51324f7..d78305a190ac88eb7902d0280a817349a7347ea2 100644 (file)
@@ -35,7 +35,7 @@ namespace JSC { namespace Profiler {
 
 static std::atomic<int> databaseCounter;
 
-static StaticSpinLock registrationLock;
+static StaticLock registrationLock;
 static std::atomic<int> didRegisterAtExit;
 static Database* firstDatabase;
 
@@ -138,14 +138,14 @@ void Database::addDatabaseToAtExit()
     if (++didRegisterAtExit == 1)
         atexit(atExitCallback);
     
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     m_nextRegisteredDatabase = firstDatabase;
     firstDatabase = this;
 }
 
 void Database::removeDatabaseFromAtExit()
 {
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     for (Database** current = &firstDatabase; *current; current = &(*current)->m_nextRegisteredDatabase) {
         if (*current != this)
             continue;
@@ -163,7 +163,7 @@ void Database::performAtExitSave() const
 
 Database* Database::removeFirstAtExitDatabase()
 {
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     Database* result = firstDatabase;
     if (result) {
         firstDatabase = result->m_nextRegisteredDatabase;
index b3a1948942a14e957b6f3399bc0326594b29c56a..1dd188ab7e3ce05a8d191a7ae8c46bfacb49e79e 100644 (file)
@@ -32,7 +32,6 @@
 #include "JSCJSValue.h"
 #include "Structure.h"
 #include "TypeProfiler.h"
-#include <wtf/ByteSpinLock.h>
 
 namespace JSC {
 
index 721faa1a93b30c75a7a88e962a2e82e72ffbec44..70ec738cf906ddfd8f483ad21ecc1369e4d3fa6c 100644 (file)
@@ -1,3 +1,97 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
+        overhead is lower than that of system locks and because they take dramatically less space than
+        system locks. The speed and space advantages of spinlocks can be astonishing: an uncontended
+        spinlock acquire is up to 10x faster, and under microcontention - short critical sections with
+        two or more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte
+        or 4 bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
+        should continue to avoid system locks - they are just far too slow and far too big.
+
+        But there is a problem with this idiom. System lock implementations put a thread to sleep
+        when it attempts to acquire a lock that is held, while spinlocks cause the thread to burn CPU.
+        In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
+        microcontention, but awful when the lock will not be released for a while. In fact, when
+        critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
+        almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
+        frequently in our current uses of spinlocks, but that's probably because right now there are
+        places where we make a conscious decision to use system locks - even though they use more
+        memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
+        while to acquire the lock.
+
+        The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
+        concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
+        that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
+        idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
+        the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
+        put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
+        the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
+        (not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
+        lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
+        if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
+        queue for critical sections that are held for a long time.
+
+        On a locking microbenchmark, this new Lock exhibits the following performance
+        characteristics:
+
+        - Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
+          than a system mutex.
+
+        - Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
+          than a system mutex.
+
+        - CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
+          SpinLock.
+
+        - Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
+          a SpinLock.
+
+        This patch replaces all uses of SpinLock with Lock, since our critical sections are not
+        no-ops: if you do basically anything in your critical section, the Lock overhead will be
+        invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
+        instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
+        as it should be: the purpose of this change is to reduce wasted CPU time, not wallclock time.
+        This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
+        of having a lock that just uses a byte are still better than the CPU wastage benefits of
+        Lock. But, this work will enable some future work to create locks that will fit in just 1.6
+        bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
+
+        [1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
+
+        * WTF.vcxproj/WTF.vcxproj:
+        * WTF.xcodeproj/project.pbxproj:
+        * benchmarks: Added.
+        * benchmarks/LockSpeedTest.cpp: Added.
+        (main):
+        * wtf/CMakeLists.txt:
+        * wtf/Lock.cpp: Added.
+        (WTF::LockBase::lockSlow):
+        (WTF::LockBase::unlockSlow):
+        * wtf/Lock.h: Added.
+        (WTF::LockBase::lock):
+        (WTF::LockBase::unlock):
+        (WTF::LockBase::isHeld):
+        (WTF::Lock::Lock):
+        * wtf/MetaAllocator.cpp:
+        (WTF::MetaAllocator::release):
+        (WTF::MetaAllocatorHandle::shrink):
+        (WTF::MetaAllocator::allocate):
+        (WTF::MetaAllocator::currentStatistics):
+        (WTF::MetaAllocator::addFreshFreeSpace):
+        (WTF::MetaAllocator::debugFreeSpaceSize):
+        * wtf/MetaAllocator.h:
+        * wtf/SpinLock.h:
+        * wtf/ThreadingPthreads.cpp:
+        * wtf/ThreadingWin.cpp:
+        * wtf/text/AtomicString.cpp:
+        * wtf/text/AtomicStringImpl.cpp:
+        (WTF::AtomicStringTableLocker::AtomicStringTableLocker):
+
 2015-08-05  Filip Pizlo  <fpizlo@apple.com>
 
         Unreviewed, roll out http://trac.webkit.org/changeset/187972.
index 5a2d25333384f5783c3fe80f2dd861ef473c11ff..a481216f7250ee5354f4b1578f307d8de3e212c2 100644 (file)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="utf-8"?>
 <Project DefaultTargets="Build" ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
   <ItemGroup Label="ProjectConfigurations">
     <ProjectConfiguration Include="DebugSuffix|Win32">
     <ClCompile Include="..\wtf\glib\GThreadSafeMainLoopSource.cpp" />
     <ClCompile Include="..\wtf\GregorianDateTime.cpp" />
     <ClCompile Include="..\wtf\HashTable.cpp" />
+    <ClCompile Include="..\wtf\Lock.cpp" />
     <ClCompile Include="..\wtf\MainThread.cpp" />
     <ClCompile Include="..\wtf\MD5.cpp" />
     <ClCompile Include="..\wtf\MediaTime.cpp" />
     <ClInclude Include="..\wtf\IteratorAdaptors.h" />
     <ClInclude Include="..\wtf\IteratorRange.h" />
     <ClInclude Include="..\wtf\ListHashSet.h" />
+    <ClInclude Include="..\wtf\Lock.h" />
     <ClInclude Include="..\wtf\Locker.h" />
     <ClInclude Include="..\wtf\MainThread.h" />
     <ClInclude Include="..\wtf\MallocPtr.h" />
index 1557746b4772ca9e8a697871bd9c3ef354328425..9c068b08c9e7020f941a8814af72e9c7acebe2fe 100644 (file)
@@ -40,6 +40,8 @@
                0FD81AC5154FB22E00983E72 /* FastBitVector.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FD81AC4154FB22E00983E72 /* FastBitVector.h */; settings = {ATTRIBUTES = (); }; };
                0FDDBFA71666DFA300C55FEF /* StringPrintStream.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FDDBFA51666DFA300C55FEF /* StringPrintStream.cpp */; };
                0FDDBFA81666DFA300C55FEF /* StringPrintStream.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDDBFA61666DFA300C55FEF /* StringPrintStream.h */; };
+               0FE1646A1B6FFC9600400E7C /* Lock.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE164681B6FFC9600400E7C /* Lock.cpp */; };
+               0FE1646B1B6FFC9600400E7C /* Lock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE164691B6FFC9600400E7C /* Lock.h */; };
                0FED67B61B22D4D80066CE15 /* TinyPtrSet.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FED67B51B22D4D80066CE15 /* TinyPtrSet.h */; };
                14022F4118F5C3FC007FF0EB /* libbmalloc.a in Frameworks */ = {isa = PBXBuildFile; fileRef = 14022F4018F5C3FC007FF0EB /* libbmalloc.a */; };
                143F611F1565F0F900DB514A /* RAMSize.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 143F611D1565F0F900DB514A /* RAMSize.cpp */; };
                0FD81AC4154FB22E00983E72 /* FastBitVector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = FastBitVector.h; sourceTree = "<group>"; };
                0FDDBFA51666DFA300C55FEF /* StringPrintStream.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = StringPrintStream.cpp; sourceTree = "<group>"; };
                0FDDBFA61666DFA300C55FEF /* StringPrintStream.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = StringPrintStream.h; sourceTree = "<group>"; };
+               0FE164681B6FFC9600400E7C /* Lock.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Lock.cpp; sourceTree = "<group>"; };
+               0FE164691B6FFC9600400E7C /* Lock.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Lock.h; sourceTree = "<group>"; };
                0FEC3EE4171B834700FDAC8D /* ByteSpinLock.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = ByteSpinLock.h; sourceTree = "<group>"; };
                0FED67B51B22D4D80066CE15 /* TinyPtrSet.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = TinyPtrSet.h; sourceTree = "<group>"; };
                14022F4018F5C3FC007FF0EB /* libbmalloc.a */ = {isa = PBXFileReference; lastKnownFileType = archive.ar; path = libbmalloc.a; sourceTree = BUILT_PRODUCTS_DIR; };
                                7CDD7FF9186D2A54007433CD /* IteratorRange.h */,
                                A70DA0831799F04D00529A9B /* ListDump.h */,
                                A8A472C1151A825A004123FF /* ListHashSet.h */,
+                               0FE164681B6FFC9600400E7C /* Lock.cpp */,
+                               0FE164691B6FFC9600400E7C /* Lock.h */,
                                A8A472C3151A825A004123FF /* Locker.h */,
                                1447AEC818FCE59400B3D7FF /* mbmalloc.cpp */,
                                A8A472CA151A825B004123FF /* MD5.cpp */,
                                A8A4746A151A825B004123FF /* UTF8.h in Headers */,
                                A8A473B9151A825B004123FF /* utils.h in Headers */,
                                A8A4747D151A825B004123FF /* ValueCheck.h in Headers */,
+                               0FE1646B1B6FFC9600400E7C /* Lock.h in Headers */,
                                A8A4747E151A825B004123FF /* Vector.h in Headers */,
                                A8A4747F151A825B004123FF /* VectorTraits.h in Headers */,
                                0FED67B61B22D4D80066CE15 /* TinyPtrSet.h in Headers */,
                                A8A4739E151A825B004123FF /* DataLog.cpp in Sources */,
                                A8A473A0151A825B004123FF /* DateMath.cpp in Sources */,
                                A8A473A2151A825B004123FF /* DecimalNumber.cpp in Sources */,
+                               0FE1646A1B6FFC9600400E7C /* Lock.cpp in Sources */,
                                A8A473AE151A825B004123FF /* diy-fp.cc in Sources */,
                                A8A473B0151A825B004123FF /* double-conversion.cc in Sources */,
                                A8A473BA151A825B004123FF /* dtoa.cpp in Sources */,
diff --git a/Source/WTF/benchmarks/LockSpeedTest.cpp b/Source/WTF/benchmarks/LockSpeedTest.cpp
new file mode 100644 (file)
index 0000000..fcc2583
--- /dev/null
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+
+#include <unistd.h>
+#include <wtf/CurrentTime.h>
+#include <wtf/Lock.h>
+#include <wtf/SpinLock.h>
+#include <wtf/StdLibExtras.h>
+#include <wtf/Threading.h>
+#include <wtf/ThreadingPrimitives.h>
+
+namespace {
+
+unsigned numThreadGroups;
+unsigned numThreadsPerGroup;
+unsigned workPerCriticalSection;
+unsigned numNoiseThreads;
+unsigned numIterations;
+    
+void usage()
+{
+    printf("Usage: LockSpeedTest spinlock|lock|mutex|all <num thread groups> <num threads per group> <work per critical section> <num noise threads> <num iterations>\n");
+    exit(1);
+}
+
+template<typename LockType>
+void runBenchmark(const char* name)
+{
+    std::unique_ptr<LockType[]> locks = std::make_unique<LockType[]>(numThreadGroups);
+    std::unique_ptr<double[]> words = std::make_unique<double[]>(numThreadGroups);
+    std::unique_ptr<ThreadIdentifier[]> threads = std::make_unique<ThreadIdentifier[]>(numThreadGroups * numThreadsPerGroup);
+    std::unique_ptr<ThreadIdentifier[]> noiseThreads = std::make_unique<ThreadIdentifier[]>(numNoiseThreads);
+    std::unique_ptr<double[]> noiseCounts = std::make_unique<double[]>(numNoiseThreads);
+
+    volatile bool shouldStop = false;
+    for (unsigned threadIndex = numNoiseThreads; threadIndex--;) {
+        noiseCounts[threadIndex] = 0;
+        noiseThreads[threadIndex] = createThread(
+            "Noise Thread",
+            [&shouldStop, &noiseCounts, threadIndex] () {
+                while (!shouldStop)
+                    noiseCounts[threadIndex]++;
+            });
+    }
+
+    double before = monotonicallyIncreasingTimeMS();
+    
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;) {
+        words[threadGroupIndex] = 0;
+
+        for (unsigned threadIndex = numThreadsPerGroup; threadIndex--;) {
+            threads[threadGroupIndex * numThreadsPerGroup + threadIndex] = createThread(
+                "Benchmark thread",
+                [threadGroupIndex, &locks, &words] () {
+                    for (unsigned i = numIterations; i--;) {
+                        locks[threadGroupIndex].lock();
+                        for (unsigned j = workPerCriticalSection; j--;) {
+                            words[threadGroupIndex]++;
+                            words[threadGroupIndex] *= 1.01;
+                        }
+                        locks[threadGroupIndex].unlock();
+                    }
+                });
+        }
+    }
+
+    for (unsigned threadIndex = numThreadGroups * numThreadsPerGroup; threadIndex--;)
+        waitForThreadCompletion(threads[threadIndex]);
+    shouldStop = true;
+    double noiseCount = 0;
+    for (unsigned threadIndex = numNoiseThreads; threadIndex--;) {
+        waitForThreadCompletion(noiseThreads[threadIndex]);
+        noiseCount += noiseCounts[threadIndex];
+    }
+
+    double after = monotonicallyIncreasingTimeMS();
+
+    printf("%s: %.3lf ms, %.0lf noise.\n", name, after - before, noiseCount);
+}
+
+} // anonymous namespace
+
+int main(int argc, char** argv)
+{
+    WTF::initializeThreading();
+    
+    if (argc != 7
+        || sscanf(argv[2], "%u", &numThreadGroups) != 1
+        || sscanf(argv[3], "%u", &numThreadsPerGroup) != 1
+        || sscanf(argv[4], "%u", &workPerCriticalSection) != 1
+        || sscanf(argv[5], "%u", &numNoiseThreads) != 1
+        || sscanf(argv[6], "%u", &numIterations) != 1)
+        usage();
+
+    bool didRun = false;
+    if (!strcmp(argv[1], "spinlock") || !strcmp(argv[1], "all")) {
+        runBenchmark<SpinLock>("SpinLock");
+        didRun = true;
+    }
+    if (!strcmp(argv[1], "lock") || !strcmp(argv[1], "all")) {
+        runBenchmark<Lock>("WTF Lock");
+        didRun = true;
+    }
+    if (!strcmp(argv[1], "mutex") || !strcmp(argv[1], "all")) {
+        runBenchmark<Mutex>("Platform Mutex");
+        didRun = true;
+    }
+
+    if (!didRun)
+        usage();
+
+    return 0;
+}
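The harness above leans on WTF primitives (createThread, waitForThreadCompletion, monotonicallyIncreasingTimeMS). As a rough standalone sketch of the same measurement loop — hypothetical names, with std::thread and std::chrono standing in for the WTF equivalents, and without the noise threads — the core of it might look like:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Naive test-and-set spinlock, standing in for WTF::SpinLock.
struct NaiveSpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    void lock() { while (flag.test_and_set(std::memory_order_acquire)) { } }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Runs `threads` threads, each acquiring the lock `iterations` times to bump a
// shared counter. Returns the final counter value and the elapsed milliseconds.
template<typename LockType>
std::pair<unsigned long long, double> runSimpleBenchmark(unsigned threads, unsigned iterations)
{
    LockType lock;
    unsigned long long counter = 0;
    auto before = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter; // The "work per critical section" is a single increment here.
                lock.unlock();
            }
        });
    }
    for (auto& thread : pool)
        thread.join();

    double ms = std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - before).count();
    return { counter, ms };
}
```

Instantiating runSimpleBenchmark<std::mutex> and runSimpleBenchmark<NaiveSpinLock> with the same thread and iteration counts gives a crude version of the spinlock-versus-mutex comparison that the full harness automates.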
index 01d8bf1948df86be8c210fc1f899d751b4209ffd..5e9766d8a6a47aaab0a6da0e052d514f934e4ff1 100644 (file)
@@ -41,6 +41,7 @@ set(WTF_HEADERS
     IteratorAdaptors.h
     IteratorRange.h
     ListHashSet.h
+    Lock.h
     Locker.h
     MD5.h
     MainThread.h
@@ -156,6 +157,7 @@ set(WTF_SOURCES
     FunctionDispatcher.cpp
     GregorianDateTime.cpp
     HashTable.cpp
+    Lock.cpp
     MD5.cpp
     MainThread.cpp
     MediaTime.cpp
diff --git a/Source/WTF/wtf/Lock.cpp b/Source/WTF/wtf/Lock.cpp
new file mode 100644 (file)
index 0000000..10eb507
--- /dev/null
@@ -0,0 +1,253 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "Lock.h"
+
+#include "ThreadSpecific.h"
+#include "ThreadingPrimitives.h"
+#include <condition_variable>
+#include <mutex>
+#include <thread>
+
+namespace WTF {
+
+namespace {
+
+// This data structure serves three purposes:
+//
+// 1) A parking mechanism for threads that go to sleep. That involves just a system mutex and
+//    condition variable.
+//
+// 2) A queue node for when a thread is on some Lock's queue.
+//
+// 3) The queue head. This is kind of funky. When a thread is the head of a queue, it also serves as
+//    the basic queue bookkeeping data structure. When a thread is dequeued, the next thread in the
+//    queue takes on the queue head duties.
+struct ThreadData {
+    // The parking mechanism.
+    bool shouldPark { false };
+    std::mutex parkingLock;
+    std::condition_variable parkingCondition;
+
+    // The queue node.
+    ThreadData* nextInQueue { nullptr };
+
+    // The queue itself.
+    ThreadData* queueTail { nullptr };
+};
+
+ThreadSpecific<ThreadData>* threadData;
+
+ThreadData* myThreadData()
+{
+    static std::once_flag initializeOnce;
+    std::call_once(
+        initializeOnce,
+        [] {
+            threadData = new ThreadSpecific<ThreadData>();
+        });
+
+    return *threadData;
+}
+
+} // anonymous namespace
+
+void LockBase::lockSlow()
+{
+    unsigned spinCount = 0;
+
+    // This magic number turns out to be optimal based on past JikesRVM experiments.
+    const unsigned spinLimit = 40;
+    
+    for (;;) {
+        uintptr_t currentWordValue = m_word.load();
+        
+        if (!(currentWordValue & isLockedBit)) {
+            // It's not possible for someone to hold the queue lock while the lock itself is no longer
+            // held, since we will only attempt to acquire the queue lock when the lock is held and
+            // the queue lock prevents unlock.
+            ASSERT(!(currentWordValue & isQueueLockedBit));
+            if (m_word.compareExchangeWeak(currentWordValue, currentWordValue | isLockedBit)) {
+                // Success! We acquired the lock.
+                return;
+            }
+        }
+
+        // If there is no queue and we haven't spun too much, we can just try to spin around again.
+        if (!(currentWordValue & ~queueHeadMask) && spinCount < spinLimit) {
+            spinCount++;
+            std::this_thread::yield();
+            continue;
+        }
+
+        // Need to put ourselves on the queue. Create the queue if one does not exist. This requires
+        // owning the queue for a little bit. The lock that controls the queue is itself a spinlock.
+        // But before we acquire the queue spinlock, we make sure that we have a ThreadData for this
+        // thread.
+        ThreadData* me = myThreadData();
+        ASSERT(!me->shouldPark);
+        ASSERT(!me->nextInQueue);
+        ASSERT(!me->queueTail);
+
+        // Reload the current word value, since some time may have passed.
+        currentWordValue = m_word.load();
+
+        // We proceed only if the queue lock is not held, the Lock is held, and we succeed in
+        // acquiring the queue lock.
+        if ((currentWordValue & isQueueLockedBit)
+            || !(currentWordValue & isLockedBit)
+            || !m_word.compareExchangeWeak(currentWordValue, currentWordValue | isQueueLockedBit)) {
+            std::this_thread::yield();
+            continue;
+        }
+
+        me->shouldPark = true;
+
+        // We own the queue. Nobody can enqueue or dequeue until we're done. Also, it's not possible
+        // to release the Lock while we hold the queue lock.
+        ThreadData* queueHead = bitwise_cast<ThreadData*>(currentWordValue & ~queueHeadMask);
+        if (queueHead) {
+            // Put this thread at the end of the queue.
+            queueHead->queueTail->nextInQueue = me;
+            queueHead->queueTail = me;
+
+            // Release the queue lock.
+            currentWordValue = m_word.load();
+            ASSERT(currentWordValue & ~queueHeadMask);
+            ASSERT(currentWordValue & isQueueLockedBit);
+            ASSERT(currentWordValue & isLockedBit);
+            m_word.store(currentWordValue & ~isQueueLockedBit);
+        } else {
+            // Make this thread be the queue-head.
+            queueHead = me;
+            me->queueTail = me;
+
+            // Release the queue lock and install ourselves as the head. No need for a CAS loop, since
+            // we own the queue lock.
+            currentWordValue = m_word.load();
+            ASSERT(!(currentWordValue & ~queueHeadMask));
+            ASSERT(currentWordValue & isQueueLockedBit);
+            ASSERT(currentWordValue & isLockedBit);
+            uintptr_t newWordValue = currentWordValue;
+            newWordValue |= bitwise_cast<uintptr_t>(queueHead);
+            newWordValue &= ~isQueueLockedBit;
+            m_word.store(newWordValue);
+        }
+
+        // At this point everyone who acquires the queue lock will see me on the queue, and anyone who
+        // acquires me's lock will see that me wants to park. Note that shouldPark may have been
+        // cleared as soon as the queue lock was released above, but it will happen while the
+        // releasing thread holds me's parkingLock.
+
+        {
+            std::unique_lock<std::mutex> locker(me->parkingLock);
+            while (me->shouldPark)
+                me->parkingCondition.wait(locker);
+        }
+
+        ASSERT(!me->shouldPark);
+        ASSERT(!me->nextInQueue);
+        ASSERT(!me->queueTail);
+        
+        // Now we can loop around and try to acquire the lock again.
+    }
+}
+
+void LockBase::unlockSlow()
+{
+    // If the fast path failed, it can only be because someone put a thread on the queue or the queue
+    // lock is held. If the queue lock is held, it can only be because someone *will* enqueue a thread
+    // onto the queue.
+
+    // Acquire the queue lock.
+    for (;;) {
+        uintptr_t currentWordValue = m_word.load();
+
+        ASSERT(currentWordValue & isLockedBit);
+        
+        if (currentWordValue & isQueueLockedBit) {
+            std::this_thread::yield();
+            continue;
+        }
+
+        // If the queue lock is not held, then there must be an entry on the queue.
+        ASSERT(currentWordValue & ~queueHeadMask);
+
+        if (m_word.compareExchangeWeak(currentWordValue, currentWordValue | isQueueLockedBit))
+            break;
+    }
+
+    uintptr_t currentWordValue = m_word.load();
+        
+    // After we acquire the queue lock, the Lock must still be held and the queue must be
+    // non-empty. The queue must be non-empty since only the lockSlow() method could have held the
+    // queue lock and if it did then it only releases it after putting something on the queue.
+    ASSERT(currentWordValue & isLockedBit);
+    ASSERT(currentWordValue & isQueueLockedBit);
+    ThreadData* queueHead = bitwise_cast<ThreadData*>(currentWordValue & ~queueHeadMask);
+    ASSERT(queueHead);
+
+    ThreadData* newQueueHead = queueHead->nextInQueue;
+    // Either this was the only thread on the queue, in which case we delete the queue, or there
+    // are still more threads on the queue, in which case we create a new queue head.
+    if (newQueueHead)
+        newQueueHead->queueTail = queueHead->queueTail;
+
+    // Change the queue head, possibly removing it if newQueueHead is null. No need for a CAS loop,
+    // since we hold the queue lock and the lock itself so nothing about the lock can change right
+    // now.
+    currentWordValue = m_word.load();
+    ASSERT(currentWordValue & isLockedBit);
+    ASSERT(currentWordValue & isQueueLockedBit);
+    ASSERT((currentWordValue & ~queueHeadMask) == bitwise_cast<uintptr_t>(queueHead));
+    uintptr_t newWordValue = currentWordValue;
+    newWordValue &= ~isLockedBit; // Release the Lock.
+    newWordValue &= ~isQueueLockedBit; // Release the queue lock.
+    newWordValue &= queueHeadMask; // Clear out the old queue head.
+    newWordValue |= bitwise_cast<uintptr_t>(newQueueHead); // Install new queue head.
+    m_word.store(newWordValue);
+
+    // Now the lock is available for acquisition. We just have to wake up the old queue head.
+    // After that, we're done!
+
+    queueHead->nextInQueue = nullptr;
+    queueHead->queueTail = nullptr;
+
+    // We do this carefully because this may run either before or during the parkingLock critical
+    // section in lockSlow().
+    {
+        std::unique_lock<std::mutex> locker(queueHead->parkingLock);
+        queueHead->shouldPark = false;
+    }
+    // Doesn't matter if we notify_all() or notify_one() here since the only thread that could be
+    // waiting is queueHead.
+    queueHead->parkingCondition.notify_one();
+
+    // The old queue head can now contend for the lock again. We're done!
+}
+
+} // namespace WTF
+
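lockSlow() above implements a spin-then-park policy: a CAS fast path, a bounded number of yielding spin attempts, and only then blocking on a condition variable. A compressed standard-C++ sketch of that policy — a hypothetical AdaptiveLock that uses a waiter count and a timed wait in place of the intrusive per-thread queue and queue lock that the real implementation packs into one word:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Adaptive lock sketch: CAS fast path, bounded spinning, then blocking.
// Unlike WTF::Lock, waiters are tracked with a plain count and a shared
// condition variable rather than a queue of per-thread ThreadData nodes.
class AdaptiveLock {
public:
    void lock()
    {
        // Fast path: uncontended acquisition is a single CAS.
        bool expected = false;
        if (m_locked.compare_exchange_weak(expected, true, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        m_locked.store(false, std::memory_order_release);
        if (m_waiters.load(std::memory_order_seq_cst))
            m_condition.notify_one();
    }

private:
    void lockSlow()
    {
        // The same bound the patch cites from past JikesRVM experiments.
        static const unsigned spinLimit = 40;
        for (unsigned spinCount = 0; spinCount < spinLimit; ++spinCount) {
            bool expected = false;
            if (m_locked.compare_exchange_weak(expected, true, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }

        // Spinning failed; park until the holder releases. The timed wait
        // papers over a missed-wakeup race that the real implementation
        // rules out with its queue lock protocol.
        std::unique_lock<std::mutex> parkingLock(m_parkingMutex);
        m_waiters.fetch_add(1);
        for (;;) {
            bool expected = false;
            if (m_locked.compare_exchange_weak(expected, true, std::memory_order_acquire))
                break;
            m_condition.wait_for(parkingLock, std::chrono::milliseconds(1));
        }
        m_waiters.fetch_sub(1);
    }

    std::atomic<bool> m_locked { false };
    std::atomic<unsigned> m_waiters { 0 };
    std::mutex m_parkingMutex;
    std::condition_variable m_condition;
};
```

The sketch keeps the two properties the comments emphasize: the uncontended paths never touch a system mutex, and a thread only blocks after spinning fails, so kernel time is bounded even for long critical sections.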
diff --git a/Source/WTF/wtf/Lock.h b/Source/WTF/wtf/Lock.h
new file mode 100644 (file)
index 0000000..0c340a2
--- /dev/null
@@ -0,0 +1,168 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef WTF_Lock_h
+#define WTF_Lock_h
+
+#include <wtf/Atomics.h>
+#include <wtf/Compiler.h>
+#include <wtf/Locker.h>
+#include <wtf/Noncopyable.h>
+
+namespace WTF {
+
+// A Lock is a fully adaptive mutex that gives you the best of SpinLock and Mutex. For small critical
+// sections (that take nanoseconds), it will usually perform within 2x of a SpinLock in both the
+// contended and uncontended case. When using a Mutex, such critical sections take up to 100x longer
+// than Lock in the contended case, or 3x longer than Lock in the uncontended case. For longer
+// critical sections (that take tens of microseconds), it will perform as well as a Mutex and slightly
+// better than a SpinLock. But, crucially, a SpinLock will burn up to 90x more time in the kernel for
+// such critical sections than either Lock or Mutex. Hence, using Lock will make the common case of
+// locking perform close to SpinLock for any critical section that does more than a few nanoseconds of
+// work while being as kind to the scheduler for longer critical sections as a Mutex.
+//
+// Like SpinLock, Lock takes very little memory - just sizeof(void*), though see a detailed caveat
+// below.
+//
+// Generally, you should use Lock instead of SpinLock because while it penalizes you slightly, you
+// make up for it by not wasting CPU cycles in case of contention.
+//
+// The Lock has the following nice properties:
+//
+// - Uncontended fast paths for lock acquisition and lock release that are almost as fast as the
+//   uncontended fast paths of a spinlock. The only overhead is that the spinlock will not CAS on
+//   release, while Lock will CAS. This overhead *can* slow things down for extremely small critical
+//   sections that do little or nothing - it makes them 2x slower since in that case, CAS is the most
+//   expensive instruction and having two of them is twice as bad as just having one. However, this
+//   lock implementation is still almost 3x faster than a platform mutex in those cases. It's unlikely
+//   that you'll encounter no-op critical sections, so usually, this lock is better than a spinlock.
+//
+// - Contended fast path that attempts to spin and yield for some number of times. For critical
+//   sections that are held only briefly, this allows Lock to perform almost as well as a SpinLock.
+//   SpinLock can still be almost 2x faster than Lock if the critical section is a no-op, but this
+//   advantage diminishes as the critical section grows.
+//
+// - Contended slow path that enqueues the contending thread and causes it to wait on a condition
+//   variable until the lock is released. This is the only case in which system mutexes and condition
+//   variables are used. This case is rare and self-limiting: it will only happen when a lock is held
+//   for long enough that spinning some number of times doesn't acquire it. The fact that Lock does
+//   this as a fallback when spinning for some number of times fails means that it will burn
+//   dramatically fewer CPU cycles - for example with 10 threads on an 8 logical CPU machine acquiring
+//   a critical section that takes 50 microseconds, the WTF SpinLock will cause 90x more time to be
+//   spent in the kernel than Lock.
+//
+// - Very low memory usage. Each Lock requires only sizeof(void*) memory. When the contended slow
+//   path is activated, Lock only relies on each thread having a preallocated thread-specific data
+//   structure called ThreadData that, together with the Lock itself, is used to build up a thread
+//   queue. So, the total memory usage of all Locks is still bounded by:
+//
+//       numberOfLocks * sizeof(void*) + numberOfThreads * sizeof(ThreadData)
+//
+//   Where ThreadData is a decently large data structure, but we will only ever have one per thread,
+//   regardless of the number of Locks in memory. Another way to view this is that the worst case
+//   memory usage per Lock is:
+//
+//       sizeof(void*) + numberOfThreads / numberOfLocks * sizeof(ThreadData)
+//
+//   So, unless you have a small number of Locks (or, a large number of threads, which is far less
+//   likely), the memory usage per-Lock is still going to be somewhere around sizeof(void*).
+//
+// - Barging fast paths. The Lock is tuned for maximum throughput rather than maximum fairness. If
+//   a thread releases a Lock that was contended and had a queue of waiting threads, then it will
+//   wake up the head of the queue, but it will also mark the lock as being available. This means that
+//   some other thread that is just now attempting to acquire the lock may get it before the thread
+//   that got woken up. When a thread barges into the lock, the thread that got woken up will simply
+//   go back to the end of the queue. The barging behavior ends up being probabilistic on most
+//   platforms and even though it may be unfair to some thread at some moment in time, it will rarely
+//   have a long streak of unfairness towards any particular thread: eventually each thread waiting on
+//   the lock will get to have a turn so long as no thread just holds the lock forever. That said,
+//   there *is* a chance of pathologies - users of Lock should not depend on first-in, first-out lock
+//   acquisition order under contention. The same caveat is generally true of SpinLock and platform
+//   mutexes on some platforms.
+
+// This is a struct without a constructor or destructor so that it can be statically initialized.
+// Use Lock in instance variables.
+struct LockBase {
+    void lock()
+    {
+        if (LIKELY(m_word.compareExchangeWeak(0, isLockedBit, std::memory_order_acquire))) {
+            // Lock acquired!
+            return;
+        }
+
+        lockSlow();
+    }
+
+    void unlock()
+    {
+        if (LIKELY(m_word.compareExchangeWeak(isLockedBit, 0, std::memory_order_release))) {
+            // Lock released, and nobody was waiting!
+            return;
+        }
+
+        unlockSlow();
+    }
+
+    bool isHeld() const
+    {
+        return m_word.load(std::memory_order_acquire) & isLockedBit;
+    }
+
+    bool isLocked() const
+    {
+        return isHeld();
+    }
+
+protected:
+    static const uintptr_t isLockedBit = 1;
+    static const uintptr_t isQueueLockedBit = 2;
+    static const uintptr_t queueHeadMask = 3;
+
+    WTF_EXPORT_PRIVATE void lockSlow();
+    WTF_EXPORT_PRIVATE void unlockSlow();
+
+    Atomic<uintptr_t> m_word;
+};
+
+class Lock : public LockBase {
+    WTF_MAKE_NONCOPYABLE(Lock);
+public:
+    Lock()
+    {
+        m_word.store(0, std::memory_order_relaxed);
+    }
+};
+
+typedef LockBase StaticLock;
+typedef Locker<LockBase> LockHolder;
+
+} // namespace WTF
+
+using WTF::StaticLock;
+using WTF::Lock;
+using WTF::LockHolder;
+
+#endif // WTF_Lock_h
+
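LockBase above keeps the entire lock in a single uintptr_t: bit 0 (isLockedBit) and bit 1 (isQueueLockedBit) are flags, and the remaining bits hold the queue-head ThreadData pointer — which works because ThreadData is at least 4-byte aligned, so its low two bits are always zero. A small self-contained illustration of that packing (the constants mirror the header's; pack and the accessors are hypothetical helpers for illustration):

```cpp
#include <cassert>
#include <cstdint>

static const uintptr_t isLockedBit = 1;
static const uintptr_t isQueueLockedBit = 2;
static const uintptr_t queueHeadMask = 3; // Low bits that are NOT part of the pointer.

// Stand-in for the real ThreadData; alignment guarantees the low two bits
// of any ThreadData* are zero, leaving them free for the flags.
struct alignas(4) ThreadData { int dummy; };

// Pack a queue-head pointer together with the two flag bits.
uintptr_t pack(ThreadData* head, bool locked, bool queueLocked)
{
    uintptr_t word = reinterpret_cast<uintptr_t>(head);
    assert(!(word & queueHeadMask)); // Alignment keeps the flag bits clear.
    if (locked)
        word |= isLockedBit;
    if (queueLocked)
        word |= isQueueLockedBit;
    return word;
}

ThreadData* queueHead(uintptr_t word) { return reinterpret_cast<ThreadData*>(word & ~queueHeadMask); }
bool isLocked(uintptr_t word) { return word & isLockedBit; }
bool isQueueLocked(uintptr_t word) { return word & isQueueLockedBit; }
```

This is why the fast paths are single CAS operations on one word, and why lockSlow() and unlockSlow() mask with queueHeadMask before treating the word as a pointer.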
index 44085729b0a194d63725728e58c0b16992926ac4..6eb29e0b73d2da34a33b4b65e046726bd6655f4e 100644 (file)
@@ -60,7 +60,7 @@ void MetaAllocatorTracker::release(MetaAllocatorHandle* handle)
 
 ALWAYS_INLINE void MetaAllocator::release(MetaAllocatorHandle* handle)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     if (handle->sizeInBytes()) {
         decrementPageOccupancy(handle->start(), handle->sizeInBytes());
         addFreeSpaceFromReleasedHandle(handle->start(), handle->sizeInBytes());
@@ -91,7 +91,7 @@ void MetaAllocatorHandle::shrink(size_t newSizeInBytes)
 {
     ASSERT(newSizeInBytes <= m_sizeInBytes);
     
-    SpinLockHolder locker(&m_allocator->m_lock);
+    LockHolder locker(&m_allocator->m_lock);
 
     newSizeInBytes = m_allocator->roundUp(newSizeInBytes);
     
@@ -150,7 +150,7 @@ MetaAllocator::MetaAllocator(size_t allocationGranule, size_t pageSize)
 
 PassRefPtr<MetaAllocatorHandle> MetaAllocator::allocate(size_t sizeInBytes, void* ownerUID)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
 
     if (!sizeInBytes)
         return 0;
@@ -196,7 +196,7 @@ PassRefPtr<MetaAllocatorHandle> MetaAllocator::allocate(size_t sizeInBytes, void
 
 MetaAllocator::Statistics MetaAllocator::currentStatistics()
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     Statistics result;
     result.bytesAllocated = m_bytesAllocated;
     result.bytesReserved = m_bytesReserved;
@@ -281,7 +281,7 @@ void MetaAllocator::addFreeSpaceFromReleasedHandle(void* start, size_t sizeInByt
 
 void MetaAllocator::addFreshFreeSpace(void* start, size_t sizeInBytes)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     m_bytesReserved += sizeInBytes;
     addFreeSpace(start, sizeInBytes);
 }
@@ -289,7 +289,7 @@ void MetaAllocator::addFreshFreeSpace(void* start, size_t sizeInBytes)
 size_t MetaAllocator::debugFreeSpaceSize()
 {
 #ifndef NDEBUG
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     size_t result = 0;
     for (FreeSpaceNode* node = m_freeSpaceSizeMap.first(); node; node = node->successor())
         result += node->m_sizeInBytes;
index 2ce9180bd19947d44c6df40ed32fd0d2a0b0390a..2d753cf69982f3863a33cf55644b5f37f8070171 100644 (file)
 
 #include <wtf/Assertions.h>
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/MetaAllocatorHandle.h>
 #include <wtf/Noncopyable.h>
 #include <wtf/PageBlock.h>
 #include <wtf/RedBlackTree.h>
 #include <wtf/RefCounted.h>
 #include <wtf/RefPtr.h>
-#include <wtf/SpinLock.h>
 
 namespace WTF {
 
@@ -183,7 +183,7 @@ private:
     size_t m_bytesReserved;
     size_t m_bytesCommitted;
     
-    SpinLock m_lock;
+    Lock m_lock;
 
     MetaAllocatorTracker* m_tracker;
 
index 90944e6b4c317b9133af68fb7e2536da6bedce02..58604b012aa386bbeca2ce7341751fa932c402a9 100644 (file)
 
 namespace WTF {
 
+// SpinLock is a very simple lock implementation that has extremely fast lock/unlock for very small
+// uncontended critical sections. However, it will exhibit bad performance degradation when the lock
+// becomes contended: the thread trying to acquire the lock will simply waste CPU cycles.
+//
+// For most (all?) locking use cases, it's better to use Lock (see wtf/Lock.h). That uses only a bit
+// more memory (8 bytes instead of 4 on 64-bit), and is only a bit slower in the uncontended case
+// (Lock needs CAS to unlock, while SpinLock doesn't), but will burn a lot less CPU time - for 10
+// threads acquiring a 50 microsecond critical section, Lock will use up to 100x less kernel CPU time
+// than SpinLock.
+
 // SpinLockBase is a struct without an explicitly defined constructor so that
 // it can be initialized at compile time. See StaticSpinLock below.
 struct SpinLockBase {
index 106ed96eccb84d590452f798638e170735b56b28..92e20e47ffc81aea9bcb7d3a44bbda1e572708c0 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, 2009 Apple Inc. All rights reserved.
+ * Copyright (C) 2007, 2009, 2015 Apple Inc. All rights reserved.
  * Copyright (C) 2007 Justin Haygood (jhaygood@reaktix.com)
  * Copyright (C) 2011 Research In Motion Limited. All rights reserved.
  *
index ea4bd43ae1387e4431642d2a240a6679cc1b91ab..ab40087d87e366a605d072716ae0c7b27463c01d 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, 2008 Apple Inc. All rights reserved.
+ * Copyright (C) 2007, 2008, 2015 Apple Inc. All rights reserved.
  * Copyright (C) 2009 Google Inc. All rights reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. All rights reserved.
  *
index bb55b946d766a3f89009ab478033c6ce19c527f9..19c9a0ea96961df22e8f31c300ab97c904356878 100644 (file)
@@ -27,7 +27,7 @@
 #include "dtoa.h"
 
 #if USE(WEB_THREAD)
-#include "SpinLock.h"
+#include "Lock.h"
 #endif
 
 namespace WTF {
index af595a6c73fca693f4bfb9fd28fc35359a0b2b74..ff4cf231ad666d5356a030e9c04ce4f73a5717df 100644 (file)
@@ -33,7 +33,7 @@
 #include <wtf/unicode/UTF8.h>
 
 #if USE(WEB_THREAD)
-#include "SpinLock.h"
+#include "Lock.h"
 #endif
 
 namespace WTF {
@@ -42,18 +42,18 @@ using namespace Unicode;
 
 #if USE(WEB_THREAD)
 
-class AtomicStringTableLocker : public SpinLockHolder {
+class AtomicStringTableLocker : public LockHolder {
     WTF_MAKE_NONCOPYABLE(AtomicStringTableLocker);
 
-    static StaticSpinLock s_stringTableLock;
+    static StaticLock s_stringTableLock;
 public:
     AtomicStringTableLocker()
-        : SpinLockHolder(&s_stringTableLock)
+        : LockHolder(&s_stringTableLock)
     {
     }
 };
 
-StaticSpinLock AtomicStringTableLocker::s_stringTableLock;
+StaticLock AtomicStringTableLocker::s_stringTableLock;
 
 #else
 
index 9c775f858db318c1fb79e7e0d1a14e6c527baa1b..f1b9de0fef37e9edd7ac2fc062fe2ad009124cfb 100644 (file)
@@ -1,3 +1,21 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        No new tests because no new behavior.
+
+        * bindings/objc/WebScriptObject.mm:
+        (WebCore::getJSWrapper):
+        (WebCore::addJSWrapper):
+        (WebCore::removeJSWrapper):
+        (WebCore::removeJSWrapperIfRetainCountOne):
+        * platform/ios/wak/WAKWindow.mm:
+        (-[WAKWindow setExposedScrollViewRect:]):
+        (-[WAKWindow exposedScrollViewRect]):
+
 2015-08-06  Alex Christensen  <achristensen@webkit.org>
 
         [Win] Enable all Windows features in CMake
index e9e9e0d9e85cc470713883b75a19337a2b690cfd..57f3202a8c2d6014ee1e1ac8fcf7e5bd062a77c8 100644 (file)
@@ -48,7 +48,7 @@
 #import <runtime/JSLock.h>
 #import <runtime/Completion.h>
 #import <runtime/Completion.h>
-#import <wtf/SpinLock.h>
+#import <wtf/Lock.h>
 #import <wtf/Threading.h>
 #import <wtf/spi/cocoa/NSMapTableSPI.h>
 #import <wtf/text/WTFString.h>
@@ -72,12 +72,12 @@ using JSC::makeSource;
 namespace WebCore {
 
 static NSMapTable* JSWrapperCache;
-static StaticSpinLock spinLock;
+static StaticLock spinLock;
 
 NSObject* getJSWrapper(JSObject* impl)
 {
     ASSERT(isMainThread());
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return nil;
@@ -88,7 +88,7 @@ NSObject* getJSWrapper(JSObject* impl)
 void addJSWrapper(NSObject* wrapper, JSObject* impl)
 {
     ASSERT(isMainThread());
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         JSWrapperCache = createWrapperCache();
@@ -97,7 +97,7 @@ void addJSWrapper(NSObject* wrapper, JSObject* impl)
 
 void removeJSWrapper(JSObject* impl)
 {
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return;
@@ -106,7 +106,7 @@ void removeJSWrapper(JSObject* impl)
 
 static void removeJSWrapperIfRetainCountOne(NSObject* wrapper, JSObject* impl)
 {
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return;
index 05442adc45e977899315205c372e619601279d37..ec8d041fe7b2c3705d8ffb6cbbe8f1f5485d683b 100644 (file)
@@ -200,7 +200,7 @@ CARingBuffer::Error CARingBuffer::store(const AudioBufferList* list, size_t fram
 
 void CARingBuffer::setCurrentFrameBounds(uint64_t startTime, uint64_t endTime)
 {
-    ByteSpinLocker locker(m_currentFrameBoundsLock);
+    LockHolder locker(m_currentFrameBoundsLock);
     uint32_t nextPtr = m_timeBoundsQueuePtr + 1;
     uint32_t index = nextPtr & kGeneralRingTimeBoundsQueueMask;
 
@@ -212,7 +212,7 @@ void CARingBuffer::setCurrentFrameBounds(uint64_t startTime, uint64_t endTime)
 
 void CARingBuffer::getCurrentFrameBounds(uint64_t &startTime, uint64_t &endTime)
 {
-    ByteSpinLocker locker(m_currentFrameBoundsLock);
+    LockHolder locker(m_currentFrameBoundsLock);
     uint32_t curPtr = m_timeBoundsQueuePtr;
     uint32_t index = curPtr & kGeneralRingTimeBoundsQueueMask;
     CARingBuffer::TimeBounds& bounds = m_timeBoundsQueue[index];
index f4a00b88f9580e7feeca2529591ee25f292ef46d..b230b3536e4e3b1ff872bdf74ac5e7ad513e5b4c 100644 (file)
@@ -29,7 +29,7 @@
 #if ENABLE(WEB_AUDIO) && USE(MEDIATOOLBOX)
 
 #include <runtime/ArrayBuffer.h>
-#include <wtf/ByteSpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/Vector.h>
 
 typedef struct AudioBufferList AudioBufferList;
@@ -84,7 +84,7 @@ private:
     };
     
     Vector<TimeBounds> m_timeBoundsQueue;
-    ByteSpinLock m_currentFrameBoundsLock;
+    Lock m_currentFrameBoundsLock;
     int32_t m_timeBoundsQueuePtr;
 };
 
index 423d3ee52bd8559f275a4ed76f1024128fee24e7..e80ecfd5eb6bcb4a0467113aec48ce8bf0aaacca 100644 (file)
@@ -36,7 +36,7 @@
 #import "WKContentObservation.h"
 #import "WKViewPrivate.h"
 #import <QuartzCore/QuartzCore.h>
-#import <wtf/SpinLock.h>
+#import <wtf/Lock.h>
 
 WEBCORE_EXPORT NSString * const WAKWindowScreenScaleDidChangeNotification = @"WAKWindowScreenScaleDidChangeNotification";
 WEBCORE_EXPORT NSString * const WAKWindowVisibilityDidChangeNotification = @"WAKWindowVisibilityDidChangeNotification";
@@ -56,7 +56,7 @@ static WebEvent *currentEvent = nil;
 static id<OrientationProvider> gOrientationProvider;
 
 @implementation WAKWindow {
-    SpinLock _exposedScrollViewRectLock;
+    Lock _exposedScrollViewRectLock;
     CGRect _exposedScrollViewRect;
 }
 
@@ -358,14 +358,14 @@ static id<OrientationProvider> gOrientationProvider;
 
 - (void)setExposedScrollViewRect:(CGRect)exposedScrollViewRect
 {
-    SpinLockHolder locker(&_exposedScrollViewRectLock);
+    LockHolder locker(&_exposedScrollViewRectLock);
     _exposedScrollViewRect = exposedScrollViewRect;
 }
 
 - (CGRect)exposedScrollViewRect
 {
     {
-        SpinLockHolder locker(&_exposedScrollViewRectLock);
+        LockHolder locker(&_exposedScrollViewRectLock);
         if (!CGRectIsNull(_exposedScrollViewRect))
             return _exposedScrollViewRect;
     }
index cf6dfa98807ea1afb3b8d94e5372d96360148e9e..43870260afc66af34ed9e193f07d23b36ab7c3f0 100644 (file)
@@ -1,3 +1,21 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * WebProcess/WebPage/EventDispatcher.cpp:
+        (WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
+        (WebKit::EventDispatcher::getQueuedTouchEventsForPage):
+        (WebKit::EventDispatcher::touchEvent):
+        (WebKit::EventDispatcher::dispatchTouchEvents):
+        * WebProcess/WebPage/EventDispatcher.h:
+        * WebProcess/WebPage/ViewUpdateDispatcher.cpp:
+        (WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
+        (WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
+        * WebProcess/WebPage/ViewUpdateDispatcher.h:
+
 2015-08-06  Matt Rajca  <mrajca@apple.com>
 
         Media Session: notify focus manager clients when the playing state of the focused media element changes
index 5f3f7c7bb4d6ef8e67f9fad11ad464c251857919..ae3e757fd719547bc2e428c997e144b6df88626f 100644 (file)
@@ -172,13 +172,13 @@ void EventDispatcher::wheelEvent(uint64_t pageID, const WebWheelEvent& wheelEven
 #if ENABLE(IOS_TOUCH_EVENTS)
 void EventDispatcher::clearQueuedTouchEventsForPage(const WebPage& webPage)
 {
-    SpinLockHolder locker(&m_touchEventsLock);
+    LockHolder locker(&m_touchEventsLock);
     m_touchEvents.remove(webPage.pageID());
 }
 
 void EventDispatcher::getQueuedTouchEventsForPage(const WebPage& webPage, TouchEventQueue& destinationQueue)
 {
-    SpinLockHolder locker(&m_touchEventsLock);
+    LockHolder locker(&m_touchEventsLock);
     destinationQueue = m_touchEvents.take(webPage.pageID());
 }
 
@@ -186,7 +186,7 @@ void EventDispatcher::touchEvent(uint64_t pageID, const WebKit::WebTouchEvent& t
 {
     bool updateListWasEmpty;
     {
-        SpinLockHolder locker(&m_touchEventsLock);
+        LockHolder locker(&m_touchEventsLock);
         updateListWasEmpty = m_touchEvents.isEmpty();
         auto addResult = m_touchEvents.add(pageID, TouchEventQueue());
         if (addResult.isNewEntry)
@@ -217,7 +217,7 @@ void EventDispatcher::dispatchTouchEvents()
 {
     HashMap<uint64_t, TouchEventQueue> localCopy;
     {
-        SpinLockHolder locker(&m_touchEventsLock);
+        LockHolder locker(&m_touchEventsLock);
         localCopy.swap(m_touchEvents);
     }
 
index 6f7431113940f6c3682e6274013620416f122490..ba36239ea1d9258ac74265da13ccbbd71971360b 100644 (file)
@@ -32,9 +32,9 @@
 #include <WebEvent.h>
 #include <memory>
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/Noncopyable.h>
 #include <wtf/RefPtr.h>
-#include <wtf/SpinLock.h>
 #include <wtf/ThreadingPrimitives.h>
 
 namespace WebCore {
@@ -97,7 +97,7 @@ private:
 #endif
     std::unique_ptr<WebCore::WheelEventDeltaTracker> m_recentWheelEventDeltaTracker;
 #if ENABLE(IOS_TOUCH_EVENTS)
-    SpinLock m_touchEventsLock;
+    Lock m_touchEventsLock;
     HashMap<uint64_t, TouchEventQueue> m_touchEvents;
 #endif
 };
index 57f4a76077b8172024e407787334e17447461f6f..d9716eb5019aa7a7ec5c379b55503d3c0062583b 100644 (file)
@@ -58,7 +58,7 @@ void ViewUpdateDispatcher::visibleContentRectUpdate(uint64_t pageID, const Visib
 {
     bool updateListWasEmpty;
     {
-        SpinLockHolder locker(&m_dataMutex);
+        LockHolder locker(&m_dataMutex);
         updateListWasEmpty = m_latestUpdate.isEmpty();
         auto iterator = m_latestUpdate.find(pageID);
         if (iterator == m_latestUpdate.end())
@@ -78,7 +78,7 @@ void ViewUpdateDispatcher::dispatchVisibleContentRectUpdate()
 {
     HashMap<uint64_t, UpdateData> update;
     {
-        SpinLockHolder locker(&m_dataMutex);
+        LockHolder locker(&m_dataMutex);
         update = WTF::move(m_latestUpdate);
     }
 
index 681704d440540777b6042ea1e0d537df75ff15c3..a4d02e76f81847313c35289a8eb684ce22d43c5e 100644 (file)
@@ -30,8 +30,8 @@
 
 #include "VisibleContentRectUpdateInfo.h"
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/Ref.h>
-#include <wtf/SpinLock.h>
 
 namespace WebKit {
 
@@ -57,7 +57,7 @@ private:
     };
 
     Ref<WorkQueue> m_queue;
-    SpinLock m_dataMutex;
+    Lock m_dataMutex;
     HashMap<uint64_t, UpdateData> m_latestUpdate;
 };
 
index ddb9b295f29bbff3c41eb541725b15b17a3be6d3..2280a6f92b766b53aad4b1014fddad8603d22ff3 100644 (file)
@@ -1,3 +1,17 @@
+2015-08-06  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * TestWebKitAPI/CMakeLists.txt:
+        * TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
+        * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
+        * TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
+        (TestWebKitAPI::runLockTest):
+        (TestWebKitAPI::TEST):
+
 2015-08-06  Ryosuke Niwa  <rniwa@webkit.org>
 
         http_server_driver and benchmark_builder should not be in run-benchmark's plan files
index 5e270167f6564e66f08ffbb92f0e0b544b2c3d6e..dc2513d0b5c3c39b4b00522f60cecf7f549e550b 100644 (file)
@@ -75,6 +75,7 @@ set(TestWTF_SOURCES
     ${TESTWEBKITAPI_DIR}/Tests/WTF/HashSet.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/IntegerToStringConversion.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/ListHashSet.cpp
+    ${TESTWEBKITAPI_DIR}/Tests/WTF/Lock.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MD5.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MathExtras.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MediaTime.cpp
index f40ef9c4a83b6bda1f3658ae260949adfb030e21..62d0ce6908a19c93ad47d43f117013fb66847254 100644 (file)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="utf-8"?>
 <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
   <ItemGroup Label="ProjectConfigurations">
     <ProjectConfiguration Include="DebugSuffix|Win32">
     <ClCompile Include="..\Tests\WTF\HashSet.cpp" />
     <ClCompile Include="..\Tests\WTF\IntegerToStringConversion.cpp" />
     <ClCompile Include="..\Tests\WTF\ListHashSet.cpp" />
+    <ClCompile Include="..\Tests\WTF\Lock.cpp" />
     <ClCompile Include="..\Tests\WTF\MD5.cpp" />
     <ClCompile Include="..\Tests\WTF\MathExtras.cpp" />
     <ClCompile Include="..\Tests\WTF\MediaTime.cpp" />
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
   <ImportGroup Label="ExtensionTargets">
   </ImportGroup>
-</Project>
\ No newline at end of file
+</Project>
index 6b75a1d532bdf781a04cff94455bf14590a803fb..9089587d967980da418a5f9f5312c6c44bf225be 100644 (file)
@@ -11,6 +11,7 @@
                0F139E781A423A6B00F590F5 /* PlatformUtilitiesCocoa.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F139E721A423A2B00F590F5 /* PlatformUtilitiesCocoa.mm */; };
                0F139E791A42457000F590F5 /* PlatformUtilitiesCocoa.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F139E721A423A2B00F590F5 /* PlatformUtilitiesCocoa.mm */; };
                0F3B94A71A77267400DE3272 /* WKWebViewEvaluateJavaScript.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F3B94A51A77266C00DE3272 /* WKWebViewEvaluateJavaScript.mm */; };
+               0FFC45A61B73EBEB0085BD62 /* Lock.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FFC45A41B73EBE20085BD62 /* Lock.cpp */; };
                1A02C870125D4CFD00E3F4BD /* find.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 1A02C84B125D4A5E00E3F4BD /* find.html */; };
                1A50AA201A2A51FC00F4C345 /* close-from-within-create-page.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 1A50AA1F1A2A4EA500F4C345 /* close-from-within-create-page.html */; };
                1A63479F183D72A4005B1707 /* all-content-in-one-iframe.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 93D3D19B17B1A7B000C7C415 /* all-content-in-one-iframe.html */; };
                0F3B94A51A77266C00DE3272 /* WKWebViewEvaluateJavaScript.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = WKWebViewEvaluateJavaScript.mm; sourceTree = "<group>"; };
                0FC6C4CB141027E0005B7F0C /* RedBlackTree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RedBlackTree.cpp; sourceTree = "<group>"; };
                0FC6C4CE141034AD005B7F0C /* MetaAllocator.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = MetaAllocator.cpp; sourceTree = "<group>"; };
+               0FFC45A41B73EBE20085BD62 /* Lock.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Lock.cpp; sourceTree = "<group>"; };
                14464012167A8305000BD218 /* LayoutUnit.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = LayoutUnit.cpp; sourceTree = "<group>"; };
                14F3B11215E45EAB00210069 /* SaturatedArithmeticOperations.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SaturatedArithmeticOperations.cpp; sourceTree = "<group>"; };
                1A02C84B125D4A5E00E3F4BD /* find.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = find.html; sourceTree = "<group>"; };
                                26B2DFF815BDE599004F691D /* HashSet.cpp */,
                                266FAFD215E5775200F61D5B /* IntegerToStringConversion.cpp */,
                                26300B1716755CD90066886D /* ListHashSet.cpp */,
+                               0FFC45A41B73EBE20085BD62 /* Lock.cpp */,
                                B4039F9C15E6D8B3007255D6 /* MathExtras.cpp */,
                                CD5393C71757BA9700C07123 /* MD5.cpp */,
                                CD5497B315857F0C00B5BC30 /* MediaTime.cpp */,
                                7CCE7EC91A411A7E00447C4C /* RenderedImageFromDOMNode.mm in Sources */,
                                7CCE7ECA1A411A7E00447C4C /* RenderedImageFromDOMRange.mm in Sources */,
                                51CD1C6C1B38CE4300142CA5 /* ModalAlerts.mm in Sources */,
+                               0FFC45A61B73EBEB0085BD62 /* Lock.cpp in Sources */,
                                7CCE7F0E1A411AE600447C4C /* ResizeReversePaginatedWebView.cpp in Sources */,
                                7CCE7F0F1A411AE600447C4C /* ResizeWindowAfterCrash.cpp in Sources */,
                                7CCE7F101A411AE600447C4C /* ResponsivenessTimerDoesntFireEarly.cpp in Sources */,
diff --git a/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp b/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp
new file mode 100644 (file)
index 0000000..2c5168e
--- /dev/null
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include <wtf/Lock.h>
+#include <wtf/Threading.h>
+#include <wtf/ThreadingPrimitives.h>
+
+using namespace WTF;
+
+namespace TestWebKitAPI {
+
+template<typename LockType>
+void runLockTest(unsigned numThreadGroups, unsigned numThreadsPerGroup, unsigned workPerCriticalSection, unsigned numIterations)
+{
+    std::unique_ptr<LockType[]> locks = std::make_unique<LockType[]>(numThreadGroups);
+    std::unique_ptr<double[]> words = std::make_unique<double[]>(numThreadGroups);
+    std::unique_ptr<ThreadIdentifier[]> threads = std::make_unique<ThreadIdentifier[]>(numThreadGroups * numThreadsPerGroup);
+
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;) {
+        words[threadGroupIndex] = 0;
+
+        for (unsigned threadIndex = numThreadsPerGroup; threadIndex--;) {
+            threads[threadGroupIndex * numThreadsPerGroup + threadIndex] = createThread(
+                "Benchmark thread",
+                [threadGroupIndex, &locks, &words, numIterations, workPerCriticalSection] () {
+                    for (unsigned i = numIterations; i--;) {
+                        locks[threadGroupIndex].lock();
+                        for (unsigned j = workPerCriticalSection; j--;)
+                            words[threadGroupIndex]++;
+                        locks[threadGroupIndex].unlock();
+                    }
+                });
+        }
+    }
+
+    for (unsigned threadIndex = numThreadGroups * numThreadsPerGroup; threadIndex--;)
+        waitForThreadCompletion(threads[threadIndex]);
+
+    double expected = 0;
+    for (uint64_t i = static_cast<uint64_t>(numIterations) * workPerCriticalSection * numThreadsPerGroup; i--;)
+        expected++;
+
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;)
+        EXPECT_EQ(expected, words[threadGroupIndex]);
+}
+
+TEST(WTF_Lock, UncontendedShortSection)
+{
+    runLockTest<Lock>(1, 1, 1, 10000000);
+}
+
+TEST(WTF_Lock, UncontendedLongSection)
+{
+    runLockTest<Lock>(1, 1, 10000, 1000);
+}
+
+TEST(WTF_Lock, ContendedShortSection)
+{
+    runLockTest<Lock>(1, 10, 1, 10000000);
+}
+
+TEST(WTF_Lock, ContendedLongSection)
+{
+    runLockTest<Lock>(1, 10, 10000, 10000);
+}
+
+TEST(WTF_Lock, ManyContendedShortSections)
+{
+    runLockTest<Lock>(10, 10, 1, 500000);
+}
+
+TEST(WTF_Lock, ManyContendedLongSections)
+{
+    runLockTest<Lock>(10, 10, 10000, 2000);
+}
+
+} // namespace TestWebKitAPI