Lightweight locks should be adaptive
author fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Fri, 7 Aug 2015 00:49:54 +0000 (00:49 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Fri, 7 Aug 2015 00:49:54 +0000 (00:49 +0000)
https://bugs.webkit.org/show_bug.cgi?id=147545

Reviewed by Geoffrey Garen.

Source/JavaScriptCore:

* heap/CopiedBlock.h:
(JSC::CopiedBlock::workListLock):
* heap/CopiedBlockInlines.h:
(JSC::CopiedBlock::shouldReportLiveBytes):
(JSC::CopiedBlock::reportLiveBytes):
* heap/CopiedSpace.h:
(JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::recycleEvacuatedBlock):
* heap/GCThreadSharedData.h:
(JSC::GCThreadSharedData::getNextBlocksToCopy):
* heap/ListableHandler.h:
(JSC::ListableHandler::List::addThreadSafe):
(JSC::ListableHandler::List::addNotThreadSafe):
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::copyLater):
* runtime/TypeProfilerLog.h:

Source/WebCore:

No new tests because no new behavior.

* bindings/objc/WebScriptObject.mm:
(WebCore::getJSWrapper):
(WebCore::addJSWrapper):
(WebCore::removeJSWrapper):
(WebCore::removeJSWrapperIfRetainCountOne):
* platform/ios/wak/WAKWindow.mm:
(-[WAKWindow setExposedScrollViewRect:]):
(-[WAKWindow exposedScrollViewRect]):

Source/WebKit2:

* WebProcess/WebPage/EventDispatcher.cpp:
(WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
(WebKit::EventDispatcher::getQueuedTouchEventsForPage):
(WebKit::EventDispatcher::touchEvent):
(WebKit::EventDispatcher::dispatchTouchEvents):
* WebProcess/WebPage/EventDispatcher.h:
* WebProcess/WebPage/ViewUpdateDispatcher.cpp:
(WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
(WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
* WebProcess/WebPage/ViewUpdateDispatcher.h:

Source/WTF:

A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
overhead is lower than that of system locks and because they take dramatically less space
than system locks. The speed and space advantages of spinlocks can be astonishing: an
uncontended spinlock acquire is up to 10x faster, and under microcontention - short critical
sections with two or more threads taking turns - spinlocks are up to 100x faster. Spinlocks
take only 1 byte or 4 bytes depending on the flavor, while system locks take 64 bytes or
more. Clearly, WebKit should continue to avoid system locks - they are just far too slow and
far too big.

But there is a problem with this idiom. System lock implementations put a thread to sleep
when it attempts to acquire a lock that is held, while spinlocks cause the thread to burn CPU.
In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
microcontention, but awful when the lock will not be released for a while. In fact, when
critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
frequently in our current uses of spinlocks, but that's probably because right now there are
places where we make a conscious decision to use system locks - even though they use more
memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
while to acquire the lock.
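The yield-loop idiom can be sketched as follows. This is a simplified illustration of the
pattern, not the actual wtf/SpinLock.h code:

```cpp
#include <atomic>
#include <cassert>
#include <sched.h>
#include <thread>

// Sketch of a yield-based spinlock in the spirit of the WebKit idiom
// described above (illustrative only; not the real wtf/SpinLock.h).
class YieldingSpinLock {
public:
    void lock()
    {
        // Retry the atomic exchange until we observe the lock unheld.
        // Each failed attempt yields the CPU, which is cheap under
        // microcontention but burns timeslices if the holder is slow.
        while (m_isLocked.exchange(true, std::memory_order_acquire))
            sched_yield();
    }

    void unlock()
    {
        m_isLocked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_isLocked { false };
};

// Two threads take turns bumping a shared counter under the lock;
// returns the final count so the result can be checked.
inline int runSpinLockTest(int iterationsPerThread)
{
    YieldingSpinLock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < iterationsPerThread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

The sched_yield() call is exactly what makes long waits expensive: every failed pass through
the loop consumes scheduler time instead of letting the waiter sleep until the lock is free.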

The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
(not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
queue for critical sections that are held for a long time.
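The three levels of adaptation can be sketched roughly as below. This is a deliberately
simplified model: where WTF::Lock parks waiters on a queue of statically allocated thread
nodes reachable from the tagged lock word, this sketch falls back to a condition variable
for the sleeping tier, and the spin limit is an arbitrary placeholder:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Simplified sketch of the three-level adaptation described above:
// (1) an inlineable CAS fast path, (2) a short spin burst for
// microcontention, (3) sleeping for long waits. Not the real WTF::Lock.
class AdaptiveLock {
public:
    void lock()
    {
        // Level 1: fast path. One atomic exchange acquires an uncontended lock.
        if (!m_isLocked.exchange(true, std::memory_order_acquire))
            return;
        lockSlow();
    }

    void unlock()
    {
        m_isLocked.store(false, std::memory_order_release);
        if (m_numSleepers.load(std::memory_order_seq_cst))
            m_condition.notify_one();
    }

private:
    void lockSlow()
    {
        // Level 2: spin a bounded number of times, yielding between tries.
        static const unsigned spinLimit = 40; // placeholder tuning constant
        for (unsigned i = 0; i < spinLimit; ++i) {
            if (!m_isLocked.exchange(true, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }
        // Level 3: go to sleep until the holder signals us. The short
        // timeout papers over the notify/wait race this simple sketch has.
        std::unique_lock<std::mutex> queueLock(m_queueMutex);
        ++m_numSleepers;
        while (m_isLocked.exchange(true, std::memory_order_acquire))
            m_condition.wait_for(queueLock, std::chrono::milliseconds(1));
        --m_numSleepers;
    }

    std::atomic<bool> m_isLocked { false };
    std::atomic<unsigned> m_numSleepers { 0 };
    std::mutex m_queueMutex;
    std::condition_variable m_condition;
};

// Hammers the lock from several threads; returns the final counter value.
inline int runAdaptiveLockTest(int threadCount, int iterationsPerThread)
{
    AdaptiveLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterationsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& thread : threads)
        thread.join();
    return counter;
}
```

The design point is that the slow path is only entered once, at the transition between
tiers; an uncontended acquire never pays for the machinery that long waits need.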

On a locking microbenchmark, this new Lock exhibits the following performance
characteristics:

- Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
  than a system mutex.

- Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
  than a system mutex.

- CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
  SpinLock.

- Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
  a SpinLock.
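Numbers of this kind can be gathered by timing many lock+unlock pairs around an empty
critical section. A minimal sketch of the uncontended case, usable with any lock type that
has lock()/unlock() (the full tool is benchmarks/LockSpeedTest.cpp):

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Measures the average cost of an uncontended lock+unlock pair over a
// no-op critical section, in nanoseconds (a simplified sketch of the
// microbenchmark idea, not benchmarks/LockSpeedTest.cpp itself).
template<typename LockType>
double measureUncontendedNanoseconds(unsigned iterations)
{
    LockType lock;
    auto before = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i) {
        lock.lock();
        lock.unlock();
    }
    auto after = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(after - before).count() / iterations;
}
```

For example, measureUncontendedNanoseconds<std::mutex>(1000000) gives the system-mutex
baseline that the figures above compare against.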

This patch replaces all uses of SpinLock with Lock. Our critical sections are not no-ops,
so if you do basically anything in your critical section, the Lock overhead will be
invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which
is as it should be: the purpose of this change is to reduce wasted CPU time, not wallclock
time. This patch doesn't replace any uses of ByteSpinLock, since we expect that the space
benefit of a lock that uses just one byte still outweighs the CPU-wastage benefit of Lock.
But this work will enable some future work to create locks that will fit in just 1.6
bits: https://bugs.webkit.org/show_bug.cgi?id=147665.

[1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf

* WTF.vcxproj/WTF.vcxproj:
* WTF.xcodeproj/project.pbxproj:
* benchmarks: Added.
* benchmarks/LockSpeedTest.cpp: Added.
(main):
* wtf/CMakeLists.txt:
* wtf/Lock.cpp: Added.
(WTF::LockBase::lockSlow):
(WTF::LockBase::unlockSlow):
* wtf/Lock.h: Added.
(WTF::LockBase::lock):
(WTF::LockBase::unlock):
(WTF::LockBase::isHeld):
(WTF::Lock::Lock):
* wtf/MetaAllocator.cpp:
(WTF::MetaAllocator::release):
(WTF::MetaAllocatorHandle::shrink):
(WTF::MetaAllocator::allocate):
(WTF::MetaAllocator::currentStatistics):
(WTF::MetaAllocator::addFreshFreeSpace):
(WTF::MetaAllocator::debugFreeSpaceSize):
* wtf/MetaAllocator.h:
* wtf/SpinLock.h:
* wtf/ThreadingPthreads.cpp:
* wtf/ThreadingWin.cpp:
* wtf/text/AtomicString.cpp:
* wtf/text/AtomicStringImpl.cpp:
(WTF::AtomicStringTableLocker::AtomicStringTableLocker):

Tools:

* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@188100 268f45cc-cd09-0410-ab3c-d52691b4dbfc

44 files changed:
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/dfg/DFGCommon.cpp
Source/JavaScriptCore/heap/CopiedBlock.h
Source/JavaScriptCore/heap/CopiedBlockInlines.h
Source/JavaScriptCore/heap/CopiedSpace.cpp
Source/JavaScriptCore/heap/CopiedSpace.h
Source/JavaScriptCore/heap/CopiedSpaceInlines.h
Source/JavaScriptCore/heap/GCThreadSharedData.cpp
Source/JavaScriptCore/heap/GCThreadSharedData.h
Source/JavaScriptCore/heap/ListableHandler.h
Source/JavaScriptCore/heap/MachineStackMarker.cpp
Source/JavaScriptCore/heap/SlotVisitorInlines.h
Source/JavaScriptCore/parser/SourceProvider.cpp
Source/JavaScriptCore/profiler/ProfilerDatabase.cpp
Source/JavaScriptCore/runtime/TypeProfilerLog.h
Source/WTF/ChangeLog
Source/WTF/WTF.vcxproj/WTF.vcxproj
Source/WTF/WTF.xcodeproj/project.pbxproj
Source/WTF/benchmarks/LockSpeedTest.cpp [new file with mode: 0644]
Source/WTF/wtf/CMakeLists.txt
Source/WTF/wtf/Lock.cpp [new file with mode: 0644]
Source/WTF/wtf/Lock.h [new file with mode: 0644]
Source/WTF/wtf/MetaAllocator.cpp
Source/WTF/wtf/MetaAllocator.h
Source/WTF/wtf/SpinLock.h
Source/WTF/wtf/ThreadingPthreads.cpp
Source/WTF/wtf/ThreadingWin.cpp
Source/WTF/wtf/text/AtomicString.cpp
Source/WTF/wtf/text/AtomicStringImpl.cpp
Source/WebCore/ChangeLog
Source/WebCore/bindings/objc/WebScriptObject.mm
Source/WebCore/platform/audio/mac/CARingBuffer.cpp
Source/WebCore/platform/audio/mac/CARingBuffer.h
Source/WebCore/platform/ios/wak/WAKWindow.mm
Source/WebKit2/ChangeLog
Source/WebKit2/WebProcess/WebPage/EventDispatcher.cpp
Source/WebKit2/WebProcess/WebPage/EventDispatcher.h
Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.cpp
Source/WebKit2/WebProcess/WebPage/ViewUpdateDispatcher.h
Tools/ChangeLog
Tools/TestWebKitAPI/CMakeLists.txt
Tools/TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj
Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj
Tools/TestWebKitAPI/Tests/WTF/Lock.cpp [new file with mode: 0644]

index d9600cc..99548ef 100644 (file)
@@ -1,3 +1,28 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * heap/CopiedBlock.h:
+        (JSC::CopiedBlock::workListLock):
+        * heap/CopiedBlockInlines.h:
+        (JSC::CopiedBlock::shouldReportLiveBytes):
+        (JSC::CopiedBlock::reportLiveBytes):
+        * heap/CopiedSpace.h:
+        (JSC::CopiedSpace::CopiedGeneration::CopiedGeneration):
+        * heap/CopiedSpaceInlines.h:
+        (JSC::CopiedSpace::recycleEvacuatedBlock):
+        * heap/GCThreadSharedData.h:
+        (JSC::GCThreadSharedData::getNextBlocksToCopy):
+        * heap/ListableHandler.h:
+        (JSC::ListableHandler::List::addThreadSafe):
+        (JSC::ListableHandler::List::addNotThreadSafe):
+        * heap/SlotVisitorInlines.h:
+        (JSC::SlotVisitor::copyLater):
+        * runtime/TypeProfilerLog.h:
+
 2015-08-06  Sukolsak Sakshuwong  <sukolsak@gmail.com>
 
         Parse the entire WebAssembly modules
index 96ac252..cd2a12c 100644 (file)
@@ -34,7 +34,7 @@
 
 namespace JSC { namespace DFG {
 
-static StaticSpinLock crashLock;
+static StaticLock crashLock;
 
 void startCrashing()
 {
index f636529..3be585c 100644 (file)
@@ -31,7 +31,7 @@
 #include "Options.h"
 #include <wtf/Atomics.h>
 #include <wtf/DoublyLinkedList.h>
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 
 namespace JSC {
 
@@ -54,8 +54,8 @@ public:
     void didPromote();
 
     unsigned liveBytes();
-    bool shouldReportLiveBytes(SpinLockHolder&, JSCell* owner);
-    void reportLiveBytes(SpinLockHolder&, JSCell*, CopyToken, unsigned);
+    bool shouldReportLiveBytes(LockHolder&, JSCell* owner);
+    void reportLiveBytes(LockHolder&, JSCell*, CopyToken, unsigned);
     void reportLiveBytesDuringCopying(unsigned);
     void didSurviveGC();
     void didEvacuateBytes(unsigned);
@@ -85,7 +85,7 @@ public:
 
     bool hasWorkList();
     CopyWorkList& workList();
-    SpinLock& workListLock() { return m_workListLock; }
+    Lock& workListLock() { return m_workListLock; }
 
 private:
     CopiedBlock(size_t);
@@ -98,7 +98,7 @@ private:
 
     size_t m_capacity;
 
-    SpinLock m_workListLock;
+    Lock m_workListLock;
     std::unique_ptr<CopyWorkList> m_workList;
 
     size_t m_remaining;
index 2db0cd1..3d11334 100644 (file)
@@ -33,7 +33,7 @@
 
 namespace JSC {
     
-inline bool CopiedBlock::shouldReportLiveBytes(SpinLockHolder&, JSCell* owner)
+inline bool CopiedBlock::shouldReportLiveBytes(LockHolder&, JSCell* owner)
 {
     // We want to add to live bytes if the owner isn't part of the remembered set or
     // if this block was allocated during the last cycle. 
@@ -43,7 +43,7 @@ inline bool CopiedBlock::shouldReportLiveBytes(SpinLockHolder&, JSCell* owner)
     return !Heap::isRemembered(owner) || !m_isOld;
 }
 
-inline void CopiedBlock::reportLiveBytes(SpinLockHolder&, JSCell* owner, CopyToken token, unsigned bytes)
+inline void CopiedBlock::reportLiveBytes(LockHolder&, JSCell* owner, CopyToken token, unsigned bytes)
 {
     checkConsistency();
 #ifndef NDEBUG
index 9592eaa..3f265e7 100644 (file)
@@ -191,7 +191,7 @@ void CopiedSpace::doneFillingBlock(CopiedBlock* block, CopiedBlock** exchange)
 
     {
         // Always put the block into the old gen because it's being promoted!
-        SpinLockHolder locker(&m_toSpaceLock);
+        LockHolder locker(&m_toSpaceLock);
         m_oldGen.toSpace->push(block);
         m_blockSet.add(block);
         m_oldGen.blockFilter.add(reinterpret_cast<Bits>(block));
index db1be18..d18d1fc 100644 (file)
@@ -33,9 +33,9 @@
 #include <wtf/CheckedBoolean.h>
 #include <wtf/DoublyLinkedList.h>
 #include <wtf/HashSet.h>
+#include <wtf/Lock.h>
 #include <wtf/OSAllocator.h>
 #include <wtf/PageBlock.h>
-#include <wtf/SpinLock.h>
 #include <wtf/StdLibExtras.h>
 #include <wtf/ThreadingPrimitives.h>
 
@@ -113,7 +113,7 @@ private:
 
     HashSet<CopiedBlock*> m_blockSet;
 
-    SpinLock m_toSpaceLock;
+    Lock m_toSpaceLock;
 
     struct CopiedGeneration {
         CopiedGeneration()
index 3faf342..8f4058b 100644 (file)
@@ -98,7 +98,7 @@ inline void CopiedSpace::recycleEvacuatedBlock(CopiedBlock* block, HeapOperation
     ASSERT(block->canBeRecycled());
     ASSERT(!block->m_isPinned);
     {
-        SpinLockHolder locker(&m_toSpaceLock);
+        LockHolder locker(&m_toSpaceLock);
         m_blockSet.remove(block);
         if (collectionType == EdenCollection)
             m_newGen.fromSpace->remove(block);
index 2286370..824b1ac 100644 (file)
@@ -178,7 +178,7 @@ void GCThreadSharedData::didFinishMarking()
 void GCThreadSharedData::didStartCopying()
 {
     {
-        SpinLockHolder locker(&m_copyLock);
+        LockHolder locker(&m_copyLock);
         if (m_vm->heap.operationInProgress() == EdenCollection) {
             // Reset the vector to be empty, but don't throw away the backing store.
             m_blocksToCopy.shrink(0);
index d6c960a..0c5ad1d 100644 (file)
@@ -33,7 +33,7 @@
 #include "WeakReferenceHarvester.h"
 #include <condition_variable>
 #include <wtf/HashSet.h>
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/Vector.h>
 
 namespace JSC {
@@ -97,7 +97,7 @@ private:
     std::mutex m_opaqueRootsMutex;
     HashSet<void*> m_opaqueRoots;
 
-    SpinLock m_copyLock;
+    Lock m_copyLock;
     Vector<CopiedBlock*> m_blocksToCopy;
     size_t m_copyIndex;
     static const size_t s_blockFragmentLength = 32;
@@ -115,7 +115,7 @@ private:
 
 inline void GCThreadSharedData::getNextBlocksToCopy(size_t& start, size_t& end)
 {
-    SpinLockHolder locker(&m_copyLock);
+    LockHolder locker(&m_copyLock);
     start = m_copyIndex;
     end = std::min(m_blocksToCopy.size(), m_copyIndex + s_blockFragmentLength);
     m_copyIndex = end;
index 560a356..d653f05 100644 (file)
@@ -21,9 +21,9 @@
 #define ListableHandler_h
 
 #include <stdint.h>
+#include <wtf/Lock.h>
 #include <wtf/Locker.h>
 #include <wtf/Noncopyable.h>
-#include <wtf/SpinLock.h>
 #include <wtf/ThreadingPrimitives.h>
 
 namespace JSC {
@@ -65,7 +65,7 @@ private:
         
         void addThreadSafe(T* handler)
         {
-            SpinLockHolder locker(&m_lock);
+            LockHolder locker(&m_lock);
             addNotThreadSafe(handler);
         }
         
@@ -103,7 +103,7 @@ private:
             m_first = handler;
         }
         
-        SpinLock m_lock;
+        Lock m_lock;
         T* m_first;
     };
     
index 4ae16a9..168f82a 100644 (file)
@@ -568,8 +568,8 @@ bool MachineThreads::tryCopyOtherThreadStacks(MutexLocker&, void* buffer, size_t
 {
     // Prevent two VMs from suspending each other's threads at the same time,
     // which can cause deadlock: <rdar://problem/20300842>.
-    static StaticSpinLock mutex;
-    std::lock_guard<StaticSpinLock> lock(mutex);
+    static StaticLock mutex;
+    std::lock_guard<StaticLock> lock(mutex);
 
     *size = 0;
 
index 1d688f8..548b99d 100644 (file)
@@ -250,7 +250,7 @@ inline void SlotVisitor::copyLater(JSCell* owner, CopyToken token, void* ptr, si
 
     ASSERT(heap()->m_storageSpace.contains(block));
 
-    SpinLockHolder locker(&block->workListLock());
+    LockHolder locker(&block->workListLock());
     if (heap()->operationInProgress() == FullCollection || block->shouldReportLiveBytes(locker, owner)) {
         m_bytesCopied += bytes;
         block->reportLiveBytes(locker, owner, token, bytes);
index c965245..19ffb33 100644 (file)
@@ -27,7 +27,7 @@
 #include "SourceProvider.h"
 
 #include "JSCInlines.h"
-#include <wtf/SpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/StdLibExtras.h>
 
 namespace JSC {
@@ -44,11 +44,11 @@ SourceProvider::~SourceProvider()
 {
 }
 
-static StaticSpinLock providerIdLock;
+static StaticLock providerIdLock;
 
 void SourceProvider::getID()
 {
-    SpinLockHolder lock(&providerIdLock);
+    LockHolder lock(&providerIdLock);
     if (!m_id) {
         static intptr_t nextProviderID = 0;
         m_id = ++nextProviderID;
index 236a5c8..d78305a 100644 (file)
@@ -35,7 +35,7 @@ namespace JSC { namespace Profiler {
 
 static std::atomic<int> databaseCounter;
 
-static StaticSpinLock registrationLock;
+static StaticLock registrationLock;
 static std::atomic<int> didRegisterAtExit;
 static Database* firstDatabase;
 
@@ -138,14 +138,14 @@ void Database::addDatabaseToAtExit()
     if (++didRegisterAtExit == 1)
         atexit(atExitCallback);
     
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     m_nextRegisteredDatabase = firstDatabase;
     firstDatabase = this;
 }
 
 void Database::removeDatabaseFromAtExit()
 {
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     for (Database** current = &firstDatabase; *current; current = &(*current)->m_nextRegisteredDatabase) {
         if (*current != this)
             continue;
@@ -163,7 +163,7 @@ void Database::performAtExitSave() const
 
 Database* Database::removeFirstAtExitDatabase()
 {
-    SpinLockHolder holder(registrationLock);
+    LockHolder holder(registrationLock);
     Database* result = firstDatabase;
     if (result) {
         firstDatabase = result->m_nextRegisteredDatabase;
index b3a1948..1dd188a 100644 (file)
@@ -32,7 +32,6 @@
 #include "JSCJSValue.h"
 #include "Structure.h"
 #include "TypeProfiler.h"
-#include <wtf/ByteSpinLock.h>
 
 namespace JSC {
 
index 721faa1..70ec738 100644 (file)
@@ -1,5 +1,99 @@
 2015-08-05  Filip Pizlo  <fpizlo@apple.com>
 
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        A common idiom in WebKit is to use spinlocks. We use them because the lock acquisition
+        overhead is lower than system locks and because they take dramatically less space than system
+        locks. The speed and space advantages of spinlocks can be astonishing: an uncontended spinlock
+        acquire is up to 10x faster and under microcontention - short critical section with two or
+        more threads taking turns - spinlocks are up to 100x faster. Spinlocks take only 1 byte or 4
+        bytes depending on the flavor, while system locks take 64 bytes or more. Clearly, WebKit
+        should continue to avoid system locks - they are just far too slow and far too big.
+
+        But there is a problem with this idiom. System lock implementations will sleep a thread when
+        it attempts to acquire a lock that is held, while spinlocks will cause the thread to burn CPU.
+        In WebKit spinlocks, the thread will repeatedly call sched_yield(). This is awesome for
+        microcontention, but awful when the lock will not be released for a while. In fact, when
+        critical sections take tens of microseconds or more, the CPU time cost of our spinlocks is
+        almost 100x more than the CPU time cost of a system lock. This case doesn't arise too
+        frequently in our current uses of spinlocks, but that's probably because right now there are
+        places where we make a conscious decision to use system locks - even though they use more
+        memory and are slower - because we don't want to waste CPU cycles when a thread has to wait a
+        while to acquire the lock.
+
+        The solution is to just implement a modern adaptive mutex in WTF. Luckily, this isn't a new
+        concept. This patch implements a mutex that is reminiscent of the kinds of low-overhead locks
+        that JVMs use. The actual implementation here is inspired by some of the ideas from [1]. The
+        idea is simple: the fast path is an inlined CAS to immediately acquire a lock that isn't held,
+        the slow path tries some number of spins to acquire the lock, and if that fails, the thread is
+        put on a queue and put to sleep. The queue is made up of statically allocated thread nodes and
+        the lock itself is a tagged pointer: either it is just bits telling us the complete lock state
+        (not held or held) or it is a pointer to the head of a queue of threads waiting to acquire the
+        lock. This approach gives WTF::Lock three different levels of adaptation: an inlined fast path
+        if the lock is not contended, a short burst of spinning for microcontention, and a full-blown
+        queue for critical sections that are held for a long time.
+
+        On a locking microbenchmark, this new Lock exhibits the following performance
+        characteristics:
+
+        - Lock+unlock on an uncontended no-op critical section: 2x slower than SpinLock and 3x faster
+          than a system mutex.
+
+        - Lock+unlock on a contended no-op critical section: 2x slower than SpinLock and 100x faster
+          than a system mutex.
+
+        - CPU time spent in lock() on a lock held for a while: same as system mutex, 90x less than a
+          SpinLock.
+
+        - Memory usage: sizeof(void*), so on 64-bit it's 8x less than a system mutex but 2x worse than
+          a SpinLock.
+
+        This patch replaces all uses of SpinLock with Lock, since our critical sections are not
+        no-ops so if you do basically anything in your critical section, the Lock overhead will be
+        invisible. Also, in all places where we used SpinLock, we could tolerate 8 bytes of overhead
+        instead of 4. Performance benchmarking using JSC macrobenchmarks shows no difference, which is
+        as it should be: the purpose of this change is to reduce CPU time wasted, not wallclock time.
+        This patch doesn't replace any uses of ByteSpinLock, since we expect that the space benefits
+        of having a lock that just uses a byte are still better than the CPU wastage benefits of
+        Lock. But, this work will enable some future work to create locks that will fit in just 1.6
+        bits: https://bugs.webkit.org/show_bug.cgi?id=147665.
+
+        [1] http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf
+
+        * WTF.vcxproj/WTF.vcxproj:
+        * WTF.xcodeproj/project.pbxproj:
+        * benchmarks: Added.
+        * benchmarks/LockSpeedTest.cpp: Added.
+        (main):
+        * wtf/CMakeLists.txt:
+        * wtf/Lock.cpp: Added.
+        (WTF::LockBase::lockSlow):
+        (WTF::LockBase::unlockSlow):
+        * wtf/Lock.h: Added.
+        (WTF::LockBase::lock):
+        (WTF::LockBase::unlock):
+        (WTF::LockBase::isHeld):
+        (WTF::Lock::Lock):
+        * wtf/MetaAllocator.cpp:
+        (WTF::MetaAllocator::release):
+        (WTF::MetaAllocatorHandle::shrink):
+        (WTF::MetaAllocator::allocate):
+        (WTF::MetaAllocator::currentStatistics):
+        (WTF::MetaAllocator::addFreshFreeSpace):
+        (WTF::MetaAllocator::debugFreeSpaceSize):
+        * wtf/MetaAllocator.h:
+        * wtf/SpinLock.h:
+        * wtf/ThreadingPthreads.cpp:
+        * wtf/ThreadingWin.cpp:
+        * wtf/text/AtomicString.cpp:
+        * wtf/text/AtomicStringImpl.cpp:
+        (WTF::AtomicStringTableLocker::AtomicStringTableLocker):
+
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
         Unreviewed, roll out http://trac.webkit.org/changeset/187972.
 
         * wtf/Atomics.cpp:
index 5a2d253..a481216 100644 (file)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="utf-8"?>
 <Project DefaultTargets="Build" ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
   <ItemGroup Label="ProjectConfigurations">
     <ProjectConfiguration Include="DebugSuffix|Win32">
     <ClCompile Include="..\wtf\glib\GThreadSafeMainLoopSource.cpp" />
     <ClCompile Include="..\wtf\GregorianDateTime.cpp" />
     <ClCompile Include="..\wtf\HashTable.cpp" />
+    <ClCompile Include="..\wtf\Lock.cpp" />
     <ClCompile Include="..\wtf\MainThread.cpp" />
     <ClCompile Include="..\wtf\MD5.cpp" />
     <ClCompile Include="..\wtf\MediaTime.cpp" />
     <ClInclude Include="..\wtf\IteratorAdaptors.h" />
     <ClInclude Include="..\wtf\IteratorRange.h" />
     <ClInclude Include="..\wtf\ListHashSet.h" />
+    <ClInclude Include="..\wtf\Lock.h" />
     <ClInclude Include="..\wtf\Locker.h" />
     <ClInclude Include="..\wtf\MainThread.h" />
     <ClInclude Include="..\wtf\MallocPtr.h" />
index 1557746..9c068b0 100644 (file)
@@ -40,6 +40,8 @@
                0FD81AC5154FB22E00983E72 /* FastBitVector.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FD81AC4154FB22E00983E72 /* FastBitVector.h */; settings = {ATTRIBUTES = (); }; };
                0FDDBFA71666DFA300C55FEF /* StringPrintStream.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FDDBFA51666DFA300C55FEF /* StringPrintStream.cpp */; };
                0FDDBFA81666DFA300C55FEF /* StringPrintStream.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDDBFA61666DFA300C55FEF /* StringPrintStream.h */; };
+               0FE1646A1B6FFC9600400E7C /* Lock.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FE164681B6FFC9600400E7C /* Lock.cpp */; };
+               0FE1646B1B6FFC9600400E7C /* Lock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE164691B6FFC9600400E7C /* Lock.h */; };
                0FED67B61B22D4D80066CE15 /* TinyPtrSet.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FED67B51B22D4D80066CE15 /* TinyPtrSet.h */; };
                14022F4118F5C3FC007FF0EB /* libbmalloc.a in Frameworks */ = {isa = PBXBuildFile; fileRef = 14022F4018F5C3FC007FF0EB /* libbmalloc.a */; };
                143F611F1565F0F900DB514A /* RAMSize.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 143F611D1565F0F900DB514A /* RAMSize.cpp */; };
                0FD81AC4154FB22E00983E72 /* FastBitVector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = FastBitVector.h; sourceTree = "<group>"; };
                0FDDBFA51666DFA300C55FEF /* StringPrintStream.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = StringPrintStream.cpp; sourceTree = "<group>"; };
                0FDDBFA61666DFA300C55FEF /* StringPrintStream.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = StringPrintStream.h; sourceTree = "<group>"; };
+               0FE164681B6FFC9600400E7C /* Lock.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Lock.cpp; sourceTree = "<group>"; };
+               0FE164691B6FFC9600400E7C /* Lock.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Lock.h; sourceTree = "<group>"; };
                0FEC3EE4171B834700FDAC8D /* ByteSpinLock.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = ByteSpinLock.h; sourceTree = "<group>"; };
                0FED67B51B22D4D80066CE15 /* TinyPtrSet.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = TinyPtrSet.h; sourceTree = "<group>"; };
                14022F4018F5C3FC007FF0EB /* libbmalloc.a */ = {isa = PBXFileReference; lastKnownFileType = archive.ar; path = libbmalloc.a; sourceTree = BUILT_PRODUCTS_DIR; };
                                7CDD7FF9186D2A54007433CD /* IteratorRange.h */,
                                A70DA0831799F04D00529A9B /* ListDump.h */,
                                A8A472C1151A825A004123FF /* ListHashSet.h */,
+                               0FE164681B6FFC9600400E7C /* Lock.cpp */,
+                               0FE164691B6FFC9600400E7C /* Lock.h */,
                                A8A472C3151A825A004123FF /* Locker.h */,
                                1447AEC818FCE59400B3D7FF /* mbmalloc.cpp */,
                                A8A472CA151A825B004123FF /* MD5.cpp */,
                                A8A4746A151A825B004123FF /* UTF8.h in Headers */,
                                A8A473B9151A825B004123FF /* utils.h in Headers */,
                                A8A4747D151A825B004123FF /* ValueCheck.h in Headers */,
+                               0FE1646B1B6FFC9600400E7C /* Lock.h in Headers */,
                                A8A4747E151A825B004123FF /* Vector.h in Headers */,
                                A8A4747F151A825B004123FF /* VectorTraits.h in Headers */,
                                0FED67B61B22D4D80066CE15 /* TinyPtrSet.h in Headers */,
                                A8A4739E151A825B004123FF /* DataLog.cpp in Sources */,
                                A8A473A0151A825B004123FF /* DateMath.cpp in Sources */,
                                A8A473A2151A825B004123FF /* DecimalNumber.cpp in Sources */,
+                               0FE1646A1B6FFC9600400E7C /* Lock.cpp in Sources */,
                                A8A473AE151A825B004123FF /* diy-fp.cc in Sources */,
                                A8A473B0151A825B004123FF /* double-conversion.cc in Sources */,
                                A8A473BA151A825B004123FF /* dtoa.cpp in Sources */,
diff --git a/Source/WTF/benchmarks/LockSpeedTest.cpp b/Source/WTF/benchmarks/LockSpeedTest.cpp
new file mode 100644 (file)
index 0000000..fcc2583
--- /dev/null
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+
+#include <unistd.h>
+#include <wtf/CurrentTime.h>
+#include <wtf/Lock.h>
+#include <wtf/SpinLock.h>
+#include <wtf/StdLibExtras.h>
+#include <wtf/Threading.h>
+#include <wtf/ThreadingPrimitives.h>
+
+namespace {
+
+unsigned numThreadGroups;
+unsigned numThreadsPerGroup;
+unsigned workPerCriticalSection;
+unsigned numNoiseThreads;
+unsigned numIterations;
+    
+void usage()
+{
+    printf("Usage: LockSpeedTest spinlock|lock|mutex|all <num thread groups> <num threads per group> <work per critical section> <num noise threads> <num iterations>\n");
+    exit(1);
+}
+
+template<typename LockType>
+void runBenchmark(const char* name)
+{
+    std::unique_ptr<LockType[]> locks = std::make_unique<LockType[]>(numThreadGroups);
+    std::unique_ptr<double[]> words = std::make_unique<double[]>(numThreadGroups);
+    std::unique_ptr<ThreadIdentifier[]> threads = std::make_unique<ThreadIdentifier[]>(numThreadGroups * numThreadsPerGroup);
+    std::unique_ptr<ThreadIdentifier[]> noiseThreads = std::make_unique<ThreadIdentifier[]>(numNoiseThreads);
+    std::unique_ptr<double[]> noiseCounts = std::make_unique<double[]>(numNoiseThreads);
+
+    volatile bool shouldStop = false;
+    for (unsigned threadIndex = numNoiseThreads; threadIndex--;) {
+        noiseCounts[threadIndex] = 0;
+        noiseThreads[threadIndex] = createThread(
+            "Noise Thread",
+            [&shouldStop, &noiseCounts, threadIndex] () {
+                while (!shouldStop)
+                    noiseCounts[threadIndex]++;
+            });
+    }
+
+    double before = monotonicallyIncreasingTimeMS();
+    
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;) {
+        words[threadGroupIndex] = 0;
+
+        for (unsigned threadIndex = numThreadsPerGroup; threadIndex--;) {
+            threads[threadGroupIndex * numThreadsPerGroup + threadIndex] = createThread(
+                "Benchmark thread",
+                [threadGroupIndex, &locks, &words] () {
+                    for (unsigned i = numIterations; i--;) {
+                        locks[threadGroupIndex].lock();
+                        for (unsigned j = workPerCriticalSection; j--;) {
+                            words[threadGroupIndex]++;
+                            words[threadGroupIndex] *= 1.01;
+                        }
+                        locks[threadGroupIndex].unlock();
+                    }
+                });
+        }
+    }
+
+    for (unsigned threadIndex = numThreadGroups * numThreadsPerGroup; threadIndex--;)
+        waitForThreadCompletion(threads[threadIndex]);
+    shouldStop = true;
+    double noiseCount = 0;
+    for (unsigned threadIndex = numNoiseThreads; threadIndex--;) {
+        waitForThreadCompletion(noiseThreads[threadIndex]);
+        noiseCount += noiseCounts[threadIndex];
+    }
+
+    double after = monotonicallyIncreasingTimeMS();
+
+    printf("%s: %.3lf ms, %.0lf noise.\n", name, after - before, noiseCount);
+}
+
+} // anonymous namespace
+
+int main(int argc, char** argv)
+{
+    WTF::initializeThreading();
+    
+    if (argc != 7
+        || sscanf(argv[2], "%u", &numThreadGroups) != 1
+        || sscanf(argv[3], "%u", &numThreadsPerGroup) != 1
+        || sscanf(argv[4], "%u", &workPerCriticalSection) != 1
+        || sscanf(argv[5], "%u", &numNoiseThreads) != 1
+        || sscanf(argv[6], "%u", &numIterations) != 1)
+        usage();
+
+    bool didRun = false;
+    if (!strcmp(argv[1], "spinlock") || !strcmp(argv[1], "all")) {
+        runBenchmark<SpinLock>("SpinLock");
+        didRun = true;
+    }
+    if (!strcmp(argv[1], "lock") || !strcmp(argv[1], "all")) {
+        runBenchmark<Lock>("WTF Lock");
+        didRun = true;
+    }
+    if (!strcmp(argv[1], "mutex") || !strcmp(argv[1], "all")) {
+        runBenchmark<Mutex>("Platform Mutex");
+        didRun = true;
+    }
+
+    if (!didRun)
+        usage();
+
+    return 0;
+}
index 01d8bf1..5e9766d 100644 (file)
@@ -41,6 +41,7 @@ set(WTF_HEADERS
     IteratorAdaptors.h
     IteratorRange.h
     ListHashSet.h
+    Lock.h
     Locker.h
     MD5.h
     MainThread.h
@@ -156,6 +157,7 @@ set(WTF_SOURCES
     FunctionDispatcher.cpp
     GregorianDateTime.cpp
     HashTable.cpp
+    Lock.cpp
     MD5.cpp
     MainThread.cpp
     MediaTime.cpp
diff --git a/Source/WTF/wtf/Lock.cpp b/Source/WTF/wtf/Lock.cpp
new file mode 100644 (file)
index 0000000..10eb507
--- /dev/null
@@ -0,0 +1,253 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include "config.h"
+#include "Lock.h"
+
+#include "ThreadSpecific.h"
+#include "ThreadingPrimitives.h"
+#include <condition_variable>
+#include <mutex>
+#include <thread>
+
+namespace WTF {
+
+namespace {
+
+// This data structure serves three purposes:
+//
+// 1) A parking mechanism for threads that go to sleep. That involves just a system mutex and
+//    condition variable.
+//
+// 2) A queue node for when a thread is on some Lock's queue.
+//
+// 3) The queue head. This is kind of funky. When a thread is the head of a queue, it also serves as
+//    the basic queue bookkeeping data structure. When a thread is dequeued, the next thread in the
+//    queue takes on the queue head duties.
+struct ThreadData {
+    // The parking mechanism.
+    bool shouldPark { false };
+    std::mutex parkingLock;
+    std::condition_variable parkingCondition;
+
+    // The queue node.
+    ThreadData* nextInQueue { nullptr };
+
+    // The queue itself.
+    ThreadData* queueTail { nullptr };
+};
+
+ThreadSpecific<ThreadData>* threadData;
+
+ThreadData* myThreadData()
+{
+    static std::once_flag initializeOnce;
+    std::call_once(
+        initializeOnce,
+        [] {
+            threadData = new ThreadSpecific<ThreadData>();
+        });
+
+    return *threadData;
+}
+
+} // anonymous namespace
+
+void LockBase::lockSlow()
+{
+    unsigned spinCount = 0;
+
+    // This magic number turns out to be optimal based on past JikesRVM experiments.
+    const unsigned spinLimit = 40;
+    
+    for (;;) {
+        uintptr_t currentWordValue = m_word.load();
+        
+        if (!(currentWordValue & isLockedBit)) {
+            // It's not possible for someone to hold the queue lock while the lock itself is no longer
+            // held, since we will only attempt to acquire the queue lock when the lock is held and
+            // the queue lock prevents unlock.
+            ASSERT(!(currentWordValue & isQueueLockedBit));
+            if (m_word.compareExchangeWeak(currentWordValue, currentWordValue | isLockedBit)) {
+                // Success! We acquired the lock.
+                return;
+            }
+        }
+
+        // If there is no queue and we haven't spun too much, we can just try to spin around again.
+        if (!(currentWordValue & ~queueHeadMask) && spinCount < spinLimit) {
+            spinCount++;
+            std::this_thread::yield();
+            continue;
+        }
+
+        // Need to put ourselves on the queue. Create the queue if one does not exist. This requires
+        // owning the queue for a little bit. The lock that controls the queue is itself a spinlock.
+        // But before we acquire the queue spinlock, we make sure that we have a ThreadData for this
+        // thread.
+        ThreadData* me = myThreadData();
+        ASSERT(!me->shouldPark);
+        ASSERT(!me->nextInQueue);
+        ASSERT(!me->queueTail);
+
+        // Reload the current word value, since some time may have passed.
+        currentWordValue = m_word.load();
+
+        // We proceed only if the queue lock is not held, the Lock is held, and we succeed in
+        // acquiring the queue lock.
+        if ((currentWordValue & isQueueLockedBit)
+            || !(currentWordValue & isLockedBit)
+            || !m_word.compareExchangeWeak(currentWordValue, currentWordValue | isQueueLockedBit)) {
+            std::this_thread::yield();
+            continue;
+        }
+
+        me->shouldPark = true;
+
+        // We own the queue. Nobody can enqueue or dequeue until we're done. Also, it's not possible
+        // to release the Lock while we hold the queue lock.
+        ThreadData* queueHead = bitwise_cast<ThreadData*>(currentWordValue & ~queueHeadMask);
+        if (queueHead) {
+            // Put this thread at the end of the queue.
+            queueHead->queueTail->nextInQueue = me;
+            queueHead->queueTail = me;
+
+            // Release the queue lock.
+            currentWordValue = m_word.load();
+            ASSERT(currentWordValue & ~queueHeadMask);
+            ASSERT(currentWordValue & isQueueLockedBit);
+            ASSERT(currentWordValue & isLockedBit);
+            m_word.store(currentWordValue & ~isQueueLockedBit);
+        } else {
+            // Make this thread be the queue-head.
+            queueHead = me;
+            me->queueTail = me;
+
+            // Release the queue lock and install ourselves as the head. No need for a CAS loop, since
+            // we own the queue lock.
+            currentWordValue = m_word.load();
+            ASSERT(!(currentWordValue & ~queueHeadMask));
+            ASSERT(currentWordValue & isQueueLockedBit);
+            ASSERT(currentWordValue & isLockedBit);
+            uintptr_t newWordValue = currentWordValue;
+            newWordValue |= bitwise_cast<uintptr_t>(queueHead);
+            newWordValue &= ~isQueueLockedBit;
+            m_word.store(newWordValue);
+        }
+
+        // At this point everyone who acquires the queue lock will see me on the queue, and anyone who
+        // acquires me's parkingLock will see that me wants to park. Note that shouldPark may have been
+        // cleared as soon as the queue lock was released above, but it will happen while the
+        // releasing thread holds me's parkingLock.
+
+        {
+            std::unique_lock<std::mutex> locker(me->parkingLock);
+            while (me->shouldPark)
+                me->parkingCondition.wait(locker);
+        }
+
+        ASSERT(!me->shouldPark);
+        ASSERT(!me->nextInQueue);
+        ASSERT(!me->queueTail);
+        
+        // Now we can loop around and try to acquire the lock again.
+    }
+}
+
+void LockBase::unlockSlow()
+{
+    // If the fast path failed, it can only be because someone put a thread on the queue or the queue
+    // lock is held. If the queue lock is held, it can only be because someone *will* enqueue a thread
+    // onto the queue.
+
+    // Acquire the queue lock.
+    for (;;) {
+        uintptr_t currentWordValue = m_word.load();
+
+        ASSERT(currentWordValue & isLockedBit);
+        
+        if (currentWordValue & isQueueLockedBit) {
+            std::this_thread::yield();
+            continue;
+        }
+
+        // If the queue lock is not held, then there must be an entry on the queue.
+        ASSERT(currentWordValue & ~queueHeadMask);
+
+        if (m_word.compareExchangeWeak(currentWordValue, currentWordValue | isQueueLockedBit))
+            break;
+    }
+
+    uintptr_t currentWordValue = m_word.load();
+        
+    // After we acquire the queue lock, the Lock must still be held and the queue must be
+    // non-empty. The queue must be non-empty since only the lockSlow() method could have held the
+    // queue lock and if it did then it only releases it after putting something on the queue.
+    ASSERT(currentWordValue & isLockedBit);
+    ASSERT(currentWordValue & isQueueLockedBit);
+    ThreadData* queueHead = bitwise_cast<ThreadData*>(currentWordValue & ~queueHeadMask);
+    ASSERT(queueHead);
+
+    ThreadData* newQueueHead = queueHead->nextInQueue;
+    // Either this was the only thread on the queue, in which case we delete the queue, or there
+    // are still more threads on the queue, in which case we create a new queue head.
+    if (newQueueHead)
+        newQueueHead->queueTail = queueHead->queueTail;
+
+    // Change the queue head, possibly removing it if newQueueHead is null. No need for a CAS loop,
+    // since we hold the queue lock and the lock itself so nothing about the lock can change right
+    // now.
+    currentWordValue = m_word.load();
+    ASSERT(currentWordValue & isLockedBit);
+    ASSERT(currentWordValue & isQueueLockedBit);
+    ASSERT((currentWordValue & ~queueHeadMask) == bitwise_cast<uintptr_t>(queueHead));
+    uintptr_t newWordValue = currentWordValue;
+    newWordValue &= ~isLockedBit; // Release the Lock.
+    newWordValue &= ~isQueueLockedBit; // Release the queue lock.
+    newWordValue &= queueHeadMask; // Clear out the old queue head.
+    newWordValue |= bitwise_cast<uintptr_t>(newQueueHead); // Install new queue head.
+    m_word.store(newWordValue);
+
+    // Now the lock is available for acquisition. All that remains is to wake up the old queue
+    // head. After that, we're done!
+
+    queueHead->nextInQueue = nullptr;
+    queueHead->queueTail = nullptr;
+
+    // We do this carefully because this may run either before or during the parkingLock critical
+    // section in lockSlow().
+    {
+        std::unique_lock<std::mutex> locker(queueHead->parkingLock);
+        queueHead->shouldPark = false;
+    }
+    // Doesn't matter if we notify_all() or notify_one() here since the only thread that could be
+    // waiting is queueHead.
+    queueHead->parkingCondition.notify_one();
+
+    // The old queue head can now contend for the lock again. We're done!
+}
+
+} // namespace WTF
+
diff --git a/Source/WTF/wtf/Lock.h b/Source/WTF/wtf/Lock.h
new file mode 100644 (file)
index 0000000..0c340a2
--- /dev/null
@@ -0,0 +1,168 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef WTF_Lock_h
+#define WTF_Lock_h
+
+#include <wtf/Atomics.h>
+#include <wtf/Compiler.h>
+#include <wtf/Locker.h>
+#include <wtf/Noncopyable.h>
+
+namespace WTF {
+
+// A Lock is a fully adaptive mutex that gives you the best of SpinLock and Mutex. For small critical
+// sections (that take nanoseconds), it will usually perform within 2x of a SpinLock in both the
+// contended and uncontended case. When using a Mutex, such critical sections take up to 100x longer
+// than Lock in the contended case, or 3x longer than Lock in the uncontended case. For longer
+// critical sections (that take tens of microseconds), it will perform as well as a Mutex and slightly
+// better than a SpinLock. But, crucially, a SpinLock will burn up to 90x more time in the kernel for
+// such critical sections than either Lock or Mutex. Hence, using Lock will make the common case of
+// locking perform close to SpinLock for any critical section that does more than a few nanoseconds of
+// work while being as kind to the scheduler for longer critical sections as a Mutex.
+//
+// Like SpinLock, Lock takes very little memory - just sizeof(void*), though see a detailed caveat
+// below.
+//
+// Generally, you should use Lock instead of SpinLock because while it penalizes you slightly, you
+// make up for it by not wasting CPU cycles in case of contention.
+//
+// The Lock has the following nice properties:
+//
+// - Uncontended fast paths for lock acquisition and lock release that are almost as fast as the
+//   uncontended fast paths of a spinlock. The only overhead is that the spinlock will not CAS on
+//   release, while Lock will CAS. This overhead *can* slow things down for extremely small critical
+//   sections that do little or nothing - it makes them 2x slower since in that case, CAS is the most
+//   expensive instruction and having two of them is twice as bad as just having one. However, this
+//   lock implementation is still almost 3x faster than a platform mutex in those cases. It's unlikely
+//   that you'll encounter no-op critical sections, so usually, this lock is better than a spinlock.
+//
+// - Contended fast path that attempts to spin and yield for some number of times. For critical
+//   sections that are held only briefly, this allows Lock to perform almost as well as a SpinLock.
+//   SpinLock can still be almost 2x faster than Lock if the critical section is a no-op, but this
+//   advantage diminishes as the critical section grows.
+//
+// - Contended slow path that enqueues the contending thread and causes it to wait on a condition
+//   variable until the lock is released. This is the only case in which system mutexes and condition
+//   variables are used. This case is rare and self-limiting: it will only happen when a lock is held
+//   for long enough that spinning some number of times doesn't acquire it. The fact that Lock does
+//   this as a fallback when spinning for some number of times fails means that it will burn
+//   dramatically fewer CPU cycles - for example with 10 threads on an 8 logical CPU machine acquiring
+//   a critical section that takes 50 microseconds, the WTF SpinLock will cause 90x more time to be
+//   spent in the kernel than Lock.
+//
+// - Very low memory usage. Each Lock requires only sizeof(void*) memory. When the contended slow
+//   path is activated, Lock only relies on each thread having a preallocated thread-specific data
+//   structure called ThreadData that, together with the Lock itself, is used to build up a thread
+//   queue. So, the total memory usage of all Locks is still bounded by:
+//
+//       numberOfLocks * sizeof(void*) + numberOfThreads * sizeof(ThreadData)
+//
+//   Where ThreadData is a decently large data structure, but we will only ever have one per thread,
+//   regardless of the number of Locks in memory. Another way to view this is that the worst case
+//   memory usage per Lock is:
+//
+//       sizeof(void*) + numberOfThreads / numberOfLocks * sizeof(ThreadData)
+//
+//   So, unless you have a small number of Locks (or a large number of threads, which is far less
+//   likely), the memory usage per-Lock is still going to be somewhere around sizeof(void*).
+//
+// - Barging fast paths. The Lock is tuned for maximum throughput rather than maximum fairness. If
+//   a thread releases a Lock that was contended and had a queue of waiting threads, then it will
+//   wake up the head of the queue, but it will also mark the lock as being available. This means that
+//   some other thread that is just now attempting to acquire the lock may get it before the thread
+//   that got woken up. When a thread barges into the lock, the thread that got woken up will simply
+//   go back to the end of the queue. The barging behavior ends up being probabilistic on most
+//   platforms and even though it may be unfair to some thread at some moment in time, it will rarely
+//   have a long streak of unfairness towards any particular thread: eventually each thread waiting on
+//   the lock will get to have a turn so long as no thread just holds the lock forever. That said,
+//   there *is* a chance of pathologies - users of Lock should not depend on first-in, first-out lock
+//   acquisition order under contention. The same caveat is generally true of SpinLock and platform
+//   mutexes on some platforms.
+
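The spin-then-park strategy described above can be sketched in portable C++. This is a minimal illustration, not the real WTF::Lock: the name SketchAdaptiveLock is invented, the sketch uses a full std::mutex and std::condition_variable per lock rather than packing a queue into one word, and its unlock path always touches the parking mutex, which the real Lock avoids. It does show the adaptive shape: a one-CAS fast path, a bounded spin, then parking.

```cpp
// Minimal sketch of the spin-then-park idea -- NOT the real WTF::Lock.
// Assumptions: one mutex/condvar per lock is acceptable; barging is allowed.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class SketchAdaptiveLock {
public:
    void lock()
    {
        // Fast path: a single CAS, about as cheap as a spinlock acquisition.
        bool expected = false;
        if (m_held.compare_exchange_strong(expected, true, std::memory_order_acquire))
            return;

        // Contended path: spin briefly (the spin limit mirrors the one used in
        // lockSlow()), then fall back to parking on a condition variable.
        for (unsigned spin = 0; spin < 40; ++spin) {
            expected = false;
            if (m_held.compare_exchange_strong(expected, true, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> locker(m_parkingLock);
        while (m_held.exchange(true, std::memory_order_acquire))
            m_condition.wait(locker);
    }

    void unlock()
    {
        m_held.store(false, std::memory_order_release);
        // Wake one parked thread, if any. Unlike the real Lock, this sketch
        // pays for the parking mutex even when nobody is waiting.
        std::lock_guard<std::mutex> locker(m_parkingLock);
        m_condition.notify_one();
    }

private:
    std::atomic<bool> m_held { false };
    std::mutex m_parkingLock;
    std::condition_variable m_condition;
};
```

Note that, like the real Lock, this sketch barges: a freshly arriving thread can grab m_held before a woken parker re-checks it, so acquisition order is not FIFO.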
+// This is a struct without a constructor or destructor so that it can be statically initialized.
+// Use Lock in instance variables.
+struct LockBase {
+    void lock()
+    {
+        if (LIKELY(m_word.compareExchangeWeak(0, isLockedBit, std::memory_order_acquire))) {
+            // Lock acquired!
+            return;
+        }
+
+        lockSlow();
+    }
+
+    void unlock()
+    {
+        if (LIKELY(m_word.compareExchangeWeak(isLockedBit, 0, std::memory_order_release))) {
+            // Lock released, and nobody was waiting!
+            return;
+        }
+
+        unlockSlow();
+    }
+
+    bool isHeld() const
+    {
+        return m_word.load(std::memory_order_acquire) & isLockedBit;
+    }
+
+    bool isLocked() const
+    {
+        return isHeld();
+    }
+
+protected:
+    static const uintptr_t isLockedBit = 1;
+    static const uintptr_t isQueueLockedBit = 2;
+    static const uintptr_t queueHeadMask = 3;
+
+    WTF_EXPORT_PRIVATE void lockSlow();
+    WTF_EXPORT_PRIVATE void unlockSlow();
+
+    Atomic<uintptr_t> m_word;
+};
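The three constants above rely on pointer alignment: a ThreadData is at least 4-byte aligned, so the low two bits of the queue-head pointer stored in m_word are always zero and are free to hold the two flags. A standalone sketch of that packing (FakeThreadData and the helper names are illustrative, not the WTF code):

```cpp
// Sketch of the word-packing scheme used by LockBase: the queue-head pointer
// and the two flag bits share one uintptr_t, assuming >= 4-byte alignment.
#include <cstdint>

struct alignas(4) FakeThreadData { int payload; };

static const uintptr_t isLockedBit = 1;      // low bit: is the lock held?
static const uintptr_t isQueueLockedBit = 2; // next bit: is the queue locked?
static const uintptr_t queueHeadMask = 3;    // both flag bits together

// Pack a queue-head pointer plus flag bits into one word.
inline uintptr_t packWord(FakeThreadData* queueHead, uintptr_t flags)
{
    return reinterpret_cast<uintptr_t>(queueHead) | (flags & queueHeadMask);
}

// Recover the queue head by masking off the flag bits.
inline FakeThreadData* queueHeadOf(uintptr_t word)
{
    return reinterpret_cast<FakeThreadData*>(word & ~queueHeadMask);
}
```

This is why lockSlow() and unlockSlow() can test `currentWordValue & ~queueHeadMask` to ask "is there a queue?" and clear with `& queueHeadMask` to drop the old head while preserving the flags.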
+
+class Lock : public LockBase {
+    WTF_MAKE_NONCOPYABLE(Lock);
+public:
+    Lock()
+    {
+        m_word.store(0, std::memory_order_relaxed);
+    }
+};
+
+typedef LockBase StaticLock;
+typedef Locker<LockBase> LockHolder;
+
+} // namespace WTF
+
+using WTF::StaticLock;
+using WTF::Lock;
+using WTF::LockHolder;
+
+#endif // WTF_Lock_h
+
index 4408572..6eb29e0 100644 (file)
@@ -60,7 +60,7 @@ void MetaAllocatorTracker::release(MetaAllocatorHandle* handle)
 
 ALWAYS_INLINE void MetaAllocator::release(MetaAllocatorHandle* handle)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     if (handle->sizeInBytes()) {
         decrementPageOccupancy(handle->start(), handle->sizeInBytes());
         addFreeSpaceFromReleasedHandle(handle->start(), handle->sizeInBytes());
@@ -91,7 +91,7 @@ void MetaAllocatorHandle::shrink(size_t newSizeInBytes)
 {
     ASSERT(newSizeInBytes <= m_sizeInBytes);
     
-    SpinLockHolder locker(&m_allocator->m_lock);
+    LockHolder locker(&m_allocator->m_lock);
 
     newSizeInBytes = m_allocator->roundUp(newSizeInBytes);
     
@@ -150,7 +150,7 @@ MetaAllocator::MetaAllocator(size_t allocationGranule, size_t pageSize)
 
 PassRefPtr<MetaAllocatorHandle> MetaAllocator::allocate(size_t sizeInBytes, void* ownerUID)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
 
     if (!sizeInBytes)
         return 0;
@@ -196,7 +196,7 @@ PassRefPtr<MetaAllocatorHandle> MetaAllocator::allocate(size_t sizeInBytes, void
 
 MetaAllocator::Statistics MetaAllocator::currentStatistics()
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     Statistics result;
     result.bytesAllocated = m_bytesAllocated;
     result.bytesReserved = m_bytesReserved;
@@ -281,7 +281,7 @@ void MetaAllocator::addFreeSpaceFromReleasedHandle(void* start, size_t sizeInByt
 
 void MetaAllocator::addFreshFreeSpace(void* start, size_t sizeInBytes)
 {
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     m_bytesReserved += sizeInBytes;
     addFreeSpace(start, sizeInBytes);
 }
@@ -289,7 +289,7 @@ void MetaAllocator::addFreshFreeSpace(void* start, size_t sizeInBytes)
 size_t MetaAllocator::debugFreeSpaceSize()
 {
 #ifndef NDEBUG
-    SpinLockHolder locker(&m_lock);
+    LockHolder locker(&m_lock);
     size_t result = 0;
     for (FreeSpaceNode* node = m_freeSpaceSizeMap.first(); node; node = node->successor())
         result += node->m_sizeInBytes;
index 2ce9180..2d753cf 100644 (file)
 
 #include <wtf/Assertions.h>
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/MetaAllocatorHandle.h>
 #include <wtf/Noncopyable.h>
 #include <wtf/PageBlock.h>
 #include <wtf/RedBlackTree.h>
 #include <wtf/RefCounted.h>
 #include <wtf/RefPtr.h>
-#include <wtf/SpinLock.h>
 
 namespace WTF {
 
@@ -183,7 +183,7 @@ private:
     size_t m_bytesReserved;
     size_t m_bytesCommitted;
     
-    SpinLock m_lock;
+    Lock m_lock;
 
     MetaAllocatorTracker* m_tracker;
 
index 90944e6..58604b0 100644 (file)
 
 namespace WTF {
 
+// SpinLock is a very simple lock implementation that has extremely fast lock/unlock for very small
+// uncontended critical sections. However, its performance degrades badly when the lock becomes
+// contended: a thread trying to acquire the lock simply wastes CPU cycles spinning.
+//
+// For most (all?) locking use cases, it's better to use Lock (see wtf/Lock.h). That uses only a bit
+// more memory (8 bytes instead of 4 on 64-bit), and is only a bit slower in the uncontended case
+// (Lock needs CAS to unlock, while SpinLock doesn't), but will burn a lot less CPU time - for 10
+// threads acquiring a 50 microsecond critical section, Lock will use up to 100x less kernel CPU time
+// than SpinLock.
+
 // SpinLockBase is a struct without explicitly defined constructors so that
 // it can be initialized at compile time. See StaticSpinLock below.
 struct SpinLockBase {
index 106ed96..92e20e4 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, 2009 Apple Inc. All rights reserved.
+ * Copyright (C) 2007, 2009, 2015 Apple Inc. All rights reserved.
  * Copyright (C) 2007 Justin Haygood (jhaygood@reaktix.com)
  * Copyright (C) 2011 Research In Motion Limited. All rights reserved.
  *
index ea4bd43..ab40087 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, 2008 Apple Inc. All rights reserved.
+ * Copyright (C) 2007, 2008, 2015 Apple Inc. All rights reserved.
  * Copyright (C) 2009 Google Inc. All rights reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. All rights reserved.
  *
index bb55b94..19c9a0e 100644 (file)
@@ -27,7 +27,7 @@
 #include "dtoa.h"
 
 #if USE(WEB_THREAD)
-#include "SpinLock.h"
+#include "Lock.h"
 #endif
 
 namespace WTF {
index af595a6..ff4cf23 100644 (file)
@@ -33,7 +33,7 @@
 #include <wtf/unicode/UTF8.h>
 
 #if USE(WEB_THREAD)
-#include "SpinLock.h"
+#include "Lock.h"
 #endif
 
 namespace WTF {
@@ -42,18 +42,18 @@ using namespace Unicode;
 
 #if USE(WEB_THREAD)
 
-class AtomicStringTableLocker : public SpinLockHolder {
+class AtomicStringTableLocker : public LockHolder {
     WTF_MAKE_NONCOPYABLE(AtomicStringTableLocker);
 
-    static StaticSpinLock s_stringTableLock;
+    static StaticLock s_stringTableLock;
 public:
     AtomicStringTableLocker()
-        : SpinLockHolder(&s_stringTableLock)
+        : LockHolder(&s_stringTableLock)
     {
     }
 };
 
-StaticSpinLock AtomicStringTableLocker::s_stringTableLock;
+StaticLock AtomicStringTableLocker::s_stringTableLock;
 
 #else
 
index 9c775f8..f1b9de0 100644 (file)
@@ -1,3 +1,21 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        No new tests because no new behavior.
+
+        * bindings/objc/WebScriptObject.mm:
+        (WebCore::getJSWrapper):
+        (WebCore::addJSWrapper):
+        (WebCore::removeJSWrapper):
+        (WebCore::removeJSWrapperIfRetainCountOne):
+        * platform/ios/wak/WAKWindow.mm:
+        (-[WAKWindow setExposedScrollViewRect:]):
+        (-[WAKWindow exposedScrollViewRect]):
+
 2015-08-06  Alex Christensen  <achristensen@webkit.org>
 
         [Win] Enable all Windows features in CMake
index e9e9e0d..57f3202 100644 (file)
@@ -48,7 +48,7 @@
 #import <runtime/JSLock.h>
 #import <runtime/Completion.h>
-#import <wtf/SpinLock.h>
+#import <wtf/Lock.h>
 #import <wtf/Threading.h>
 #import <wtf/spi/cocoa/NSMapTableSPI.h>
 #import <wtf/text/WTFString.h>
@@ -72,12 +72,12 @@ using JSC::makeSource;
 namespace WebCore {
 
 static NSMapTable* JSWrapperCache;
-static StaticSpinLock spinLock;
+static StaticLock spinLock;
 
 NSObject* getJSWrapper(JSObject* impl)
 {
     ASSERT(isMainThread());
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return nil;
@@ -88,7 +88,7 @@ NSObject* getJSWrapper(JSObject* impl)
 void addJSWrapper(NSObject* wrapper, JSObject* impl)
 {
     ASSERT(isMainThread());
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         JSWrapperCache = createWrapperCache();
@@ -97,7 +97,7 @@ void addJSWrapper(NSObject* wrapper, JSObject* impl)
 
 void removeJSWrapper(JSObject* impl)
 {
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return;
@@ -106,7 +106,7 @@ void removeJSWrapper(JSObject* impl)
 
 static void removeJSWrapperIfRetainCountOne(NSObject* wrapper, JSObject* impl)
 {
-    SpinLockHolder holder(&spinLock);
+    LockHolder holder(&spinLock);
 
     if (!JSWrapperCache)
         return;
index 05442ad..ec8d041 100644 (file)
@@ -200,7 +200,7 @@ CARingBuffer::Error CARingBuffer::store(const AudioBufferList* list, size_t fram
 
 void CARingBuffer::setCurrentFrameBounds(uint64_t startTime, uint64_t endTime)
 {
-    ByteSpinLocker locker(m_currentFrameBoundsLock);
+    LockHolder locker(m_currentFrameBoundsLock);
     uint32_t nextPtr = m_timeBoundsQueuePtr + 1;
     uint32_t index = nextPtr & kGeneralRingTimeBoundsQueueMask;
 
@@ -212,7 +212,7 @@ void CARingBuffer::setCurrentFrameBounds(uint64_t startTime, uint64_t endTime)
 
 void CARingBuffer::getCurrentFrameBounds(uint64_t &startTime, uint64_t &endTime)
 {
-    ByteSpinLocker locker(m_currentFrameBoundsLock);
+    LockHolder locker(m_currentFrameBoundsLock);
     uint32_t curPtr = m_timeBoundsQueuePtr;
     uint32_t index = curPtr & kGeneralRingTimeBoundsQueueMask;
     CARingBuffer::TimeBounds& bounds = m_timeBoundsQueue[index];
index f4a00b8..b230b35 100644 (file)
@@ -29,7 +29,7 @@
 #if ENABLE(WEB_AUDIO) && USE(MEDIATOOLBOX)
 
 #include <runtime/ArrayBuffer.h>
-#include <wtf/ByteSpinLock.h>
+#include <wtf/Lock.h>
 #include <wtf/Vector.h>
 
 typedef struct AudioBufferList AudioBufferList;
@@ -84,7 +84,7 @@ private:
     };
     
     Vector<TimeBounds> m_timeBoundsQueue;
-    ByteSpinLock m_currentFrameBoundsLock;
+    Lock m_currentFrameBoundsLock;
     int32_t m_timeBoundsQueuePtr;
 };
 
index 423d3ee..e80ecfd 100644 (file)
@@ -36,7 +36,7 @@
 #import "WKContentObservation.h"
 #import "WKViewPrivate.h"
 #import <QuartzCore/QuartzCore.h>
-#import <wtf/SpinLock.h>
+#import <wtf/Lock.h>
 
 WEBCORE_EXPORT NSString * const WAKWindowScreenScaleDidChangeNotification = @"WAKWindowScreenScaleDidChangeNotification";
 WEBCORE_EXPORT NSString * const WAKWindowVisibilityDidChangeNotification = @"WAKWindowVisibilityDidChangeNotification";
@@ -56,7 +56,7 @@ static WebEvent *currentEvent = nil;
 static id<OrientationProvider> gOrientationProvider;
 
 @implementation WAKWindow {
-    SpinLock _exposedScrollViewRectLock;
+    Lock _exposedScrollViewRectLock;
     CGRect _exposedScrollViewRect;
 }
 
@@ -358,14 +358,14 @@ static id<OrientationProvider> gOrientationProvider;
 
 - (void)setExposedScrollViewRect:(CGRect)exposedScrollViewRect
 {
-    SpinLockHolder locker(&_exposedScrollViewRectLock);
+    LockHolder locker(&_exposedScrollViewRectLock);
     _exposedScrollViewRect = exposedScrollViewRect;
 }
 
 - (CGRect)exposedScrollViewRect
 {
     {
-        SpinLockHolder locker(&_exposedScrollViewRectLock);
+        LockHolder locker(&_exposedScrollViewRectLock);
         if (!CGRectIsNull(_exposedScrollViewRect))
             return _exposedScrollViewRect;
     }
index cf6dfa9..4387026 100644 (file)
@@ -1,3 +1,21 @@
+2015-08-05  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * WebProcess/WebPage/EventDispatcher.cpp:
+        (WebKit::EventDispatcher::clearQueuedTouchEventsForPage):
+        (WebKit::EventDispatcher::getQueuedTouchEventsForPage):
+        (WebKit::EventDispatcher::touchEvent):
+        (WebKit::EventDispatcher::dispatchTouchEvents):
+        * WebProcess/WebPage/EventDispatcher.h:
+        * WebProcess/WebPage/ViewUpdateDispatcher.cpp:
+        (WebKit::ViewUpdateDispatcher::visibleContentRectUpdate):
+        (WebKit::ViewUpdateDispatcher::dispatchVisibleContentRectUpdate):
+        * WebProcess/WebPage/ViewUpdateDispatcher.h:
+
 2015-08-06  Matt Rajca  <mrajca@apple.com>
 
         Media Session: notify focus manager clients when the playing state of the focused media element changes
index 5f3f7c7..ae3e757 100644 (file)
@@ -172,13 +172,13 @@ void EventDispatcher::wheelEvent(uint64_t pageID, const WebWheelEvent& wheelEven
 #if ENABLE(IOS_TOUCH_EVENTS)
 void EventDispatcher::clearQueuedTouchEventsForPage(const WebPage& webPage)
 {
-    SpinLockHolder locker(&m_touchEventsLock);
+    LockHolder locker(&m_touchEventsLock);
     m_touchEvents.remove(webPage.pageID());
 }
 
 void EventDispatcher::getQueuedTouchEventsForPage(const WebPage& webPage, TouchEventQueue& destinationQueue)
 {
-    SpinLockHolder locker(&m_touchEventsLock);
+    LockHolder locker(&m_touchEventsLock);
     destinationQueue = m_touchEvents.take(webPage.pageID());
 }
 
@@ -186,7 +186,7 @@ void EventDispatcher::touchEvent(uint64_t pageID, const WebKit::WebTouchEvent& t
 {
     bool updateListWasEmpty;
     {
-        SpinLockHolder locker(&m_touchEventsLock);
+        LockHolder locker(&m_touchEventsLock);
         updateListWasEmpty = m_touchEvents.isEmpty();
         auto addResult = m_touchEvents.add(pageID, TouchEventQueue());
         if (addResult.isNewEntry)
@@ -217,7 +217,7 @@ void EventDispatcher::dispatchTouchEvents()
 {
     HashMap<uint64_t, TouchEventQueue> localCopy;
     {
-        SpinLockHolder locker(&m_touchEventsLock);
+        LockHolder locker(&m_touchEventsLock);
         localCopy.swap(m_touchEvents);
     }
 
index 6f74311..ba36239 100644 (file)
@@ -32,9 +32,9 @@
 #include <WebEvent.h>
 #include <memory>
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/Noncopyable.h>
 #include <wtf/RefPtr.h>
-#include <wtf/SpinLock.h>
 #include <wtf/ThreadingPrimitives.h>
 
 namespace WebCore {
@@ -97,7 +97,7 @@ private:
 #endif
     std::unique_ptr<WebCore::WheelEventDeltaTracker> m_recentWheelEventDeltaTracker;
 #if ENABLE(IOS_TOUCH_EVENTS)
-    SpinLock m_touchEventsLock;
+    Lock m_touchEventsLock;
     HashMap<uint64_t, TouchEventQueue> m_touchEvents;
 #endif
 };
index 57f4a76..d9716eb 100644 (file)
@@ -58,7 +58,7 @@ void ViewUpdateDispatcher::visibleContentRectUpdate(uint64_t pageID, const Visib
 {
     bool updateListWasEmpty;
     {
-        SpinLockHolder locker(&m_dataMutex);
+        LockHolder locker(&m_dataMutex);
         updateListWasEmpty = m_latestUpdate.isEmpty();
         auto iterator = m_latestUpdate.find(pageID);
         if (iterator == m_latestUpdate.end())
@@ -78,7 +78,7 @@ void ViewUpdateDispatcher::dispatchVisibleContentRectUpdate()
 {
     HashMap<uint64_t, UpdateData> update;
     {
-        SpinLockHolder locker(&m_dataMutex);
+        LockHolder locker(&m_dataMutex);
         update = WTF::move(m_latestUpdate);
     }
 
index 681704d..a4d02e7 100644 (file)
@@ -30,8 +30,8 @@
 
 #include "VisibleContentRectUpdateInfo.h"
 #include <wtf/HashMap.h>
+#include <wtf/Lock.h>
 #include <wtf/Ref.h>
-#include <wtf/SpinLock.h>
 
 namespace WebKit {
 
@@ -57,7 +57,7 @@ private:
     };
 
     Ref<WorkQueue> m_queue;
-    SpinLock m_dataMutex;
+    Lock m_dataMutex;
     HashMap<uint64_t, UpdateData> m_latestUpdate;
 };
 
index ddb9b29..2280a6f 100644 (file)
@@ -1,3 +1,17 @@
+2015-08-06  Filip Pizlo  <fpizlo@apple.com>
+
+        Lightweight locks should be adaptive
+        https://bugs.webkit.org/show_bug.cgi?id=147545
+
+        Reviewed by Geoffrey Garen.
+
+        * TestWebKitAPI/CMakeLists.txt:
+        * TestWebKitAPI/TestWebKitAPI.vcxproj/TestWebKitAPI.vcxproj:
+        * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
+        * TestWebKitAPI/Tests/WTF/Lock.cpp: Added.
+        (TestWebKitAPI::runLockTest):
+        (TestWebKitAPI::TEST):
+
 2015-08-06  Ryosuke Niwa  <rniwa@webkit.org>
 
         http_server_driver and benchmark_builder should not be in run-benchmark's plan files
index 5e27016..dc2513d 100644 (file)
@@ -75,6 +75,7 @@ set(TestWTF_SOURCES
     ${TESTWEBKITAPI_DIR}/Tests/WTF/HashSet.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/IntegerToStringConversion.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/ListHashSet.cpp
+    ${TESTWEBKITAPI_DIR}/Tests/WTF/Lock.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MD5.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MathExtras.cpp
     ${TESTWEBKITAPI_DIR}/Tests/WTF/MediaTime.cpp
index f40ef9c..62d0ce6 100644 (file)
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?>
+<?xml version="1.0" encoding="utf-8"?>
 <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
   <ItemGroup Label="ProjectConfigurations">
     <ProjectConfiguration Include="DebugSuffix|Win32">
     <ClCompile Include="..\Tests\WTF\HashSet.cpp" />
     <ClCompile Include="..\Tests\WTF\IntegerToStringConversion.cpp" />
     <ClCompile Include="..\Tests\WTF\ListHashSet.cpp" />
+    <ClCompile Include="..\Tests\WTF\Lock.cpp" />
     <ClCompile Include="..\Tests\WTF\MD5.cpp" />
     <ClCompile Include="..\Tests\WTF\MathExtras.cpp" />
     <ClCompile Include="..\Tests\WTF\MediaTime.cpp" />
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
   <ImportGroup Label="ExtensionTargets">
   </ImportGroup>
-</Project>
\ No newline at end of file
+</Project>
index 6b75a1d..9089587 100644 (file)
@@ -11,6 +11,7 @@
                0F139E781A423A6B00F590F5 /* PlatformUtilitiesCocoa.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F139E721A423A2B00F590F5 /* PlatformUtilitiesCocoa.mm */; };
                0F139E791A42457000F590F5 /* PlatformUtilitiesCocoa.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F139E721A423A2B00F590F5 /* PlatformUtilitiesCocoa.mm */; };
                0F3B94A71A77267400DE3272 /* WKWebViewEvaluateJavaScript.mm in Sources */ = {isa = PBXBuildFile; fileRef = 0F3B94A51A77266C00DE3272 /* WKWebViewEvaluateJavaScript.mm */; };
+               0FFC45A61B73EBEB0085BD62 /* Lock.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FFC45A41B73EBE20085BD62 /* Lock.cpp */; };
                1A02C870125D4CFD00E3F4BD /* find.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 1A02C84B125D4A5E00E3F4BD /* find.html */; };
                1A50AA201A2A51FC00F4C345 /* close-from-within-create-page.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 1A50AA1F1A2A4EA500F4C345 /* close-from-within-create-page.html */; };
                1A63479F183D72A4005B1707 /* all-content-in-one-iframe.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 93D3D19B17B1A7B000C7C415 /* all-content-in-one-iframe.html */; };
                0F3B94A51A77266C00DE3272 /* WKWebViewEvaluateJavaScript.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = WKWebViewEvaluateJavaScript.mm; sourceTree = "<group>"; };
                0FC6C4CB141027E0005B7F0C /* RedBlackTree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RedBlackTree.cpp; sourceTree = "<group>"; };
                0FC6C4CE141034AD005B7F0C /* MetaAllocator.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = MetaAllocator.cpp; sourceTree = "<group>"; };
+               0FFC45A41B73EBE20085BD62 /* Lock.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Lock.cpp; sourceTree = "<group>"; };
                14464012167A8305000BD218 /* LayoutUnit.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = LayoutUnit.cpp; sourceTree = "<group>"; };
                14F3B11215E45EAB00210069 /* SaturatedArithmeticOperations.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SaturatedArithmeticOperations.cpp; sourceTree = "<group>"; };
                1A02C84B125D4A5E00E3F4BD /* find.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = find.html; sourceTree = "<group>"; };
                                26B2DFF815BDE599004F691D /* HashSet.cpp */,
                                266FAFD215E5775200F61D5B /* IntegerToStringConversion.cpp */,
                                26300B1716755CD90066886D /* ListHashSet.cpp */,
+                               0FFC45A41B73EBE20085BD62 /* Lock.cpp */,
                                B4039F9C15E6D8B3007255D6 /* MathExtras.cpp */,
                                CD5393C71757BA9700C07123 /* MD5.cpp */,
                                CD5497B315857F0C00B5BC30 /* MediaTime.cpp */,
                                7CCE7EC91A411A7E00447C4C /* RenderedImageFromDOMNode.mm in Sources */,
                                7CCE7ECA1A411A7E00447C4C /* RenderedImageFromDOMRange.mm in Sources */,
                                51CD1C6C1B38CE4300142CA5 /* ModalAlerts.mm in Sources */,
+                               0FFC45A61B73EBEB0085BD62 /* Lock.cpp in Sources */,
                                7CCE7F0E1A411AE600447C4C /* ResizeReversePaginatedWebView.cpp in Sources */,
                                7CCE7F0F1A411AE600447C4C /* ResizeWindowAfterCrash.cpp in Sources */,
                                7CCE7F101A411AE600447C4C /* ResponsivenessTimerDoesntFireEarly.cpp in Sources */,
diff --git a/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp b/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp
new file mode 100644 (file)
index 0000000..2c5168e
--- /dev/null
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include <wtf/Lock.h>
+#include <wtf/Threading.h>
+#include <wtf/ThreadingPrimitives.h>
+
+using namespace WTF;
+
+namespace TestWebKitAPI {
+
+template<typename LockType>
+void runLockTest(unsigned numThreadGroups, unsigned numThreadsPerGroup, unsigned workPerCriticalSection, unsigned numIterations)
+{
+    std::unique_ptr<LockType[]> locks = std::make_unique<LockType[]>(numThreadGroups);
+    std::unique_ptr<double[]> words = std::make_unique<double[]>(numThreadGroups);
+    std::unique_ptr<ThreadIdentifier[]> threads = std::make_unique<ThreadIdentifier[]>(numThreadGroups * numThreadsPerGroup);
+
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;) {
+        words[threadGroupIndex] = 0;
+
+        for (unsigned threadIndex = numThreadsPerGroup; threadIndex--;) {
+            threads[threadGroupIndex * numThreadsPerGroup + threadIndex] = createThread(
+                "Benchmark thread",
+                [threadGroupIndex, &locks, &words, numIterations, workPerCriticalSection] () {
+                    for (unsigned i = numIterations; i--;) {
+                        locks[threadGroupIndex].lock();
+                        for (unsigned j = workPerCriticalSection; j--;)
+                            words[threadGroupIndex]++;
+                        locks[threadGroupIndex].unlock();
+                    }
+                });
+        }
+    }
+
+    for (unsigned threadIndex = numThreadGroups * numThreadsPerGroup; threadIndex--;)
+        waitForThreadCompletion(threads[threadIndex]);
+
+    double expected = 0;
+    for (uint64_t i = static_cast<uint64_t>(numIterations) * workPerCriticalSection * numThreadsPerGroup; i--;)
+        expected++;
+
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;)
+        EXPECT_EQ(expected, words[threadGroupIndex]);
+}
+
+TEST(WTF_Lock, UncontentedShortSection)
+{
+    runLockTest<Lock>(1, 1, 1, 10000000);
+}
+
+TEST(WTF_Lock, UncontentedLongSection)
+{
+    runLockTest<Lock>(1, 1, 10000, 1000);
+}
+
+TEST(WTF_Lock, ContentedShortSection)
+{
+    runLockTest<Lock>(1, 10, 1, 10000000);
+}
+
+TEST(WTF_Lock, ContentedLongSection)
+{
+    runLockTest<Lock>(1, 10, 10000, 10000);
+}
+
+TEST(WTF_Lock, ManyContentedShortSections)
+{
+    runLockTest<Lock>(10, 10, 1, 500000);
+}
+
+TEST(WTF_Lock, ManyContentedLongSections)
+{
+    runLockTest<Lock>(10, 10, 10000, 2000);
+}
+
+} // namespace TestWebKitAPI