Optimize SharedArrayBuffer in the DFG+FTL
author fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 20 Apr 2017 17:55:44 +0000 (17:55 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 20 Apr 2017 17:55:44 +0000 (17:55 +0000)
commit 958272e6e37f0c02acb5a893a563c71b698c4965
tree d5ccc9df5dc1f7eafe3f3c3610ff1528330028ef
parent 4857c09a7035f8cdcd93fe315db87131a100945c
Optimize SharedArrayBuffer in the DFG+FTL
https://bugs.webkit.org/show_bug.cgi?id=164108

Reviewed by Saam Barati.

JSTests:

Added a fairly comprehensive test of the intrinsics. It creates a function for each possible
combination of type and operation, first exercises it with valid inputs, and then tries a bunch
of erroneous conditions such as out-of-bounds (OOB) accesses.
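
As a rough illustration of that test shape, here is a minimal sketch. It is not the contents of
stress/SharedArrayBuffer-opt.js; the helper names (makeRunner, shouldThrow, extraArgCount) and the
iteration count are assumptions made for this example. Each combination gets its own function so
that each one can be profiled and compiled separately.

```javascript
// Hypothetical sketch of a "one function per (type, operation)" test.
const arrayTypes = [Int8Array, Int16Array, Int32Array, Uint8Array, Uint16Array, Uint32Array];

// Number of operands each Atomics operation takes beyond (array, index).
const extraArgCount = {
    load: 0, add: 1, and: 1, or: 1, sub: 1, xor: 1, exchange: 1, store: 1, compareExchange: 2
};

// Build a fresh function that calls exactly one Atomics operation.
function makeRunner(op) {
    const params = ["array", "index", "a", "b"].slice(0, 2 + extraArgCount[op]);
    return new Function(...params, `return Atomics.${op}(${params.join(", ")});`);
}

// Fail unless thunk throws an instance of errorType.
function shouldThrow(errorType, thunk) {
    try {
        thunk();
    } catch (e) {
        if (e instanceof errorType)
            return;
        throw new Error(`expected ${errorType.name}, got ${e}`);
    }
    throw new Error(`expected ${errorType.name}, but nothing was thrown`);
}

let combinationsChecked = 0;
for (const ArrayType of arrayTypes) {
    for (const op of Object.keys(extraArgCount)) {
        const run = makeRunner(op);
        const array = new ArrayType(new SharedArrayBuffer(64));
        const operands = [1, 2].slice(0, extraArgCount[op]);

        // Use it nicely first, enough times that an optimizing tier may kick in.
        for (let i = 0; i < 1000; ++i)
            run(array, i % array.length, ...operands);

        // Then try erroneous conditions: an OOB index and a non-integer typed array.
        shouldThrow(RangeError, () => run(array, array.length, ...operands));
        shouldThrow(TypeError, () => run(new Float64Array(8), 0, ...operands));
        combinationsChecked++;
    }
}
```

Whether 1000 iterations is enough to reach the DFG or FTL depends on the engine's tier-up
thresholds, so a real stress test would likely pick its own count or use engine options.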

* stress/SharedArrayBuffer-opt.js: Added.
(string_appeared_here.switch):
(string_appeared_here.str):
(runAtomic):
(shouldFail):
(Symbol):
(string_appeared_here.a.of.arrays.m.of.atomics):
* stress/SharedArrayBuffer.js:

Source/JavaScriptCore:

This adds atomics intrinsics to the DFG and wires them through to the DFG and FTL backends. This
was super easy in the FTL since B3 already has comprehensive atomic intrinsics, which are more
powerful than what we need right now. In the DFG backend, I went with an easy-to-write
implementation that just reduces everything to a weak CAS loop. It's very inefficient with
registers (it needs ~8) but it's the DFG backend, so it's not obvious how much we care.
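
To make the "reduce everything to a CAS loop" idea concrete, here is a sketch at the JS level. It
is not the code the DFG backend emits; atomicReadModifyWrite is a hypothetical helper. JS only
exposes a strong CAS (Atomics.compareExchange), whereas a weak CAS may also fail spuriously. The
retry loop has the same shape either way, since any failure just means "reload and try again".

```javascript
// Hypothetical helper: express any read-modify-write operation as a CAS retry loop.
// Returns the value that was in the cell before the update, like Atomics.add and friends.
function atomicReadModifyWrite(array, index, modify) {
    let oldValue = Atomics.load(array, index);
    for (;;) {
        const newValue = modify(oldValue);
        // compareExchange returns the value it actually found in the cell.
        const witnessed = Atomics.compareExchange(array, index, oldValue, newValue);
        if (witnessed === oldValue)
            return oldValue; // The CAS succeeded.
        oldValue = witnessed; // Someone else got there first; retry with what we saw.
    }
}

const cell = new Int32Array(new SharedArrayBuffer(4));
cell[0] = 40;
const previousAdd = atomicReadModifyWrite(cell, 0, (x) => x + 2); // behaves like Atomics.add
const previousXor = atomicReadModifyWrite(cell, 0, (x) => x ^ 0xff); // behaves like Atomics.xor
```

The same loop covers add, and, or, sub, xor and exchange by swapping the modify step, which is
presumably why this approach is easy to write even if it is not the tightest code per operation.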

To make the rare cases easy to handle, I refactored AtomicsObject.cpp so that the operations for
the slow paths can share code with the native functions.

This also fixes register handling in the X86 implementations of CAS in the case that
expectedAndResult is not %rax, and fixes the ARM64 implementation of branchWeakCAS.

I adapted the CascadeLock from WTF/benchmarks/ToyLocks.h as a microbenchmark of lock performance.
This benchmark runs 2.5x faster, in both the contended and uncontended cases, thanks to this
change. It's still about 3x slower than native. I investigated this only a bit. I suspect that
the story will be different in asm.js code, which will get constant-folding of the typed array
backing store by virtue of how it uses lexically scoped variables as pointers to the heap arrays.
It's worth noting that the native lock I was comparing against, the very nicely-tuned
CascadeLock, is at the very high end of lock throughput under virtually all conditions
(uncontended, microcontended, held for a long time). I also compared to WTF::Lock and others, and
the only ones that performed better in this microbenchmark were spinlocks. I don't recommend
using those. So, when I say this is 3x slower than native, I really mean that it's 3x slower than
the fastest native lock that I have in my arsenal.
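
For readers who have not seen a futex-style lock written against Atomics, here is a minimal
three-state sketch. It is an assumption-laden illustration, not the cascadeLock from
workers/sab/worker-resources.js or the CascadeLock in ToyLocks.h: the state names and the exact
slow path are made up for this example. It uses Atomics.notify, the current name of what this
patch calls Atomics.wake.

```javascript
// Hypothetical three-state lock word stored in an Int32Array over a SharedArrayBuffer.
const UNLOCKED = 0;
const LOCKED = 1;
const LOCKED_WITH_WAITERS = 2;

function lock(state, index) {
    // Fast path: a single CAS when nobody holds the lock.
    if (Atomics.compareExchange(state, index, UNLOCKED, LOCKED) === UNLOCKED)
        return;
    lockSlow(state, index);
}

function lockSlow(state, index) {
    // Advertise that there are waiters. If the old value was UNLOCKED we now own the lock,
    // conservatively left in the "with waiters" state.
    while (Atomics.exchange(state, index, LOCKED_WITH_WAITERS) !== UNLOCKED) {
        // Sleep only while the word still says LOCKED_WITH_WAITERS.
        Atomics.wait(state, index, LOCKED_WITH_WAITERS);
    }
}

function unlock(state, index) {
    // Only pay for a wakeup if somebody may be sleeping.
    if (Atomics.exchange(state, index, UNLOCKED) === LOCKED_WITH_WAITERS)
        Atomics.notify(state, index, 1);
}

// Uncontended usage on a single thread.
const lockWord = new Int32Array(new SharedArrayBuffer(4));
lock(lockWord, 0);
const heldState = lockWord[0];
unlock(lockWord, 0);
const releasedState = lockWord[0];
```

In the uncontended case this costs one CAS to lock and one exchange to unlock, so those are the
operations whose compiled speed a lock microbenchmark like this one would mostly be measuring.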

Also worth noting is that I experimented with exposing Atomics.yield(), which uses sched_yield,
as a way of testing if adding a yield loop to the JS cascadeLock would help. It does not help. I
did not investigate why.

* assembler/AbstractMacroAssembler.h:
(JSC::AbstractMacroAssembler::JumpList::append):
* assembler/CPU.h:
(JSC::is64Bit):
(JSC::is32Bit):
* b3/B3Common.h:
(JSC::B3::is64Bit): Deleted.
(JSC::B3::is32Bit): Deleted.
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::appendTrapping):
(JSC::B3::Air::LowerToAir::appendCAS):
(JSC::B3::Air::LowerToAir::appendGeneralAtomic):
* dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::handleIntrinsicCall):
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGDoesGC.cpp:
(JSC::DFG::doesGC):
* dfg/DFGFixupPhase.cpp:
(JSC::DFG::FixupPhase::fixupNode):
* dfg/DFGNode.h:
(JSC::DFG::Node::hasHeapPrediction):
(JSC::DFG::Node::hasArrayMode):
* dfg/DFGNodeType.h:
(JSC::DFG::isAtomicsIntrinsic):
(JSC::DFG::numExtraAtomicsArgs):
* dfg/DFGPredictionPropagationPhase.cpp:
* dfg/DFGSSALoweringPhase.cpp:
(JSC::DFG::SSALoweringPhase::handleNode):
* dfg/DFGSafeToExecute.h:
(JSC::DFG::safeToExecute):
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::loadFromIntTypedArray):
(JSC::DFG::SpeculativeJIT::setIntTypedArrayLoadResult):
(JSC::DFG::SpeculativeJIT::compileGetByValOnIntTypedArray):
(JSC::DFG::SpeculativeJIT::getIntTypedArrayStoreOperand):
(JSC::DFG::SpeculativeJIT::compilePutByValForIntTypedArray):
* dfg/DFGSpeculativeJIT.h:
(JSC::DFG::SpeculativeJIT::callOperation):
* dfg/DFGSpeculativeJIT32_64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
* ftl/FTLAbstractHeapRepository.cpp:
(JSC::FTL::AbstractHeapRepository::decorateFencedAccess):
(JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions):
* ftl/FTLAbstractHeapRepository.h:
* ftl/FTLCapabilities.cpp:
(JSC::FTL::canCompile):
* ftl/FTLLowerDFGToB3.cpp:
(JSC::FTL::DFG::LowerDFGToB3::compileNode):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsReadModifyWrite):
(JSC::FTL::DFG::LowerDFGToB3::compileAtomicsIsLockFree):
(JSC::FTL::DFG::LowerDFGToB3::compileGetByVal):
(JSC::FTL::DFG::LowerDFGToB3::compilePutByVal):
(JSC::FTL::DFG::LowerDFGToB3::pointerIntoTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::loadFromIntTypedArray):
(JSC::FTL::DFG::LowerDFGToB3::storeType):
(JSC::FTL::DFG::LowerDFGToB3::setIntTypedArrayLoadResult):
(JSC::FTL::DFG::LowerDFGToB3::getIntTypedArrayStoreOperand):
(JSC::FTL::DFG::LowerDFGToB3::vmCall):
* ftl/FTLOutput.cpp:
(JSC::FTL::Output::store):
(JSC::FTL::Output::store32As8):
(JSC::FTL::Output::store32As16):
(JSC::FTL::Output::atomicXchgAdd):
(JSC::FTL::Output::atomicXchgAnd):
(JSC::FTL::Output::atomicXchgOr):
(JSC::FTL::Output::atomicXchgSub):
(JSC::FTL::Output::atomicXchgXor):
(JSC::FTL::Output::atomicXchg):
(JSC::FTL::Output::atomicStrongCAS):
* ftl/FTLOutput.h:
(JSC::FTL::Output::store32):
(JSC::FTL::Output::store64):
(JSC::FTL::Output::storePtr):
(JSC::FTL::Output::storeFloat):
(JSC::FTL::Output::storeDouble):
* jit/JITOperations.h:
* runtime/AtomicsObject.cpp:
(JSC::atomicsFuncAdd):
(JSC::atomicsFuncAnd):
(JSC::atomicsFuncCompareExchange):
(JSC::atomicsFuncExchange):
(JSC::atomicsFuncIsLockFree):
(JSC::atomicsFuncLoad):
(JSC::atomicsFuncOr):
(JSC::atomicsFuncStore):
(JSC::atomicsFuncSub):
(JSC::atomicsFuncWait):
(JSC::atomicsFuncWake):
(JSC::atomicsFuncXor):
(JSC::operationAtomicsAdd):
(JSC::operationAtomicsAnd):
(JSC::operationAtomicsCompareExchange):
(JSC::operationAtomicsExchange):
(JSC::operationAtomicsIsLockFree):
(JSC::operationAtomicsLoad):
(JSC::operationAtomicsOr):
(JSC::operationAtomicsStore):
(JSC::operationAtomicsSub):
(JSC::operationAtomicsXor):
* runtime/AtomicsObject.h:

Source/WTF:

Made small changes as part of benchmarking the JS versions of these locks.

* benchmarks/LockSpeedTest.cpp:
* benchmarks/ToyLocks.h:
* wtf/Range.h:
(WTF::Range::dump):

LayoutTests:

Add a test of futex performance.

* workers/sab/cascade_lock-worker.js: Added.
(onmessage):
* workers/sab/cascade_lock.html: Added.
* workers/sab/worker-resources.js:
(cascadeLockSlow):
(cascadeLock):
(cascadeUnlock):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@215565 268f45cc-cd09-0410-ab3c-d52691b4dbfc
46 files changed:
JSTests/ChangeLog
JSTests/stress/SharedArrayBuffer-opt.js [new file with mode: 0644]
JSTests/stress/SharedArrayBuffer.js
JSTests/stress/atomics-known-int-use.js [new file with mode: 0644]
JSTests/stress/isLockFree.js [new file with mode: 0644]
LayoutTests/ChangeLog
LayoutTests/workers/sab/cascade_lock-expected.txt [new file with mode: 0644]
LayoutTests/workers/sab/cascade_lock-worker.js [new file with mode: 0644]
LayoutTests/workers/sab/cascade_lock.html [new file with mode: 0644]
LayoutTests/workers/sab/worker-resources.js
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/assembler/AbstractMacroAssembler.h
Source/JavaScriptCore/assembler/CPU.h
Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h
Source/JavaScriptCore/b3/B3Common.h
Source/JavaScriptCore/b3/B3LowerToAir.cpp
Source/JavaScriptCore/b3/air/AirAllocateRegistersByGraphColoring.h
Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h
Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp
Source/JavaScriptCore/dfg/DFGClobberize.h
Source/JavaScriptCore/dfg/DFGDoesGC.cpp
Source/JavaScriptCore/dfg/DFGFixupPhase.cpp
Source/JavaScriptCore/dfg/DFGNode.h
Source/JavaScriptCore/dfg/DFGNodeType.h
Source/JavaScriptCore/dfg/DFGPredictionPropagationPhase.cpp
Source/JavaScriptCore/dfg/DFGSSALoweringPhase.cpp
Source/JavaScriptCore/dfg/DFGSafeToExecute.h
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h
Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp
Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp
Source/JavaScriptCore/ftl/FTLAbstractHeapRepository.cpp
Source/JavaScriptCore/ftl/FTLAbstractHeapRepository.h
Source/JavaScriptCore/ftl/FTLCapabilities.cpp
Source/JavaScriptCore/ftl/FTLLowerDFGToB3.cpp
Source/JavaScriptCore/ftl/FTLOutput.cpp
Source/JavaScriptCore/ftl/FTLOutput.h
Source/JavaScriptCore/jit/JITOperations.h
Source/JavaScriptCore/runtime/AtomicsObject.cpp
Source/JavaScriptCore/runtime/AtomicsObject.h
Source/WTF/ChangeLog
Source/WTF/benchmarks/LockSpeedTest.cpp
Source/WTF/benchmarks/ToyLocks.h
Source/WTF/wtf/Range.h