Teach Call ICs how to call Wasm
[WebKit-https.git] / Source / JavaScriptCore / ChangeLog
index 2d24864..790a993 100644 (file)
@@ -1,3 +1,403 @@
+2019-04-04  Saam Barati  <sbarati@apple.com>
+
+        Teach Call ICs how to call Wasm
+        https://bugs.webkit.org/show_bug.cgi?id=196387
+
+        Reviewed by Filip Pizlo.
+
+        This patch teaches JS to call Wasm without going through the native thunk.
+        Currently, we emit a JIT "JS" callee stub which marshals arguments from
+        JS to Wasm. Like the native version of this, this thunk is responsible
+        for saving and restoring the VM's current Wasm context. Instead of emitting
+        an exception handler, we also teach the unwinder how to read the previous
+        Wasm context to restore it as it unwinds past this frame.
+        
+        This patch is straightforward, and leaves some areas for perf improvement:
+        - We can teach the DFG/FTL to directly use the Wasm calling convention when
+          it knows it's calling a single Wasm function. This way we don't shuffle
+          registers to the stack and then back into registers.
+        - We bail out to the slow path for mismatched arity. I opened a bug to
+          optimize arity check failures: https://bugs.webkit.org/show_bug.cgi?id=196564
+        - We bail out to the slow path for Double JSValues flowing into i32 arguments.
+          We should teach this thunk how to do that conversion directly.
+        
+        This patch also refactors the code to explicitly have a single pinned size register.
+        We used to pretend in some places that we could have more than one pinned size register.
+        However, there was other code that just asserted the size was one. This patch just rips
+        out this code since we never moved to having more than one pinned size register. Doing
+        this refactoring cleans up the various places where we set up the size register.
+        
+        This patch is a 50-60% progression on JetStream 2's richards-wasm.
+
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * Sources.txt:
+        * assembler/MacroAssemblerCodeRef.h:
+        (JSC::MacroAssemblerCodeRef::operator=):
+        (JSC::MacroAssemblerCodeRef::MacroAssemblerCodeRef):
+        * interpreter/Interpreter.cpp:
+        (JSC::UnwindFunctor::operator() const):
+        (JSC::UnwindFunctor::copyCalleeSavesToEntryFrameCalleeSavesBuffer const):
+        * interpreter/StackVisitor.cpp:
+        (JSC::StackVisitor::Frame::calleeSaveRegistersForUnwinding):
+        (JSC::StackVisitor::Frame::calleeSaveRegisters): Deleted.
+        * interpreter/StackVisitor.h:
+        * jit/JITOperations.cpp:
+        * jit/RegisterSet.cpp:
+        (JSC::RegisterSet::runtimeTagRegisters):
+        (JSC::RegisterSet::specialRegisters):
+        (JSC::RegisterSet::runtimeRegisters): Deleted.
+        * jit/RegisterSet.h:
+        * jit/Repatch.cpp:
+        (JSC::linkPolymorphicCall):
+        * runtime/JSFunction.cpp:
+        (JSC::getCalculatedDisplayName):
+        * runtime/JSGlobalObject.cpp:
+        (JSC::JSGlobalObject::init):
+        (JSC::JSGlobalObject::visitChildren):
+        * runtime/JSGlobalObject.h:
+        (JSC::JSGlobalObject::jsToWasmICCalleeStructure const):
+        * runtime/VM.cpp:
+        (JSC::VM::VM):
+        * runtime/VM.h:
+        * wasm/WasmAirIRGenerator.cpp:
+        (JSC::Wasm::AirIRGenerator::AirIRGenerator):
+        (JSC::Wasm::AirIRGenerator::restoreWebAssemblyGlobalState):
+        (JSC::Wasm::AirIRGenerator::addCallIndirect):
+        * wasm/WasmB3IRGenerator.cpp:
+        (JSC::Wasm::B3IRGenerator::B3IRGenerator):
+        (JSC::Wasm::B3IRGenerator::restoreWebAssemblyGlobalState):
+        (JSC::Wasm::B3IRGenerator::addCallIndirect):
+        * wasm/WasmBinding.cpp:
+        (JSC::Wasm::wasmToWasm):
+        * wasm/WasmContext.h:
+        (JSC::Wasm::Context::pointerToInstance):
+        * wasm/WasmContextInlines.h:
+        (JSC::Wasm::Context::store):
+        * wasm/WasmMemoryInformation.cpp:
+        (JSC::Wasm::getPinnedRegisters):
+        (JSC::Wasm::PinnedRegisterInfo::get):
+        (JSC::Wasm::PinnedRegisterInfo::PinnedRegisterInfo):
+        * wasm/WasmMemoryInformation.h:
+        (JSC::Wasm::PinnedRegisterInfo::toSave const):
+        * wasm/WasmOMGPlan.cpp:
+        (JSC::Wasm::OMGPlan::work):
+        * wasm/js/JSToWasm.cpp:
+        (JSC::Wasm::createJSToWasmWrapper):
+        * wasm/js/JSToWasmICCallee.cpp: Added.
+        (JSC::JSToWasmICCallee::create):
+        (JSC::JSToWasmICCallee::createStructure):
+        (JSC::JSToWasmICCallee::visitChildren):
+        * wasm/js/JSToWasmICCallee.h: Added.
+        (JSC::JSToWasmICCallee::function):
+        (JSC::JSToWasmICCallee::JSToWasmICCallee):
+        * wasm/js/WebAssemblyFunction.cpp:
+        (JSC::WebAssemblyFunction::useTagRegisters const):
+        (JSC::WebAssemblyFunction::calleeSaves const):
+        (JSC::WebAssemblyFunction::usedCalleeSaveRegisters const):
+        (JSC::WebAssemblyFunction::previousInstanceOffset const):
+        (JSC::WebAssemblyFunction::previousInstance):
+        (JSC::WebAssemblyFunction::jsCallEntrypointSlow):
+        (JSC::WebAssemblyFunction::visitChildren):
+        (JSC::WebAssemblyFunction::destroy):
+        * wasm/js/WebAssemblyFunction.h:
+        * wasm/js/WebAssemblyFunctionHeapCellType.cpp: Added.
+        (JSC::WebAssemblyFunctionDestroyFunc::operator() const):
+        (JSC::WebAssemblyFunctionHeapCellType::WebAssemblyFunctionHeapCellType):
+        (JSC::WebAssemblyFunctionHeapCellType::~WebAssemblyFunctionHeapCellType):
+        (JSC::WebAssemblyFunctionHeapCellType::finishSweep):
+        (JSC::WebAssemblyFunctionHeapCellType::destroy):
+        * wasm/js/WebAssemblyFunctionHeapCellType.h: Added.
+        * wasm/js/WebAssemblyPrototype.h:
+
+2019-04-04  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        [JSC] Pass CodeOrigin to FuzzerAgent
+        https://bugs.webkit.org/show_bug.cgi?id=196590
+
+        Reviewed by Saam Barati.
+
+        Pass CodeOrigin instead of bytecodeIndex. CodeOrigin includes richer information (InlineCallFrame*).
+        We also mask prediction with SpecBytecodeTop in DFGByteCodeParser. The fuzzer can produce any SpeculatedTypes,
+        but DFGByteCodeParser should only see predictions that can be actually produced from the bytecode execution.
+
+        * dfg/DFGByteCodeParser.cpp:
+        (JSC::DFG::ByteCodeParser::getPredictionWithoutOSRExit):
+        * runtime/FuzzerAgent.cpp:
+        (JSC::FuzzerAgent::getPrediction):
+        * runtime/FuzzerAgent.h:
+        * runtime/RandomizingFuzzerAgent.cpp:
+        (JSC::RandomizingFuzzerAgent::getPrediction):
+        * runtime/RandomizingFuzzerAgent.h:
+
+2019-04-04  Caio Lima  <ticaiolima@gmail.com>
+
+        [JSC] We should consider moving UnlinkedFunctionExecutable::m_parentScopeTDZVariables to RareData
+        https://bugs.webkit.org/show_bug.cgi?id=194944
+
+        Reviewed by Keith Miller.
+
+        Based on profile data collected on JetStream2, Speedometer 2 and
+        other benchmarks, it is very rare to have a non-empty
+        UnlinkedFunctionExecutable::m_parentScopeTDZVariables.
+
+        - Data collected from Speedometer2
+            Total number of UnlinkedFunctionExecutable: 39463
+            Total number of non-empty parentScopeTDZVars: 428 (~1%)
+
+        - Data collected from JetStream2
+            Total number of UnlinkedFunctionExecutable: 83715
+            Total number of non-empty parentScopeTDZVars: 5285 (~6%)
+
+        We also collected numbers on 6 of the top 10 Alexa sites.
+
+        - Data collected from youtube.com
+            Total number of UnlinkedFunctionExecutable: 29599
+            Total number of non-empty parentScopeTDZVars: 97 (~0.3%)
+
+        - Data collected from twitter.com
+            Total number of UnlinkedFunctionExecutable: 23774
+            Total number of non-empty parentScopeTDZVars: 172 (~0.7%)
+
+        - Data collected from google.com
+            Total number of UnlinkedFunctionExecutable: 33209
+            Total number of non-empty parentScopeTDZVars: 174 (~0.5%)
+
+        - Data collected from amazon.com:
+            Total number of UnlinkedFunctionExecutable: 15182
+            Total number of non-empty parentScopeTDZVars: 166 (~1%)
+
+        - Data collected from facebook.com:
+            Total number of UnlinkedFunctionExecutable: 54443
+            Total number of non-empty parentScopeTDZVars: 269 (~0.4%)
+
+        - Data collected from netflix.com:
+            Total number of UnlinkedFunctionExecutable: 39266
+            Total number of non-empty parentScopeTDZVars: 97 (~0.2%)
+
+        Considering such numbers, this patch is moving `m_parentScopeTDZVariables`
+        to RareData. This decreases sizeof(UnlinkedFunctionExecutable) by
+        16 bytes. With this change, UnlinkedFunctionExecutable constructors now
+        receive an `Optional<VariableEnvironmentMap::Handle>` and only store
+        it when `value != WTF::nullopt`. We also changed
+        UnlinkedFunctionExecutable::parentScopeTDZVariables() so that it returns
+        `VariableEnvironment()` whenever the Executable doesn't have RareData,
+        or VariableEnvironmentMap::Handle is uninitialized. This is required
+        because RareData is instantiated when any of its fields is stored, and
+        we can have an uninitialized `Handle` even in cases when parentScopeTDZVariables
+        is `WTF::nullopt`.
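+
+        The accessor behavior described above can be sketched roughly as follows.
+        This is a simplified illustration, not the code in this patch: `Executable`,
+        `VariableSet`, `setOtherRareField` and `hasRareData` are hypothetical
+        stand-ins, and std::optional/std::unique_ptr are used in place of the
+        WTF types.
+

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for JSC's VariableEnvironment.
using VariableSet = std::set<std::string>;

class Executable {
public:
    // Rare data is only allocated when the caller actually passes a value.
    explicit Executable(std::optional<VariableSet> parentScopeTDZVariables)
    {
        if (parentScopeTDZVariables)
            ensureRareData().parentScopeTDZVariables = std::move(*parentScopeTDZVariables);
    }

    // Hypothetical unrelated rare field: storing it also allocates rare data,
    // leaving parentScopeTDZVariables unset inside it.
    void setOtherRareField(std::string value)
    {
        ensureRareData().otherRareField = std::move(value);
    }

    // Returns an empty set when there is no rare data, or when rare data
    // exists only because of another field.
    VariableSet parentScopeTDZVariables() const
    {
        if (!m_rareData || !m_rareData->parentScopeTDZVariables)
            return VariableSet();
        return *m_rareData->parentScopeTDZVariables;
    }

    bool hasRareData() const { return static_cast<bool>(m_rareData); }

private:
    struct RareData {
        std::optional<VariableSet> parentScopeTDZVariables;
        std::string otherRareField;
    };

    RareData& ensureRareData()
    {
        if (!m_rareData)
            m_rareData = std::make_unique<RareData>();
        return *m_rareData;
    }

    std::unique_ptr<RareData> m_rareData;
};
```

+        The second check in the accessor is the case the paragraph above calls out:
+        rare data can exist without the TDZ variables ever having been stored.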
+
+        Memory usage results on JetStream2 are neutral.
+
+            Mean of memory peak on ToT: 4258633728 bytes (confidence interval: 249720072.95)
+            Mean of memory peak on Changes: 4367325184 bytes (confidence interval: 321285583.61)
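+
+        One way to read these two lines, assuming the reported confidence interval
+        is a half-width around the mean (the ChangeLog does not say so explicitly):
+        the intervals overlap, and the difference between the means is roughly 2.6%.
+        The `Measurement` type and helper names below are hypothetical.
+

```cpp
#include <cassert>
#include <cmath>

// A mean together with the half-width of its confidence interval (assumed).
struct Measurement {
    double mean;
    double halfWidth;
};

// Two intervals [mean - h, mean + h] overlap exactly when the distance
// between the means is at most the sum of the half-widths.
bool intervalsOverlap(const Measurement& a, const Measurement& b)
{
    return std::fabs(a.mean - b.mean) <= a.halfWidth + b.halfWidth;
}

// Relative change of the mean, as a fraction of the baseline.
double relativeChange(const Measurement& before, const Measurement& after)
{
    return (after.mean - before.mean) / before.mean;
}
```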
+
+        * builtins/BuiltinExecutables.cpp:
+        (JSC::BuiltinExecutables::createExecutable):
+        * bytecode/UnlinkedFunctionExecutable.cpp:
+        (JSC::UnlinkedFunctionExecutable::UnlinkedFunctionExecutable):
+        * bytecode/UnlinkedFunctionExecutable.h:
+        * bytecompiler/BytecodeGenerator.cpp:
+        (JSC::BytecodeGenerator::getVariablesUnderTDZ):
+
+        BytecodeGenerator::getVariablesUnderTDZ now also caches if m_cachedVariablesUnderTDZ
+        is empty, so we can properly return `WTF::nullopt` without the
+        reconstruction of a VariableEnvironment to check if it is empty.
+
+        * bytecompiler/BytecodeGenerator.h:
+        (JSC::BytecodeGenerator::makeFunction):
+        * parser/VariableEnvironment.h:
+        (JSC::VariableEnvironment::isEmpty const):
+        * runtime/CachedTypes.cpp:
+        (JSC::CachedCompactVariableMapHandle::decode const):
+
+        It returns an uninitialized Handle when there is no
+        CompactVariableEnvironment. This can happen when RareData is ensured
+        because of another field.
+
+        (JSC::CachedFunctionExecutableRareData::encode):
+        (JSC::CachedFunctionExecutableRareData::decode const):
+        (JSC::CachedFunctionExecutable::encode):
+        (JSC::CachedFunctionExecutable::decode const):
+        (JSC::UnlinkedFunctionExecutable::UnlinkedFunctionExecutable):
+        * runtime/CodeCache.cpp:
+
+        Instead of creating a dummyVariablesUnderTDZ, we simply pass
+        WTF::nullopt.
+
+        (JSC::CodeCache::getUnlinkedGlobalFunctionExecutable):
+
+2019-04-04  Tadeu Zagallo  <tzagallo@apple.com>
+
+        Cache bytecode for jsc.cpp helpers and fix CachedStringImpl
+        https://bugs.webkit.org/show_bug.cgi?id=196409
+
+        Reviewed by Saam Barati.
+
+        Some of the helpers in jsc.cpp, such as `functionRunString`, were still using
+        `makeSource` instead of `jscSource`, which does not use the ShellSourceProvider
+        and therefore does not write the bytecode cache to disk.
+
+        Changing that revealed a bug in bytecode cache. The Encoder keeps a mapping
+        of pointers to offsets of already cached objects, in order to avoid caching
+        the same object twice. Similarly, the Decoder keeps a mapping from offsets
+        to pointers, in order to avoid creating multiple objects in memory for the
+        same cached object. The following was happening:
+        1) A StringImpl* S was cached as CachedPtr<CachedStringImpl> at offset O. We add
+        an entry in the Encoder mapping that S has already been encoded at O.
+        2) We cache StringImpl* S again, but now as CachedPtr<CachedUniquedStringImpl>.
+        We find an entry in the Encoder mapping for S, and return the offset O. However,
+        the object cached at O is a CachedPtr<CachedStringImpl> (i.e. not Uniqued).
+
+        3) When decoding, there are 2 possibilities:
+        3.1) We find S for the first time through a CachedPtr<CachedStringImpl>. In
+        this case, everything works as expected since we add an entry in the decoder
+        mapping from the offset O to the decoded StringImpl* S. The next time we find
+        S through the uniqued version, we'll return the already decoded S.
+        3.2) We find S through a CachedPtr<CachedUniquedStringImpl>. Now we have a
+        problem, since the CachedPtr has the offset of a CachedStringImpl (not uniqued),
+        which has a different shape and we crash.
+
+        We fix this by making CachedStringImpl and CachedUniquedStringImpl share the
+        same implementation. Since it doesn't matter whether a string is uniqued for
+        encoding, and we always decode strings as uniqued either way, they can be used
+        interchangeably.
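+
+        A minimal sketch of the two mappings, assuming a single record shape for
+        all strings as in the fix. `Encoder` and `Decoder` here are simplified,
+        hypothetical stand-ins (std::string replaces StringImpl, and a vector index
+        replaces a byte offset), not the classes in CachedTypes.cpp.
+

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class Encoder {
public:
    // Encodes each distinct object once and returns the offset of its record.
    std::size_t encodeString(const std::string* string)
    {
        auto it = m_offsets.find(string);
        if (it != m_offsets.end())
            return it->second; // Already cached: reuse the existing record.
        std::size_t offset = m_records.size();
        m_records.push_back(*string);
        m_offsets.emplace(string, offset);
        return offset;
    }

    const std::vector<std::string>& records() const { return m_records; }

private:
    std::unordered_map<const std::string*, std::size_t> m_offsets; // pointer -> offset
    std::vector<std::string> m_records;
};

class Decoder {
public:
    explicit Decoder(std::vector<std::string> records)
        : m_records(std::move(records))
    {
    }

    // Decodes each record once; later lookups return the same object.
    const std::string* decodeString(std::size_t offset)
    {
        auto it = m_decoded.find(offset);
        if (it != m_decoded.end())
            return it->second.get();
        auto decoded = std::make_unique<std::string>(m_records.at(offset));
        const std::string* result = decoded.get();
        m_decoded.emplace(offset, std::move(decoded));
        return result;
    }

private:
    std::vector<std::string> m_records;
    std::unordered_map<std::size_t, std::unique_ptr<std::string>> m_decoded; // offset -> pointer
};
```

+        Because every record has the same shape, it does not matter which kind of
+        reference reaches an offset first. The crash in 3.2 came from the shared
+        offset pointing at a record whose layout the reader did not expect.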
+
+        * jsc.cpp:
+        (functionRunString):
+        (functionLoadString):
+        (functionDollarAgentStart):
+        (functionCheckModuleSyntax):
+        (runInteractive):
+        * runtime/CachedTypes.cpp:
+        (JSC::CachedUniquedStringImplBase::decode const):
+        (JSC::CachedFunctionExecutable::rareData const):
+        (JSC::CachedCodeBlock::rareData const):
+        (JSC::CachedFunctionExecutable::encode):
+        (JSC::CachedCodeBlock<CodeBlockType>::encode):
+        (JSC::CachedUniquedStringImpl::encode): Deleted.
+        (JSC::CachedUniquedStringImpl::decode const): Deleted.
+        (JSC::CachedStringImpl::encode): Deleted.
+        (JSC::CachedStringImpl::decode const): Deleted.
+
+2019-04-04  Tadeu Zagallo  <tzagallo@apple.com>
+
+        UnlinkedCodeBlock constructor from cache should initialize m_didOptimize
+        https://bugs.webkit.org/show_bug.cgi?id=196396
+
+        Reviewed by Saam Barati.
+
+        The UnlinkedCodeBlock constructor in CachedTypes was missing the initialization
+        for m_didOptimize, which leads to crashes in CodeBlock::thresholdForJIT.
+
+        * runtime/CachedTypes.cpp:
+        (JSC::UnlinkedCodeBlock::UnlinkedCodeBlock):
+
+2019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        Unreviewed, rolling in r243843 with the build fix
+        https://bugs.webkit.org/show_bug.cgi?id=196586
+
+        * runtime/Options.cpp:
+        (JSC::recomputeDependentOptions):
+        * runtime/Options.h:
+        * runtime/RandomizingFuzzerAgent.cpp:
+        (JSC::RandomizingFuzzerAgent::getPrediction):
+
+2019-04-03  Ryan Haddad  <ryanhaddad@apple.com>
+
+        Unreviewed, rolling out r243843.
+
+        Broke CLoop and Windows builds.
+
+        Reverted changeset:
+
+        "[JSC] Add dump feature for RandomizingFuzzerAgent"
+        https://bugs.webkit.org/show_bug.cgi?id=196586
+        https://trac.webkit.org/changeset/243843
+
+2019-04-03  Robin Morisset  <rmorisset@apple.com>
+
+        B3 should use associativity to optimize expression trees
+        https://bugs.webkit.org/show_bug.cgi?id=194081
+
+        Reviewed by Filip Pizlo.
+
+        This patch adds a new B3 pass that tries to find and optimize expression trees made purely of any one associative and commutative operator (Add/Mul/BitOr/BitAnd/BitXor).
+        The pass only runs in O2, and runs once, after lowerMacros and just before a run of B3ReduceStrength (which helps clean up the dead code it tends to leave behind).
+        I had to separate killDeadCode out of B3ReduceStrength (as a new B3EliminateDeadCode pass) to run it before B3OptimizeAssociativeExpressionTrees, as otherwise it is stopped by high use counts
+        inherited from CSE.
+        This extra run of DCE is by itself a win, most notably on microbenchmarks/instanceof-always-hit-two (1.5x faster), and on microbenchmarks/licm-dragons(-out-of-bounds) (both get 1.16x speedup).
+        I suspect it is because it runs between CSE and tail-dedup, and as a result allows a lot more tail-dedup to occur.
+
+        The pass is currently extremely conservative, not trying anything if it would cause _any_ code duplication.
+        For this purpose, it starts by computing use counts for the potentially interesting nodes (those with the right opcodes), and segregates them into expression trees.
+        The root of an expression tree is a node that is either used in multiple places, or is used by a value with a different opcode.
+        The leaves of an expression tree are nodes that are either used in multiple places, or have a different opcode.
+        All constant leaves of a tree are combined, as well as all leaves that are identical. What remains is then laid out into a balanced binary tree, hopefully maximizing ILP.
+
+        This optimization was implemented as a stand-alone pass and not as part of B3ReduceStrength mostly because it needs use counts to avoid code duplication.
+        It also benefits from finding all tree roots first, and not trying to repeatedly optimize subtrees.
+
+        I added several tests to testB3 with varying patterns of trees. It is also tested in a less focused way by lots of older tests.
+
+        In the future this pass could be expanded to allow some bounded amount of code duplication, and merging more leaves (e.g. Mul(a, 3) and a in an Add tree, into Mul(a, 4)).
+        The latter will need exposing the peephole optimizations out of B3ReduceStrength to avoid duplicating code.
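+
+        The core idea (collect the leaves, combine the constants, lay the rest out
+        as a balanced tree) can be sketched on a toy IR as below. This is an
+        illustration under simplifying assumptions, not the B3 pass: it handles
+        only Add, ignores use counts, and does not merge identical leaves. `Node`
+        and all function names are hypothetical.
+

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy IR: a node is a constant, an opaque argument, or an Add of two nodes.
struct Node {
    enum Kind { Const, Arg, Add };
    Kind kind = Const;
    uint64_t value = 0; // Const: the constant. Arg: the argument index.
    std::shared_ptr<Node> lhs;
    std::shared_ptr<Node> rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeConst(uint64_t v) { auto n = std::make_shared<Node>(); n->kind = Node::Const; n->value = v; return n; }
NodePtr makeArg(uint64_t i) { auto n = std::make_shared<Node>(); n->kind = Node::Arg; n->value = i; return n; }
NodePtr makeAdd(NodePtr a, NodePtr b) { auto n = std::make_shared<Node>(); n->kind = Node::Add; n->lhs = a; n->rhs = b; return n; }

// Walk the Add tree, folding constant leaves into one sum and keeping the rest.
void collectLeaves(const NodePtr& n, std::vector<NodePtr>& leaves, uint64_t& constantSum)
{
    if (n->kind == Node::Add) {
        collectLeaves(n->lhs, leaves, constantSum);
        collectLeaves(n->rhs, leaves, constantSum);
        return;
    }
    if (n->kind == Node::Const) {
        constantSum += n->value; // Unsigned arithmetic wraps, so this is well defined.
        return;
    }
    leaves.push_back(n);
}

// Build a balanced binary Add tree over leaves[begin, end).
NodePtr buildBalanced(const std::vector<NodePtr>& leaves, std::size_t begin, std::size_t end)
{
    assert(begin < end);
    if (end - begin == 1)
        return leaves[begin];
    std::size_t mid = begin + (end - begin) / 2;
    return makeAdd(buildBalanced(leaves, begin, mid), buildBalanced(leaves, mid, end));
}

NodePtr rebalance(const NodePtr& root)
{
    std::vector<NodePtr> leaves;
    uint64_t constantSum = 0;
    collectLeaves(root, leaves, constantSum);
    // Zero is the neutral element of Add, so drop it unless nothing else remains.
    if (constantSum || leaves.empty())
        leaves.push_back(makeConst(constantSum));
    return buildBalanced(leaves, 0, leaves.size());
}

int depth(const NodePtr& n)
{
    return n->kind == Node::Add ? 1 + std::max(depth(n->lhs), depth(n->rhs)) : 0;
}

uint64_t evaluate(const NodePtr& n, const std::vector<uint64_t>& args)
{
    if (n->kind == Node::Const)
        return n->value;
    if (n->kind == Node::Arg)
        return args.at(n->value);
    return evaluate(n->lhs, args) + evaluate(n->rhs, args);
}
```

+        A left-leaning chain of five Adds over four arguments and two constants
+        comes out as a tree of depth three with a single combined constant, which
+        shortens the dependency chain between the additions.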
+
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * Sources.txt:
+        * b3/B3Common.cpp:
+        (JSC::B3::shouldDumpIR):
+        (JSC::B3::shouldDumpIRAtEachPhase):
+        * b3/B3Common.h:
+        * b3/B3EliminateDeadCode.cpp: Added.
+        (JSC::B3::EliminateDeadCode::run):
+        (JSC::B3::eliminateDeadCode):
+        * b3/B3EliminateDeadCode.h: Added.
+        (JSC::B3::EliminateDeadCode::EliminateDeadCode):
+        * b3/B3Generate.cpp:
+        (JSC::B3::generateToAir):
+        * b3/B3OptimizeAssociativeExpressionTrees.cpp: Added.
+        (JSC::B3::OptimizeAssociativeExpressionTrees::OptimizeAssociativeExpressionTrees):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::neutralElement):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::isAbsorbingElement):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::combineConstants):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::emitValue):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::optimizeRootedTree):
+        (JSC::B3::OptimizeAssociativeExpressionTrees::run):
+        (JSC::B3::optimizeAssociativeExpressionTrees):
+        * b3/B3OptimizeAssociativeExpressionTrees.h: Added.
+        * b3/B3ReduceStrength.cpp:
+        * b3/B3Value.cpp:
+        (JSC::B3::Value::replaceWithIdentity):
+        * b3/testb3.cpp:
+        (JSC::B3::testBitXorTreeArgs):
+        (JSC::B3::testBitXorTreeArgsEven):
+        (JSC::B3::testBitXorTreeArgImm):
+        (JSC::B3::testAddTreeArg32):
+        (JSC::B3::testMulTreeArg32):
+        (JSC::B3::testBitAndTreeArg32):
+        (JSC::B3::testBitOrTreeArg32):
+        (JSC::B3::run):
+
+2019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        [JSC] Add dump feature for RandomizingFuzzerAgent
+        https://bugs.webkit.org/show_bug.cgi?id=196586
+
+        Reviewed by Saam Barati.
+
+        Towards deterministic tests for the results from randomizing fuzzer agent, this patch adds Options::dumpRandomizingFuzzerAgentPredictions, which dumps the generated types.
+        The results look like this:
+
+            getPrediction name:(#C2q9xD),bytecodeIndex:(22),original:(Array),generated:(OtherObj|Array|Float64Array|BigInt|NonIntAsDouble)
+            getPrediction name:(makeUnwriteableUnconfigurableObject#AiEJv1),bytecodeIndex:(14),original:(OtherObj),generated:(Final|Uint8Array|Float64Array|SetObject|WeakSetObject|BigInt|NonIntAsDouble)
+
+        * runtime/Options.cpp:
+        (JSC::recomputeDependentOptions):
+        * runtime/Options.h:
+        * runtime/RandomizingFuzzerAgent.cpp:
+        (JSC::RandomizingFuzzerAgent::getPrediction):
+
 2019-04-03  Myles C. Maxfield  <mmaxfield@apple.com>
 
         -apple-trailing-word is needed for browser detection