Infer constant global variables
authorfpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 20 Nov 2013 05:49:05 +0000 (05:49 +0000)
committerfpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 20 Nov 2013 05:49:05 +0000 (05:49 +0000)
commit20972953ac5808323c35d8205328741b84a7d04e
tree20445808bf7dbfe33d3214cb665e0a1716be26b1
parent5c3b787a6d64d8c76e267e3deda324587ce20a01
Infer constant global variables
https://bugs.webkit.org/show_bug.cgi?id=124464

Source/JavaScriptCore:

Reviewed by Sam Weinig.

All global variables that are candidates for watchpoint-based constant inference (i.e.
not 'const' variables) will now have WatchpointSet's associated with them and those
are used to drive the inference by tracking three states of each variable:

Uninitialized: the variable's value is Undefined and the WatchpointSet state is
    ClearWatchpoint.

Initialized: the variable's value was set to something (could even be explicitly set
    to Undefined) and the WatchpointSet state is IsWatching.

Invalidated: the variable's value was set to something else (could even be the same
    thing as before but the point is that a put operation did execute again) and the
    WatchpointSet is IsInvalidated.

If the compiler tries to compile a GetGlobalVar and the WatchpointSet state is
IsWatching, then the current value of the variable can be folded in place of the get,
and a watchpoint on the variable can be registered.

We handle race conditions between the mutator and compiler by mandating that:

- The mutator changes the WatchpointSet state after executing the put.

- There is no opportunity to install code or call functions between when the mutator
  executes a put and changes the WatchpointSet state.

- The compiler checks the WatchpointSet state prior to reading the value.

The concrete algorithm used by the mutator is:

    1. Store the new value into the variable.
    --- Execute a store-store fence.
    2. Bump the state (ClearWatchpoing becomes IsWatching, IsWatching becomes
       IsInvalidated); the IsWatching->IsInvalidated transition may end up firing
       watchpoints.

The concrete algorithm that the compiler uses is:

    1. Load the state. If it's *not* IsWatching, then give up on constant inference.
    --- Execute a load-load fence.
    2. Load the value of the variable and use that for folding, while also registering
       a DesiredWatchpoint. The various parts of this step can be done in any order.

The desired watchpoint registration will fail if the watchpoint set is already
invalidated. Now consider the following interesting interleavings:

Uninitialized->M1->M2->C1->C2: Compiler sees IsWatching because of the mutator's store
    operation, and the variable is folded. The fencing ensures that C2 sees the value
    stored in M1 - i.e. we fold on the value that will actually be watchpointed. If
    before the compilation is installed the mutator executes another store then we
    will be sure that it will be a complete sequence of M1+M2 since compilations get
    installed at safepoints and never "in the middle" of a put_to_scope. Hence that
    compilation installation will be invalidated. If the M1+M2 sequence happens after
    the code is installed, then the code will be invalidated by triggering a jettison.

Uninitialized->M1->C1->C2->M2: Compiler sees Uninitialized and will not fold. This is
    a sensible outcome since if the compiler read the variable's value, it would have
    seen Undefined.

Uninitialized->C1->C2->M1->M2: Compiler sees Uninitialized and will not fold.
Uninitialized->C1->M1->C2->M2: Compiler sees Uninitialized and will not fold.
Uninitialized->C1->M1->M2->C2: Compiler sees Uninitialized and will not fold.
Uninitialized->M1->C1->M2->C2: Compiler sees Uninitialized and will not fold.

IsWatched->M1->M2->C1->C2: Compiler sees IsInvalidated and will not fold.

IsWatched->M1->C1->C2->M2: Compiler will fold, but will also register a desired
    watchpoint, and that watchpoint will get invalidated before the code is installed.

IsWatched->M1->C1->M2->C2: As above, will fold but the code will get invalidated.
IsWatched->C1->C2->M1->M2: As above, will fold but the code will get invalidated.
IsWatched->C1->M1->C2->M2: As above, will fold but the code will get invalidated.
IsWatched->C1->M1->M2->C2: As above, will fold but the code will get invalidated.

Note that this kind of reasoning shows why having the mutator first bump the state and
then store the new value would be wrong. If we had done that (M1 = bump state, M2 =
execute put) then we could have the following deadly interleavings:

Uninitialized->M1->C1->C2->M2:
Uninitialized->M1->C1->M2->C2: Mutator bumps the state to IsWatched and then the
    compiler folds Undefined, since M2 hasn't executed yet. Although C2 will set the
    watchpoint, M1 didn't notify it - it mearly initiated watching. M2 then stores a
    value other than Undefined, and you're toast.

You could fix this sort of thing by making the Desired Watchpoints machinery more
sophisticated, for example having it track the value that was folded; if the global
variable's value was later found to be different then we could invalidate the
compilation. You could also fix it by having the compiler also check that the value of
the variable is not Undefined before folding. While those all sound great, I decided
to instead just use the right interleaving since that results in less code and feels
more intuitive.

This is a 0.5% speed-up on SunSpider, mostly due to a 20% speed-up on math-cordic.
It's a 0.6% slow-down on LongSpider, mostly due to a 25% slow-down on 3d-cube. This is
because 3d-cube takes global variable assignment slow paths very often. Note that this
3d-cube slow-down doesn't manifest as much in SunSpider (only 6% there). This patch is
also a 1.5% speed-up on V8v7 and a 2.8% speed-up on Octane v1, mostly due to deltablue
(3.7%), richards (4%), and mandreel (26%). This is a 2% speed-up on Kraken, mostly due
to a 17.5% speed-up on imaging-gaussian-blur. Something that really illustrates the
slam-dunk-itude of this patch is the wide range of speed-ups on JSRegress. Casual JS
programming often leads to global-var-based idioms and those variables tend to be
assigned once, leading to excellent constant folding opportunities in an optimizing
JIT. This is very evident in the speed-ups on JSRegress.

* assembler/ARM64Assembler.h:
(JSC::ARM64Assembler::dmbSY):
* assembler/ARMv7Assembler.h:
(JSC::ARMv7Assembler::dmbSY):
* assembler/MacroAssemblerARM64.h:
(JSC::MacroAssemblerARM64::memfence):
* assembler/MacroAssemblerARMv7.h:
(JSC::MacroAssemblerARMv7::load8):
(JSC::MacroAssemblerARMv7::memfence):
* assembler/MacroAssemblerX86.h:
(JSC::MacroAssemblerX86::load8):
(JSC::MacroAssemblerX86::store8):
* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::getUnusedRegister):
(JSC::MacroAssemblerX86Common::store8):
(JSC::MacroAssemblerX86Common::memoryFence):
* assembler/MacroAssemblerX86_64.h:
(JSC::MacroAssemblerX86_64::load8):
(JSC::MacroAssemblerX86_64::store8):
* assembler/X86Assembler.h:
(JSC::X86Assembler::movb_rm):
(JSC::X86Assembler::movzbl_mr):
(JSC::X86Assembler::mfence):
(JSC::X86Assembler::X86InstructionFormatter::threeByteOp):
(JSC::X86Assembler::X86InstructionFormatter::oneByteOp8):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::CodeBlock):
* bytecode/Watchpoint.cpp:
(JSC::WatchpointSet::WatchpointSet):
(JSC::WatchpointSet::add):
(JSC::WatchpointSet::notifyWriteSlow):
* bytecode/Watchpoint.h:
(JSC::WatchpointSet::state):
(JSC::WatchpointSet::isStillValid):
(JSC::WatchpointSet::addressOfSetIsNotEmpty):
* dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::::executeEffects):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::getJSConstantForValue):
(JSC::DFG::ByteCodeParser::getJSConstant):
(JSC::DFG::ByteCodeParser::parseBlock):
* dfg/DFGClobberize.h:
(JSC::DFG::clobberize):
* dfg/DFGFixupPhase.cpp:
(JSC::DFG::FixupPhase::fixupNode):
* dfg/DFGNode.h:
(JSC::DFG::Node::isStronglyProvedConstantIn):
(JSC::DFG::Node::hasIdentifierNumberForCheck):
(JSC::DFG::Node::hasRegisterPointer):
* dfg/DFGNodeFlags.h:
* dfg/DFGNodeType.h:
* dfg/DFGOperations.cpp:
* dfg/DFGOperations.h:
* dfg/DFGPredictionPropagationPhase.cpp:
(JSC::DFG::PredictionPropagationPhase::propagate):
* dfg/DFGSafeToExecute.h:
(JSC::DFG::safeToExecute):
* dfg/DFGSpeculativeJIT.cpp:
(JSC::DFG::SpeculativeJIT::compileNotifyPutGlobalVar):
* dfg/DFGSpeculativeJIT.h:
(JSC::DFG::SpeculativeJIT::callOperation):
* dfg/DFGSpeculativeJIT32_64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
* dfg/DFGSpeculativeJIT64.cpp:
(JSC::DFG::SpeculativeJIT::compile):
* ftl/FTLAbbreviatedTypes.h:
* ftl/FTLAbbreviations.h:
(JSC::FTL::buildFence):
* ftl/FTLCapabilities.cpp:
(JSC::FTL::canCompile):
* ftl/FTLIntrinsicRepository.h:
* ftl/FTLLowerDFGToLLVM.cpp:
(JSC::FTL::LowerDFGToLLVM::compileNode):
(JSC::FTL::LowerDFGToLLVM::compileNotifyPutGlobalVar):
* ftl/FTLOutput.h:
(JSC::FTL::Output::fence):
* jit/JIT.h:
* jit/JITOperations.h:
* jit/JITPropertyAccess.cpp:
(JSC::JIT::emitPutGlobalVar):
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emitSlow_op_put_to_scope):
* jit/JITPropertyAccess32_64.cpp:
(JSC::JIT::emitPutGlobalVar):
(JSC::JIT::emit_op_put_to_scope):
(JSC::JIT::emitSlow_op_put_to_scope):
* llint/LowLevelInterpreter32_64.asm:
* llint/LowLevelInterpreter64.asm:
* llvm/LLVMAPIFunctions.h:
* offlineasm/arm.rb:
* offlineasm/arm64.rb:
* offlineasm/cloop.rb:
* offlineasm/instructions.rb:
* offlineasm/x86.rb:
* runtime/JSGlobalObject.cpp:
(JSC::JSGlobalObject::addGlobalVar):
(JSC::JSGlobalObject::addFunction):
* runtime/JSGlobalObject.h:
(JSC::JSGlobalObject::addVar):
(JSC::JSGlobalObject::addConst):
* runtime/JSScope.cpp:
(JSC::abstractAccess):
* runtime/JSSymbolTableObject.h:
(JSC::symbolTablePut):
(JSC::symbolTablePutWithAttributes):
* runtime/SymbolTable.cpp:
(JSC::SymbolTableEntry::couldBeWatched):
(JSC::SymbolTableEntry::prepareToWatch):
(JSC::SymbolTableEntry::notifyWriteSlow):
* runtime/SymbolTable.h:

LayoutTests:

Reviewed by Sam Weinig.

* js/regress/global-var-const-infer-expected.txt: Added.
* js/regress/global-var-const-infer-fire-from-opt-expected.txt: Added.
* js/regress/global-var-const-infer-fire-from-opt.html: Added.
* js/regress/global-var-const-infer.html: Added.
* js/regress/script-tests/global-var-const-infer-fire-from-opt.js: Added.
(foo):
(setA):
(setB):
(check):
* js/regress/script-tests/global-var-const-infer.js: Added.
(foo):
(check):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@159545 268f45cc-cd09-0410-ab3c-d52691b4dbfc
58 files changed:
LayoutTests/ChangeLog
LayoutTests/js/regress/global-var-const-infer-expected.txt [new file with mode: 0644]
LayoutTests/js/regress/global-var-const-infer-fire-from-opt-expected.txt [new file with mode: 0644]
LayoutTests/js/regress/global-var-const-infer-fire-from-opt.html [new file with mode: 0644]
LayoutTests/js/regress/global-var-const-infer.html [new file with mode: 0644]
LayoutTests/js/regress/script-tests/global-var-const-infer-fire-from-opt.js [new file with mode: 0644]
LayoutTests/js/regress/script-tests/global-var-const-infer.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/assembler/ARM64Assembler.h
Source/JavaScriptCore/assembler/ARMv7Assembler.h
Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h
Source/JavaScriptCore/assembler/MacroAssemblerX86.h
Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h
Source/JavaScriptCore/assembler/X86Assembler.h
Source/JavaScriptCore/bytecode/CodeBlock.cpp
Source/JavaScriptCore/bytecode/Watchpoint.cpp
Source/JavaScriptCore/bytecode/Watchpoint.h
Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h
Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp
Source/JavaScriptCore/dfg/DFGClobberize.h
Source/JavaScriptCore/dfg/DFGFixupPhase.cpp
Source/JavaScriptCore/dfg/DFGNode.h
Source/JavaScriptCore/dfg/DFGNodeFlags.h
Source/JavaScriptCore/dfg/DFGNodeType.h
Source/JavaScriptCore/dfg/DFGOperations.cpp
Source/JavaScriptCore/dfg/DFGOperations.h
Source/JavaScriptCore/dfg/DFGPredictionPropagationPhase.cpp
Source/JavaScriptCore/dfg/DFGSafeToExecute.h
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp
Source/JavaScriptCore/dfg/DFGSpeculativeJIT.h
Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp
Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp
Source/JavaScriptCore/ftl/FTLAbbreviatedTypes.h
Source/JavaScriptCore/ftl/FTLAbbreviations.h
Source/JavaScriptCore/ftl/FTLCapabilities.cpp
Source/JavaScriptCore/ftl/FTLIntrinsicRepository.h
Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp
Source/JavaScriptCore/ftl/FTLOutput.h
Source/JavaScriptCore/jit/JIT.h
Source/JavaScriptCore/jit/JITOperations.h
Source/JavaScriptCore/jit/JITPropertyAccess.cpp
Source/JavaScriptCore/jit/JITPropertyAccess32_64.cpp
Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm
Source/JavaScriptCore/llint/LowLevelInterpreter64.asm
Source/JavaScriptCore/llvm/LLVMAPIFunctions.h
Source/JavaScriptCore/offlineasm/arm.rb
Source/JavaScriptCore/offlineasm/arm64.rb
Source/JavaScriptCore/offlineasm/cloop.rb
Source/JavaScriptCore/offlineasm/instructions.rb
Source/JavaScriptCore/offlineasm/x86.rb
Source/JavaScriptCore/runtime/JSGlobalObject.cpp
Source/JavaScriptCore/runtime/JSGlobalObject.h
Source/JavaScriptCore/runtime/JSScope.cpp
Source/JavaScriptCore/runtime/JSSymbolTableObject.h
Source/JavaScriptCore/runtime/SymbolTable.cpp
Source/JavaScriptCore/runtime/SymbolTable.h