Nested parenthesized regular expressions with non-zero minimum counts appear to hang...
authormsaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 18 Jan 2017 01:27:04 +0000 (01:27 +0000)
committermsaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 18 Jan 2017 01:27:04 +0000 (01:27 +0000)
commite1443cd61243a77d5255c4323dfd438ca9f43d8b
tree058c0f643d9c27e9d31d0b94bac5c7a1d05c7821
parent9a3e78685fc79a917fdf47358b254f7ea087199e
Nested parenthesized regular expressions with non-zero minimum counts appear to hang and use lots of memory
https://bugs.webkit.org/show_bug.cgi?id=167125

Reviewed by Filip Pizlo.

JSTests:

* microbenchmarks/regexp-nested-nonzero-min-counted-parens.js: Added.
New test with limits that run slow and take a reasonable amount of memory
before the change and run fast, using little memory with the change.

Source/JavaScriptCore:

Changed Yarr to handle nested parenthesized subexpressions where the minimum count is
not 0 directly in the Yarr interpreter.  Previously we'd factor an expression like
(a|b)+ into (a|b)(a|b)* with special handling for captures.  This factoring was done
using a deep copy that doubled the size of the resulting expresion for each nested
parenthesized subexpression.  Now the Yarr interpreter can directly process a regexp
like (a|b){2,42}.

The parser will allow one level of nested, non-zero minimum, counted parenthesis using
the old copy method.  After one level, it will generate parenthesis terms with a non-zero
minimum.   Such an expression wasn't handled by the Yarr JIT before the change, so this
change isn't a performance regression.

Added a minimum count to the YarrPattern and ByteTerm classes, and then factored that
minimum into the interpreter.  A non-zero minimum is only handled by the Yarr interpreter.
If the Yarr JIT see such a term, it punts back to the interpreter.

* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::backtrackPatternCharacter):
(JSC::Yarr::Interpreter::backtrackPatternCasedCharacter):
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::backtrackCharacterClass):
(JSC::Yarr::Interpreter::matchBackReference):
(JSC::Yarr::Interpreter::backtrackBackReference):
(JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceBegin):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::Interpreter::matchParenthesesTerminalBegin):
(JSC::Yarr::Interpreter::backtrackParenthesesTerminalBegin):
(JSC::Yarr::Interpreter::matchParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::matchParentheticalAssertionEnd):
(JSC::Yarr::Interpreter::backtrackParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::backtrackParentheticalAssertionEnd):
(JSC::Yarr::Interpreter::matchParentheses):
(JSC::Yarr::Interpreter::backtrackParentheses):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::ByteCompiler::atomPatternCharacter):
(JSC::Yarr::ByteCompiler::atomCharacterClass):
(JSC::Yarr::ByteCompiler::atomBackReference):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
* yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm):
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
(JSC::Yarr::YarrGenerator::generatePatternCharacterGreedy):
(JSC::Yarr::YarrGenerator::backtrackPatternCharacterNonGreedy):
(JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
(JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
(JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
(JSC::Yarr::YarrGenerator::generateTerm):
(JSC::Yarr::YarrGenerator::backtrackTerm):
(JSC::Yarr::YarrGenerator::generate):
(JSC::Yarr::YarrGenerator::backtrack):
(JSC::Yarr::YarrGenerator::opCompileParenthesesSubpattern):
* yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::copyTerm):
(JSC::Yarr::YarrPatternConstructor::quantifyAtom):
(JSC::Yarr::YarrPatternConstructor::checkForTerminalParentheses):
(JSC::Yarr::YarrPattern::YarrPattern):
* yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternTerm::quantify):
(JSC::Yarr::YarrPattern::reset):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@210837 268f45cc-cd09-0410-ab3c-d52691b4dbfc
JSTests/ChangeLog
JSTests/microbenchmarks/regexp-nested-nonzero-min-counted-parens.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.h
Source/JavaScriptCore/yarr/YarrJIT.cpp
Source/JavaScriptCore/yarr/YarrPattern.cpp
Source/JavaScriptCore/yarr/YarrPattern.h