[ES6] Implement RegExp sticky flag and related functionality
author msaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 9 Mar 2016 20:11:46 +0000 (20:11 +0000)
committer msaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 9 Mar 2016 20:11:46 +0000 (20:11 +0000)
https://bugs.webkit.org/show_bug.cgi?id=155177

Reviewed by Saam Barati.

Source/JavaScriptCore:

Implemented the ES6 RegExp sticky functionality.

There are two main behavior changes when the sticky flag is specified.
1) Matching starts at lastIndex and lastIndex is updated after the match.
2) The regular expression is only matched from the start position in the string.
See ES6 section 21.2.5.2.2 for details.
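
The two behavior changes above can be illustrated with a minimal sketch. The pattern and input string are made up for illustration and are not taken from the tests in this change. The sketch assumes an engine that supports the 'y' flag.

```javascript
// Illustrative only: the pattern and input are assumptions, not part of this patch.
const re = /abc/y;
const str = "abcabcXabc";

// Collect [matched text, match index, lastIndex after the match] for each exec call.
const seen = [];
re.lastIndex = 0;
let m;
while ((m = re.exec(str)) !== null)
    seen.push([m[0], m.index, re.lastIndex]);

// The sticky loop stops at the "X" at offset 6: the match is attempted only at
// lastIndex, and the failed attempt resets lastIndex to 0.
// A global (non-sticky) match scans ahead and also finds the "abc" at offset 7.
const globalMatches = str.match(/abc/g);
```

Here `seen` holds two entries while `globalMatches` holds three, which is the difference between anchoring at lastIndex and scanning forward.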

Changed both the Yarr interpreter and JIT to not loop to the next character for sticky RegExps.
Updated RegExp exec and match, and stringProtoFuncMatch to handle lastIndex changes.
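
As a rough model of the "do not loop to the next character" change, the sketch below uses a hypothetical outer search loop. The names `search`, `matchAt`, and `matchAb` are invented for illustration and do not correspond to Yarr functions; the real interpreter and JIT are structured differently.

```javascript
// Hypothetical stand-in for a regex engine's outer search loop.
// matchAt(input, pos) returns the end offset of a match that starts exactly
// at pos, or -1 if there is no match at that position.
function search(matchAt, input, start, sticky) {
    for (let pos = start; pos <= input.length; pos++) {
        const end = matchAt(input, pos);
        if (end !== -1)
            return { start: pos, end };
        if (sticky)
            break; // Sticky: give up instead of advancing to the next character.
    }
    return null;
}

// Toy matcher for the literal "ab".
const matchAb = (input, pos) => (input.startsWith("ab", pos) ? pos + 2 : -1);

const nonSticky = search(matchAb, "xxab", 0, false); // scans ahead to offset 2
const stickyAt0 = search(matchAb, "xxab", 0, true);  // fails at offset 0 and stops
const stickyAt2 = search(matchAb, "xxab", 2, true);  // matches at the start offset
```

The only difference between the two modes in this model is the early `break`, which is the idea the commit message describes.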

Restructured the way flags are passed to and through YarrPatterns to use RegExpFlags instead of
individual bools.
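
The idea of passing one flags value instead of individual bools can be sketched as follows. The bit values mirror the RegExpFlags enum as changed in runtime/RegExpKey.h by this patch; `parseFlags`, `flagBits`, and `isSticky` are hypothetical JavaScript helpers written for illustration, loosely modeled on the duplicate-flag check in JSC::regExpFlags, and are not the C++ implementation.

```javascript
// Bit values as they appear in the RegExpFlags enum after this change.
const RegExpFlags = Object.freeze({
    FlagGlobal: 1,
    FlagIgnoreCase: 2,
    FlagMultiline: 4,
    FlagSticky: 8,
    FlagUnicode: 16,
    InvalidFlags: 32,
});

// Hypothetical lookup table from flag character to bit.
const flagBits = new Map([
    ["g", RegExpFlags.FlagGlobal],
    ["i", RegExpFlags.FlagIgnoreCase],
    ["m", RegExpFlags.FlagMultiline],
    ["y", RegExpFlags.FlagSticky],
    ["u", RegExpFlags.FlagUnicode],
]);

// Hypothetical parser: unknown or repeated flag characters yield InvalidFlags.
function parseFlags(text) {
    let flags = 0;
    for (const ch of text) {
        const bit = flagBits.get(ch);
        if (bit === undefined || (flags & bit))
            return RegExpFlags.InvalidFlags;
        flags |= bit;
    }
    return flags;
}

// A single value carries every flag, so callers test bits instead of
// threading one boolean per flag through each constructor.
const isSticky = (flags) => (flags & RegExpFlags.FlagSticky) !== 0;
```

With this shape, adding a new flag means adding one bit rather than widening every function signature that previously took separate booleans.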

Updated tests for 'y' flag and new behavior.

* bytecode/CodeBlock.cpp:
(JSC::regexpToSourceString):
* inspector/ContentSearchUtilities.cpp:
(Inspector::ContentSearchUtilities::findMagicComment):
* runtime/CommonIdentifiers.h:
* runtime/RegExp.cpp:
(JSC::regExpFlags):
(JSC::RegExpFunctionalTestCollector::outputOneTest):
(JSC::RegExp::finishCreation):
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):
* runtime/RegExp.h:
* runtime/RegExpKey.h:
* runtime/RegExpObjectInlines.h:
(JSC::RegExpObject::execInline):
(JSC::RegExpObject::matchInline):
* runtime/RegExpPrototype.cpp:
(JSC::regExpProtoFuncCompile):
(JSC::flagsString):
(JSC::regExpProtoGetterMultiline):
(JSC::regExpProtoGetterSticky):
(JSC::regExpProtoGetterUnicode):
* runtime/StringPrototype.cpp:
(JSC::stringProtoFuncMatch):
* tests/es6.yaml:
* tests/stress/static-getter-in-names.js:
(shouldBe):
* yarr/RegularExpression.cpp:
(JSC::Yarr::RegularExpression::Private::compile):
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::tryConsumeBackReference):
(JSC::Yarr::Interpreter::matchAssertionBOL):
(JSC::Yarr::Interpreter::matchAssertionEOL):
(JSC::Yarr::Interpreter::matchAssertionWordBoundary):
(JSC::Yarr::Interpreter::matchDotStarEnclosure):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::Interpreter::Interpreter):
(JSC::Yarr::ByteCompiler::atomPatternCharacter):
* yarr/YarrInterpreter.h:
(JSC::Yarr::BytecodePattern::BytecodePattern):
(JSC::Yarr::BytecodePattern::estimatedSizeInBytes):
(JSC::Yarr::BytecodePattern::ignoreCase):
(JSC::Yarr::BytecodePattern::multiline):
(JSC::Yarr::BytecodePattern::sticky):
(JSC::Yarr::BytecodePattern::unicode):
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::matchCharacterClass):
(JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
(JSC::Yarr::YarrGenerator::generateAssertionBOL):
(JSC::Yarr::YarrGenerator::generateAssertionEOL):
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
(JSC::Yarr::YarrGenerator::generateDotStarEnclosure):
(JSC::Yarr::YarrGenerator::backtrack):
* yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::YarrPatternConstructor):
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
(JSC::Yarr::YarrPatternConstructor::setupAlternativeOffsets):
(JSC::Yarr::YarrPatternConstructor::optimizeBOL):
(JSC::Yarr::YarrPattern::compile):
(JSC::Yarr::YarrPattern::YarrPattern):
* yarr/YarrPattern.h:
(JSC::Yarr::YarrPattern::reset):
(JSC::Yarr::YarrPattern::nonwordcharCharacterClass):
(JSC::Yarr::YarrPattern::ignoreCase):
(JSC::Yarr::YarrPattern::multiline):
(JSC::Yarr::YarrPattern::sticky):
(JSC::Yarr::YarrPattern::unicode):

LayoutTests:

New and updated tests.

* js/Object-getOwnPropertyNames-expected.txt:
* js/regexp-flags-expected.txt:
* js/regexp-sticky-expected.txt: Added.
* js/regexp-sticky.html: Added.
* js/script-tests/Object-getOwnPropertyNames.js:
* js/script-tests/regexp-flags.js:
(RegExp.prototype.hasOwnProperty): Deleted check for sticky property.
* js/script-tests/regexp-sticky.js: New test.
(asString):
(testStickyExec):
(testStickyMatch):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@197869 268f45cc-cd09-0410-ab3c-d52691b4dbfc

26 files changed:
LayoutTests/ChangeLog
LayoutTests/js/Object-getOwnPropertyNames-expected.txt
LayoutTests/js/regexp-flags-expected.txt
LayoutTests/js/regexp-sticky-expected.txt [new file with mode: 0644]
LayoutTests/js/regexp-sticky.html [new file with mode: 0644]
LayoutTests/js/script-tests/Object-getOwnPropertyNames.js
LayoutTests/js/script-tests/regexp-flags.js
LayoutTests/js/script-tests/regexp-sticky.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/bytecode/CodeBlock.cpp
Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp
Source/JavaScriptCore/runtime/CommonIdentifiers.h
Source/JavaScriptCore/runtime/RegExp.cpp
Source/JavaScriptCore/runtime/RegExp.h
Source/JavaScriptCore/runtime/RegExpKey.h
Source/JavaScriptCore/runtime/RegExpObjectInlines.h
Source/JavaScriptCore/runtime/RegExpPrototype.cpp
Source/JavaScriptCore/runtime/StringPrototype.cpp
Source/JavaScriptCore/tests/es6.yaml
Source/JavaScriptCore/tests/stress/static-getter-in-names.js
Source/JavaScriptCore/yarr/RegularExpression.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.h
Source/JavaScriptCore/yarr/YarrJIT.cpp
Source/JavaScriptCore/yarr/YarrPattern.cpp
Source/JavaScriptCore/yarr/YarrPattern.h

diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog
index 13dc64b..bfdc375 100644
@@ -1,3 +1,24 @@
+2016-03-09  Michael Saboff  <msaboff@apple.com>
+
+        [ES6] Implement RegExp sticky flag and related functionality
+        https://bugs.webkit.org/show_bug.cgi?id=155177
+
+        Reviewed by Saam Barati.
+
+        New and updated tests.
+
+        * js/Object-getOwnPropertyNames-expected.txt:
+        * js/regexp-flags-expected.txt:
+        * js/regexp-sticky-expected.txt: Added.
+        * js/regexp-sticky.html: Added.
+        * js/script-tests/Object-getOwnPropertyNames.js:
+        * js/script-tests/regexp-flags.js:
+        (RegExp.prototype.hasOwnProperty): Deleted check for sticky property.
+        * js/script-tests/regexp-sticky.js: New test.
+        (asString):
+        (testStickyExec):
+        (testStickyMatch):
+
 2016-03-09  Mark Lam  <mark.lam@apple.com>
 
         FunctionExecutable::ecmaName() should not be based on inferredName().
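diff --git a/LayoutTests/js/Object-getOwnPropertyNames-expected.txt b/LayoutTests/js/Object-getOwnPropertyNames-expected.txt
index fd7f682..94646f8 100644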
@@ -56,7 +56,7 @@ PASS getSortedOwnPropertyNames(Number.prototype) is ['constructor', 'toExponenti
 PASS getSortedOwnPropertyNames(Date) is ['UTC', 'length', 'name', 'now', 'parse', 'prototype']
 PASS getSortedOwnPropertyNames(Date.prototype) is ['constructor', 'getDate', 'getDay', 'getFullYear', 'getHours', 'getMilliseconds', 'getMinutes', 'getMonth', 'getSeconds', 'getTime', 'getTimezoneOffset', 'getUTCDate', 'getUTCDay', 'getUTCFullYear', 'getUTCHours', 'getUTCMilliseconds', 'getUTCMinutes', 'getUTCMonth', 'getUTCSeconds', 'getYear', 'setDate', 'setFullYear', 'setHours', 'setMilliseconds', 'setMinutes', 'setMonth', 'setSeconds', 'setTime', 'setUTCDate', 'setUTCFullYear', 'setUTCHours', 'setUTCMilliseconds', 'setUTCMinutes', 'setUTCMonth', 'setUTCSeconds', 'setYear', 'toDateString', 'toGMTString', 'toISOString', 'toJSON', 'toLocaleDateString', 'toLocaleString', 'toLocaleTimeString', 'toString', 'toTimeString', 'toUTCString', 'valueOf']
 PASS getSortedOwnPropertyNames(RegExp) is ['$&', "$'", '$*', '$+', '$1', '$2', '$3', '$4', '$5', '$6', '$7', '$8', '$9', '$_', '$`', 'input', 'lastMatch', 'lastParen', 'leftContext', 'length', 'multiline', 'name', 'prototype', 'rightContext']
-PASS getSortedOwnPropertyNames(RegExp.prototype) is ['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'lastIndex', 'multiline', 'source', 'test', 'toString', 'unicode']
+PASS getSortedOwnPropertyNames(RegExp.prototype) is ['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'lastIndex', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']
 PASS getSortedOwnPropertyNames(Error) is ['length', 'name', 'prototype']
 PASS getSortedOwnPropertyNames(Error.prototype) is ['constructor', 'message', 'name', 'toString']
 PASS getSortedOwnPropertyNames(Math) is ['E','LN10','LN2','LOG10E','LOG2E','PI','SQRT1_2','SQRT2','abs','acos','acosh','asin','asinh','atan','atan2','atanh','cbrt','ceil','clz32','cos','cosh','exp','expm1','floor','fround','hypot','imul','log','log10','log1p','log2','max','min','pow','random','round','sign','sin','sinh','sqrt','tan','tanh','trunc']
diff --git a/LayoutTests/js/regexp-flags-expected.txt b/LayoutTests/js/regexp-flags-expected.txt
index 8c4a803..ddb1020 100644
@@ -27,6 +27,10 @@ unicode flag
 PASS /a/uimg.flags is 'gimu'
 PASS new RegExp('a', 'uimg').flags is 'gimu'
 PASS flags.call({global: true, multiline: true, ignoreCase: true, unicode: true}) is 'gimu'
+sticky flag
+PASS /a/yimg.flags is 'gimy'
+PASS new RegExp('a', 'yimg').flags is 'gimy'
+PASS flags.call({global: true, multiline: true, ignoreCase: true, sticky: true}) is 'gimy'
 PASS successfullyParsed is true
 
 TEST COMPLETE
diff --git a/LayoutTests/js/regexp-sticky-expected.txt b/LayoutTests/js/regexp-sticky-expected.txt
new file mode 100644
index 0000000..0c179e7
--- /dev/null
@@ -0,0 +1,42 @@
+Test for ES6 sticky flag regular expression processing
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS Repeating Pattern
+PASS Test lastIndex resets
+PASS Ignore Case
+PASS Alternates
+PASS BOL Anchored, starting at 0
+PASS BOL Anchored, starting at 1
+PASS EOL Anchored, not at EOL
+PASS EOL Anchored, at EOL
+PASS Lookahead Assertion
+PASS Lookahead Negative Assertion
+PASS Subpatterns - exec
+PASS Subpatterns - match
+PASS Fixed Count
+PASS Greedy
+PASS Non-greedy
+PASS Greedy/Non-greedy
+PASS Counted Range
+PASS Character Classes
+PASS Unmatched Greedy
+PASS Global Flag - exec
+PASS Global Flag - match
+PASS Unicode Flag - Any Character
+PASS Unicode & Ignore Case Flags
+PASS Multiline
+PASS Multiline with BOL Anchor
+PASS "123 1234 ".search(re) is 0
+PASS "123 1234 ".search(re) is 0
+PASS " 123 1234 ".search(re) is -1
+PASS re.test("123 1234 ") is true
+PASS re.lastIndex is 4
+PASS re.test("123 1234 ") is true
+PASS re.lastIndex is 9
+PASS re.test("123 1234 ") is false
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/js/regexp-sticky.html b/LayoutTests/js/regexp-sticky.html
new file mode 100644
index 0000000..4a00ad9
--- /dev/null
@@ -0,0 +1,10 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<script src="../resources/js-test-pre.js"></script>
+</head>
+<body>
+<script src="script-tests/regexp-sticky.js"></script>
+<script src="../resources/js-test-post.js"></script>
+</body>
+</html>
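diff --git a/LayoutTests/js/script-tests/Object-getOwnPropertyNames.js b/LayoutTests/js/script-tests/Object-getOwnPropertyNames.js
index f45d01a..2a17f73 100644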
@@ -65,7 +65,7 @@ var expectedPropertyNamesSet = {
     "Date": "['UTC', 'length', 'name', 'now', 'parse', 'prototype']",
     "Date.prototype": "['constructor', 'getDate', 'getDay', 'getFullYear', 'getHours', 'getMilliseconds', 'getMinutes', 'getMonth', 'getSeconds', 'getTime', 'getTimezoneOffset', 'getUTCDate', 'getUTCDay', 'getUTCFullYear', 'getUTCHours', 'getUTCMilliseconds', 'getUTCMinutes', 'getUTCMonth', 'getUTCSeconds', 'getYear', 'setDate', 'setFullYear', 'setHours', 'setMilliseconds', 'setMinutes', 'setMonth', 'setSeconds', 'setTime', 'setUTCDate', 'setUTCFullYear', 'setUTCHours', 'setUTCMilliseconds', 'setUTCMinutes', 'setUTCMonth', 'setUTCSeconds', 'setYear', 'toDateString', 'toGMTString', 'toISOString', 'toJSON', 'toLocaleDateString', 'toLocaleString', 'toLocaleTimeString', 'toString', 'toTimeString', 'toUTCString', 'valueOf']",
     "RegExp": "['$&', \"$'\", '$*', '$+', '$1', '$2', '$3', '$4', '$5', '$6', '$7', '$8', '$9', '$_', '$`', 'input', 'lastMatch', 'lastParen', 'leftContext', 'length', 'multiline', 'name', 'prototype', 'rightContext']",
-    "RegExp.prototype": "['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'lastIndex', 'multiline', 'source', 'test', 'toString', 'unicode']",
+    "RegExp.prototype": "['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'lastIndex', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']",
     "Error": "['length', 'name', 'prototype']",
     "Error.prototype": "['constructor', 'message', 'name', 'toString']",
     "Math": "['E','LN10','LN2','LOG10E','LOG2E','PI','SQRT1_2','SQRT2','abs','acos','acosh','asin','asinh','atan','atan2','atanh','cbrt','ceil','clz32','cos','cosh','exp','expm1','floor','fround','hypot','imul','log','log10','log1p','log2','max','min','pow','random','round','sign','sin','sinh','sqrt','tan','tanh','trunc']",
diff --git a/LayoutTests/js/script-tests/regexp-flags.js b/LayoutTests/js/script-tests/regexp-flags.js
index 8124635..6dff088 100644
@@ -33,11 +33,7 @@ shouldBe("/a/uimg.flags", "'gimu'");
 shouldBe("new RegExp('a', 'uimg').flags", "'gimu'");
 shouldBe("flags.call({global: true, multiline: true, ignoreCase: true, unicode: true})", "'gimu'");
 
-if (RegExp.prototype.hasOwnProperty('sticky')) {
-  debug("sticky flag");
-  // when the engine supports "sticky", these tests will fail by design.
-  // Hopefully, only the expected output will need updating.
-  shouldBe("/a/yimg.flags", "'gimy'");
-  shouldBe("new RegExp('a', 'yimg').flags", "'gimy'");
-  shouldBe("flags.call({global: true, multiline: true, ignoreCase: true, sticky: true})", "'gimy'");
-}
+debug("sticky flag");
+shouldBe("/a/yimg.flags", "'gimy'");
+shouldBe("new RegExp('a', 'yimg').flags", "'gimy'");
+shouldBe("flags.call({global: true, multiline: true, ignoreCase: true, sticky: true})", "'gimy'");
diff --git a/LayoutTests/js/script-tests/regexp-sticky.js b/LayoutTests/js/script-tests/regexp-sticky.js
new file mode 100644
index 0000000..d3a54d2
--- /dev/null
@@ -0,0 +1,111 @@
+description(
+'Test for ES6 sticky flag regular expression processing'
+);
+
+function asString(o)
+{
+    if (o === null)
+        return "<null>";
+
+    return o.toString();
+}
+
+function testStickyExec(testDescription, re, str, beginLastIndex, expected)
+{
+    re.lastIndex = beginLastIndex;
+
+    let failures = 0;
+
+    for (let iter = 0; iter < expected.length; iter++) {
+        let lastIndexStart = re.lastIndex;
+
+        let result = re.exec(str);
+
+        if (result != expected[iter]) {
+            testFailed(testDescription + ", iteration " + iter + ", from lastIndex: " + lastIndexStart +
+                       ", expected \"" + asString(expected[iter]) + "\", got \"" + asString(result) + "\"");
+            failures++;
+        }
+    }
+
+    if (failures)
+        testFailed(testDescription + " - failed: " + failures + " tests");
+    else
+        testPassed(testDescription);
+}
+
+function testStickyMatch(testDescription, re, str, beginLastIndex, expected)
+{
+    re.lastIndex = beginLastIndex;
+
+    let failures = 0;
+
+    for (var iter = 0; iter < expected.length; iter++) {
+        let lastIndexStart = re.lastIndex;
+
+        let result = str.match(re);
+        let correctResult = false;
+        if (expected[iter] === null || result === null)
+            correctResult = (expected[iter] === result);
+        else if (result.length == expected[iter].length) {
+            correctResult = true;
+            for (let i = 0; i < result.length; i++) {
+                if (result[i] != expected[iter][i])
+                    correctResult = false;
+            }
+        }
+
+        if (!correctResult) {
+            testFailed(testDescription + ", iteration " + iter + ", from lastIndex: " + lastIndexStart +
+                       ", expected \"" + asString(expected[iter]) + "\", got \"" + asString(result) + "\"");
+            failures++;
+        }
+    }
+
+    if (failures)
+        testFailed(testDescription + " - failed: " + failures + " tests");
+    else
+        testPassed(testDescription);
+}
+
+
+testStickyExec("Repeating Pattern", new RegExp("abc", "y"), "abcabcabc", 0, ["abc", "abc", "abc", null]);
+testStickyExec("Test lastIndex resets", /\d/y, "12345", 0, ["1", "2", "3", "4", "5", null, "1", "2", "3", "4", "5", null]);
+testStickyExec("Ignore Case", new RegExp("test", "iy"), "TESTtestTest", 0, ["TEST", "test", "Test", null]);
+testStickyExec("Alternates", new RegExp("Dog |Cat |Mouse ", "y"), "Mouse Dog Cat ", 0, ["Mouse ", "Dog ", "Cat ", null]);
+testStickyExec("BOL Anchored, starting at 0", /^X/y, "XXX", 0, ["X", null]);
+testStickyExec("BOL Anchored, starting at 1", /^X/y, "XXX", 1, [null, "X", null]);
+testStickyExec("EOL Anchored, not at EOL", /#$/y, "##", 0, [null]);
+testStickyExec("EOL Anchored, at EOL", /#$/y, "##", 1, ["#", null]);
+testStickyExec("Lookahead Assertion", /\d+(?=-)/y, "212-555-1212", 0, ["212", null]);
+testStickyExec("Lookahead Negative Assertion", /\d+(?!\d)/y, "212-555-1212", 0, ["212", null]);
+testStickyExec("Subpatterns - exec", /(\d+)(?:-|$)/y, "212-555-1212", 0, ["212-,212", "555-,555", "1212,1212", null]);
+testStickyMatch("Subpatterns - match", /(\d+)(?:-|$)/y, "212-555-1212", 0, [["212-", "212"], ["555-", "555"], ["1212", "1212"], null]);
+testStickyExec("Fixed Count", /\d{4}/y, "123456789", 0, ["1234", "5678", null]);
+testStickyExec("Greedy", /\d*/y, "12345 67890", 0, ["12345", ""]);
+testStickyMatch("Non-greedy", /\w+?./y, "abcdefg", 0, [["ab"], ["cd"], ["ef"], null]);
+testStickyExec("Greedy/Non-greedy", /\s*(\d+)/y, "    1234  324512   74352", 0, ["    1234,1234", "  324512,324512", "   74352,74352", null]);
+testStickyExec("Counted Range", /(\w+\s+){1,3}/y, "The quick brown fox jumped over the ", 0, ["The quick brown ,brown ", "fox jumped over ,over ", "the ,the ", null]);
+testStickyMatch("Character Classes", /[0-9A-F]/iy, "fEEd123X", 0, [["f"], ["E"], ["E"], ["d"], ["1"], ["2"], ["3"], null]);
+testStickyExec("Unmatched Greedy", /^\s*|\s*$/y, "ab", 1, [null]);
+testStickyExec("Global Flag - exec", /\s*(\+|[0-9]+)\s*/gy, "3 + 4", 0, ["3 ,3", "+ ,+", "4,4", null]);
+testStickyMatch("Global Flag - match", /\s*(\+|[0-9]+)\s*/gy, "3 + 4", 0, [["3 ", "+ ", "4"], ["3 ", "+ ", "4"]]);
+testStickyExec("Unicode Flag - Any Character", /./uy, "a@\u{10402}1\u202a\u{12345}", 0, ["a", "@", "\u{10402}", "1", "\u202a", "\u{12345}", null]);
+testStickyMatch("Unicode & Ignore Case Flags", /(?:\u{118c0}|\u{10cb0}|\w):/iuy, "a:\u{118a0}:x:\u{10cb0}", 0, [["a:"], ["\u{118a0}:"], ["x:"], null]);
+testStickyExec("Multiline", /(?:\w+ *)+(?:\n|$)/my, "Line One\nLine Two", 0, ["Line One\n", "Line Two", null]);
+testStickyMatch("Multiline with BOL Anchor", /^\d*\s?/my, "13574\n295\n99", 0, [["13574\n"], ["295\n"], ["99"], null]);
+
+// Verify that String.search starts at 0 even with the sticky flag.
+var re = new RegExp("\\d+\\s", "y");
+shouldBe('"123 1234 ".search(re)', '0');
+shouldBe('"123 1234 ".search(re)', '0');
+// Verify that String.search doesn't advance past 0 with the sticky flag.
+shouldBe('" 123 1234 ".search(re)', '-1');
+
+re.lastIndex = 0;
+shouldBeTrue('re.test("123 1234 ")');
+shouldBe('re.lastIndex', '4');
+shouldBeTrue('re.test("123 1234 ")');
+shouldBe('re.lastIndex', '9');
+shouldBeFalse('re.test("123 1234 ")');
+
diff --git a/Source/JavaScriptCore/ChangeLog b/Source/JavaScriptCore/ChangeLog
index 5a88dc9..485cdef 100644
@@ -1,3 +1,94 @@
+2016-03-09  Michael Saboff  <msaboff@apple.com>
+
+        [ES6] Implement RegExp sticky flag and related functionality
+        https://bugs.webkit.org/show_bug.cgi?id=155177
+
+        Reviewed by Saam Barati.
+
+        Implemented the ES6 RegExp sticky functionality.
+
+        There are two main behavior changes when the sticky flag is specified.
+        1) Matching starts at lastIndex and lastIndex is updated after the match.
+        2) The regular expression is only matched from the start position in the string.
+        See ES6 section 21.2.5.2.2 for details.
+
+        Changed both the Yarr interpreter and JIT to not loop to the next character for sticky RegExps.
+        Updated RegExp exec and match, and stringProtoFuncMatch to handle lastIndex changes.
+
+        Restructured the way flags are passed to and through YarrPatterns to use RegExpFlags instead of
+        individual bools.
+
+        Updated tests for 'y' flag and new behavior.
+
+        * bytecode/CodeBlock.cpp:
+        (JSC::regexpToSourceString):
+        * inspector/ContentSearchUtilities.cpp:
+        (Inspector::ContentSearchUtilities::findMagicComment):
+        * runtime/CommonIdentifiers.h:
+        * runtime/RegExp.cpp:
+        (JSC::regExpFlags):
+        (JSC::RegExpFunctionalTestCollector::outputOneTest):
+        (JSC::RegExp::finishCreation):
+        (JSC::RegExp::compile):
+        (JSC::RegExp::compileMatchOnly):
+        * runtime/RegExp.h:
+        * runtime/RegExpKey.h:
+        * runtime/RegExpObjectInlines.h:
+        (JSC::RegExpObject::execInline):
+        (JSC::RegExpObject::matchInline):
+        * runtime/RegExpPrototype.cpp:
+        (JSC::regExpProtoFuncCompile):
+        (JSC::flagsString):
+        (JSC::regExpProtoGetterMultiline):
+        (JSC::regExpProtoGetterSticky):
+        (JSC::regExpProtoGetterUnicode):
+        * runtime/StringPrototype.cpp:
+        (JSC::stringProtoFuncMatch):
+        * tests/es6.yaml:
+        * tests/stress/static-getter-in-names.js:
+        (shouldBe):
+        * yarr/RegularExpression.cpp:
+        (JSC::Yarr::RegularExpression::Private::compile):
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::tryConsumeBackReference):
+        (JSC::Yarr::Interpreter::matchAssertionBOL):
+        (JSC::Yarr::Interpreter::matchAssertionEOL):
+        (JSC::Yarr::Interpreter::matchAssertionWordBoundary):
+        (JSC::Yarr::Interpreter::matchDotStarEnclosure):
+        (JSC::Yarr::Interpreter::matchDisjunction):
+        (JSC::Yarr::Interpreter::Interpreter):
+        (JSC::Yarr::ByteCompiler::atomPatternCharacter):
+        * yarr/YarrInterpreter.h:
+        (JSC::Yarr::BytecodePattern::BytecodePattern):
+        (JSC::Yarr::BytecodePattern::estimatedSizeInBytes):
+        (JSC::Yarr::BytecodePattern::ignoreCase):
+        (JSC::Yarr::BytecodePattern::multiline):
+        (JSC::Yarr::BytecodePattern::sticky):
+        (JSC::Yarr::BytecodePattern::unicode):
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::matchCharacterClass):
+        (JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
+        (JSC::Yarr::YarrGenerator::generateAssertionBOL):
+        (JSC::Yarr::YarrGenerator::generateAssertionEOL):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
+        (JSC::Yarr::YarrGenerator::generateDotStarEnclosure):
+        (JSC::Yarr::YarrGenerator::backtrack):
+        * yarr/YarrPattern.cpp:
+        (JSC::Yarr::YarrPatternConstructor::YarrPatternConstructor):
+        (JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
+        (JSC::Yarr::YarrPatternConstructor::setupAlternativeOffsets):
+        (JSC::Yarr::YarrPatternConstructor::optimizeBOL):
+        (JSC::Yarr::YarrPattern::compile):
+        (JSC::Yarr::YarrPattern::YarrPattern):
+        * yarr/YarrPattern.h:
+        (JSC::Yarr::YarrPattern::reset):
+        (JSC::Yarr::YarrPattern::nonwordcharCharacterClass):
+        (JSC::Yarr::YarrPattern::ignoreCase):
+        (JSC::Yarr::YarrPattern::multiline):
+        (JSC::Yarr::YarrPattern::sticky):
+        (JSC::Yarr::YarrPattern::unicode):
+
 2016-03-09  Mark Lam  <mark.lam@apple.com>
 
         FunctionExecutable::ecmaName() should not be based on inferredName().
diff --git a/Source/JavaScriptCore/bytecode/CodeBlock.cpp b/Source/JavaScriptCore/bytecode/CodeBlock.cpp
index 73392c5..f436c52 100644
@@ -274,6 +274,10 @@ static CString regexpToSourceString(RegExp* regExp)
         postfix[index++] = 'i';
     if (regExp->multiline())
         postfix[index] = 'm';
+    if (regExp->sticky())
+        postfix[index++] = 'y';
+    if (regExp->unicode())
+        postfix[index++] = 'u';
 
     return toCString("/", regExp->pattern().impl(), postfix);
 }
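diff --git a/Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp b/Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp
index 323d480..aef7812 100644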
@@ -176,7 +176,7 @@ static String findMagicComment(const String& content, const String& patternStrin
 {
     ASSERT(!content.isNull());
     const char* error = nullptr;
-    JSC::Yarr::YarrPattern pattern(patternString, false, true, false, &error);
+    JSC::Yarr::YarrPattern pattern(patternString, JSC::RegExpFlags::FlagMultiline, &error);
     ASSERT(!error);
     BumpPointerAllocator regexAllocator;
     auto bytecodePattern = JSC::Yarr::byteCompile(pattern, &regexAllocator);
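diff --git a/Source/JavaScriptCore/runtime/CommonIdentifiers.h b/Source/JavaScriptCore/runtime/CommonIdentifiers.h
index fd00698..6c04359 100644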
     macro(sourceCode) \
     macro(sourceURL) \
     macro(stack) \
+    macro(sticky) \
     macro(subarray) \
     macro(target) \
     macro(test) \
diff --git a/Source/JavaScriptCore/runtime/RegExp.cpp b/Source/JavaScriptCore/runtime/RegExp.cpp
index 8f9f534..c31cf52 100644
@@ -65,6 +65,12 @@ RegExpFlags regExpFlags(const String& string)
             flags = static_cast<RegExpFlags>(flags | FlagUnicode);
             break;
                 
+        case 'y':
+            if (flags & FlagSticky)
+                return InvalidFlags;
+            flags = static_cast<RegExpFlags>(flags | FlagSticky);
+            break;
+
         default:
             return InvalidFlags;
         }
@@ -98,6 +104,8 @@ void RegExpFunctionalTestCollector::outputOneTest(RegExp* regExp, const String&
             fputc('i', m_file);
         if (regExp->multiline())
             fputc('m', m_file);
+        if (regExp->sticky())
+            fputc('y', m_file);
         if (regExp->unicode())
             fputc('u', m_file);
         fprintf(m_file, "\n");
@@ -214,7 +222,7 @@ RegExp::RegExp(VM& vm, const String& patternString, RegExpFlags flags)
 void RegExp::finishCreation(VM& vm)
 {
     Base::finishCreation(vm);
-    Yarr::YarrPattern pattern(m_patternString, ignoreCase(), multiline(), unicode(), &m_constructionError);
+    Yarr::YarrPattern pattern(m_patternString, m_flags, &m_constructionError);
     if (m_constructionError)
         m_state = ParseError;
     else
@@ -254,7 +262,7 @@ RegExp* RegExp::create(VM& vm, const String& patternString, RegExpFlags flags)
 
 void RegExp::compile(VM* vm, Yarr::YarrCharSize charSize)
 {
-    Yarr::YarrPattern pattern(m_patternString, ignoreCase(), multiline(), unicode(), &m_constructionError);
+    Yarr::YarrPattern pattern(m_patternString, m_flags, &m_constructionError);
     if (m_constructionError) {
         RELEASE_ASSERT_NOT_REACHED();
 #if COMPILER_QUIRK(CONSIDERS_UNREACHABLE_CODE)
@@ -293,7 +301,7 @@ int RegExp::match(VM& vm, const String& s, unsigned startOffset, Vector<int, 32>
 
 void RegExp::compileMatchOnly(VM* vm, Yarr::YarrCharSize charSize)
 {
-    Yarr::YarrPattern pattern(m_patternString, ignoreCase(), multiline(), unicode(), &m_constructionError);
+    Yarr::YarrPattern pattern(m_patternString, m_flags, &m_constructionError);
     if (m_constructionError) {
         RELEASE_ASSERT_NOT_REACHED();
 #if COMPILER_QUIRK(CONSIDERS_UNREACHABLE_CODE)
diff --git a/Source/JavaScriptCore/runtime/RegExp.h b/Source/JavaScriptCore/runtime/RegExp.h
index 9c1983f..ad451f9 100644
@@ -55,6 +55,7 @@ public:
     bool global() const { return m_flags & FlagGlobal; }
     bool ignoreCase() const { return m_flags & FlagIgnoreCase; }
     bool multiline() const { return m_flags & FlagMultiline; }
+    bool sticky() const { return m_flags & FlagSticky; }
     bool unicode() const { return m_flags & FlagUnicode; }
 
     const String& pattern() const { return m_patternString; }
diff --git a/Source/JavaScriptCore/runtime/RegExpKey.h b/Source/JavaScriptCore/runtime/RegExpKey.h
index 557923e..064d338 100644
@@ -38,8 +38,9 @@ enum RegExpFlags {
     FlagGlobal = 1,
     FlagIgnoreCase = 2,
     FlagMultiline = 4,
-    FlagUnicode = 8,
-    InvalidFlags = 16,
+    FlagSticky = 8,
+    FlagUnicode = 16,
+    InvalidFlags = 32,
     DeletedValueFlags = -1
 };
 
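diff --git a/Source/JavaScriptCore/runtime/RegExpObjectInlines.h b/Source/JavaScriptCore/runtime/RegExpObjectInlines.h
index ecc3350..403c196 100644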
@@ -63,10 +63,10 @@ JSValue RegExpObject::execInline(ExecState* exec, JSGlobalObject* globalObject,
     String input = string->value(exec); // FIXME: Handle errors. https://bugs.webkit.org/show_bug.cgi?id=155145
     VM& vm = globalObject->vm();
 
-    bool global = regExp->global();
+    bool globalOrSticky = regExp->global() || regExp->sticky();
 
     unsigned lastIndex;
-    if (global) {
+    if (globalOrSticky) {
         lastIndex = getRegExpObjectLastIndexAsUnsigned(exec, this, input);
         if (lastIndex == UINT_MAX)
             return jsNull();
@@ -77,11 +77,12 @@ JSValue RegExpObject::execInline(ExecState* exec, JSGlobalObject* globalObject,
     JSArray* array =
         createRegExpMatchesArray(vm, globalObject, string, input, regExp, lastIndex, result);
     if (!array) {
-        if (global)
+        if (globalOrSticky)
             setLastIndex(exec, 0);
         return jsNull();
     }
-    if (global)
+
+    if (globalOrSticky)
         setLastIndex(exec, result.end);
     regExpConstructor->recordMatch(vm, regExp, string, result);
     return array;
@@ -95,7 +96,7 @@ MatchResult RegExpObject::matchInline(
     RegExpConstructor* regExpConstructor = globalObject->regExpConstructor();
     String input = string->value(exec); // FIXME: Handle errors. https://bugs.webkit.org/show_bug.cgi?id=155145
     VM& vm = globalObject->vm();
-    if (!regExp->global())
+    if (!regExp->global() && !regExp->sticky())
         return regExpConstructor->performMatch(vm, regExp, string, input, 0);
 
     unsigned lastIndex = getRegExpObjectLastIndexAsUnsigned(exec, this, input);
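diff --git a/Source/JavaScriptCore/runtime/RegExpPrototype.cpp b/Source/JavaScriptCore/runtime/RegExpPrototype.cpp
index 8ecec1b..2ef0699 100644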
@@ -48,6 +48,7 @@ static EncodedJSValue JSC_HOST_CALL regExpProtoFuncSearch(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterGlobal(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterIgnoreCase(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterMultiline(ExecState*);
+static EncodedJSValue JSC_HOST_CALL regExpProtoGetterSticky(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterUnicode(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterSource(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterFlags(ExecState*);
@@ -69,6 +70,7 @@ const ClassInfo RegExpPrototype::s_info = { "RegExp", &RegExpObject::s_info, &re
   global        regExpProtoGetterGlobal     DontEnum|Accessor
   ignoreCase    regExpProtoGetterIgnoreCase DontEnum|Accessor
   multiline     regExpProtoGetterMultiline  DontEnum|Accessor
+  sticky        regExpProtoGetterSticky     DontEnum|Accessor
   unicode       regExpProtoGetterUnicode    DontEnum|Accessor
   source        regExpProtoGetterSource     DontEnum|Accessor
   flags         regExpProtoGetterFlags      DontEnum|Accessor
@@ -154,22 +156,29 @@ EncodedJSValue JSC_HOST_CALL regExpProtoFuncCompile(ExecState* exec)
     return JSValue::encode(jsUndefined());
 }
 
-typedef std::array<char, 4 + 1> FlagsString; // 4 different flags and a null character terminator.
+typedef std::array<char, 5 + 1> FlagsString; // 5 different flags and a null character terminator.
 
 static inline FlagsString flagsString(ExecState* exec, JSObject* regexp)
 {
     FlagsString string;
 
+    VM& vm = exec->vm();
+
     JSValue globalValue = regexp->get(exec, exec->propertyNames().global);
-    if (exec->hadException())
+    if (vm.exception())
         return string;
     JSValue ignoreCaseValue = regexp->get(exec, exec->propertyNames().ignoreCase);
-    if (exec->hadException())
+    if (vm.exception())
         return string;
     JSValue multilineValue = regexp->get(exec, exec->propertyNames().multiline);
-    if (exec->hadException())
+    if (vm.exception())
         return string;
     JSValue unicodeValue = regexp->get(exec, exec->propertyNames().unicode);
+    if (vm.exception())
+        return string;
+    JSValue stickyValue = regexp->get(exec, exec->propertyNames().sticky);
+    if (vm.exception())
+        return string;
 
     unsigned index = 0;
     if (globalValue.toBoolean(exec))
@@ -180,6 +189,8 @@ static inline FlagsString flagsString(ExecState* exec, JSObject* regexp)
         string[index++] = 'm';
     if (unicodeValue.toBoolean(exec))
         string[index++] = 'u';
+    if (stickyValue.toBoolean(exec))
+        string[index++] = 'y';
     ASSERT(index < string.size());
     string[index] = 0;
     return string;
@@ -238,6 +249,15 @@ EncodedJSValue JSC_HOST_CALL regExpProtoGetterMultiline(ExecState* exec)
     return JSValue::encode(jsBoolean(asRegExpObject(thisValue)->regExp()->multiline()));
 }
 
+EncodedJSValue JSC_HOST_CALL regExpProtoGetterSticky(ExecState* exec)
+{
+    JSValue thisValue = exec->thisValue();
+    if (!thisValue.inherits(RegExpObject::info()))
+        return throwVMTypeError(exec);
+    
+    return JSValue::encode(jsBoolean(asRegExpObject(thisValue)->regExp()->sticky()));
+}
+
 EncodedJSValue JSC_HOST_CALL regExpProtoGetterUnicode(ExecState* exec)
 {
     JSValue thisValue = exec->thisValue();
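
The flagsString() change above appends 'y' after 'u' and grows the buffer by one character. A minimal standalone sketch of that logic, assuming plain bools in place of the JSValue property lookups (makeFlagsString is a hypothetical name, not part of the patch):

```cpp
#include <array>
#include <cassert>

// Five possible flags plus the null terminator, as in the patched typedef.
using FlagsString = std::array<char, 5 + 1>;

// Hypothetical helper: takes already-converted booleans instead of reading
// properties from a RegExp object through an ExecState.
FlagsString makeFlagsString(bool global, bool ignoreCase, bool multiline, bool unicode, bool sticky)
{
    FlagsString string{};
    unsigned index = 0;
    if (global)
        string[index++] = 'g';
    if (ignoreCase)
        string[index++] = 'i';
    if (multiline)
        string[index++] = 'm';
    if (unicode)
        string[index++] = 'u';
    if (sticky)
        string[index++] = 'y';
    // With all five flags set, index is 5, which still leaves room for the terminator.
    assert(index < string.size());
    string[index] = 0;
    return string;
}
```

The order of the checks fixes the order of the characters, so a global sticky pattern produces "gy".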
index ffe3fdc..efb1be1 100644
@@ -1041,16 +1041,39 @@ EncodedJSValue JSC_HOST_CALL stringProtoFuncMatch(ExecState* exec)
     JSValue a0 = exec->argument(0);
 
     RegExp* regExp;
+    unsigned startOffset = 0;
     bool global = false;
+    bool sticky = false;
+    RegExpObject* regExpObject = nullptr;
     if (a0.inherits(RegExpObject::info())) {
-        RegExpObject* regExpObject = asRegExpObject(a0);
+        regExpObject = asRegExpObject(a0);
         regExp = regExpObject->regExp();
         if ((global = regExp->global())) {
-            // ES5.1 15.5.4.10 step 8.a.
+            // ES6 21.2.5.6 step 6.b.
             regExpObject->setLastIndex(exec, 0);
             if (exec->hadException())
                 return JSValue::encode(jsUndefined());
         }
+        if ((sticky = regExp->sticky())) {
+            JSValue jsLastIndex = regExpObject->getLastIndex();
+            unsigned lastIndex;
+            if (LIKELY(jsLastIndex.isUInt32())) {
+                lastIndex = jsLastIndex.asUInt32();
+                if (lastIndex > s.length()) {
+                    regExpObject->setLastIndex(exec, 0);
+                    return JSValue::encode(jsUndefined());
+                }
+            } else {
+                double doubleLastIndex = jsLastIndex.toInteger(exec);
+                if (doubleLastIndex < 0 || doubleLastIndex > s.length()) {
+                    regExpObject->setLastIndex(exec, 0);
+                    return JSValue::encode(jsUndefined());
+                }
+                lastIndex = static_cast<unsigned>(doubleLastIndex);
+            }
+
+            startOffset = lastIndex;
+        }
     } else {
         /*
          *  ECMA 15.5.4.12 String.prototype.search (regexp)
@@ -1069,13 +1092,18 @@ EncodedJSValue JSC_HOST_CALL stringProtoFuncMatch(ExecState* exec)
             return throwVMError(exec, createSyntaxError(exec, regExp->errorMessage()));
     }
     RegExpConstructor* regExpConstructor = globalObject->regExpConstructor();
-    MatchResult result = regExpConstructor->performMatch(*vm, regExp, string, s, 0);
+    MatchResult result = regExpConstructor->performMatch(*vm, regExp, string, s, startOffset);
     // case without 'g' flag is handled like RegExp.prototype.exec
-    if (!global)
+    if (!global) {
+        if (sticky)
+            regExpObject->setLastIndex(exec, result ? result.end : 0);
+
         return JSValue::encode(result ? createRegExpMatchesArray(exec, globalObject, string, regExp, result.start) : jsNull());
+    }
 
     // return array of matches
     MarkedArgumentBuffer list;
+    size_t end = 0;
     while (result) {
         // We defend ourselves from crazy.
         const size_t maximumReasonableMatchSize = 1000000000;
@@ -1084,20 +1112,30 @@ EncodedJSValue JSC_HOST_CALL stringProtoFuncMatch(ExecState* exec)
             return JSValue::encode(jsUndefined());
         }
         
-        size_t end = result.end;
+        end = result.end;
         size_t length = end - result.start;
         list.append(jsSubstring(exec, s, result.start, length));
         if (!length)
             ++end;
-        result = regExpConstructor->performMatch(*vm, regExp, string, s, end);
+        
+        if (global)
+            result = regExpConstructor->performMatch(*vm, regExp, string, s, end);
+        else
+            result = MatchResult();
     }
     if (list.isEmpty()) {
         // if there are no matches at all, it's important to return
         // Null instead of an empty array, because this matches
         // other browsers and because Null is a false value.
+        if (sticky)
+            regExpObject->setLastIndex(exec, 0);
+        
         return JSValue::encode(jsNull());
     }
-
+    
+    if (sticky)
+        regExpObject->setLastIndex(exec, end);
+    
     return JSValue::encode(constructArray(exec, static_cast<ArrayAllocationProfile*>(0), list));
 }
 
index 5672972..ff748fd 100644
 - path: es6/Proxy_internal_get_calls_Promise_resolve_functions.js
   cmd: runES6 :normal
 - path: es6/Proxy_internal_get_calls_RegExp.prototype.flags.js
-  cmd: runES6 :fail
+  cmd: runES6 :normal
 - path: es6/Proxy_internal_get_calls_RegExp.prototype.test.js
   cmd: runES6 :fail
 - path: es6/Proxy_internal_get_calls_RegExp.prototype[Symbol.match].js
 - path: es6/RegExp_y_and_u_flags_u_flag_Unicode_code_point_escapes.js
   cmd: runES6 :normal
 - path: es6/RegExp_y_and_u_flags_y_flag.js
-  cmd: runES6 :fail
+  cmd: runES6 :normal
 - path: es6/RegExp_y_and_u_flags_y_flag_lastIndex.js
-  cmd: runES6 :fail
+  cmd: runES6 :normal
 - path: es6/rest_parameters_arguments_object_interaction.js
   cmd: runES6 :normal
 - path: es6/rest_parameters_basic_functionality.js
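
The y-flag tests enabled above cover the two behavior changes described in the commit message: matching starts at lastIndex, and only at that position, and lastIndex is updated afterwards. A rough sketch of those semantics outside JavaScriptCore, using std::regex with match_continuous as a stand-in for the sticky flag (StickyRegExp and stickyExec are hypothetical names, and std::regex is not the engine the patch modifies):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical stand-in for a RegExp object that has the 'y' flag set.
struct StickyRegExp {
    std::regex pattern;
    std::size_t lastIndex = 0;
};

// Returns the matched text, or std::nullopt if the pattern does not match
// exactly at lastIndex. On failure lastIndex is reset to 0; on success it
// is moved to the end of the match.
std::optional<std::string> stickyExec(StickyRegExp& re, const std::string& input)
{
    if (re.lastIndex > input.size()) {
        re.lastIndex = 0;
        return std::nullopt;
    }

    std::smatch match;
    auto begin = input.cbegin() + static_cast<std::ptrdiff_t>(re.lastIndex);
    // match_continuous: the match must start at `begin`, with no scanning forward.
    bool found = std::regex_search(begin, input.cend(), match, re.pattern,
        std::regex_constants::match_continuous);
    if (!found) {
        re.lastIndex = 0;
        return std::nullopt;
    }

    assert(match.position(0) == 0);
    re.lastIndex += static_cast<std::size_t>(match.length(0));
    return match.str(0);
}
```

Calling stickyExec repeatedly on "foofoobar" with the pattern foo matches at offsets 0 and 3, then fails at offset 6 and resets lastIndex to 0.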
index bc37deb..283f2f4 100644
@@ -3,5 +3,5 @@ function shouldBe(actual, expected) {
         throw new Error('bad value: ' + actual);
 }
 
-shouldBe(JSON.stringify(Object.getOwnPropertyNames(RegExp.prototype).sort()), '["compile","constructor","exec","flags","global","ignoreCase","lastIndex","multiline","source","test","toString","unicode"]');
+shouldBe(JSON.stringify(Object.getOwnPropertyNames(RegExp.prototype).sort()), '["compile","constructor","exec","flags","global","ignoreCase","lastIndex","multiline","source","sticky","test","toString","unicode"]');
 shouldBe(JSON.stringify(Object.getOwnPropertyNames(/Cocoa/).sort()), '["lastIndex"]');
index 93fef3c..1c435b6 100644
@@ -57,7 +57,15 @@ private:
 
     std::unique_ptr<JSC::Yarr::BytecodePattern> compile(const String& patternString, TextCaseSensitivity caseSensitivity, MultilineMode multilineMode)
     {
-        JSC::Yarr::YarrPattern pattern(patternString, (caseSensitivity == TextCaseInsensitive), (multilineMode == MultilineEnabled), false, &m_constructionError);
+        RegExpFlags flags = NoFlags;
+
+        if (caseSensitivity == TextCaseInsensitive)
+            flags = static_cast<RegExpFlags>(flags | FlagIgnoreCase);
+
+        if (multilineMode == MultilineEnabled)
+            flags = static_cast<RegExpFlags>(flags | FlagMultiline);
+
+        JSC::Yarr::YarrPattern pattern(patternString, flags, &m_constructionError);
         if (m_constructionError) {
             LOG_ERROR("RegularExpression: YARR compile failed with '%s'", m_constructionError);
             return nullptr;
index 6bc9b6c..bad2a37 100644
@@ -376,7 +376,7 @@ public:
             if (oldCh == ch)
                 continue;
 
-            if (pattern->m_ignoreCase) {
+            if (pattern->ignoreCase()) {
                 // See ES 6.0, 21.2.2.8.2 for the definition of Canonicalize(). For non-Unicode
                 // patterns, Unicode values are never allowed to match against ASCII ones.
                 // For Unicode, we need to check all canonical equivalents of a character.
@@ -396,15 +396,15 @@ public:
 
     bool matchAssertionBOL(ByteTerm& term)
     {
-        return (input.atStart(term.inputPosition)) || (pattern->m_multiline && testCharacterClass(pattern->newlineCharacterClass, input.readChecked(term.inputPosition + 1)));
+        return (input.atStart(term.inputPosition)) || (pattern->multiline() && testCharacterClass(pattern->newlineCharacterClass, input.readChecked(term.inputPosition + 1)));
     }
 
     bool matchAssertionEOL(ByteTerm& term)
     {
         if (term.inputPosition)
-            return (input.atEnd(term.inputPosition)) || (pattern->m_multiline && testCharacterClass(pattern->newlineCharacterClass, input.readChecked(term.inputPosition)));
+            return (input.atEnd(term.inputPosition)) || (pattern->multiline() && testCharacterClass(pattern->newlineCharacterClass, input.readChecked(term.inputPosition)));
 
-        return (input.atEnd()) || (pattern->m_multiline && testCharacterClass(pattern->newlineCharacterClass, input.read()));
+        return (input.atEnd()) || (pattern->multiline() && testCharacterClass(pattern->newlineCharacterClass, input.read()));
     }
 
     bool matchAssertionWordBoundary(ByteTerm& term)
@@ -1153,7 +1153,7 @@ public:
 
         if (((matchBegin && term.anchors.m_bol)
              || ((matchEnd != input.end()) && term.anchors.m_eol))
-            && !pattern->m_multiline)
+            && !pattern->multiline())
             return false;
 
         context->matchBegin = matchBegin;
@@ -1387,7 +1387,7 @@ public:
             if (offset > 0)
                 MATCH_NEXT();
 
-            if (input.atEnd())
+            if (input.atEnd() || pattern->sticky())
                 return JSRegExpNoMatch;
 
             input.next();
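
The one-line change above is what stops the interpreter from advancing to the next character for a sticky pattern. A simplified sketch of the same retry loop, assuming a literal-string matcher in place of the Yarr bytecode interpreter (search, matchesAt and noMatch are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t noMatch = static_cast<std::size_t>(-1);

// Hypothetical matcher: does `needle` occur in `input` exactly at `position`?
static bool matchesAt(const std::string& input, const std::string& needle, std::size_t position)
{
    return input.compare(position, needle.size(), needle) == 0;
}

// Returns the start offset of the first match at or after `start`, or noMatch.
std::size_t search(const std::string& input, const std::string& needle, std::size_t start, bool sticky)
{
    assert(start <= input.size());
    for (std::size_t position = start; ; ++position) {
        if (matchesAt(input, needle, position))
            return position;
        // Same shape as the interpreter change: give up at the end of the
        // input, or after the first failed attempt when the pattern is sticky.
        if (position == input.size() || sticky)
            return noMatch;
    }
}
```

With sticky set, the loop body runs exactly once, so a match that exists later in the input is not reported.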
@@ -1541,9 +1541,9 @@ public:
 
     Interpreter(BytecodePattern* pattern, unsigned* output, const CharType* input, unsigned length, unsigned start)
         : pattern(pattern)
-        , unicode(pattern->m_unicode)
+        , unicode(pattern->unicode())
         , output(output)
-        , input(input, start, length, pattern->m_unicode)
+        , input(input, start, length, pattern->unicode())
         , allocatorPool(0)
         , remainingMatchCount(matchLimit)
     {
@@ -1612,7 +1612,7 @@ public:
 
     void atomPatternCharacter(UChar32 ch, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityCount, QuantifierType quantityType)
     {
-        if (m_pattern.m_ignoreCase) {
+        if (m_pattern.ignoreCase()) {
             UChar32 lo = u_tolower(ch);
             UChar32 hi = u_toupper(ch);
 
index 3a5bc28..176fc6d 100644
@@ -339,9 +339,7 @@ struct BytecodePattern {
 public:
     BytecodePattern(std::unique_ptr<ByteDisjunction> body, Vector<std::unique_ptr<ByteDisjunction>>& parenthesesInfoToAdopt, YarrPattern& pattern, BumpPointerAllocator* allocator)
         : m_body(WTFMove(body))
-        , m_ignoreCase(pattern.m_ignoreCase)
-        , m_multiline(pattern.m_multiline)
-        , m_unicode(pattern.m_unicode)
+        , m_flags(pattern.m_flags)
         , m_allocator(allocator)
     {
         m_body->terms.shrinkToFit();
@@ -357,11 +355,14 @@ public:
     }
 
     size_t estimatedSizeInBytes() const { return m_body->estimatedSizeInBytes(); }
+    
+    bool ignoreCase() const { return m_flags & FlagIgnoreCase; }
+    bool multiline() const { return m_flags & FlagMultiline; }
+    bool sticky() const { return m_flags & FlagSticky; }
+    bool unicode() const { return m_flags & FlagUnicode; }
 
     std::unique_ptr<ByteDisjunction> m_body;
-    bool m_ignoreCase;
-    bool m_multiline;
-    bool m_unicode;
+    RegExpFlags m_flags;
     // Each BytecodePattern is associated with a RegExp, each RegExp is associated
     // with a VM.  Cache a pointer to out VM's m_regExpAllocator.
     BumpPointerAllocator* m_allocator;
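
The accessors above replace three separate bools with a single RegExpFlags bitmask. A self-contained sketch of the pattern follows; the enumerator values are assumptions for illustration only (the real definitions live in RegExpKey.h), and withFlag, PatternFlags and exampleUsage are hypothetical names:

```cpp
#include <cassert>

// Assumed bit assignments, for illustration only.
enum RegExpFlags {
    NoFlags = 0,
    FlagGlobal = 1,
    FlagIgnoreCase = 2,
    FlagMultiline = 4,
    FlagSticky = 8,
    FlagUnicode = 16
};

// OR-ing two enumerators yields an int, so the result is cast back to the
// enum type, as RegularExpression::Private::compile does in this patch.
inline RegExpFlags withFlag(RegExpFlags flags, RegExpFlags extra)
{
    return static_cast<RegExpFlags>(flags | extra);
}

// Hypothetical holder mirroring the accessors added to YarrPattern and BytecodePattern.
struct PatternFlags {
    explicit PatternFlags(RegExpFlags flags) : m_flags(flags) { }

    bool ignoreCase() const { return m_flags & FlagIgnoreCase; }
    bool multiline() const { return m_flags & FlagMultiline; }
    bool sticky() const { return m_flags & FlagSticky; }
    bool unicode() const { return m_flags & FlagUnicode; }

    RegExpFlags m_flags;
};

inline void exampleUsage()
{
    PatternFlags flags(withFlag(withFlag(NoFlags, FlagIgnoreCase), FlagSticky));
    assert(flags.ignoreCase() && flags.sticky());
    assert(!flags.multiline() && !flags.unicode());
}
```

Passing one flags value also keeps constructor signatures stable when another flag is added later, which is presumably part of the motivation for the restructuring.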
index 77863d8..91d635e 100644
@@ -234,7 +234,7 @@ class YarrGenerator : private MacroAssembler {
 
             for (unsigned i = 0; i < charClass->m_matches.size(); ++i) {
                 char ch = charClass->m_matches[i];
-                if (m_pattern.m_ignoreCase) {
+                if (m_pattern.ignoreCase()) {
                     if (isASCIILower(ch)) {
                         matchesAZaz.append(ch);
                         continue;
@@ -291,8 +291,8 @@ class YarrGenerator : private MacroAssembler {
 
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
-        if (m_pattern.m_ignoreCase && isASCIIAlpha(ch)) {
+        ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
+        if (m_pattern.ignoreCase() && isASCIIAlpha(ch)) {
             or32(TrustedImm32(0x20), character);
             ch |= 0x20;
         }
@@ -356,6 +356,13 @@ class YarrGenerator : private MacroAssembler {
             addPtr(Imm32(alignCallFrameSizeInBytes(callFrameSize)), stackPointerRegister);
     }
 
+    void generateFailReturn()
+    {
+        move(TrustedImmPtr((void*)WTF::notFound), returnRegister);
+        move(TrustedImm32(0), returnRegister2);
+        generateReturn();
+    }
+
     // Used to record subpatters, should only be called if compileMode is IncludeSubpatterns.
     void setSubpatternStart(RegisterID reg, unsigned subpattern)
     {
@@ -633,7 +640,7 @@ class YarrGenerator : private MacroAssembler {
         YarrOp& op = m_ops[opIndex];
         PatternTerm* term = op.m_term;
 
-        if (m_pattern.m_multiline) {
+        if (m_pattern.multiline()) {
             const RegisterID character = regT0;
 
             JumpList matchDest;
@@ -663,7 +670,7 @@ class YarrGenerator : private MacroAssembler {
         YarrOp& op = m_ops[opIndex];
         PatternTerm* term = op.m_term;
 
-        if (m_pattern.m_multiline) {
+        if (m_pattern.multiline()) {
             const RegisterID character = regT0;
 
             JumpList matchDest;
@@ -787,9 +794,9 @@ class YarrGenerator : private MacroAssembler {
 
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
+        ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
 
-        if (m_pattern.m_ignoreCase && isASCIIAlpha(ch))
+        if (m_pattern.ignoreCase() && isASCIIAlpha(ch))
 #if CPU(BIG_ENDIAN)
             ignoreCaseMask |= 32 << (m_charSize == Char8 ? 24 : 16);
 #else
@@ -823,11 +830,11 @@ class YarrGenerator : private MacroAssembler {
 
             // For case-insesitive compares, non-ascii characters that have different
             // upper & lower case representations are converted to a character class.
-            ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(currentCharacter) || isCanonicallyUnique(currentCharacter));
+            ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(currentCharacter) || isCanonicallyUnique(currentCharacter));
 
             allCharacters |= (currentCharacter << shiftAmount);
 
-            if ((m_pattern.m_ignoreCase) && (isASCIIAlpha(currentCharacter)))
+            if ((m_pattern.ignoreCase()) && (isASCIIAlpha(currentCharacter)))
                 ignoreCaseMask |= 32 << shiftAmount;                    
         }
 
@@ -900,8 +907,8 @@ class YarrGenerator : private MacroAssembler {
 
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
-        if (m_pattern.m_ignoreCase && isASCIIAlpha(ch)) {
+        ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
+        if (m_pattern.ignoreCase() && isASCIIAlpha(ch)) {
             or32(TrustedImm32(0x20), character);
             ch |= 0x20;
         }
@@ -1195,7 +1202,7 @@ class YarrGenerator : private MacroAssembler {
         add32(TrustedImm32(1), matchPos); // Advance past newline
         saveStartIndex.link(this);
 
-        if (!m_pattern.m_multiline && term->anchors.bolAnchor)
+        if (!m_pattern.multiline() && term->anchors.bolAnchor)
             op.m_jumps.append(branchTest32(NonZero, matchPos));
 
         ASSERT(!m_pattern.m_body->m_hasFixedSize);
@@ -1215,7 +1222,7 @@ class YarrGenerator : private MacroAssembler {
 
         foundEndingNewLine.link(this);
 
-        if (!m_pattern.m_multiline && term->anchors.eolAnchor)
+        if (!m_pattern.multiline() && term->anchors.eolAnchor)
             op.m_jumps.append(branch32(NotEqual, matchPos, length));
 
         move(matchPos, index);
@@ -1750,9 +1757,7 @@ class YarrGenerator : private MacroAssembler {
 
             case OpMatchFailed:
                 removeCallFrame();
-                move(TrustedImmPtr((void*)WTF::notFound), returnRegister);
-                move(TrustedImm32(0), returnRegister2);
-                generateReturn();
+                generateFailReturn();
                 break;
             }
 
@@ -1843,47 +1848,64 @@ class YarrGenerator : private MacroAssembler {
                         // around to the first alternative.
                         m_backtrackingState.link(this);
 
-                        // If the pattern size is not fixed, then store the start index, for use if we match.
-                        if (!m_pattern.m_body->m_hasFixedSize) {
-                            if (alternative->m_minimumSize == 1)
-                                setMatchStart(index);
-                            else {
-                                move(index, regT0);
-                                if (alternative->m_minimumSize)
-                                    sub32(Imm32(alternative->m_minimumSize - 1), regT0);
-                                else
-                                    add32(TrustedImm32(1), regT0);
-                                setMatchStart(regT0);
+                        // No need to advance and retry for a sticky pattern.
+                        if (!m_pattern.sticky()) {
+                            // If the pattern size is not fixed, then store the start index for use if we match.
+                            if (!m_pattern.m_body->m_hasFixedSize) {
+                                if (alternative->m_minimumSize == 1)
+                                    setMatchStart(index);
+                                else {
+                                    move(index, regT0);
+                                    if (alternative->m_minimumSize)
+                                        sub32(Imm32(alternative->m_minimumSize - 1), regT0);
+                                    else
+                                        add32(TrustedImm32(1), regT0);
+                                    setMatchStart(regT0);
+                                }
                             }
-                        }
 
-                        // Generate code to loop. Check whether the last alternative is longer than the
-                        // first (e.g. /a|xy/ or /a|xyz/).
-                        if (alternative->m_minimumSize > beginOp->m_alternative->m_minimumSize) {
-                            // We want to loop, and increment input position. If the delta is 1, it is
-                            // already correctly incremented, if more than one then decrement as appropriate.
-                            unsigned delta = alternative->m_minimumSize - beginOp->m_alternative->m_minimumSize;
-                            ASSERT(delta);
-                            if (delta != 1)
-                                sub32(Imm32(delta - 1), index);
-                            jump(beginOp->m_reentry);
-                        } else {
-                            // If the first alternative has minimum size 0xFFFFFFFFu, then there cannot
-                            // be sufficent input available to handle this, so just fall through.
-                            unsigned delta = beginOp->m_alternative->m_minimumSize - alternative->m_minimumSize;
-                            if (delta != 0xFFFFFFFFu) {
-                                // We need to check input because we are incrementing the input.
-                                add32(Imm32(delta + 1), index);
-                                checkInput().linkTo(beginOp->m_reentry, this);
+                            // Generate code to loop. Check whether the last alternative is longer than the
+                            // first (e.g. /a|xy/ or /a|xyz/).
+                            if (alternative->m_minimumSize > beginOp->m_alternative->m_minimumSize) {
+                                // We want to loop, and increment input position. If the delta is 1, it is
+                                // already correctly incremented, if more than one then decrement as appropriate.
+                                unsigned delta = alternative->m_minimumSize - beginOp->m_alternative->m_minimumSize;
+                                ASSERT(delta);
+                                if (delta != 1)
+                                    sub32(Imm32(delta - 1), index);
+                                jump(beginOp->m_reentry);
+                            } else {
+                                // If the first alternative has minimum size 0xFFFFFFFFu, then there cannot
+                                // be sufficient input available to handle this, so just fall through.
+                                unsigned delta = beginOp->m_alternative->m_minimumSize - alternative->m_minimumSize;
+                                if (delta != 0xFFFFFFFFu) {
+                                    // We need to check input because we are incrementing the input.
+                                    add32(Imm32(delta + 1), index);
+                                    checkInput().linkTo(beginOp->m_reentry, this);
+                                }
                             }
                         }
                     }
                 }
 
+                if (m_pattern.sticky()) {
+                    // We have failed matching from the initial index and we're a sticky expression.
+                    // We are done matching. Link failures for any reason to here.
+                    YarrOp* tempOp = beginOp;
+                    do {
+                        tempOp->m_jumps.link(this);
+                        tempOp = &m_ops[tempOp->m_nextOp];
+                    } while (tempOp->m_op != OpBodyAlternativeEnd);
+
+                    removeCallFrame();
+                    generateFailReturn();
+                    break;
+                }
+
                 // We can reach this point in the code in two ways:
                 //  - Fallthrough from the code above (a repeating alternative backtracked out of its
                 //    last alternative, and did not have sufficent input to run the first).
-                //  - We will loop back up to the following label when a releating alternative loops,
+                //  - We will loop back up to the following label when a repeating alternative loops,
                 //    following a failed input check.
                 //
                 // Either way, we have just failed the input check for the first alternative.
@@ -1990,9 +2012,7 @@ class YarrGenerator : private MacroAssembler {
                 matchFailed.link(this);
 
                 removeCallFrame();
-                move(TrustedImmPtr((void*)WTF::notFound), returnRegister);
-                move(TrustedImm32(0), returnRegister2);
-                generateReturn();
+                generateFailReturn();
                 break;
             }
             case OpBodyAlternativeEnd: {
@@ -2631,9 +2651,7 @@ public:
         generateEnter();
 
         Jump hasInput = checkInput();
-        move(TrustedImmPtr((void*)WTF::notFound), returnRegister);
-        move(TrustedImm32(0), returnRegister2);
-        generateReturn();
+        generateFailReturn();
         hasInput.link(this);
 
         if (compileMode == IncludeSubpatterns) {
index a7b8c34..daf38f2 100644
@@ -276,7 +276,7 @@ class YarrPatternConstructor {
 public:
     YarrPatternConstructor(YarrPattern& pattern)
         : m_pattern(pattern)
-        , m_characterClassConstructor(pattern.m_ignoreCase, pattern.m_unicode ? CanonicalMode::Unicode : CanonicalMode::UCS2)
+        , m_characterClassConstructor(pattern.ignoreCase(), pattern.unicode() ? CanonicalMode::Unicode : CanonicalMode::UCS2)
         , m_invertParentheticalAssertion(false)
     {
         auto body = std::make_unique<PatternDisjunction>();
@@ -322,12 +322,12 @@ public:
     {
         // We handle case-insensitive checking of unicode characters which do have both
         // cases by handling them as if they were defined using a CharacterClass.
-        if (!m_pattern.m_ignoreCase || (isASCII(ch) && !m_pattern.m_unicode)) {
+        if (!m_pattern.ignoreCase() || (isASCII(ch) && !m_pattern.unicode())) {
             m_alternative->m_terms.append(PatternTerm(ch));
             return;
         }
 
-        const CanonicalizationRange* info = canonicalRangeInfoFor(ch, m_pattern.m_unicode ? CanonicalMode::Unicode : CanonicalMode::UCS2);
+        const CanonicalizationRange* info = canonicalRangeInfoFor(ch, m_pattern.unicode() ? CanonicalMode::Unicode : CanonicalMode::UCS2);
         if (info->type == CanonicalizeUnique) {
             m_alternative->m_terms.append(PatternTerm(ch));
             return;
@@ -601,7 +601,7 @@ public:
                     term.frameLocation = currentCallFrameSize;
                     currentCallFrameSize += YarrStackSpaceForBackTrackInfoPatternCharacter;
                     alternative->m_hasFixedSize = false;
-                } else if (m_pattern.m_unicode) {
+                } else if (m_pattern.unicode()) {
                     currentInputPosition += U16_LENGTH(term.patternCharacter) * term.quantityCount;
                 } else
                     currentInputPosition += term.quantityCount;
@@ -613,7 +613,7 @@ public:
                     term.frameLocation = currentCallFrameSize;
                     currentCallFrameSize += YarrStackSpaceForBackTrackInfoCharacterClass;
                     alternative->m_hasFixedSize = false;
-                } else if (m_pattern.m_unicode) {
+                } else if (m_pattern.unicode()) {
                     term.frameLocation = currentCallFrameSize;
                     currentCallFrameSize += YarrStackSpaceForBackTrackInfoCharacterClass;
                     currentInputPosition += term.quantityCount;
@@ -733,7 +733,7 @@ public:
         // At this point, this is only valid for non-multiline expressions.
         PatternDisjunction* disjunction = m_pattern.m_body;
         
-        if (!m_pattern.m_containsBOL || m_pattern.m_multiline)
+        if (!m_pattern.m_containsBOL || m_pattern.multiline())
             return;
         
         PatternDisjunction* loopDisjunction = copyDisjunction(disjunction, true);
@@ -844,7 +844,7 @@ const char* YarrPattern::compile(const String& patternString)
 {
     YarrPatternConstructor constructor(*this);
 
-    if (const char* error = parse(constructor, patternString, m_unicode))
+    if (const char* error = parse(constructor, patternString, unicode()))
         return error;
     
     // If the pattern contains illegal backreferences reset & reparse.
@@ -858,7 +858,7 @@ const char* YarrPattern::compile(const String& patternString)
 #if !ASSERT_DISABLED
         const char* error =
 #endif
-            parse(constructor, patternString, m_unicode, numSubpatterns);
+            parse(constructor, patternString, unicode(), numSubpatterns);
 
         ASSERT(!error);
         ASSERT(numSubpatterns == m_numSubpatterns);
@@ -873,13 +873,11 @@ const char* YarrPattern::compile(const String& patternString)
     return 0;
 }
 
-YarrPattern::YarrPattern(const String& pattern, bool ignoreCase, bool multiline, bool unicode, const char** error)
-    : m_ignoreCase(ignoreCase)
-    , m_multiline(multiline)
-    , m_unicode(unicode)
-    , m_containsBackreferences(false)
+YarrPattern::YarrPattern(const String& pattern, RegExpFlags flags, const char** error)
+    : m_containsBackreferences(false)
     , m_containsBOL(false)
     , m_containsUnsignedLengthPattern(false)
+    , m_flags(flags)
     , m_numSubpatterns(0)
     , m_maxBackReference(0)
     , newlineCached(0)
index e7fefc8..b69dafe 100644
@@ -27,6 +27,7 @@
 #ifndef YarrPattern_h
 #define YarrPattern_h
 
+#include "RegExpKey.h"
 #include <wtf/CheckedArithmetic.h>
 #include <wtf/RefCounted.h>
 #include <wtf/Vector.h>
@@ -299,8 +300,9 @@ struct TermChain {
     Vector<TermChain> hotTerms;
 };
 
+
 struct YarrPattern {
-    JS_EXPORT_PRIVATE YarrPattern(const String& pattern, bool ignoreCase, bool multiline, bool unicode, const char** error);
+    JS_EXPORT_PRIVATE YarrPattern(const String& pattern, RegExpFlags flags, const char** error);
 
     void reset()
     {
@@ -390,12 +392,15 @@ struct YarrPattern {
         return nonwordcharCached;
     }
 
-    bool m_ignoreCase : 1;
-    bool m_multiline : 1;
-    bool m_unicode : 1;
+    bool ignoreCase() const { return m_flags & FlagIgnoreCase; }
+    bool multiline() const { return m_flags & FlagMultiline; }
+    bool sticky() const { return m_flags & FlagSticky; }
+    bool unicode() const { return m_flags & FlagUnicode; }
+
     bool m_containsBackreferences : 1;
     bool m_containsBOL : 1;
     bool m_containsUnsignedLengthPattern : 1; 
+    RegExpFlags m_flags;
     unsigned m_numSubpatterns;
     unsigned m_maxBackReference;
     PatternDisjunction* m_body;