Add support for RegExp "dotAll" flag
author msaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 24 Aug 2017 21:14:43 +0000 (21:14 +0000)
committer msaboff@apple.com <msaboff@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 24 Aug 2017 21:14:43 +0000 (21:14 +0000)
https://bugs.webkit.org/show_bug.cgi?id=175924

Reviewed by Keith Miller.

JSTests:

Updated tests for new dotAll ('s' flag) changes.

* es6/Proxy_internal_get_calls_RegExp.prototype.flags.js:
* stress/static-getter-in-names.js:

Source/JavaScriptCore:

The dotAll RegExp flag, 's', changes . to match any character including line terminators.
Added the "dotAll" identifier as well as the RegExp.prototype.dotAll getter.
Added a new any-character CharacterClass that is used to match . terms in a RegExp with the
dotAll flag.  In the YARR pattern and parsing code, renamed NewlineClassID, which was only
used for '.' processing, to DotClassID.  Which builtin character class DotClassID resolves
to when generating the pattern depends on the dotAll flag.
This NewlineClassID to DotClassID refactoring includes atomBuiltInCharacterClass() in
the PatternParser class of the WebCore content extensions code.
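
The script-visible behavior described above can be checked directly. The following is a minimal sketch, assuming an engine that implements the 's' flag; the sample string and variable names are illustrative and are not taken from the patch:

```javascript
// Sample input with a line terminator between "aa" and "X".
const text = "aa\nXcc";

// Without 's', '.' does not match "\n", so /a.X/ finds nothing here.
const withoutDotAll = text.match(/a.X/);

// With 's', '.' matches any character, including line terminators.
const withDotAll = text.match(/a.X/s);

// The dotAll getter reflects whether the 's' flag was supplied.
const getterValues = [/a/.dotAll, /a/s.dotAll];

// The flags getter reports 's' after 'm' and before 'u' and 'y'.
const flagOrder = /a/gimsuy.flags;

console.log(withoutDotAll, withDotAll && withDotAll[0], getterValues, flagOrder);
```

The flag ordering shown by `flags` corresponds to the updated expectation in the Proxy test, where dotAll is read after multiline and before unicode.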

As an optimization, the Yarr JIT doesn't perform match checks against the builtin
any-character CharacterClass; it merely reads the character.  There is another optimization
in our DotStar enclosure processing, where for a non-capturing regular expression of the form
.*<expression>.*, with an optional leading ^ and/or trailing $, we match the contained
expression and then look for the extents of the surrounding .*'s.  When used with the
dotAll flag, that processing always results in the beginning of the string and the end
of the string.  Therefore we short-circuit finding the beginning and end of the line
or string for dotAll patterns.
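
The effect of the DotStar enclosure shortcut is observable in the match extents. The following is a minimal sketch using a string modeled on the new layout tests; the helper name `extent` is hypothetical and exists only for this illustration:

```javascript
// Hypothetical helper: returns the [start, end) offsets of the first match, or null.
function extent(re, s) {
    const m = re.exec(s);
    return m ? [m.index, m.index + m[0].length] : null;
}

const input = "aa\nX\ncc";

// Without 's', each .* stops at the nearest line terminator, so only "X" matches.
const lineExtent = extent(/.*X.*/, input);

// With 's', the surrounding .* terms extend to the start and end of the whole string.
const dotAllExtent = extent(/.*X.*/s, input);

console.log(lineExtent, dotAllExtent, input.length);
```

With the 's' flag the extents are always the start of the search and the end of the input, which is why the engine can skip scanning for line boundaries once the enclosed expression has matched.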

* bytecode/BytecodeDumper.cpp:
(JSC::regexpToSourceString):
* runtime/CommonIdentifiers.h:
* runtime/RegExp.cpp:
(JSC::regExpFlags):
(JSC::RegExpFunctionalTestCollector::outputOneTest):
* runtime/RegExp.h:
* runtime/RegExpKey.h:
* runtime/RegExpPrototype.cpp:
(JSC::RegExpPrototype::finishCreation):
(JSC::flagsString):
(JSC::regExpProtoGetterDotAll):
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::matchDotStarEnclosure):
* yarr/YarrInterpreter.h:
(JSC::Yarr::BytecodePattern::dotAll const):
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::optimizeAlternative):
(JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
(JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
(JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
(JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
(JSC::Yarr::YarrGenerator::generateDotStarEnclosure):
* yarr/YarrParser.h:
(JSC::Yarr::Parser::parseTokens):
* yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassBuiltIn):
(JSC::Yarr::YarrPatternConstructor::optimizeDotStarWrappedExpressions):
(JSC::Yarr::YarrPattern::YarrPattern):
(JSC::Yarr::PatternTerm::dump):
(JSC::Yarr::anycharCreate):
* yarr/YarrPattern.h:
(JSC::Yarr::YarrPattern::reset):
(JSC::Yarr::YarrPattern::anyCharacterClass):
(JSC::Yarr::YarrPattern::dotAll const):

Source/WebCore:

Changed due to refactoring NewlineClassID to DotClassID.

No new tests. No change in behavior.

* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomBuiltInCharacterClass):

LayoutTests:

* js/regexp-dotall-expected.txt: Added.
* js/regexp-dotall.html: Added.
* js/script-tests/Object-getOwnPropertyNames.js:
* js/script-tests/regexp-dotall.js: Added.
New tests.

* js/Object-getOwnPropertyNames-expected.txt:
Updated tests for new dotAll ('s' flag) changes.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@221160 268f45cc-cd09-0410-ab3c-d52691b4dbfc

24 files changed:
JSTests/ChangeLog
JSTests/es6/Proxy_internal_get_calls_RegExp.prototype.flags.js
JSTests/stress/static-getter-in-names.js
LayoutTests/ChangeLog
LayoutTests/js/Object-getOwnPropertyNames-expected.txt
LayoutTests/js/regexp-dotall-expected.txt [new file with mode: 0644]
LayoutTests/js/regexp-dotall.html [new file with mode: 0644]
LayoutTests/js/script-tests/Object-getOwnPropertyNames.js
LayoutTests/js/script-tests/regexp-dotall.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/bytecode/BytecodeDumper.cpp
Source/JavaScriptCore/runtime/CommonIdentifiers.h
Source/JavaScriptCore/runtime/RegExp.cpp
Source/JavaScriptCore/runtime/RegExp.h
Source/JavaScriptCore/runtime/RegExpKey.h
Source/JavaScriptCore/runtime/RegExpPrototype.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.h
Source/JavaScriptCore/yarr/YarrJIT.cpp
Source/JavaScriptCore/yarr/YarrParser.h
Source/JavaScriptCore/yarr/YarrPattern.cpp
Source/JavaScriptCore/yarr/YarrPattern.h
Source/WebCore/ChangeLog
Source/WebCore/contentextensions/URLFilterParser.cpp

index d554493..6f32055 100644
@@ -1,3 +1,15 @@
+2017-08-24  Michael Saboff  <msaboff@apple.com>
+
+        Add support for RegExp "dotAll" flag
+        https://bugs.webkit.org/show_bug.cgi?id=175924
+
+        Reviewed by Keith Miller.
+
+        Updated tests for new dotAll ('s' flag) changes.
+
+        * es6/Proxy_internal_get_calls_RegExp.prototype.flags.js:
+        * stress/static-getter-in-names.js:
+
 2017-08-24  Mark Lam  <mark.lam@apple.com>
 
         Land regression test for https://bugs.webkit.org/show_bug.cgi?id=164081.
index 12cc214..48668d9 100644
@@ -4,7 +4,7 @@ function test() {
 var get = [];
 var p = new Proxy({}, { get: function(o, k) { get.push(k); return o[k]; }});
 Object.getOwnPropertyDescriptor(RegExp.prototype, 'flags').get.call(p);
-return get + '' === "global,ignoreCase,multiline,unicode,sticky";
+return get + '' === "global,ignoreCase,multiline,dotAll,unicode,sticky";
       
 }
 
index ccf3a7d..0c44df2 100644
@@ -3,5 +3,5 @@ function shouldBe(actual, expected) {
         throw new Error('bad value: ' + actual);
 }
 
-shouldBe(JSON.stringify(Object.getOwnPropertyNames(RegExp.prototype).sort()), '["compile","constructor","exec","flags","global","ignoreCase","multiline","source","sticky","test","toString","unicode"]');
+shouldBe(JSON.stringify(Object.getOwnPropertyNames(RegExp.prototype).sort()), '["compile","constructor","dotAll","exec","flags","global","ignoreCase","multiline","source","sticky","test","toString","unicode"]');
 shouldBe(JSON.stringify(Object.getOwnPropertyNames(/Cocoa/).sort()), '["lastIndex"]');
index 1feda3d..006eff4 100644
@@ -1,3 +1,19 @@
+2017-08-24  Michael Saboff  <msaboff@apple.com>
+
+        Add support for RegExp "dotAll" flag
+        https://bugs.webkit.org/show_bug.cgi?id=175924
+
+        Reviewed by Keith Miller.
+
+        * js/regexp-dotall-expected.txt: Added.
+        * js/regexp-dotall.html: Added.
+        * js/script-tests/Object-getOwnPropertyNames.js:
+        * js/script-tests/regexp-dotall.js: Added.
+        New tests.
+
+        * js/Object-getOwnPropertyNames-expected.txt:
+        Updated tests for new dotAll ('s' flag) changes.
+
 2017-08-24  Kirill Ovchinnikov  <kirill.ovchinn@gmail.com>
 
         HTMLTrackElement behavior violates the standard
index a619791..db3fd61 100644
@@ -57,7 +57,7 @@ PASS getSortedOwnPropertyNames(Number.prototype) is ['constructor', 'toExponenti
 PASS getSortedOwnPropertyNames(Date) is ['UTC', 'length', 'name', 'now', 'parse', 'prototype']
 PASS getSortedOwnPropertyNames(Date.prototype) is ['constructor', 'getDate', 'getDay', 'getFullYear', 'getHours', 'getMilliseconds', 'getMinutes', 'getMonth', 'getSeconds', 'getTime', 'getTimezoneOffset', 'getUTCDate', 'getUTCDay', 'getUTCFullYear', 'getUTCHours', 'getUTCMilliseconds', 'getUTCMinutes', 'getUTCMonth', 'getUTCSeconds', 'getYear', 'setDate', 'setFullYear', 'setHours', 'setMilliseconds', 'setMinutes', 'setMonth', 'setSeconds', 'setTime', 'setUTCDate', 'setUTCFullYear', 'setUTCHours', 'setUTCMilliseconds', 'setUTCMinutes', 'setUTCMonth', 'setUTCSeconds', 'setYear', 'toDateString', 'toGMTString', 'toISOString', 'toJSON', 'toLocaleDateString', 'toLocaleString', 'toLocaleTimeString', 'toString', 'toTimeString', 'toUTCString', 'valueOf']
 PASS getSortedOwnPropertyNames(RegExp) is ['$&', "$'", '$*', '$+', '$1', '$2', '$3', '$4', '$5', '$6', '$7', '$8', '$9', '$_', '$`', 'input', 'lastMatch', 'lastParen', 'leftContext', 'length', 'multiline', 'name', 'prototype', 'rightContext']
-PASS getSortedOwnPropertyNames(RegExp.prototype) is ['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']
+PASS getSortedOwnPropertyNames(RegExp.prototype) is ['compile', 'constructor', 'dotAll', 'exec', 'flags', 'global', 'ignoreCase', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']
 PASS getSortedOwnPropertyNames(Error) is ['length', 'name', 'prototype', 'stackTraceLimit']
 PASS getSortedOwnPropertyNames(Error.prototype) is ['constructor', 'message', 'name', 'toString']
 PASS getSortedOwnPropertyNames(Math) is ['E','LN10','LN2','LOG10E','LOG2E','PI','SQRT1_2','SQRT2','abs','acos','acosh','asin','asinh','atan','atan2','atanh','cbrt','ceil','clz32','cos','cosh','exp','expm1','floor','fround','hypot','imul','log','log10','log1p','log2','max','min','pow','random','round','sign','sin','sinh','sqrt','tan','tanh','trunc']
diff --git a/LayoutTests/js/regexp-dotall-expected.txt b/LayoutTests/js/regexp-dotall-expected.txt
new file mode 100644
index 0000000..6112796
--- /dev/null
@@ -0,0 +1,79 @@
+Test for processing of RegExp dotAll flag
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS "aaXcc".match(/.X./)[0].length is 3
+PASS "aaXcc".match(/.X./s)[0].length is 3
+PASS "aa\nXcc".match(/.X./) is null
+PASS "aa\nXcc".match(/.X./m) is null
+PASS "aa\nX\ncc".match(/.X./s)[0] is "\nX\n"
+PASS "aa\nX\ncc".match(/.X./ms)[0] is "\nX\n"
+PASS "aa\nXcc".match(/.*X/)[0] is "X"
+PASS "aa\nXcc".match(/.*X/m)[0] is "X"
+PASS "aa\nXcc".match(/.*X/s)[0] is "aa\nX"
+PASS "aa\nXcc".match(/.*X/sm)[0] is "aa\nX"
+PASS "aaX\ncc".match(/X.*/)[0] is "X"
+PASS "aaX\ncc".match(/X.*/m)[0] is "X"
+PASS "aaX\ncc".match(/X.*/s)[0] is "X\ncc"
+PASS "aaX\ncc".match(/X.*/sm)[0] is "X\ncc"
+PASS "aa\nX\ncc".match(/.*X.*/)[0] is "X"
+PASS "aa\nX\ncc".match(/.*X.*/m)[0] is "X"
+PASS "aa\nX\ncc".match(/.*X.*/s)[0] is "aa\nX\ncc"
+PASS "aa\nX\ncc".match(/.*X.*/sm)[0] is "aa\nX\ncc"
+PASS "aa\nXcc".match(/.*^X/) is null
+PASS "aa\nXcc".match(/.*^X/m)[0] is "X"
+PASS "aa\nXcc".match(/.*^X/s) is null
+PASS "aa\nXcc".match(/.*^X/sm)[0] is "aa\nX"
+PASS "aaX\ncc".match(/X$.*/) is null
+PASS "aaX\ncc".match(/X$.*/m)[0] is "X"
+PASS "aaX\ncc".match(/X$.*/s) is null
+PASS "aaX\ncc".match(/X$.*/sm)[0] is "X\ncc"
+PASS "aa\nX\ncc".match(/.*^X$.*/) is null
+PASS "aa\nX\ncc".match(/.*^X$.*/m)[0] is "X"
+PASS "aa\nX\ncc".match(/.*^X$.*/s) is null
+PASS "aa\nX\ncc".match(/.*^X$.*/sm)[0] is "aa\nX\ncc"
+PASS "aa\nXcc".match(/^.*X/) is null
+PASS "aa\nXcc".match(/^.*X/m)[0] is "X"
+PASS "aa\nXcc".match(/^.*X/s)[0] is "aa\nX"
+PASS "aa\nXcc".match(/^.*X/sm)[0] is "aa\nX"
+PASS "aaX\ncc".match(/X.*$/) is null
+PASS "aaX\ncc".match(/X.*$/m)[0] is "X"
+PASS "aaX\ncc".match(/X.*$/s)[0] is "X\ncc"
+PASS "aaX\ncc".match(/X.*$/sm)[0] is "X\ncc"
+PASS "aa\nX\ncc".match(/^.*X.*$/) is null
+PASS "aa\nX\ncc".match(/^.*X.*$/m)[0] is "X"
+PASS "aa\nX\ncc".match(/^.*X.*$/s)[0] is "aa\nX\ncc"
+PASS "aa\nX\ncc".match(/^.*X.*$/sm)[0] is "aa\nX\ncc"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/) is null
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/m)[0] is "X"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/s)[0] is "a\na\nX\nc\nc\n"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/sm)[0] is "a\na\nX\nc\nc\n"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/) is null
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/m)[0] is "X"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/s)[0] is "a\na\nX\nc\nc\n"
+PASS "a\na\nX\nc\nc\n".match(/^.*X.*$/sm)[0] is "a\na\nX\nc\nc\n"
+PASS "\n\n\nX".match(/.{1}X/sm)[0] is "\nX"
+PASS "\n\n\nX".match(/.{1,2}X/sm)[0] is "\n\nX"
+PASS "\n\n\nX".match(/.{1,3}X/sm)[0] is "\n\n\nX"
+PASS "\n\n\nX".match(/.{1,4}X/sm)[0] is "\n\n\nX"
+PASS "\n\n\nX".match(/.{1,2}?X/sm)[0] is "\n\nX"
+PASS "\n\n\nX".match(/.{1,3}?X/sm)[0] is "\n\n\nX"
+PASS "\n\n\nX".match(/.{1,4}?X/sm)[0] is "\n\n\nX"
+PASS "X\n\n\nY".match(/X.{1}/sm)[0] is "X\n"
+PASS "X\n\n\nY".match(/X.{1,2}/sm)[0] is "X\n\n"
+PASS "X\n\n\nY".match(/X.{1,3}/sm)[0] is "X\n\n\n"
+PASS "X\n\n\nY".match(/X.{1,4}/sm)[0] is "X\n\n\nY"
+PASS "X\n\n\nY".match(/X.{1,2}?/sm)[0] is "X\n"
+PASS "X\n\n\nY".match(/X.{1,3}?/sm)[0] is "X\n"
+PASS "X\n\n\nY".match(/X.{1,4}?/sm)[0] is "X\n"
+PASS "The\nquick\nbrown\nfox\njumped.".match(/.*brown.*/)[0] is "brown"
+PASS "The\nquick\nbrown\nfox\njumped.".match(/.*brown.*/s)[0] is "The\nquick\nbrown\nfox\njumped."
+PASS "The\nquick\nbrown\nfox\njumped.".match(/The.quick.brown.fox.jumped./) is null
+PASS "The\nquick\nbrown\nfox\njumped.".match(/The.quick.brown.fox.jumped./s)[0] is "The\nquick\nbrown\nfox\njumped."
+PASS /a/.dotAll is false
+PASS /a/s.dotAll is true
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/js/regexp-dotall.html b/LayoutTests/js/regexp-dotall.html
new file mode 100644
index 0000000..1a0cbeb
--- /dev/null
@@ -0,0 +1,10 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<script src="../resources/js-test-pre.js"></script>
+</head>
+<body>
+<script src="script-tests/regexp-dotall.js"></script>
+<script src="../resources/js-test-post.js"></script>
+</body>
+</html>
index 566942f..6431192 100644
@@ -66,7 +66,7 @@ var expectedPropertyNamesSet = {
     "Date": "['UTC', 'length', 'name', 'now', 'parse', 'prototype']",
     "Date.prototype": "['constructor', 'getDate', 'getDay', 'getFullYear', 'getHours', 'getMilliseconds', 'getMinutes', 'getMonth', 'getSeconds', 'getTime', 'getTimezoneOffset', 'getUTCDate', 'getUTCDay', 'getUTCFullYear', 'getUTCHours', 'getUTCMilliseconds', 'getUTCMinutes', 'getUTCMonth', 'getUTCSeconds', 'getYear', 'setDate', 'setFullYear', 'setHours', 'setMilliseconds', 'setMinutes', 'setMonth', 'setSeconds', 'setTime', 'setUTCDate', 'setUTCFullYear', 'setUTCHours', 'setUTCMilliseconds', 'setUTCMinutes', 'setUTCMonth', 'setUTCSeconds', 'setYear', 'toDateString', 'toGMTString', 'toISOString', 'toJSON', 'toLocaleDateString', 'toLocaleString', 'toLocaleTimeString', 'toString', 'toTimeString', 'toUTCString', 'valueOf']",
     "RegExp": "['$&', \"$'\", '$*', '$+', '$1', '$2', '$3', '$4', '$5', '$6', '$7', '$8', '$9', '$_', '$`', 'input', 'lastMatch', 'lastParen', 'leftContext', 'length', 'multiline', 'name', 'prototype', 'rightContext']",
-    "RegExp.prototype": "['compile', 'constructor', 'exec', 'flags', 'global', 'ignoreCase', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']",
+    "RegExp.prototype": "['compile', 'constructor', 'dotAll', 'exec', 'flags', 'global', 'ignoreCase', 'multiline', 'source', 'sticky', 'test', 'toString', 'unicode']",
     "Error": "['length', 'name', 'prototype', 'stackTraceLimit']",
     "Error.prototype": "['constructor', 'message', 'name', 'toString']",
     "Math": "['E','LN10','LN2','LOG10E','LOG2E','PI','SQRT1_2','SQRT2','abs','acos','acosh','asin','asinh','atan','atan2','atanh','cbrt','ceil','clz32','cos','cosh','exp','expm1','floor','fround','hypot','imul','log','log10','log1p','log2','max','min','pow','random','round','sign','sin','sinh','sqrt','tan','tanh','trunc']",
diff --git a/LayoutTests/js/script-tests/regexp-dotall.js b/LayoutTests/js/script-tests/regexp-dotall.js
new file mode 100644
index 0000000..7de5cc9
--- /dev/null
@@ -0,0 +1,77 @@
+description(
+'Test for processing of RegExp dotAll flag'
+);
+
+// Check dotAll matching operation
+shouldBe('"aaXcc".match(/.X./)[0].length', '3');
+shouldBe('"aaXcc".match(/.X./s)[0].length', '3');
+shouldBeNull('"aa\\nXcc".match(/.X./)');
+shouldBeNull('"aa\\nXcc".match(/.X./m)');
+shouldBe('"aa\\nX\\ncc".match(/.X./s)[0]', '"\\nX\\n"');
+shouldBe('"aa\\nX\\ncc".match(/.X./ms)[0]', '"\\nX\\n"');
+shouldBe('"aa\\nXcc".match(/.*X/)[0]', '"X"');
+shouldBe('"aa\\nXcc".match(/.*X/m)[0]', '"X"');
+shouldBe('"aa\\nXcc".match(/.*X/s)[0]', '"aa\\nX"');
+shouldBe('"aa\\nXcc".match(/.*X/sm)[0]', '"aa\\nX"');
+shouldBe('"aaX\\ncc".match(/X.*/)[0]', '"X"');
+shouldBe('"aaX\\ncc".match(/X.*/m)[0]', '"X"');
+shouldBe('"aaX\\ncc".match(/X.*/s)[0]', '"X\\ncc"');
+shouldBe('"aaX\\ncc".match(/X.*/sm)[0]', '"X\\ncc"');
+shouldBe('"aa\\nX\\ncc".match(/.*X.*/)[0]', '"X"');
+shouldBe('"aa\\nX\\ncc".match(/.*X.*/m)[0]', '"X"');
+shouldBe('"aa\\nX\\ncc".match(/.*X.*/s)[0]', '"aa\\nX\\ncc"');
+shouldBe('"aa\\nX\\ncc".match(/.*X.*/sm)[0]', '"aa\\nX\\ncc"');
+shouldBeNull('"aa\\nXcc".match(/.*^X/)');
+shouldBe('"aa\\nXcc".match(/.*^X/m)[0]', '"X"');
+shouldBeNull('"aa\\nXcc".match(/.*^X/s)', '"aa\\nX"');
+shouldBe('"aa\\nXcc".match(/.*^X/sm)[0]', '"aa\\nX"');
+shouldBeNull('"aaX\\ncc".match(/X$.*/)');
+shouldBe('"aaX\\ncc".match(/X$.*/m)[0]', '"X"');
+shouldBeNull('"aaX\\ncc".match(/X$.*/s)');
+shouldBe('"aaX\\ncc".match(/X$.*/sm)[0]', '"X\\ncc"');
+shouldBeNull('"aa\\nX\\ncc".match(/.*^X$.*/)');
+shouldBe('"aa\\nX\\ncc".match(/.*^X$.*/m)[0]', '"X"');
+shouldBeNull('"aa\\nX\\ncc".match(/.*^X$.*/s)');
+shouldBe('"aa\\nX\\ncc".match(/.*^X$.*/sm)[0]', '"aa\\nX\\ncc"');
+shouldBeNull('"aa\\nXcc".match(/^.*X/)');
+shouldBe('"aa\\nXcc".match(/^.*X/m)[0]', '"X"');
+shouldBe('"aa\\nXcc".match(/^.*X/s)[0]', '"aa\\nX"');
+shouldBe('"aa\\nXcc".match(/^.*X/sm)[0]', '"aa\\nX"');
+shouldBeNull('"aaX\\ncc".match(/X.*$/)');
+shouldBe('"aaX\\ncc".match(/X.*$/m)[0]', '"X"');
+shouldBe('"aaX\\ncc".match(/X.*$/s)[0]', '"X\\ncc"');
+shouldBe('"aaX\\ncc".match(/X.*$/sm)[0]', '"X\\ncc"');
+shouldBeNull('"aa\\nX\\ncc".match(/^.*X.*$/)');
+shouldBe('"aa\\nX\\ncc".match(/^.*X.*$/m)[0]', '"X"');
+shouldBe('"aa\\nX\\ncc".match(/^.*X.*$/s)[0]', '"aa\\nX\\ncc"');
+shouldBe('"aa\\nX\\ncc".match(/^.*X.*$/sm)[0]', '"aa\\nX\\ncc"');
+shouldBeNull('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/)');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/m)[0]', '"X"');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/s)[0]', '"a\\na\\nX\\nc\\nc\\n"');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/sm)[0]', '"a\\na\\nX\\nc\\nc\\n"');
+shouldBeNull('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/)');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/m)[0]', '"X"');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/s)[0]', '"a\\na\\nX\\nc\\nc\\n"');
+shouldBe('"a\\na\\nX\\nc\\nc\\n".match(/^.*X.*$/sm)[0]', '"a\\na\\nX\\nc\\nc\\n"');
+shouldBe('"\\n\\n\\nX".match(/.{1}X/sm)[0]', '"\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,2}X/sm)[0]', '"\\n\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,3}X/sm)[0]', '"\\n\\n\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,4}X/sm)[0]', '"\\n\\n\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,2}?X/sm)[0]', '"\\n\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,3}?X/sm)[0]', '"\\n\\n\\nX"');
+shouldBe('"\\n\\n\\nX".match(/.{1,4}?X/sm)[0]', '"\\n\\n\\nX"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1}/sm)[0]', '"X\\n"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,2}/sm)[0]', '"X\\n\\n"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,3}/sm)[0]', '"X\\n\\n\\n"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,4}/sm)[0]', '"X\\n\\n\\nY"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,2}?/sm)[0]', '"X\\n"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,3}?/sm)[0]', '"X\\n"');
+shouldBe('"X\\n\\n\\nY".match(/X.{1,4}?/sm)[0]', '"X\\n"');
+shouldBe('"The\\nquick\\nbrown\\nfox\\njumped.".match(/.*brown.*/)[0]', '"brown"');
+shouldBe('"The\\nquick\\nbrown\\nfox\\njumped.".match(/.*brown.*/s)[0]', '"The\\nquick\\nbrown\\nfox\\njumped."');
+shouldBeNull('"The\\nquick\\nbrown\\nfox\\njumped.".match(/The.quick.brown.fox.jumped./)');
+shouldBe('"The\\nquick\\nbrown\\nfox\\njumped.".match(/The.quick.brown.fox.jumped./s)[0]', '"The\\nquick\\nbrown\\nfox\\njumped."');
+
+// Check that the dotAll flag getter works as expected
+shouldBeFalse('/a/.dotAll');
+shouldBeTrue('/a/s.dotAll');
index 91c48d8..5d2fb82 100644
@@ -1,3 +1,65 @@
+2017-08-24  Michael Saboff  <msaboff@apple.com>
+
+        Add support for RegExp "dotAll" flag
+        https://bugs.webkit.org/show_bug.cgi?id=175924
+
+        Reviewed by Keith Miller.
+
+        The dotAll RegExp flag, 's', changes . to match any character including line terminators.
+        Added the "dotAll" identifier as well as the RegExp.prototype.dotAll getter.
+        Added a new any-character CharacterClass that is used to match . terms in a RegExp with the
+        dotAll flag.  In the YARR pattern and parsing code, renamed NewlineClassID, which was only
+        used for '.' processing, to DotClassID.  Which builtin character class DotClassID resolves
+        to when generating the pattern depends on the dotAll flag.
+        This NewlineClassID to DotClassID refactoring includes atomBuiltInCharacterClass() in
+        the PatternParser class of the WebCore content extensions code.
+
+        As an optimization, the Yarr JIT doesn't perform match checks against the builtin
+        any-character CharacterClass; it merely reads the character.  There is another optimization
+        in our DotStar enclosure processing, where for a non-capturing regular expression of the form
+        .*<expression>.*, with an optional leading ^ and/or trailing $, we match the contained
+        expression and then look for the extents of the surrounding .*'s.  When used with the
+        dotAll flag, that processing always results in the beginning of the string and the end
+        of the string.  Therefore we short-circuit finding the beginning and end of the line
+        or string for dotAll patterns.
+
+        * bytecode/BytecodeDumper.cpp:
+        (JSC::regexpToSourceString):
+        * runtime/CommonIdentifiers.h:
+        * runtime/RegExp.cpp:
+        (JSC::regExpFlags):
+        (JSC::RegExpFunctionalTestCollector::outputOneTest):
+        * runtime/RegExp.h:
+        * runtime/RegExpKey.h:
+        * runtime/RegExpPrototype.cpp:
+        (JSC::RegExpPrototype::finishCreation):
+        (JSC::flagsString):
+        (JSC::regExpProtoGetterDotAll):
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::matchDotStarEnclosure):
+        * yarr/YarrInterpreter.h:
+        (JSC::Yarr::BytecodePattern::dotAll const):
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::optimizeAlternative):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
+        (JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
+        (JSC::Yarr::YarrGenerator::generateDotStarEnclosure):
+        * yarr/YarrParser.h:
+        (JSC::Yarr::Parser::parseTokens):
+        * yarr/YarrPattern.cpp:
+        (JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
+        (JSC::Yarr::YarrPatternConstructor::atomCharacterClassBuiltIn):
+        (JSC::Yarr::YarrPatternConstructor::optimizeDotStarWrappedExpressions):
+        (JSC::Yarr::YarrPattern::YarrPattern):
+        (JSC::Yarr::PatternTerm::dump):
+        (JSC::Yarr::anycharCreate):
+        * yarr/YarrPattern.h:
+        (JSC::Yarr::YarrPattern::reset):
+        (JSC::Yarr::YarrPattern::anyCharacterClass):
+        (JSC::Yarr::YarrPattern::dotAll const):
+
 2017-08-23  Filip Pizlo  <fpizlo@apple.com>
 
         Reduce Gigacage sizes
index 0499bba..5ab7b81 100644
@@ -252,7 +252,7 @@ const Identifier& BytecodeDumper<Block>::identifier(int index) const
 
 static CString regexpToSourceString(RegExp* regExp)
 {
-    char postfix[5] = { '/', 0, 0, 0, 0 };
+    char postfix[7] = { '/', 0, 0, 0, 0, 0, 0 };
     int index = 1;
     if (regExp->global())
         postfix[index++] = 'g';
@@ -260,10 +260,12 @@ static CString regexpToSourceString(RegExp* regExp)
         postfix[index++] = 'i';
     if (regExp->multiline())
         postfix[index] = 'm';
-    if (regExp->sticky())
-        postfix[index++] = 'y';
+    if (regExp->dotAll())
+        postfix[index++] = 's';
     if (regExp->unicode())
         postfix[index++] = 'u';
+    if (regExp->sticky())
+        postfix[index++] = 'y';
 
     return toCString("/", regExp->pattern().impl(), postfix);
 }
index e4a3e79..6768294 100644
     macro(displayName) \
     macro(document) \
     macro(done) \
+    macro(dotAll) \
     macro(enumerable) \
     macro(era) \
     macro(eval) \
index 436b36e..55e78bd 100644
@@ -59,6 +59,12 @@ RegExpFlags regExpFlags(const String& string)
             flags = static_cast<RegExpFlags>(flags | FlagMultiline);
             break;
 
+        case 's':
+            if (flags & FlagDotAll)
+                return InvalidFlags;
+            flags = static_cast<RegExpFlags>(flags | FlagDotAll);
+            break;
+            
         case 'u':
             if (flags & FlagUnicode)
                 return InvalidFlags;
@@ -104,10 +110,12 @@ void RegExpFunctionalTestCollector::outputOneTest(RegExp* regExp, const String&
             fputc('i', m_file);
         if (regExp->multiline())
             fputc('m', m_file);
-        if (regExp->sticky())
-            fputc('y', m_file);
+        if (regExp->dotAll())
+            fputc('s', m_file);
         if (regExp->unicode())
             fputc('u', m_file);
+        if (regExp->sticky())
+            fputc('y', m_file);
         fprintf(m_file, "\n");
     }
 
index 868a193..5944e29 100644
@@ -56,6 +56,7 @@ public:
     bool sticky() const { return m_flags & FlagSticky; }
     bool globalOrSticky() const { return global() || sticky(); }
     bool unicode() const { return m_flags & FlagUnicode; }
+    bool dotAll() const { return m_flags & FlagDotAll; }
 
     const String& pattern() const { return m_patternString; }
 
index 57a3f35..79b652f 100644
@@ -39,7 +39,8 @@ enum RegExpFlags {
     FlagMultiline = 4,
     FlagSticky = 8,
     FlagUnicode = 16,
-    InvalidFlags = 32,
+    FlagDotAll = 32,
+    InvalidFlags = 64,
     DeletedValueFlags = -1
 };
 
index 10182aa..146c255 100644
@@ -50,6 +50,7 @@ static EncodedJSValue JSC_HOST_CALL regExpProtoFuncToString(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterGlobal(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterIgnoreCase(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterMultiline(ExecState*);
+static EncodedJSValue JSC_HOST_CALL regExpProtoGetterDotAll(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterSticky(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterUnicode(ExecState*);
 static EncodedJSValue JSC_HOST_CALL regExpProtoGetterSource(ExecState*);
@@ -70,6 +71,7 @@ void RegExpPrototype::finishCreation(VM& vm, JSGlobalObject* globalObject)
     JSC_NATIVE_INTRINSIC_FUNCTION_WITHOUT_TRANSITION(vm.propertyNames->exec, regExpProtoFuncExec, DontEnum, 1, RegExpExecIntrinsic);
     JSC_NATIVE_FUNCTION_WITHOUT_TRANSITION(vm.propertyNames->toString, regExpProtoFuncToString, DontEnum, 0);
     JSC_NATIVE_GETTER(vm.propertyNames->global, regExpProtoGetterGlobal, DontEnum | Accessor);
+    JSC_NATIVE_GETTER(vm.propertyNames->dotAll, regExpProtoGetterDotAll, DontEnum | Accessor);
     JSC_NATIVE_GETTER(vm.propertyNames->ignoreCase, regExpProtoGetterIgnoreCase, DontEnum | Accessor);
     JSC_NATIVE_GETTER(vm.propertyNames->multiline, regExpProtoGetterMultiline, DontEnum | Accessor);
     JSC_NATIVE_GETTER(vm.propertyNames->sticky, regExpProtoGetterSticky, DontEnum | Accessor);
@@ -210,6 +212,8 @@ static inline FlagsString flagsString(ExecState* exec, JSObject* regexp)
     RETURN_IF_EXCEPTION(scope, string);
     JSValue multilineValue = regexp->get(exec, vm.propertyNames->multiline);
     RETURN_IF_EXCEPTION(scope, string);
+    JSValue dotAllValue = regexp->get(exec, vm.propertyNames->dotAll);
+    RETURN_IF_EXCEPTION(scope, string);
     JSValue unicodeValue = regexp->get(exec, vm.propertyNames->unicode);
     RETURN_IF_EXCEPTION(scope, string);
     JSValue stickyValue = regexp->get(exec, vm.propertyNames->sticky);
@@ -222,6 +226,8 @@ static inline FlagsString flagsString(ExecState* exec, JSObject* regexp)
         string[index++] = 'i';
     if (multilineValue.toBoolean(exec))
         string[index++] = 'm';
+    if (dotAllValue.toBoolean(exec))
+        string[index++] = 's';
     if (unicodeValue.toBoolean(exec))
         string[index++] = 'u';
     if (stickyValue.toBoolean(exec))
@@ -306,6 +312,21 @@ EncodedJSValue JSC_HOST_CALL regExpProtoGetterMultiline(ExecState* exec)
     return JSValue::encode(jsBoolean(asRegExpObject(thisValue)->regExp()->multiline()));
 }
 
+EncodedJSValue JSC_HOST_CALL regExpProtoGetterDotAll(ExecState* exec)
+{
+    VM& vm = exec->vm();
+    auto scope = DECLARE_THROW_SCOPE(vm);
+    
+    JSValue thisValue = exec->thisValue();
+    if (UNLIKELY(!thisValue.inherits(vm, RegExpObject::info()))) {
+        if (thisValue.inherits(vm, RegExpPrototype::info()))
+            return JSValue::encode(jsUndefined());
+        return throwVMTypeError(exec, scope, ASCIILiteral("The RegExp.prototype.dotAll getter can only be called on a RegExp object"));
+    }
+    
+    return JSValue::encode(jsBoolean(asRegExpObject(thisValue)->regExp()->dotAll()));
+}
+    
 EncodedJSValue JSC_HOST_CALL regExpProtoGetterSticky(ExecState* exec)
 {
     VM& vm = exec->vm();
index 2eeba8a..edafef2 100644
@@ -1120,6 +1120,13 @@ public:
     bool matchDotStarEnclosure(ByteTerm& term, DisjunctionContext* context)
     {
         UNUSED_PARAM(term);
+
+        if (pattern->dotAll()) {
+            context->matchBegin = startOffset;
+            context->matchEnd = input.end();
+            return true;
+        }
+
         unsigned matchBegin = context->matchBegin;
 
         if (matchBegin > startOffset) {
index 43dcb1f..a319cb3 100644
@@ -371,6 +371,7 @@ public:
     bool multiline() const { return m_flags & FlagMultiline; }
     bool sticky() const { return m_flags & FlagSticky; }
     bool unicode() const { return m_flags & FlagUnicode; }
+    bool dotAll() const { return m_flags & FlagDotAll; }
 
     std::unique_ptr<ByteDisjunction> m_body;
     RegExpFlags m_flags;
index 685ae5f..542cd72 100644
@@ -1169,15 +1169,18 @@ class YarrGenerator : private MacroAssembler {
 
         JumpList matchDest;
         readCharacter(m_checkedOffset - term->inputPosition, character);
-        matchCharacterClass(character, matchDest, term->characterClass);
+        // If we are matching the "any character" builtin class we only need to read the
+        // character and don't need to match as it will always succeed.
+        if (term->invert() || term->characterClass != m_pattern.anyCharacterClass()) {
+            matchCharacterClass(character, matchDest, term->characterClass);
 
-        if (term->invert())
-            op.m_jumps.append(matchDest);
-        else {
-            op.m_jumps.append(jump());
-            matchDest.link(this);
+            if (term->invert())
+                op.m_jumps.append(matchDest);
+            else {
+                op.m_jumps.append(jump());
+                matchDest.link(this);
+            }
         }
-
 #ifdef JIT_UNICODE_EXPRESSIONS
         if (m_decodeSurrogatePairs) {
             Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
@@ -1215,13 +1218,17 @@ class YarrGenerator : private MacroAssembler {
         Label loop(this);
         JumpList matchDest;
         readCharacter(m_checkedOffset - term->inputPosition - term->quantityMaxCount, character, countRegister);
-        matchCharacterClass(character, matchDest, term->characterClass);
+        // If we are matching the "any character" builtin class we only need to read the
+        // character and don't need to match as it will always succeed.
+        if (term->invert() || term->characterClass != m_pattern.anyCharacterClass()) {
+            matchCharacterClass(character, matchDest, term->characterClass);
 
-        if (term->invert())
-            op.m_jumps.append(matchDest);
-        else {
-            op.m_jumps.append(jump());
-            matchDest.link(this);
+            if (term->invert())
+                op.m_jumps.append(matchDest);
+            else {
+                op.m_jumps.append(jump());
+                matchDest.link(this);
+            }
         }
 
         add32(TrustedImm32(1), countRegister);
@@ -1263,8 +1270,12 @@ class YarrGenerator : private MacroAssembler {
         } else {
             JumpList matchDest;
             readCharacter(m_checkedOffset - term->inputPosition, character);
-            matchCharacterClass(character, matchDest, term->characterClass);
-            failures.append(jump());
+            // If we are matching the "any character" builtin class we only need to read the
+            // character and don't need to match as it will always succeed.
+            if (term->characterClass != m_pattern.anyCharacterClass()) {
+                matchCharacterClass(character, matchDest, term->characterClass);
+                failures.append(jump());
+            }
             matchDest.link(this);
         }
 
@@ -1365,13 +1376,17 @@ class YarrGenerator : private MacroAssembler {
 
         JumpList matchDest;
         readCharacter(m_checkedOffset - term->inputPosition, character);
-        matchCharacterClass(character, matchDest, term->characterClass);
+        // If we are matching the "any character" builtin class we only need to read the
+        // character and don't need to match as it will always succeed.
+        if (term->invert() || term->characterClass != m_pattern.anyCharacterClass()) {
+            matchCharacterClass(character, matchDest, term->characterClass);
 
-        if (term->invert())
-            nonGreedyFailures.append(matchDest);
-        else {
-            nonGreedyFailures.append(jump());
-            matchDest.link(this);
+            if (term->invert())
+                nonGreedyFailures.append(matchDest);
+            else {
+                nonGreedyFailures.append(jump());
+                matchDest.link(this);
+            }
         }
 
         add32(TrustedImm32(1), index);
@@ -1407,6 +1422,13 @@ class YarrGenerator : private MacroAssembler {
         JumpList saveStartIndex;
         JumpList foundEndingNewLine;
 
+        if (m_pattern.dotAll()) {
+            move(TrustedImm32(0), matchPos);
+            setMatchStart(matchPos);
+            move(length, index);
+            return;
+        }
+
         ASSERT(!m_pattern.m_body->m_hasFixedSize);
         getMatchStart(matchPos);
 
index 4bd7e0f..46c27b7 100644
@@ -36,7 +36,7 @@ enum BuiltInCharacterClassID {
     DigitClassID,
     SpaceClassID,
     WordClassID,
-    NewlineClassID,
+    DotClassID,
 };
 
 // The Parser class should not be used directly - only via the Yarr::parse() method.
@@ -694,7 +694,7 @@ private:
 
             case '.':
                 consume();
-                m_delegate.atomBuiltInCharacterClass(NewlineClassID, true);
+                m_delegate.atomBuiltInCharacterClass(DotClassID, false);
                 lastTokenWasAnAtom = true;
                 break;
 
index ac9d7bf..86c32fb 100644
@@ -373,8 +373,12 @@ public:
             else
                 m_alternative->m_terms.append(PatternTerm(m_pattern.wordcharCharacterClass(), invert));
             break;
-        case NewlineClassID:
-            m_alternative->m_terms.append(PatternTerm(m_pattern.newlineCharacterClass(), invert));
+        case DotClassID:
+            ASSERT(!invert);
+            if (m_pattern.dotAll())
+                m_alternative->m_terms.append(PatternTerm(m_pattern.anyCharacterClass(), false));
+            else
+                m_alternative->m_terms.append(PatternTerm(m_pattern.newlineCharacterClass(), true));
             break;
         }
     }
@@ -396,7 +400,7 @@ public:
 
     void atomCharacterClassBuiltIn(BuiltInCharacterClassID classID, bool invert)
     {
-        ASSERT(classID != NewlineClassID);
+        ASSERT(classID != DotClassID);
 
         switch (classID) {
         case DigitClassID:
@@ -849,6 +853,7 @@ public:
         if (alternatives.size() != 1)
             return;
 
+        CharacterClass* dotCharacterClass = m_pattern.dotAll() ? m_pattern.anyCharacterClass() : m_pattern.newlineCharacterClass();
         PatternAlternative* alternative = alternatives[0].get();
         Vector<PatternTerm>& terms = alternative->m_terms;
         if (terms.size() >= 3) {
@@ -863,7 +868,10 @@ public:
             }
             
             PatternTerm& firstNonAnchorTerm = terms[termIndex];
-            if ((firstNonAnchorTerm.type != PatternTerm::TypeCharacterClass) || (firstNonAnchorTerm.characterClass != m_pattern.newlineCharacterClass()) || !((firstNonAnchorTerm.quantityType == QuantifierGreedy) || (firstNonAnchorTerm.quantityType == QuantifierNonGreedy)))
+            if ((firstNonAnchorTerm.type != PatternTerm::TypeCharacterClass)
+                || (firstNonAnchorTerm.characterClass != dotCharacterClass)
+                || !((firstNonAnchorTerm.quantityType == QuantifierGreedy)
+                    || (firstNonAnchorTerm.quantityType == QuantifierNonGreedy)))
                 return;
             
             firstExpressionTerm = termIndex + 1;
@@ -875,7 +883,9 @@ public:
             }
             
             PatternTerm& lastNonAnchorTerm = terms[termIndex];
-            if ((lastNonAnchorTerm.type != PatternTerm::TypeCharacterClass) || (lastNonAnchorTerm.characterClass != m_pattern.newlineCharacterClass()) || (lastNonAnchorTerm.quantityType != QuantifierGreedy))
+            if ((lastNonAnchorTerm.type != PatternTerm::TypeCharacterClass)
+                || (lastNonAnchorTerm.characterClass != dotCharacterClass)
+                || (lastNonAnchorTerm.quantityType != QuantifierGreedy))
                 return;
 
             size_t endIndex = termIndex;
@@ -994,6 +1004,7 @@ YarrPattern::YarrPattern(const String& pattern, RegExpFlags flags, const char**
     , m_flags(flags)
     , m_numSubpatterns(0)
     , m_maxBackReference(0)
+    , anycharCached(0)
     , newlineCached(0)
     , digitsCached(0)
     , spacesCached(0)
@@ -1089,7 +1100,9 @@ void PatternTerm::dump(PrintStream& out, YarrPattern* thisPattern, unsigned nest
         break;
     case TypeCharacterClass:
         out.print("character class ");
-        if (characterClass == thisPattern->newlineCharacterClass())
+        if (characterClass == thisPattern->anyCharacterClass())
+            out.print("<any character>");
+        else if (characterClass == thisPattern->newlineCharacterClass())
             out.print("<newline>");
         else if (characterClass == thisPattern->digitsCharacterClass())
             out.print("<digits>");
@@ -1284,4 +1297,13 @@ void YarrPattern::dumpPattern(PrintStream& out, const String& patternString)
     m_body->dump(out, this);
 }
 
+std::unique_ptr<CharacterClass> anycharCreate()
+{
+    auto characterClass = std::make_unique<CharacterClass>();
+    characterClass->m_ranges.append(CharacterRange(0x00, 0x7f));
+    characterClass->m_rangesUnicode.append(CharacterRange(0x0080, 0x10ffff));
+    characterClass->m_hasNonBMPCharacters = true;
+    return characterClass;
+}
+
 } }
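Note on DotClassID resolution in the file above: '.' becomes the non-inverted any-character class under dotAll, and the inverted newline class otherwise. The following is a minimal C++ sketch of the resulting per-character behavior, not the Yarr implementation. The names dotMatches and isLineTerminator are hypothetical, and the line terminator set (U+000A, U+000D, U+2028, U+2029) is assumed from ECMAScript rather than taken from this patch.

```cpp
#include <cassert>

// Hypothetical helper: the ECMAScript line terminators (an assumption of
// this sketch).
static bool isLineTerminator(char32_t c)
{
    return c == U'\n' || c == U'\r' || c == 0x2028 || c == 0x2029;
}

// Hypothetical model of what a '.' term accepts for code point c.
static bool dotMatches(char32_t c, bool dotAll)
{
    assert(c <= 0x10ffff);

    // dotAll: the any-character class covers 0x00-0x7f and 0x80-0x10ffff,
    // so every code point is accepted and no membership test is needed.
    if (dotAll)
        return true;

    // Default: the newline class, inverted.
    return !isLineTerminator(c);
}
```

Because the dotAll branch accepts every code point unconditionally, a matcher only has to consume the character, which is the property the JIT change in this commit relies on.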
index 2cebbce..e7b1584 100644
@@ -305,6 +305,8 @@ public:
 // (please to be calling newlineCharacterClass() et al on your
 // friendly neighborhood YarrPattern instance to get nicely
 // cached copies).
+
+std::unique_ptr<CharacterClass> anycharCreate();
 std::unique_ptr<CharacterClass> newlineCreate();
 std::unique_ptr<CharacterClass> digitsCreate();
 std::unique_ptr<CharacterClass> spacesCreate();
@@ -363,6 +365,7 @@ struct YarrPattern {
         m_hasCopiedParenSubexpressions = false;
         m_saveInitialStartValue = false;
 
+        anycharCached = 0;
         newlineCached = 0;
         digitsCached = 0;
         spacesCached = 0;
@@ -387,6 +390,14 @@ struct YarrPattern {
         return m_containsUnsignedLengthPattern;
     }
 
+    CharacterClass* anyCharacterClass()
+    {
+        if (!anycharCached) {
+            m_userCharacterClasses.append(anycharCreate());
+            anycharCached = m_userCharacterClasses.last().get();
+        }
+        return anycharCached;
+    }
     CharacterClass* newlineCharacterClass()
     {
         if (!newlineCached) {
@@ -468,6 +479,7 @@ struct YarrPattern {
     bool multiline() const { return m_flags & FlagMultiline; }
     bool sticky() const { return m_flags & FlagSticky; }
     bool unicode() const { return m_flags & FlagUnicode; }
+    bool dotAll() const { return m_flags & FlagDotAll; }
 
     bool m_containsBackreferences : 1;
     bool m_containsBOL : 1;
@@ -485,6 +497,7 @@ struct YarrPattern {
 private:
     const char* compile(const String& patternString, void* stackLimit);
 
+    CharacterClass* anycharCached;
     CharacterClass* newlineCached;
     CharacterClass* digitsCached;
     CharacterClass* spacesCached;
index 768e462..216d0d9 100644
@@ -1,3 +1,17 @@
+2017-08-24  Michael Saboff  <msaboff@apple.com>
+
+        Add support for RegExp "dotAll" flag
+        https://bugs.webkit.org/show_bug.cgi?id=175924
+
+        Reviewed by Keith Miller.
+
+        Changed due to refactoring NewlineClassID to DotClassID.
+
+        No new tests. No change in behavior.
+
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::PatternParser::atomBuiltInCharacterClass):
+
 2017-08-24  Ryan Haddad  <ryanhaddad@apple.com>
 
         Unreviewed, revert part of r221152 to fix internal builds.
index 3588c65..120fa53 100644
@@ -101,7 +101,7 @@ public:
         sinkFloatingTermIfNecessary();
         ASSERT(!m_floatingTerm.isValid());
 
-        if (builtInCharacterClassID == JSC::Yarr::NewlineClassID && inverted)
+        if (builtInCharacterClassID == JSC::Yarr::DotClassID && !inverted)
             m_floatingTerm = Term(Term::UniversalTransition);
         else
             fail(URLFilterParser::UnsupportedCharacterClass);