Invalid numeric and named references should be early syntax errors
authorshvaikalesh@gmail.com <shvaikalesh@gmail.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 26 Mar 2020 01:24:05 +0000 (01:24 +0000)
committershvaikalesh@gmail.com <shvaikalesh@gmail.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 26 Mar 2020 01:24:05 +0000 (01:24 +0000)
https://bugs.webkit.org/show_bug.cgi?id=178175

Reviewed by Ross Kirsling.

JSTests:

* test262/expectations.yaml: Mark 44 test cases as passing.

Source/JavaScriptCore:

This patch:

1. Fixes named reference parsing in parseEscape(), making /\k/u throw SyntaxError per spec [1].

2. Reworks containsIllegalNamedForwardReferences(), making dangling (e.g. /\k<a>(?<b>.)/) and
   incomplete (e.g. /\k<(?<a>.)/) named references throw SyntaxError if the non-Unicode pattern
   contains a named group [2].

3. Moves reparsing logic from YarrPattern to YarrParser, ensuring syntax errors due to illegal
   references (named & numeric) are thrown at parse time; drops isValidNamedForwardReference()
   from Delegate, refactors saveUnmatchedNamedForwardReferences(), and overall improves cohesion
   of illegal references logic.

[1]: https://tc39.es/ecma262/#prod-IdentityEscape
[2]: https://tc39.es/ecma262/#sec-regexpinitialize (step 7.b)

* yarr/YarrErrorCode.cpp:
(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):
* yarr/YarrErrorCode.h:
* yarr/YarrParser.h:
(JSC::Yarr::Parser::CharacterClassParserDelegate::atomNamedBackReference):
(JSC::Yarr::Parser::Parser):
(JSC::Yarr::Parser::parseEscape):
(JSC::Yarr::Parser::parseParenthesesBegin):
(JSC::Yarr::Parser::parse):
(JSC::Yarr::Parser::handleIllegalReferences):
(JSC::Yarr::Parser::containsIllegalNamedForwardReference):
(JSC::Yarr::Parser::resetForReparsing):
(JSC::Yarr::parse):
(JSC::Yarr::Parser::CharacterClassParserDelegate::isValidNamedForwardReference): Deleted.
* yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::atomBackReference):
(JSC::Yarr::YarrPatternConstructor::atomNamedForwardReference):
(JSC::Yarr::YarrPattern::compile):
(JSC::Yarr::YarrPatternConstructor::saveUnmatchedNamedForwardReferences): Deleted.
(JSC::Yarr::YarrPatternConstructor::isValidNamedForwardReference): Deleted.
* yarr/YarrPattern.h:
(JSC::Yarr::YarrPattern::resetForReparsing):
(JSC::Yarr::YarrPattern::containsIllegalBackReference): Deleted.
(JSC::Yarr::YarrPattern::containsIllegalNamedForwardReferences): Deleted.
* yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomNamedBackReference):
(JSC::Yarr::SyntaxChecker::resetForReparsing):
(JSC::Yarr::SyntaxChecker::isValidNamedForwardReference): Deleted.

Source/WebCore:

Accounts for changes of YarrParser's Delegate interface, no behavioral changes.
resetForReparsing() is never called because we disable numeric backrefences
and named forward references (see arguments of Yarr::parse() call).

Test: TestWebKitAPI.ContentExtensionTest.ParsingFailures

* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::resetForReparsing):
(WebCore::ContentExtensions::URLFilterParser::addPattern):
(WebCore::ContentExtensions::PatternParser::isValidNamedForwardReference): Deleted.

Tools:

Removes FIXME as YarrParser is correct not to throw errors as it is
parsing in non-Unicode mode. Also adds a few named groups tests.

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:

LayoutTests:

* js/regexp-named-capture-groups-expected.txt:
* js/script-tests/regexp-named-capture-groups.js:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@259026 268f45cc-cd09-0410-ab3c-d52691b4dbfc

16 files changed:
JSTests/ChangeLog
JSTests/test262/expectations.yaml
LayoutTests/ChangeLog
LayoutTests/js/regexp-named-capture-groups-expected.txt
LayoutTests/js/script-tests/regexp-named-capture-groups.js
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/yarr/YarrErrorCode.cpp
Source/JavaScriptCore/yarr/YarrErrorCode.h
Source/JavaScriptCore/yarr/YarrParser.h
Source/JavaScriptCore/yarr/YarrPattern.cpp
Source/JavaScriptCore/yarr/YarrPattern.h
Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp
Source/WebCore/ChangeLog
Source/WebCore/contentextensions/URLFilterParser.cpp
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp

index 06bd3fe..af30086 100644 (file)
@@ -1,5 +1,14 @@
 2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
 
+        Invalid numeric and named references should be early syntax errors
+        https://bugs.webkit.org/show_bug.cgi?id=178175
+
+        Reviewed by Ross Kirsling.
+
+        * test262/expectations.yaml: Mark 44 test cases as passing.
+
+2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
+
         \b escapes inside character classes should be valid in Unicode patterns
         https://bugs.webkit.org/show_bug.cgi?id=209528
 
index 4e49e6b..5471046 100644 (file)
@@ -1711,9 +1711,6 @@ test/built-ins/RegExp/prototype/unicode/cross-realm.js:
 test/built-ins/RegExp/quantifier-integer-limit.js:
   default: 'SyntaxError: Invalid regular expression: number too large in {} quantifier'
   strict mode: 'SyntaxError: Invalid regular expression: number too large in {} quantifier'
-test/built-ins/RegExp/unicode_restricted_identity_escape_alpha.js:
-  default: "Test262Error: IdentityEscape in AtomEscape: 'k' Expected a SyntaxError to be thrown but no exception was thrown at all"
-  strict mode: "Test262Error: IdentityEscape in AtomEscape: 'k' Expected a SyntaxError to be thrown but no exception was thrown at all"
 test/built-ins/RegExp/unicode_restricted_octal_escape.js:
   default: 'Test262Error: RegExp("[\1]", "u"):  Expected a SyntaxError to be thrown but no exception was thrown at all'
   strict mode: 'Test262Error: RegExp("[\1]", "u"):  Expected a SyntaxError to be thrown but no exception was thrown at all'
@@ -3264,72 +3261,9 @@ test/language/identifiers/start-unicode-13.0.0-escaped.js:
 test/language/identifiers/start-unicode-13.0.0.js:
   default: "SyntaxError: Invalid character '\\u08be'"
   strict mode: "SyntaxError: Invalid character '\\u08be'"
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-2-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-2.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-3-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-3.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-4-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-4.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-5.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname-without-group-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-dangling-groupname.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-2.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-3.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-4.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-5.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-6.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname-without-group-3-u.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/named-groups/invalid-incomplete-groupname.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
 test/language/literals/regexp/u-astral-char-class-invert.js:
   default: 'Test262Error: Expected SameValue(«�», «null») to be true'
   strict mode: 'Test262Error: Expected SameValue(«�», «null») to be true'
-test/language/literals/regexp/u-dec-esc.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/u-invalid-legacy-octal-escape.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
-test/language/literals/regexp/u-invalid-oob-decimal-escape.js:
-  default: 'Test262: This statement should not be evaluated.'
-  strict mode: 'Test262: This statement should not be evaluated.'
 test/language/module-code/eval-rqstd-once.js:
   module: "SyntaxError: Unexpected identifier 'as'. Expected 'from' before exported module name."
 test/language/module-code/eval-rqstd-order.js:
index 71efc60..ca3ec02 100644 (file)
@@ -1,3 +1,13 @@
+2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
+
+        Invalid numeric and named references should be early syntax errors
+        https://bugs.webkit.org/show_bug.cgi?id=178175
+
+        Reviewed by Ross Kirsling.
+
+        * js/regexp-named-capture-groups-expected.txt:
+        * js/script-tests/regexp-named-capture-groups.js:
+
 2020-03-25  Pinki Gyanchandani  <pgyanchandani@apple.com>
 
         CanvasRenderingContext2D.putImageData() should not process neutered ImageData
index 084fb51..d320363 100644 (file)
@@ -66,7 +66,8 @@ PASS "XzzXzz".match(/\k<z>X(?<z>z*)X\k<z>/u) is ["XzzXzz", "zz"]
 PASS "1122332211".match(/\k<ones>\k<twos>\k<threes>(?<ones>1*)(?<twos>2*)(?<threes>3*)\k<threes>\k<twos>\k<ones>/) is ["1122332211", "11", "22", "3"]
 PASS "1122332211".match(/\k<ones>\k<twos>\k<threes>(?<ones>1*)(?<twos>2*)(?<threes>3*)\k<threes>\k<twos>\k<ones>/u) is ["1122332211", "11", "22", "3"]
 PASS "\k<z>XzzX".match(/\k<z>X(z*)X/) is ["k<z>XzzX", "zz"]
-PASS "\k<z>XzzX".match(/\k<z>X(z*)X/u) threw exception SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern.
+PASS "\k<z>XzzX".match(/\k<z>X(z*)X/u) threw exception SyntaxError: Invalid regular expression: invalid \k<> named backreference.
+PASS /\k<xxx(?<a>y)(/ threw exception SyntaxError: Invalid regular expression: invalid \k<> named backreference.
 PASS successfullyParsed is true
 
 TEST COMPLETE
index d1b95a1..1eea6de 100644 (file)
@@ -111,6 +111,7 @@ shouldBe('"1122332211".match(/\\\k<ones>\\\k<twos>\\\k<threes>(?<ones>1*)(?<twos
 shouldBe('"1122332211".match(/\\\k<ones>\\\k<twos>\\\k<threes>(?<ones>1*)(?<twos>2*)(?<threes>3*)\\\k<threes>\\\k<twos>\\\k<ones>/u)', '["1122332211", "11", "22", "3"]');
 
 // Check that a named forward reference for a non-existent named capture
-// matches for non-unicode patterns and throws for unicode patterns.
+// matches for non-Unicode patterns w/o a named group and throws for patterns with a named group or Unicode flag.
 shouldBe('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/)', '["k<z>XzzX", "zz"]');
-shouldThrow('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/u)', '"SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern"');
+shouldThrow('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/u)', '"SyntaxError: Invalid regular expression: invalid \\\\k<> named backreference"');
+shouldThrow('/\\\k<xxx(?<a>y)(/', '"SyntaxError: Invalid regular expression: invalid \\\\k<> named backreference"');
index 328cc9d..d421b25 100644 (file)
@@ -1,3 +1,56 @@
+2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
+
+        Invalid numeric and named references should be early syntax errors
+        https://bugs.webkit.org/show_bug.cgi?id=178175
+
+        Reviewed by Ross Kirsling.
+
+        This patch:
+
+        1. Fixes named reference parsing in parseEscape(), making /\k/u throw SyntaxError per spec [1].
+
+        2. Reworks containsIllegalNamedForwardReferences(), making dangling (e.g. /\k<a>(?<b>.)/) and
+           incomplete (e.g. /\k<(?<a>.)/) named references throw SyntaxError if the non-Unicode pattern
+           contains a named group [2].
+
+        3. Moves reparsing logic from YarrPattern to YarrParser, ensuring syntax errors due to illegal
+           references (named & numeric) are thrown at parse time; drops isValidNamedForwardReference()
+           from Delegate, refactors saveUnmatchedNamedForwardReferences(), and overall improves cohesion
+           of illegal references logic.
+
+        [1]: https://tc39.es/ecma262/#prod-IdentityEscape
+        [2]: https://tc39.es/ecma262/#sec-regexpinitialize (step 7.b)
+
+        * yarr/YarrErrorCode.cpp:
+        (JSC::Yarr::errorMessage):
+        (JSC::Yarr::errorToThrow):
+        * yarr/YarrErrorCode.h:
+        * yarr/YarrParser.h:
+        (JSC::Yarr::Parser::CharacterClassParserDelegate::atomNamedBackReference):
+        (JSC::Yarr::Parser::Parser):
+        (JSC::Yarr::Parser::parseEscape):
+        (JSC::Yarr::Parser::parseParenthesesBegin):
+        (JSC::Yarr::Parser::parse):
+        (JSC::Yarr::Parser::handleIllegalReferences):
+        (JSC::Yarr::Parser::containsIllegalNamedForwardReference):
+        (JSC::Yarr::Parser::resetForReparsing):
+        (JSC::Yarr::parse):
+        (JSC::Yarr::Parser::CharacterClassParserDelegate::isValidNamedForwardReference): Deleted.
+        * yarr/YarrPattern.cpp:
+        (JSC::Yarr::YarrPatternConstructor::atomBackReference):
+        (JSC::Yarr::YarrPatternConstructor::atomNamedForwardReference):
+        (JSC::Yarr::YarrPattern::compile):
+        (JSC::Yarr::YarrPatternConstructor::saveUnmatchedNamedForwardReferences): Deleted.
+        (JSC::Yarr::YarrPatternConstructor::isValidNamedForwardReference): Deleted.
+        * yarr/YarrPattern.h:
+        (JSC::Yarr::YarrPattern::resetForReparsing):
+        (JSC::Yarr::YarrPattern::containsIllegalBackReference): Deleted.
+        (JSC::Yarr::YarrPattern::containsIllegalNamedForwardReferences): Deleted.
+        * yarr/YarrSyntaxChecker.cpp:
+        (JSC::Yarr::SyntaxChecker::atomNamedBackReference):
+        (JSC::Yarr::SyntaxChecker::resetForReparsing):
+        (JSC::Yarr::SyntaxChecker::isValidNamedForwardReference): Deleted.
+
 2020-03-25  Chris Dumez  <cdumez@apple.com>
 
         Use JSC::EnsureStillAliveScope RAII object in the generated bindings code
index 3372bad..3425d13 100644 (file)
@@ -53,6 +53,7 @@ const char* errorMessage(ErrorCode error)
         REGEXP_ERROR_PREFIX "\\ at end of pattern",                                 // EscapeUnterminated
         REGEXP_ERROR_PREFIX "invalid Unicode {} escape",                            // InvalidUnicodeEscape
         REGEXP_ERROR_PREFIX "invalid backreference for Unicode pattern",            // InvalidBackreference
+        REGEXP_ERROR_PREFIX "invalid \\k<> named backreference",                    // InvalidNamedBackReference
         REGEXP_ERROR_PREFIX "invalid escaped character for Unicode pattern",        // InvalidIdentityEscape
         REGEXP_ERROR_PREFIX "invalid \\c escape for Unicode pattern",               // InvalidControlLetterEscape
         REGEXP_ERROR_PREFIX "invalid property expression",                          // InvalidUnicodePropertyExpression
@@ -87,6 +88,7 @@ JSObject* errorToThrow(JSGlobalObject* globalObject, ErrorCode error)
     case ErrorCode::EscapeUnterminated:
     case ErrorCode::InvalidUnicodeEscape:
     case ErrorCode::InvalidBackreference:
+    case ErrorCode::InvalidNamedBackReference:
     case ErrorCode::InvalidIdentityEscape:
     case ErrorCode::InvalidControlLetterEscape:
     case ErrorCode::InvalidUnicodePropertyExpression:
index 3c3b69d..1cb0dbf 100644 (file)
@@ -52,6 +52,7 @@ enum class ErrorCode : uint8_t {
     EscapeUnterminated,
     InvalidUnicodeEscape,
     InvalidBackreference,
+    InvalidNamedBackReference,
     InvalidIdentityEscape,
     InvalidControlLetterEscape,
     InvalidUnicodePropertyExpression,
index c28bfeb..30092ba 100644 (file)
@@ -1,5 +1,6 @@
 /*
- * Copyright (C) 2009-2019 Apple Inc. All rights reserved.
+ * Copyright (C) 2009-2020 Apple Inc. All rights reserved.
+ * Copyright (C) 2020 Alexey Shvayka <shvaikalesh@gmail.com>.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
- * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
- * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
- * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
- * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
- * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
- * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
- * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #pragma once
@@ -41,7 +42,7 @@ template<class Delegate, typename CharType>
 class Parser {
 private:
     template<class FriendDelegate>
-    friend ErrorCode parse(FriendDelegate&, const String& pattern, bool isUnicode, unsigned backReferenceLimit);
+    friend ErrorCode parse(FriendDelegate&, const String& pattern, bool isUnicode, unsigned backReferenceLimit, bool isNamedForwardReferenceAllowed);
 
     /*
      * CharacterClassParserDelegate:
@@ -200,7 +201,6 @@ private:
         NO_RETURN_DUE_TO_ASSERT void assertionWordBoundary(bool) { RELEASE_ASSERT_NOT_REACHED(); }
         NO_RETURN_DUE_TO_ASSERT void atomBackReference(unsigned) { RELEASE_ASSERT_NOT_REACHED(); }
         NO_RETURN_DUE_TO_ASSERT void atomNamedBackReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
-        NO_RETURN_DUE_TO_ASSERT bool isValidNamedForwardReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
         NO_RETURN_DUE_TO_ASSERT void atomNamedForwardReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
 
     private:
@@ -217,12 +217,13 @@ private:
         UChar32 m_character;
     };
 
-    Parser(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit)
+    Parser(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit, bool isNamedForwardReferenceAllowed)
         : m_delegate(delegate)
-        , m_backReferenceLimit(backReferenceLimit)
         , m_data(pattern.characters<CharType>())
         , m_size(pattern.length())
         , m_isUnicode(isUnicode)
+        , m_backReferenceLimit(backReferenceLimit)
+        , m_isNamedForwardReferenceAllowed(isNamedForwardReferenceAllowed)
     {
     }
 
@@ -338,16 +339,12 @@ private:
 
                 unsigned backReference = consumeNumber();
                 if (backReference <= m_backReferenceLimit) {
+                    m_maxSeenBackReference = std::max(m_maxSeenBackReference, backReference);
                     delegate.atomBackReference(backReference);
                     break;
                 }
 
                 restoreState(state);
-
-                if (m_isUnicode) {
-                    m_errorCode = ErrorCode::InvalidBackreference;
-                    return false;
-                }
             }
 
             // Not a backreference, and not octal. Just a number.
@@ -439,28 +436,27 @@ private:
         case 'k': {
             consume();
             ParseState state = saveState();
-            if (!atEndOfPattern() && !inCharacterClass) {
-                if (consume() == '<') {
-                    auto groupName = tryConsumeGroupName();
-                    if (groupName) {
-                        if (m_captureGroupNames.contains(groupName.value())) {
-                            delegate.atomNamedBackReference(groupName.value());
-                            break;
-                        }
-                        
-                        if (delegate.isValidNamedForwardReference(groupName.value())) {
-                            delegate.atomNamedForwardReference(groupName.value());
-                            break;
-                        }
+            if (!inCharacterClass && tryConsume('<')) {
+                auto groupName = tryConsumeGroupName();
+                if (groupName) {
+                    if (m_captureGroupNames.contains(groupName.value())) {
+                        delegate.atomNamedBackReference(groupName.value());
+                        break;
                     }
-                    if (m_isUnicode) {
-                        m_errorCode = ErrorCode::InvalidBackreference;
+
+                    if (m_isNamedForwardReferenceAllowed) {
+                        m_forwardReferenceNames.add(groupName.value());
+                        delegate.atomNamedForwardReference(groupName.value());
                         break;
                     }
                 }
             }
+
             restoreState(state);
-            delegate.atomPatternCharacter('k');
+            if (!isIdentityEscapeAnError('k')) {
+                delegate.atomPatternCharacter('k');
+                m_kIdentityEscapeSeen = true; 
+            }
             break;
         }
 
@@ -677,6 +673,11 @@ private:
             case '<': {
                 auto groupName = tryConsumeGroupName();
                 if (groupName) {
+                    if (m_kIdentityEscapeSeen) {
+                        m_errorCode = ErrorCode::InvalidNamedBackReference;
+                        break;
+                    }
+
                     auto setAddResult = m_captureGroupNames.add(groupName.value());
                     if (setAddResult.isNewEntry)
                         m_delegate.atomParenthesesSubpatternBegin(true, groupName);
@@ -694,6 +695,9 @@ private:
         } else
             m_delegate.atomParenthesesSubpatternBegin();
 
+        if (type == ParenthesesType::Subpattern)
+            ++m_numSubpatterns;
+
         m_parenthesesStack.append(type);
     }
 
@@ -881,14 +885,87 @@ private:
     ErrorCode parse()
     {
         if (m_size > MAX_PATTERN_SIZE)
-            m_errorCode = ErrorCode::PatternTooLarge;
-        else
-            parseTokens();
-        ASSERT(atEndOfPattern() || hasError(m_errorCode));
-        
+            return ErrorCode::PatternTooLarge;
+
+        parseTokens();
+
+        if (!hasError(m_errorCode)) {
+            ASSERT(atEndOfPattern());
+            handleIllegalReferences();
+            ASSERT(atEndOfPattern());
+        }
+
         return m_errorCode;
     }
 
+    void handleIllegalReferences()
+    {
+        bool shouldReparse = false;
+
+        if (m_maxSeenBackReference > m_numSubpatterns) {
+            // Contains illegal numeric backreference. See https://tc39.es/ecma262/#prod-annexB-AtomEscape
+            if (m_isUnicode) {
+                m_errorCode = ErrorCode::InvalidBackreference;
+                return;
+            }
+
+            m_backReferenceLimit = m_numSubpatterns;
+            shouldReparse = true;
+        }
+
+        if (m_kIdentityEscapeSeen && !m_captureGroupNames.isEmpty()) {
+            m_errorCode = ErrorCode::InvalidNamedBackReference;
+            return;
+        }
+
+        if (containsIllegalNamedForwardReference()) {
+            // \k<a> is parsed as named reference in Unicode patterns because of strict IdentityEscape grammar.
+            // See https://tc39.es/ecma262/#sec-patterns-static-semantics-early-errors
+            if (m_isUnicode || !m_captureGroupNames.isEmpty()) {
+                m_errorCode = ErrorCode::InvalidNamedBackReference;
+                return;
+            }
+
+            m_isNamedForwardReferenceAllowed = false;
+            shouldReparse = true;
+        }
+
+        if (shouldReparse) {
+            resetForReparsing();
+            parseTokens();
+        }
+    }
+
+    bool containsIllegalNamedForwardReference()
+    {
+        if (m_forwardReferenceNames.isEmpty())
+            return false;
+
+        if (m_captureGroupNames.isEmpty())
+            return true;
+
+        for (auto& entry : m_forwardReferenceNames) {
+            if (!m_captureGroupNames.contains(entry))
+                return true;
+        }
+
+        return false;
+    }
+
+    void resetForReparsing()
+    {
+        ASSERT(!hasError(m_errorCode));
+
+        m_delegate.resetForReparsing();
+        m_index = 0;
+        m_numSubpatterns = 0;
+        m_maxSeenBackReference = 0;
+        m_kIdentityEscapeSeen = false;
+        m_parenthesesStack.clear();
+        m_captureGroupNames.clear();
+        m_forwardReferenceNames.clear();
+    }
+
     // Misc helper functions:
 
     typedef unsigned ParseState;
@@ -1153,14 +1230,19 @@ private:
     enum class ParenthesesType : uint8_t { Subpattern, Assertion };
 
     Delegate& m_delegate;
-    unsigned m_backReferenceLimit;
     ErrorCode m_errorCode { ErrorCode::NoError };
     const CharType* m_data;
     unsigned m_size;
     unsigned m_index { 0 };
     bool m_isUnicode;
+    unsigned m_backReferenceLimit;
+    unsigned m_numSubpatterns { 0 };
+    unsigned m_maxSeenBackReference { 0 };
+    bool m_isNamedForwardReferenceAllowed;
+    bool m_kIdentityEscapeSeen { false };
     Vector<ParenthesesType, 16> m_parenthesesStack;
     HashSet<String> m_captureGroupNames;
+    HashSet<String> m_forwardReferenceNames;
 
     // Derived by empirical testing of compile time in PCRE and WREC.
     static constexpr unsigned MAX_PATTERN_SIZE = 1024 * 1024;
@@ -1192,13 +1274,14 @@ private:
  *    void atomParenthesesEnd();
  *    void atomBackReference(unsigned subpatternId);
  *    void atomNamedBackReference(const String& subpatternName);
- *    bool isValidNamedForwardReference(const String& subpatternName);
  *    void atomNamedForwardReference(const String& subpatternName);
  *
  *    void quantifyAtom(unsigned min, unsigned max, bool greedy);
  *
  *    void disjunction();
  *
+ *    void resetForReparsing();
+ *
  * The regular expression is described by a sequence of assertion*() and atom*()
  * callbacks to the delegate, describing the terms in the regular expression.
  * Following an atom a quantifyAtom() call may occur to indicate that the previous
@@ -1229,11 +1312,11 @@ private:
  */
 
 template<class Delegate>
-ErrorCode parse(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit = quantifyInfinite)
+ErrorCode parse(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit = quantifyInfinite, bool isNamedForwardReferenceAllowed = true)
 {
     if (pattern.is8Bit())
-        return Parser<Delegate, LChar>(delegate, pattern, isUnicode, backReferenceLimit).parse();
-    return Parser<Delegate, UChar>(delegate, pattern, isUnicode, backReferenceLimit).parse();
+        return Parser<Delegate, LChar>(delegate, pattern, isUnicode, backReferenceLimit, isNamedForwardReferenceAllowed).parse();
+    return Parser<Delegate, UChar>(delegate, pattern, isUnicode, backReferenceLimit, isNamedForwardReferenceAllowed).parse();
 }
 
 } } // namespace JSC::Yarr
index c508208..97fb868 100644 (file)
@@ -462,16 +462,6 @@ public:
         m_pattern.m_disjunctions.append(WTFMove(body));
     }
 
-    void saveUnmatchedNamedForwardReferences()
-    {
-        m_unmatchedNamedForwardReferences.shrink(0);
-        
-        for (auto& entry : m_pattern.m_namedForwardReferences) {
-            if (!m_pattern.m_captureGroupNames.contains(entry))
-                m_unmatchedNamedForwardReferences.append(entry);
-        }
-    }
-
     void assertionBOL()
     {
         if (!m_alternative->m_terms.size() && !m_invertParentheticalAssertion) {
@@ -657,7 +647,6 @@ public:
     {
         ASSERT(subpatternId);
         m_pattern.m_containsBackreferences = true;
-        m_pattern.m_maxBackReference = std::max(m_pattern.m_maxBackReference, subpatternId);
 
         if (subpatternId > m_pattern.m_numSubpatterns) {
             m_alternative->m_terms.append(PatternTerm::ForwardReference());
@@ -687,14 +676,8 @@ public:
         atomBackReference(m_pattern.m_namedGroupToParenIndex.get(subpatternName));
     }
 
-    bool isValidNamedForwardReference(const String& subpatternName)
+    void atomNamedForwardReference(const String&)
     {
-        return !m_unmatchedNamedForwardReferences.contains(subpatternName);
-    }
-
-    void atomNamedForwardReference(const String& subpatternName)
-    {
-        m_pattern.m_namedForwardReferences.appendIfNotContains(subpatternName);
         m_alternative->m_terms.append(PatternTerm::ForwardReference());
     }
     
@@ -1130,7 +1113,6 @@ private:
     YarrPattern& m_pattern;
     PatternAlternative* m_alternative;
     CharacterClassConstructor m_characterClassConstructor;
-    Vector<String> m_unmatchedNamedForwardReferences;
     void* m_stackLimit;
     ErrorCode m_error { ErrorCode::NoError };
     bool m_invertCharacterClass;
@@ -1146,23 +1128,6 @@ ErrorCode YarrPattern::compile(const String& patternString, void* stackLimit)
         if (hasError(error))
             return error;
     }
-    
-    // If the pattern contains illegal backreferences reset & reparse.
-    // Quoting Netscape's "What's new in JavaScript 1.2",
-    //      "Note: if the number of left parentheses is less than the number specified
-    //       in \#, the \# is taken as an octal escape as described in the next row."
-    if (containsIllegalBackReference() || containsIllegalNamedForwardReferences()) {
-        if (unicode())
-            return ErrorCode::InvalidBackreference;
-
-        unsigned numSubpatterns = m_numSubpatterns;
-
-        constructor.saveUnmatchedNamedForwardReferences();
-        constructor.resetForReparsing();
-        ErrorCode error = parse(constructor, patternString, unicode(), numSubpatterns);
-        ASSERT_UNUSED(error, !hasError(error));
-        ASSERT(numSubpatterns == m_numSubpatterns);
-    }
 
     constructor.checkForTerminalParentheses();
     constructor.optimizeDotStarWrappedExpressions();
index f6b22e5..277f20b 100644 (file)
@@ -391,7 +391,6 @@ struct YarrPattern {
     void resetForReparsing()
     {
         m_numSubpatterns = 0;
-        m_maxBackReference = 0;
         m_initialStartValueFrameLocation = 0;
 
         m_containsBackreferences = false;
@@ -415,25 +414,6 @@ struct YarrPattern {
         m_disjunctions.clear();
         m_userCharacterClasses.clear();
         m_captureGroupNames.shrink(0);
-        m_namedForwardReferences.shrink(0);
-    }
-
-    bool containsIllegalBackReference()
-    {
-        return m_maxBackReference > m_numSubpatterns;
-    }
-    
-    bool containsIllegalNamedForwardReferences()
-    {
-        if (m_namedForwardReferences.isEmpty())
-            return false;
-
-        for (auto& entry : m_namedForwardReferences) {
-            if (!m_captureGroupNames.contains(entry))
-                return true;
-        }
-
-        return false;
     }
 
     bool containsUnsignedLengthPattern()
@@ -555,13 +535,11 @@ struct YarrPattern {
     bool m_saveInitialStartValue : 1;
     OptionSet<Flags> m_flags;
     unsigned m_numSubpatterns { 0 };
-    unsigned m_maxBackReference { 0 };
     unsigned m_initialStartValueFrameLocation { 0 };
     PatternDisjunction* m_body;
     Vector<std::unique_ptr<PatternDisjunction>, 4> m_disjunctions;
     Vector<std::unique_ptr<CharacterClass>> m_userCharacterClasses;
     Vector<String> m_captureGroupNames;
-    Vector<String> m_namedForwardReferences;
     HashMap<String, unsigned> m_namedGroupToParenIndex;
 
 private:
index 86b0e67..3c7a5dc 100644 (file)
@@ -49,10 +49,10 @@ public:
     void atomParenthesesEnd() {}
     void atomBackReference(unsigned) {}
     void atomNamedBackReference(const String&) {}
-    bool isValidNamedForwardReference(const String&) { return true; }
     void atomNamedForwardReference(const String&) {}
     void quantifyAtom(unsigned, unsigned, bool) {}
     void disjunction() {}
+    void resetForReparsing() {}
 };
 
 ErrorCode checkSyntax(const String& pattern, const String& flags)
index 9992d16..3a901c3 100644 (file)
@@ -1,3 +1,21 @@
+2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
+
+        Invalid numeric and named references should be early syntax errors
+        https://bugs.webkit.org/show_bug.cgi?id=178175
+
+        Reviewed by Ross Kirsling.
+
+        Accounts for changes of YarrParser's Delegate interface, no behavioral changes.
+        resetForReparsing() is never called because we disable numeric backrefences
+        and named forward references (see arguments of Yarr::parse() call).
+
+        Test: TestWebKitAPI.ContentExtensionTest.ParsingFailures
+
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::PatternParser::resetForReparsing):
+        (WebCore::ContentExtensions::URLFilterParser::addPattern):
+        (WebCore::ContentExtensions::PatternParser::isValidNamedForwardReference): Deleted.
+
 2020-03-25  Pinki Gyanchandani  <pgyanchandani@apple.com>
 
         CanvasRenderingContext2D.putImageData() should not process neutered ImageData
index f0420d5..ef50cbb 100644 (file)
@@ -134,11 +134,6 @@ public:
         fail(URLFilterParser::BackReference);
     }
 
-    bool isValidNamedForwardReference(const String&)
-    {
-        return false;
-    }
-
     void atomNamedForwardReference(const String&)
     {
         fail(URLFilterParser::ForwardReference);
@@ -249,6 +244,11 @@ public:
         fail(URLFilterParser::Disjunction);
     }
 
+    NO_RETURN_DUE_TO_CRASH void resetForReparsing()
+    {
+        RELEASE_ASSERT_NOT_REACHED();
+    }
+
 private:
     bool hasError() const
     {
@@ -358,7 +358,7 @@ URLFilterParser::ParseStatus URLFilterParser::addPattern(const String& pattern,
 
     ParseStatus status = Ok;
     PatternParser patternParser(patternIsCaseSensitive);
-    if (!JSC::Yarr::hasError(JSC::Yarr::parse(patternParser, pattern, false, 0)))
+    if (!JSC::Yarr::hasError(JSC::Yarr::parse(patternParser, pattern, false, 0, false)))
         patternParser.finalize(patternId, m_combinedURLFilters);
     else
         status = YarrError;
index 08ab06a..12e14d4 100644 (file)
@@ -1,3 +1,15 @@
+2020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
+
+        Invalid numeric and named references should be early syntax errors
+        https://bugs.webkit.org/show_bug.cgi?id=178175
+
+        Reviewed by Ross Kirsling.
+
+        Removes FIXME as YarrParser is correct not to throw errors as it is
+        parsing in non-Unicode mode. Also adds a few named groups tests.
+
+        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+
 2020-03-25  Zhifei Fang  <zhifei_fang@apple.com>
 
         [Timeline] A better default get label function, which fit the assumpation the label is always a string
index 8e137a8..c35028e 100644 (file)
@@ -1760,7 +1760,6 @@ TEST_F(ContentExtensionTest, ParsingFailures)
     testPatternStatus("[", ContentExtensions::URLFilterParser::ParseStatus::YarrError);
     testPatternStatus("[a}", ContentExtensions::URLFilterParser::ParseStatus::YarrError);
     
-    // FIXME: Look into why these do not cause YARR parsing errors.  They probably should.
     testPatternStatus("a]", ContentExtensions::URLFilterParser::ParseStatus::Ok);
     testPatternStatus("{", ContentExtensions::URLFilterParser::ParseStatus::Ok);
     testPatternStatus("{[a]", ContentExtensions::URLFilterParser::ParseStatus::Ok);
@@ -1789,10 +1788,12 @@ TEST_F(ContentExtensionTest, ParsingFailures)
     
     testPatternStatus("(a)\\1", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Back references are disabled, it parse as octal 1
     testPatternStatus("(<A>a)\\k<A>", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Named back references aren't handled, it parse as "k<A>"
+    testPatternStatus("(?<A>a)\\k<A>", ContentExtensions::URLFilterParser::ParseStatus::BackReference);
     testPatternStatus("\\1(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as octal 1
     testPatternStatus("\\8(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as '8'
     testPatternStatus("\\9(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as '9'
     testPatternStatus("\\k<A>(<A>a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Named forward references aren't handled, it parse as "k<A>"
+    testPatternStatus("\\k<A>(?<A>a)", ContentExtensions::URLFilterParser::ParseStatus::YarrError);
     testPatternStatus("\\k<A>(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Unmatched named forward references aren't handled, it parse as "k<A>"
 }