JavaScriptCore:
authordarin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 3 Jan 2008 06:39:50 +0000 (06:39 +0000)
committerdarin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 3 Jan 2008 06:39:50 +0000 (06:39 +0000)
        Reviewed by Geoff.

        - fix http://bugs.webkit.org/show_bug.cgi?id=16696
          JSCRE fails fails to match Acid3 regexp

        Test: fast/regex/early-acid3-86.html

        The problem was with the cutoff point between backreferences and octal
        escape sequences. We need to determine the cutoff point by counting the
        total number of capturing brackets, which requires an extra pass through
        the expression when compiling it.

        * pcre/pcre_compile.cpp:
        (CompileData::CompileData): Added numCapturingBrackets. Removed some
        unused fields.
        (compileBranch): Use numCapturingBrackets when calling checkEscape.
        (calculateCompiledPatternLength): Use numCapturingBrackets when calling
        checkEscape, and also store the bracket count at the end of the compile.
        (jsRegExpCompile): Call calculateCompiledPatternLength twice -- once to
        count the number of brackets and then a second time to calculate the length.

LayoutTests:

        Reviewed by Geoff.

        - test for http://bugs.webkit.org/show_bug.cgi?id=16696
          JSCRE fails fails to match Acid3 regexp

        * fast/regex/early-acid3-86-expected.txt: Added.
        * fast/regex/early-acid3-86.html: Added.
        * fast/regex/resources/early-acid3-86.js: Added.

        * fast/regex/test1-expected.txt: Updated for a few cases where we now fail.
        But these "failures" represent us replacing PCRE behavior with semantics
        that are correct for JavaScript regular expressions.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@29110 268f45cc-cd09-0410-ab3c-d52691b4dbfc

JavaScriptCore/ChangeLog
JavaScriptCore/pcre/pcre_compile.cpp
LayoutTests/ChangeLog
LayoutTests/fast/regex/early-acid3-86-expected.txt [new file with mode: 0644]
LayoutTests/fast/regex/early-acid3-86.html [new file with mode: 0644]
LayoutTests/fast/regex/resources/early-acid3-86.js [new file with mode: 0644]
LayoutTests/fast/regex/test1-expected.txt

index c4d82f0edc0463913f6c9058607184e769b6c9db..c31a28de6d56667598379ee7f875aa6a5469068b 100644 (file)
@@ -1,3 +1,49 @@
+2008-01-02  Darin Adler  <darin@apple.com>
+
+        Reviewed by Geoff.
+
+        - fix http://bugs.webkit.org/show_bug.cgi?id=16696
+          JSCRE fails fails to match Acid3 regexp
+
+        Test: fast/regex/early-acid3-86.html
+
+        The problem was with the cutoff point between backreferences and octal
+        escape sequences. We need to determine the cutoff point by counting the
+        total number of capturing brackets, which requires an extra pass through
+        the expression when compiling it.
+
+        * pcre/pcre_compile.cpp:
+        (CompileData::CompileData): Added numCapturingBrackets. Removed some
+        unused fields.
+        (compileBranch): Use numCapturingBrackets when calling checkEscape.
+        (calculateCompiledPatternLength): Use numCapturingBrackets when calling
+        checkEscape, and also store the bracket count at the end of the compile.
+        (jsRegExpCompile): Call calculateCompiledPatternLength twice -- once to
+        count the number of brackets and then a second time to calculate the length.
+
+2008-01-02  Darin Adler  <darin@apple.com>
+
+        Reviewed by Geoff.
+
+        - fix http://bugs.webkit.org/show_bug.cgi?id=16696
+          JSCRE fails fails to match Acid3 regexp
+
+        Test: fast/regex/early-acid3-86.html
+
+        The problem was with the cutoff point between backreferences and octal
+        escape sequences. We need to determine the cutoff point by counting the
+        total number of capturing brackets, which requires an extra pass through
+        the expression when compiling it.
+
+        * pcre/pcre_compile.cpp:
+        (CompileData::CompileData): Added numCapturingBrackets. Removed some
+        unused fields.
+        (compileBranch): Use numCapturingBrackets when calling checkEscape.
+        (calculateCompiledPatternLength): Use numCapturingBrackets when calling
+        checkEscape, and also store the bracket count at the end of the compile.
+        (jsRegExpCompile): Call calculateCompiledPatternLength twice -- once to
+        count the number of brackets and then a second time to calculate the length.
+
 2008-01-02  David Kilzer <ddkilzer@webkit.org>
 
         Reviewed and landed by Darin.
index b2886762c633e8468068250e5d17acf11075a83a..fe027b1e06e0126afc6923f9e429ff7f29222963 100644 (file)
@@ -135,19 +135,17 @@ doing the compiling. */
 
 struct CompileData {
     CompileData() {
-        start_code = 0;
-        start_pattern = 0;
         top_backref = 0;
         backrefMap = 0;
         req_varyopt = 0;
         needOuterBracket = false;
+        numCapturingBrackets = 0;
     }
-    const unsigned char* start_code;   /* The start of the compiled code */
-    const UChar* start_pattern; /* The start of the pattern */
     int top_backref;            /* Maximum back reference */
     unsigned backrefMap;       /* Bitmap of low back refs */
     int req_varyopt;            /* "After variable item" flag for reqbyte */
     bool needOuterBracket;
+    int numCapturingBrackets;
 };
 
 /* Definitions to allow mutual recursion */
@@ -729,7 +727,7 @@ compileBranch(int options, int* brackets, unsigned char** codeptr,
                      character in them, so set class_charcount bigger than one. */
                     
                     if (c == '\\') {
-                        c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
+                        c = checkEscape(&ptr, patternEnd, errorcodeptr, cd.numCapturingBrackets, true);
                         if (c < 0) {
                             class_charcount += 2;     /* Greater than 1 is what matters */
                             switch (-c) {
@@ -796,7 +794,7 @@ compileBranch(int options, int* brackets, unsigned char** codeptr,
                         
                         if (d == '\\') {
                             const UChar* oldptr = ptr;
-                            d = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, true);
+                            d = checkEscape(&ptr, patternEnd, errorcodeptr, cd.numCapturingBrackets, true);
                             
                             /* \X is literal X; any other special means the '-' was literal */
                             if (d < 0) {
@@ -1564,7 +1562,7 @@ compileBranch(int options, int* brackets, unsigned char** codeptr,
                 
             case '\\':
                 tempptr = ptr;
-                c = checkEscape(&ptr, patternEnd, errorcodeptr, *brackets, false);
+                c = checkEscape(&ptr, patternEnd, errorcodeptr, cd.numCapturingBrackets, false);
                 
                 /* Handle metacharacters introduced by \. For ones like \d, the ESC_ values
                  are arranged to be the negation of the corresponding OP_values. For the
@@ -1987,15 +1985,12 @@ static int bracketFindFirstAssertedCharacter(const unsigned char* code, bool ina
     return c;
 }
 
-static int calculateCompiledPatternLengthAndFlags(const UChar* pattern, int patternLength, JSRegExpIgnoreCaseOption ignoreCase,
+static int calculateCompiledPatternLength(const UChar* pattern, int patternLength, JSRegExpIgnoreCaseOption ignoreCase,
     CompileData& cd, ErrorCode& errorcode)
 {
     /* Make a pass over the pattern to compute the
      amount of store required to hold the compiled code. This does not have to be
-     perfect as long as errors are overestimates. At the same time we can detect any
-     flag settings right at the start, and extract them. Make an attempt to correct
-     for any counted white space if an "extended" flag setting appears late in the
-     pattern. We can't be so clever for #-comments. */
+     perfect as long as errors are overestimates. */
     
     int length = 1 + LINK_SIZE;      /* For initial BRA plus length */
     int branch_extra = 0;
@@ -2017,7 +2012,7 @@ static int calculateCompiledPatternLengthAndFlags(const UChar* pattern, int patt
              character type. */
 
             case '\\':
-                c = checkEscape(&ptr, patternEnd, &errorcode, bracount, false);
+                c = checkEscape(&ptr, patternEnd, &errorcode, cd.numCapturingBrackets, false);
                 if (errorcode != 0)
                     return -1;
                 
@@ -2150,7 +2145,7 @@ static int calculateCompiledPatternLengthAndFlags(const UChar* pattern, int patt
                     /* Check for escapes */
                     
                     if (*ptr == '\\') {
-                        c = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
+                        c = checkEscape(&ptr, patternEnd, &errorcode, cd.numCapturingBrackets, true);
                         if (errorcode != 0)
                             return -1;
                         
@@ -2185,7 +2180,7 @@ static int calculateCompiledPatternLengthAndFlags(const UChar* pattern, int patt
                             UChar const *hyptr = ptr++;
                             if (safelyCheckNextChar(ptr, patternEnd, '\\')) {
                                 ptr++;
-                                d = checkEscape(&ptr, patternEnd, &errorcode, bracount, true);
+                                d = checkEscape(&ptr, patternEnd, &errorcode, cd.numCapturingBrackets, true);
                                 if (errorcode != 0)
                                     return -1;
                             }
@@ -2468,6 +2463,8 @@ static int calculateCompiledPatternLengthAndFlags(const UChar* pattern, int patt
     }
     
     length += 2 + LINK_SIZE;    /* For final KET and END */
+
+    cd.numCapturingBrackets = bracount;
     return length;
 }
 
@@ -2512,7 +2509,10 @@ JSRegExp* jsRegExpCompile(const UChar* pattern, int patternLength,
     CompileData cd;
     
     ErrorCode errorcode = ERR0;
-    int length = calculateCompiledPatternLengthAndFlags(pattern, patternLength, ignoreCase, cd, errorcode);
+    /* Call this once just to count the brackets. */
+    calculateCompiledPatternLength(pattern, patternLength, ignoreCase, cd, errorcode);
+    /* Call it again to compute the length. */
+    int length = calculateCompiledPatternLength(pattern, patternLength, ignoreCase, cd, errorcode);
     if (errorcode)
         return returnError(errorcode, errorptr);
     
@@ -2531,8 +2531,6 @@ JSRegExp* jsRegExpCompile(const UChar* pattern, int patternLength,
      passed around in the compile data block. */
     
     const unsigned char* codeStart = (const unsigned char*)(re + 1);
-    cd.start_code = codeStart;
-    cd.start_pattern = (const UChar*)pattern;
     
     /* Set up a starting, non-extracting bracket, then compile the expression. On
      error, errorcode will be set non-zero, so we don't need to look at the result
index e77f1712e0f17cd6291681ad957da365b24d9efe..2038b3e445ae6c0202bd6e07eb9f8fed7f4b2e45 100644 (file)
@@ -1,8 +1,38 @@
+2008-01-02  Darin Adler  <darin@apple.com>
+
+        Reviewed by Geoff.
+
+        - test for http://bugs.webkit.org/show_bug.cgi?id=16696
+          JSCRE fails fails to match Acid3 regexp
+
+        * fast/regex/early-acid3-86-expected.txt: Added.
+        * fast/regex/early-acid3-86.html: Added.
+        * fast/regex/resources/early-acid3-86.js: Added.
+
+        * fast/regex/test1-expected.txt: Updated for a few cases where we now fail.
+        But these "failures" represent us replacing PCRE behavior with semantics
+        that are correct for JavaScript regular expressions.
+
 2008-01-02  Alice Liu  <alice.liu@apple.com>
 
         * platform/win/Skipped: removing a fixed test
         this was fixed with r28836 but was left behind on the skipped list
 
+2008-01-02  Darin Adler  <darin@apple.com>
+
+        Reviewed by Geoff.
+
+        - test for http://bugs.webkit.org/show_bug.cgi?id=16696
+          JSCRE fails fails to match Acid3 regexp
+
+        * fast/regex/early-acid3-86-expected.txt: Added.
+        * fast/regex/early-acid3-86.html: Added.
+        * fast/regex/resources/early-acid3-86.js: Added.
+
+        * fast/regex/test1-expected.txt: Updated for a few cases where we now fail.
+        But these "failures" represent us replacing PCRE behavior with semantics
+        that are correct for JavaScript regular expressions.
+
 2008-01-02  Darin Adler  <darin@apple.com>
 
         Reviewed by Maciej.
diff --git a/LayoutTests/fast/regex/early-acid3-86-expected.txt b/LayoutTests/fast/regex/early-acid3-86-expected.txt
new file mode 100644 (file)
index 0000000..01793b4
--- /dev/null
@@ -0,0 +1,12 @@
+Test that covers capturing brackets, and was adapted from a part of an early version of Acid3.
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS /TA[])]/.exec('TA]') threw exception SyntaxError: Invalid regular expression: unmatched parentheses.
+PASS /[]/.exec('') is null
+PASS /(\3)(\1)(a)/.exec('cat').toString() is 'a,,,a'
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/fast/regex/early-acid3-86.html b/LayoutTests/fast/regex/early-acid3-86.html
new file mode 100644 (file)
index 0000000..0966290
--- /dev/null
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<link rel="stylesheet" href="../js/resources/js-test-style.css">
+<script src="../js/resources/js-test-pre.js"></script>
+</head>
+<body>
+<p id="description"></p>
+<div id="console"></div>
+<script src="resources/early-acid3-86.js"></script>
+<script src="../js/resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/fast/regex/resources/early-acid3-86.js b/LayoutTests/fast/regex/resources/early-acid3-86.js
new file mode 100644 (file)
index 0000000..bc4c52f
--- /dev/null
@@ -0,0 +1,12 @@
+description(
+'Test that covers capturing brackets, and was adapted from a part of an early version of Acid3.'
+);
+
+// JS regexps aren't like Perl regexps, if their character
+// classes start with a ] that means they're empty. So this
+// is a syntax error; if we get here it's a bug.
+shouldThrow("/TA[])]/.exec('TA]')");
+shouldBe("/[]/.exec('')", "null");
+shouldBe("/(\\3)(\\1)(a)/.exec('cat').toString()", "'a,,,a'");
+
+var successfullyParsed = true;
index 97e778f8d6d7783e8bb192e11b92a64fbc45c310..fd2448b2b11aeec5a95130458784b2821adf9bd1 100644 (file)
@@ -1785,10 +1785,10 @@ FAILED TO COMPILE
     ababbbcbc: PASS
 
 /((\3|b)\2(a)x)+/
-    aaaxabaxbaaxbbax: PASS
+    aaaxabaxbaaxbbax: FAIL. Actual results: "ax,ax,,a"
 
 /((\3|b)\2(a)){2,}/
-    bbaababbabaaaaabbaaaabba: FAIL. Actual results: "null"
+    bbaababbabaaaaabbaaaabba: FAIL. Actual results: "abba,bba,b,a"
 
 /abc/i
     ABC: PASS