Bug 48101 - Yarr gives different results for /(?:a*?){2,}/
authorbarraclough@apple.com <barraclough@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Nov 2010 00:45:16 +0000 (00:45 +0000)
committerbarraclough@apple.com <barraclough@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Nov 2010 00:45:16 +0000 (00:45 +0000)
Reviewed by Sam Weinig.

JavaScriptCore:

The test cases in the linked mozilla bug demostrate a couple of
problems in subpattern matching. These bugs lie in the optimized
cases - for matching parentheses with a quantity count of 1, and
for matching greedy quantified parentheses at the end of a regex
(which do not backtrack).

In both of these cases we are failing to correctly handle empty
matches. In the case of parenthese-single matches (quantity count
one) we are failing to test for empty matches at all. In the case
of terminal subpattern matches we do currenty check, however there
is a subtler bug here too. In the case of an empty match we will
presently immediately fall through to the next alternative (or
complete the regex match), whereas upon a failed match we should
be backtracking into the failing alternative, to give it a chance
to match further (e.g. consider /a??b?|a/.exec("ab") - upon first
attempting to match the first alternative this will match the empty
string - since a?? is non-greedy, however rather than moving on to
the second alternative we should be re-matching the first one, at
which point the non-greedy a?? will match, and as such the result
should be "ab", not "a").

Terminal sunpattern matching contains a second bug, too. The frame
location values in the subpattern should be being allocated with
the outer disjunction's frame (as we do for the parentheses-single
optimization). Consider the following three regexes:
    /a*(?:b*)*c*/
    /a*(?:b*)c*/
    /a*(?:b*)*/
Considering only the frame location required by the atoms a,b, and
c, (ignoring space associated with the nested subpattern) the first
regex (a normal subpattern match) requires a frame size of 2 for
the outer disjunction, (to backtrack terms a & c), with each
iteration of the subpattern requiring a frame of size 1 (in order
to backtrack b). In the case of the second regex (where the
parentheses-single optimization will kick in) the outer frame must
be set up with a frame size of 3, since the outer frame will also
be used when running the nested subpattern. We will currently only
allocate a farme of size 1 for the outer disjuntion (to contain a),
howver the frame size should be 2 (since the subpattern will be
evaluated in the outer frame). In addition to failing to allocate
frame space the frame offsets are also presently invalid - in the
case of the last regex b's frame location will be set assuming it
to be the first term in the frame, whereas in this case b lies
after the term a, and should be taking a separate frame location.

In order to correctly allocate the frame for terminal subpattern
matches we must move this optimization back up from the JIT into
the compiler (and thus interpreter too), since this is where the
frame allocation takes place.

* yarr/RegexCompiler.cpp:
(JSC::Yarr::RegexPatternConstructor::setupAlternativeOffsets):
(JSC::Yarr::RegexPatternConstructor::checkForTerminalParentheses):
(JSC::Yarr::compileRegex):
* yarr/RegexInterpreter.cpp:
(JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceBegin):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::Interpreter::matchParenthesesTerminalBegin):
(JSC::Yarr::Interpreter::matchParenthesesTerminalEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesTerminalBegin):
(JSC::Yarr::Interpreter::backtrackParenthesesTerminalEnd):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
* yarr/RegexInterpreter.h:
* yarr/RegexJIT.cpp:
(JSC::Yarr::RegexGenerator::generateParenthesesSingle):
(JSC::Yarr::RegexGenerator::generateParenthesesGreedyNoBacktrack):
(JSC::Yarr::RegexGenerator::generateTerm):
* yarr/RegexPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):

LayoutTests:

Add layout tests for corner cases of repeat matches in regular expressions,
and of the examples documented in the ECMA-262 spec .

* fast/regex/ecma-regex-examples-expected.txt: Added.
* fast/regex/ecma-regex-examples.html: Added.
* fast/regex/repeat-match-waldemar-expected.txt: Added.
* fast/regex/repeat-match-waldemar.html: Added.
* fast/regex/script-tests/ecma-regex-examples.js: Added.
* fast/regex/script-tests/repeat-match-waldemar.js: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@72781 268f45cc-cd09-0410-ab3c-d52691b4dbfc

14 files changed:
JavaScriptCore/ChangeLog
JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
JavaScriptCore/yarr/RegexCompiler.cpp
JavaScriptCore/yarr/RegexInterpreter.cpp
JavaScriptCore/yarr/RegexInterpreter.h
JavaScriptCore/yarr/RegexJIT.cpp
JavaScriptCore/yarr/RegexPattern.h
LayoutTests/ChangeLog
LayoutTests/fast/regex/ecma-regex-examples-expected.txt [new file with mode: 0644]
LayoutTests/fast/regex/ecma-regex-examples.html [new file with mode: 0644]
LayoutTests/fast/regex/repeat-match-waldemar-expected.txt [new file with mode: 0644]
LayoutTests/fast/regex/repeat-match-waldemar.html [new file with mode: 0644]
LayoutTests/fast/regex/script-tests/ecma-regex-examples.js [new file with mode: 0755]
LayoutTests/fast/regex/script-tests/repeat-match-waldemar.js [new file with mode: 0755]

index 79fdedf..f5176f2 100644 (file)
@@ -1,3 +1,89 @@
+2010-11-27  Gavin Barraclough  <barraclough@apple.com>
+
+        Reviewed by Sam Weinig.
+
+        Bug 48101 - Yarr gives different results for /(?:a*?){2,}/
+
+        The test cases in the linked mozilla bug demostrate a couple of
+        problems in subpattern matching. These bugs lie in the optimized
+        cases - for matching parentheses with a quantity count of 1, and
+        for matching greedy quantified parentheses at the end of a regex
+        (which do not backtrack).
+
+        In both of these cases we are failing to correctly handle empty
+        matches. In the case of parenthese-single matches (quantity count
+        one) we are failing to test for empty matches at all. In the case
+        of terminal subpattern matches we do currenty check, however there
+        is a subtler bug here too. In the case of an empty match we will
+        presently immediately fall through to the next alternative (or
+        complete the regex match), whereas upon a failed match we should
+        be backtracking into the failing alternative, to give it a chance
+        to match further (e.g. consider /a??b?|a/.exec("ab") - upon first
+        attempting to match the first alternative this will match the empty
+        string - since a?? is non-greedy, however rather than moving on to
+        the second alternative we should be re-matching the first one, at
+        which point the non-greedy a?? will match, and as such the result
+        should be "ab", not "a").
+
+        Terminal sunpattern matching contains a second bug, too. The frame
+        location values in the subpattern should be being allocated with
+        the outer disjunction's frame (as we do for the parentheses-single
+        optimization). Consider the following three regexes:
+            /a*(?:b*)*c*/
+            /a*(?:b*)c*/
+            /a*(?:b*)*/
+        Considering only the frame location required by the atoms a,b, and
+        c, (ignoring space associated with the nested subpattern) the first
+        regex (a normal subpattern match) requires a frame size of 2 for
+        the outer disjunction, (to backtrack terms a & c), with each
+        iteration of the subpattern requiring a frame of size 1 (in order
+        to backtrack b). In the case of the second regex (where the
+        parentheses-single optimization will kick in) the outer frame must
+        be set up with a frame size of 3, since the outer frame will also
+        be used when running the nested subpattern. We will currently only
+        allocate a farme of size 1 for the outer disjuntion (to contain a),
+        howver the frame size should be 2 (since the subpattern will be
+        evaluated in the outer frame). In addition to failing to allocate
+        frame space the frame offsets are also presently invalid - in the
+        case of the last regex b's frame location will be set assuming it
+        to be the first term in the frame, whereas in this case b lies
+        after the term a, and should be taking a separate frame location.
+
+        In order to correctly allocate the frame for terminal subpattern
+        matches we must move this optimization back up from the JIT into
+        the compiler (and thus interpreter too), since this is where the
+        frame allocation takes place.
+
+        * yarr/RegexCompiler.cpp:
+        (JSC::Yarr::RegexPatternConstructor::setupAlternativeOffsets):
+        (JSC::Yarr::RegexPatternConstructor::checkForTerminalParentheses):
+        (JSC::Yarr::compileRegex):
+        * yarr/RegexInterpreter.cpp:
+        (JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
+        (JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
+        (JSC::Yarr::Interpreter::backtrackParenthesesOnceBegin):
+        (JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
+        (JSC::Yarr::Interpreter::matchParenthesesTerminalBegin):
+        (JSC::Yarr::Interpreter::matchParenthesesTerminalEnd):
+        (JSC::Yarr::Interpreter::backtrackParenthesesTerminalBegin):
+        (JSC::Yarr::Interpreter::backtrackParenthesesTerminalEnd):
+        (JSC::Yarr::Interpreter::matchDisjunction):
+        (JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
+        (JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
+        (JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
+        (JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
+        (JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
+        (JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
+        (JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
+        (JSC::Yarr::ByteCompiler::emitDisjunction):
+        * yarr/RegexInterpreter.h:
+        * yarr/RegexJIT.cpp:
+        (JSC::Yarr::RegexGenerator::generateParenthesesSingle):
+        (JSC::Yarr::RegexGenerator::generateParenthesesGreedyNoBacktrack):
+        (JSC::Yarr::RegexGenerator::generateTerm):
+        * yarr/RegexPattern.h:
+        (JSC::Yarr::PatternTerm::PatternTerm):
+
 2010-11-24  Patrick Gansterer  <paroga@webkit.org>
 
         Reviewed by Csaba Osztrogon√°c.
index b40d74e..a1e1c5c 100644 (file)
                        isa = PBXProject;
                        buildConfigurationList = 149C277108902AFE008A9EFC /* Build configuration list for PBXProject "JavaScriptCore" */;
                        compatibilityVersion = "Xcode 2.4";
-                       developmentRegion = English;
                        hasScannedForEncodings = 1;
                        knownRegions = (
                                English,
index 2202dbb..06ecbad 100644 (file)
@@ -667,14 +667,17 @@ public:
             case PatternTerm::TypeParenthesesSubpattern:
                 // Note: for fixed once parentheses we will ensure at least the minimum is available; others are on their own.
                 term.frameLocation = currentCallFrameSize;
-                if ((term.quantityCount == 1) && !term.parentheses.isCopy) {
-                    if (term.quantityType == QuantifierFixedCount) {
-                        currentCallFrameSize = setupDisjunctionOffsets(term.parentheses.disjunction, currentCallFrameSize, currentInputPosition);
-                        currentInputPosition += term.parentheses.disjunction->m_minimumSize;
-                    } else {
+                if (term.quantityCount == 1 && !term.parentheses.isCopy) {
+                    if (term.quantityType != QuantifierFixedCount)
                         currentCallFrameSize += RegexStackSpaceForBackTrackInfoParenthesesOnce;
-                        currentCallFrameSize = setupDisjunctionOffsets(term.parentheses.disjunction, currentCallFrameSize, currentInputPosition);
-                    }
+                    currentCallFrameSize = setupDisjunctionOffsets(term.parentheses.disjunction, currentCallFrameSize, currentInputPosition);
+                    // If quantity is fixed, then pre-check its minimum size.
+                    if (term.quantityType == QuantifierFixedCount)
+                        currentInputPosition += term.parentheses.disjunction->m_minimumSize;
+                    term.inputPosition = currentInputPosition;
+                } else if (term.parentheses.isTerminal) {
+                    currentCallFrameSize += RegexStackSpaceForBackTrackInfoParenthesesTerminal;
+                    currentCallFrameSize = setupDisjunctionOffsets(term.parentheses.disjunction, currentCallFrameSize, currentInputPosition);
                     term.inputPosition = currentInputPosition;
                 } else {
                     term.inputPosition = currentInputPosition;
@@ -728,6 +731,33 @@ public:
         setupDisjunctionOffsets(m_pattern.m_body, 0, 0);
     }
 
+    // This optimization identifies sets of parentheses that we will never need to backtrack.
+    // In these cases we do not need to store state from prior iterations.
+    // We can presently avoid backtracking for:
+    //   * a set of parens at the end of the regular expression (last term in any of the alternatives of the main body disjunction).
+    //   * where the parens are non-capturing, and quantified unbounded greedy (*).
+    //   * where the parens do not contain any capturing subpatterns.
+    void checkForTerminalParentheses()
+    {
+        // This check is much too crude; should be just checking whether the candidate
+        // node contains nested capturing subpatterns, not the whole expression!
+        if (m_pattern.m_numSubpatterns)
+            return;
+
+        Vector<PatternAlternative*>& alternatives = m_pattern.m_body->m_alternatives;
+        for (unsigned i =0; i < alternatives.size(); ++i) {
+            Vector<PatternTerm>& terms = alternatives[i]->m_terms;
+            if (terms.size()) {
+                PatternTerm& term = terms.last();
+                if (term.type == PatternTerm::TypeParenthesesSubpattern
+                    && term.quantityType == QuantifierGreedy
+                    && term.quantityCount == UINT_MAX
+                    && !term.capture())
+                    term.parentheses.isTerminal = true;
+            }
+        }
+    }
+
     void optimizeBOL()
     {
         // Look for expressions containing beginning of line (^) anchoring and unroll them.
@@ -930,6 +960,7 @@ const char* compileRegex(const UString& patternString, RegexPattern& pattern)
         ASSERT(numSubpatterns == pattern.m_numSubpatterns);
     }
 
+    constructor.checkForTerminalParentheses();
     constructor.optimizeBOL();
         
     constructor.setupOffsets();
index 6d770a5..164158e 100644 (file)
@@ -60,7 +60,10 @@ public:
         uintptr_t begin;
     };
     struct BackTrackInfoParenthesesOnce {
-        uintptr_t inParentheses;
+        uintptr_t begin;
+    };
+    struct BackTrackInfoParenthesesTerminal {
+        uintptr_t begin;
     };
     struct BackTrackInfoParentheses {
         uintptr_t matchAmount;
@@ -631,11 +634,11 @@ public:
         switch (term.atom.quantityType) {
         case QuantifierGreedy: {
             // set this speculatively; if we get to the parens end this will be true.
-            backTrack->inParentheses = 1;
+            backTrack->begin = input.getPos();
             break;
         }
         case QuantifierNonGreedy: {
-            backTrack->inParentheses = 0;
+            backTrack->begin = notFound;
             context->term += term.atom.parenthesesWidth;
             return true;
         }
@@ -651,7 +654,7 @@ public:
         return true;
     }
 
-    bool matchParenthesesOnceEnd(ByteTerm& term, DisjunctionContext*)
+    bool matchParenthesesOnceEnd(ByteTerm& term, DisjunctionContext* context)
     {
         ASSERT(term.type == ByteTerm::TypeParenthesesSubpatternOnceEnd);
         ASSERT(term.atom.quantityCount == 1);
@@ -660,7 +663,12 @@ public:
             unsigned subpatternId = term.atom.subpatternId;
             output[(subpatternId << 1) + 1] = input.getPos() + term.inputPosition;
         }
-        return true;
+
+        if (term.atom.quantityType == QuantifierFixedCount)
+            return true;
+
+        BackTrackInfoParenthesesOnce* backTrack = reinterpret_cast<BackTrackInfoParenthesesOnce*>(context->frame + term.frameLocation);
+        return backTrack->begin != input.getPos();
     }
 
     bool backtrackParenthesesOnceBegin(ByteTerm& term, DisjunctionContext* context)
@@ -679,12 +687,12 @@ public:
         switch (term.atom.quantityType) {
         case QuantifierGreedy:
             // if we backtrack to this point, there is another chance - try matching nothing.
-            ASSERT(backTrack->inParentheses);
-            backTrack->inParentheses = 0;
+            ASSERT(backTrack->begin != notFound);
+            backTrack->begin = notFound;
             context->term += term.atom.parenthesesWidth;
             return true;
         case QuantifierNonGreedy:
-            ASSERT(backTrack->inParentheses);
+            ASSERT(backTrack->begin != notFound);
         case QuantifierFixedCount:
             break;
         }
@@ -701,15 +709,19 @@ public:
 
         switch (term.atom.quantityType) {
         case QuantifierGreedy:
-            if (!backTrack->inParentheses) {
+            if (backTrack->begin == notFound) {
                 context->term -= term.atom.parenthesesWidth;
                 return false;
             }
         case QuantifierNonGreedy:
-            if (!backTrack->inParentheses) {
-                // now try to match the parens; set this speculatively.
-                backTrack->inParentheses = 1;
+            if (backTrack->begin == notFound) {
+                backTrack->begin = input.getPos();
                 if (term.capture()) {
+                    // Technically this access to inputPosition should be accessing the begin term's
+                    // inputPosition, but for repeats other than fixed these values should be
+                    // the same anyway! (we don't pre-check for greedy or non-greedy matches.)
+                    ASSERT((&term - term.atom.parenthesesWidth)->type == ByteTerm::TypeParenthesesSubpatternOnceBegin);
+                    ASSERT((&term - term.atom.parenthesesWidth)->inputPosition == term.inputPosition);
                     unsigned subpatternId = term.atom.subpatternId;
                     output[subpatternId << 1] = input.getPos() + term.inputPosition;
                 }
@@ -723,6 +735,53 @@ public:
         return false;
     }
 
+    bool matchParenthesesTerminalBegin(ByteTerm& term, DisjunctionContext* context)
+    {
+        ASSERT(term.type == ByteTerm::TypeParenthesesSubpatternTerminalBegin);
+        ASSERT(term.atom.quantityType == QuantifierGreedy);
+        ASSERT(term.atom.quantityCount == UINT_MAX);
+        ASSERT(!term.capture());
+
+        BackTrackInfoParenthesesTerminal* backTrack = reinterpret_cast<BackTrackInfoParenthesesTerminal*>(context->frame + term.frameLocation);
+        backTrack->begin = input.getPos();
+        return true;
+    }
+
+    bool matchParenthesesTerminalEnd(ByteTerm& term, DisjunctionContext* context)
+    {
+        ASSERT(term.type == ByteTerm::TypeParenthesesSubpatternTerminalEnd);
+
+        BackTrackInfoParenthesesTerminal* backTrack = reinterpret_cast<BackTrackInfoParenthesesTerminal*>(context->frame + term.frameLocation);
+        // Empty match is a failed match.
+        if (backTrack->begin == input.getPos())
+            return false;
+
+        // Successful match! Okay, what's next? - loop around and try to match moar!
+        context->term -= (term.atom.parenthesesWidth + 1);
+        return true;
+    }
+
+    bool backtrackParenthesesTerminalBegin(ByteTerm& term, DisjunctionContext* context)
+    {
+        ASSERT(term.type == ByteTerm::TypeParenthesesSubpatternTerminalBegin);
+        ASSERT(term.atom.quantityType == QuantifierGreedy);
+        ASSERT(term.atom.quantityCount == UINT_MAX);
+        ASSERT(!term.capture());
+
+        // If we backtrack to this point, we have failed to match this iteration of the parens.
+        // Since this is greedy / zero minimum a failed is also accepted as a match!
+        context->term += term.atom.parenthesesWidth;
+        return true;
+    }
+
+    bool backtrackParenthesesTerminalEnd(ByteTerm&, DisjunctionContext*)
+    {
+        // 'Terminal' parentheses are at the end of the regex, and as such a match past end
+        // should always be returned as a successful match - we should never becktrack to here.
+        ASSERT_NOT_REACHED();
+        return false;
+    }
+
     bool matchParentheticalAssertionBegin(ByteTerm& term, DisjunctionContext* context)
     {
         ASSERT(term.type == ByteTerm::TypeParentheticalAssertionBegin);
@@ -1171,6 +1230,14 @@ public:
             if (matchParenthesesOnceEnd(currentTerm(), context))
                 MATCH_NEXT();
             BACKTRACK();
+        case ByteTerm::TypeParenthesesSubpatternTerminalBegin:
+            if (matchParenthesesTerminalBegin(currentTerm(), context))
+                MATCH_NEXT();
+            BACKTRACK();
+        case ByteTerm::TypeParenthesesSubpatternTerminalEnd:
+            if (matchParenthesesTerminalEnd(currentTerm(), context))
+                MATCH_NEXT();
+            BACKTRACK();
         case ByteTerm::TypeParentheticalAssertionBegin:
             if (matchParentheticalAssertionBegin(currentTerm(), context))
                 MATCH_NEXT();
@@ -1284,6 +1351,14 @@ public:
                 if (backtrackParenthesesOnceEnd(currentTerm(), context))
                     MATCH_NEXT();
                 BACKTRACK();
+            case ByteTerm::TypeParenthesesSubpatternTerminalBegin:
+                if (backtrackParenthesesTerminalBegin(currentTerm(), context))
+                    MATCH_NEXT();
+                BACKTRACK();
+            case ByteTerm::TypeParenthesesSubpatternTerminalEnd:
+                if (backtrackParenthesesTerminalEnd(currentTerm(), context))
+                    MATCH_NEXT();
+                BACKTRACK();
             case ByteTerm::TypeParentheticalAssertionBegin:
                 if (backtrackParentheticalAssertionBegin(currentTerm(), context))
                     MATCH_NEXT();
@@ -1445,8 +1520,38 @@ public:
         m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
     }
 
+    void atomParenthesesOnceBegin(unsigned subpatternId, bool capture, int inputPosition, unsigned frameLocation, unsigned alternativeFrameLocation)
+    {
+        int beginTerm = m_bodyDisjunction->terms.size();
+
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternOnceBegin, subpatternId, capture, inputPosition));
+        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
+        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+
+        m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
+        m_currentAlternativeIndex = beginTerm + 1;
+    }
+
+    void atomParenthesesTerminalBegin(unsigned subpatternId, bool capture, int inputPosition, unsigned frameLocation, unsigned alternativeFrameLocation)
+    {
+        int beginTerm = m_bodyDisjunction->terms.size();
+
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternTerminalBegin, subpatternId, capture, inputPosition));
+        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
+        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+
+        m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
+        m_currentAlternativeIndex = beginTerm + 1;
+    }
+
     void atomParenthesesSubpatternBegin(unsigned subpatternId, bool capture, int inputPosition, unsigned frameLocation, unsigned alternativeFrameLocation)
     {
+        // Errrk! - this is a little crazy, we initially generate as a TypeParenthesesSubpatternOnceBegin,
+        // then fix this up at the end! - simplifying this should make it much clearer.
+        // https://bugs.webkit.org/show_bug.cgi?id=50136
+
         int beginTerm = m_bodyDisjunction->terms.size();
 
         m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternOnceBegin, subpatternId, capture, inputPosition));
@@ -1471,6 +1576,28 @@ public:
         m_currentAlternativeIndex = beginTerm + 1;
     }
 
+    void atomParentheticalAssertionEnd(int inputPosition, unsigned frameLocation, unsigned quantityCount, QuantifierType quantityType)
+    {
+        unsigned beginTerm = popParenthesesStack();
+        closeAlternative(beginTerm + 1);
+        unsigned endTerm = m_bodyDisjunction->terms.size();
+
+        ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeParentheticalAssertionBegin);
+
+        bool invertOrCapture = m_bodyDisjunction->terms[beginTerm].invertOrCapture;
+        unsigned subpatternId = m_bodyDisjunction->terms[beginTerm].atom.subpatternId;
+
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParentheticalAssertionEnd, subpatternId, invertOrCapture, inputPosition));
+        m_bodyDisjunction->terms[beginTerm].atom.parenthesesWidth = endTerm - beginTerm;
+        m_bodyDisjunction->terms[endTerm].atom.parenthesesWidth = endTerm - beginTerm;
+        m_bodyDisjunction->terms[endTerm].frameLocation = frameLocation;
+
+        m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
+        m_bodyDisjunction->terms[endTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[endTerm].atom.quantityType = quantityType;
+    }
+
     unsigned popParenthesesStack()
     {
         ASSERT(m_parenthesesStack.size());
@@ -1542,50 +1669,79 @@ public:
         m_bodyDisjunction->terms[endIndex].frameLocation = frameLocation;
     }
 
-    void atomParenthesesEnd(bool doInline, unsigned lastSubpatternId, int inputPosition, unsigned frameLocation, unsigned quantityCount, QuantifierType quantityType, unsigned callFrameSize = 0)
+    void atomParenthesesSubpatternEnd(unsigned lastSubpatternId, int inputPosition, unsigned frameLocation, unsigned quantityCount, QuantifierType quantityType, unsigned callFrameSize = 0)
+    {
+        unsigned beginTerm = popParenthesesStack();
+        closeAlternative(beginTerm + 1);
+        unsigned endTerm = m_bodyDisjunction->terms.size();
+
+        ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeParenthesesSubpatternOnceBegin);
+
+        ByteTerm& parenthesesBegin = m_bodyDisjunction->terms[beginTerm];
+
+        bool invertOrCapture = parenthesesBegin.invertOrCapture;
+        unsigned subpatternId = parenthesesBegin.atom.subpatternId;
+
+        unsigned numSubpatterns = lastSubpatternId - subpatternId + 1;
+        ByteDisjunction* parenthesesDisjunction = new ByteDisjunction(numSubpatterns, callFrameSize);
+
+        parenthesesDisjunction->terms.append(ByteTerm::SubpatternBegin());
+        for (unsigned termInParentheses = beginTerm + 1; termInParentheses < endTerm; ++termInParentheses)
+            parenthesesDisjunction->terms.append(m_bodyDisjunction->terms[termInParentheses]);
+        parenthesesDisjunction->terms.append(ByteTerm::SubpatternEnd());
+
+        m_bodyDisjunction->terms.shrink(beginTerm);
+
+        m_allParenthesesInfo.append(parenthesesDisjunction);
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpattern, subpatternId, parenthesesDisjunction, invertOrCapture, inputPosition));
+
+        m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
+        m_bodyDisjunction->terms[beginTerm].frameLocation = frameLocation;
+    }
+
+    void atomParenthesesOnceEnd(int inputPosition, unsigned frameLocation, unsigned quantityCount, QuantifierType quantityType)
     {
         unsigned beginTerm = popParenthesesStack();
         closeAlternative(beginTerm + 1);
         unsigned endTerm = m_bodyDisjunction->terms.size();
 
-        bool isAssertion = m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeParentheticalAssertionBegin;
+        ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeParenthesesSubpatternOnceBegin);
+
         bool invertOrCapture = m_bodyDisjunction->terms[beginTerm].invertOrCapture;
         unsigned subpatternId = m_bodyDisjunction->terms[beginTerm].atom.subpatternId;
 
-        m_bodyDisjunction->terms.append(ByteTerm(isAssertion ? ByteTerm::TypeParentheticalAssertionEnd : ByteTerm::TypeParenthesesSubpatternOnceEnd, subpatternId, invertOrCapture, inputPosition));
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternOnceEnd, subpatternId, invertOrCapture, inputPosition));
         m_bodyDisjunction->terms[beginTerm].atom.parenthesesWidth = endTerm - beginTerm;
         m_bodyDisjunction->terms[endTerm].atom.parenthesesWidth = endTerm - beginTerm;
         m_bodyDisjunction->terms[endTerm].frameLocation = frameLocation;
 
-        if (doInline) {
-            m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
-            m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
-            m_bodyDisjunction->terms[endTerm].atom.quantityCount = quantityCount;
-            m_bodyDisjunction->terms[endTerm].atom.quantityType = quantityType;
-        } else {
-            ByteTerm& parenthesesBegin = m_bodyDisjunction->terms[beginTerm];
-            ASSERT(parenthesesBegin.type == ByteTerm::TypeParenthesesSubpatternOnceBegin);
-
-            bool invertOrCapture = parenthesesBegin.invertOrCapture;
-            unsigned subpatternId = parenthesesBegin.atom.subpatternId;
+        m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
+        m_bodyDisjunction->terms[endTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[endTerm].atom.quantityType = quantityType;
+    }
 
-            unsigned numSubpatterns = lastSubpatternId - subpatternId + 1;
-            ByteDisjunction* parenthesesDisjunction = new ByteDisjunction(numSubpatterns, callFrameSize);
+    void atomParenthesesTerminalEnd(int inputPosition, unsigned frameLocation, unsigned quantityCount, QuantifierType quantityType)
+    {
+        unsigned beginTerm = popParenthesesStack();
+        closeAlternative(beginTerm + 1);
+        unsigned endTerm = m_bodyDisjunction->terms.size();
 
-            parenthesesDisjunction->terms.append(ByteTerm::SubpatternBegin());
-            for (unsigned termInParentheses = beginTerm + 1; termInParentheses < endTerm; ++termInParentheses)
-                parenthesesDisjunction->terms.append(m_bodyDisjunction->terms[termInParentheses]);
-            parenthesesDisjunction->terms.append(ByteTerm::SubpatternEnd());
+        ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeParenthesesSubpatternTerminalBegin);
 
-            m_bodyDisjunction->terms.shrink(beginTerm);
+        bool invertOrCapture = m_bodyDisjunction->terms[beginTerm].invertOrCapture;
+        unsigned subpatternId = m_bodyDisjunction->terms[beginTerm].atom.subpatternId;
 
-            m_allParenthesesInfo.append(parenthesesDisjunction);
-            m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpattern, subpatternId, parenthesesDisjunction, invertOrCapture, inputPosition));
+        m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternTerminalEnd, subpatternId, invertOrCapture, inputPosition));
+        m_bodyDisjunction->terms[beginTerm].atom.parenthesesWidth = endTerm - beginTerm;
+        m_bodyDisjunction->terms[endTerm].atom.parenthesesWidth = endTerm - beginTerm;
+        m_bodyDisjunction->terms[endTerm].frameLocation = frameLocation;
 
-            m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
-            m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
-            m_bodyDisjunction->terms[beginTerm].frameLocation = frameLocation;
-        }
+        m_bodyDisjunction->terms[beginTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[beginTerm].atom.quantityType = quantityType;
+        m_bodyDisjunction->terms[endTerm].atom.quantityCount = quantityCount;
+        m_bodyDisjunction->terms[endTerm].atom.quantityType = quantityType;
     }
 
     void regexBegin(unsigned numSubpatterns, unsigned callFrameSize, bool onceThrough)
@@ -1680,24 +1836,27 @@ public:
 
                 case PatternTerm::TypeParenthesesSubpattern: {
                     unsigned disjunctionAlreadyCheckedCount = 0;
-                    if ((term.quantityCount == 1) && !term.parentheses.isCopy) {
-                        if (term.quantityType == QuantifierFixedCount) {
+                    if (term.quantityCount == 1 && !term.parentheses.isCopy) {
+                        unsigned alternativeFrameLocation = term.frameLocation;
+                        // For QuantifierFixedCount we pre-check the minimum size; for greedy/non-greedy we reserve a slot in the frame.
+                        if (term.quantityType == QuantifierFixedCount)
                             disjunctionAlreadyCheckedCount = term.parentheses.disjunction->m_minimumSize;
-                            unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
-                            atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.invertOrCapture, delegateEndInputOffset - disjunctionAlreadyCheckedCount, term.frameLocation, term.frameLocation);
-                            emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, term.parentheses.disjunction->m_minimumSize);
-                            atomParenthesesEnd(true, term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
-                        } else {
-                            unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
-                            atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.invertOrCapture, delegateEndInputOffset - disjunctionAlreadyCheckedCount, term.frameLocation, term.frameLocation + RegexStackSpaceForBackTrackInfoParenthesesOnce);
-                            emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, 0);
-                            atomParenthesesEnd(true, term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
-                        }
+                        else
+                            alternativeFrameLocation += RegexStackSpaceForBackTrackInfoParenthesesOnce;
+                        unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
+                        atomParenthesesOnceBegin(term.parentheses.subpatternId, term.invertOrCapture, delegateEndInputOffset - disjunctionAlreadyCheckedCount, term.frameLocation, alternativeFrameLocation);
+                        emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
+                        atomParenthesesOnceEnd(delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType);
+                    } else if (term.parentheses.isTerminal) {
+                        unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
+                        atomParenthesesTerminalBegin(term.parentheses.subpatternId, term.invertOrCapture, delegateEndInputOffset - disjunctionAlreadyCheckedCount, term.frameLocation, term.frameLocation + RegexStackSpaceForBackTrackInfoParenthesesOnce);
+                        emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
+                        atomParenthesesTerminalEnd(delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType);
                     } else {
                         unsigned delegateEndInputOffset = term.inputPosition - currentCountAlreadyChecked;
                         atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.invertOrCapture, delegateEndInputOffset - disjunctionAlreadyCheckedCount, term.frameLocation, 0);
                         emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, 0);
-                        atomParenthesesEnd(false, term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
+                        atomParenthesesSubpatternEnd(term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
                     }
                     break;
                 }
@@ -1710,7 +1869,7 @@ public:
 
                     atomParentheticalAssertionBegin(term.parentheses.subpatternId, term.invertOrCapture, term.frameLocation, alternativeFrameLocation);
                     emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, positiveInputOffset, true);
-                    atomParenthesesEnd(true, term.parentheses.lastSubpatternId, 0, term.frameLocation, term.quantityCount, term.quantityType);
+                    atomParentheticalAssertionEnd(0, term.frameLocation, term.quantityCount, term.quantityType);
                     break;
                 }
                 }
index f761ccc..2e23472 100644 (file)
@@ -81,6 +81,8 @@ struct ByteTerm {
         TypeParenthesesSubpattern,
         TypeParenthesesSubpatternOnceBegin,
         TypeParenthesesSubpatternOnceEnd,
+        TypeParenthesesSubpatternTerminalBegin,
+        TypeParenthesesSubpatternTerminalEnd,
         TypeParentheticalAssertionBegin,
         TypeParentheticalAssertionEnd,
         TypeCheckInput,
index c2be056..acbd458 100644 (file)
@@ -910,12 +910,7 @@ class RegexGenerator : private MacroAssembler {
         PatternDisjunction* disjunction = term.parentheses.disjunction;
         ASSERT(term.quantityCount == 1);
 
-        if (term.parentheses.isCopy) {
-            m_shouldFallBack = true;
-            return;
-        }
-
-        unsigned preCheckedCount = ((term.quantityCount == 1) && (term.quantityType == QuantifierFixedCount)) ? disjunction->m_minimumSize : 0;
+        unsigned preCheckedCount = (term.quantityType == QuantifierFixedCount) ? disjunction->m_minimumSize : 0;
 
         unsigned parenthesesFrameLocation = term.frameLocation;
         unsigned alternativeFrameLocation = parenthesesFrameLocation;
@@ -934,12 +929,12 @@ class RegexGenerator : private MacroAssembler {
             Jump nonGreedySkipParentheses;
             Label nonGreedyTryParentheses;
             if (term.quantityType == QuantifierGreedy)
-                storeToFrame(Imm32(1), parenthesesFrameLocation);
+                storeToFrame(index, parenthesesFrameLocation);
             else if (term.quantityType == QuantifierNonGreedy) {
-                storeToFrame(Imm32(0), parenthesesFrameLocation);
+                storeToFrame(Imm32(-1), parenthesesFrameLocation);
                 nonGreedySkipParentheses = jump();
                 nonGreedyTryParentheses = label();
-                storeToFrame(Imm32(1), parenthesesFrameLocation);
+                storeToFrame(index, parenthesesFrameLocation);
             }
 
             // store the match start index
@@ -957,29 +952,21 @@ class RegexGenerator : private MacroAssembler {
             TermGenerationState parenthesesState(disjunction, state.checkedTotal);
             generateParenthesesDisjunction(state.term(), parenthesesState, alternativeFrameLocation);
 
-            // store the match end index
-            if (term.invertOrCapture) {
-                int inputOffset = state.inputOffset();
-                if (inputOffset) {
-                    move(index, indexTemporary);
-                    add32(Imm32(state.inputOffset()), indexTemporary);
-                    store32(indexTemporary, Address(output, ((term.parentheses.subpatternId << 1) + 1) * sizeof(int)));
-                } else
-                    store32(index, Address(output, ((term.parentheses.subpatternId << 1) + 1) * sizeof(int)));
-            }
-            Jump success = jump();
+            Jump success = (term.quantityType == QuantifierFixedCount) ?
+                jump() :
+                branch32(NotEqual, index, Address(stackPointerRegister, (parenthesesFrameLocation * sizeof(void*))));
 
             // A failure AFTER the parens jumps here
             Label backtrackFromAfterParens(this);
 
             if (term.quantityType == QuantifierGreedy) {
-                // If this is zero we have now tested with both with and without the parens.
+                // If this is -1 we have now tested with both with and without the parens.
                 loadFromFrame(parenthesesFrameLocation, indexTemporary);
-                state.jumpToBacktrack(branchTest32(Zero, indexTemporary), this);
+                state.jumpToBacktrack(branch32(Equal, indexTemporary, Imm32(-1)), this);
             } else if (term.quantityType == QuantifierNonGreedy) {
-                // If this is zero we have now tested with both with and without the parens.
+                // If this is -1 we have now tested without the parens, now test with.
                 loadFromFrame(parenthesesFrameLocation, indexTemporary);
-                branchTest32(Zero, indexTemporary).linkTo(nonGreedyTryParentheses, this);
+                branch32(Equal, indexTemporary, Imm32(-1)).linkTo(nonGreedyTryParentheses, this);
             }
 
             parenthesesState.plantJumpToBacktrackIfExists(this);
@@ -989,7 +976,7 @@ class RegexGenerator : private MacroAssembler {
                 store32(Imm32(-1), Address(output, (term.parentheses.subpatternId << 1) * sizeof(int)));
 
             if (term.quantityType == QuantifierGreedy)
-                storeToFrame(Imm32(0), parenthesesFrameLocation);
+                storeToFrame(Imm32(-1), parenthesesFrameLocation);
             else
                 state.jumpToBacktrack(jump(), this);
 
@@ -997,6 +984,17 @@ class RegexGenerator : private MacroAssembler {
             if (term.quantityType == QuantifierNonGreedy)
                 nonGreedySkipParentheses.link(this);
             success.link(this);
+
+            // store the match end index
+            if (term.invertOrCapture) {
+                int inputOffset = state.inputOffset();
+                if (inputOffset) {
+                    move(index, indexTemporary);
+                    add32(Imm32(state.inputOffset()), indexTemporary);
+                    store32(indexTemporary, Address(output, ((term.parentheses.subpatternId << 1) + 1) * sizeof(int)));
+                } else
+                    store32(index, Address(output, ((term.parentheses.subpatternId << 1) + 1) * sizeof(int)));
+            }
         }
     }
 
@@ -1007,25 +1005,6 @@ class RegexGenerator : private MacroAssembler {
         ASSERT(parenthesesTerm.type == PatternTerm::TypeParenthesesSubpattern);
         ASSERT(parenthesesTerm.quantityCount != 1); // Handled by generateParenthesesSingle.
 
-        // Capturing not yet implemented!
-        if (parenthesesTerm.invertOrCapture) {
-            m_shouldFallBack = true;
-            return;
-        }
-
-        // Quantification limit not yet implemented!
-        if (parenthesesTerm.quantityCount != 0xffffffff) {
-            m_shouldFallBack = true;
-            return;
-        }
-
-        // Need to reset nested subpatterns between iterations...
-        // for the minute this crude check rejects all patterns with any subpatterns!
-        if (m_pattern.m_numSubpatterns) {
-            m_shouldFallBack = true;
-            return;
-        }
-
         TermGenerationState parenthesesState(disjunction, state.checkedTotal);
 
         Label matchAgain(this);
@@ -1047,7 +1026,11 @@ class RegexGenerator : private MacroAssembler {
                 generateTerm(parenthesesState);
 
             // If we get here, we matched! If the index advanced then try to match more since limit isn't supported yet.
-            branch32(GreaterThan, index, Address(stackPointerRegister, (parenthesesTerm.frameLocation * sizeof(void*))), matchAgain);
+            branch32(NotEqual, index, Address(stackPointerRegister, (parenthesesTerm.frameLocation * sizeof(void*))), matchAgain);
+
+            // If we get here we matched, but we matched "" - cannot accept this alternative as is, so either backtrack,
+            // or fall through to try the next alternative if no backtrack is available.
+            parenthesesState.plantJumpToBacktrackIfExists(this);
 
             parenthesesState.linkAlternativeBacktracks(this);
             // We get here if the alternative fails to match - fall through to the next iteration, or out of the loop.
@@ -1180,17 +1163,12 @@ class RegexGenerator : private MacroAssembler {
             break;
 
         case PatternTerm::TypeParenthesesSubpattern:
-            if (term.quantityCount == 1) {
+            if (term.quantityCount == 1 && !term.parentheses.isCopy)
                 generateParenthesesSingle(state);
-                break;
-            } else if (state.isLastTerm() && state.isMainDisjunction()) { // Is this is the last term of the main disjunction?
-                // If this has a greedy quantifier, then it will never need to backtrack!
-                if (term.quantityType == QuantifierGreedy) {
-                    generateParenthesesGreedyNoBacktrack(state);
-                    break;
-                }
-            }
-            m_shouldFallBack = true;
+            else if (term.parentheses.isTerminal)
+                generateParenthesesGreedyNoBacktrack(state);
+            else
+                m_shouldFallBack = true;
             break;
 
         case PatternTerm::TypeParentheticalAssertion:
index 37b544f..c76c641 100644 (file)
@@ -38,6 +38,7 @@ namespace JSC { namespace Yarr {
 #define RegexStackSpaceForBackTrackInfoAlternative 1 // One per alternative.
 #define RegexStackSpaceForBackTrackInfoParentheticalAssertion 1
 #define RegexStackSpaceForBackTrackInfoParenthesesOnce 1 // Only for !fixed quantifiers.
+#define RegexStackSpaceForBackTrackInfoParenthesesTerminal 1
 #define RegexStackSpaceForBackTrackInfoParentheses 4
 
 struct PatternDisjunction;
@@ -112,6 +113,7 @@ struct PatternTerm {
             unsigned subpatternId;
             unsigned lastSubpatternId;
             bool isCopy;
+            bool isTerminal;
         } parentheses;
     };
     QuantifierType quantityType;
@@ -143,6 +145,7 @@ struct PatternTerm {
         parentheses.disjunction = disjunction;
         parentheses.subpatternId = subpatternId;
         parentheses.isCopy = false;
+        parentheses.isTerminal = false;
         quantityType = QuantifierFixedCount;
         quantityCount = 1;
     }
index b9c1e5e..a0c1b5a 100644 (file)
@@ -1,3 +1,19 @@
+2010-11-27  Gavin Barraclough  <barraclough@apple.com>
+
+        Reviewed by Sam Weinig.
+
+        Bug 48101 - Yarr gives different results for /(?:a*?){2,}/
+
+        Add layout tests for corner cases of repeat matches in regular expressions,
+        and of the examples documented in the ECMA-262 spec .
+
+        * fast/regex/ecma-regex-examples-expected.txt: Added.
+        * fast/regex/ecma-regex-examples.html: Added.
+        * fast/regex/repeat-match-waldemar-expected.txt: Added.
+        * fast/regex/repeat-match-waldemar.html: Added.
+        * fast/regex/script-tests/ecma-regex-examples.js: Added.
+        * fast/regex/script-tests/repeat-match-waldemar.js: Added.
+
 2010-11-26  Rob Buis  <rwlbuis@gmail.com>
 
         Reviewed by Simon Fraser.
diff --git a/LayoutTests/fast/regex/ecma-regex-examples-expected.txt b/LayoutTests/fast/regex/ecma-regex-examples-expected.txt
new file mode 100644 (file)
index 0000000..f207642
--- /dev/null
@@ -0,0 +1,21 @@
+This page tests the regex examples from the ECMA-262 specification.
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS regex01.exec("abc") is ["a"]
+PASS regex02.exec("abc") is ["abc", "a", "a", undefined, "bc", undefined, "bc"]
+PASS regex03.exec("abcdefghi") is ["abcde"]
+PASS regex04.exec("abcdefghi") is ["abc"]
+PASS regex05.exec("aabaac") is ["aaba", "ba"]
+PASS "aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(regex06,"$1") is "aaaaa"
+PASS regex07.exec("zaacbbbcac") is ["zaacbbbcac", "z", "ac", "a", undefined, "c"]
+PASS regex08.exec("b") is ["", undefined]
+PASS regex09.exec("baaaac") is ["b", ""]
+PASS regex10.exec("baaabac") is ["", "aaa"]
+PASS regex11.exec("baaabac") is ["aba", "a"]
+PASS regex12.exec("baaabaac") is ["baaabaac", "ba", undefined, "abaac"]
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/fast/regex/ecma-regex-examples.html b/LayoutTests/fast/regex/ecma-regex-examples.html
new file mode 100644 (file)
index 0000000..ad581bd
--- /dev/null
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<link rel="stylesheet" href="../js/resources/js-test-style.css">
+<script src="../js/resources/js-test-pre.js"></script>
+</head>
+<body>
+<p id="description"></p>
+<div id="console"></div>
+<script src="script-tests/ecma-regex-examples.js"></script>
+<script src="../js/resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/fast/regex/repeat-match-waldemar-expected.txt b/LayoutTests/fast/regex/repeat-match-waldemar-expected.txt
new file mode 100644 (file)
index 0000000..ba806fc
--- /dev/null
@@ -0,0 +1,30 @@
+Some test cases identified by Waldemar Horwat in response to this bug: https://bugs.webkit.org/show_bug.cgi?id=48101
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS /(?:a*?){2,}/.exec("aa") is ["aa"]
+PASS /(?:a*?){2,}/.exec("a") is ["a"]
+PASS /(?:a*?){2,}/.exec("") is [""]
+PASS /(?:a*?)/.exec("aa") is [""]
+PASS /(?:a*?)/.exec("a") is [""]
+PASS /(?:a*?)/.exec("") is [""]
+PASS /(?:a*?)(?:a*?)(?:a*?)/.exec("aa") is [""]
+PASS /(?:a*?)(?:a*?)(?:a*?)/.exec("a") is [""]
+PASS /(?:a*?)(?:a*?)(?:a*?)/.exec("") is [""]
+PASS /(?:a*?){2}/.exec("aa") is [""]
+PASS /(?:a*?){2}/.exec("a") is [""]
+PASS /(?:a*?){2}/.exec("") is [""]
+PASS /(?:a*?){2,3}/.exec("aa") is ["a"]
+PASS /(?:a*?){2,3}/.exec("a") is ["a"]
+PASS /(?:a*?){2,3}/.exec("") is [""]
+PASS /(?:a*?)?/.exec("aa") is ["a"]
+PASS /(?:a*?)?/.exec("a") is ["a"]
+PASS /(?:a*?)?/.exec("") is [""]
+PASS /(?:a*?)*/.exec("aa") is ["aa"]
+PASS /(?:a*?)*/.exec("a") is ["a"]
+PASS /(?:a*?)*/.exec("") is [""]
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/fast/regex/repeat-match-waldemar.html b/LayoutTests/fast/regex/repeat-match-waldemar.html
new file mode 100644 (file)
index 0000000..ba47ab2
--- /dev/null
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<link rel="stylesheet" href="../js/resources/js-test-style.css">
+<script src="../js/resources/js-test-pre.js"></script>
+</head>
+<body>
+<p id="description"></p>
+<div id="console"></div>
+<script src="script-tests/repeat-match-waldemar.js"></script>
+<script src="../js/resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/fast/regex/script-tests/ecma-regex-examples.js b/LayoutTests/fast/regex/script-tests/ecma-regex-examples.js
new file mode 100755 (executable)
index 0000000..2c8e913
--- /dev/null
@@ -0,0 +1,41 @@
+description(
+"This page tests the regex examples from the ECMA-262 specification."
+);
+
+var regex01 = /a|ab/;
+shouldBe('regex01.exec("abc")', '["a"]');
+
+var regex02 = /((a)|(ab))((c)|(bc))/;
+shouldBe('regex02.exec("abc")', '["abc", "a", "a", undefined, "bc", undefined, "bc"]');
+
+var regex03 = /a[a-z]{2,4}/;
+shouldBe('regex03.exec("abcdefghi")', '["abcde"]');
+
+var regex04 = /a[a-z]{2,4}?/;
+shouldBe('regex04.exec("abcdefghi")', '["abc"]');
+
+var regex05 = /(aa|aabaac|ba|b|c)*/;
+shouldBe('regex05.exec("aabaac")', '["aaba", "ba"]');
+
+var regex06 = /^(a+)\1*,\1+$/;
+shouldBe('"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(regex06,"$1")', '"aaaaa"');
+
+var regex07 = /(z)((a+)?(b+)?(c))*/;
+shouldBe('regex07.exec("zaacbbbcac")', '["zaacbbbcac", "z", "ac", "a", undefined, "c"]');
+
+var regex08 = /(a*)*/;
+shouldBe('regex08.exec("b")', '["", undefined]');
+
+var regex09 = /(a*)b\1+/;
+shouldBe('regex09.exec("baaaac")', '["b", ""]');
+
+var regex10 = /(?=(a+))/;
+shouldBe('regex10.exec("baaabac")', '["", "aaa"]');
+
+var regex11 = /(?=(a+))a*b\1/;
+shouldBe('regex11.exec("baaabac")', '["aba", "a"]');
+
+var regex12 = /(.*?)a(?!(a+)b\2c)\2(.*)/;
+shouldBe('regex12.exec("baaabaac")', '["baaabaac", "ba", undefined, "abaac"]');
+
+var successfullyParsed = true;
diff --git a/LayoutTests/fast/regex/script-tests/repeat-match-waldemar.js b/LayoutTests/fast/regex/script-tests/repeat-match-waldemar.js
new file mode 100755 (executable)
index 0000000..b8b5467
--- /dev/null
@@ -0,0 +1,33 @@
+description(
+"Some test cases identified by Waldemar Horwat in response to this bug: https://bugs.webkit.org/show_bug.cgi?id=48101"
+);
+
+shouldBe('/(?:a*?){2,}/.exec("aa")', '["aa"]');
+shouldBe('/(?:a*?){2,}/.exec("a")', '["a"]');
+shouldBe('/(?:a*?){2,}/.exec("")', '[""]');
+
+shouldBe('/(?:a*?)/.exec("aa")', '[""]');
+shouldBe('/(?:a*?)/.exec("a")', '[""]');
+shouldBe('/(?:a*?)/.exec("")', '[""]');
+
+shouldBe('/(?:a*?)(?:a*?)(?:a*?)/.exec("aa")', '[""]');
+shouldBe('/(?:a*?)(?:a*?)(?:a*?)/.exec("a")', '[""]');
+shouldBe('/(?:a*?)(?:a*?)(?:a*?)/.exec("")', '[""]');
+
+shouldBe('/(?:a*?){2}/.exec("aa")', '[""]');
+shouldBe('/(?:a*?){2}/.exec("a")', '[""]');
+shouldBe('/(?:a*?){2}/.exec("")', '[""]');
+
+shouldBe('/(?:a*?){2,3}/.exec("aa")', '["a"]');
+shouldBe('/(?:a*?){2,3}/.exec("a")', '["a"]');
+shouldBe('/(?:a*?){2,3}/.exec("")', '[""]');
+
+shouldBe('/(?:a*?)?/.exec("aa")', '["a"]');
+shouldBe('/(?:a*?)?/.exec("a")', '["a"]');
+shouldBe('/(?:a*?)?/.exec("")', '[""]');
+
+shouldBe('/(?:a*?)*/.exec("aa")', '["aa"]');
+shouldBe('/(?:a*?)*/.exec("a")', '["a"]');
+shouldBe('/(?:a*?)*/.exec("")', '[""]');
+
+var successfullyParsed = true;