Yarr bytecode compilation failure should be gracefully handled
authorysuzuki@apple.com <ysuzuki@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 13 Jun 2019 18:47:22 +0000 (18:47 +0000)
committerysuzuki@apple.com <ysuzuki@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 13 Jun 2019 18:47:22 +0000 (18:47 +0000)
https://bugs.webkit.org/show_bug.cgi?id=198700

Reviewed by Michael Saboff.

JSTests:

* stress/regexp-bytecode-compilation-fail.js: Added.
(shouldThrow):

Source/JavaScriptCore:

Currently, we assume that Yarr bytecode compilation does not fail. But in fact it can fail.
We should gracefully handle this failure as a runtime error, as we did for parse errors in [1].
We also harden Yarr's consumed character calculation by using Checked.

[1]: https://bugs.webkit.org/show_bug.cgi?id=185755

* inspector/ContentSearchUtilities.cpp:
(Inspector::ContentSearchUtilities::findMagicComment):
* runtime/RegExp.cpp:
(JSC::RegExp::byteCodeCompileIfNecessary):
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):
* runtime/RegExpInlines.h:
(JSC::RegExp::matchInline):
* yarr/YarrErrorCode.cpp:
(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):
* yarr/YarrErrorCode.h:
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::ByteCompiler::ByteCompiler):
(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::atomCharacterClass):
(JSC::Yarr::ByteCompiler::atomBackReference):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionBegin):
(JSC::Yarr::ByteCompiler::popParenthesesStack):
(JSC::Yarr::ByteCompiler::closeAlternative):
(JSC::Yarr::ByteCompiler::closeBodyAlternative):
(JSC::Yarr::ByteCompiler::alternativeBodyDisjunction):
(JSC::Yarr::ByteCompiler::alternativeDisjunction):
(JSC::Yarr::ByteCompiler::emitDisjunction):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@246408 268f45cc-cd09-0410-ab3c-d52691b4dbfc

JSTests/ChangeLog
JSTests/stress/regexp-bytecode-compilation-fail.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/inspector/ContentSearchUtilities.cpp
Source/JavaScriptCore/runtime/RegExp.cpp
Source/JavaScriptCore/runtime/RegExpInlines.h
Source/JavaScriptCore/yarr/RegularExpression.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.cpp
Source/JavaScriptCore/yarr/YarrInterpreter.h

index 14b2539..c64efc7 100644 (file)
@@ -1,3 +1,13 @@
+2019-06-13  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        Yarr bytecode compilation failure should be gracefully handled
+        https://bugs.webkit.org/show_bug.cgi?id=198700
+
+        Reviewed by Michael Saboff.
+
+        * stress/regexp-bytecode-compilation-fail.js: Added.
+        (shouldThrow):
+
 2019-06-12  Yusuke Suzuki  <ysuzuki@apple.com>
 
         [JSC] Polymorphic call stub's slow path should restore callee saves before performing tail call
diff --git a/JSTests/stress/regexp-bytecode-compilation-fail.js b/JSTests/stress/regexp-bytecode-compilation-fail.js
new file mode 100644 (file)
index 0000000..c517f25
--- /dev/null
@@ -0,0 +1,23 @@
+function shouldThrow(func, errorMessage) {
+    var errorThrown = false;
+    var error = null;
+    try {
+        func();
+    } catch (e) {
+        errorThrown = true;
+        error = e;
+    }
+    if (!errorThrown)
+        throw new Error('not thrown');
+    if (String(error) !== errorMessage)
+        throw new Error(`bad error: ${String(error)}`);
+}
+
+function v2() {
+    const v8 = RegExp("(1{2147480000,}2{3648,})|(ab)");
+    "(1{2147480000,}2{3648,})|(ab)";
+    const v4 = RegExp(v2);
+    const v5 = v4.test();
+    return v5;
+}
+shouldThrow(v2, `SyntaxError: Invalid regular expression: pattern exceeds string length limits`);
index cf1d33d..65ee267 100644 (file)
@@ -1,3 +1,44 @@
+2019-06-13  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        Yarr bytecode compilation failure should be gracefully handled
+        https://bugs.webkit.org/show_bug.cgi?id=198700
+
+        Reviewed by Michael Saboff.
+
+        Currently, we assume that Yarr bytecode compilation does not fail. But in fact it can fail.
+        We should gracefully handle this failure as a runtime error, as we did for parse errors in [1].
+        We also harden Yarr's consumed character calculation by using Checked.
+
+        [1]: https://bugs.webkit.org/show_bug.cgi?id=185755
+
+        * inspector/ContentSearchUtilities.cpp:
+        (Inspector::ContentSearchUtilities::findMagicComment):
+        * runtime/RegExp.cpp:
+        (JSC::RegExp::byteCodeCompileIfNecessary):
+        (JSC::RegExp::compile):
+        (JSC::RegExp::compileMatchOnly):
+        * runtime/RegExpInlines.h:
+        (JSC::RegExp::matchInline):
+        * yarr/YarrErrorCode.cpp:
+        (JSC::Yarr::errorMessage):
+        (JSC::Yarr::errorToThrow):
+        * yarr/YarrErrorCode.h:
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::ByteCompiler::ByteCompiler):
+        (JSC::Yarr::ByteCompiler::compile):
+        (JSC::Yarr::ByteCompiler::atomCharacterClass):
+        (JSC::Yarr::ByteCompiler::atomBackReference):
+        (JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
+        (JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
+        (JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
+        (JSC::Yarr::ByteCompiler::atomParentheticalAssertionBegin):
+        (JSC::Yarr::ByteCompiler::popParenthesesStack):
+        (JSC::Yarr::ByteCompiler::closeAlternative):
+        (JSC::Yarr::ByteCompiler::closeBodyAlternative):
+        (JSC::Yarr::ByteCompiler::alternativeBodyDisjunction):
+        (JSC::Yarr::ByteCompiler::alternativeDisjunction):
+        (JSC::Yarr::ByteCompiler::emitDisjunction):
+
 2019-06-12  Yusuke Suzuki  <ysuzuki@apple.com>
 
         [JSC] Polymorphic call stub's slow path should restore callee saves before performing tail call
index c8de078..f791406 100644 (file)
@@ -171,8 +171,9 @@ static String findMagicComment(const String& content, const String& patternStrin
     YarrPattern pattern(patternString, JSC::Yarr::Flags::Multiline, error);
     ASSERT(!hasError(error));
     BumpPointerAllocator regexAllocator;
-    auto bytecodePattern = byteCompile(pattern, &regexAllocator);
-    ASSERT(bytecodePattern);
+    JSC::Yarr::ErrorCode ignoredErrorCode = JSC::Yarr::ErrorCode::NoError;
+    auto bytecodePattern = byteCompile(pattern, &regexAllocator, ignoredErrorCode);
+    RELEASE_ASSERT(bytecodePattern);
 
     ASSERT(pattern.m_numSubpatterns == 1);
     std::array<unsigned, 4> matches;
index f313d4e..071b1f6 100644 (file)
@@ -217,9 +217,9 @@ RegExp* RegExp::create(VM& vm, const String& patternString, OptionSet<Yarr::Flag
 }
 
 
-static std::unique_ptr<Yarr::BytecodePattern> byteCodeCompilePattern(VM* vm, Yarr::YarrPattern& pattern)
+static std::unique_ptr<Yarr::BytecodePattern> byteCodeCompilePattern(VM* vm, Yarr::YarrPattern& pattern, Yarr::ErrorCode& errorCode)
 {
-    return Yarr::byteCompile(pattern, &vm->m_regExpAllocator, &vm->m_regExpAllocatorLock);
+    return Yarr::byteCompile(pattern, &vm->m_regExpAllocator, errorCode, &vm->m_regExpAllocatorLock);
 }
 
 void RegExp::byteCodeCompileIfNecessary(VM* vm)
@@ -237,7 +237,11 @@ void RegExp::byteCodeCompileIfNecessary(VM* vm)
     }
     ASSERT(m_numSubpatterns == pattern.m_numSubpatterns);
 
-    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
+    m_regExpBytecode = byteCodeCompilePattern(vm, pattern, m_constructionErrorCode);
+    if (!m_regExpBytecode) {
+        m_state = ParseError;
+        return;
+    }
 }
 
 void RegExp::compile(VM* vm, Yarr::YarrCharSize charSize)
@@ -278,7 +282,11 @@ void RegExp::compile(VM* vm, Yarr::YarrCharSize charSize)
         dataLog("Can't JIT this regular expression: \"", m_patternString, "\"\n");
 
     m_state = ByteCode;
-    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
+    m_regExpBytecode = byteCodeCompilePattern(vm, pattern, m_constructionErrorCode);
+    if (!m_regExpBytecode) {
+        m_state = ParseError;
+        return;
+    }
 }
 
 int RegExp::match(VM& vm, const String& s, unsigned startOffset, Vector<int>& ovector)
@@ -336,7 +344,11 @@ void RegExp::compileMatchOnly(VM* vm, Yarr::YarrCharSize charSize)
         dataLog("Can't JIT this regular expression: \"", m_patternString, "\"\n");
 
     m_state = ByteCode;
-    m_regExpBytecode = byteCodeCompilePattern(vm, pattern);
+    m_regExpBytecode = byteCodeCompilePattern(vm, pattern, m_constructionErrorCode);
+    if (!m_regExpBytecode) {
+        m_state = ParseError;
+        return;
+    }
 }
 
 MatchResult RegExp::match(VM& vm, const String& s, unsigned startOffset)
index 53a1438..6969b46 100644 (file)
@@ -139,14 +139,17 @@ ALWAYS_INLINE int RegExp::matchInline(VM& vm, const String& s, unsigned startOff
 
     compileIfNecessary(vm, s.is8Bit() ? Yarr::Char8 : Yarr::Char16);
 
-    if (m_state == ParseError) {
+    auto throwError = [&] {
         auto throwScope = DECLARE_THROW_SCOPE(vm);
         ExecState* exec = vm.topCallFrame;
         throwScope.throwException(exec, errorToThrow(exec));
         if (!hasHardError(m_constructionErrorCode))
             reset();
         return -1;
-    }
+    };
+
+    if (m_state == ParseError)
+        return throwError();
 
     int offsetVectorSize = (m_numSubpatterns + 1) * 2;
     ovector.resize(offsetVectorSize);
@@ -168,12 +171,16 @@ ALWAYS_INLINE int RegExp::matchInline(VM& vm, const String& s, unsigned startOff
         if (result == Yarr::JSRegExpJITCodeFailure) {
             // JIT'ed code couldn't handle expression, so punt back to the interpreter.
             byteCodeCompileIfNecessary(&vm);
+            if (m_state == ParseError)
+                return throwError();
             result = Yarr::interpret(m_regExpBytecode.get(), s, startOffset, reinterpret_cast<unsigned*>(offsetVector));
         }
 
 #if ENABLE(YARR_JIT_DEBUG)
         if (m_state == JITCode) {
             byteCodeCompileIfNecessary(&vm);
+            if (m_state == ParseError)
+                return throwError();
             matchCompareWithInterpreter(s, startOffset, offsetVector, result);
         }
 #endif
@@ -260,14 +267,17 @@ ALWAYS_INLINE MatchResult RegExp::matchInline(VM& vm, const String& s, unsigned
 
     compileIfNecessaryMatchOnly(vm, s.is8Bit() ? Yarr::Char8 : Yarr::Char16);
 
-    if (m_state == ParseError) {
+    auto throwError = [&] {
         auto throwScope = DECLARE_THROW_SCOPE(vm);
         ExecState* exec = vm.topCallFrame;
         throwScope.throwException(exec, errorToThrow(exec));
         if (!hasHardError(m_constructionErrorCode))
             reset();
         return MatchResult::failed();
-    }
+    };
+
+    if (m_state == ParseError)
+        return throwError();
 
 #if ENABLE(YARR_JIT)
     MatchResult result;
@@ -291,6 +301,8 @@ ALWAYS_INLINE MatchResult RegExp::matchInline(VM& vm, const String& s, unsigned
 
         // JIT'ed code couldn't handle expression, so punt back to the interpreter.
         byteCodeCompileIfNecessary(&vm);
+        if (m_state == ParseError)
+            return throwError();
     }
 #endif
 
index 4394ae5..ee4f58b 100644 (file)
@@ -70,7 +70,7 @@ private:
 
         m_numSubpatterns = pattern.m_numSubpatterns;
 
-        return JSC::Yarr::byteCompile(pattern, &m_regexAllocator);
+        return JSC::Yarr::byteCompile(pattern, &m_regexAllocator, m_constructionErrorCode);
     }
 
     JSC::Yarr::ErrorCode m_constructionErrorCode { Yarr::ErrorCode::NoError };
index 70f069a..7f033fe 100644 (file)
@@ -1690,13 +1690,15 @@ public:
     ByteCompiler(YarrPattern& pattern)
         : m_pattern(pattern)
     {
-        m_currentAlternativeIndex = 0;
     }
 
-    std::unique_ptr<BytecodePattern> compile(BumpPointerAllocator* allocator, ConcurrentJSLock* lock)
+    std::unique_ptr<BytecodePattern> compile(BumpPointerAllocator* allocator, ConcurrentJSLock* lock, ErrorCode& errorCode)
     {
         regexBegin(m_pattern.m_numSubpatterns, m_pattern.m_body->m_callFrameSize, m_pattern.m_body->m_alternatives[0]->onceThrough());
-        emitDisjunction(m_pattern.m_body);
+        if (auto error = emitDisjunction(m_pattern.m_body, 0, 0)) {
+            errorCode = error.value();
+            return nullptr;
+        }
         regexEnd();
 
 #ifndef NDEBUG
@@ -1751,9 +1753,9 @@ public:
     {
         m_bodyDisjunction->terms.append(ByteTerm(characterClass, invert, inputPosition));
 
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].atom.quantityMaxCount = quantityMaxCount.unsafeGet();
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].atom.quantityType = quantityType;
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().atom.quantityMaxCount = quantityMaxCount.unsafeGet();
+        m_bodyDisjunction->terms.last().atom.quantityType = quantityType;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
     }
 
     void atomBackReference(unsigned subpatternId, unsigned inputPosition, unsigned frameLocation, Checked<unsigned> quantityMaxCount, QuantifierType quantityType)
@@ -1762,19 +1764,19 @@ public:
 
         m_bodyDisjunction->terms.append(ByteTerm::BackReference(subpatternId, inputPosition));
 
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].atom.quantityMaxCount = quantityMaxCount.unsafeGet();
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].atom.quantityType = quantityType;
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().atom.quantityMaxCount = quantityMaxCount.unsafeGet();
+        m_bodyDisjunction->terms.last().atom.quantityType = quantityType;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
     }
 
     void atomParenthesesOnceBegin(unsigned subpatternId, bool capture, unsigned inputPosition, unsigned frameLocation, unsigned alternativeFrameLocation)
     {
-        int beginTerm = m_bodyDisjunction->terms.size();
+        unsigned beginTerm = m_bodyDisjunction->terms.size();
 
         m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternOnceBegin, subpatternId, capture, false, inputPosition));
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
         m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = alternativeFrameLocation;
 
         m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
         m_currentAlternativeIndex = beginTerm + 1;
@@ -1782,12 +1784,12 @@ public:
 
     void atomParenthesesTerminalBegin(unsigned subpatternId, bool capture, unsigned inputPosition, unsigned frameLocation, unsigned alternativeFrameLocation)
     {
-        int beginTerm = m_bodyDisjunction->terms.size();
+        unsigned beginTerm = m_bodyDisjunction->terms.size();
 
         m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternTerminalBegin, subpatternId, capture, false, inputPosition));
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
         m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = alternativeFrameLocation;
 
         m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
         m_currentAlternativeIndex = beginTerm + 1;
@@ -1799,12 +1801,12 @@ public:
         // then fix this up at the end! - simplifying this should make it much clearer.
         // https://bugs.webkit.org/show_bug.cgi?id=50136
 
-        int beginTerm = m_bodyDisjunction->terms.size();
+        unsigned beginTerm = m_bodyDisjunction->terms.size();
 
         m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParenthesesSubpatternOnceBegin, subpatternId, capture, false, inputPosition));
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
         m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = alternativeFrameLocation;
 
         m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
         m_currentAlternativeIndex = beginTerm + 1;
@@ -1812,12 +1814,12 @@ public:
 
     void atomParentheticalAssertionBegin(unsigned subpatternId, bool invert, unsigned frameLocation, unsigned alternativeFrameLocation)
     {
-        int beginTerm = m_bodyDisjunction->terms.size();
+        unsigned beginTerm = m_bodyDisjunction->terms.size();
 
         m_bodyDisjunction->terms.append(ByteTerm(ByteTerm::TypeParentheticalAssertionBegin, subpatternId, false, invert, 0));
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = frameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = frameLocation;
         m_bodyDisjunction->terms.append(ByteTerm::AlternativeBegin());
-        m_bodyDisjunction->terms[m_bodyDisjunction->terms.size() - 1].frameLocation = alternativeFrameLocation;
+        m_bodyDisjunction->terms.last().frameLocation = alternativeFrameLocation;
 
         m_parenthesesStack.append(ParenthesesStackEntry(beginTerm, m_currentAlternativeIndex));
         m_currentAlternativeIndex = beginTerm + 1;
@@ -1853,10 +1855,9 @@ public:
     unsigned popParenthesesStack()
     {
         ASSERT(m_parenthesesStack.size());
-        int stackEnd = m_parenthesesStack.size() - 1;
-        unsigned beginTerm = m_parenthesesStack[stackEnd].beginTerm;
-        m_currentAlternativeIndex = m_parenthesesStack[stackEnd].savedAlternativeIndex;
-        m_parenthesesStack.shrink(stackEnd);
+        unsigned beginTerm = m_parenthesesStack.last().beginTerm;
+        m_currentAlternativeIndex = m_parenthesesStack.last().savedAlternativeIndex;
+        m_parenthesesStack.removeLast();
 
         ASSERT(beginTerm < m_bodyDisjunction->terms.size());
         ASSERT(m_currentAlternativeIndex < m_bodyDisjunction->terms.size());
@@ -1864,11 +1865,11 @@ public:
         return beginTerm;
     }
 
-    void closeAlternative(int beginTerm)
+    void closeAlternative(unsigned beginTerm)
     {
-        int origBeginTerm = beginTerm;
+        unsigned origBeginTerm = beginTerm;
         ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeAlternativeBegin);
-        int endIndex = m_bodyDisjunction->terms.size();
+        unsigned endIndex = m_bodyDisjunction->terms.size();
 
         unsigned frameLocation = m_bodyDisjunction->terms[beginTerm].frameLocation;
 
@@ -1891,10 +1892,10 @@ public:
 
     void closeBodyAlternative()
     {
-        int beginTerm = 0;
-        int origBeginTerm = 0;
+        unsigned beginTerm = 0;
+        unsigned origBeginTerm = 0;
         ASSERT(m_bodyDisjunction->terms[beginTerm].type == ByteTerm::TypeBodyAlternativeBegin);
-        int endIndex = m_bodyDisjunction->terms.size();
+        unsigned endIndex = m_bodyDisjunction->terms.size();
 
         unsigned frameLocation = m_bodyDisjunction->terms[beginTerm].frameLocation;
 
@@ -2009,7 +2010,7 @@ public:
 
     void alternativeBodyDisjunction(bool onceThrough)
     {
-        int newAlternativeIndex = m_bodyDisjunction->terms.size();
+        unsigned newAlternativeIndex = m_bodyDisjunction->terms.size();
         m_bodyDisjunction->terms[m_currentAlternativeIndex].alternative.next = newAlternativeIndex - m_currentAlternativeIndex;
         m_bodyDisjunction->terms.append(ByteTerm::BodyAlternativeDisjunction(onceThrough));
 
@@ -2018,17 +2019,17 @@ public:
 
     void alternativeDisjunction()
     {
-        int newAlternativeIndex = m_bodyDisjunction->terms.size();
+        unsigned newAlternativeIndex = m_bodyDisjunction->terms.size();
         m_bodyDisjunction->terms[m_currentAlternativeIndex].alternative.next = newAlternativeIndex - m_currentAlternativeIndex;
         m_bodyDisjunction->terms.append(ByteTerm::AlternativeDisjunction());
 
         m_currentAlternativeIndex = newAlternativeIndex;
     }
 
-    void emitDisjunction(PatternDisjunction* disjunction, unsigned inputCountAlreadyChecked = 0, unsigned parenthesesInputCountAlreadyChecked = 0)
+    Optional<ErrorCode> WARN_UNUSED_RETURN emitDisjunction(PatternDisjunction* disjunction, Checked<unsigned, RecordOverflow> inputCountAlreadyChecked, unsigned parenthesesInputCountAlreadyChecked)
     {
         for (unsigned alt = 0; alt < disjunction->m_alternatives.size(); ++alt) {
-            unsigned currentCountAlreadyChecked = inputCountAlreadyChecked;
+            auto currentCountAlreadyChecked = inputCountAlreadyChecked;
 
             PatternAlternative* alternative = disjunction->m_alternatives[alt].get();
 
@@ -2046,33 +2047,35 @@ public:
             if (countToCheck) {
                 checkInput(countToCheck);
                 currentCountAlreadyChecked += countToCheck;
+                if (currentCountAlreadyChecked.hasOverflowed())
+                    return ErrorCode::OffsetTooLarge;
             }
 
             for (auto& term : alternative->m_terms) {
                 switch (term.type) {
                 case PatternTerm::TypeAssertionBOL:
-                    assertionBOL(currentCountAlreadyChecked - term.inputPosition);
+                    assertionBOL((currentCountAlreadyChecked - term.inputPosition).unsafeGet());
                     break;
 
                 case PatternTerm::TypeAssertionEOL:
-                    assertionEOL(currentCountAlreadyChecked - term.inputPosition);
+                    assertionEOL((currentCountAlreadyChecked - term.inputPosition).unsafeGet());
                     break;
 
                 case PatternTerm::TypeAssertionWordBoundary:
-                    assertionWordBoundary(term.invert(), currentCountAlreadyChecked - term.inputPosition);
+                    assertionWordBoundary(term.invert(), (currentCountAlreadyChecked - term.inputPosition).unsafeGet());
                     break;
 
                 case PatternTerm::TypePatternCharacter:
-                    atomPatternCharacter(term.patternCharacter, currentCountAlreadyChecked - term.inputPosition, term.frameLocation, term.quantityMaxCount, term.quantityType);
+                    atomPatternCharacter(term.patternCharacter, (currentCountAlreadyChecked - term.inputPosition).unsafeGet(), term.frameLocation, term.quantityMaxCount, term.quantityType);
                     break;
 
                 case PatternTerm::TypeCharacterClass:
-                    atomCharacterClass(term.characterClass, term.invert(), currentCountAlreadyChecked- term.inputPosition, term.frameLocation, term.quantityMaxCount, term.quantityType);
+                    atomCharacterClass(term.characterClass, term.invert(), (currentCountAlreadyChecked - term.inputPosition).unsafeGet(), term.frameLocation, term.quantityMaxCount, term.quantityType);
                     break;
 
                 case PatternTerm::TypeBackReference:
-                    atomBackReference(term.backReferenceSubpatternId, currentCountAlreadyChecked - term.inputPosition, term.frameLocation, term.quantityMaxCount, term.quantityType);
-                        break;
+                    atomBackReference(term.backReferenceSubpatternId, (currentCountAlreadyChecked - term.inputPosition).unsafeGet(), term.frameLocation, term.quantityMaxCount, term.quantityType);
+                    break;
 
                 case PatternTerm::TypeForwardReference:
                     break;
@@ -2086,22 +2089,22 @@ public:
                             disjunctionAlreadyCheckedCount = term.parentheses.disjunction->m_minimumSize;
                         else
                             alternativeFrameLocation += YarrStackSpaceForBackTrackInfoParenthesesOnce;
-                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
-                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        unsigned delegateEndInputOffset = (currentCountAlreadyChecked - term.inputPosition).unsafeGet();
                         atomParenthesesOnceBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, alternativeFrameLocation);
-                        emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
+                        if (auto error = emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount))
+                            return error;
                         atomParenthesesOnceEnd(delegateEndInputOffset, term.frameLocation, term.quantityMinCount, term.quantityMaxCount, term.quantityType);
                     } else if (term.parentheses.isTerminal) {
-                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
-                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        unsigned delegateEndInputOffset = (currentCountAlreadyChecked - term.inputPosition).unsafeGet();
                         atomParenthesesTerminalBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, term.frameLocation + YarrStackSpaceForBackTrackInfoParenthesesTerminal);
-                        emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount);
+                        if (auto error = emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, disjunctionAlreadyCheckedCount))
+                            return error;
                         atomParenthesesTerminalEnd(delegateEndInputOffset, term.frameLocation, term.quantityMinCount, term.quantityMaxCount, term.quantityType);
                     } else {
-                        ASSERT(currentCountAlreadyChecked >= term.inputPosition);
-                        unsigned delegateEndInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                        unsigned delegateEndInputOffset = (currentCountAlreadyChecked - term.inputPosition).unsafeGet();
                         atomParenthesesSubpatternBegin(term.parentheses.subpatternId, term.capture(), disjunctionAlreadyCheckedCount + delegateEndInputOffset, term.frameLocation, 0);
-                        emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, 0);
+                        if (auto error = emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, 0))
+                            return error;
                         atomParenthesesSubpatternEnd(term.parentheses.lastSubpatternId, delegateEndInputOffset, term.frameLocation, term.quantityMinCount, term.quantityMaxCount, term.quantityType, term.parentheses.disjunction->m_callFrameSize);
                     }
                     break;
@@ -2109,22 +2112,25 @@ public:
 
                 case PatternTerm::TypeParentheticalAssertion: {
                     unsigned alternativeFrameLocation = term.frameLocation + YarrStackSpaceForBackTrackInfoParentheticalAssertion;
-
-                    ASSERT(currentCountAlreadyChecked >= term.inputPosition);
-                    unsigned positiveInputOffset = currentCountAlreadyChecked - term.inputPosition;
+                    unsigned positiveInputOffset = (currentCountAlreadyChecked - term.inputPosition).unsafeGet();
                     unsigned uncheckAmount = 0;
                     if (positiveInputOffset > term.parentheses.disjunction->m_minimumSize) {
                         uncheckAmount = positiveInputOffset - term.parentheses.disjunction->m_minimumSize;
                         uncheckInput(uncheckAmount);
                         currentCountAlreadyChecked -= uncheckAmount;
+                        if (currentCountAlreadyChecked.hasOverflowed())
+                            return ErrorCode::OffsetTooLarge;
                     }
 
                     atomParentheticalAssertionBegin(term.parentheses.subpatternId, term.invert(), term.frameLocation, alternativeFrameLocation);
-                    emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, positiveInputOffset - uncheckAmount);
+                    if (auto error = emitDisjunction(term.parentheses.disjunction, currentCountAlreadyChecked, positiveInputOffset - uncheckAmount))
+                        return error;
                     atomParentheticalAssertionEnd(0, term.frameLocation, term.quantityMaxCount, term.quantityType);
                     if (uncheckAmount) {
                         checkInput(uncheckAmount);
                         currentCountAlreadyChecked += uncheckAmount;
+                        if (currentCountAlreadyChecked.hasOverflowed())
+                            return ErrorCode::OffsetTooLarge;
                     }
                     break;
                 }
@@ -2135,6 +2141,7 @@ public:
                 }
             }
         }
+        return WTF::nullopt;
     }
 #ifndef NDEBUG
     void dumpDisjunction(ByteDisjunction* disjunction, unsigned nesting = 0)
@@ -2400,14 +2407,14 @@ public:
 private:
     YarrPattern& m_pattern;
     std::unique_ptr<ByteDisjunction> m_bodyDisjunction;
-    unsigned m_currentAlternativeIndex;
+    unsigned m_currentAlternativeIndex { 0 };
     Vector<ParenthesesStackEntry> m_parenthesesStack;
     Vector<std::unique_ptr<ByteDisjunction>> m_allParenthesesInfo;
 };
 
-std::unique_ptr<BytecodePattern> byteCompile(YarrPattern& pattern, BumpPointerAllocator* allocator, ConcurrentJSLock* lock)
+std::unique_ptr<BytecodePattern> byteCompile(YarrPattern& pattern, BumpPointerAllocator* allocator, ErrorCode& errorCode, ConcurrentJSLock* lock)
 {
-    return ByteCompiler(pattern).compile(allocator, lock);
+    return ByteCompiler(pattern).compile(allocator, lock, errorCode);
 }
 
 unsigned interpret(BytecodePattern* bytecode, const String& input, unsigned start, unsigned* output)
index 6ec290e..03b83bd 100644 (file)
@@ -26,6 +26,7 @@
 #pragma once
 
 #include "ConcurrentJSLock.h"
+#include "YarrErrorCode.h"
 #include "YarrFlags.h"
 #include "YarrPattern.h"
 
@@ -389,7 +390,7 @@ private:
     Vector<std::unique_ptr<CharacterClass>> m_userCharacterClasses;
 };
 
-JS_EXPORT_PRIVATE std::unique_ptr<BytecodePattern> byteCompile(YarrPattern&, BumpPointerAllocator*, ConcurrentJSLock* = nullptr);
+JS_EXPORT_PRIVATE std::unique_ptr<BytecodePattern> byteCompile(YarrPattern&, BumpPointerAllocator*, ErrorCode&, ConcurrentJSLock* = nullptr);
 JS_EXPORT_PRIVATE unsigned interpret(BytecodePattern*, const String& input, unsigned start, unsigned* output);
 unsigned interpret(BytecodePattern*, const LChar* input, unsigned length, unsigned start, unsigned* output);
 unsigned interpret(BytecodePattern*, const UChar* input, unsigned length, unsigned start, unsigned* output);