[ContentExtensions] Prepare for compiling stylesheets of selectors to be used on...
author commit-queue@webkit.org <commit-queue@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 19 Mar 2015 04:57:20 +0000 (04:57 +0000)
committer commit-queue@webkit.org <commit-queue@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 19 Mar 2015 04:57:20 +0000 (04:57 +0000)
https://bugs.webkit.org/show_bug.cgi?id=142799

Patch by Alex Christensen <achristensen@webkit.org> on 2015-03-18
Reviewed by Brady Eidson.

Source/WebCore:

* WebCore.xcodeproj/project.pbxproj:
Make private headers to use with API tests.
* contentextensions/CompiledContentExtension.cpp:
(WebCore::ContentExtensions::CompiledContentExtension::globalDisplayNoneSelectors):
* contentextensions/CompiledContentExtension.h:
Added method to get only the selectors from the root of the DFA, which apply to all URLs.
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
Added checking if the trigger will match everything.
These actions can be put directly on the root of the DFA without adding extra epsilon transitions to the NFA.
* contentextensions/DFA.h:
(WebCore::ContentExtensions::DFA::nodeAt):
* contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::actionsFromDFARoot):
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
* contentextensions/DFABytecodeInterpreter.h:
* contentextensions/NFA.h:
* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::Term::quantifier):
Sink terms to a vector then add nodes to NFA when finalizing after checking for regexes that match everything.
(WebCore::ContentExtensions::GraphBuilder::GraphBuilder):
(WebCore::ContentExtensions::GraphBuilder::finalize):
(WebCore::ContentExtensions::GraphBuilder::parseStatus):
(WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
(WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
(WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
(WebCore::ContentExtensions::GraphBuilder::atomBackReference):
(WebCore::ContentExtensions::GraphBuilder::assertionBOL):
(WebCore::ContentExtensions::GraphBuilder::assertionWordBoundary):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
(WebCore::ContentExtensions::GraphBuilder::atomParentheticalAssertionBegin):
(WebCore::ContentExtensions::GraphBuilder::disjunction):
(WebCore::ContentExtensions::GraphBuilder::hasError):
(WebCore::ContentExtensions::GraphBuilder::fail):
(WebCore::ContentExtensions::GraphBuilder::sinkFloatingTermIfNecessary):
(WebCore::ContentExtensions::URLFilterParser::addPattern):
(WebCore::ContentExtensions::URLFilterParser::statusString):
(WebCore::ContentExtensions::GraphBuilder::errorMessage): Deleted.
* contentextensions/URLFilterParser.h:
Use an enum instead of strings for the status to avoid checking strings when we have a regex that matches everything.

Tools:

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
(TestWebKitAPI::testPattern):
(TestWebKitAPI::TEST_F):
Start testing regex failures.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@181726 268f45cc-cd09-0410-ab3c-d52691b4dbfc
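
The "matches everything" check described in the message can be sketched in isolation. This is a minimal illustration, not the code from the patch: `SketchTerm` and `matchesEverything` are hypothetical names, and the real Term class also carries the characters it matches. The idea is that a pattern matches every URL when no term is required to consume a character, that is, when every term is quantified with `?` or `*`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the quantifier enum used by the parser.
enum class AtomQuantifier { One, ZeroOrOne, ZeroOrMore, OneOrMore };

// Hypothetical stand-in for a parsed term; only the quantifier matters here.
struct SketchTerm {
    AtomQuantifier quantifier;
};

// Returns true when no term is mandatory, i.e. every term uses ? or *.
bool matchesEverything(const std::vector<SketchTerm>& sunkTerms)
{
    for (const SketchTerm& term : sunkTerms) {
        if (term.quantifier == AtomQuantifier::One || term.quantifier == AtomQuantifier::OneOrMore)
            return false;
    }
    return true;
}
```

Under this rule a pattern such as `a*b?` is universal, while `a*b+` is not, because `b+` needs at least one character.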

13 files changed:
Source/WebCore/ChangeLog
Source/WebCore/WebCore.xcodeproj/project.pbxproj
Source/WebCore/contentextensions/CompiledContentExtension.cpp
Source/WebCore/contentextensions/CompiledContentExtension.h
Source/WebCore/contentextensions/ContentExtensionCompiler.cpp
Source/WebCore/contentextensions/DFA.h
Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp
Source/WebCore/contentextensions/DFABytecodeInterpreter.h
Source/WebCore/contentextensions/NFA.h
Source/WebCore/contentextensions/URLFilterParser.cpp
Source/WebCore/contentextensions/URLFilterParser.h
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp

diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index 55b7019..2c38807 100644
@@ -1,3 +1,53 @@
+2015-03-18  Alex Christensen  <achristensen@webkit.org>
+
+        [ContentExtensions] Prepare for compiling stylesheets of selectors to be used on every page.
+        https://bugs.webkit.org/show_bug.cgi?id=142799
+
+        Reviewed by Brady Eidson.
+
+        * WebCore.xcodeproj/project.pbxproj:
+        Make private headers to use with API tests.
+        * contentextensions/CompiledContentExtension.cpp:
+        (WebCore::ContentExtensions::CompiledContentExtension::globalDisplayNoneSelectors):
+        * contentextensions/CompiledContentExtension.h:
+        Added method to get only the selectors from the root of the DFA, which apply to all URLs.
+        * contentextensions/ContentExtensionCompiler.cpp:
+        (WebCore::ContentExtensions::compileRuleList):
+        Added checking if the trigger will match everything.
+        These actions can be put directly on the root of the DFA without adding extra epsilon transitions to the NFA.
+        * contentextensions/DFA.h:
+        (WebCore::ContentExtensions::DFA::nodeAt):
+        * contentextensions/DFABytecodeInterpreter.cpp:
+        (WebCore::ContentExtensions::DFABytecodeInterpreter::actionsFromDFARoot):
+        (WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
+        * contentextensions/DFABytecodeInterpreter.h:
+        * contentextensions/NFA.h:
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::Term::quantifier):
+        Sink terms to a vector then add nodes to NFA when finalizing after checking for regexes that match everything.
+        (WebCore::ContentExtensions::GraphBuilder::GraphBuilder):
+        (WebCore::ContentExtensions::GraphBuilder::finalize):
+        (WebCore::ContentExtensions::GraphBuilder::parseStatus):
+        (WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
+        (WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
+        (WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomBackReference):
+        (WebCore::ContentExtensions::GraphBuilder::assertionBOL):
+        (WebCore::ContentExtensions::GraphBuilder::assertionWordBoundary):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
+        (WebCore::ContentExtensions::GraphBuilder::atomParentheticalAssertionBegin):
+        (WebCore::ContentExtensions::GraphBuilder::disjunction):
+        (WebCore::ContentExtensions::GraphBuilder::hasError):
+        (WebCore::ContentExtensions::GraphBuilder::fail):
+        (WebCore::ContentExtensions::GraphBuilder::sinkFloatingTermIfNecessary):
+        (WebCore::ContentExtensions::URLFilterParser::addPattern):
+        (WebCore::ContentExtensions::URLFilterParser::statusString):
+        (WebCore::ContentExtensions::GraphBuilder::errorMessage): Deleted.
+        * contentextensions/URLFilterParser.h:
+        Use an enum instead of strings for the status to avoid checking strings when we have a regex that matches everything.
+
 2015-03-18  Yusuke Suzuki  <utatane.tea@gmail.com>
 
         Fix build failure due to FALLTHROUGH in unreachable code
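diff --git a/Source/WebCore/WebCore.xcodeproj/project.pbxproj b/Source/WebCore/WebCore.xcodeproj/project.pbxproj
index 5ee9793..87acddb 100644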
                24F54EAD101FE914000AE741 /* ApplicationCacheHost.h in Headers */ = {isa = PBXBuildFile; fileRef = 24F54EAB101FE914000AE741 /* ApplicationCacheHost.h */; settings = {ATTRIBUTES = (Private, ); }; };
                2542F4DA1166C25A00E89A86 /* UserGestureIndicator.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 2542F4D81166C25A00E89A86 /* UserGestureIndicator.cpp */; };
                2542F4DB1166C25A00E89A86 /* UserGestureIndicator.h in Headers */ = {isa = PBXBuildFile; fileRef = 2542F4D91166C25A00E89A86 /* UserGestureIndicator.h */; settings = {ATTRIBUTES = (Private, ); }; };
-               262391361A648CEE007251A3 /* ContentExtensionsDebugging.h in Headers */ = {isa = PBXBuildFile; fileRef = 262391351A648CEE007251A3 /* ContentExtensionsDebugging.h */; };
+               262391361A648CEE007251A3 /* ContentExtensionsDebugging.h in Headers */ = {isa = PBXBuildFile; fileRef = 262391351A648CEE007251A3 /* ContentExtensionsDebugging.h */; settings = {ATTRIBUTES = (Private, ); }; };
                26255F0018878DFF0006E1FD /* UserAgentIOS.mm in Sources */ = {isa = PBXBuildFile; fileRef = 26255EFF18878DFF0006E1FD /* UserAgentIOS.mm */; };
                26255F0318878E110006E1FD /* UserAgent.h in Headers */ = {isa = PBXBuildFile; fileRef = 26255F0118878E110006E1FD /* UserAgent.h */; settings = {ATTRIBUTES = (Private, ); }; };
                26255F0418878E110006E1FD /* UserAgentMac.mm in Sources */ = {isa = PBXBuildFile; fileRef = 26255F0218878E110006E1FD /* UserAgentMac.mm */; };
                267726001A5B3AD9003C24DD /* NFAToDFA.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 267725FA1A5B3AD9003C24DD /* NFAToDFA.cpp */; };
                267726011A5B3AD9003C24DD /* NFAToDFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725FB1A5B3AD9003C24DD /* NFAToDFA.h */; };
                267726041A5DF6F2003C24DD /* URLFilterParser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 267726021A5DF6F2003C24DD /* URLFilterParser.cpp */; };
-               267726051A5DF6F2003C24DD /* URLFilterParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 267726031A5DF6F2003C24DD /* URLFilterParser.h */; };
+               267726051A5DF6F2003C24DD /* URLFilterParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 267726031A5DF6F2003C24DD /* URLFilterParser.h */; settings = {ATTRIBUTES = (Private, ); }; };
                269239961505E1AA009E57FC /* JSIDBVersionChangeEvent.h in Headers */ = {isa = PBXBuildFile; fileRef = 269239921505E1AA009E57FC /* JSIDBVersionChangeEvent.h */; };
-               269397221A4A412F00E8349D /* NFANode.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397201A4A412F00E8349D /* NFANode.h */; };
-               269397241A4A5B6400E8349D /* NFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397231A4A5B6400E8349D /* NFA.h */; };
+               269397221A4A412F00E8349D /* NFANode.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397201A4A412F00E8349D /* NFANode.h */; settings = {ATTRIBUTES = (Private, ); }; };
+               269397241A4A5B6400E8349D /* NFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397231A4A5B6400E8349D /* NFA.h */; settings = {ATTRIBUTES = (Private, ); }; };
                269397261A4A5FBD00E8349D /* NFA.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 269397251A4A5FBD00E8349D /* NFA.cpp */; };
                26AA0F9E18D2A18B00419381 /* SelectorPseudoElementTypeMap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 26AA0F9D18D2A18B00419381 /* SelectorPseudoElementTypeMap.cpp */; };
                26B9998F1803AE7200D01121 /* RegisterAllocator.h in Headers */ = {isa = PBXBuildFile; fileRef = 26B9998E1803AE7200D01121 /* RegisterAllocator.h */; };
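diff --git a/Source/WebCore/contentextensions/CompiledContentExtension.cpp b/Source/WebCore/contentextensions/CompiledContentExtension.cpp
index 59aa4b9..be8a066 100644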
@@ -25,6 +25,7 @@
 
 #include "config.h"
 #include "CompiledContentExtension.h"
+#include "DFABytecodeInterpreter.h"
 
 #if ENABLE(CONTENT_EXTENSIONS)
 
@@ -35,6 +36,24 @@ CompiledContentExtension::~CompiledContentExtension()
 {
 }
 
+Vector<String> CompiledContentExtension::globalDisplayNoneSelectors()
+{
+    DFABytecodeInterpreter interpreter(bytecode(), bytecodeLength());
+    DFABytecodeInterpreter::Actions actionLocations = interpreter.actionsFromDFARoot();
+    
+    Vector<Action> globalActions;
+    for (uint64_t actionLocation : actionLocations)
+        globalActions.append(Action::deserialize(actions(), actionsLength(), static_cast<unsigned>(actionLocation)));
+    
+    Vector<String> selectors;
+    for (Action& action : globalActions) {
+        if (action.cssSelector().length())
+            selectors.append(action.cssSelector());
+    }
+    
+    return selectors;
+}
+    
 } // namespace ContentExtensions
 } // namespace WebCore
 
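diff --git a/Source/WebCore/contentextensions/CompiledContentExtension.h b/Source/WebCore/contentextensions/CompiledContentExtension.h
index 9c09852..98c32d9 100644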
@@ -44,6 +44,7 @@ public:
     virtual unsigned bytecodeLength() const = 0;
     virtual const SerializedActionByte* actions() const = 0;
     virtual unsigned actionsLength() const = 0;
+    Vector<String> globalDisplayNoneSelectors();
 };
 
 } // namespace ContentExtensions
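diff --git a/Source/WebCore/contentextensions/ContentExtensionCompiler.cpp b/Source/WebCore/contentextensions/ContentExtensionCompiler.cpp
index 510b364..6b5da27 100644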
@@ -108,19 +108,29 @@ CompiledContentExtensionData compileRuleList(const String& ruleList)
 
     Vector<SerializedActionByte> actions;
     Vector<unsigned> actionLocations = serializeActions(parsedRuleList, actions);
+    Vector<uint64_t> universalActionLocations;
 
     NFA nfa;
     URLFilterParser urlFilterParser(nfa);
+    bool nonUniversalActionSeen = false;
     for (unsigned ruleIndex = 0; ruleIndex < parsedRuleList.size(); ++ruleIndex) {
         const ContentExtensionRule& contentExtensionRule = parsedRuleList[ruleIndex];
         const Trigger& trigger = contentExtensionRule.trigger();
         ASSERT(trigger.urlFilter.length());
 
         // High bits are used for flags. This should match how they are used in DFABytecodeCompiler::compileNode.
-        String error = urlFilterParser.addPattern(trigger.urlFilter, trigger.urlFilterIsCaseSensitive, (static_cast<uint64_t>(trigger.flags) << 32) | static_cast<uint64_t>(actionLocations[ruleIndex]));
-
-        if (!error.isNull()) {
-            dataLogF("Error while parsing %s: %s\n", trigger.urlFilter.utf8().data(), error.utf8().data());
+        uint64_t actionLocationAndFlags =(static_cast<uint64_t>(trigger.flags) << 32) | static_cast<uint64_t>(actionLocations[ruleIndex]);
+        URLFilterParser::ParseStatus status = urlFilterParser.addPattern(trigger.urlFilter, trigger.urlFilterIsCaseSensitive, actionLocationAndFlags);
+
+        if (status == URLFilterParser::MatchesEverything) {
+            if (nonUniversalActionSeen)
+                dataLogF("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");
+            universalActionLocations.append(actionLocationAndFlags);
+        } else
+            nonUniversalActionSeen = true;
+        
+        if (status != URLFilterParser::Ok && status != URLFilterParser::MatchesEverything) {
+            dataLogF("Error while parsing %s: %s\n", trigger.urlFilter.utf8().data(), URLFilterParser::statusString(status).utf8().data());
             continue;
         }
     }
@@ -138,7 +148,9 @@ CompiledContentExtensionData compileRuleList(const String& ruleList)
     double dfaBuildTimeStart = monotonicallyIncreasingTime();
 #endif
 
-    const DFA dfa = NFAToDFA::convert(nfa);
+    DFA dfa = NFAToDFA::convert(nfa);
+    for (uint64_t actionLocation : universalActionLocations)
+        dfa.nodeAt(dfa.root()).actions.append(actionLocation);
 
 #if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
     double dfaBuildTimeEnd = monotonicallyIncreasingTime();
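
The packing used for `actionLocationAndFlags` above can be illustrated on its own. This is a sketch under stated assumptions: the helper names are hypothetical, and the 16-bit flag width is taken from the `uint16_t flags` parameter of `DFABytecodeInterpreter::interpret`, not from the definition of `trigger.flags`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: flags go in the high 32 bits, the action location in the low 32 bits.
constexpr uint64_t packActionLocationAndFlags(uint16_t flags, uint32_t actionLocation)
{
    return (static_cast<uint64_t>(flags) << 32) | static_cast<uint64_t>(actionLocation);
}

// Hypothetical helper: recover the low 32 bits (the action location).
constexpr uint32_t actionLocationOf(uint64_t packed)
{
    return static_cast<uint32_t>(packed);
}

// Hypothetical helper: recover the flags stored above bit 31.
constexpr uint16_t flagsOf(uint64_t packed)
{
    return static_cast<uint16_t>(packed >> 32);
}
```

Truncating the packed value to 32 bits, as `static_cast<unsigned>(actionLocation)` does in `globalDisplayNoneSelectors`, yields the action location and discards the flags.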
diff --git a/Source/WebCore/contentextensions/DFA.h b/Source/WebCore/contentextensions/DFA.h
index e8e2f0b..92a2708 100644
@@ -48,6 +48,7 @@ public:
     unsigned root() const { return m_root; }
     unsigned size() const { return m_nodes.size(); }
     const DFANode& nodeAt(unsigned i) const { return m_nodes[i]; }
+    DFANode& nodeAt(unsigned i) { return m_nodes[i]; }
 
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
     void debugPrintDot() const;
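diff --git a/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp b/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp
index b45c71f..a94e8ed 100644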
@@ -40,7 +40,18 @@ static inline IntType getBits(const DFABytecode* bytecode, unsigned bytecodeLeng
     ASSERT_UNUSED(bytecodeLength, index + sizeof(IntType) <= bytecodeLength);
     return *reinterpret_cast<const IntType*>(&bytecode[index]);
 }
-
+    
+DFABytecodeInterpreter::Actions DFABytecodeInterpreter::actionsFromDFARoot()
+{
+    unsigned programCounter = 0;
+    DFABytecodeInterpreter::Actions globalActionLocations;
+    while (static_cast<DFABytecodeInstruction>(m_bytecode[programCounter]) == DFABytecodeInstruction::AppendAction) {
+        globalActionLocations.add(static_cast<uint64_t>(getBits<unsigned>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode))));
+        programCounter += instructionSizeWithArguments(DFABytecodeInstruction::AppendAction);
+    }
+    return globalActionLocations;
+}
+    
 DFABytecodeInterpreter::Actions DFABytecodeInterpreter::interpret(const CString& urlCString, uint16_t flags)
 {
     const char* url = urlCString.data();
@@ -51,6 +62,8 @@ DFABytecodeInterpreter::Actions DFABytecodeInterpreter::interpret(const CString&
     bool urlIndexIsAfterEndOfString = false;
     Actions actions;
     
+    // FIXME: Skip the actions from the root once they are used through actionsFromDFARoot. Change AppendAction to AppendActions to make this faster.
+    
     // This should always terminate if interpreting correctly compiled bytecode.
     while (true) {
         ASSERT(programCounter <= m_bytecodeLength);
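
The idea behind `actionsFromDFARoot` is that the root node's actions are emitted as a run of AppendAction instructions at the very start of the bytecode. The sketch below illustrates that scan with an assumed layout: one opcode byte followed by a 32-bit action location in host byte order. The opcode values, `SketchInstruction`, and the helper functions are hypothetical. Unlike the patch, it returns an ordered vector instead of a hash set and checks bounds explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical opcode values; the real DFABytecodeInstruction enum may differ.
enum class SketchInstruction : uint8_t { AppendAction = 0, Jump = 1 };

// Assumed instruction size: one opcode byte plus a 32-bit argument.
constexpr size_t appendActionSize = sizeof(uint8_t) + sizeof(uint32_t);

// Collects the action locations of the leading AppendAction instructions only.
std::vector<uint32_t> actionsFromRoot(const std::vector<uint8_t>& bytecode)
{
    std::vector<uint32_t> actions;
    size_t programCounter = 0;
    while (programCounter + appendActionSize <= bytecode.size()
        && bytecode[programCounter] == static_cast<uint8_t>(SketchInstruction::AppendAction)) {
        uint32_t location;
        // memcpy avoids unaligned reads through a casted pointer.
        std::memcpy(&location, &bytecode[programCounter + 1], sizeof(location));
        actions.push_back(location);
        programCounter += appendActionSize;
    }
    return actions;
}

// Appends one AppendAction instruction to sample bytecode.
void appendAction(std::vector<uint8_t>& bytecode, uint32_t location)
{
    bytecode.push_back(static_cast<uint8_t>(SketchInstruction::AppendAction));
    uint8_t raw[sizeof(location)];
    std::memcpy(raw, &location, sizeof(location));
    bytecode.insert(bytecode.end(), raw, raw + sizeof(raw));
}

// Builds sample bytecode: two root actions, a Jump opcode, then a later action.
std::vector<uint8_t> sampleBytecode()
{
    std::vector<uint8_t> bytecode;
    appendAction(bytecode, 7);
    appendAction(bytecode, 9);
    bytecode.push_back(static_cast<uint8_t>(SketchInstruction::Jump));
    appendAction(bytecode, 11);
    return bytecode;
}
```

With the sample bytecode, only the two locations before the first non-AppendAction instruction are collected; the action after the Jump belongs to a later node and is ignored.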
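diff --git a/Source/WebCore/contentextensions/DFABytecodeInterpreter.h b/Source/WebCore/contentextensions/DFABytecodeInterpreter.h
index e8d247a..3a4d927 100644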
@@ -48,6 +48,7 @@ public:
     typedef HashSet<uint64_t, DefaultHash<uint64_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint64_t>> Actions;
     
     Actions interpret(const CString&, uint16_t flags);
+    Actions actionsFromDFARoot();
 
 private:
     const DFABytecode* m_bytecode;
diff --git a/Source/WebCore/contentextensions/NFA.h b/Source/WebCore/contentextensions/NFA.h
index 6d95c20..2d38225 100644
@@ -42,7 +42,7 @@ class NFAToDFA;
 // The nodes are accessed through an identifier.
 class NFA {
 public:
-    NFA();
+    WEBCORE_EXPORT NFA();
     unsigned root() const { return m_root; }
     unsigned createNode();
 
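diff --git a/Source/WebCore/contentextensions/URLFilterParser.cpp b/Source/WebCore/contentextensions/URLFilterParser.cpp
index cfdefb3..26fec83 100644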
@@ -32,6 +32,7 @@
 #include <JavaScriptCore/YarrParser.h>
 #include <wtf/BitVector.h>
 #include <wtf/Deque.h>
+#include <wtf/text/CString.h>
 
 namespace WebCore {
 
@@ -174,6 +175,7 @@ public:
         ASSERT_WITH_MESSAGE(m_quantifier == AtomQuantifier::One, "Transition to quantified term should only happen once.");
         m_quantifier = quantifier;
     }
+    AtomQuantifier quantifier() const { return m_quantifier; }
 
     unsigned generateGraph(NFA& nfa, uint64_t patternId, unsigned start) const
     {
@@ -428,6 +430,7 @@ public:
         , m_subtreeStart(nfa.root())
         , m_subtreeEnd(nfa.root())
         , m_lastPrefixTreeEntry(&prefixTreeRoot)
+        , m_parseStatus(URLFilterParser::Ok)
     {
     }
 
@@ -438,20 +441,56 @@ public:
 
         sinkFloatingTermIfNecessary();
 
+        // Check to see if there are any terms without ? or *.
+        bool matchesEverything = true;
+        for (const auto& term : m_sunkTerms) {
+            if (term.quantifier() == AtomQuantifier::One || term.quantifier() == AtomQuantifier::OneOrMore) {
+                matchesEverything = false;
+                break;
+            }
+        }
+        if (matchesEverything)
+            fail(URLFilterParser::MatchesEverything);
+
+        for (const auto& term : m_sunkTerms) {
+            ASSERT(m_lastPrefixTreeEntry);
+            auto nextEntry = m_lastPrefixTreeEntry->nextPattern.find(term);
+            if (nextEntry != m_lastPrefixTreeEntry->nextPattern.end()) {
+                m_lastPrefixTreeEntry = nextEntry->value.get();
+                m_nfa.addRuleId(m_lastPrefixTreeEntry->nfaNode, m_patternId);
+            } else {
+                std::unique_ptr<PrefixTreeEntry> nextPrefixTreeEntry = std::make_unique<PrefixTreeEntry>();
+                
+                unsigned newEnd = term.generateGraph(m_nfa, m_patternId, m_lastPrefixTreeEntry->nfaNode);
+                nextPrefixTreeEntry->nfaNode = newEnd;
+                
+                auto addResult = m_lastPrefixTreeEntry->nextPattern.set(term, WTF::move(nextPrefixTreeEntry));
+                ASSERT(addResult.isNewEntry);
+                
+                if (!m_newPrefixSubtreeRoot) {
+                    m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
+                    m_newPrefixStaringPoint = term;
+                }
+                
+                m_lastPrefixTreeEntry = addResult.iterator->value.get();
+            }
+            m_subtreeEnd = m_lastPrefixTreeEntry->nfaNode;
+        }
+        
         if (!m_openGroups.isEmpty()) {
-            fail(ASCIILiteral("The expression has unclosed groups."));
+            fail(URLFilterParser::UnclosedGroups);
             return;
         }
 
         if (m_subtreeStart != m_subtreeEnd)
             m_nfa.setFinal(m_subtreeEnd, m_patternId);
         else
-            fail(ASCIILiteral("The pattern cannot match anything."));
+            fail(URLFilterParser::CannotMatchAnything);
     }
 
-    const String& errorMessage() const
+    URLFilterParser::ParseStatus parseStatus() const
     {
-        return m_errorMessage;
+        return m_parseStatus;
     }
 
     void atomPatternCharacter(UChar character)
@@ -460,7 +499,7 @@ public:
             return;
 
         if (!isASCII(character)) {
-            fail(ASCIILiteral("Only ASCII characters are supported in pattern."));
+            fail(URLFilterParser::NonASCII);
             return;
         }
 
@@ -482,7 +521,7 @@ public:
         if (builtInCharacterClassID == JSC::Yarr::NewlineClassID && inverted)
             m_floatingTerm = Term(Term::UniversalTransition);
         else
-            fail(ASCIILiteral("Character class is not supported."));
+            fail(URLFilterParser::UnsupportedCharacterClass);
     }
 
     void quantifyAtom(unsigned minimum, unsigned maximum, bool)
@@ -491,7 +530,7 @@ public:
             return;
 
         if (!m_floatingTerm.isValid())
-            fail(ASCIILiteral("Quantifier without corresponding term to quantify."));
+            fail(URLFilterParser::MisplacedQuantifier);
 
         if (!minimum && maximum == 1)
             m_floatingTerm.quantify(AtomQuantifier::ZeroOrOne);
@@ -500,12 +539,12 @@ public:
         else if (minimum == 1 && maximum == JSC::Yarr::quantifyInfinite)
             m_floatingTerm.quantify(AtomQuantifier::OneOrMore);
         else
-            fail(ASCIILiteral("Arbitrary atom repetitions are not supported."));
+            fail(URLFilterParser::InvalidQuantifier);
     }
 
     void atomBackReference(unsigned)
     {
-        fail(ASCIILiteral("Patterns cannot contain backreferences."));
+        fail(URLFilterParser::BackReference);
     }
 
     void assertionBOL()
@@ -514,7 +553,7 @@ public:
             return;
 
         if (m_subtreeStart != m_subtreeEnd || m_floatingTerm.isValid() || !m_openGroups.isEmpty())
-            fail(ASCIILiteral("Start of line assertion can only appear as the first term in a filter."));
+            fail(URLFilterParser::MisplacedStartOfLine);
     }
 
     void assertionEOL()
@@ -530,7 +569,7 @@ public:
 
     void assertionWordBoundary(bool)
     {
-        fail(ASCIILiteral("Word boundaries assertions are not supported yet."));
+        fail(URLFilterParser::WordBoundary);
     }
 
     void atomCharacterClassBegin(bool inverted = false)
@@ -549,10 +588,7 @@ public:
         if (hasError())
             return;
 
-        if (!isASCII(character)) {
-            fail(ASCIILiteral("Non ASCII Character in a character set."));
-            return;
-        }
+        ASSERT(isASCII(character));
 
         m_floatingTerm.addCharacter(character, m_patternIsCaseSensitive);
     }
@@ -562,10 +598,10 @@ public:
         if (hasError())
             return;
 
-        if (!a || !b || !isASCII(a) || !isASCII(b)) {
-            fail(ASCIILiteral("Non ASCII Character in a character range of a character set."));
-            return;
-        }
+        ASSERT(a);
+        ASSERT(b);
+        ASSERT(isASCII(a));
+        ASSERT(isASCII(b));
 
         for (unsigned i = a; i <= b; ++i)
             m_floatingTerm.addCharacter(static_cast<UChar>(i), m_patternIsCaseSensitive);
@@ -578,7 +614,7 @@ public:
 
     void atomCharacterClassBuiltIn(JSC::Yarr::BuiltInCharacterClassID, bool)
     {
-        fail(ASCIILiteral("Builtins character class atoms are not supported yet."));
+        fail(URLFilterParser::AtomCharacter);
     }
 
     void atomParenthesesSubpatternBegin(bool = true)
@@ -593,7 +629,7 @@ public:
 
     void atomParentheticalAssertionBegin(bool = false)
     {
-        fail(ASCIILiteral("Groups are not supported yet."));
+        fail(URLFilterParser::Group);
     }
 
     void atomParenthesesEnd()
@@ -609,16 +645,16 @@ public:
 
     void disjunction()
     {
-        fail(ASCIILiteral("Disjunctions are not supported yet."));
+        fail(URLFilterParser::Disjunction);
     }
 
 private:
     bool hasError() const
     {
-        return !m_errorMessage.isNull();
+        return m_parseStatus != URLFilterParser::Ok;
     }
 
-    void fail(const String& errorMessage)
+    void fail(URLFilterParser::ParseStatus reason)
     {
         if (hasError())
             return;
@@ -626,7 +662,7 @@ private:
         if (m_newPrefixSubtreeRoot)
             m_newPrefixSubtreeRoot->nextPattern.remove(m_newPrefixStaringPoint);
 
-        m_errorMessage = errorMessage;
+        m_parseStatus = reason;
     }
 
     void sinkFloatingTermIfNecessary()
@@ -634,10 +670,8 @@ private:
         if (!m_floatingTerm.isValid())
             return;
 
-        ASSERT(m_lastPrefixTreeEntry);
-
         if (m_hasProcessedEndOfLineAssertion) {
-            fail(ASCIILiteral("The end of line assertion must be the last term in an expression."));
+            fail(URLFilterParser::MisplacedEndOfLine);
             m_floatingTerm = Term();
             return;
         }
@@ -651,30 +685,8 @@ private:
             return;
         }
 
-        auto nextEntry = m_lastPrefixTreeEntry->nextPattern.find(m_floatingTerm);
-        if (nextEntry != m_lastPrefixTreeEntry->nextPattern.end()) {
-            m_lastPrefixTreeEntry = nextEntry->value.get();
-            m_nfa.addRuleId(m_lastPrefixTreeEntry->nfaNode, m_patternId);
-        } else {
-            std::unique_ptr<PrefixTreeEntry> nextPrefixTreeEntry = std::make_unique<PrefixTreeEntry>();
-
-            unsigned newEnd = m_floatingTerm.generateGraph(m_nfa, m_patternId, m_lastPrefixTreeEntry->nfaNode);
-            nextPrefixTreeEntry->nfaNode = newEnd;
-
-            auto addResult = m_lastPrefixTreeEntry->nextPattern.set(m_floatingTerm, WTF::move(nextPrefixTreeEntry));
-            ASSERT(addResult.isNewEntry);
-
-            if (!m_newPrefixSubtreeRoot) {
-                m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
-                m_newPrefixStaringPoint = m_floatingTerm;
-            }
-
-            m_lastPrefixTreeEntry = addResult.iterator->value.get();
-        }
-        m_subtreeEnd = m_lastPrefixTreeEntry->nfaNode;
-
+        m_sunkTerms.append(m_floatingTerm);
         m_floatingTerm = Term();
-        ASSERT(m_lastPrefixTreeEntry);
     }
 
     NFA& m_nfa;
@@ -686,13 +698,14 @@ private:
 
     PrefixTreeEntry* m_lastPrefixTreeEntry;
     Deque<Term> m_openGroups;
+    Vector<Term> m_sunkTerms;
     Term m_floatingTerm;
     bool m_hasProcessedEndOfLineAssertion { false };
 
     PrefixTreeEntry* m_newPrefixSubtreeRoot = nullptr;
     Term m_newPrefixStaringPoint;
 
-    String m_errorMessage;
+    URLFilterParser::ParseStatus m_parseStatus;
 };
 
 URLFilterParser::URLFilterParser(NFA& nfa)
@@ -706,33 +719,74 @@ URLFilterParser::~URLFilterParser()
 {
 }
 
-String URLFilterParser::addPattern(const String& pattern, bool patternIsCaseSensitive, uint64_t patternId)
+URLFilterParser::ParseStatus URLFilterParser::addPattern(const String& pattern, bool patternIsCaseSensitive, uint64_t patternId)
 {
     if (!pattern.containsOnlyASCII())
-        return ASCIILiteral("URLFilterParser only supports ASCII patterns.");
+        return NonASCII;
     ASSERT(!pattern.isEmpty());
 
     if (pattern.isEmpty())
-        return ASCIILiteral("Empty pattern.");
+        return EmptyPattern;
 
     unsigned oldSize = m_nfa.graphSize();
 
-    String error;
-
+    ParseStatus status = Ok;
     GraphBuilder graphBuilder(m_nfa, *m_prefixTreeRoot, patternIsCaseSensitive, patternId);
-    error = String(JSC::Yarr::parse(graphBuilder, pattern, 0));
+    String error = String(JSC::Yarr::parse(graphBuilder, pattern, 0));
     if (error.isNull())
         graphBuilder.finalize();
+    else
+        status = YarrError;
+    
+    if (status == Ok)
+        status = graphBuilder.parseStatus();
 
-    if (error.isNull())
-        error = graphBuilder.errorMessage();
-
-    if (!error.isNull())
+    if (status != Ok)
         m_nfa.restoreToGraphSize(oldSize);
 
-    return error;
+    return status;
 }
 
+String URLFilterParser::statusString(ParseStatus status)
+{
+    switch (status) {
+    case Ok:
+        return "Ok";
+    case MatchesEverything:
+        return "Matches everything.";
+    case UnclosedGroups:
+        return "The expression has unclosed groups.";
+    case CannotMatchAnything:
+        return "The pattern cannot match anything.";
+    case NonASCII:
+        return "Only ASCII characters are supported in pattern.";
+    case UnsupportedCharacterClass:
+        return "Character class is not supported.";
+    case MisplacedQuantifier:
+        return "Quantifier without corresponding term to quantify.";
+    case BackReference:
+        return "Patterns cannot contain backreferences.";
+    case MisplacedStartOfLine:
+        return "Start of line assertion can only appear as the first term in a filter.";
+    case WordBoundary:
+        return "Word boundaries assertions are not supported yet.";
+    case AtomCharacter:
+        return "Builtins character class atoms are not supported yet.";
+    case Group:
+        return "Groups are not supported yet.";
+    case Disjunction:
+        return "Disjunctions are not supported yet.";
+    case MisplacedEndOfLine:
+        return "The end of line assertion must be the last term in an expression.";
+    case EmptyPattern:
+        return "Empty pattern.";
+    case YarrError:
+        return "Internal error in YARR.";
+    case InvalidQuantifier:
+        return "Arbitrary atom repetitions are not supported.";
+    }
+}
+    
 } // namespace ContentExtensions
 } // namespace WebCore
 
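diff --git a/Source/WebCore/contentextensions/URLFilterParser.h b/Source/WebCore/contentextensions/URLFilterParser.h
index e7518ef..81501ea 100644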
@@ -39,11 +39,31 @@ class NFA;
 
 struct PrefixTreeEntry;
 
-class URLFilterParser {
+class WEBCORE_EXPORT URLFilterParser {
 public:
+    enum ParseStatus {
+        Ok,
+        MatchesEverything,
+        UnclosedGroups,
+        CannotMatchAnything,
+        NonASCII,
+        UnsupportedCharacterClass,
+        MisplacedQuantifier,
+        BackReference,
+        MisplacedStartOfLine,
+        WordBoundary,
+        AtomCharacter,
+        Group,
+        Disjunction,
+        MisplacedEndOfLine,
+        EmptyPattern,
+        YarrError,
+        InvalidQuantifier,
+    };
+    static String statusString(ParseStatus);
     explicit URLFilterParser(NFA&);
     ~URLFilterParser();
-    String addPattern(const String& pattern, bool patternIsCaseSensitive, uint64_t patternId);
+    ParseStatus addPattern(const String& pattern, bool patternIsCaseSensitive, uint64_t patternId);
 
 private:
     NFA& m_nfa;
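
The switch from error strings to a status enum lets a caller branch on the outcome without comparing strings, and build the message only when it is logged. The following is a reduced sketch of that pattern, not the patch's code: `SketchStatus` holds only three of the statuses, and `isAccepted` is a hypothetical helper. The trailing return after the switch is an addition in this sketch so that every path returns a value.

```cpp
#include <cassert>
#include <string>

// Reduced, hypothetical version of the ParseStatus enum.
enum class SketchStatus { Ok, MatchesEverything, EmptyPattern };

// Maps a status to a human-readable message, for logging only.
const char* statusString(SketchStatus status)
{
    switch (status) {
    case SketchStatus::Ok:
        return "Ok";
    case SketchStatus::MatchesEverything:
        return "Matches everything.";
    case SketchStatus::EmptyPattern:
        return "Empty pattern.";
    }
    return "Unknown status.";
}

// A universal pattern is not an error for the compiler; it is handled separately.
bool isAccepted(SketchStatus status)
{
    return status == SketchStatus::Ok || status == SketchStatus::MatchesEverything;
}
```

A caller can then test `isAccepted(status)` first and call `statusString(status)` only on the failure path.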
diff --git a/Tools/ChangeLog b/Tools/ChangeLog
index 8842a5f..2fc9646 100644
@@ -1,3 +1,15 @@
+2015-03-18  Alex Christensen  <achristensen@webkit.org>
+
+        [ContentExtensions] Prepare for compiling stylesheets of selectors to be used on every page.
+        https://bugs.webkit.org/show_bug.cgi?id=142799
+
+        Reviewed by Brady Eidson.
+
+        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+        (TestWebKitAPI::testPattern):
+        (TestWebKitAPI::TEST_F):
+        Start testing regex failures.
+
 2015-03-18  Dhi Aurrahman  <diorahman@rockybars.com>
 
         Fix StringView typos after r181525 and r181558
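diff --git a/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp b/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp
index d5559ea..6c9e7fa 100644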
 #include <JavaScriptCore/InitializeThreading.h>
 #include <WebCore/ContentExtensionCompiler.h>
 #include <WebCore/ContentExtensionsBackend.h>
+#include <WebCore/NFA.h>
 #include <WebCore/ResourceLoadInfo.h>
 #include <WebCore/URL.h>
+#include <WebCore/URLFilterParser.h>
 #include <wtf/MainThread.h>
 #include <wtf/RunLoop.h>
+#include <wtf/text/CString.h>
 
 namespace WebCore {
 namespace ContentExtensions {
@@ -335,4 +338,19 @@ TEST_F(ContentExtensionTest, ResourceType)
     testRequest(backend, mainDocumentRequest("http://block_only_images.org", ResourceType::Document), { });
 }
 
+static void testPatternStatus(const char* pattern, ContentExtensions::URLFilterParser::ParseStatus status)
+{
+    ContentExtensions::NFA nfa;
+    ContentExtensions::URLFilterParser parser(nfa);
+    EXPECT_EQ(status, parser.addPattern(ASCIILiteral(pattern), false, 0));
+}
+    
+TEST_F(ContentExtensionTest, ParsingFailures)
+{
+    testPatternStatus("a*b?.*.?[a-z]?[a-z]*", ContentExtensions::URLFilterParser::ParseStatus::MatchesEverything);
+    testPatternStatus("a*b?.*.?[a-z]?[a-z]+", ContentExtensions::URLFilterParser::ParseStatus::Ok);
+    testPatternStatus("a*b?.*.?[a-z]?[a-z]", ContentExtensions::URLFilterParser::ParseStatus::Ok);
+    // FIXME: Add regexes that cause each parse status.
+}
+
 } // namespace TestWebKitAPI