Handle the transition on any character as a separate type of transition
authorbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 21 Jan 2015 21:56:22 +0000 (21:56 +0000)
committerbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 21 Jan 2015 21:56:22 +0000 (21:56 +0000)
https://bugs.webkit.org/show_bug.cgi?id=140711

Reviewed by Andreas Kling.

Instead of considering the universal transition as 127 transitions, it is now
handled as a separate type of transition.

The goal is to reduce the number of exit edge to consider for each node. Instead
of having 127 for any partition containing one universal transition, we have
as few exit edges as necessary + one universal transition.

In the NFA, the universal transition is stored directly on NFANode in a new
HashSet (transitionsOnAnyCharacter).
The target nodes are made exclusive between "transitionsOnAnyCharacter" and "transitions"
by construction. That is not strictly needed but it simplify debugging at the moment.

When converting a NFA to a DFA, we first find all the node that transition on any character.
Then, when we iterate over "real" transition, we also augment that set with the set on
any character.

When creating the DFA node, we first consider each "real" transition, then we have a single
"fallback" transition for any character that has not been handled yet.

When matching, we first search for any real transition. If there is none but a fallback exists,
we take the fallback.

* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::nextState):
(WebCore::ContentExtensions::printTransitions):
(WebCore::ContentExtensions::DFA::debugPrintDot):
(WebCore::ContentExtensions::printTransition): Deleted.
* contentextensions/DFANode.h:
* contentextensions/NFA.cpp:
(WebCore::ContentExtensions::NFA::addTransition):
(WebCore::ContentExtensions::NFA::addTransitionsOnAnyCharacter):
(WebCore::ContentExtensions::printTransitions):
(WebCore::ContentExtensions::NFA::debugPrintDot):
(WebCore::ContentExtensions::printTransition): Deleted.
* contentextensions/NFA.h:
* contentextensions/NFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::populateTransitions):
(WebCore::ContentExtensions::getOrCreateDFANode):
(WebCore::ContentExtensions::NFAToDFA::convert):
* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::GraphBuilder::generateTransition):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@178857 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Source/WebCore/ChangeLog
Source/WebCore/contentextensions/DFA.cpp
Source/WebCore/contentextensions/DFANode.h
Source/WebCore/contentextensions/NFA.cpp
Source/WebCore/contentextensions/NFA.h
Source/WebCore/contentextensions/NFANode.h
Source/WebCore/contentextensions/NFAToDFA.cpp
Source/WebCore/contentextensions/URLFilterParser.cpp

index fbf373f..189f327 100644 (file)
@@ -1,3 +1,53 @@
+2015-01-21  Benjamin Poulain  <benjamin@webkit.org>
+
+        Handle the transition on any character as a separate type of transition
+        https://bugs.webkit.org/show_bug.cgi?id=140711
+
+        Reviewed by Andreas Kling.
+
+        Instead of considering the universal transition as 127 transitions, it is now
+        handled as a separate type of transition.
+
+        The goal is to reduce the number of exit edge to consider for each node. Instead
+        of having 127 for any partition containing one universal transition, we have
+        as few exit edges as necessary + one universal transition.
+
+        In the NFA, the universal transition is stored directly on NFANode in a new
+        HashSet (transitionsOnAnyCharacter).
+        The target nodes are made exclusive between "transitionsOnAnyCharacter" and "transitions"
+        by construction. That is not strictly needed but it simplify debugging at the moment.
+
+        When converting a NFA to a DFA, we first find all the node that transition on any character.
+        Then, when we iterate over "real" transition, we also augment that set with the set on
+        any character.
+
+        When creating the DFA node, we first consider each "real" transition, then we have a single
+        "fallback" transition for any character that has not been handled yet.
+
+        When matching, we first search for any real transition. If there is none but a fallback exists,
+        we take the fallback.
+
+        * contentextensions/DFA.cpp:
+        (WebCore::ContentExtensions::DFA::nextState):
+        (WebCore::ContentExtensions::printTransitions):
+        (WebCore::ContentExtensions::DFA::debugPrintDot):
+        (WebCore::ContentExtensions::printTransition): Deleted.
+        * contentextensions/DFANode.h:
+        * contentextensions/NFA.cpp:
+        (WebCore::ContentExtensions::NFA::addTransition):
+        (WebCore::ContentExtensions::NFA::addTransitionsOnAnyCharacter):
+        (WebCore::ContentExtensions::printTransitions):
+        (WebCore::ContentExtensions::NFA::debugPrintDot):
+        (WebCore::ContentExtensions::printTransition): Deleted.
+        * contentextensions/NFA.h:
+        * contentextensions/NFANode.h:
+        * contentextensions/NFAToDFA.cpp:
+        (WebCore::ContentExtensions::populateTransitions):
+        (WebCore::ContentExtensions::getOrCreateDFANode):
+        (WebCore::ContentExtensions::NFAToDFA::convert):
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::GraphBuilder::generateTransition):
+
 2015-01-20  Antti Koivisto  <antti@apple.com>
 
         REGRESSION(r178180): Membuster regressed ~4%
index bbd58b0..4f43e44 100644 (file)
@@ -65,12 +65,17 @@ unsigned DFA::nextState(unsigned currentState, char character, bool& ok) const
 
     const DFANode& node = m_nodes[currentState];
     auto nextNode = node.transitions.find(character);
-    if (nextNode == node.transitions.end()) {
-        ok = false;
-        return 0;
+    if (nextNode != node.transitions.end()) {
+        ok = true;
+        return nextNode->value;
     }
-    ok = true;
-    return nextNode->value;
+    if (node.hasFallbackTransition) {
+        ok = true;
+        return node.fallbackTransition;
+    }
+    ok = false;
+    return 0;
+
 }
 
 const Vector<uint64_t>& DFA::actions(unsigned currentState) const
@@ -95,9 +100,12 @@ static void printRange(bool firstRange, char rangeStart, char rangeEnd)
         dataLogF("\\\\%d-\\\\%d", rangeStart, rangeEnd);
 }
 
-static void printTransition(unsigned sourceNode, const HashMap<uint16_t, unsigned>& transitions)
+static void printTransitions(const Vector<DFANode>& graph, unsigned sourceNodeId)
 {
-    if (transitions.isEmpty())
+    const DFANode& sourceNode = graph[sourceNodeId];
+    const HashMap<uint16_t, unsigned>& transitions = sourceNode.transitions;
+
+    if (transitions.isEmpty() && !sourceNode.hasFallbackTransition)
         return;
 
     HashMap<unsigned, Vector<uint16_t>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
@@ -112,7 +120,7 @@ static void printTransition(unsigned sourceNode, const HashMap<uint16_t, unsigne
 
     // Then we go over each one an display the ranges one by one.
     for (const auto& transitionPerTarget : transitionsPerTarget) {
-        dataLogF("        %d -> %d [label=\"", sourceNode, transitionPerTarget.key);
+        dataLogF("        %d -> %d [label=\"", sourceNodeId, transitionPerTarget.key);
 
         Vector<uint16_t> incommingCharacters = transitionPerTarget.value;
         std::sort(incommingCharacters.begin(), incommingCharacters.end());
@@ -134,6 +142,9 @@ static void printTransition(unsigned sourceNode, const HashMap<uint16_t, unsigne
 
         dataLogF("\"];\n");
     }
+
+    if (sourceNode.hasFallbackTransition)
+        dataLogF("        %d -> %d [label=\"[fallback]\"];\n", sourceNodeId, sourceNode.fallbackTransition);
 }
 
 void DFA::debugPrintDot() const
@@ -174,7 +185,7 @@ void DFA::debugPrintDot() const
 
     dataLogF("    {\n");
     for (unsigned i = 0; i < m_nodes.size(); ++i)
-        printTransition(i, m_nodes[i].transitions);
+        printTransitions(m_nodes, i);
 
     dataLogF("    }\n");
     dataLogF("}\n");
index bce23d6..d8e6b51 100644 (file)
@@ -41,6 +41,8 @@ namespace ContentExtensions {
 class DFANode {
 public:
     HashMap<uint16_t, unsigned> transitions;
+    bool hasFallbackTransition = false;
+    unsigned fallbackTransition;
     Vector<uint64_t> actions;
 
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
index 1ab990a..53a10b6 100644 (file)
@@ -53,6 +53,10 @@ void NFA::addTransition(unsigned from, unsigned to, char character)
     ASSERT(to < m_nodes.size());
     ASSERT(character);
 
+    NFANode& fromNode = m_nodes[from];
+    if (fromNode.transitionsOnAnyCharacter.contains(to))
+        return;
+
     auto addResult = m_nodes[from].transitions.add(character, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>());
     addResult.iterator->value.add(to);
 }
@@ -66,6 +70,18 @@ void NFA::addEpsilonTransition(unsigned from, unsigned to)
     addResult.iterator->value.add(to);
 }
 
+void NFA::addTransitionsOnAnyCharacter(unsigned from, unsigned to)
+{
+    ASSERT(from < m_nodes.size());
+    ASSERT(to < m_nodes.size());
+
+    NFANode& fromNode = m_nodes[from];
+    fromNode.transitionsOnAnyCharacter.add(to);
+
+    for (auto transitionSlot : fromNode.transitions)
+        transitionSlot.value.remove(to);
+}
+
 void NFA::setFinal(unsigned node, uint64_t ruleId)
 {
     ASSERT(!m_nodes[node].finalRuleIds.contains(ruleId));
@@ -114,8 +130,11 @@ static void printRange(bool firstRange, uint16_t rangeStart, uint16_t rangeEnd,
     }
 }
 
-static void printTransition(unsigned sourceNode, const HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>>& transitions, uint16_t epsilonTransitionCharacter)
+static void printTransitions(const Vector<NFANode>& graph, unsigned sourceNode, uint16_t epsilonTransitionCharacter)
 {
+    const NFANode& node = graph[sourceNode];
+    const HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>>& transitions = node.transitions;
+
     HashMap<unsigned, HashSet<uint16_t>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
 
     for (const auto& transition : transitions) {
@@ -149,6 +168,9 @@ static void printTransition(unsigned sourceNode, const HashMap<uint16_t, HashSet
 
         dataLogF("\"];\n");
     }
+
+    for (unsigned targetOnAnyCharacter : node.transitionsOnAnyCharacter)
+        dataLogF("        %d -> %d [label=\"[any input]\"];\n", sourceNode, targetOnAnyCharacter);
 }
 
 void NFA::debugPrintDot() const
@@ -193,7 +215,7 @@ void NFA::debugPrintDot() const
 
     dataLogF("    {\n");
     for (unsigned i = 0; i < m_nodes.size(); ++i)
-        printTransition(i, m_nodes[i].transitions, epsilonTransitionCharacter);
+        printTransitions(m_nodes, i, epsilonTransitionCharacter);
     dataLogF("    }\n");
     dataLogF("}\n");
 }
index dda74f8..6d95c20 100644 (file)
@@ -48,6 +48,7 @@ public:
 
     void addTransition(unsigned from, unsigned to, char character);
     void addEpsilonTransition(unsigned from, unsigned to);
+    void addTransitionsOnAnyCharacter(unsigned from, unsigned to);
     void setFinal(unsigned node, uint64_t ruleId);
 
     unsigned graphSize() const;
index 90084bb..75a3549 100644 (file)
@@ -41,6 +41,7 @@ namespace ContentExtensions {
 class NFANode {
 public:
     HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>> transitions;
+    HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsOnAnyCharacter;
 
     Vector<uint64_t> finalRuleIds;
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
index e235151..5d8c50e 100644 (file)
@@ -291,9 +291,10 @@ private:
     NodeIdSet m_targets[128];
 };
 
-static inline void populateTransitions(SetTransitions& setTransitions, const UniqueNodeIdSetImpl& sourceNodeSet, const Vector<NFANode>& graph, const Vector<Vector<unsigned>>& nfaNodeclosures)
+static inline void populateTransitions(SetTransitions& setTransitions, NodeIdSet& setFallbackTransition, const UniqueNodeIdSetImpl& sourceNodeSet, const Vector<NFANode>& graph, const Vector<Vector<unsigned>>& nfaNodeclosures)
 {
     ASSERT(!graph.isEmpty());
+    ASSERT(setFallbackTransition.isEmpty());
 #if !ASSERT_DISABLED
     for (const NodeIdSet& set : setTransitions)
         ASSERT(set.isEmpty());
@@ -302,16 +303,36 @@ static inline void populateTransitions(SetTransitions& setTransitions, const Uni
     const unsigned* buffer = sourceNodeSet.buffer();
     for (unsigned i = 0; i < sourceNodeSet.m_size; ++i) {
         unsigned nodeId = buffer[i];
+        const NFANode& nfaSourceNode = graph[nodeId];
+        if (!nfaSourceNode.transitionsOnAnyCharacter.isEmpty())
+            setFallbackTransition.add(nfaSourceNode.transitionsOnAnyCharacter.begin(), nfaSourceNode.transitionsOnAnyCharacter.end());
+    }
+    for (unsigned targetNodeId : setFallbackTransition)
+        extendSetWithClosure(nfaNodeclosures, targetNodeId, setFallbackTransition);
+
+    for (unsigned i = 0; i < sourceNodeSet.m_size; ++i) {
+        unsigned nodeId = buffer[i];
         for (const auto& transitionSlot : graph[nodeId].transitions) {
             NodeIdSet& targetSet = setTransitions[transitionSlot.key];
             for (unsigned targetNodId : transitionSlot.value) {
                 targetSet.add(targetNodId);
                 extendSetWithClosure(nfaNodeclosures, targetNodId, targetSet);
             }
+            targetSet.add(setFallbackTransition.begin(), setFallbackTransition.end());
         }
     }
 }
 
+static ALWAYS_INLINE unsigned getOrCreateDFANode(const NodeIdSet& nfaNodeSet, const Vector<NFANode>& nfaGraph, Vector<DFANode>& dfaGraph, UniqueNodeIdSetTable& uniqueNodeIdSetTable, Vector<UniqueNodeIdSetImpl*>& unprocessedNodes)
+{
+    NodeIdSetToUniqueNodeIdSetSource nodeIdSetToUniqueNodeIdSetSource(dfaGraph, nfaGraph, nfaNodeSet);
+    auto uniqueNodeIdAddResult = uniqueNodeIdSetTable.add<NodeIdSetToUniqueNodeIdSetTranslator>(nodeIdSetToUniqueNodeIdSetSource);
+    if (uniqueNodeIdAddResult.isNewEntry)
+        unprocessedNodes.append(uniqueNodeIdAddResult.iterator->impl());
+
+    return uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
+}
+
 DFA NFAToDFA::convert(NFA& nfa)
 {
     Vector<NFANode>& nfaGraph = nfa.m_nodes;
@@ -337,7 +358,8 @@ DFA NFAToDFA::convert(NFA& nfa)
         UniqueNodeIdSetImpl* uniqueNodeIdSetImpl = unprocessedNodes.takeLast();
 
         unsigned dfaNodeId = uniqueNodeIdSetImpl->m_dfaNodeId;
-        populateTransitions(transitionsFromClosedSet, *uniqueNodeIdSetImpl, nfaGraph, nfaNodeClosures);
+        NodeIdSet setFallbackTransition;
+        populateTransitions(transitionsFromClosedSet, setFallbackTransition, *uniqueNodeIdSetImpl, nfaGraph, nfaNodeClosures);
 
         // FIXME: there should not be any transition on key 0.
         for (unsigned key = 0; key < transitionsFromClosedSet.size(); ++key) {
@@ -346,18 +368,19 @@ DFA NFAToDFA::convert(NFA& nfa)
             if (targetNodeSet.isEmpty())
                 continue;
 
-            NodeIdSetToUniqueNodeIdSetSource nodeIdSetToUniqueNodeIdSetSource(dfaGraph, nfaGraph, targetNodeSet);
-            auto uniqueNodeIdAddResult = uniqueNodeIdSetTable.add<NodeIdSetToUniqueNodeIdSetTranslator>(nodeIdSetToUniqueNodeIdSetSource);
-
-            unsigned targetNodeId = uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
+            unsigned targetNodeId = getOrCreateDFANode(targetNodeSet, nfaGraph, dfaGraph, uniqueNodeIdSetTable, unprocessedNodes);
             const auto addResult = dfaGraph[dfaNodeId].transitions.add(key, targetNodeId);
             ASSERT_UNUSED(addResult, addResult.isNewEntry);
 
-            if (uniqueNodeIdAddResult.isNewEntry)
-                unprocessedNodes.append(uniqueNodeIdAddResult.iterator->impl());
-
             targetNodeSet.clear();
         }
+        if (!setFallbackTransition.isEmpty()) {
+            unsigned targetNodeId = getOrCreateDFANode(setFallbackTransition, nfaGraph, dfaGraph, uniqueNodeIdSetTable, unprocessedNodes);
+            DFANode& dfaSourceNode = dfaGraph[dfaNodeId];
+            ASSERT(!dfaSourceNode.hasFallbackTransition);
+            dfaSourceNode.hasFallbackTransition = true;
+            dfaSourceNode.fallbackTransition = targetNodeId;
+        }
     } while (!unprocessedNodes.isEmpty());
 
     return DFA(WTF::move(dfaGraph), 0);
index eb2df91..8e9e682 100644 (file)
@@ -245,8 +245,7 @@ private:
     {
         if (trivialAtom & hasNonCharacterMask) {
             ASSERT(trivialAtom & newlineClassIDBuiltinMask);
-            for (unsigned i = 1; i < 128; ++i)
-                m_nfa.addTransition(source, target, i);
+            m_nfa.addTransitionsOnAnyCharacter(source, target);
         } else {
             if (trivialAtom & caseInsensitiveMask) {
                 char character = static_cast<char>(trivialAtom & characterMask);