[Content extensions] Combine suffixes when generating NFAs
https://bugs.webkit.org/show_bug.cgi?id=146961
Patch by Benjamin Poulain <bpoulain@apple.com> on 2015-07-16
Reviewed by Alex Christensen.
Source/WebCore:
In this patch, I add a mechanism very similar to the prefix tree
but for the suffix (called a reverse suffix tree here).
The idea is here is to reuse the existing NFA nodes when generating
a chain of suffix Term that were already generated previously.
When generating a disjunction ending with the same suffix, we now
have the same trailing NFA nodes for both sides of the disjunction.
Mixing the prefix and suffix generation can be tricky, we do not want
transitions from a pattern to creep into the suffix of an other.
To avoid any conflict, the rules here are very simple:
-Only use the reverse suffix tree for terms without actions
up to a leaf term with actions.
This rule ensure that no action will accidentally make its way
to an other rule by resuing a vertex of the reverse suffix tree.
-Only use the reverse suffix tree for chains of terms in which
each term only has zero or one following term.
With this condition, when taking any vertex of the reverse suffix
tree, there is only one edge that move out of that vertex when reading
from left to right.
For any vertex, there is only one possible string generated
left-to-right, a single suffix.
This is overly restrictive but it is fast, easier to verify, and it works
well in practice.
For all the more complicated cases, we can count on the Minimizer to
find a better solution.
With all the simple suffixes merged, our NFAs are smaller, which
let us combine more patterns.
The DFAs are also smaller and faster to produce since their size
is relative to the NFA sizes.
Overall, I get the following gains:
-Chris's test case:
compile time -40%.
bytecode size -14%.
-Armand's test case:
compile time -53%.
bytecode size -13%.
* WebCore.xcodeproj/project.pbxproj:
* contentextensions/CombinedURLFilters.cpp:
(WebCore::ContentExtensions::ActiveSubtree::ActiveSubtree):
(WebCore::ContentExtensions::generateInfixUnsuitableForReverseSuffixTree):
(WebCore::ContentExtensions::generateSuffixWithReverseSuffixTree):
(WebCore::ContentExtensions::clearReverseSuffixTree):
(WebCore::ContentExtensions::generateNFAForSubtree):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::debugPrintDot):
Forgot to close a tag, dot was not happy.
* contentextensions/HashableActionList.h: Added.
(WebCore::ContentExtensions::HashableActionList::HashableActionList):
(WebCore::ContentExtensions::HashableActionList::isEmptyValue):
(WebCore::ContentExtensions::HashableActionList::isDeletedValue):
(WebCore::ContentExtensions::HashableActionList::operator==):
(WebCore::ContentExtensions::HashableActionList::operator!=):
(WebCore::ContentExtensions::HashableActionListHash::hash):
(WebCore::ContentExtensions::HashableActionListHash::equal):
We need a way to group reverse suffix tree by their terminal actions.
This new hash structure lets us find unique vertex for a list of actions
in any order.
* contentextensions/ImmutableNFANodeBuilder.h:
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::isValid):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::nodeId):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::addTransition):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::addEpsilonTransition):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::ImmutableNFANodeBuilder): Deleted.
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::~ImmutableNFANodeBuilder): Deleted.
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::operator=): Deleted.
* contentextensions/Term.h:
(WebCore::ContentExtensions::Term::generateGraph):
(WebCore::ContentExtensions::Term::generateSubgraphForAtom):
Node building changes a bit.
Previously, it was assumed nodes are always built from left to right.
Getting the node on the right was done by providing the left node and the term
doing the transition.
Now we have both left to right and right to left generation.
The right-to-left has a specific property: no edge can be added after
it's initial term (rule 2 of our reverse suffix tree). This simplifies
things a bit since we can finalize all the nodes in the suffix tree.
All we need is to keep their ID to be able to link new nodes
to the reverse suffix tree.
Source/WTF:
* wtf/Vector.h:
(WTF::minCapacity>::Vector):
(WTF::=):
Copying a vector with a different inline capacity was broken due to
the addition of MinimumCapacity.
This feature was needed by this patch so I fixed WTF.
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
(TestWebKitAPI::compareContents):
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186910
268f45cc-cd09-0410-ab3c-
d52691b4dbfc