Make the NFA transitions range-based
authorbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Jun 2015 19:40:12 +0000 (19:40 +0000)
committerbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Jun 2015 19:40:12 +0000 (19:40 +0000)
commitd442d45ed6ac2c1395a7432c321e45ca204697bb
treed70ef24ec4f43450311dc234113e4a0bf2708ab1
parent9c3dcfa4d8fa7a814c5fbbcebea4128b1d6dbe86
Make the NFA transitions range-based
https://bugs.webkit.org/show_bug.cgi?id=146338

Reviewed by Alex Christensen.

Source/WebCore:

Change the NFA to use range based transition for any kind of transition.
The fallback transition is also absorbed as ranges.

Previously, the NFA would only have single transitions and a fallback
transition for all cases not covered by single transitions.

The problem with that design was that character ranges (e.g. [a-z]) were
extended as individual transitions. Something like [^a] would cover
most of the alphabet with transitions.
When converting the NFA to DFA, the time is proportional to the number of states
multiplied by the number of transitions. With many individual transitions,
the run time was an order of magnitude higher than what we want.

This patch changes the NFA to only handle ranges of characters. A single transition
becomes a range with the character as first and last character in the range
('a' becomes 'a' to 'a').
Ranges of characters are handled direclty (e.g. [a-z] becomes a single 'a' to 'z' transition).

In the context of the state machines, ranges have identifies (the target of the transitions).
When two ranges collide, they have to be split such that each part retain its target
except the intersection that gets the union of the targets.

Handling the union of ranges efficiently is critical because we have to do
it for every NFA node in any subset when building the DFA. The helper
class that does that is MutableRange.

The union of ranges is done efficiently because of preconditions on our list of ranges:
-The ranges must be sorted.
-No range in a list can intersect any other range in the same list.

To merge two ranges, we can go over them in order and split them part by part.
It is easy to find what goes where because they are both sorted and we can
compare the characters of each to know how to move forward.
The time to merge 2 range list is proportional to O(n+m) where 'n' and 'm' are
the number of ranges in each list.

Since taking the union of two lists usually create new ranges, we have to allocate
those somewhere efficiently. To do that, MutableRange support an inline capacity,
which is used for the NFAToDFA Convertion.

---

With ranges, the NFA-to-DFA conversion changes very little:
-Each NFA nodes contains a list of ranges and each range has a list of targets.
-The subset construction select any number of NFA nodes corresponding to
 a single deterministic state.
-For the subset, we can use MutableRange to merge the ranges of every
 NFA node in the set. The resulting list has rangeis with targets corresponding
 to the union of all the transitions.
-We go over all the ranges the same way we used to go over the transitions.
 Since the DFA does not support ranges, the ranges are spread as individual
 transitions + fallback transition.

---

With the efficient merging solved, we still need the actual NFA to use ranges
instead of individual transitions.

I could have used MutableRange for that but it is not the most compact
way to represent ranges that do not need merging.

Instead, the NFA uses a custom structure: ImmutableNFA. It is basically
the same thing, but in that one you cannot change a list of range
after creating it.

Consequently, the sorted ranges in ImmutableNFA are also subsequent
in memory, which is really nice to go over them efficiently
when merging ranges in the NFA-to-DFA conversion. :)

When building the NFA, we don't know all the transitions when creating
each node, BUT we know that we have very few node "unfinished" at any
time. Since we build by going depth-first in the prefix-tree, we only
have the max-depth of live nodes in the worst case.

To help building the NFA out of the prefix tree, we have
ImmutableNFANodeBuilder. It keeps all the informations about a NFA node,
but in a non-compact, mutable form. When a ImmutableNFANodeBuilder
is destroyed, it serialize the information into the immutable NFA.

* WebCore.xcodeproj/project.pbxproj:
* contentextensions/CombinedURLFilters.cpp:
(WebCore::ContentExtensions::generateNFAForSubtree):
(WebCore::ContentExtensions::CombinedURLFilters::processNFAs):
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::empty):
* contentextensions/DFA.h:
* contentextensions/ImmutableNFA.h: Added.
(WebCore::ContentExtensions::ImmutableNFA::ConstTargetIterator::operator*):
(WebCore::ContentExtensions::ImmutableNFA::ConstTargetIterator::operator->):
(WebCore::ContentExtensions::ImmutableNFA::ConstTargetIterator::operator==):
(WebCore::ContentExtensions::ImmutableNFA::ConstTargetIterator::operator!=):
(WebCore::ContentExtensions::ImmutableNFA::ConstTargetIterator::operator++):
(WebCore::ContentExtensions::ImmutableNFA::IterableConstTargets::begin):
(WebCore::ContentExtensions::ImmutableNFA::IterableConstTargets::end):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator*):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator->):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator==):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator!=):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::operator++):
(WebCore::ContentExtensions::ImmutableNFA::ConstRangeIterator::data):
(WebCore::ContentExtensions::ImmutableNFA::IterableConstRange::begin):
(WebCore::ContentExtensions::ImmutableNFA::IterableConstRange::end):
(WebCore::ContentExtensions::ImmutableNFA::IterableConstRange::debugPrint):
(WebCore::ContentExtensions::ImmutableNFA::transitionsForNode):
(WebCore::ContentExtensions::ImmutableNFA::root):
(WebCore::ContentExtensions::ImmutableNFA::finalize):
(WebCore::ContentExtensions::ImmutableNFA::memoryUsed):
* contentextensions/ImmutableNFANodeBuilder.h: Added.
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::ImmutableNFANodeBuilder):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::~ImmutableNFANodeBuilder):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::addTransition):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::addEpsilonTransition):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::setActions):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::operator=):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::finalize):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::sinkActions):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::sinkTransitions):
(WebCore::ContentExtensions::ImmutableNFANodeBuilder::sinkEpsilonTransitions):
* contentextensions/MutableRange.h: Added.
(WebCore::ContentExtensions::MutableRange::MutableRange):
(WebCore::ContentExtensions::MutableRange::operator=):
(WebCore::ContentExtensions::MutableRange::size):
* contentextensions/MutableRangeList.h: Added.
(WebCore::ContentExtensions::MutableRangeList::ConstIterator::operator*):
(WebCore::ContentExtensions::MutableRangeList::ConstIterator::operator->):
(WebCore::ContentExtensions::MutableRangeList::ConstIterator::operator==):
(WebCore::ContentExtensions::MutableRangeList::ConstIterator::operator!=):
(WebCore::ContentExtensions::MutableRangeList::ConstIterator::operator++):
(WebCore::ContentExtensions::MutableRangeList::begin):
(WebCore::ContentExtensions::MutableRangeList::end):
(WebCore::ContentExtensions::MutableRangeList::appendRange):
(WebCore::ContentExtensions::MutableRangeList::extend):
(WebCore::ContentExtensions::MutableRangeList::isEmpty):
(WebCore::ContentExtensions::MutableRangeList::clear):
(WebCore::ContentExtensions::MutableRangeList::debugPrint):
(WebCore::ContentExtensions::MutableRangeList::insertBetween):
(WebCore::ContentExtensions::MutableRangeList::initializeFrom):
* contentextensions/NFA.cpp:
(WebCore::ContentExtensions::NFA::debugPrintDot):
(WebCore::ContentExtensions::NFA::NFA): Deleted.
(WebCore::ContentExtensions::NFA::createNode): Deleted.
(WebCore::ContentExtensions::NFA::memoryUsed): Deleted.
(WebCore::ContentExtensions::NFA::addTransition): Deleted.
(WebCore::ContentExtensions::NFA::addEpsilonTransition): Deleted.
(WebCore::ContentExtensions::NFA::addTransitionsOnAnyCharacter): Deleted.
(WebCore::ContentExtensions::NFA::setActions): Deleted.
(WebCore::ContentExtensions::NFA::graphSize): Deleted.
(WebCore::ContentExtensions::NFA::restoreToGraphSize): Deleted.
(WebCore::ContentExtensions::printRange): Deleted.
(WebCore::ContentExtensions::printTransitions): Deleted.
* contentextensions/NFA.h:
(WebCore::ContentExtensions::NFA::root): Deleted.
(WebCore::ContentExtensions::NFA::addRuleId): Deleted.
* contentextensions/NFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::epsilonClosureExcludingSelf):
(WebCore::ContentExtensions::resolveEpsilonClosures):
(WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetSource::NodeIdSetToUniqueNodeIdSetSource):
(WebCore::ContentExtensions::NodeIdSetToUniqueNodeIdSetTranslator::translate):
(WebCore::ContentExtensions::DataConverterWithEpsilonClosure::convert):
(WebCore::ContentExtensions::DataConverterWithEpsilonClosure::extend):
(WebCore::ContentExtensions::createCombinedTransition):
(WebCore::ContentExtensions::canUseFallbackTransition):
(WebCore::ContentExtensions::findBestFallbackTarget):
(WebCore::ContentExtensions::getOrCreateDFANode):
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::populateTransitions): Deleted.
* contentextensions/NFAToDFA.h:
* contentextensions/Term.h:
(WebCore::ContentExtensions::Term::Term):
(WebCore::ContentExtensions::Term::generateGraph):
(WebCore::ContentExtensions::Term::generateSubgraphForAtom):
(WebCore::ContentExtensions::Term::Term::generateGraph): Deleted.

Tools:

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
Test some new interesting cases.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@186079 268f45cc-cd09-0410-ab3c-d52691b4dbfc
18 files changed:
Source/WebCore/ChangeLog
Source/WebCore/WebCore.xcodeproj/project.pbxproj
Source/WebCore/contentextensions/CombinedURLFilters.cpp
Source/WebCore/contentextensions/ContentExtensionCompiler.cpp
Source/WebCore/contentextensions/DFA.cpp
Source/WebCore/contentextensions/DFA.h
Source/WebCore/contentextensions/ImmutableNFA.h [new file with mode: 0644]
Source/WebCore/contentextensions/ImmutableNFANodeBuilder.h [new file with mode: 0644]
Source/WebCore/contentextensions/MutableRange.h [new file with mode: 0644]
Source/WebCore/contentextensions/MutableRangeList.h [new file with mode: 0644]
Source/WebCore/contentextensions/NFA.cpp
Source/WebCore/contentextensions/NFA.h
Source/WebCore/contentextensions/NFANode.h
Source/WebCore/contentextensions/NFAToDFA.cpp
Source/WebCore/contentextensions/NFAToDFA.h
Source/WebCore/contentextensions/Term.h
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp