Add basic support for character sets to the URL Filter parser
authorbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 5 Mar 2015 23:47:21 +0000 (23:47 +0000)
committerbenjamin@webkit.org <benjamin@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 5 Mar 2015 23:47:21 +0000 (23:47 +0000)
https://bugs.webkit.org/show_bug.cgi?id=142257

Patch by Benjamin Poulain <bpoulain@apple.com> on 2015-03-05
Reviewed by Alex Christensen.

Source/WebCore:

This patch is a first step toward making the URL filter parser a bit
more powerful: it adds support for character set atom.

I did not attempt to integrate that into the prefix tree in this patch,
instead, the GraphBuilder gets a two modes of generating the NFA:
PrefixTree and DirectGeneration.

As long as we only see trivial atoms, we use the PrefixTree generation
to minimize the number of nodes we need. As soon as we start a character
class, we switch to DirectGeneration and we generate the NFA from the input
without merging with previously seen patterns.

To differentiate between Trivial atoms and CharacterSet, we also gain
an AtomType state.

The character set themself are very simple: each character is represented by
a bit in a 16bytes bit vector.

* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::quantifyTrivialAtom):
(WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
(WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
(WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
(WebCore::ContentExtensions::GraphBuilder::atomBackReference):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBegin):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassEnd):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
(WebCore::ContentExtensions::GraphBuilder::sinkAtom):
(WebCore::ContentExtensions::GraphBuilder::generateTransition):
(WebCore::ContentExtensions::GraphBuilder::sinkTrivialAtom):
(WebCore::ContentExtensions::GraphBuilder::sinkCharacterSet):
(WebCore::ContentExtensions::GraphBuilder::sinkPendingAtomIfNecessary):

LayoutTests:

* http/tests/usercontentfilter/character-set-basic-support-expected.txt: Added.
* http/tests/usercontentfilter/character-set-basic-support.html: Added.
* http/tests/usercontentfilter/character-set-basic-support.html.json: Added.
* http/tests/usercontentfilter/resources/url-blocking-test.js: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@181107 268f45cc-cd09-0410-ab3c-d52691b4dbfc

LayoutTests/ChangeLog
LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt [new file with mode: 0644]
LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html [new file with mode: 0644]
LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json [new file with mode: 0644]
LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js [new file with mode: 0644]
Source/WebCore/ChangeLog
Source/WebCore/contentextensions/URLFilterParser.cpp

index f2bc3e7..8042932 100644 (file)
@@ -1,3 +1,15 @@
+2015-03-05  Benjamin Poulain  <bpoulain@apple.com>
+
+        Add basic support for character sets to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142257
+
+        Reviewed by Alex Christensen.
+
+        * http/tests/usercontentfilter/character-set-basic-support-expected.txt: Added.
+        * http/tests/usercontentfilter/character-set-basic-support.html: Added.
+        * http/tests/usercontentfilter/character-set-basic-support.html.json: Added.
+        * http/tests/usercontentfilter/resources/url-blocking-test.js: Added.
+
 2015-03-05  Chris Dumez  <cdumez@apple.com>
 
         Regression(r173761): ASSERTION FAILED: !is8Bit() in StringImpl::characters16()
diff --git a/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt b/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt
new file mode 100644 (file)
index 0000000..3c7fa93
--- /dev/null
@@ -0,0 +1,23 @@
+Test basic cases of filter using a character set (e.g. [a-z]).
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+URL: http://127.0.0.1:8000/resources/square100.png is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?casesEnsitive is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?casesenSitive is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?BaSiC-TeSt-cAsE is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?any-scheme-matcher is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?casesensitive is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?caseSeNsitive is blocked.
+PASS isBlocked is true
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html b/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html
new file mode 100644 (file)
index 0000000..fe96552
--- /dev/null
@@ -0,0 +1,23 @@
+<script src="/resources/js-test-pre.js"></script>
+<script src="resources/url-blocking-test.js"></script>
+<script>
+window.jsTestIsAsync = true;
+
+description("Test basic cases of filter using a character set (e.g. [a-z]).");
+
+validTestSet = [
+    "http://127.0.0.1:8000/resources/square100.png",
+    "http://127.0.0.1:8000/resources/square100.png?casesEnsitive",
+    "http://127.0.0.1:8000/resources/square100.png?casesenSitive",
+];
+
+blockedTestSet = [
+    "http://127.0.0.1:8000/resources/square100.png?BaSiC-TeSt-cAsE",
+    "http://127.0.0.1:8000/resources/square100.png?any-scheme-matcher",
+    "http://127.0.0.1:8000/resources/square100.png?casesensitive",
+    "http://127.0.0.1:8000/resources/square100.png?caseSeNsitive",
+];
+
+runBlockingTest(validTestSet, blockedTestSet);
+</script>
+<script src="/resources/js-test-post.js"></script>
\ No newline at end of file
diff --git a/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json b/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json
new file mode 100644 (file)
index 0000000..ecf8b3f
--- /dev/null
@@ -0,0 +1,27 @@
+[
+    {
+        "action": {
+            "type": "block"
+        },
+        "trigger": {
+            "url-filter": ".*\\?[b][a][s][i][c]-test-case"
+        }
+    },
+    {
+        "action": {
+            "type": "block"
+        },
+        "trigger": {
+            "url-filter": "[a-zA-Z][a-zA-Z0-9+-.]*://127\\.0\\.0\\.1:8000/resources/square100\\.png\\?any-scheme-matcher"
+        }
+    },
+    {
+        "action": {
+            "type": "block"
+        },
+        "trigger": {
+            "url-filter": ".*case[Ss][e][nN][s]itive",
+            "url-filter-is-case-sensitive": true
+        }
+    }
+]
diff --git a/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js b/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js
new file mode 100644 (file)
index 0000000..36e1483
--- /dev/null
@@ -0,0 +1,62 @@
+function runBlockingTest(validTestSet, blockedTestSet) {
+    function getUrlIterator(validTestSet, blockedTestSet) {
+        var validTestSetIndex = 0;
+        var blockedTestSetIndex = 0;
+        return function() {
+            if (validTestSetIndex < validTestSet.length)
+                return { url:validTestSet[validTestSetIndex++], expectBlock:false};
+            if (blockedTestSetIndex < blockedTestSet.length)
+                return { url:blockedTestSet[blockedTestSetIndex++], expectBlock:true};
+            return;
+        }
+    }
+
+    function tryLoadingURL(testCase, testFunction) {
+        var request = new XMLHttpRequest;
+        request.open("GET", testCase.url, true);
+        request.testCase = testCase;
+        request.timeout = 50;
+
+        var timeoutId = setTimeout( function() { testFunction("timeout", request); }, 50);
+
+        request.addEventListener("readystatechange", function() { testFunction("readystatechange", request, timeoutId); });
+        request.addEventListener("error", function() { testFunction("error", request, timeoutId); });
+        request.addEventListener("timeout", function() { testFunction("timeout", request, timeoutId); });
+        request.send();
+    }
+
+    function testFunction(eventType, request, timeoutId)
+    {
+        isBlocked = true;
+        if (eventType === "error" || eventType === "timeout")
+            debug("URL: " + request.testCase.url + " is blocked.");
+        else if (eventType == "readystatechange") {
+            if (request.readyState == XMLHttpRequest.HEADERS_RECEIVED) {
+                isBlocked = false;
+                debug("URL: " + request.testCase.url + " is not blocked.");
+            } else
+                return;
+        }
+
+        if (request.testCase.expectBlock)
+            shouldBeTrue("isBlocked");
+        else
+            shouldBeFalse("isBlocked");
+
+        if (timeoutId !== undefined)
+            clearTimeout(timeoutId);
+
+        runNextTest();
+    }
+
+    var urlIterator = getUrlIterator(validTestSet, blockedTestSet);
+    function runNextTest() {
+        nextCase = urlIterator();
+        if (nextCase)
+            tryLoadingURL(nextCase, testFunction);
+        else
+            finishJSTest();
+    }
+
+    runNextTest();
+}
\ No newline at end of file
index 79a13aa..4ee154f 100644 (file)
@@ -1,3 +1,45 @@
+2015-03-05  Benjamin Poulain  <bpoulain@apple.com>
+
+        Add basic support for character sets to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142257
+
+        Reviewed by Alex Christensen.
+
+        This patch is a first step toward making the URL filter parser a bit
+        more powerful: it adds support for character set atom.
+
+        I did not attempt to integrate that into the prefix tree in this patch,
+        instead, the GraphBuilder gets a two modes of generating the NFA:
+        PrefixTree and DirectGeneration.
+
+        As long as we only see trivial atoms, we use the PrefixTree generation
+        to minimize the number of nodes we need. As soon as we start a character
+        class, we switch to DirectGeneration and we generate the NFA from the input
+        without merging with previously seen patterns.
+
+        To differentiate between Trivial atoms and CharacterSet, we also gain
+        an AtomType state.
+
+        The character set themself are very simple: each character is represented by
+        a bit in a 16bytes bit vector.
+
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::quantifyTrivialAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
+        (WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
+        (WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomBackReference):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBegin):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassEnd):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
+        (WebCore::ContentExtensions::GraphBuilder::sinkAtom):
+        (WebCore::ContentExtensions::GraphBuilder::generateTransition):
+        (WebCore::ContentExtensions::GraphBuilder::sinkTrivialAtom):
+        (WebCore::ContentExtensions::GraphBuilder::sinkCharacterSet):
+        (WebCore::ContentExtensions::GraphBuilder::sinkPendingAtomIfNecessary):
+
 2015-03-05  Roger Fong  <roger_fong@apple.com>
 
         Minor touchups to inline media control icons.
index 8e9e682..ec967f5 100644 (file)
@@ -30,6 +30,7 @@
 
 #include "NFA.h"
 #include <JavaScriptCore/YarrParser.h>
+#include <wtf/BitVector.h>
 
 namespace WebCore {
 
@@ -50,30 +51,47 @@ static TrivialAtom trivialAtomFromASCIICharacter(char character, bool caseSensit
     return static_cast<uint16_t>(toASCIILower(character)) | caseInsensitiveMask;
 }
 
-enum class TrivialAtomQuantifier : uint16_t {
+enum class AtomQuantifier : uint16_t {
+    One = 0,
     ZeroOrOne = 0x1000,
-    ZeroToMany = 0x2000,
-    OneToMany = 0x4000
+    ZeroOrMore = 0x2000,
+    OneOrMore = 0x4000
 };
 
-static void quantifyTrivialAtom(TrivialAtom& trivialAtom, TrivialAtomQuantifier quantifier)
+static void quantifyTrivialAtom(TrivialAtom& trivialAtom, AtomQuantifier quantifier)
 {
     ASSERT(trivialAtom & (hasNonCharacterMask | characterMask));
     ASSERT(!(trivialAtom & 0xf000));
     trivialAtom |= static_cast<uint16_t>(quantifier);
 }
 
+static AtomQuantifier trivialAtomQuantifier(TrivialAtom trivialAtom)
+{
+    switch (trivialAtom & 0xf000) {
+    case static_cast<unsigned>(AtomQuantifier::One):
+        return AtomQuantifier::One;
+    case static_cast<unsigned>(AtomQuantifier::ZeroOrOne):
+        return AtomQuantifier::ZeroOrOne;
+    case static_cast<unsigned>(AtomQuantifier::ZeroOrMore):
+        return AtomQuantifier::ZeroOrMore;
+    case static_cast<unsigned>(AtomQuantifier::OneOrMore):
+        return AtomQuantifier::OneOrMore;
+    }
+    ASSERT_NOT_REACHED();
+    return AtomQuantifier::One;
+}
+
 static TrivialAtom trivialAtomForNewlineClassIDBuiltin()
 {
     return hasNonCharacterMask | newlineClassIDBuiltinMask;
 }
 
 class GraphBuilder {
-private:
     struct BoundedSubGraph {
         unsigned start;
         unsigned end;
     };
+
 public:
     GraphBuilder(NFA& nfa, PrefixTreeEntry& prefixTreeRoot, bool patternIsCaseSensitive, uint64_t patternId)
         : m_nfa(nfa)
@@ -104,21 +122,21 @@ public:
 
     void atomPatternCharacter(UChar character)
     {
+        if (hasError())
+            return;
+
         if (!isASCII(character)) {
             fail(ASCIILiteral("Only ASCII characters are supported in pattern."));
             return;
         }
 
-        if (hasError())
-            return;
-
         sinkPendingAtomIfNecessary();
+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+        ASSERT(!m_pendingTrivialAtom);
 
         char asciiChararacter = static_cast<char>(character);
-        m_hasValidAtom = true;
-
-        ASSERT(m_lastPrefixTreeEntry);
         m_pendingTrivialAtom = trivialAtomFromASCIICharacter(asciiChararacter, m_patternIsCaseSensitive);
+        m_floatingAtomType = FloatingAtomType::Trivial;
     }
 
     void atomBuiltInCharacterClass(JSC::Yarr::BuiltInCharacterClassID builtInCharacterClassID, bool inverted)
@@ -127,11 +145,12 @@ public:
             return;
 
         sinkPendingAtomIfNecessary();
+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+        ASSERT(!m_pendingTrivialAtom);
 
         if (builtInCharacterClassID == JSC::Yarr::NewlineClassID && inverted) {
-            m_hasValidAtom = true;
-            ASSERT(m_lastPrefixTreeEntry);
             m_pendingTrivialAtom = trivialAtomForNewlineClassIDBuiltin();
+            m_floatingAtomType = FloatingAtomType::Trivial;
         } else
             fail(ASCIILiteral("Character class is not supported."));
     }
@@ -141,32 +160,39 @@ public:
         if (hasError())
             return;
 
-        ASSERT(m_hasValidAtom);
-        if (!m_hasValidAtom) {
+        switch (m_floatingAtomType) {
+        case FloatingAtomType::Invalid:
             fail(ASCIILiteral("Quantifier without corresponding atom to quantify."));
-            return;
+            break;
+
+        case FloatingAtomType::Trivial:
+            if (!minimum && maximum == 1)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::ZeroOrOne);
+            else if (!minimum && maximum == JSC::Yarr::quantifyInfinite)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::ZeroOrMore);
+            else if (minimum == 1 && maximum == JSC::Yarr::quantifyInfinite)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::OneOrMore);
+            else
+                fail(ASCIILiteral("Arbitrary atom repetitions are not supported."));
+            break;
+        case FloatingAtomType::CharacterSet: {
+            ASSERT(m_characterSetQuantifier == AtomQuantifier::One);
+            if (!minimum && maximum == 1)
+                m_characterSetQuantifier = AtomQuantifier::ZeroOrOne;
+            else if (!minimum && maximum == JSC::Yarr::quantifyInfinite)
+                m_characterSetQuantifier = AtomQuantifier::ZeroOrMore;
+            else if (minimum == 1 && maximum == JSC::Yarr::quantifyInfinite)
+                m_characterSetQuantifier = AtomQuantifier::OneOrMore;
+            else
+                fail(ASCIILiteral("Arbitrary character set repetitions are not supported."));
+            break;
+        }
         }
-
-        ASSERT(m_lastPrefixTreeEntry);
-        if (!minimum && maximum == 1)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::ZeroOrOne);
-        else if (!minimum && maximum == JSC::Yarr::quantifyInfinite)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::ZeroToMany);
-        else if (minimum == 1 && maximum == JSC::Yarr::quantifyInfinite)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::OneToMany);
-        else
-            fail(ASCIILiteral("Arbitrary atom repetitions are not supported."));
     }
 
-    NO_RETURN_DUE_TO_ASSERT void atomBackReference(unsigned)
+    void atomBackReference(unsigned)
     {
         fail(ASCIILiteral("Patterns cannot contain backreferences."));
-        ASSERT_NOT_REACHED();
-    }
-
-    void atomCharacterClassAtom(UChar)
-    {
-        fail(ASCIILiteral("Character class atoms are not supported yet."));
     }
 
     void assertionBOL()
@@ -184,24 +210,60 @@ public:
         fail(ASCIILiteral("Word boundaries assertions are not supported yet."));
     }
 
-    void atomCharacterClassBegin(bool = false)
+    void atomCharacterClassBegin(bool inverted = false)
     {
-        fail(ASCIILiteral("Character class atoms are not supported yet."));
+        if (hasError())
+            return;
+
+        sinkPendingAtomIfNecessary();
+
+        ASSERT_WITH_MESSAGE(!m_pendingCharacterSet.bitCount(), "We should not have nested character classes.");
+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+
+        m_buildMode = BuildMode::DirectGeneration;
+        m_lastPrefixTreeEntry = nullptr;
+
+        m_isInvertedCharacterSet = inverted;
+        m_floatingAtomType = FloatingAtomType::CharacterSet;
     }
 
-    void atomCharacterClassRange(UChar, UChar)
+    void atomCharacterClassAtom(UChar character)
     {
-        fail(ASCIILiteral("Character class ranges are not supported yet."));
+        if (hasError())
+            return;
+
+        ASSERT(m_floatingAtomType == FloatingAtomType::CharacterSet);
+
+        if (!isASCII(character)) {
+            fail(ASCIILiteral("Non ASCII Character in a character set."));
+            return;
+        }
+        m_pendingCharacterSet.set(character);
     }
 
-    void atomCharacterClassBuiltIn(JSC::Yarr::BuiltInCharacterClassID, bool)
+    void atomCharacterClassRange(UChar a, UChar b)
     {
-        fail(ASCIILiteral("Buildins character class atoms are not supported yet."));
+        if (hasError())
+            return;
+
+        ASSERT(m_floatingAtomType == FloatingAtomType::CharacterSet);
+
+        if (!a || !b || !isASCII(a) || !isASCII(b)) {
+            fail(ASCIILiteral("Non ASCII Character in a character range of a character set."));
+            return;
+        }
+        for (unsigned i = a; i <= b; ++i)
+            m_pendingCharacterSet.set(i);
     }
 
     void atomCharacterClassEnd()
     {
-        fail(ASCIILiteral("Character class are not supported yet."));
+        // Nothing to do here. The character set atom may have a quantifier, we sink the atom lazily.
+    }
+
+    void atomCharacterClassBuiltIn(JSC::Yarr::BuiltInCharacterClassID, bool)
+    {
+        fail(ASCIILiteral("Builtins character class atoms are not supported yet."));
     }
 
     void atomParenthesesSubpatternBegin(bool = true)
@@ -241,38 +303,28 @@ private:
         m_errorMessage = errorMessage;
     }
 
-    void generateTransition(TrivialAtom trivialAtom, unsigned source, unsigned target)
+    BoundedSubGraph sinkAtom(std::function<void(unsigned, unsigned)> transitionFunction, AtomQuantifier quantifier, unsigned start)
     {
-        if (trivialAtom & hasNonCharacterMask) {
-            ASSERT(trivialAtom & newlineClassIDBuiltinMask);
-            m_nfa.addTransitionsOnAnyCharacter(source, target);
-        } else {
-            if (trivialAtom & caseInsensitiveMask) {
-                char character = static_cast<char>(trivialAtom & characterMask);
-                m_nfa.addTransition(source, target, character);
-                m_nfa.addTransition(source, target, toASCIIUpper(character));
-            } else
-                m_nfa.addTransition(source, target, static_cast<char>(trivialAtom & characterMask));
+        switch (quantifier) {
+        case AtomQuantifier::One: {
+            unsigned newEnd = m_nfa.createNode();
+            m_nfa.addRuleId(newEnd, m_patternId);
+            transitionFunction(start, newEnd);
+            return { start, newEnd };
         }
-    }
-
-    BoundedSubGraph sinkTrivialAtom(TrivialAtom trivialAtom, unsigned start)
-    {
-        if (trivialAtom & static_cast<uint16_t>(TrivialAtomQuantifier::ZeroOrOne)) {
+        case AtomQuantifier::ZeroOrOne: {
             unsigned newEnd = m_nfa.createNode();
             m_nfa.addRuleId(newEnd, m_patternId);
-            generateTransition(trivialAtom, start, newEnd);
-            m_nfa.addEpsilonTransition(start, newEnd);
+            transitionFunction(start, newEnd);
             return { start, newEnd };
         }
-
-        if (trivialAtom & static_cast<uint16_t>(TrivialAtomQuantifier::ZeroToMany)) {
+        case AtomQuantifier::ZeroOrMore: {
             unsigned repeatStart = m_nfa.createNode();
             m_nfa.addRuleId(repeatStart, m_patternId);
             unsigned repeatEnd = m_nfa.createNode();
             m_nfa.addRuleId(repeatEnd, m_patternId);
 
-            generateTransition(trivialAtom, repeatStart, repeatEnd);
+            transitionFunction(repeatStart, repeatEnd);
             m_nfa.addEpsilonTransition(repeatEnd, repeatStart);
 
             m_nfa.addEpsilonTransition(start, repeatStart);
@@ -283,14 +335,13 @@ private:
             m_nfa.addEpsilonTransition(start, kleenEnd);
             return { start, kleenEnd };
         }
-
-        if (trivialAtom & static_cast<uint16_t>(TrivialAtomQuantifier::OneToMany)) {
+        case AtomQuantifier::OneOrMore: {
             unsigned repeatStart = m_nfa.createNode();
             m_nfa.addRuleId(repeatStart, m_patternId);
             unsigned repeatEnd = m_nfa.createNode();
             m_nfa.addRuleId(repeatEnd, m_patternId);
 
-            generateTransition(trivialAtom, repeatStart, repeatEnd);
+            transitionFunction(repeatStart, repeatEnd);
             m_nfa.addEpsilonTransition(repeatEnd, repeatStart);
 
             m_nfa.addEpsilonTransition(start, repeatStart);
@@ -300,45 +351,133 @@ private:
             m_nfa.addEpsilonTransition(repeatEnd, afterRepeat);
             return { start, afterRepeat };
         }
+        }
+    }
+
+    void generateTransition(TrivialAtom trivialAtom, unsigned source, unsigned target)
+    {
+        if (trivialAtom & hasNonCharacterMask) {
+            ASSERT(trivialAtom & newlineClassIDBuiltinMask);
+            m_nfa.addTransitionsOnAnyCharacter(source, target);
+        } else {
+            if (trivialAtom & caseInsensitiveMask) {
+                char character = static_cast<char>(trivialAtom & characterMask);
+                m_nfa.addTransition(source, target, character);
+                m_nfa.addTransition(source, target, toASCIIUpper(character));
+            } else
+                m_nfa.addTransition(source, target, static_cast<char>(trivialAtom & characterMask));
+        }
+    }
 
-        unsigned newEnd = m_nfa.createNode();
-        m_nfa.addRuleId(newEnd, m_patternId);
-        generateTransition(trivialAtom, start, newEnd);
-        return { start, newEnd };
+    BoundedSubGraph sinkTrivialAtom(TrivialAtom trivialAtom, unsigned start)
+    {
+        auto transitionFunction = [this, trivialAtom](unsigned source, unsigned target)
+        {
+            generateTransition(trivialAtom, source, target);
+        };
+        return sinkAtom(transitionFunction, trivialAtomQuantifier(trivialAtom), start);
     }
 
-    void sinkPendingAtomIfNecessary()
+    void generateTransition(const BitVector& characterSet, bool isInverted, unsigned source, unsigned target)
+    {
+        ASSERT(characterSet.bitCount());
+        if (!isInverted) {
+            for (const auto& characterIterator : characterSet.setBits()) {
+                char character = static_cast<char>(characterIterator);
+                if (!m_patternIsCaseSensitive && isASCIIAlpha(character)) {
+                    m_nfa.addTransition(source, target, toASCIIUpper(character));
+                    m_nfa.addTransition(source, target, toASCIILower(character));
+                } else
+                    m_nfa.addTransition(source, target, character);
+            }
+        } else {
+            for (unsigned i = 1; i < characterSet.size(); ++i) {
+                if (characterSet.get(i))
+                    continue;
+                char character = static_cast<char>(i);
+
+                if (!m_patternIsCaseSensitive && (characterSet.get(toASCIIUpper(character)) || characterSet.get(toASCIILower(character))))
+                    continue;
+
+                m_nfa.addTransition(source, target, character);
+            }
+        }
+    }
+
+    BoundedSubGraph sinkCharacterSet(const BitVector& characterSet, bool isInverted, unsigned start)
     {
-        ASSERT(m_lastPrefixTreeEntry);
+        auto transitionFunction = [this, &characterSet, isInverted](unsigned source, unsigned target)
+        {
+            generateTransition(characterSet, isInverted, source, target);
+        };
+        return sinkAtom(transitionFunction, m_characterSetQuantifier, start);
+    }
 
-        if (!m_hasValidAtom)
+    void sinkPendingAtomIfNecessary()
+    {
+        if (m_floatingAtomType == FloatingAtomType::Invalid)
             return;
 
-        ASSERT(m_pendingTrivialAtom);
+        switch (m_buildMode) {
+        case BuildMode::PrefixTree: {
+            ASSERT(m_lastPrefixTreeEntry);
+            ASSERT_WITH_MESSAGE(m_floatingAtomType == FloatingAtomType::Trivial, "Only trivial atoms are handled with a prefix tree.");
 
-        auto nextEntry = m_lastPrefixTreeEntry->nextPattern.find(m_pendingTrivialAtom);
-        if (nextEntry != m_lastPrefixTreeEntry->nextPattern.end()) {
-            m_lastPrefixTreeEntry = nextEntry->value.get();
-            m_nfa.addRuleId(m_lastPrefixTreeEntry->nfaNode, m_patternId);
-        } else {
-            std::unique_ptr<PrefixTreeEntry> nextPrefixTreeEntry = std::make_unique<PrefixTreeEntry>();
+            auto nextEntry = m_lastPrefixTreeEntry->nextPattern.find(m_pendingTrivialAtom);
+            if (nextEntry != m_lastPrefixTreeEntry->nextPattern.end()) {
+                m_lastPrefixTreeEntry = nextEntry->value.get();
+                m_nfa.addRuleId(m_lastPrefixTreeEntry->nfaNode, m_patternId);
+            } else {
+                std::unique_ptr<PrefixTreeEntry> nextPrefixTreeEntry = std::make_unique<PrefixTreeEntry>();
 
-            BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_lastPrefixTreeEntry->nfaNode);
-            nextPrefixTreeEntry->nfaNode = newSubGraph.end;
+                BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_lastPrefixTreeEntry->nfaNode);
+                nextPrefixTreeEntry->nfaNode = newSubGraph.end;
 
-            auto addResult = m_lastPrefixTreeEntry->nextPattern.set(m_pendingTrivialAtom, WTF::move(nextPrefixTreeEntry));
-            ASSERT(addResult.isNewEntry);
+                auto addResult = m_lastPrefixTreeEntry->nextPattern.set(m_pendingTrivialAtom, WTF::move(nextPrefixTreeEntry));
+                ASSERT(addResult.isNewEntry);
 
-            m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
-            m_newPrefixStaringPoint = m_pendingTrivialAtom;
+                if (!m_newPrefixSubtreeRoot) {
+                    m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
+                    m_newPrefixStaringPoint = m_pendingTrivialAtom;
+                }
 
-            m_lastPrefixTreeEntry = addResult.iterator->value.get();
+                m_lastPrefixTreeEntry = addResult.iterator->value.get();
+            }
+            m_activeGroup.end = m_lastPrefixTreeEntry->nfaNode;
+            ASSERT(m_lastPrefixTreeEntry);
+            break;
+        }
+        case BuildMode::DirectGeneration: {
+            ASSERT(!m_lastPrefixTreeEntry);
+
+            switch (m_floatingAtomType) {
+            case FloatingAtomType::Invalid:
+                ASSERT_NOT_REACHED();
+                break;
+            case FloatingAtomType::Trivial: {
+                BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_activeGroup.end);
+                m_activeGroup.end = newSubGraph.end;
+                break;
+            }
+            case FloatingAtomType::CharacterSet:
+                if (!m_pendingCharacterSet.bitCount()) {
+                    fail(ASCIILiteral("Empty character set."));
+                    return;
+                }
+                BoundedSubGraph newSubGraph = sinkCharacterSet(m_pendingCharacterSet, m_isInvertedCharacterSet, m_activeGroup.end);
+                m_activeGroup.end = newSubGraph.end;
+
+                m_isInvertedCharacterSet = false;
+                m_characterSetQuantifier = AtomQuantifier::One;
+                m_pendingCharacterSet.clearAll();
+                break;
+            }
+            break;
+            }
         }
-        ASSERT(m_lastPrefixTreeEntry);
 
-        m_activeGroup.end = m_lastPrefixTreeEntry->nfaNode;
         m_pendingTrivialAtom = 0;
-        m_hasValidAtom = false;
+        m_floatingAtomType = FloatingAtomType::Invalid;
     }
 
     NFA& m_nfa;
@@ -348,9 +487,24 @@ private:
     BoundedSubGraph m_activeGroup;
 
     PrefixTreeEntry* m_lastPrefixTreeEntry;
-    bool m_hasValidAtom = false;
+    enum class FloatingAtomType {
+        Invalid,
+        Trivial,
+        CharacterSet
+    };
+    FloatingAtomType m_floatingAtomType { FloatingAtomType::Invalid };
     TrivialAtom m_pendingTrivialAtom = 0;
 
+    bool m_isInvertedCharacterSet { false };
+    BitVector m_pendingCharacterSet { 128 };
+    AtomQuantifier m_characterSetQuantifier { AtomQuantifier::One };
+
+    enum class BuildMode {
+        PrefixTree,
+        DirectGeneration
+    };
+    BuildMode m_buildMode { BuildMode::PrefixTree };
+
     PrefixTreeEntry* m_newPrefixSubtreeRoot = nullptr;
     TrivialAtom m_newPrefixStaringPoint = 0;