Modernize and streamline HTMLTokenizer
author darin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 12 Jan 2015 16:22:50 +0000 (16:22 +0000)
committer darin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 12 Jan 2015 16:22:50 +0000 (16:22 +0000)
https://bugs.webkit.org/show_bug.cgi?id=140166

Reviewed by Sam Weinig.

Source/WebCore:

* html/parser/AtomicHTMLToken.h:
(WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
based on fields I removed.

* html/parser/HTMLDocumentParser.cpp:
(WebCore::HTMLDocumentParser::HTMLDocumentParser): Changed to use updateStateFor
to set the initial state when parsing a fragment, since it implements the same
rule that the tokenizerStateForContextElement function did.
(WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
interfaces for HTMLSourceTracker and HTMLTokenizer.
(WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
for non-character tokens, and let them get cleared later for character tokens.
(WebCore::HTMLDocumentParser::insert): Pass references.
(WebCore::HTMLDocumentParser::append): Ditto.
(WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.
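The fragment constructor now picks the tokenizer's initial state from the context element's local name, and only for HTML elements. A minimal sketch of that tag-to-state rule as a standalone function; the enum and function names are hypothetical, and the noembed/noscript conditions mirror the removed tokenizerStateForContextElement shown in the diff rather than quoting updateStateFor itself.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the tokenizer's (now private) State enum.
enum class TokenizerState { Data, RCDATA, RAWTEXT, ScriptData, PLAINTEXT };

// Chooses the initial tokenizer state for fragment parsing from the
// lowercase local name of an HTML context element.
TokenizerState initialStateForContextTag(std::string_view tag, bool scriptingEnabled, bool pluginsEnabled)
{
    if (tag == "title" || tag == "textarea")
        return TokenizerState::RCDATA;
    if (tag == "style" || tag == "xmp" || tag == "iframe" || tag == "noframes"
        || (tag == "noembed" && pluginsEnabled)
        || (tag == "noscript" && scriptingEnabled))
        return TokenizerState::RAWTEXT;
    if (tag == "script")
        return TokenizerState::ScriptData;
    if (tag == "plaintext")
        return TokenizerState::PLAINTEXT;
    return TokenizerState::Data; // the tokenizer's default state
}
```

Because the default is already the data state, a setter like updateStateFor only has to handle the special tags.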

* html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
and removed now-unneeded m_token data members.

* html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
(WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
advanceAndASSERT with just plain advance; there's really no need to assert the
character is the one we just got out of the string.

* html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
its old location since this class has two data members that are OrdinalNumber.

* html/parser/HTMLMetaCharsetParser.cpp:
(WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
initialization, since it's now done by defaults.
(WebCore::extractCharset): Rewrote this to be a non-member function, and to
use a for loop, and to handle quote marks in a simpler way. Also changed it
to return a StringView so we don't have to allocate a new string.
(WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
also take a token argument since it's no longer a data member.
(WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
loop, StringView instead of string, and don't bother naming the local enum.
(WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
way of getting tokens from the tokenizer.
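extractCharset now returns a view into the attribute value instead of allocating a new string. A minimal sketch of the same idea with std::string_view, following the HTML spec's "extracting a character encoding from a meta element" algorithm; this is not WebKit's code, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

namespace {

constexpr std::string_view charsetKeyword = "charset";

bool isHTMLSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// ASCII case-insensitive search for "charset" starting at 'from'.
std::size_t findCharsetKeyword(std::string_view value, std::size_t from)
{
    for (std::size_t i = from; i + charsetKeyword.size() <= value.size(); ++i) {
        std::size_t j = 0;
        while (j < charsetKeyword.size()
            && std::tolower(static_cast<unsigned char>(value[i + j])) == charsetKeyword[j])
            ++j;
        if (j == charsetKeyword.size())
            return i;
    }
    return std::string_view::npos;
}

} // namespace

// Returns a view into 'value' (no allocation), or nullopt if no charset is found.
std::optional<std::string_view> extractCharset(std::string_view value)
{
    std::size_t position = 0;
    while (true) {
        std::size_t found = findCharsetKeyword(value, position);
        if (found == std::string_view::npos)
            return std::nullopt;
        position = found + charsetKeyword.size();
        while (position < value.size() && isHTMLSpace(value[position]))
            ++position;
        if (position < value.size() && value[position] == '=')
            break;
        // "charset" not followed by '=': keep searching after it.
    }
    ++position; // skip '='
    while (position < value.size() && isHTMLSpace(value[position]))
        ++position;
    if (position == value.size())
        return std::nullopt;
    char quote = value[position];
    if (quote == '"' || quote == '\'') {
        std::size_t end = value.find(quote, position + 1);
        if (end == std::string_view::npos)
            return std::nullopt; // unmatched quote
        return value.substr(position + 1, end - position - 1);
    }
    std::size_t end = position;
    while (end < value.size() && !isHTMLSpace(value[end]) && value[end] != ';')
        ++end;
    return value.substr(position, end - position);
}
```

The caller only copies the result if it needs to keep it beyond the lifetime of the attribute value.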

* html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
tightened up the formatting a little. Don't bother allocating the tokenizer
on the heap.

* html/parser/HTMLPreloadScanner.cpp:
(WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
initialization.
(WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
(WebCore::HTMLPreloadScanner::scan): Changed to take a reference.

* html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
and forward declarations. Removed explicit declaration of the destructor,
since the default one works. Removed unused createCheckpoint and rewindTo
functions. Gave initial values for various data members. Marked the device
scale factor const because it's set in the constructor and never changed.
Also removed the unneeded isSafeToSendToAnotherThread.

* html/parser/HTMLResourcePreloader.cpp:
(WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.

* html/parser/HTMLResourcePreloader.h:
(WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
isolatedCopy. Also removed isSafeToSendToAnotherThread.

* html/parser/HTMLSourceTracker.cpp:
(WebCore::HTMLSourceTracker::startToken): Renamed from start. Changed to keep
state in the source tracker itself, not in the token.
(WebCore::HTMLSourceTracker::endToken): Ditto; renamed from end.
(WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
from the source tracker.

* html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
Renamed functions, removed now-unneeded comment.
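The source tracker now keeps the token's start and end positions itself instead of storing them in the token. A simplified sketch of that division of labor over a flat input string; SimpleSourceTracker is hypothetical (the real tracker works on SegmentedString input), and treating attribute offsets as relative to the token start is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical tracker: the token carries no base offset or length;
// the tracker remembers where the current token began and ended.
class SimpleSourceTracker {
public:
    void startToken(std::size_t inputOffset)
    {
        m_tokenStart = inputOffset;
        m_tokenEnd = inputOffset;
    }

    void endToken(std::size_t inputOffset)
    {
        assert(inputOffset >= m_tokenStart);
        m_tokenEnd = inputOffset;
    }

    // Raw text of the most recent token; returns a view, no copy.
    std::string_view source(std::string_view input) const
    {
        return input.substr(m_tokenStart, m_tokenEnd - m_tokenStart);
    }

    // Raw text of a sub-range (for example one attribute) given as
    // offsets relative to the token start.
    std::string_view source(std::string_view input, std::size_t relativeStart, std::size_t relativeEnd) const
    {
        return input.substr(m_tokenStart + relativeStart, relativeEnd - relativeStart);
    }

private:
    std::size_t m_tokenStart { 0 };
    std::size_t m_tokenEnd { 0 };
};
```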

* html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
It only needs to know the start and end of each attribute, not each part of
each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
m_baseOffset and m_length. Added beginAttribute and endAttribute.
(WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
(WebCore::HTMLToken::length): Deleted.
(WebCore::HTMLToken::setBaseOffset): Deleted.
(WebCore::HTMLToken::setEndOffset): Deleted.
(WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
are compiling in assertions.
(WebCore::HTMLToken::beginEndTag): Ditto.
(WebCore::HTMLToken::addNewAttribute): Deleted.
(WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
here and set the start offset.
(WebCore::HTMLToken::beginAttributeName): Deleted.
(WebCore::HTMLToken::endAttributeName): Deleted.
(WebCore::HTMLToken::beginAttributeValue): Deleted.
(WebCore::HTMLToken::endAttributeValue): Deleted.
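With beginAttribute and endAttribute, each attribute records a single start/end range instead of separate name and value ranges. A minimal sketch of that shape; SimpleTagToken and SimpleAttribute are hypothetical simplifications, and only the member function names come from the entries above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One range per attribute: where "name=value" starts and ends.
struct SimpleAttribute {
    std::string name;
    std::string value;
    unsigned startOffset { 0 };
    unsigned endOffset { 0 };
};

class SimpleTagToken {
public:
    void beginAttribute(unsigned offset)
    {
        m_attributes.push_back({});
        m_attributes.back().startOffset = offset;
        m_inAttribute = true;
    }

    void appendToAttributeName(char c)
    {
        assert(m_inAttribute);
        m_attributes.back().name += c;
    }

    void appendToAttributeValue(char c)
    {
        assert(m_inAttribute);
        m_attributes.back().value += c;
    }

    void endAttribute(unsigned offset)
    {
        assert(m_inAttribute);
        m_attributes.back().endOffset = offset;
        m_inAttribute = false;
    }

    const std::vector<SimpleAttribute>& attributes() const { return m_attributes; }

private:
    std::vector<SimpleAttribute> m_attributes;
    bool m_inAttribute { false };
};
```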

* html/parser/HTMLTokenizer.cpp:
(WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
(WebCore::HTMLToken::appendToAttributeName): Updated assertion.
(WebCore::HTMLToken::appendToAttributeValue): Ditto.
(WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
so it's legal to call on lower case letters too.
(WebCore::vectorEqualsString): Changed to take a string literal rather than
a WTF::String.
(WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
(WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
(WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
bufferCharacter for the common case where we know the character is ASCII.
(WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
header since it's only used inside the class.
(WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
it and removed the state argument.
(WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
(WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
(WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
(WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
(WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
the actual token, not just a pointer.
(WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
removed the state argument.
(WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
is now the internal function used by nextToken. Updated its contents to use
simpler macros, changed code to set m_state when returning, rather than
constantly setting it when cycling through states, switched style to use
early return/goto rather than lots of else statements, took out unneeded
braces now that BEGIN/END_STATE handles the braces, collapsed upper and
lower case letter handling in many states, changed lookAhead call sites to
use the new advancePast function instead.
(WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
calling setState.
(WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
(WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
a literal instead of a WTF::String.
(WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
to be a UChar instead of LChar, although all characters will be ASCII.
(WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
type from size_t to unsigned.
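Two of the helpers above rely on simple tricks: ASCII upper- and lowercase letters differ only in bit 0x20, so OR-ing that bit lowercases an uppercase letter and leaves a lowercase one unchanged; and comparing a buffer against a string literal avoids building a temporary string. A sketch using standard types; the signatures are simplified assumptions, not WebKit's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Valid only for ASCII letters; safe to call on lowercase letters too.
inline char convertASCIIAlphaToLower(char character)
{
    assert((character >= 'A' && character <= 'Z') || (character >= 'a' && character <= 'z'));
    return static_cast<char>(character | 0x20);
}

// Compares a character buffer with a string literal. N counts the
// literal's terminating NUL, so the literal has N - 1 characters.
template<std::size_t N>
bool vectorEqualsString(const std::vector<char>& vector, const char (&literal)[N])
{
    if (vector.size() != N - 1)
        return false;
    for (std::size_t i = 0; i < N - 1; ++i) {
        if (vector[i] != literal[i])
            return false;
    }
    return true;
}
```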

* html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
a TokenPtr so code doesn't have to understand special rules about when to
work with an HTMLToken and when to clear it. Made most functions private,
and made the State enum private as well. Replaced the state and setState
functions with more specific functions for the few states we need to deal
with outside the class. Moved function bodies outside the class definition
so it's easier to read the class definition.
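nextToken now returns a TokenPtr instead of filling in a caller-owned HTMLToken. A minimal sketch of such a move-only handle, assuming (as the comment in constructTreeFromHTMLToken suggests) that it clears the tokenizer's token when destroyed and that clearing it early prevents a second clear. Token, TokenPtr, and MiniTokenizer here are simplified stand-ins, not WebKit's declarations.

```cpp
#include <cassert>
#include <utility>

// Stand-in for HTMLToken; type 0 means "uninitialized".
struct Token {
    int type { 0 };
    void clear() { type = 0; }
};

// Move-only handle to the tokenizer's reusable token.
class TokenPtr {
public:
    TokenPtr() = default;
    explicit TokenPtr(Token* token) : m_token(token) { }
    TokenPtr(TokenPtr&& other) : m_token(std::exchange(other.m_token, nullptr)) { }
    TokenPtr(const TokenPtr&) = delete;
    TokenPtr& operator=(const TokenPtr&) = delete;
    ~TokenPtr() { clear(); }

    // Clears the token now and drops the pointer, so the destructor
    // does not clear it a second time.
    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { assert(m_token); return *m_token; }
    Token* operator->() const { assert(m_token); return m_token; }

private:
    Token* m_token { nullptr };
};

// Hypothetical tokenizer fragment: returns an empty TokenPtr when out of input.
class MiniTokenizer {
public:
    TokenPtr nextToken(bool haveInput)
    {
        if (!haveInput)
            return TokenPtr();
        m_token.type = 1;
        return TokenPtr(&m_token);
    }
    const Token& peek() const { return m_token; }

private:
    Token m_token;
};
```

A parser loop can then write `auto token = tokenizer.nextToken(input); if (!token) break;` without any rule about when to clear the token.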

* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
new set state functions instead of setState.
(WebCore::HTMLTreeBuilder::processEndTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.

* html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
and made it take a reference rather than a pointer.

* html/parser/TextDocumentParser.cpp:
(WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
new set state functions instead of setState.

* html/parser/XSSAuditor.cpp:
(WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
(WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
attribute range tracking.
(WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
(WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.

* html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.

* html/track/WebVTTTokenizer.cpp: Removed the local state variable from
WEBVTT_ADVANCE_TO; there is no need for it.
(WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
pointer for the preprocessor.
(WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the local state
variable and the switch statement, replacing them with labels, since we
move between states with goto.

* platform/text/SegmentedString.cpp:
(WebCore::SegmentedString::operator=): Changed the return type to be non-const
to match normal C++ design rules.
(WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
general purpose prepend function. Also fixed assertions to not use the strangely
named "escaped" function, since we are deleting it.
(WebCore::SegmentedString::append): Ditto.
(WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
the function only works for non-newlines.
(WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
(WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
renamed. This function now consumes the characters if they match.

* platform/text/SegmentedString.h: Made the changes mentioned above.
(WebCore::SegmentedString::excludeLineNumbers): Deleted.
(WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
behavior so the characters are consumed.
(WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
(WebCore::SegmentedString::advanceAndASSERT): Deleted.
(WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
(WebCore::SegmentedString::escaped): Deleted.
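advancePast differs from the old lookAhead in that a match consumes the characters. Because input can arrive in pieces, "not enough characters yet" has to be reported separately from a real mismatch. A sketch over a single string_view; InputCursor and the AdvancePastResult values are assumptions for illustration, not SegmentedString's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical three-way result for matching against partial input.
enum class AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };

// Simplified cursor over the input received so far.
class InputCursor {
public:
    explicit InputCursor(std::string_view input) : m_input(input) { }

    // Consumes 'literal' only if it matches in full.
    AdvancePastResult advancePast(std::string_view literal)
    {
        std::string_view rest = m_input.substr(m_position);
        std::size_t available = rest.size() < literal.size() ? rest.size() : literal.size();
        if (rest.substr(0, available) != literal.substr(0, available))
            return AdvancePastResult::DidNotMatch;
        if (available < literal.size())
            return AdvancePastResult::NotEnoughCharacters; // wait for more input
        m_position += literal.size();
        return AdvancePastResult::DidMatch;
    }

    char current() const { return m_position < m_input.size() ? m_input[m_position] : '\0'; }

private:
    std::string_view m_input;
    std::size_t m_position { 0 };
};
```

On NotEnoughCharacters a tokenizer would typically save its state and return, then retry once more input is appended.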

* xml/parser/CharacterReferenceParserInlines.h:
(WebCore::isHexDigit): Deleted.
(WebCore::unconsumeCharacters): Updated for name change.
(WebCore::consumeCharacterReference): Removed unneeded name for local enum,
renamed local variable "cc" to character. Changed code to use helpers like
isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
since we don't really need to assert the character we just extracted.
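Helpers such as toASCIIHexValue make numeric character reference decoding a short loop. A simplified sketch of decoding the digits after "&#"; the helpers isHexDigitChar and hexDigitValue are hypothetical stand-ins for the WTF functions, and the sketch ignores the missing-semicolon handling and the spec's replacement table for code points such as 0x80.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

bool isHexDigitChar(char c)
{
    return (c >= '0' && c <= '9') || ((c | 0x20) >= 'a' && (c | 0x20) <= 'f');
}

// Caller must check isHexDigitChar first.
int hexDigitValue(char c)
{
    return c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;
}

// Decodes "x41;" or "65;" (the text after "&#"). Returns nullopt when
// there are no digits; maps invalid code points to U+FFFD.
std::optional<std::uint32_t> decodeNumericReference(std::string_view body)
{
    bool isHex = !body.empty() && (body[0] == 'x' || body[0] == 'X');
    std::size_t i = isHex ? 1 : 0;
    std::uint32_t result = 0;
    std::size_t digits = 0;
    for (; i < body.size(); ++i, ++digits) {
        char character = body[i];
        if (isHex && isHexDigitChar(character))
            result = result * 16 + hexDigitValue(character);
        else if (!isHex && character >= '0' && character <= '9')
            result = result * 10 + (character - '0');
        else
            break;
        if (result > 0x10FFFF)
            result = 0x110000; // clamp so long digit runs cannot overflow
    }
    if (!digits)
        return std::nullopt;
    if (!result || result > 0x10FFFF || (result >= 0xD800 && result <= 0xDFFF))
        return 0xFFFD;
    return result;
}
```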

* xml/parser/MarkupTokenizerInlines.h:
(WebCore::isTokenizerWhitespace): Renamed argument to character.
(WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
(WebCore::advanceStringAndASSERT): Deleted.
Changed all the macro implementations so they set m_state only when
returning from the function and just use goto inside the state machine.
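The pattern described here, moving between states with goto and writing m_state only when the function returns for lack of input, can be shown with a toy tokenizer. WordTokenizer is hypothetical and much simpler than the real macros; it only illustrates saving state on exit and resuming from it on the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy tokenizer that splits space-separated words.
class WordTokenizer {
public:
    // Returns true with 'word' filled once a word terminated by a space is
    // complete; returns false when 'input' runs out (state is saved).
    bool nextWord(std::string_view input, std::size_t& position, std::string& word)
    {
        // Resume in whatever state the previous call stopped in.
        switch (m_state) {
        case State::BetweenWords: goto BetweenWords;
        case State::InWord: goto InWord;
        }

    BetweenWords:
        if (position == input.size()) {
            m_state = State::BetweenWords; // save state only on return
            return false;
        }
        if (input[position] == ' ') {
            ++position;
            goto BetweenWords;
        }
        m_buffer.clear();
        goto InWord;

    InWord:
        if (position == input.size()) {
            m_state = State::InWord; // save state only on return
            return false;
        }
        if (input[position] == ' ') {
            ++position;
            word = m_buffer;
            m_state = State::BetweenWords;
            return true;
        }
        m_buffer += input[position++];
        goto InWord;
    }

private:
    enum class State { BetweenWords, InWord };
    State m_state { State::BetweenWords };
    std::string m_buffer;
};
```

Inside the loop no store to m_state happens, which is the saving this change aims for in the much hotter HTML tokenizer.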

Source/WTF:

* wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@178265 268f45cc-cd09-0410-ab3c-d52691b4dbfc

30 files changed:
Source/WTF/ChangeLog
Source/WTF/wtf/Forward.h
Source/WebCore/ChangeLog
Source/WebCore/html/parser/AtomicHTMLToken.h
Source/WebCore/html/parser/HTMLDocumentParser.cpp
Source/WebCore/html/parser/HTMLDocumentParser.h
Source/WebCore/html/parser/HTMLEntityParser.cpp
Source/WebCore/html/parser/HTMLInputStream.h
Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp
Source/WebCore/html/parser/HTMLMetaCharsetParser.h
Source/WebCore/html/parser/HTMLPreloadScanner.cpp
Source/WebCore/html/parser/HTMLPreloadScanner.h
Source/WebCore/html/parser/HTMLResourcePreloader.cpp
Source/WebCore/html/parser/HTMLResourcePreloader.h
Source/WebCore/html/parser/HTMLSourceTracker.cpp
Source/WebCore/html/parser/HTMLSourceTracker.h
Source/WebCore/html/parser/HTMLToken.h
Source/WebCore/html/parser/HTMLTokenizer.cpp
Source/WebCore/html/parser/HTMLTokenizer.h
Source/WebCore/html/parser/HTMLTreeBuilder.cpp
Source/WebCore/html/parser/InputStreamPreprocessor.h
Source/WebCore/html/parser/TextDocumentParser.cpp
Source/WebCore/html/parser/XSSAuditor.cpp
Source/WebCore/html/parser/XSSAuditor.h
Source/WebCore/html/track/WebVTTTokenizer.cpp
Source/WebCore/html/track/WebVTTTokenizer.h
Source/WebCore/platform/text/SegmentedString.cpp
Source/WebCore/platform/text/SegmentedString.h
Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
Source/WebCore/xml/parser/MarkupTokenizerInlines.h

index 989712b..b152e85 100644
@@ -1,3 +1,12 @@
+2015-01-12  Darin Adler  <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.
+
 2015-01-09  Commit Queue  <commit-queue@webkit.org>
 
         Unreviewed, rolling out r178154, r178163, and r178164.
index 49a92c5..6aefa1d 100644
@@ -30,7 +30,6 @@ template<typename T> class LazyNeverDestroyed;
 template<typename T> class NeverDestroyed;
 template<typename T> class OwnPtr;
 template<typename T> class PassOwnPtr;
-template<typename T> class PassRef;
 template<typename T> class PassRefPtr;
 template<typename T> class RefPtr;
 template<typename T> class Ref;
@@ -45,11 +44,13 @@ class CString;
 class Decoder;
 class Encoder;
 class FunctionDispatcher;
+class OrdinalNumber;
 class PrintStream;
 class String;
 class StringBuilder;
 class StringImpl;
 class StringView;
+class TextPosition;
 
 }
 
@@ -63,9 +64,9 @@ using WTF::Function;
 using WTF::FunctionDispatcher;
 using WTF::LazyNeverDestroyed;
 using WTF::NeverDestroyed;
+using WTF::OrdinalNumber;
 using WTF::OwnPtr;
 using WTF::PassOwnPtr;
-using WTF::PassRef;
 using WTF::PassRefPtr;
 using WTF::PrintStream;
 using WTF::Ref;
@@ -75,6 +76,7 @@ using WTF::StringBuffer;
 using WTF::StringBuilder;
 using WTF::StringImpl;
 using WTF::StringView;
+using WTF::TextPosition;
 using WTF::Vector;
 
 #endif // WTF_Forward_h
index 18cae71..a16d20d 100644
@@ -1,3 +1,224 @@
+2015-01-12  Darin Adler  <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * html/parser/AtomicHTMLToken.h:
+        (WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
+        based on fields I removed.
+
+        * html/parser/HTMLDocumentParser.cpp:
+        (WebCore::HTMLDocumentParser::HTMLDocumentParser): Change to use updateStateFor
+        to set the initial state when parsing a fragment, since it implements the same
+        rule that the tokenizerStateForContextElement function did.
+        (WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
+        interfaces for HTMLSourceTracker and HTMLTokenizer.
+        (WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
+        TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
+        for non-character tokens, and let them get cleared later for character tokens.
+        (WebCore::HTMLDocumentParser::insert): Pass references.
+        (WebCore::HTMLDocumentParser::append): Ditto.
+        (WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.
+
+        * html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
+        and removed now-unneeded m_token data members.
+
+        * html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
+        (WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
+        advanceAndASSERT with just plain advance; there's really no need to assert the
+        character is the one we just got out of the string.
+
+        * html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
+        its old location since this class has two data members that are OrdinalNumber.
+
+        * html/parser/HTMLMetaCharsetParser.cpp:
+        (WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
+        initialization, since it's now done by defaults.
+        (WebCore::extractCharset): Rewrote this to be a non-member function, and to
+        use a for loop, and to handle quote marks in a simpler way. Also changed it
+        to return a StringView so we don't have to allocate a new string.
+        (WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
+        also take a token argument since it's no longer a data member.
+        (WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
+        loop, StringView instead of string, and don't bother naming the local enum.
+        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
+        way of getting tokens from the tokenizer.
+
+        * html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
+        tightened up the formatting a little. Don't bother allocating the tokenizer
+        on the heap.
+
+        * html/parser/HTMLPreloadScanner.cpp:
+        (WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
+        initialization.
+        (WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
+        (WebCore::HTMLPreloadScanner::scan): Changed to take a reference.
+
+        * html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
+        and forward declarations. Removed explicit declaration of the destructor,
+        since the default one works. Removed unused createCheckpoint and rewindTo
+        functions. Gave initial values for various data members. Marked the device
+        scale factor const because it's set in the constructor and never changed.
+        Also removed the unneeded isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLResourcePreloader.cpp:
+        (WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/HTMLResourcePreloader.h:
+        (WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
+        isolatedCopy. Also removed isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLSourceTracker.cpp:
+        (WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
+        in the source tracker itself, not the token.
+        (WebCore::HTMLSourceTracker::endToken): Ditto.
+        (WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
+        from the source tracker.
+
+        * html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
+        Renamed functions, removed now-unneeded comment.
+
+        * html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
+        It only needs to know the start and end of each attribute, not each part of
+        each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
+        beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
+        m_baseOffset and m_length. Added beginAttribute and endAttribute.
+        (WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
+        (WebCore::HTMLToken::length): Deleted.
+        (WebCore::HTMLToken::setBaseOffset): Deleted.
+        (WebCore::HTMLToken::setEndOffset): Deleted.
+        (WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
+        are compiling in assertions.
+        (WebCore::HTMLToken::beginEndTag): Ditto.
+        (WebCore::HTMLToken::addNewAttribute): Deleted.
+        (WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
+        here and set the start offset.
+        (WebCore::HTMLToken::beginAttributeName): Deleted.
+        (WebCore::HTMLToken::endAttributeName): Deleted.
+        (WebCore::HTMLToken::beginAttributeValue): Deleted.
+        (WebCore::HTMLToken::endAttributeValue): Deleted.
+
+        * html/parser/HTMLTokenizer.cpp:
+        (WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
+        (WebCore::HTMLToken::appendToAttributeName): Updated assertion.
+        (WebCore::HTMLToken::appendToAttributeValue): Ditto.
+        (WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
+        so it's legal to call on lower case letters too.
+        (WebCore::vectorEqualsString): Changed to take a string literal rather than
+        a WTF::String.
+        (WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
+        (WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
+        (WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
+        bufferCharacter for the common case where we know the character is ASCII.
+        (WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
+        header since it's only used inside the class.
+        (WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
+        it and removed the state argument.
+        (WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
+        (WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
+        (WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
+        (WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
+        (WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
+        the actual token, not just a pointer.
+        (WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
+        removed the state argument.
+        (WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
+        is now the internal function used by nextToken. Updated its contents to use
+        simpler macros, changed code to set m_state when returning, rather than
+        constantly setting it when cycling through states, switched style to use
+        early return/goto rather than lots of else statements, took out unneeded
+        braces now that BEGIN/END_STATE handles the braces, collapsed upper and
+        lower case letter handling in many states, changed lookAhead call sites to
+        use the new advancePast function instead.
+        (WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
+        calling setState.
+        (WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
+        (WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
+        a literal instead of a WTF::String.
+        (WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
+        to be a UChar instead of LChar, although all characters will be ASCII.
+        (WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
+        type from size_t to unsigned.
+
+        * html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
+        a TokenPtr so code doesn't have to understand special rules about when to
+        work with an HTMLToken and when to clear it. Made most functions private,
+        and made the State enum private as well. Replaced the state and setState
+        functions with more specific functions for the few states we need to deal
+        with outside the class. Moved function bodies outside the class definition
+        so it's easier to read the class definition.
+
+        * html/parser/HTMLTreeBuilder.cpp:
+        (WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
+        new set state functions instead of setState.
+        (WebCore::HTMLTreeBuilder::processEndTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.
+
+        * html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
+        and made it take a reference rather than a pointer.
+
+        * html/parser/TextDocumentParser.cpp:
+        (WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
+        new set state functions instead of setState.
+
+        * html/parser/XSSAuditor.cpp:
+        (WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
+        (WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
+        attribute range tracking.
+        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
+        (WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.
+
+        * html/track/WebVTTTokenizer.cpp: Removed the local state variable from
+        WEBVTT_ADVANCE_TO; there is no need for it.
+        (WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
+        pointer for the preprocessor.
+        (WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
+        variable and the switch statement, replacing with labels instead since we
+        go between states with goto.
+
+        * platform/text/SegmentedString.cpp:
+        (WebCore::SegmentedString::operator=): Changed the return type to be non-const
+        to match normal C++ design rules.
+        (WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
+        general purpose prepend function. Also fixed assertions to not use the strangely
+        named "escaped" function, since we are deleting it.
+        (WebCore::SegmentedString::append): Ditto.
+        (WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
+        the function only works for non-newlines.
+        (WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
+        (WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
+        renamed. This function now consumes the characters if they match.
+
+        * platform/text/SegmentedString.h: Made the changes mentioned above.
+        (WebCore::SegmentedString::excludeLineNumbers): Deleted.
+        (WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
+        behavior so the characters are consumed.
+        (WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
+        (WebCore::SegmentedString::advanceAndASSERT): Deleted.
+        (WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
+        (WebCore::SegmentedString::escaped): Deleted.
+
+        * xml/parser/CharacterReferenceParserInlines.h:
+        (WebCore::isHexDigit): Deleted.
+        (WebCore::unconsumeCharacters): Updated for name change.
+        (WebCore::consumeCharacterReference): Removed unneeded name for local enum,
+        renamed local variable "cc" to character. Changed code to use helpers like
+        isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
+        since we don't really need to assert the character we just extracted.
+
+        * xml/parser/MarkupTokenizerInlines.h:
+        (WebCore::isTokenizerWhitespace): Renamed argument to character.
+        (WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
+        (WebCore::advanceStringAndASSERT): Deleted.
+        Changed all the macro implementations so they set m_state only when
+        returning from the function and just use goto inside the state machine.
+
 2015-01-11  Andreas Kling  <akling@apple.com>
 
         Enable Vector bounds checking for ElementDescendantIterator.
index 5e61fba..cc5a215 100644
@@ -191,11 +191,6 @@ inline void AtomicHTMLToken::initializeAttributes(const HTMLToken::AttributeList
         if (attribute.name.isEmpty())
             continue;
 
-        ASSERT(attribute.nameRange.start);
-        ASSERT(attribute.nameRange.end);
-        ASSERT(attribute.valueRange.start);
-        ASSERT(attribute.valueRange.end);
-
         QualifiedName name(nullAtom, AtomicString(attribute.name), nullAtom);
 
         // FIXME: This is N^2 for the number of attributes.
index d9643cd..812c9c5 100644
@@ -39,28 +39,6 @@ namespace WebCore {
 
 using namespace HTMLNames;
 
-// This is a direct transcription of step 4 from:
-// https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
-static HTMLTokenizer::State tokenizerStateForContextElement(Element& contextElement, bool reportErrors, const HTMLParserOptions& options)
-{
-    const QualifiedName& contextTag = contextElement.tagQName();
-
-    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
-        return HTMLTokenizer::RCDATAState;
-    if (contextTag.matches(styleTag)
-        || contextTag.matches(xmpTag)
-        || contextTag.matches(iframeTag)
-        || (contextTag.matches(noembedTag) && options.pluginsEnabled)
-        || (contextTag.matches(noscriptTag) && options.scriptEnabled)
-        || contextTag.matches(noframesTag))
-        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(scriptTag))
-        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(plaintextTag))
-        return HTMLTokenizer::PLAINTEXTState;
-    return HTMLTokenizer::DataState;
-}
-
 HTMLDocumentParser::HTMLDocumentParser(HTMLDocument& document)
     : ScriptableDocumentParser(document)
     , m_options(document)
@@ -85,8 +63,9 @@ inline HTMLDocumentParser::HTMLDocumentParser(DocumentFragment& fragment, Elemen
     , m_treeBuilder(std::make_unique<HTMLTreeBuilder>(*this, fragment, contextElement, parserContentPolicy(), m_options))
     , m_xssAuditorDelegate(fragment.document())
 {
-    bool reportErrors = false; // For now document fragment parsing never reports errors.
-    m_tokenizer.setState(tokenizerStateForContextElement(contextElement, reportErrors, m_options));
+    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+    if (contextElement.isHTMLElement())
+        m_tokenizer.updateStateFor(contextElement.tagQName().localName());
     m_xssAuditor.initForFragment();
 }
 
@@ -279,22 +258,22 @@ void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
 
     while (canTakeNextToken(mode, session) && !session.needsYield) {
         if (!isParsingFragment())
-            m_sourceTracker.start(m_input.current(), &m_tokenizer, m_token);
+            m_sourceTracker.startToken(m_input.current(), m_tokenizer);
 
-        if (!m_tokenizer.nextToken(m_input.current(), m_token))
+        auto token = m_tokenizer.nextToken(m_input.current());
+        if (!token)
             break;
 
         if (!isParsingFragment()) {
-            m_sourceTracker.end(m_input.current(), &m_tokenizer, m_token);
+            m_sourceTracker.endToken(m_input.current(), m_tokenizer);
 
             // We do not XSS filter innerHTML, which means we (intentionally) fail
             // http/tests/security/xssAuditor/dom-write-innerHTML.html
-            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(m_token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
+            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(*token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
                 m_xssAuditorDelegate.didBlockScript(*xssInfo);
         }
 
-        constructTreeFromHTMLToken(m_token);
-        ASSERT(m_token.type() == HTMLToken::Uninitialized);
+        constructTreeFromHTMLToken(token);
     }
 
     // Ensure we haven't been totally deref'ed after pumping. Any caller of this
@@ -308,20 +287,20 @@ void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
         m_parserScheduler->scheduleForResume();
 
     if (isWaitingForScripts()) {
-        ASSERT(m_tokenizer.state() == HTMLTokenizer::DataState);
+        ASSERT(m_tokenizer.isInDataState());
         if (!m_preloadScanner) {
             m_preloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
             m_preloadScanner->appendToEnd(m_input.current());
         }
-        m_preloadScanner->scan(m_preloader.get(), *document());
+        m_preloadScanner->scan(*m_preloader, *document());
     }
 
     InspectorInstrumentation::didWriteHTML(cookie, m_input.current().currentLine().zeroBasedInt());
 }
 
-void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
+void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr& rawToken)
 {
-    AtomicHTMLToken token(rawToken);
+    AtomicHTMLToken token(*rawToken);
 
     // We clear the rawToken in case constructTreeFromAtomicToken
     // synchronously re-enters the parser. We don't clear the token immedately
@@ -333,15 +312,13 @@ void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
     // FIXME: Stop clearing the rawToken once we start running the parser off
     // the main thread or once we stop allowing synchronous JavaScript
     // execution from parseAttribute.
-    if (rawToken.type() != HTMLToken::Character)
+    if (rawToken->type() != HTMLToken::Character) {
+        // Clearing the TokenPtr makes sure we don't clear the HTMLToken a second time
+        // later when the TokenPtr is destroyed.
         rawToken.clear();
+    }
 
     m_treeBuilder->constructTree(token);
-
-    if (rawToken.type() != HTMLToken::Uninitialized) {
-        ASSERT(rawToken.type() == HTMLToken::Character);
-        rawToken.clear();
-    }
 }
 
 bool HTMLDocumentParser::hasInsertionPoint()
@@ -373,7 +350,7 @@ void HTMLDocumentParser::insert(const SegmentedString& source)
         if (!m_insertionPreloadScanner)
             m_insertionPreloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
         m_insertionPreloadScanner->appendToEnd(source);
-        m_insertionPreloadScanner->scan(m_preloader.get(), *document());
+        m_insertionPreloadScanner->scan(*m_preloader, *document());
     }
 
     endIfDelayed();
@@ -398,7 +375,7 @@ void HTMLDocumentParser::append(PassRefPtr<StringImpl> inputSource)
         } else {
             m_preloadScanner->appendToEnd(source);
             if (isWaitingForScripts())
-                m_preloadScanner->scan(m_preloader.get(), *document());
+                m_preloadScanner->scan(*m_preloader, *document());
         }
     }
 
@@ -533,7 +510,7 @@ void HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan()
 {
     ASSERT(m_preloadScanner);
     m_preloadScanner->appendToEnd(m_input.current());
-    m_preloadScanner->scan(m_preloader.get(), *document());
+    m_preloadScanner->scan(*m_preloader, *document());
 }
 
 void HTMLDocumentParser::notifyFinished(CachedResource* cachedResource)
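`pumpTokenizer` now receives each token as an `HTMLTokenizer::TokenPtr` instead of filling a member `HTMLToken`. The declaration of `TokenPtr` is not part of this excerpt. The sketch below is therefore only a guess at the pattern the call sites imply:

* A move-only handle clears the tokenizer's token when it is destroyed.
* A `clear()` method does the same thing early, which is what `constructTreeFromHTMLToken` relies on for non-character tokens.

All names here (`Token`, `TokenPtr`, `Tokenizer`) are simplified, hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal token: a type plus accumulated data.
struct Token {
    enum Type { Uninitialized, StartTag, Character, EndOfFile };
    Type type { Uninitialized };
    std::string data;
    void clear() { type = Uninitialized; data.clear(); }
};

// Move-only handle that clears the referenced token when it goes away,
// so callers no longer have to reset the tokenizer's token by hand.
class TokenPtr {
public:
    TokenPtr() = default;
    explicit TokenPtr(Token* token) : m_token(token) { }
    TokenPtr(TokenPtr&& other) noexcept : m_token(std::exchange(other.m_token, nullptr)) { }
    TokenPtr& operator=(TokenPtr&& other) noexcept
    {
        if (this != &other) {
            clear();
            m_token = std::exchange(other.m_token, nullptr);
        }
        return *this;
    }
    TokenPtr(const TokenPtr&) = delete;
    TokenPtr& operator=(const TokenPtr&) = delete;
    ~TokenPtr() { clear(); }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { return *m_token; }
    Token* operator->() const { return m_token; }

    // Clears the token now and drops the pointer, so the destructor
    // does not clear it a second time.
    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

private:
    Token* m_token { nullptr };
};

// Toy tokenizer that emits each input character as a Character token.
class Tokenizer {
public:
    explicit Tokenizer(std::string input) : m_input(std::move(input)) { }

    TokenPtr nextToken()
    {
        assert(m_token.type == Token::Uninitialized); // previous token was released
        if (m_position >= m_input.size())
            return TokenPtr();
        m_token.type = Token::Character;
        m_token.data = std::string(1, m_input[m_position++]);
        return TokenPtr(&m_token);
    }

    const Token& currentToken() const { return m_token; }

private:
    std::string m_input;
    std::size_t m_position { 0 };
    Token m_token;
};
```

In a `while (auto token = tokenizer.nextToken())` loop, each handle is destroyed before the next call. That is why the old `m_token.clear()` calls at the bottom of the scanner loops could be removed.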
index 44631fd..fe0435a 100644
@@ -103,7 +103,7 @@ private:
     bool canTakeNextToken(SynchronousMode, PumpSession&);
     void pumpTokenizer(SynchronousMode);
     void pumpTokenizerIfPossible(SynchronousMode);
-    void constructTreeFromHTMLToken(HTMLToken&);
+    void constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr&);
 
     void runScriptsForPausedTreeBuilder();
     void resumeParsingAfterScriptExecution();
@@ -121,7 +121,6 @@ private:
     HTMLParserOptions m_options;
     HTMLInputStream m_input;
 
-    HTMLToken m_token;
     HTMLTokenizer m_tokenizer;
     std::unique_ptr<HTMLScriptRunner> m_scriptRunner;
     std::unique_ptr<HTMLTreeBuilder> m_treeBuilder;
index a016050..dfdfd6c 100644
@@ -60,9 +60,9 @@ public:
         return windowsLatin1ExtensionArray[value - 0x80];
     }
 
-    inline static bool acceptMalformed() { return true; }
+    static bool acceptMalformed() { return true; }
 
-    inline static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
+    static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
     {
         StringBuilder consumedCharacters;
         HTMLEntitySearch entitySearch;
@@ -72,7 +72,7 @@ public:
             if (!entitySearch.isEntityPrefix())
                 break;
             consumedCharacters.append(cc);
-            source.advanceAndASSERT(cc);
+            source.advance();
         }
         notEnoughCharacters = source.isEmpty();
         if (notEnoughCharacters) {
@@ -97,7 +97,7 @@ public:
                 cc = source.currentChar();
                 ASSERT_UNUSED(reference, cc == *reference++);
                 consumedCharacters.append(cc);
-                source.advanceAndASSERT(cc);
+                source.advance();
                 ASSERT(!source.isEmpty());
             }
             cc = source.currentChar();
index a7b86b3..e738f5f 100644
@@ -28,6 +28,7 @@
 
 #include "InputStreamPreprocessor.h"
 #include "SegmentedString.h"
+#include <wtf/text/TextPosition.h>
 
 namespace WebCore {
 
index 11e14a4..752dbee 100644
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Google Inc. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 
 #include "HTMLNames.h"
 #include "HTMLParserIdioms.h"
-#include "HTMLTokenizer.h"
-#include "TextCodec.h"
 #include "TextEncodingRegistry.h"
 
-using namespace WTF;
-
 namespace WebCore {
 
 using namespace HTMLNames;
 
 HTMLMetaCharsetParser::HTMLMetaCharsetParser()
-    : m_tokenizer(std::make_unique<HTMLTokenizer>(HTMLParserOptions()))
-    , m_assumedCodec(newTextCodec(Latin1Encoding()))
-    , m_inHeadSection(true)
-    , m_doneChecking(false)
-{
-}
-
-HTMLMetaCharsetParser::~HTMLMetaCharsetParser()
+    : m_codec(newTextCodec(Latin1Encoding()))
 {
 }
 
-static const char charsetString[] = "charset";
-static const size_t charsetLength = sizeof("charset") - 1;
-
-String HTMLMetaCharsetParser::extractCharset(const String& value)
+static StringView extractCharset(const String& value)
 {
-    size_t pos = 0;
     unsigned length = value.length();
-
-    while (pos < length) {
-        pos = value.find(charsetString, pos, false);
+    for (size_t pos = 0; pos < length; ) {
+        pos = value.find("charset", pos, false);
         if (pos == notFound)
             break;
 
+        static const size_t charsetLength = sizeof("charset") - 1;
         pos += charsetLength;
 
         // Skip whitespace.
@@ -77,12 +63,10 @@ String HTMLMetaCharsetParser::extractCharset(const String& value)
         while (pos < length && value[pos] <= ' ')
             ++pos;
 
-        char quoteMark = 0;
-        if (pos < length && (value[pos] == '"' || value[pos] == '\'')) {
-            quoteMark = static_cast<char>(value[pos++]);
-            ASSERT(!(quoteMark & 0x80));
-        }
-            
+        UChar quoteMark = 0;
+        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
+            quoteMark = value[pos++];
+
         if (pos == length)
             break;
 
@@ -93,19 +77,17 @@ String HTMLMetaCharsetParser::extractCharset(const String& value)
         if (quoteMark && (end == length))
             break; // Close quote not found.
 
-        return value.substring(pos, end - pos);
+        return StringView(value).substring(pos, end - pos);
     }
-
-    return "";
+    return StringView();
 }
 
-bool HTMLMetaCharsetParser::processMeta()
+bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
 {
-    const HTMLToken::AttributeList& tokenAttributes = m_token.attributes();
     AttributeList attributes;
-    for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); iter != tokenAttributes.end(); ++iter) {
-        String attributeName = StringImpl::create8BitIfPossible(iter->name);
-        String attributeValue = StringImpl::create8BitIfPossible(iter->value);
+    for (auto& attribute : token.attributes()) {
+        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
+        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
         attributes.append(std::make_pair(attributeName, attributeValue));
     }
 
@@ -116,12 +98,12 @@ bool HTMLMetaCharsetParser::processMeta()
 TextEncoding HTMLMetaCharsetParser::encodingFromMetaAttributes(const AttributeList& attributes)
 {
     bool gotPragma = false;
-    Mode mode = None;
-    String charset;
+    enum { None, Charset, Pragma } mode = None;
+    StringView charset;
 
-    for (AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) {
-        const AtomicString& attributeName = iter->first;
-        const String& attributeValue = iter->second;
+    for (auto& attribute : attributes) {
+        const String& attributeName = attribute.first;
+        const String& attributeValue = attribute.second;
 
         if (attributeName == http_equivAttr) {
             if (equalIgnoringCase(attributeValue, "content-type"))
@@ -139,13 +121,11 @@ TextEncoding HTMLMetaCharsetParser::encodingFromMetaAttributes(const AttributeLi
     }
 
     if (mode == Charset || (mode == Pragma && gotPragma))
-        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset));
+        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset.toStringWithoutCopying()));
 
     return TextEncoding();
 }
 
-static const int bytesToCheckUnconditionally = 1024; // That many input bytes will be checked for meta charset even if <head> section is over.
-
 bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
 {
     if (m_doneChecking)
@@ -156,30 +136,32 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
     // We still don't have an encoding, and are in the head.
     // The following tags are allowed in <head>:
     // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
-
+    //
     // We stop scanning when a tag that is not permitted in <head>
     // is seen, rather when </head> is seen, because that more closely
     // matches behavior in other browsers; more details in
     // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
-
+    //
     // Additionally, we ignore things that looks like tags in <title>, <script>
     // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
     // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
     // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
-
+    //
     // Since many sites have charset declarations after <body> or other tags
     // that are disallowed in <head>, we don't bail out until we've checked at
     // least bytesToCheckUnconditionally bytes of input.
 
-    m_input.append(SegmentedString(m_assumedCodec->decode(data, length)));
+    static const int bytesToCheckUnconditionally = 1024;
+
+    m_input.append(SegmentedString(m_codec->decode(data, length)));
 
-    while (m_tokenizer->nextToken(m_input, m_token)) {
-        bool end = m_token.type() == HTMLToken::EndTag;
-        if (end || m_token.type() == HTMLToken::StartTag) {
-            AtomicString tagName(m_token.name());
-            if (!end) {
-                m_tokenizer->updateStateFor(tagName);
-                if (tagName == metaTag && processMeta()) {
+    while (auto token = m_tokenizer.nextToken(m_input)) {
+        bool isEnd = token->type() == HTMLToken::EndTag;
+        if (isEnd || token->type() == HTMLToken::StartTag) {
+            AtomicString tagName(token->name());
+            if (!isEnd) {
+                m_tokenizer.updateStateFor(tagName);
+                if (tagName == metaTag && processMeta(*token)) {
                     m_doneChecking = true;
                     return true;
                 }
@@ -189,7 +171,8 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
                 && tagName != styleTag && tagName != linkTag
                 && tagName != metaTag && tagName != objectTag
                 && tagName != titleTag && tagName != baseTag
-                && (end || tagName != htmlTag) && (end || tagName != headTag)) {
+                && (isEnd || tagName != htmlTag)
+                && (isEnd || tagName != headTag)) {
                 m_inHeadSection = false;
             }
         }
@@ -198,8 +181,6 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
             m_doneChecking = true;
             return true;
         }
-
-        m_token.clear();
     }
 
     return false;
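`extractCharset` scans a `content` attribute value such as `text/html; charset=utf-8` for the encoding name. Some of its lines fall between the hunks shown above. The standalone version below, written over `std::string`, fills those gaps with two assumptions:

* An `=` must follow `charset`.
* An unquoted value ends at whitespace, `;`, or a quote character.

It returns an empty string when nothing usable is found, much as the new code returns an empty `StringView`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive search for an ASCII needle, starting at `from`.
static std::size_t findIgnoringASCIICase(const std::string& haystack, const std::string& needle, std::size_t from)
{
    for (std::size_t i = from; i + needle.size() <= haystack.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size()
            && std::tolower(static_cast<unsigned char>(haystack[i + j])) == std::tolower(static_cast<unsigned char>(needle[j])))
            ++j;
        if (j == needle.size())
            return i;
    }
    return std::string::npos;
}

// Sketch of charset extraction from a <meta content="..."> value.
std::string extractCharset(const std::string& value)
{
    const std::string charsetString = "charset";
    const std::size_t length = value.size();
    for (std::size_t pos = 0; pos < length; ) {
        pos = findIgnoringASCIICase(value, charsetString, pos);
        if (pos == std::string::npos)
            break;
        pos += charsetString.size();

        // Skip whitespace (any character <= ' ', as in the diff).
        while (pos < length && static_cast<unsigned char>(value[pos]) <= ' ')
            ++pos;
        // Assumption: "charset" must be followed by '='; otherwise keep searching.
        if (pos < length && value[pos] != '=')
            continue;
        ++pos;
        while (pos < length && static_cast<unsigned char>(value[pos]) <= ' ')
            ++pos;

        char quoteMark = 0;
        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
            quoteMark = value[pos++];
        if (pos >= length)
            break;

        std::size_t end = pos;
        while (end < length) {
            char c = value[end];
            bool stop = quoteMark ? c == quoteMark
                : (static_cast<unsigned char>(c) <= ' ' || c == '"' || c == '\'' || c == ';');
            if (stop)
                break;
            ++end;
        }
        if (quoteMark && end == length)
            break; // Close quote not found.
        return value.substr(pos, end - pos);
    }
    return std::string();
}
```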
index de028bb..2c6d7c5 100644
 #ifndef HTMLMetaCharsetParser_h
 #define HTMLMetaCharsetParser_h
 
-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
 #include "TextEncoding.h"
-#include <wtf/Noncopyable.h>
 
 namespace WebCore {
 
-class HTMLTokenizer;
 class TextCodec;
 
 class HTMLMetaCharsetParser {
     WTF_MAKE_NONCOPYABLE(HTMLMetaCharsetParser); WTF_MAKE_FAST_ALLOCATED;
 public:
     HTMLMetaCharsetParser();
-    ~HTMLMetaCharsetParser();
 
     // Returns true if done checking, regardless whether an encoding is found.
     bool checkForMetaCharset(const char*, size_t);
 
     const TextEncoding& encoding() { return m_encoding; }
 
-    typedef Vector<std::pair<String, String>> AttributeList;
     // The returned encoding might not be valid.
-    static TextEncoding encodingFromMetaAttributes(const AttributeList&
-);
+    typedef Vector<std::pair<String, String>> AttributeList;
+    static TextEncoding encodingFromMetaAttributes(const AttributeList&);
 
 private:
-    bool processMeta();
-    static String extractCharset(const String&);
+    bool processMeta(HTMLToken&);
 
-    enum Mode {
-        None,
-        Charset,
-        Pragma,
-    };
-
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
-    std::unique_ptr<TextCodec> m_assumedCodec;
+    HTMLTokenizer m_tokenizer;
+    const std::unique_ptr<TextCodec> m_codec;
     SegmentedString m_input;
-    HTMLToken m_token;
-    bool m_inHeadSection;
-
-    bool m_doneChecking;
+    bool m_inHeadSection { true };
+    bool m_doneChecking { false };
     TextEncoding m_encoding;
 };
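With the `Mode` enum moved into `encodingFromMetaAttributes` as a local, the header no longer exposes it. From the lines visible in the .cpp hunks, the function applies two rules:

* An explicit `charset` attribute yields `Charset` mode.
* A charset pulled from `content` only counts (`Pragma` mode) if `http-equiv="content-type"` was also present.

The branch that assigns `mode` lies partly outside the hunks. The sketch below therefore assumes the first charset-bearing attribute wins and that attribute names arrive lowercased. It returns the raw name instead of a `TextEncoding`, skips whitespace stripping, and takes the extraction step as a hypothetical callback so it stays self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using AttributeList = std::vector<std::pair<std::string, std::string>>;

static bool equalIgnoringASCIICase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Returns the charset name a <meta> element declares, or "" if none.
// `extractFromContent` stands in for extractCharset (hypothetical parameter).
std::string charsetFromMetaAttributes(const AttributeList& attributes,
    const std::function<std::string(const std::string&)>& extractFromContent)
{
    bool gotPragma = false;
    enum { None, Charset, Pragma } mode = None;
    std::string charset;

    for (auto& attribute : attributes) {
        const std::string& name = attribute.first; // assumed already lowercased
        const std::string& value = attribute.second;
        if (name == "http-equiv") {
            if (equalIgnoringASCIICase(value, "content-type"))
                gotPragma = true;
        } else if (charset.empty()) {
            if (name == "charset") {
                charset = value;
                mode = Charset;
            } else if (name == "content") {
                charset = extractFromContent(value);
                if (!charset.empty())
                    mode = Pragma;
            }
        }
    }

    if (mode == Charset || (mode == Pragma && gotPragma))
        return charset;
    return std::string();
}
```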
 
index 149e40c..47031b1 100644
@@ -242,40 +242,8 @@ private:
 
 TokenPreloadScanner::TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor)
     : m_documentURL(documentURL)
-    , m_inStyle(false)
     , m_deviceScaleFactor(deviceScaleFactor)
-#if ENABLE(TEMPLATE_ELEMENT)
-    , m_templateCount(0)
-#endif
-{
-}
-
-TokenPreloadScanner::~TokenPreloadScanner()
-{
-}
-
-TokenPreloadScannerCheckpoint TokenPreloadScanner::createCheckpoint()
-{
-    TokenPreloadScannerCheckpoint checkpoint = m_checkpoints.size();
-    m_checkpoints.append(Checkpoint(m_predictedBaseElementURL, m_inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-                                    , m_templateCount
-#endif
-                                    ));
-    return checkpoint;
-}
-
-void TokenPreloadScanner::rewindTo(TokenPreloadScannerCheckpoint checkpointIndex)
 {
-    ASSERT(checkpointIndex < m_checkpoints.size()); // If this ASSERT fires, checkpointIndex is invalid.
-    const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
-    m_predictedBaseElementURL = checkpoint.predictedBaseElementURL;
-    m_inStyle = checkpoint.inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-    m_templateCount = checkpoint.templateCount;
-#endif
-    m_cssScanner.reset();
-    m_checkpoints.clear();
 }
 
 void TokenPreloadScanner::scan(const HTMLToken& token, Vector<std::unique_ptr<PreloadRequest>>& requests, Document& document)
@@ -349,11 +317,7 @@ void TokenPreloadScanner::updatePredictedBaseURL(const HTMLToken& token)
 
 HTMLPreloadScanner::HTMLPreloadScanner(const HTMLParserOptions& options, const URL& documentURL, float deviceScaleFactor)
     : m_scanner(documentURL, deviceScaleFactor)
-    , m_tokenizer(std::make_unique<HTMLTokenizer>(options))
-{
-}
-
-HTMLPreloadScanner::~HTMLPreloadScanner()
+    , m_tokenizer(options)
 {
 }
 
@@ -362,7 +326,7 @@ void HTMLPreloadScanner::appendToEnd(const SegmentedString& source)
     m_source.append(source);
 }
 
-void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& document)
+void HTMLPreloadScanner::scan(HTMLResourcePreloader& preloader, Document& document)
 {
     ASSERT(isMainThread()); // HTMLTokenizer::updateStateFor only works on the main thread.
 
@@ -374,14 +338,13 @@ void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& docume
 
     PreloadRequestStream requests;
 
-    while (m_tokenizer->nextToken(m_source, m_token)) {
-        if (m_token.type() == HTMLToken::StartTag)
-            m_tokenizer->updateStateFor(AtomicString(m_token.name()));
-        m_scanner.scan(m_token, requests, document);
-        m_token.clear();
+    while (auto token = m_tokenizer.nextToken(m_source)) {
+        if (token->type() == HTMLToken::StartTag)
+            m_tokenizer.updateStateFor(AtomicString(token->name()));
+        m_scanner.scan(*token, requests, document);
     }
 
-    preloader->preload(WTF::move(requests));
+    preloader.preload(WTF::move(requests));
 }
 
 }
index 9dd6cfc..1a9c283 100644
 #define HTMLPreloadScanner_h
 
 #include "CSSPreloadScanner.h"
-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
-#include <wtf/Vector.h>
 
 namespace WebCore {
 
-typedef size_t TokenPreloadScannerCheckpoint;
-
-class HTMLParserOptions;
-class HTMLTokenizer;
-class SegmentedString;
-class Frame;
-
 class TokenPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner);
 public:
     explicit TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~TokenPreloadScanner();
 
-    void scan(const HTMLToken&, PreloadRequestStream& requests, Document&);
+    void scan(const HTMLToken&, PreloadRequestStream&, Document&);
 
     void setPredictedBaseElementURL(const URL& url) { m_predictedBaseElementURL = url; }
 
-    // A TokenPreloadScannerCheckpoint is valid until the next call to rewindTo,
-    // at which point all outstanding checkpoints are invalidated.
-    TokenPreloadScannerCheckpoint createCheckpoint();
-    void rewindTo(TokenPreloadScannerCheckpoint);
-
-    bool isSafeToSendToAnotherThread()
-    {
-        return m_documentURL.isSafeToSendToAnotherThread()
-            && m_predictedBaseElementURL.isSafeToSendToAnotherThread();
-    }
-
 private:
     enum class TagId {
         // These tags are scanned by the StartTagScanner.
@@ -85,54 +65,29 @@ private:
 
     void updatePredictedBaseURL(const HTMLToken&);
 
-    struct Checkpoint {
-        Checkpoint(const URL& predictedBaseElementURL, bool inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-            , size_t templateCount
-#endif
-            )
-            : predictedBaseElementURL(predictedBaseElementURL)
-            , inStyle(inStyle)
-#if ENABLE(TEMPLATE_ELEMENT)
-            , templateCount(templateCount)
-#endif
-        {
-        }
-
-        URL predictedBaseElementURL;
-        bool inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-        size_t templateCount;
-#endif
-    };
-
     CSSPreloadScanner m_cssScanner;
     const URL m_documentURL;
-    URL m_predictedBaseElementURL;
-    bool m_inStyle;
-    float m_deviceScaleFactor;
+    const float m_deviceScaleFactor { 1 };
 
+    URL m_predictedBaseElementURL;
+    bool m_inStyle { false };
 #if ENABLE(TEMPLATE_ELEMENT)
-    size_t m_templateCount;
+    unsigned m_templateCount { 0 };
 #endif
-
-    Vector<Checkpoint> m_checkpoints;
 };
 
 class HTMLPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(HTMLPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_FAST_ALLOCATED;
 public:
     HTMLPreloadScanner(const HTMLParserOptions&, const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~HTMLPreloadScanner();
 
     void appendToEnd(const SegmentedString&);
-    void scan(HTMLResourcePreloader*, Document&);
+    void scan(HTMLResourcePreloader&, Document&);
 
 private:
     TokenPreloadScanner m_scanner;
     SegmentedString m_source;
-    HTMLToken m_token;
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
+    HTMLTokenizer m_tokenizer;
 };
 
 }
index 16ea761..3c7b7ad 100644
 
 namespace WebCore {
 
-bool PreloadRequest::isSafeToSendToAnotherThread() const
-{
-    return m_initiator.isSafeToSendToAnotherThread()
-        && m_charset.isSafeToSendToAnotherThread()
-        && m_resourceURL.isSafeToSendToAnotherThread()
-        && m_mediaAttribute.isSafeToSendToAnotherThread()
-        && m_baseURL.isSafeToSendToAnotherThread();
-}
-
 URL PreloadRequest::completeURL(Document& document)
 {
     return document.completeURL(m_resourceURL, m_baseURL.isEmpty() ? document.url() : m_baseURL);
index f93a093..2a8b6c8 100644
@@ -35,16 +35,14 @@ class PreloadRequest {
 public:
     PreloadRequest(const String& initiator, const String& resourceURL, const URL& baseURL, CachedResource::Type resourceType, const String& mediaAttribute)
         : m_initiator(initiator)
-        , m_resourceURL(resourceURL.isolatedCopy())
+        , m_resourceURL(resourceURL)
         , m_baseURL(baseURL.copy())
         , m_resourceType(resourceType)
-        , m_mediaAttribute(mediaAttribute.isolatedCopy())
+        , m_mediaAttribute(mediaAttribute)
         , m_crossOriginModeAllowsCookies(false)
     {
     }
 
-    bool isSafeToSendToAnotherThread() const;
-
     CachedResourceRequest resourceRequest(Document&);
 
     const String& charset() const { return m_charset; }
index f48d87c..0c9a046 100644
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -25,6 +26,7 @@
 
 #include "config.h"
 #include "HTMLSourceTracker.h"
+
 #include "HTMLTokenizer.h"
 #include <wtf/text/StringBuilder.h>
 
@@ -34,36 +36,41 @@ HTMLSourceTracker::HTMLSourceTracker()
 {
 }
 
-void HTMLSourceTracker::start(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
-    if (token.type() == HTMLToken::Uninitialized) {
-        m_previousSource.clear();
-        if (tokenizer->numberOfBufferedCharacters())
-            m_previousSource = tokenizer->bufferedCharacters();
+    if (!m_started) {
+        if (tokenizer.numberOfBufferedCharacters())
+            m_previousSource = tokenizer.bufferedCharacters();
+        else
+            m_previousSource.clear();
+        m_started = true;
     } else
         m_previousSource.append(m_currentSource);
 
     m_currentSource = currentInput;
-    token.setBaseOffset(m_currentSource.numberOfCharactersConsumed() - m_previousSource.length());
+    m_tokenStart = m_currentSource.numberOfCharactersConsumed() - m_previousSource.length();
 }
 
-void HTMLSourceTracker::end(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::endToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
-    m_cachedSourceForToken = String();
+    ASSERT(m_started);
+    m_started = false;
 
-    // FIXME: This work should really be done by the HTMLTokenizer.
-    token.setEndOffset(currentInput.numberOfCharactersConsumed() - tokenizer->numberOfBufferedCharacters());
+    m_tokenEnd = currentInput.numberOfCharactersConsumed() - tokenizer.numberOfBufferedCharacters();
+    m_cachedSourceForToken = String();
 }
 
-String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
+String HTMLSourceTracker::source(const HTMLToken& token)
 {
+    ASSERT(!m_started);
+
     if (token.type() == HTMLToken::EndOfFile)
         return String(); // Hides the null character we use to mark the end of file.
 
     if (!m_cachedSourceForToken.isEmpty())
         return m_cachedSourceForToken;
 
-    unsigned length = token.length();
+    unsigned length = m_tokenEnd - m_tokenStart;
 
     StringBuilder source;
     source.reserveCapacity(length);
@@ -83,4 +90,9 @@ String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
     return m_cachedSourceForToken;
 }
 
+String HTMLSourceTracker::source(const HTMLToken& token, unsigned attributeStart, unsigned attributeEnd)
+{
+    return source(token).substring(attributeStart - m_tokenStart, attributeEnd - attributeStart);
+}
+
 }
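`HTMLSourceTracker` now keeps the token bounds itself instead of storing them on the token:

* `startToken` records `m_tokenStart` as the number of characters consumed minus the length of source carried over from earlier input.
* `endToken` records `m_tokenEnd` as the number of characters consumed minus the characters the tokenizer still buffers.

Attribute `startOffset`/`endOffset` values are now absolute as well, so the new `source(token, attributeStart, attributeEnd)` overload only has to subtract `m_tokenStart`. The sketch below models that arithmetic with a hypothetical `SourceSpan` struct over a plain `std::string`, with no `SegmentedString` involved.

```cpp
#include <cassert>
#include <string>

// Simplified model of the tracker's bookkeeping. All offsets are absolute
// character positions in the input (hypothetical type, not WebKit's).
struct SourceSpan {
    unsigned tokenStart;     // consumed before the token minus carried-over source
    unsigned tokenEnd;       // consumed after the token minus still-buffered characters
    std::string tokenSource; // the token's text: tokenEnd - tokenStart characters

    // Mirrors source(token, attributeStart, attributeEnd): rebase absolute
    // attribute offsets onto the token's own source string.
    std::string attributeSource(unsigned attributeStart, unsigned attributeEnd) const
    {
        assert(tokenStart <= attributeStart && attributeStart <= attributeEnd && attributeEnd <= tokenEnd);
        return tokenSource.substr(attributeStart - tokenStart, attributeEnd - attributeStart);
    }
};

SourceSpan makeSpan(const std::string& input, unsigned consumedBefore, unsigned previousLength,
    unsigned consumedAfter, unsigned buffered)
{
    SourceSpan span;
    span.tokenStart = consumedBefore - previousLength;
    span.tokenEnd = consumedAfter - buffered;
    span.tokenSource = input.substr(span.tokenStart, span.tokenEnd - span.tokenStart);
    return span;
}
```

For the input `<p>hi</p><img src=a.png alt=x>`, the `<img>` tag occupies offsets 9 through 29. An attribute recorded at absolute offsets 14 and 23 therefore maps to `src=a.png` within the token's source.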
index 7f0378b..3601e25 100644
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 #ifndef HTMLSourceTracker_h
 #define HTMLSourceTracker_h
 
-#include "HTMLToken.h"
 #include "SegmentedString.h"
 
 namespace WebCore {
 
+class HTMLToken;
 class HTMLTokenizer;
 
 class HTMLSourceTracker {
@@ -38,15 +39,18 @@ class HTMLSourceTracker {
 public:
     HTMLSourceTracker();
 
-    // FIXME: Once we move "end" into HTMLTokenizer, rename "start" to
-    // something that makes it obvious that this method can be called multiple
-    // times.
-    void start(SegmentedString&, HTMLTokenizer*, HTMLToken&);
-    void end(SegmentedString&, HTMLTokenizer*, HTMLToken&);
+    void startToken(SegmentedString&, HTMLTokenizer&);
+    void endToken(SegmentedString&, HTMLTokenizer&);
 
-    String sourceForToken(const HTMLToken&);
+    String source(const HTMLToken&);
+    String source(const HTMLToken&, unsigned attributeStart, unsigned attributeEnd);
 
 private:
+    bool m_started { false };
+
+    unsigned m_tokenStart;
+    unsigned m_tokenEnd;
+
     SegmentedString m_previousSource;
     SegmentedString m_currentSource;
 
index 8a0349c..6171721 100644
@@ -53,15 +53,12 @@ public:
     };
 
     struct Attribute {
-        struct Range {
-            unsigned start;
-            unsigned end;
-        };
-
-        Range nameRange;
-        Range valueRange;
         Vector<UChar, 32> name;
         Vector<UChar, 32> value;
+
+        // Used by HTMLSourceTracker.
+        unsigned startOffset;
+        unsigned endOffset;
     };
 
     typedef Vector<Attribute, 10> AttributeList;
@@ -73,11 +70,6 @@ public:
 
     Type type() const;
 
-    // Used by HTMLSourceTracker.
-    void setBaseOffset(unsigned); // Base for attribute offsets, and the end of token offset.
-    void setEndOffset(unsigned);
-    unsigned length() const;
-
     // EndOfFile
 
     void makeEndOfFile();
@@ -113,15 +105,10 @@ public:
     void beginEndTag(LChar);
     void beginEndTag(const Vector<LChar, 32>&);
 
-    void addNewAttribute();
-
-    void beginAttributeName(unsigned offset);
+    void beginAttribute(unsigned offset);
     void appendToAttributeName(UChar);
-    void endAttributeName(unsigned offset);
-
-    void beginAttributeValue(unsigned offset);
     void appendToAttributeValue(UChar);
-    void endAttributeValue(unsigned offset);
+    void endAttribute(unsigned offset);
 
     void setSelfClosing();
 
@@ -154,9 +141,6 @@ public:
 private:
     Type m_type;
 
-    unsigned m_baseOffset;
-    unsigned m_length;
-
     DataVector m_data;
     UChar m_data8BitCheck;
 
@@ -172,8 +156,9 @@ private:
 const HTMLToken::Attribute* findAttribute(const Vector<HTMLToken::Attribute>&, StringView name);
 
 inline HTMLToken::HTMLToken()
+    : m_type(Uninitialized)
+    , m_data8BitCheck(0)
 {
-    clear();
 }
 
 inline void HTMLToken::clear()
@@ -181,9 +166,6 @@ inline void HTMLToken::clear()
     m_type = Uninitialized;
     m_data.clear();
     m_data8BitCheck = 0;
-
-    m_length = 0;
-    m_baseOffset = 0;
 }
 
 inline HTMLToken::Type HTMLToken::type() const
@@ -197,21 +179,6 @@ inline void HTMLToken::makeEndOfFile()
     m_type = EndOfFile;
 }
 
-inline unsigned HTMLToken::length() const
-{
-    return m_length;
-}
-
-inline void HTMLToken::setBaseOffset(unsigned offset)
-{
-    m_baseOffset = offset;
-}
-
-inline void HTMLToken::setEndOffset(unsigned endOffset)
-{
-    m_length = endOffset - m_baseOffset;
-}
-
 inline const HTMLToken::DataVector& HTMLToken::name() const
 {
     ASSERT(m_type == StartTag || m_type == EndTag || m_type == DOCTYPE);
@@ -300,9 +267,12 @@ inline void HTMLToken::beginStartTag(UChar character)
     ASSERT(m_type == Uninitialized);
     m_type = StartTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
+
     m_data.append(character);
     m_data8BitCheck = character;
 }
@@ -312,9 +282,12 @@ inline void HTMLToken::beginEndTag(LChar character)
     ASSERT(m_type == Uninitialized);
     m_type = EndTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
+
     m_data.append(character);
 }
 
@@ -323,64 +296,41 @@ inline void HTMLToken::beginEndTag(const Vector<LChar, 32>& characters)
     ASSERT(m_type == Uninitialized);
     m_type = EndTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
-    m_data.appendVector(characters);
-}
-
-inline void HTMLToken::addNewAttribute()
-{
-    ASSERT(m_type == StartTag || m_type == EndTag);
-    m_attributes.grow(m_attributes.size() + 1);
-    m_currentAttribute = &m_attributes.last();
-
 #if !ASSERT_DISABLED
-    m_currentAttribute->nameRange.start = 0;
-    m_currentAttribute->nameRange.end = 0;
-    m_currentAttribute->valueRange.start = 0;
-    m_currentAttribute->valueRange.end = 0;
+    m_currentAttribute = nullptr;
 #endif
-}
 
-inline void HTMLToken::beginAttributeName(unsigned offset)
-{
-    ASSERT(offset);
-    ASSERT(!m_currentAttribute->nameRange.start);
-    m_currentAttribute->nameRange.start = offset - m_baseOffset;
+    m_data.appendVector(characters);
 }
 
-inline void HTMLToken::endAttributeName(unsigned offset)
+inline void HTMLToken::beginAttribute(unsigned offset)
 {
+    ASSERT(m_type == StartTag || m_type == EndTag);
     ASSERT(offset);
-    ASSERT(m_currentAttribute->nameRange.start);
-    ASSERT(!m_currentAttribute->nameRange.end);
 
-    unsigned adjustedOffset = offset - m_baseOffset;
-    m_currentAttribute->nameRange.end = adjustedOffset;
-
-    // FIXME: Is this intentional? Why point the value at the end of the name?
-    m_currentAttribute->valueRange.start = adjustedOffset;
-    m_currentAttribute->valueRange.end = adjustedOffset;
-}
+    m_attributes.grow(m_attributes.size() + 1);
+    m_currentAttribute = &m_attributes.last();
 
-inline void HTMLToken::beginAttributeValue(unsigned offset)
-{
-    ASSERT(offset);
-    m_currentAttribute->valueRange.start = offset - m_baseOffset;
+    m_currentAttribute->startOffset = offset;
 }
 
-inline void HTMLToken::endAttributeValue(unsigned offset)
+inline void HTMLToken::endAttribute(unsigned offset)
 {
     ASSERT(offset);
-    m_currentAttribute->valueRange.end = offset - m_baseOffset;
+    ASSERT(m_currentAttribute);
+    m_currentAttribute->endOffset = offset;
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
 }
 
 inline void HTMLToken::appendToAttributeName(UChar character)
 {
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->nameRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->name.append(character);
 }
 
@@ -388,7 +338,7 @@ inline void HTMLToken::appendToAttributeValue(UChar character)
 {
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->valueRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->value.append(character);
 }
 
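The HTMLToken.h hunks above replace the four range-tracking calls (beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue) with a single beginAttribute/endAttribute pair that records only a start and an end offset per attribute, while name and value characters are still appended one at a time. The following is a minimal sketch of that bookkeeping pattern, assuming a hypothetical SimpleToken class with std::u16string storage. It is not WebKit's HTMLToken, and it omits the token-type checks and the ASSERT_DISABLED conditional around clearing the current-attribute pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified attribute record: name/value text plus the raw
// start and end offsets passed to beginAttribute()/endAttribute().
struct Attribute {
    std::u16string name;
    std::u16string value;
    unsigned startOffset = 0;
    unsigned endOffset = 0;
};

// Hypothetical stand-in for a tag token's attribute list (not WebKit's HTMLToken).
class SimpleToken {
public:
    // Start a new attribute and remember where it begins.
    void beginAttribute(unsigned offset)
    {
        assert(offset);
        m_attributes.emplace_back();
        // Take the pointer only after emplace_back, since growing the vector
        // can invalidate pointers to earlier elements.
        m_currentAttribute = &m_attributes.back();
        m_currentAttribute->startOffset = offset;
    }

    void appendToAttributeName(char16_t character)
    {
        assert(character);
        assert(m_currentAttribute);
        m_currentAttribute->name.push_back(character);
    }

    void appendToAttributeValue(char16_t character)
    {
        assert(character);
        assert(m_currentAttribute);
        m_currentAttribute->value.push_back(character);
    }

    // Record where the attribute ends; the offset is stored unadjusted.
    void endAttribute(unsigned offset)
    {
        assert(offset);
        assert(m_currentAttribute);
        m_currentAttribute->endOffset = offset;
        // Unlike the diff, always clear the pointer (WebKit does so only when
        // assertions are enabled) so misuse trips the assertions above.
        m_currentAttribute = nullptr;
    }

    const std::vector<Attribute>& attributes() const { return m_attributes; }

private:
    std::vector<Attribute> m_attributes;
    Attribute* m_currentAttribute = nullptr;
};
```

Because each attribute keeps just two offsets, there is no intermediate state in which a name range is open but a value range is not, which is what let the diff drop the per-range assertions in favor of a single ASSERT(m_currentAttribute).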
index 063bab8..489e6c5 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
 #include "HTMLTokenizer.h"
 
 #include "HTMLEntityParser.h"
-#include "HTMLTreeBuilder.h"
+#include "HTMLNames.h"
 #include "MarkupTokenizerInlines.h"
-#include "NotImplemented.h"
 #include <wtf/ASCIICType.h>
-#include <wtf/CurrentTime.h>
-#include <wtf/text/CString.h>
 
 using namespace WTF;
 
@@ -42,64 +39,95 @@ namespace WebCore {
 
 using namespace HTMLNames;
 
-static inline UChar toLowerCase(UChar cc)
+static inline LChar convertASCIIAlphaToLower(UChar character)
 {
-    ASSERT(isASCIIUpper(cc));
-    const int lowerCaseOffset = 0x20;
-    return cc + lowerCaseOffset;
+    ASSERT(isASCIIAlpha(character));
+    return toASCIILowerUnchecked(character);
 }
 
-static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const String& string)
+static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const char* string)
 {
-    if (vector.size() != string.length())
-        return false;
-
-    if (!string.length())
-        return true;
-
-    return equal(string.impl(), vector.data(), vector.size());
+    unsigned size = vector.size();
+    for (unsigned i = 0; i < size; ++i) {
+        if (!string[i] || vector[i] != string[i])
+            return false;
+    }
+    return !string[size];
 }
 
-static inline bool isEndTagBufferingState(HTMLTokenizer::State state)
+inline bool HTMLTokenizer::inEndTagBufferingState() const
 {
-    switch (state) {
-    case HTMLTokenizer::RCDATAEndTagOpenState:
-    case HTMLTokenizer::RCDATAEndTagNameState:
-    case HTMLTokenizer::RAWTEXTEndTagOpenState:
-    case HTMLTokenizer::RAWTEXTEndTagNameState:
-    case HTMLTokenizer::ScriptDataEndTagOpenState:
-    case HTMLTokenizer::ScriptDataEndTagNameState:
-    case HTMLTokenizer::ScriptDataEscapedEndTagOpenState:
-    case HTMLTokenizer::ScriptDataEscapedEndTagNameState:
+    switch (m_state) {
+    case RCDATAEndTagOpenState:
+    case RCDATAEndTagNameState:
+    case RAWTEXTEndTagOpenState:
+    case RAWTEXTEndTagNameState:
+    case ScriptDataEndTagOpenState:
+    case ScriptDataEndTagNameState:
+    case ScriptDataEscapedEndTagOpenState:
+    case ScriptDataEscapedEndTagNameState:
         return true;
     default:
         return false;
     }
 }
 
-#define HTML_BEGIN_STATE(stateName) BEGIN_STATE(HTMLTokenizer, stateName)
-#define HTML_RECONSUME_IN(stateName) RECONSUME_IN(HTMLTokenizer, stateName)
-#define HTML_ADVANCE_TO(stateName) ADVANCE_TO(HTMLTokenizer, stateName)
-#define HTML_SWITCH_TO(stateName) SWITCH_TO(HTMLTokenizer, stateName)
-
 HTMLTokenizer::HTMLTokenizer(const HTMLParserOptions& options)
-    : m_inputStreamPreprocessor(this)
+    : m_preprocessor(*this)
     , m_options(options)
 {
-    reset();
 }
 
-HTMLTokenizer::~HTMLTokenizer()
+inline void HTMLTokenizer::bufferASCIICharacter(UChar character)
+{
+    ASSERT(character != kEndOfFileMarker);
+    ASSERT(isASCII(character));
+    LChar narrowedCharacter = character;
+    m_token.appendToCharacter(narrowedCharacter);
+}
+
+inline void HTMLTokenizer::bufferCharacter(UChar character)
+{
+    ASSERT(character != kEndOfFileMarker);
+    m_token.appendToCharacter(character);
+}
+
+inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source)
+{
+    saveEndTagNameIfNeeded();
+    m_state = DataState;
+    source.advanceAndUpdateLineNumber();
+    return true;
+}
+
+inline bool HTMLTokenizer::emitAndReconsumeInDataState()
+{
+    saveEndTagNameIfNeeded();
+    m_state = DataState;
+    return true;
+}
+
+inline bool HTMLTokenizer::emitEndOfFile(SegmentedString& source)
+{
+    m_state = DataState;
+    if (haveBufferedCharacterToken())
+        return true;
+    source.advance();
+    m_token.clear();
+    m_token.makeEndOfFile();
+    return true;
+}
+
+inline void HTMLTokenizer::saveEndTagNameIfNeeded()
 {
+    ASSERT(m_token.type() != HTMLToken::Uninitialized);
+    if (m_token.type() == HTMLToken::StartTag)
+        m_appropriateEndTagName = m_token.name();
 }
 
-void HTMLTokenizer::reset()
+inline bool HTMLTokenizer::haveBufferedCharacterToken() const
 {
-    m_state = HTMLTokenizer::DataState;
-    m_token = 0;
-    m_forceNullCharacterReplacement = false;
-    m_shouldAllowCDATA = false;
-    m_additionalAllowedCharacter = '\0';
+    return m_token.type() == HTMLToken::Character;
 }
 
 inline bool HTMLTokenizer::processEntity(SegmentedString& source)
@@ -119,1426 +147,1246 @@ inline bool HTMLTokenizer::processEntity(SegmentedString& source)
     return true;
 }
 
-bool HTMLTokenizer::flushBufferedEndTag(SegmentedString& source)
+void HTMLTokenizer::flushBufferedEndTag()
 {
-    ASSERT(m_token->type() == HTMLToken::Character || m_token->type() == HTMLToken::Uninitialized);
-    source.advanceAndUpdateLineNumber();
-    if (m_token->type() == HTMLToken::Character)
-        return true;
-    m_token->beginEndTag(m_bufferedEndTagName);
+    m_token.beginEndTag(m_bufferedEndTagName);
     m_bufferedEndTagName.clear();
     m_appropriateEndTagName.clear();
     m_temporaryBuffer.clear();
+}
+
+bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state)
+{
+    ASSERT(source.currentChar() == character);
+    appendToTemporaryBuffer(character);
+    source.advanceAndUpdateLineNumber();
+
+    if (haveBufferedCharacterToken()) {
+        // Emit the buffered character token.
+        // The next call to processToken will flush the buffered end tag and continue parsing it.
+        m_state = state;
+        return true;
+    }
+
+    flushBufferedEndTag();
     return false;
 }
 
-#define FLUSH_AND_ADVANCE_TO(stateName)                                    \
-    do {                                                                   \
-        m_state = HTMLTokenizer::stateName;                           \
-        if (flushBufferedEndTag(source))                                   \
-            return true;                                                   \
-        if (source.isEmpty()                                               \
-            || !m_inputStreamPreprocessor.peek(source))                    \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
-    } while (false)
-
-bool HTMLTokenizer::flushEmitAndResumeIn(SegmentedString& source, HTMLTokenizer::State state)
+bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source)
 {
-    m_state = state;
-    flushBufferedEndTag(source);
+    ASSERT(source.currentChar() == '>');
+    appendToTemporaryBuffer('>');
+    source.advance();
+
+    m_state = DataState;
+
+    if (haveBufferedCharacterToken()) {
+        // Emit the character token we already have.
+        // The next call to processToken will flush the buffered end tag and emit it.
+        return true;
+    }
+
+    flushBufferedEndTag();
     return true;
 }
 
-bool HTMLTokenizer::nextToken(SegmentedString& source, HTMLToken& token)
+bool HTMLTokenizer::processToken(SegmentedString& source)
 {
-    // If we have a token in progress, then we're supposed to be called back
-    // with the same token so we can finish it.
-    ASSERT(!m_token || m_token == &token || token.type() == HTMLToken::Uninitialized);
-    m_token = &token;
-
-    if (!m_bufferedEndTagName.isEmpty() && !isEndTagBufferingState(m_state)) {
-        // FIXME: This should call flushBufferedEndTag().
-        // We started an end tag during our last iteration.
-        m_token->beginEndTag(m_bufferedEndTagName);
-        m_bufferedEndTagName.clear();
-        m_appropriateEndTagName.clear();
-        m_temporaryBuffer.clear();
-        if (m_state == HTMLTokenizer::DataState) {
-            // We're back in the data state, so we must be done with the tag.
+    if (!m_bufferedEndTagName.isEmpty() && !inEndTagBufferingState()) {
+        // We are back here after emitting a character token that came just before an end tag.
+        // To continue parsing the end tag we need to move the buffered tag name into the token.
+        flushBufferedEndTag();
+
+        // If we are in the data state, the end tag is already complete and we should emit it
+        // now, otherwise, we want to resume parsing the partial end tag.
+        if (m_state == DataState)
             return true;
-        }
     }
 
-    if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))
+    if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state)))
         return haveBufferedCharacterToken();
-    UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
+    UChar character = m_preprocessor.nextInputCharacter();
 
-    // Source: http://www.whatwg.org/specs/web-apps/current-work/#tokenisation0
+    // https://html.spec.whatwg.org/#tokenization
     switch (m_state) {
-    HTML_BEGIN_STATE(DataState) {
-        if (cc == '&')
-            HTML_ADVANCE_TO(CharacterReferenceInDataState);
-        else if (cc == '<') {
-            if (m_token->type() == HTMLToken::Character) {
-                // We have a bunch of character tokens queued up that we
-                // are emitting lazily here.
-                return true;
-            }
-            HTML_ADVANCE_TO(TagOpenState);
-        } else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(CharacterReferenceInDataState) {
-        if (!processEntity(source))
-            return haveBufferedCharacterToken();
-        HTML_SWITCH_TO(DataState);
-    }
-    END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAState) {
-        if (cc == '&')
-            HTML_ADVANCE_TO(CharacterReferenceInRCDATAState);
-        else if (cc == '<')
-            HTML_ADVANCE_TO(RCDATALessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(RCDATAState);
+    BEGIN_STATE(DataState)
+        if (character == '&')
+            ADVANCE_TO(CharacterReferenceInDataState);
+        if (character == '<') {
+            if (haveBufferedCharacterToken())
+                RETURN_IN_CURRENT_STATE(true);
+            ADVANCE_TO(TagOpenState);
         }
-    }
+        if (character == kEndOfFileMarker)
+            return emitEndOfFile(source);
+        bufferCharacter(character);
+        ADVANCE_TO(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CharacterReferenceInRCDATAState) {
+    BEGIN_STATE(CharacterReferenceInDataState)
         if (!processEntity(source))
-            return haveBufferedCharacterToken();
-        HTML_SWITCH_TO(RCDATAState);
-    }
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        SWITCH_TO(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTState) {
-        if (cc == '<')
-            HTML_ADVANCE_TO(RAWTEXTLessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(RAWTEXTState);
-        }
-    }
+    BEGIN_STATE(RCDATAState)
+        if (character == '&')
+            ADVANCE_TO(CharacterReferenceInRCDATAState);
+        if (character == '<')
+            ADVANCE_TO(RCDATALessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataState) {
-        if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataLessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataState);
+    BEGIN_STATE(CharacterReferenceInRCDATAState)
+        if (!processEntity(source))
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        SWITCH_TO(RCDATAState);
+    END_STATE()
+
+    BEGIN_STATE(RAWTEXTState)
+        if (character == '<')
+            ADVANCE_TO(RAWTEXTLessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(RAWTEXTState);
+    END_STATE()
+
+    BEGIN_STATE(ScriptDataState)
+        if (character == '<')
+            ADVANCE_TO(ScriptDataLessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataState);
+    END_STATE()
+
+    BEGIN_STATE(PLAINTEXTState)
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(PLAINTEXTState);
+    END_STATE()
+
+    BEGIN_STATE(TagOpenState)
+        if (character == '!')
+            ADVANCE_TO(MarkupDeclarationOpenState);
+        if (character == '/')
+            ADVANCE_TO(EndTagOpenState);
+        if (isASCIIAlpha(character)) {
+            m_token.beginStartTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(TagNameState);
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(PLAINTEXTState) {
-        if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        bufferCharacter(cc);
-        HTML_ADVANCE_TO(PLAINTEXTState);
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(TagOpenState) {
-        if (cc == '!')
-            HTML_ADVANCE_TO(MarkupDeclarationOpenState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(EndTagOpenState);
-        else if (isASCIIUpper(cc)) {
-            m_token->beginStartTag(toLowerCase(cc));
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (isASCIILower(cc)) {
-            m_token->beginStartTag(cc);
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == '?') {
+        if (character == '?') {
             parseError();
             // The spec consumes the current character before switching
             // to the bogus comment state, but it's easier to implement
             // if we reconsume the current character.
-            HTML_RECONSUME_IN(BogusCommentState);
-        } else {
-            parseError();
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(DataState);
+            RECONSUME_IN(BogusCommentState);
         }
-    }
+        parseError();
+        bufferASCIICharacter('<');
+        RECONSUME_IN(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(EndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_token->beginEndTag(static_cast<LChar>(toLowerCase(cc)));
-            m_appropriateEndTagName.clear();
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (isASCIILower(cc)) {
-            m_token->beginEndTag(static_cast<LChar>(cc));
+    BEGIN_STATE(EndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            m_token.beginEndTag(convertASCIIAlphaToLower(character));
             m_appropriateEndTagName.clear();
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == '>') {
+            ADVANCE_TO(TagNameState);
+        }
+        if (character == '>') {
             parseError();
-            HTML_ADVANCE_TO(DataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(DataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
             bufferASCIICharacter('<');
             bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BogusCommentState);
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(TagNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->appendToName(toLowerCase(cc));
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToName(cc);
-            HTML_ADVANCE_TO(TagNameState);
+    BEGIN_STATE(TagNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToName(toASCIILower(character));
+        ADVANCE_TO(TagNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATALessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(RCDATALessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(RCDATAEndTagOpenState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(RCDATAState);
+            ADVANCE_TO(RCDATAEndTagOpenState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(RCDATAState);
+    BEGIN_STATE(RCDATAEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RCDATAEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(RCDATAEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RCDATAEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(RCDATAState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(RAWTEXTLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(RAWTEXTEndTagOpenState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(RAWTEXTState);
+            ADVANCE_TO(RAWTEXTEndTagOpenState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(RAWTEXTState);
+    BEGIN_STATE(RAWTEXTEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RAWTEXTEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(RAWTEXTEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RAWTEXTEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(RAWTEXTState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(ScriptDataEndTagOpenState);
-        } else if (cc == '!') {
+            ADVANCE_TO(ScriptDataEndTagOpenState);
+        }
+        if (character == '!') {
             bufferASCIICharacter('<');
             bufferASCIICharacter('!');
-            HTML_ADVANCE_TO(ScriptDataEscapeStartState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(ScriptDataState);
+            ADVANCE_TO(ScriptDataEscapeStartState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(ScriptDataState);
+    BEGIN_STATE(ScriptDataEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(ScriptDataEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(ScriptDataState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapeStartState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapeStartState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapeStartDashState);
+            ADVANCE_TO(ScriptDataEscapeStartDashState);
         } else
-            HTML_RECONSUME_IN(ScriptDataState);
-    }
+            RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapeStartDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapeStartDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
         } else
-            HTML_RECONSUME_IN(ScriptDataState);
-    }
+            RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataEscapedDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedDashDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedDashDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == '>') {
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == '>') {
             bufferASCIICharacter('>');
-            HTML_ADVANCE_TO(ScriptDataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataEscapedLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.clear();
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else if (isASCIILower(cc)) {
+            ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
+        }
+        if (isASCIIAlpha(character)) {
             bufferASCIICharacter('<');
-            bufferASCIICharacter(cc);
+            bufferASCIICharacter(character);
             m_temporaryBuffer.clear();
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+    BEGIN_STATE(ScriptDataEscapedEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(ScriptDataEscapedEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapeStartState) {
-        if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
-            bufferASCIICharacter(cc);
-            if (temporaryBufferIs(scriptTag.localName()))
-                HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+    BEGIN_STATE(ScriptDataDoubleEscapeStartState)
+        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
+            bufferASCIICharacter(character);
+            if (temporaryBufferIs("script"))
+                ADVANCE_TO(ScriptDataDoubleEscapedState);
             else
-                HTML_ADVANCE_TO(ScriptDataEscapedState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else if (isASCIILower(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
-    }
+                ADVANCE_TO(ScriptDataEscapedState);
+        }
+        if (isASCIIAlpha(character)) {
+            bufferASCIICharacter(character);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+        }
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedDashDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == '>') {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == '>') {
             bufferASCIICharacter('>');
-            HTML_ADVANCE_TO(ScriptDataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState)
+        if (character == '/') {
             bufferASCIICharacter('/');
             m_temporaryBuffer.clear();
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
-    }
+            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+        }
+        RECONSUME_IN(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapeEndState) {
-        if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
-            bufferASCIICharacter(cc);
-            if (temporaryBufferIs(scriptTag.localName()))
-                HTML_ADVANCE_TO(ScriptDataEscapedState);
+    BEGIN_STATE(ScriptDataDoubleEscapeEndState)
+        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
+            bufferASCIICharacter(character);
+            if (temporaryBufferIs("script"))
+                ADVANCE_TO(ScriptDataEscapedState);
             else
-                HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else if (isASCIILower(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(BeforeAttributeNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
-                parseError();
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+                ADVANCE_TO(ScriptDataDoubleEscapedState);
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(AttributeNameState) {
-        if (isTokenizerWhitespace(cc)) {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeNameState);
-        } else if (cc == '/') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        } else if (cc == '=') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        } else if (cc == '>') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (m_options.usePreHTML5ParserQuirks && cc == '<') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else if (isASCIIUpper(cc)) {
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
-                parseError();
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+        if (isASCIIAlpha(character)) {
+            bufferASCIICharacter(character);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
         }
-    }
+        RECONSUME_IN(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterAttributeNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '=')
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<')
-                parseError();
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+    BEGIN_STATE(BeforeAttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        if (character == '"' || character == '\'' || character == '<' || character == '=')
+            parseError();
+        m_token.beginAttribute(source.numberOfCharactersConsumed());
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
+    END_STATE()
+
+    BEGIN_STATE(AttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '=')
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<' || character == '=')
+            parseError();
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
+    END_STATE()
+
+    BEGIN_STATE(AfterAttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '=')
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<')
+            parseError();
+        m_token.beginAttribute(source.numberOfCharactersConsumed());
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeAttributeValueState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        else if (cc == '"') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
-            HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
-        } else if (cc == '&') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(AttributeValueUnquotedState);
-        } else if (cc == '\'') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
-            HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '<' || cc == '=' || cc == '`')
-                parseError();
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueUnquotedState);
+    BEGIN_STATE(BeforeAttributeValueState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '"')
+            ADVANCE_TO(AttributeValueDoubleQuotedState);
+        if (character == '&')
+            RECONSUME_IN(AttributeValueUnquotedState);
+        if (character == '\'')
+            ADVANCE_TO(AttributeValueSingleQuotedState);
+        if (character == '>') {
+            parseError();
+            return emitAndResumeInDataState(source);
         }
-    }
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '<' || character == '=' || character == '`')
+            parseError();
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueUnquotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueDoubleQuotedState) {
-        if (cc == '"') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueDoubleQuotedState)
+        if (character == '"') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(AfterAttributeValueQuotedState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '"';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueSingleQuotedState) {
-        if (cc == '\'') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueSingleQuotedState)
+        if (character == '\'') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(AfterAttributeValueQuotedState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '\'';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueSingleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueUnquotedState) {
-        if (isTokenizerWhitespace(cc)) {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueUnquotedState)
+        if (isTokenizerWhitespace(character)) {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(BeforeAttributeNameState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '>';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == '>') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=' || cc == '`')
-                parseError();
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueUnquotedState);
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
         }
-    }
+        if (character == '>') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
+            parseError();
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`')
+            parseError();
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueUnquotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CharacterReferenceInAttributeValueState) {
+    BEGIN_STATE(CharacterReferenceInAttributeValueState)
         bool notEnoughCharacters = false;
         StringBuilder decodedEntity;
         bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters, m_additionalAllowedCharacter);
         if (notEnoughCharacters)
-            return haveBufferedCharacterToken();
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         if (!success) {
             ASSERT(decodedEntity.isEmpty());
-            m_token->appendToAttributeValue('&');
+            m_token.appendToAttributeValue('&');
         } else {
             for (unsigned i = 0; i < decodedEntity.length(); ++i)
-                m_token->appendToAttributeValue(decodedEntity[i]);
+                m_token.appendToAttributeValue(decodedEntity[i]);
         }
         // We're supposed to switch back to the attribute value state that
         // we were in when we were switched into this state. Rather than
         // keeping track of this explictly, we observe that the previous
         // state can be determined by m_additionalAllowedCharacter.
         if (m_additionalAllowedCharacter == '"')
-            HTML_SWITCH_TO(AttributeValueDoubleQuotedState);
-        else if (m_additionalAllowedCharacter == '\'')
-            HTML_SWITCH_TO(AttributeValueSingleQuotedState);
-        else if (m_additionalAllowedCharacter == '>')
-            HTML_SWITCH_TO(AttributeValueUnquotedState);
-        else
-            ASSERT_NOT_REACHED();
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(AfterAttributeValueQuotedState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BeforeAttributeNameState);
+            SWITCH_TO(AttributeValueDoubleQuotedState);
+        if (m_additionalAllowedCharacter == '\'')
+            SWITCH_TO(AttributeValueSingleQuotedState);
+        ASSERT(m_additionalAllowedCharacter == '>');
+        SWITCH_TO(AttributeValueUnquotedState);
+    END_STATE()
+
+    BEGIN_STATE(AfterAttributeValueQuotedState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeAttributeNameState);
     END_STATE()
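The rewritten states drop the `else if` chains. Each `ADVANCE_TO`, `SWITCH_TO` and `RECONSUME_IN` appears to leave the state body, so every `if` acts as an early exit and the unconditional tail covers the "anything else" case. Below is a minimal standalone sketch of the `AfterAttributeValueQuotedState` decision written in that style. The `State` enum, the `Step` struct, the `isWhitespace` helper and the `kEndOfFile` sentinel are hypothetical stand-ins, not WebKit's real types or constants.

```cpp
#include <cassert>

// Hypothetical stand-ins; WebKit's tokenizer has its own State enum,
// token class, and kEndOfFileMarker constant.
enum class State { Data, BeforeAttributeName, SelfClosingStartTag };

constexpr char16_t kEndOfFile = 0; // assumed sentinel for "end of input"

struct Step {
    State next;      // state the tokenizer continues in
    bool consume;    // true: like ADVANCE_TO; false: like RECONSUME_IN
    bool emit;       // true: the current tag token is emitted
    bool parseError; // true: a parse error is reported
};

// Tokenizer whitespace (tab, LF, FF, space); CR is assumed to have been
// normalized away earlier by the input preprocessor.
bool isWhitespace(char16_t c)
{
    return c == u' ' || c == u'\t' || c == u'\n' || c == u'\f';
}

// Early-return version of AfterAttributeValueQuotedState: each `if` leaves
// the function, and the final return is the "anything else" case.
Step afterAttributeValueQuoted(char16_t c, bool usePreHTML5ParserQuirks)
{
    if (isWhitespace(c))
        return { State::BeforeAttributeName, true, false, false };
    if (c == u'/')
        return { State::SelfClosingStartTag, true, false, false };
    if (c == u'>')
        return { State::Data, true, true, false };
    if (usePreHTML5ParserQuirks && c == u'<')
        return { State::Data, false, true, false };
    if (c == kEndOfFile)
        return { State::Data, false, false, true };
    return { State::BeforeAttributeName, false, false, true };
}
```

Returning a small result struct keeps the sketch testable without a real token or input stream. The macros in the diff get the same effect through control flow inside one large function.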
 
-    HTML_BEGIN_STATE(SelfClosingStartTagState) {
-        if (cc == '>') {
-            m_token->setSelfClosing();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
+    BEGIN_STATE(SelfClosingStartTagState)
+        if (character == '>') {
+            m_token.setSelfClosing();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(BeforeAttributeNameState);
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeAttributeNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BogusCommentState) {
-        m_token->beginComment();
-        HTML_RECONSUME_IN(ContinueBogusCommentState);
-    }
+    BEGIN_STATE(BogusCommentState)
+        m_token.beginComment();
+        RECONSUME_IN(ContinueBogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ContinueBogusCommentState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker)
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(ContinueBogusCommentState);
-        }
-    }
+    BEGIN_STATE(ContinueBogusCommentState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker)
+            return emitAndReconsumeInDataState();
+        m_token.appendToComment(character);
+        ADVANCE_TO(ContinueBogusCommentState);
     END_STATE()
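`BogusCommentState` only starts an empty comment and reconsumes the current character in `ContinueBogusCommentState`. That state appends every character until it reaches `>`, where it emits and resumes in the data state, or end of input, where it emits and reconsumes. The following is a rough sketch of that loop over a fully buffered `std::u16string`. The real tokenizer works incrementally on a `SegmentedString`, and `scanBogusComment` and `BogusComment` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of scanning a bogus comment: its text, how many input characters
// were consumed (including the closing '>', if any), and whether input ran out.
struct BogusComment {
    std::u16string data;
    std::size_t consumed;
    bool hitEndOfFile;
};

// Hypothetical helper mirroring BogusCommentState + ContinueBogusCommentState:
// start an empty comment, then append every character up to '>' or end of input.
BogusComment scanBogusComment(const std::u16string& input, std::size_t start)
{
    BogusComment result { u"", 0, false };
    std::size_t i = start;
    for (; i < input.size(); ++i) {
        if (input[i] == u'>') {
            result.consumed = i + 1 - start; // '>' is consumed but not stored
            return result;
        }
        result.data.push_back(input[i]);
    }
    result.consumed = i - start;
    result.hitEndOfFile = true; // comment is still emitted at end of input
    return result;
}
```

For `<?xml version?>`, starting at the `?` (which the tokenizer reconsumes), the comment data is `?xml version?`.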
 
-    HTML_BEGIN_STATE(MarkupDeclarationOpenState) {
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, dashDashString, (ASCIILiteral("--")));
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, doctypeString, (ASCIILiteral("doctype")));
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, cdataString, (ASCIILiteral("[CDATA[")));
-        if (cc == '-') {
-            SegmentedString::LookAheadResult result = source.lookAhead(dashDashString);
-            if (result == SegmentedString::DidMatch) {
-                source.advanceAndASSERT('-');
-                source.advanceAndASSERT('-');
-                m_token->beginComment();
-                HTML_SWITCH_TO(CommentStartState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
-        } else if (cc == 'D' || cc == 'd') {
-            SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(doctypeString);
+    BEGIN_STATE(MarkupDeclarationOpenState)
+        if (character == '-') {
+            auto result = source.advancePast("--");
             if (result == SegmentedString::DidMatch) {
-                advanceStringAndASSERTIgnoringCase(source, "doctype");
-                HTML_SWITCH_TO(DOCTYPEState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
-        } else if (cc == '[' && shouldAllowCDATA()) {
-            SegmentedString::LookAheadResult result = source.lookAhead(cdataString);
-            if (result == SegmentedString::DidMatch) {
-                advanceStringAndASSERT(source, "[CDATA[");
-                HTML_SWITCH_TO(CDATASectionState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
+                m_token.beginComment();
+                SWITCH_TO(CommentStartState);
+            }
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (isASCIIAlphaCaselessEqual(character, 'd')) {
+            auto result = source.advancePastIgnoringCase("doctype");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(DOCTYPEState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (character == '[' && shouldAllowCDATA()) {
+            auto result = source.advancePast("[CDATA[");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(CDATASectionState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         }
         parseError();
-        HTML_RECONSUME_IN(BogusCommentState);
-    }
+        RECONSUME_IN(BogusCommentState);
     END_STATE()
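The new `MarkupDeclarationOpenState` replaces the `lookAhead` call followed by `advanceAndASSERT` with a single `advancePast` or `advancePastIgnoringCase` call. That call has three outcomes. On a match, the input is consumed. On a mismatch, nothing is consumed and the state falls through to the bogus comment path. When the available input is only a prefix of the literal, `NotEnoughCharacters` lets the tokenizer return and retry later in the same state. The sketch below models that contract on a buffered string. The enum and function names are hypothetical stand-ins for `SegmentedString`'s API, and it assumes the literal is lowercase ASCII when case is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class LookAheadResult { DidMatch, DidNotMatch, NotEnoughCharacters };

// ASCII-only lowercasing; non-ASCII code units pass through unchanged.
char16_t asciiLower(char16_t c)
{
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

// Hypothetical stand-in for SegmentedString::advancePast / advancePastIgnoringCase.
// Advances `position` past `literal` only on a full match.
LookAheadResult advancePast(const std::u16string& buffer, std::size_t& position,
                            const char* literal, bool ignoreCase)
{
    std::size_t i = 0;
    for (; literal[i]; ++i) {
        if (position + i >= buffer.size())
            return LookAheadResult::NotEnoughCharacters; // input is a prefix of literal
        char16_t c = buffer[position + i];
        if (ignoreCase)
            c = asciiLower(c);
        if (c != static_cast<char16_t>(literal[i]))
            return LookAheadResult::DidNotMatch; // nothing consumed
    }
    position += i;
    return LookAheadResult::DidMatch;
}
```

Folding the lookahead and the advance into one call also removes the need for the old `advanceAndASSERT` checks, because the characters are compared exactly once.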
 
-    HTML_BEGIN_STATE(CommentStartState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentStartDashState);
-        else if (cc == '>') {
+    BEGIN_STATE(CommentStartState)
+        if (character == '-')
+            ADVANCE_TO(CommentStartDashState);
+        if (character == '>') {
             parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentStartDashState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndState);
-        else if (cc == '>') {
+    BEGIN_STATE(CommentStartDashState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndState);
+        if (character == '>') {
             parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndDashState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(CommentState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndDashState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentEndDashState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(CommentEndDashState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentEndState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '!') {
-            parseError();
-            HTML_ADVANCE_TO(CommentEndBangState);
-        } else if (cc == '-') {
+    BEGIN_STATE(CommentEndState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '!') {
             parseError();
-            m_token->appendToComment('-');
-            HTML_ADVANCE_TO(CommentEndState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CommentEndBangState);
+        }
+        if (character == '-') {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.appendToComment('-');
+            ADVANCE_TO(CommentEndState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(CommentEndBangState) {
-        if (cc == '-') {
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment('!');
-            HTML_ADVANCE_TO(CommentEndDashState);
-        } else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
+        parseError();
+        m_token.appendToComment('-');
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
+    END_STATE()
+
+    BEGIN_STATE(CommentEndBangState)
+        if (character == '-') {
+            m_token.appendToComment('-');
+            m_token.appendToComment('-');
+            m_token.appendToComment('!');
+            ADVANCE_TO(CommentEndDashState);
+        }
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment('!');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment('-');
+        m_token.appendToComment('!');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
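The six comment states above are easier to follow when they are laid out as one loop. The sketch below runs a standalone model of them on the text that follows `<!--`, using the same transitions and the same characters appended to the comment. Parse errors are only counted. `CommentResult` and `tokenizeComment` are hypothetical names, and the sketch assumes the whole input is buffered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct CommentResult {
    std::u16string data;
    std::size_t consumed; // characters consumed, including the closing '>'
    int parseErrors;
};

// Hypothetical model of CommentStart, CommentStartDash, Comment,
// CommentEndDash, CommentEnd and CommentEndBang, run on the text after "<!--".
CommentResult tokenizeComment(const std::u16string& in)
{
    enum class S { Start, StartDash, Comment, EndDash, End, EndBang };
    S state = S::Start;
    CommentResult r { u"", 0, 0 };
    std::size_t i = 0;
    for (; i < in.size(); ++i) {
        char16_t c = in[i];
        switch (state) {
        case S::Start:
            if (c == u'-') { state = S::StartDash; continue; }
            if (c == u'>') { ++r.parseErrors; r.consumed = i + 1; return r; }
            r.data += c; state = S::Comment; continue;
        case S::StartDash:
            if (c == u'-') { state = S::End; continue; }
            if (c == u'>') { ++r.parseErrors; r.consumed = i + 1; return r; }
            r.data += u'-'; r.data += c; state = S::Comment; continue;
        case S::Comment:
            if (c == u'-') { state = S::EndDash; continue; }
            r.data += c; continue;
        case S::EndDash:
            if (c == u'-') { state = S::End; continue; }
            r.data += u'-'; r.data += c; state = S::Comment; continue;
        case S::End:
            if (c == u'>') { r.consumed = i + 1; return r; }
            if (c == u'!') { ++r.parseErrors; state = S::EndBang; continue; }
            if (c == u'-') { ++r.parseErrors; r.data += u'-'; continue; }
            ++r.parseErrors; r.data += u"--"; r.data += c; state = S::Comment; continue;
        case S::EndBang:
            if (c == u'-') { r.data += u"--!"; state = S::EndDash; continue; }
            if (c == u'>') { r.consumed = i + 1; return r; }
            r.data += u"--!"; r.data += c; state = S::Comment; continue;
        }
    }
    // End of input inside any comment state: parse error, emit what we have.
    ++r.parseErrors;
    r.consumed = i;
    return r;
}
```

For example, `x--!>` yields the comment `x` with one parse error. `a--x-->` yields `a--x`, where the stray `--` is written back into the data.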
 
-    HTML_BEGIN_STATE(DOCTYPEState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPENameState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(DOCTYPEState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPENameState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BeforeDOCTYPENameState);
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeDOCTYPENameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPENameState);
-        else if (isASCIIUpper(cc)) {
-            m_token->beginDOCTYPE(toLowerCase(cc));
-            HTML_ADVANCE_TO(DOCTYPENameState);
-        } else if (cc == '>') {
+    BEGIN_STATE(BeforeDOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPENameState);
+        if (character == '>') {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->beginDOCTYPE(cc);
-            HTML_ADVANCE_TO(DOCTYPENameState);
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.beginDOCTYPE(toASCIILower(character));
+        ADVANCE_TO(DOCTYPENameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPENameState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->appendToName(toLowerCase(cc));
-            HTML_ADVANCE_TO(DOCTYPENameState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(DOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPENameState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToName(cc);
-            HTML_ADVANCE_TO(DOCTYPENameState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToName(toASCIILower(character));
+        ADVANCE_TO(DOCTYPENameState);
     END_STATE()
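In `BeforeDOCTYPENameState` and `DOCTYPENameState`, the new code no longer branches on `isASCIIUpper`. It passes every name character through `toASCIILower`, which changes only `A` to `Z`. The result is the same as before, and non-ASCII characters such as `É` still pass through untouched. A standalone sketch of the two states follows, run on the text after `<!DOCTYPE`. The helper names and the `NameEnd` enum are hypothetical, and `asciiLower` only imitates the behavior of WTF's `toASCIILower`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Only 'A'..'Z' change; everything else, including non-ASCII, passes through.
char16_t asciiLower(char16_t c)
{
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

bool isWhitespace(char16_t c)
{
    return c == u' ' || c == u'\t' || c == u'\n' || c == u'\f';
}

enum class NameEnd { Emitted, AfterName, EndOfFile };

struct DoctypeName {
    std::u16string name;
    bool forceQuirks;
    NameEnd end; // Emitted on '>', AfterName on whitespace, EndOfFile otherwise
};

// Hypothetical model of BeforeDOCTYPENameState + DOCTYPENameState.
DoctypeName scanDoctypeName(const std::u16string& in)
{
    DoctypeName r { u"", false, NameEnd::EndOfFile };
    std::size_t i = 0;
    while (i < in.size() && isWhitespace(in[i]))
        ++i;
    if (i == in.size()) {
        r.forceQuirks = true; // end of input before any name
        return r;
    }
    if (in[i] == u'>') {
        r.forceQuirks = true; // "<!DOCTYPE>" has no name
        r.end = NameEnd::Emitted;
        return r;
    }
    for (; i < in.size(); ++i) {
        char16_t c = in[i];
        if (isWhitespace(c)) { r.end = NameEnd::AfterName; return r; }
        if (c == u'>') { r.end = NameEnd::Emitted; return r; }
        r.name += asciiLower(c); // lowercase unconditionally, ASCII only
    }
    r.forceQuirks = true; // end of input inside the name
    return r;
}
```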
 
-    HTML_BEGIN_STATE(AfterDOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPENameState);
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            DEPRECATED_DEFINE_STATIC_LOCAL(String, publicString, (ASCIILiteral("public")));
-            DEPRECATED_DEFINE_STATIC_LOCAL(String, systemString, (ASCIILiteral("system")));
-            if (cc == 'P' || cc == 'p') {
-                SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(publicString);
-                if (result == SegmentedString::DidMatch) {
-                    advanceStringAndASSERTIgnoringCase(source, "public");
-                    HTML_SWITCH_TO(AfterDOCTYPEPublicKeywordState);
-                } else if (result == SegmentedString::NotEnoughCharacters)
-                    return haveBufferedCharacterToken();
-            } else if (cc == 'S' || cc == 's') {
-                SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(systemString);
-                if (result == SegmentedString::DidMatch) {
-                    advanceStringAndASSERTIgnoringCase(source, "system");
-                    HTML_SWITCH_TO(AfterDOCTYPESystemKeywordState);
-                } else if (result == SegmentedString::NotEnoughCharacters)
-                    return haveBufferedCharacterToken();
-            }
+    BEGIN_STATE(AfterDOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPENameState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        if (isASCIIAlphaCaselessEqual(character, 'p')) {
+            auto result = source.advancePastIgnoringCase("public");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(AfterDOCTYPEPublicKeywordState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (isASCIIAlphaCaselessEqual(character, 's')) {
+            auto result = source.advancePastIgnoringCase("system");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(AfterDOCTYPESystemKeywordState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPEPublicKeywordState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
-        else if (cc == '"') {
-            parseError();
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
+    BEGIN_STATE(AfterDOCTYPEPublicKeywordState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+        if (character == '"') {
             parseError();
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
-        } else if (cc == '>') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
-        else if (cc == '"') {
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+        if (character == '"') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) {
-        if (cc == '"')
-            HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState)
+        if (character == '"')
+            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToPublicIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToPublicIdentifier(character);
+        ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) {
-        if (cc == '\'')
-            HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState)
+        if (character == '\'')
+            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToPublicIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToPublicIdentifier(character);
+        ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
     END_STATE()
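The double-quoted and single-quoted public identifier states differ only in the closing quote character. The system identifier states have the same shape. Each one appends characters until the matching quote. A premature `>` is a parse error that sets force-quirks and emits the token, and end of input also sets force-quirks. The sketch below folds those into one scanner parameterized by the quote. This is only an illustration: the diff keeps the states separate, and `scanQuotedIdentifier`, `QuotedIdentifier` and `IdentifierEnd` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class IdentifierEnd { ClosingQuote, PrematureGreaterThan, EndOfFile };

struct QuotedIdentifier {
    std::u16string value;
    bool forceQuirks;
    IdentifierEnd end;
    std::size_t consumed; // characters consumed, including the terminator
};

// Hypothetical scanner for a quoted DOCTYPE identifier; `start` points just
// past the opening quote and `quote` is either u'"' or u'\''.
QuotedIdentifier scanQuotedIdentifier(const std::u16string& in, std::size_t start, char16_t quote)
{
    QuotedIdentifier r { u"", false, IdentifierEnd::EndOfFile, 0 };
    for (std::size_t i = start; i < in.size(); ++i) {
        char16_t c = in[i];
        if (c == quote) {
            r.end = IdentifierEnd::ClosingQuote;
            r.consumed = i + 1 - start;
            return r;
        }
        if (c == u'>') { // parse error: token is emitted with force-quirks set
            r.forceQuirks = true;
            r.end = IdentifierEnd::PrematureGreaterThan;
            r.consumed = i + 1 - start;
            return r;
        }
        r.value += c; // the other quote character is ordinary data here
    }
    r.forceQuirks = true; // end of input inside the identifier
    r.consumed = in.size() - start;
    return r;
}
```

Because only the matching quote ends the identifier, a value such as `"it's"` keeps its apostrophe.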
 
-    HTML_BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '"') {
+    BEGIN_STATE(AfterDOCTYPEPublicIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '"') {
             parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '"') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
+    END_STATE()
+
+    BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '"') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPESystemKeywordState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
-        else if (cc == '"') {
-            parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
+    BEGIN_STATE(AfterDOCTYPESystemKeywordState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+        if (character == '"') {
             parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == '>') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
-        if (cc == '"') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(BeforeDOCTYPESystemIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+        if (character == '"') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) {
-        if (cc == '"')
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState)
+        if (character == '"')
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToSystemIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToSystemIdentifier(character);
+        ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) {
-        if (cc == '\'')
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState)
+        if (character == '\'')
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToSystemIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToSystemIdentifier(character);
+        ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPESystemIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(AfterDOCTYPESystemIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            parseError();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BogusDOCTYPEState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker)
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        HTML_ADVANCE_TO(BogusDOCTYPEState);
-    }
+    BEGIN_STATE(BogusDOCTYPEState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker)
+            return emitAndReconsumeInDataState();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionState) {
-        if (cc == ']')
-            HTML_ADVANCE_TO(CDATASectionRightSquareBracketState);
-        else if (cc == kEndOfFileMarker)
-            HTML_RECONSUME_IN(DataState);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionState)
+        if (character == ']')
+            ADVANCE_TO(CDATASectionRightSquareBracketState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(CDATASectionState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionRightSquareBracketState) {
-        if (cc == ']')
-            HTML_ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
-        else {
-            bufferASCIICharacter(']');
-            HTML_RECONSUME_IN(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionRightSquareBracketState)
+        if (character == ']')
+            ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
+        bufferASCIICharacter(']');
+        RECONSUME_IN(CDATASectionState);
+    END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) {
-        if (cc == '>')
-            HTML_ADVANCE_TO(DataState);
-        else {
-            bufferASCIICharacter(']');
-            bufferASCIICharacter(']');
-            HTML_RECONSUME_IN(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionDoubleRightSquareBracketState)
+        if (character == '>')
+            ADVANCE_TO(DataState);
+        bufferASCIICharacter(']');
+        bufferASCIICharacter(']');
+        RECONSUME_IN(CDATASectionState);
     END_STATE()
 
     }
@@ -1561,39 +1409,45 @@ String HTMLTokenizer::bufferedCharacters() const
 void HTMLTokenizer::updateStateFor(const AtomicString& tagName)
 {
     if (tagName == textareaTag || tagName == titleTag)
-        setState(HTMLTokenizer::RCDATAState);
+        m_state = RCDATAState;
     else if (tagName == plaintextTag)
-        setState(HTMLTokenizer::PLAINTEXTState);
+        m_state = PLAINTEXTState;
     else if (tagName == scriptTag)
-        setState(HTMLTokenizer::ScriptDataState);
+        m_state = ScriptDataState;
     else if (tagName == styleTag
         || tagName == iframeTag
         || tagName == xmpTag
         || (tagName == noembedTag && m_options.pluginsEnabled)
         || tagName == noframesTag
         || (tagName == noscriptTag && m_options.scriptEnabled))
-        setState(HTMLTokenizer::RAWTEXTState);
+        m_state = RAWTEXTState;
+}
+
+inline void HTMLTokenizer::appendToTemporaryBuffer(UChar character)
+{
+    ASSERT(isASCII(character));
+    m_temporaryBuffer.append(character);
 }
 
-inline bool HTMLTokenizer::temporaryBufferIs(const String& expectedString)
+inline bool HTMLTokenizer::temporaryBufferIs(const char* expectedString)
 {
     return vectorEqualsString(m_temporaryBuffer, expectedString);
 }
 
-inline void HTMLTokenizer::addToPossibleEndTag(LChar cc)
+inline void HTMLTokenizer::appendToPossibleEndTag(UChar character)
 {
-    ASSERT(isEndTagBufferingState(m_state));
-    m_bufferedEndTagName.append(cc);
+    ASSERT(isASCII(character));
+    m_bufferedEndTagName.append(character);
 }
 
-inline bool HTMLTokenizer::isAppropriateEndTag()
+inline bool HTMLTokenizer::isAppropriateEndTag() const
 {
     if (m_bufferedEndTagName.size() != m_appropriateEndTagName.size())
         return false;
 
-    size_t numCharacters = m_bufferedEndTagName.size();
+    unsigned size = m_bufferedEndTagName.size();
 
-    for (size_t i = 0; i < numCharacters; i++) {
+    for (unsigned i = 0; i < size; i++) {
         if (m_bufferedEndTagName[i] != m_appropriateEndTagName[i])
             return false;
     }
@@ -1603,7 +1457,6 @@ inline bool HTMLTokenizer::isAppropriateEndTag()
 
 inline void HTMLTokenizer::parseError()
 {
-    notImplemented();
 }
 
 }
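The rewritten DOCTYPE states above drop the `else if` chains, because every branch already leaves the state through `ADVANCE_TO`, `RECONSUME_IN`, or a `return`. The following is a minimal standalone sketch of that style, not WebKit code. `scanSystemIdentifier`, `SystemIdentifierResult`, and the local `SKETCH_ADVANCE_TO` macro are hypothetical. Running out of input stands in for `kEndOfFileMarker`, and the closing quote simply ends the scan instead of moving to an "after identifier" state.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch; not part of WebKit.
struct SystemIdentifierResult {
    std::string identifier;
    bool forceQuirks = false;
};

// Scans a quoted DOCTYPE system identifier, starting just after the SYSTEM
// keyword. Each state is a label, and every branch leaves the state with a
// jump or a return, so plain `if` statements suffice (no `else if` chains).
// Running out of input plays the role of kEndOfFileMarker.
SystemIdentifierResult scanSystemIdentifier(const std::string& input)
{
    SystemIdentifierResult result;
    std::size_t position = 0;
    char quote = 0;
    char character = 0;

// Hypothetical helper with the same shape as ADVANCE_TO: consume the current
// character and jump to the named state.
#define SKETCH_ADVANCE_TO(stateName) do { ++position; goto stateName; } while (false)

BeforeIdentifierState:
    if (position == input.size()) {
        result.forceQuirks = true; // End of input: parse error, force quirks.
        return result;
    }
    character = input[position];
    if (character == ' ' || character == '\t' || character == '\n' || character == '\f')
        SKETCH_ADVANCE_TO(BeforeIdentifierState);
    if (character == '"' || character == '\'') {
        quote = character;
        SKETCH_ADVANCE_TO(QuotedIdentifierState);
    }
    // '>' or any other character: parse error, force quirks. (The real
    // tokenizer emits on '>' and moves to BogusDOCTYPEState otherwise.)
    result.forceQuirks = true;
    return result;

QuotedIdentifierState:
    if (position == input.size()) {
        result.forceQuirks = true; // End of input inside the quotes.
        return result;
    }
    character = input[position];
    if (character == quote)
        return result; // Simplified: stop instead of entering an "after identifier" state.
    if (character == '>') {
        result.forceQuirks = true; // Premature '>': parse error, force quirks.
        return result;
    }
    result.identifier += character;
    SKETCH_ADVANCE_TO(QuotedIdentifierState);

#undef SKETCH_ADVANCE_TO
}
```

For example, `scanSystemIdentifier("  \"about:legacy-compat\"")` yields the identifier `about:legacy-compat` without forcing quirks, while `"abc>` yields `abc` with quirks forced.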
index 3d43568..fed2118 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
 #include "HTMLParserOptions.h"
 #include "HTMLToken.h"
 #include "InputStreamPreprocessor.h"
-#include "SegmentedString.h"
 
 namespace WebCore {
 
+class SegmentedString;
+
 class HTMLTokenizer {
-    WTF_MAKE_NONCOPYABLE(HTMLTokenizer);
-    WTF_MAKE_FAST_ALLOCATED;
 public:
-    explicit HTMLTokenizer(const HTMLParserOptions&);
-    ~HTMLTokenizer();
+    explicit HTMLTokenizer(const HTMLParserOptions& = HTMLParserOptions());
+
+    // If we can't parse a whole token, this returns null.
+    class TokenPtr;
+    TokenPtr nextToken(SegmentedString&);
+
+    // Returns a copy of any characters buffered internally by the tokenizer.
+    // The tokenizer buffers characters when searching for the </script> token that terminates a script element.
+    String bufferedCharacters() const;
+    size_t numberOfBufferedCharacters() const;
+
+    // Updates the tokenizer's state according to the given tag name. This is an approximation of how the tree
+    // builder would update the tokenizer's state. This method is useful for approximating HTML tokenization.
+    // To get exactly the correct tokenization, you need the real tree builder.
+    //
+    // The main failures in the approximation are as follows:
+    //
+    //  * The first set of character tokens emitted for a <pre> element might contain an extra leading newline.
+    //  * The replacement of U+0000 with U+FFFD will not be sensitive to the tree builder's insertion mode.
+    //  * CDATA sections in foreign content will be tokenized as bogus comments instead of as character tokens.
+    //
+    // This approximation is also the algorithm called for when parsing an HTML fragment.
+    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+    void updateStateFor(const AtomicString& tagName);
 
-    void reset();
+    void setForceNullCharacterReplacement(bool);
 
+    bool shouldAllowCDATA() const;
+    void setShouldAllowCDATA(bool);
+
+    bool isInDataState() const;
+
+    void setDataState();
+    void setPLAINTEXTState();
+    void setRAWTEXTState();
+    void setRCDATAState();
+    void setScriptDataState();
+
+    bool neverSkipNullCharacters() const;
+
+private:
     enum State {
         DataState,
         CharacterReferenceInDataState,
@@ -88,10 +123,7 @@ public:
         AfterAttributeValueQuotedState,
         SelfClosingStartTagState,
         BogusCommentState,
-        // The ContinueBogusCommentState is not in the HTML5 spec, but we use
-        // it internally to keep track of whether we've started the bogus
-        // comment token yet.
-        ContinueBogusCommentState,
+        ContinueBogusCommentState, // Not in the HTML spec, used internally to track whether we started the bogus comment token.
         MarkupDeclarationOpenState,
         CommentStartState,
         CommentStartDashState,
@@ -121,155 +153,197 @@ public:
         CDATASectionDoubleRightSquareBracketState,
     };
 
-    // This function returns true if it emits a token. Otherwise, callers
-    // must provide the same (in progress) token on the next call (unless
-    // they call reset() first).
-    bool nextToken(SegmentedString&, HTMLToken&);
+    bool processToken(SegmentedString&);
+    bool processEntity(SegmentedString&);
 
-    // Returns a copy of any characters buffered internally by the tokenizer.
-    // The tokenizer buffers characters when searching for the </script> token
-    // that terminates a script element.
-    String bufferedCharacters() const;
+    void parseError();
 
-    size_t numberOfBufferedCharacters() const
-    {
-        // Notice that we add 2 to the length of the m_temporaryBuffer to
-        // account for the "</" characters, which are effecitvely buffered in
-        // the tokenizer's state machine.
-        return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
-    }
+    void bufferASCIICharacter(UChar);
+    void bufferCharacter(UChar);
 
-    // Updates the tokenizer's state according to the given tag name. This is
-    // an approximation of how the tree builder would update the tokenizer's
-    // state. This method is useful for approximating HTML tokenization. To
-    // get exactly the correct tokenization, you need the real tree builder.
-    //
-    // The main failures in the approximation are as follows:
-    //
-    //  * The first set of character tokens emitted for a <pre> element might
-    //    contain an extra leading newline.
-    //  * The replacement of U+0000 with U+FFFD will not be sensitive to the
-    //    tree builder's insertion mode.
-    //  * CDATA sections in foreign content will be tokenized as bogus comments
-    //    instead of as character tokens.
-    //
-    void updateStateFor(const AtomicString& tagName);
+    bool emitAndResumeInDataState(SegmentedString&);
+    bool emitAndReconsumeInDataState();
+    bool emitEndOfFile(SegmentedString&);
 
-    bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
-    void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
+    // Return true if we will emit a character token before dealing with the buffered end tag.
+    void flushBufferedEndTag();
+    bool commitToPartialEndTag(SegmentedString&, UChar, State);
+    bool commitToCompleteEndTag(SegmentedString&);
 
-    bool shouldAllowCDATA() const { return m_shouldAllowCDATA; }
-    void setShouldAllowCDATA(bool value) { m_shouldAllowCDATA = value; }
+    void appendToTemporaryBuffer(UChar);
+    bool temporaryBufferIs(const char*);
 
-    State state() const { return m_state; }
-    void setState(State state) { m_state = state; }
+    // Sometimes we speculatively consume input characters and we don't know whether they represent
+    // end tags or RCDATA, etc. These functions help manage this state.
+    bool inEndTagBufferingState() const;
+    void appendToPossibleEndTag(UChar);
+    void saveEndTagNameIfNeeded();
+    bool isAppropriateEndTag() const;
 
-    inline bool shouldSkipNullCharacters() const
-    {
-        return !m_forceNullCharacterReplacement
-            && (m_state == HTMLTokenizer::DataState
-                || m_state == HTMLTokenizer::RCDATAState
-                || m_state == HTMLTokenizer::RAWTEXTState);
-    }
+    bool haveBufferedCharacterToken() const;
+
+    static bool isNullCharacterSkippingState(State);
+
+    State m_state { DataState };
+    bool m_forceNullCharacterReplacement { false };
+    bool m_shouldAllowCDATA { false };
+
+    mutable HTMLToken m_token;
+
+    // https://html.spec.whatwg.org/#additional-allowed-character
+    UChar m_additionalAllowedCharacter { 0 };
+
+    // https://html.spec.whatwg.org/#preprocessing-the-input-stream
+    InputStreamPreprocessor<HTMLTokenizer> m_preprocessor;
+
+    Vector<UChar, 32> m_appropriateEndTagName;
+
+    // https://html.spec.whatwg.org/#temporary-buffer
+    Vector<LChar, 32> m_temporaryBuffer;
+
+    // We occasionally want to emit both a character token and an end tag
+    // token (e.g., when lexing script). We buffer the name of the end tag
+    // token here so we remember it next time we re-enter the tokenizer.
+    Vector<LChar, 32> m_bufferedEndTagName;
+
+    const HTMLParserOptions m_options;
+};
+
+class HTMLTokenizer::TokenPtr {
+public:
+    TokenPtr();
+    ~TokenPtr();
+
+    TokenPtr(TokenPtr&&);
+    TokenPtr& operator=(TokenPtr&&) = delete;
+
+    void clear();
+
+    operator bool() const;
+
+    HTMLToken& operator*() const;
+    HTMLToken* operator->() const;
 
 private:
-    inline bool processEntity(SegmentedString&);
+    friend class HTMLTokenizer;
+    explicit TokenPtr(HTMLToken*);
 
-    inline void parseError();
+    HTMLToken* m_token { nullptr };
+};
 
-    void bufferASCIICharacter(UChar character)
-    {
-        ASSERT(character != kEndOfFileMarker);
-        ASSERT(isASCII(character));
-        m_token->appendToCharacter(static_cast<LChar>(character));
-    }
+inline HTMLTokenizer::TokenPtr::TokenPtr()
+{
+}
 
-    void bufferCharacter(UChar character)
-    {
-        ASSERT(character != kEndOfFileMarker);
-        m_token->appendToCharacter(character);
-    }
-    void bufferCharacter(char) = delete;
-    void bufferCharacter(LChar) = delete;
-
-    inline bool emitAndResumeIn(SegmentedString& source, State state)
-    {
-        saveEndTagNameIfNeeded();
-        m_state = state;
-        source.advanceAndUpdateLineNumber();
-        return true;
-    }
-    
-    inline bool emitAndReconsumeIn(SegmentedString&, State state)
-    {
-        saveEndTagNameIfNeeded();
-        m_state = state;
-        return true;
-    }
+inline HTMLTokenizer::TokenPtr::TokenPtr(HTMLToken* token)
+    : m_token(token)
+{
+}
 
-    inline bool emitEndOfFile(SegmentedString& source)
-    {
-        if (haveBufferedCharacterToken())
-            return true;
-        m_state = HTMLTokenizer::DataState;
-        source.advanceAndUpdateLineNumber();
+inline HTMLTokenizer::TokenPtr::~TokenPtr()
+{
+    if (m_token)
         m_token->clear();
-        m_token->makeEndOfFile();
-        return true;
+}
+
+inline HTMLTokenizer::TokenPtr::TokenPtr(TokenPtr&& other)
+    : m_token(other.m_token)
+{
+    other.m_token = nullptr;
+}
+
+inline void HTMLTokenizer::TokenPtr::clear()
+{
+    if (m_token) {
+        m_token->clear();
+        m_token = nullptr;
     }
+}
 
-    inline bool flushEmitAndResumeIn(SegmentedString&, State);
+inline HTMLTokenizer::TokenPtr::operator bool() const
+{
+    return m_token;
+}
 
-    // Return whether we need to emit a character token before dealing with
-    // the buffered end tag.
-    inline bool flushBufferedEndTag(SegmentedString&);
-    inline bool temporaryBufferIs(const String&);
+inline HTMLToken& HTMLTokenizer::TokenPtr::operator*() const
+{
+    ASSERT(m_token);
+    return *m_token;
+}
 
-    // Sometimes we speculatively consume input characters and we don't
-    // know whether they represent end tags or RCDATA, etc. These
-    // functions help manage these state.
-    inline void addToPossibleEndTag(LChar cc);
+inline HTMLToken* HTMLTokenizer::TokenPtr::operator->() const
+{
+    ASSERT(m_token);
+    return m_token;
+}
 
-    inline void saveEndTagNameIfNeeded()
-    {
-        ASSERT(m_token->type() != HTMLToken::Uninitialized);
-        if (m_token->type() == HTMLToken::StartTag)
-            m_appropriateEndTagName = m_token->name();
-    }
-    inline bool isAppropriateEndTag();
+inline HTMLTokenizer::TokenPtr HTMLTokenizer::nextToken(SegmentedString& source)
+{
+    return TokenPtr(processToken(source) ? &m_token : nullptr);
+}
 
+inline size_t HTMLTokenizer::numberOfBufferedCharacters() const
+{
+    // Notice that we add 2 to the length of the m_temporaryBuffer to
+    // account for the "</" characters, which are effectively buffered in
+    // the tokenizer's state machine.
+    return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
+}
 
-    inline bool haveBufferedCharacterToken()
-    {
-        return m_token->type() == HTMLToken::Character;
-    }
+inline void HTMLTokenizer::setForceNullCharacterReplacement(bool value)
+{
+    m_forceNullCharacterReplacement = value;
+}
 
-    State m_state;
-    bool m_forceNullCharacterReplacement;
-    bool m_shouldAllowCDATA;
+inline bool HTMLTokenizer::shouldAllowCDATA() const
+{
+    return m_shouldAllowCDATA;
+}
 
-    // m_token is owned by the caller. If nextToken is not on the stack,
-    // this member might be pointing to unallocated memory.
-    HTMLToken* m_token;
+inline void HTMLTokenizer::setShouldAllowCDATA(bool value)
+{
+    m_shouldAllowCDATA = value;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#additional-allowed-character
-    UChar m_additionalAllowedCharacter;
+inline bool HTMLTokenizer::isInDataState() const
+{
+    return m_state == DataState;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
-    InputStreamPreprocessor<HTMLTokenizer> m_inputStreamPreprocessor;
+inline void HTMLTokenizer::setDataState()
+{
+    m_state = DataState;
+}
 
-    Vector<UChar, 32> m_appropriateEndTagName;
+inline void HTMLTokenizer::setPLAINTEXTState()
+{
+    m_state = PLAINTEXTState;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
-    Vector<LChar, 32> m_temporaryBuffer;
+inline void HTMLTokenizer::setRAWTEXTState()
+{
+    m_state = RAWTEXTState;
+}
 
-    // We occationally want to emit both a character token and an end tag
-    // token (e.g., when lexing script). We buffer the name of the end tag
-    // token here so we remember it next time we re-enter the tokenizer.
-    Vector<LChar, 32> m_bufferedEndTagName;
+inline void HTMLTokenizer::setRCDATAState()
+{
+    m_state = RCDATAState;
+}
 
-    HTMLParserOptions m_options;
-};
+inline void HTMLTokenizer::setScriptDataState()
+{
+    m_state = ScriptDataState;
+}
+
+inline bool HTMLTokenizer::isNullCharacterSkippingState(State state)
+{
+    return state == DataState || state == RCDATAState || state == RAWTEXTState;
+}
+
+inline bool HTMLTokenizer::neverSkipNullCharacters() const
+{
+    return m_forceNullCharacterReplacement;
+}
 
 }
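`HTMLTokenizer::TokenPtr` replaces the caller-owned `HTMLToken&` out parameter: the tokenizer now owns a single token, and the handle returned by `nextToken()` clears that token when the handle is destroyed or `clear()` is called. Below is a reduced, standalone sketch of the same move-only handle idea. `Token`, `Tokenizer`, and the trivial `nextToken(const std::string&)` are hypothetical stand-ins, and unlike the class in the diff, this sketch marks `operator bool` as `explicit`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for HTMLToken: just enough state to show reuse.
struct Token {
    std::string data;
    void clear() { data.clear(); }
};

// Hypothetical tokenizer that owns exactly one token and lends it out.
class Tokenizer {
public:
    class TokenPtr;
    TokenPtr nextToken(const std::string& input);

private:
    Token m_token;
};

// Move-only handle: whoever holds it may read the token; when the handle dies
// (or clear() is called) the shared token is reset for the next nextToken().
class Tokenizer::TokenPtr {
public:
    TokenPtr() = default;
    TokenPtr(TokenPtr&& other) : m_token(std::exchange(other.m_token, nullptr)) { }
    TokenPtr& operator=(TokenPtr&&) = delete;
    ~TokenPtr() { if (m_token) m_token->clear(); }

    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { assert(m_token); return *m_token; }
    Token* operator->() const { assert(m_token); return m_token; }

private:
    friend class Tokenizer; // Only the tokenizer can create a non-null handle.
    explicit TokenPtr(Token* token) : m_token(token) { }

    Token* m_token { nullptr };
};

inline Tokenizer::TokenPtr Tokenizer::nextToken(const std::string& input)
{
    if (input.empty())
        return TokenPtr(); // Hypothetical "could not parse a whole token" case: null handle.
    m_token.data = input;
    return TokenPtr(&m_token);
}
```

Deleting move assignment, as the diff also does, keeps a handle from silently dropping one token reference in favor of another; a handle can only be moved into a fresh object.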
 
index eaca0eb..042ca0e 100644
@@ -695,7 +695,7 @@ void HTMLTreeBuilder::processStartTagForInBody(AtomicHTMLToken& token)
     if (token.name() == plaintextTag) {
         processFakePEndTagIfPInButtonScope();
         m_tree.insertHTMLElement(&token);
-        m_parser.tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
+        m_parser.tokenizer().setPLAINTEXTState();
         return;
     }
     if (token.name() == buttonTag) {
@@ -799,7 +799,7 @@ void HTMLTreeBuilder::processStartTagForInBody(AtomicHTMLToken& token)
     if (token.name() == textareaTag) {
         m_tree.insertHTMLElement(&token);
         m_shouldSkipLeadingNewline = true;
-        m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
+        m_parser.tokenizer().setRCDATAState();
         m_originalInsertionMode = m_insertionMode;
         m_framesetOk = false;
         m_insertionMode = InsertionMode::Text;
@@ -2137,8 +2137,8 @@ void HTMLTreeBuilder::processEndTag(AtomicHTMLToken& token)
             // self-closing script tag was encountered and pre-HTML5 parser
             // quirks are enabled. We must set the tokenizer's state to
             // DataState explicitly if the tokenizer didn't have a chance to.
-            ASSERT(m_parser.tokenizer().state() == HTMLTokenizer::DataState || m_options.usePreHTML5ParserQuirks);
-            m_parser.tokenizer().setState(HTMLTokenizer::DataState);
+            ASSERT(m_parser.tokenizer().isInDataState() || m_options.usePreHTML5ParserQuirks);
+            m_parser.tokenizer().setDataState();
             return;
         }
         m_tree.openElements().pop();
@@ -2739,7 +2739,7 @@ void HTMLTreeBuilder::processGenericRCDATAStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertHTMLElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
+    m_parser.tokenizer().setRCDATAState();
     m_originalInsertionMode = m_insertionMode;
     m_insertionMode = InsertionMode::Text;
 }
@@ -2748,7 +2748,7 @@ void HTMLTreeBuilder::processGenericRawTextStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertHTMLElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::RAWTEXTState);
+    m_parser.tokenizer().setRAWTEXTState();
     m_originalInsertionMode = m_insertionMode;
     m_insertionMode = InsertionMode::Text;
 }
@@ -2757,7 +2757,7 @@ void HTMLTreeBuilder::processScriptStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertScriptElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::ScriptDataState);
+    m_parser.tokenizer().setScriptDataState();
     m_originalInsertionMode = m_insertionMode;
 
     TextPosition position = m_parser.textPosition();
index ffd639a..6290c15 100644
@@ -40,7 +40,7 @@ template <typename Tokenizer>
 class InputStreamPreprocessor {
     WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
 public:
-    InputStreamPreprocessor(Tokenizer* tokenizer)
+    explicit InputStreamPreprocessor(Tokenizer& tokenizer)
         : m_tokenizer(tokenizer)
     {
         reset();
@@ -51,8 +51,11 @@ public:
     // Returns whether we succeeded in peeking at the next character.
     // The only way we can fail to peek is if there are no more
     // characters in |source| (after collapsing \r\n, etc).
-    ALWAYS_INLINE bool peek(SegmentedString& source)
+    ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
     {
+        if (source.isEmpty())
+            return false;
+
         m_nextInputCharacter = source.currentChar();
 
         // Every branch in this function is expensive, so we have a
@@ -64,16 +67,14 @@ public:
             m_skipNextNewLine = false;
             return true;
         }
-        return processNextInputCharacter(source);
+        return processNextInputCharacter(source, skipNullCharacters);
     }
 
     // Returns whether there are more characters in |source| after advancing.
-    ALWAYS_INLINE bool advance(SegmentedString& source)
+    ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
     {
         source.advanceAndUpdateLineNumber();
-        if (source.isEmpty())
-            return false;
-        return peek(source);
+        return peek(source, skipNullCharacters);
     }
 
     bool skipNextNewLine() const { return m_skipNextNewLine; }
@@ -85,7 +86,7 @@ public:
     }
 
 private:
-    bool processNextInputCharacter(SegmentedString& source)
+    bool processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
     {
     ProcessAgain:
         ASSERT(m_nextInputCharacter == source.currentChar());
@@ -107,7 +108,7 @@ private:
             // by the replacement character. We suspect this is a problem with the spec as doing
             // that filtering breaks surrogate pair handling and causes us not to match Minefield.
             if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
-                if (m_tokenizer->shouldSkipNullCharacters()) {
+                if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
                     source.advancePastNonNewline();
                     if (source.isEmpty())
                         return false;
@@ -125,7 +126,7 @@ private:
         return source.isClosed() && source.length() == 1;
     }
 
-    Tokenizer* m_tokenizer;
+    Tokenizer& m_tokenizer;
 
     // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
     UChar m_nextInputCharacter;
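With this change the tokenizer no longer exposes `shouldSkipNullCharacters()`. Instead, callers pass `skipNullCharacters` to `peek()`/`advance()`, and the preprocessor skips a U+0000 only when that flag is set and `neverSkipNullCharacters()` returns false; otherwise the null becomes the replacement character. A simplified sketch of just that decision follows. `peekNextInputCharacter` is hypothetical: it works on a `std::u16string` with an explicit index instead of a `SegmentedString`, and it ignores the `\r\n` folding and end-of-file-marker handling that the real `processNextInputCharacter` also performs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical, simplified model of the null-character decision in
// InputStreamPreprocessor::processNextInputCharacter.
// Returns the next input character at or after `position`, or std::nullopt if
// the input runs out. Skipped U+0000 characters advance `position`.
std::optional<char16_t> peekNextInputCharacter(const std::u16string& source, std::size_t& position,
    bool skipNullCharacters, bool neverSkipNullCharacters)
{
    constexpr char16_t replacementCharacter = 0xFFFD;
    while (position < source.size()) {
        char16_t character = source[position];
        if (character != u'\0')
            return character;
        // Skip only when the caller's state allows it and the tokenizer has not
        // forced null-character replacement.
        if (!skipNullCharacters || neverSkipNullCharacters)
            return replacementCharacter;
        ++position;
    }
    return std::nullopt;
}
```

In the tokenizer, `skipNullCharacters` would presumably be the result of `isNullCharacterSkippingState(m_state)`, which is true for DataState, RCDATAState, and RAWTEXTState.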
index 9df6996..5fa62a3 100644
@@ -61,7 +61,7 @@ void TextDocumentParser::insertFakePreElement()
 
     // Although Text Documents expose a "pre" element in their DOM, they
     // act like a <plaintext> tag, so we have to force plaintext mode.
-    tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
+    tokenizer().setPLAINTEXTState();
 
     m_haveInsertedFakePreElement = true;
 }
index 298c8c6..722021b 100644
@@ -566,7 +566,7 @@ bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest& request, con
 String XSSAuditor::decodedSnippetForName(const FilterTokenRequest& request)
 {
     // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
-    return fullyDecodeString(request.sourceTracker.sourceForToken(request.token), m_encoding).substring(0, request.token.name().size() + 1);
+    return fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1);
 }
 
 String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request, const HTMLToken::Attribute& attribute, AttributeKind treatment)
@@ -575,9 +575,9 @@ String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request,
     // for an input of |name="value"|, the snippet is |name="value|. For an
     // unquoted input of |name=value |, the snippet is |name=value|.
     // FIXME: We should grab one character before the name also.
-    unsigned start = attribute.nameRange.start;
-    unsigned end = attribute.valueRange.end;
-    String decodedSnippet = fullyDecodeString(request.sourceTracker.sourceForToken(request.token).substring(start, end - start), m_encoding);
+    unsigned start = attribute.startOffset;
+    unsigned end = attribute.endOffset;
+    String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
     decodedSnippet.truncate(kMaximumFragmentLengthTarget);
     if (treatment == SrcLikeAttribute) {
         int slashCount = 0;
@@ -630,7 +630,7 @@ String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request,
 
 String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
 {
-    String string = request.sourceTracker.sourceForToken(request.token);
+    String string = request.sourceTracker.source(request.token);
     size_t startPosition = 0;
     size_t endPosition = string.length();
     size_t foundPosition = notFound;
@@ -737,12 +737,4 @@ bool XSSAuditor::isLikelySafeResource(const String& url)
     return (m_documentURL.host() == resourceURL.host() && resourceURL.query().isEmpty());
 }
 
-bool XSSAuditor::isSafeToSendToAnotherThread() const
-{
-    return m_documentURL.isSafeToSendToAnotherThread()
-        && m_decodedURL.isSafeToSendToAnotherThread()
-        && m_decodedHTTPBody.isSafeToSendToAnotherThread()
-        && m_cachedDecodedSnippet.isSafeToSendToAnotherThread();
-}
-
 } // namespace WebCore
index 2c541a2..34b6a8f 100644
@@ -61,7 +61,6 @@ public:
     void initForFragment();
 
     std::unique_ptr<XSSInfo> filterToken(const FilterTokenRequest&);
-    bool isSafeToSendToAnotherThread() const;
 
 private:
     static const size_t kMaximumFragmentLengthTarget = 100;
index 56ce08a..0cb19e8 100644
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2011, 2013 Google Inc.  All rights reserved.
- * Copyright (C) 2014 Apple Inc.  All rights reserved.
+ * Copyright (C) 2014-2015 Apple Inc.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
 
 namespace WebCore {
 
-#define WEBVTT_BEGIN_STATE(stateName) case stateName: stateName:
-#define WEBVTT_ADVANCE_TO(stateName)                               \
-    do {                                                           \
-        state = stateName;                                         \
-        ASSERT(!m_input.isEmpty());                                \
-        m_inputStreamPreprocessor.advance(m_input);                \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();       \
-        goto stateName;                                            \
+#define WEBVTT_ADVANCE_TO(stateName)                        \
+    do {                                                    \
+        ASSERT(!m_input.isEmpty());                         \
+        m_preprocessor.advance(m_input);                    \
+        character = m_preprocessor.nextInputCharacter();    \
+        goto stateName;                                     \
     } while (false)
-
     
-template<unsigned charactersCount>
-ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
+template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
 {
     return WTF::equal(s, reinterpret_cast<const LChar*>(characters), charactersCount - 1);
 }
@@ -79,7 +75,7 @@ inline bool advanceAndEmitToken(SegmentedString& source, WebVTTToken& resultToke
 
 WebVTTTokenizer::WebVTTTokenizer(const String& input)
     : m_input(input)
-    , m_inputStreamPreprocessor(this)
+    , m_preprocessor(*this)
 {
     // Append an EOF marker and close the input "stream".
     ASSERT(!m_input.isClosed());
@@ -89,12 +85,12 @@ WebVTTTokenizer::WebVTTTokenizer(const String& input)
 
 bool WebVTTTokenizer::nextToken(WebVTTToken& token)
 {
-    if (m_input.isEmpty() || !m_inputStreamPreprocessor.peek(m_input))
+    if (m_input.isEmpty() || !m_preprocessor.peek(m_input))
         return false;
 
-    UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
-    if (cc == kEndOfFileMarker) {
-        m_inputStreamPreprocessor.advance(m_input);
+    UChar character = m_preprocessor.nextInputCharacter();
+    if (character == kEndOfFileMarker) {
+        m_preprocessor.advance(m_input);
         return false;
     }
 
@@ -102,169 +98,134 @@ bool WebVTTTokenizer::nextToken(WebVTTToken& token)
     StringBuilder result;
     StringBuilder classes;
 
-    enum {
-        DataState,
-        EscapeState,
-        TagState,
-        StartTagState,
-        StartTagClassState,
-        StartTagAnnotationState,
-        EndTagState,
-        TimestampTagState,
-    } state = DataState;
-
-    // 4.8.10.13.4 WebVTT cue text tokenizer
-    switch (state) {
-    WEBVTT_BEGIN_STATE(DataState) {
-        if (cc == '&') {
-            buffer.append(static_cast<LChar>(cc));
-            WEBVTT_ADVANCE_TO(EscapeState);
-        } else if (cc == '<') {
-            if (result.isEmpty())
-                WEBVTT_ADVANCE_TO(TagState);
-            else {
-                // We don't want to advance input or perform a state transition - just return a (new) token.
-                // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
-                return emitToken(token, WebVTTToken::StringToken(result.toString()));
-            }
-        } else if (cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+// 4.8.10.13.4 WebVTT cue text tokenizer
+DataState:
+    if (character == '&') {
+        buffer.append('&');
+        WEBVTT_ADVANCE_TO(EscapeState);
+    } else if (character == '<') {
+        if (result.isEmpty())
+            WEBVTT_ADVANCE_TO(TagState);
         else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(EscapeState) {
-        if (cc == ';') {
-            if (equalLiteral(buffer, "&amp"))
-                result.append('&');
-            else if (equalLiteral(buffer, "&lt"))
-                result.append('<');
-            else if (equalLiteral(buffer, "&gt"))
-                result.append('>');
-            else if (equalLiteral(buffer, "&lrm"))
-                result.append(leftToRightMark);
-            else if (equalLiteral(buffer, "&rlm"))
-                result.append(rightToLeftMark);
-            else if (equalLiteral(buffer, "&nbsp"))
-                result.append(noBreakSpace);
-            else {
-                buffer.append(static_cast<LChar>(cc));
-                result.append(buffer);
-            }
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(DataState);
-        } else if (isASCIIAlphanumeric(cc)) {
-            buffer.append(static_cast<LChar>(cc));
-            WEBVTT_ADVANCE_TO(EscapeState);
-        } else if (cc == '<') {
-            result.append(buffer);
+            // We don't want to advance input or perform a state transition - just return a (new) token.
+            // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
             return emitToken(token, WebVTTToken::StringToken(result.toString()));
-        } else if (cc == kEndOfFileMarker) {
-            result.append(buffer);
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
-        } else {
-            result.append(buffer);
-            buffer.clear();
-
-            if (cc == '&') {
-                buffer.append(static_cast<LChar>(cc));
-                WEBVTT_ADVANCE_TO(EscapeState);
-            }
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(TagState) {
-        if (isTokenizerWhitespace(cc)) {
-            ASSERT(result.isEmpty());
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        } else if (cc == '.') {
-            ASSERT(result.isEmpty());
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        } else if (cc == '/') {
-            WEBVTT_ADVANCE_TO(EndTagState);
-        } else if (WTF::isASCIIDigit(cc)) {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(TimestampTagState);
-        } else if (cc == '>' || cc == kEndOfFileMarker) {
-            ASSERT(result.isEmpty());
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
-        } else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagState);
         }
+    } else if (character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+    else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(DataState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(StartTagState) {
-        if (isTokenizerWhitespace(cc))
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        else if (cc == '.')
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        else if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+EscapeState:
+    if (character == ';') {
+        if (equalLiteral(buffer, "&amp"))
+            result.append('&');
+        else if (equalLiteral(buffer, "&lt"))
+            result.append('<');
+        else if (equalLiteral(buffer, "&gt"))
+            result.append('>');
+        else if (equalLiteral(buffer, "&lrm"))
+            result.append(leftToRightMark);
+        else if (equalLiteral(buffer, "&rlm"))
+            result.append(rightToLeftMark);
+        else if (equalLiteral(buffer, "&nbsp"))
+            result.append(noBreakSpace);
         else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagState);
+            buffer.append(character);
+            result.append(buffer);
         }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(StartTagClassState) {
-        if (isTokenizerWhitespace(cc)) {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        } else if (cc == '.') {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        } else if (cc == '>' || cc == kEndOfFileMarker) {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
-        } else {
-            buffer.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagClassState);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(DataState);
+    } else if (isASCIIAlphanumeric(character)) {
+        buffer.append(character);
+        WEBVTT_ADVANCE_TO(EscapeState);
+    } else if (character == '<') {
+        result.append(buffer);
+        return emitToken(token, WebVTTToken::StringToken(result.toString()));
+    } else if (character == kEndOfFileMarker) {
+        result.append(buffer);
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+    } else {
+        result.append(buffer);
+        buffer.clear();
+
+        if (character == '&') {
+            buffer.append('&');
+            WEBVTT_ADVANCE_TO(EscapeState);
         }
-
+        result.append(character);
+        WEBVTT_ADVANCE_TO(DataState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(StartTagAnnotationState) {
-        if (cc == '>' || cc == kEndOfFileMarker) {
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
-        }
-        buffer.append(cc);
+TagState:
+    if (isTokenizerWhitespace(character)) {
+        ASSERT(result.isEmpty());
         WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-    }
-    END_STATE()
-    
-    WEBVTT_BEGIN_STATE(EndTagState) {
-        if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
-        result.append(cc);
+    } else if (character == '.') {
+        ASSERT(result.isEmpty());
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    } else if (character == '/') {
         WEBVTT_ADVANCE_TO(EndTagState);
+    } else if (WTF::isASCIIDigit(character)) {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(TimestampTagState);
+    } else if (character == '>' || character == kEndOfFileMarker) {
+        ASSERT(result.isEmpty());
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+    } else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(StartTagState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(TimestampTagState) {
-        if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
-        result.append(cc);
-        WEBVTT_ADVANCE_TO(TimestampTagState);
+StartTagState:
+    if (isTokenizerWhitespace(character))
+        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+    else if (character == '.')
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    else if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+    else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(StartTagState);
     }
-    END_STATE()
 
+StartTagClassState:
+    if (isTokenizerWhitespace(character)) {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+    } else if (character == '.') {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    } else if (character == '>' || character == kEndOfFileMarker) {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
+    } else {
+        buffer.append(character);
+        WEBVTT_ADVANCE_TO(StartTagClassState);
     }
 
-    ASSERT_NOT_REACHED();
-    return false;
+StartTagAnnotationState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
+    buffer.append(character);
+    WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+
+EndTagState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
+    result.append(character);
+    WEBVTT_ADVANCE_TO(EndTagState);
+
+TimestampTagState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
+    result.append(character);
+    WEBVTT_ADVANCE_TO(TimestampTagState);
 }
 
 }
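The rewritten `WebVTTTokenizer::nextToken` removes the `switch` and the `WEBVTT_BEGIN_STATE`/`END_STATE` macros. Each state becomes a plain label, and every transition is a `goto`. The following is a minimal sketch of that shape, not the WebKit code. It assumes a `std::string` input in place of `SegmentedString` and uses `'\0'` as a stand-in for `kEndOfFileMarker`. It covers only the DataState/EscapeState entity handling for `&amp;`, `&lt;` and `&gt;`. The function name `decodeCueText` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch: label-per-state tokenizer loop in the style of the new
// WebVTTTokenizer::nextToken. Only &amp; &lt; &gt; are recognized here; tags
// and the &lrm; &rlm; &nbsp; escapes are intentionally left out.
std::string decodeCueText(const std::string& input)
{
    std::string result;
    std::string buffer; // Pending "&name" sequence while in EscapeState.
    std::size_t position = 0;
    // '\0' plays the role of kEndOfFileMarker in this sketch.
    auto next = [&]() -> char { return position < input.size() ? input[position++] : '\0'; };
    char character = next();

DataState:
    if (character == '&') {
        buffer += '&';
        character = next();
        goto EscapeState;
    }
    if (character == '\0')
        return result;
    result += character;
    character = next();
    goto DataState;

EscapeState:
    if (character == ';') {
        if (buffer == "&amp")
            result += '&';
        else if (buffer == "&lt")
            result += '<';
        else if (buffer == "&gt")
            result += '>';
        else
            result += buffer + ';'; // Unknown escape: keep it verbatim.
        buffer.clear();
        character = next();
        goto DataState;
    }
    if (std::isalnum(static_cast<unsigned char>(character))) {
        buffer += character;
        character = next();
        goto EscapeState;
    }
    // Anything else ends the escape. Flush the buffered text, then reconsume
    // the current character in DataState (which also handles '&' and EOF).
    result += buffer;
    buffer.clear();
    goto DataState;
}
```

All locals are declared before the first label. As a result, no `goto` jumps over an initialization, which C++ would reject.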
index a97797e..6c1dda3 100644
 namespace WebCore {
 
 class WebVTTTokenizer {
-    WTF_MAKE_NONCOPYABLE(WebVTTTokenizer);
 public:
     explicit WebVTTTokenizer(const String&);
-
     bool nextToken(WebVTTToken&);
 
-    inline bool shouldSkipNullCharacters() const { return true; }
+    static bool neverSkipNullCharacters() { return false; }
 
 private:
     SegmentedString m_input;
-
-    // ://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
-    InputStreamPreprocessor<WebVTTTokenizer> m_inputStreamPreprocessor;
+    InputStreamPreprocessor<WebVTTTokenizer> m_preprocessor;
 };
 
 }
index b0dc3d3..491d16c 100644
@@ -20,6 +20,8 @@
 #include "config.h"
 #include "SegmentedString.h"
 
+#include <wtf/text/TextPosition.h>
+
 namespace WebCore {
 
 SegmentedString::SegmentedString(const SegmentedString& other)
@@ -44,7 +46,7 @@ SegmentedString::SegmentedString(const SegmentedString& other)
         m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
 }
 
-const SegmentedString& SegmentedString::operator=(const SegmentedString& other)
+SegmentedString& SegmentedString::operator=(const SegmentedString& other)
 {
     m_pushedChar1 = other.m_pushedChar1;
     m_pushedChar2 = other.m_pushedChar2;
@@ -130,14 +132,14 @@ void SegmentedString::append(const SegmentedSubstring& s)
     m_empty = false;
 }
 
-void SegmentedString::prepend(const SegmentedSubstring& s)
+void SegmentedString::pushBack(const SegmentedSubstring& s)
 {
-    ASSERT(!escaped());
+    ASSERT(!m_pushedChar1);
     ASSERT(!s.numberOfCharactersConsumed());
     if (!s.m_length)
         return;
 
-    // FIXME: We're assuming that the prepend were originally consumed by
+    // FIXME: We're assuming that the characters were originally consumed by
     //        this SegmentedString.  We're also ASSERTing that s is a fresh
     //        SegmentedSubstring.  These assumptions are sufficient for our
     //        current use, but we might need to handle the more elaborate
@@ -166,7 +168,7 @@ void SegmentedString::close()
 void SegmentedString::append(const SegmentedString& s)
 {
     ASSERT(!m_closed);
-    ASSERT(!s.escaped());
+    ASSERT(!s.m_pushedChar1);
     append(s.m_currentString);
     if (s.isComposite()) {
         Deque<SegmentedSubstring>::const_iterator it = s.m_substrings.begin();
@@ -177,17 +179,17 @@ void SegmentedString::append(const SegmentedString& s)
     m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
 }
 
-void SegmentedString::prepend(const SegmentedString& s)
+void SegmentedString::pushBack(const SegmentedString& s)
 {
-    ASSERT(!escaped());
-    ASSERT(!s.escaped());
+    ASSERT(!m_pushedChar1);
+    ASSERT(!s.m_pushedChar1);
     if (s.isComposite()) {
         Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin();
         Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend();
         for (; it != e; ++it)
-            prepend(*it);
+            pushBack(*it);
     }
-    prepend(s.m_currentString);
+    pushBack(s.m_currentString);
     m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
 }
 
@@ -228,12 +230,12 @@ String SegmentedString::toString() const
     return result.toString();
 }
 
-void SegmentedString::advance(unsigned count, UChar* consumedCharacters)
+void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters)
 {
     ASSERT_WITH_SECURITY_IMPLICATION(count <= length());
     for (unsigned i = 0; i < count; ++i) {
         consumedCharacters[i] = currentChar();
-        advance();
+        advancePastNonNewline();
     }
 }
 
@@ -353,8 +355,7 @@ OrdinalNumber SegmentedString::currentLine() const
 
 OrdinalNumber SegmentedString::currentColumn() const
 {
-    int zeroBasedColumn = numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine;
-    return OrdinalNumber::fromZeroBasedInt(zeroBasedColumn);
+    return OrdinalNumber::fromZeroBasedInt(numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine);
 }
 
 void SegmentedString::setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength)
@@ -363,4 +364,18 @@ void SegmentedString::setCurrentPosition(OrdinalNumber line, OrdinalNumber colum
     m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + prologLength - columnAftreProlog.zeroBasedInt();
 }
 
+SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive)
+{
+    unsigned length = strlen(literal);
+    if (length > this->length())
+        return NotEnoughCharacters;
+    UChar* consumedCharacters;
+    String consumedString = String::createUninitialized(length, consumedCharacters);
+    advancePastNonNewlines(length, consumedCharacters);
+    if (consumedString.startsWith(literal, caseSensitive))
+        return DidMatch;
+    pushBack(SegmentedString(consumedString));
+    return DidNotMatch;
+}
+
 }
index d5fe367..0813d60 100644
@@ -1,5 +1,5 @@
 /*
-    Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
+    Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved.
 
     This library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Library General Public
@@ -22,8 +22,6 @@
 
 #include <wtf/Deque.h>
 #include <wtf/text/StringBuilder.h>
-#include <wtf/text/TextPosition.h>
-#include <wtf/text/WTFString.h>
 
 namespace WebCore {
 
@@ -170,16 +168,14 @@ public:
     }
 
     SegmentedString(const SegmentedString&);
-
-    const SegmentedString& operator=(const SegmentedString&);
+    SegmentedString& operator=(const SegmentedString&);
 
     void clear();
     void close();
 
     void append(const SegmentedString&);
-    void prepend(const SegmentedString&);
+    void pushBack(const SegmentedString&);
 
-    bool excludeLineNumbers() const { return m_currentString.excludeLineNumbers(); }
     void setExcludeLineNumbers();
 
     void push(UChar c)
@@ -199,14 +195,9 @@ public:
 
     bool isClosed() const { return m_closed; }
 
-    enum LookAheadResult {
-        DidNotMatch,
-        DidMatch,
-        NotEnoughCharacters,
-    };
-
-    LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); }
-    LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); }
+    enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };
+    template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); }
+    template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); }
 
     void advance()
     {
@@ -226,7 +217,7 @@ public:
         (this->*m_advanceFunc)();
     }
 
-    inline void advanceAndUpdateLineNumber()
+    void advanceAndUpdateLineNumber()
     {
         if (m_fastPathFlags & Use8BitAdvance) {
             ASSERT(!m_pushedChar1);
@@ -253,18 +244,6 @@ public:
         (this->*m_advanceAndUpdateLineNumberFunc)();
     }
 
-    void advanceAndASSERT(UChar expectedCharacter)
-    {
-        ASSERT_UNUSED(expectedCharacter, currentChar() == expectedCharacter);
-        advance();
-    }
-
-    void advanceAndASSERTIgnoringCase(UChar expectedCharacter)
-    {
-        ASSERT_UNUSED(expectedCharacter, u_foldCase(currentChar(), U_FOLD_CASE_DEFAULT) == u_foldCase(expectedCharacter, U_FOLD_CASE_DEFAULT));
-        advance();
-    }
-
     void advancePastNonNewline()
     {
         ASSERT(currentChar() != '\n');
@@ -286,12 +265,6 @@ public:
         advanceAndUpdateLineNumberSlowCase();
     }
 
-    // Writes the consumed characters into consumedCharacters, which must
-    // have space for at least |count| characters.
-    void advance(unsigned count, UChar* consumedCharacters);
-
-    bool escaped() const { return m_pushedChar1; }
-
     int numberOfCharactersConsumed() const
     {
         int numberOfPushedCharacters = 0;
@@ -307,12 +280,12 @@ public:
 
     UChar currentChar() const { return m_currentChar; }    
 
-    // The method is moderately slow, comparing to currentLine method.
     OrdinalNumber currentColumn() const;
     OrdinalNumber currentLine() const;
-    // Sets value of line/column variables. Column is specified indirectly by a parameter columnAftreProlog
+
+    // Sets value of line/column variables. Column is specified indirectly by a parameter columnAfterProlog
     // which is a value of column that we should get after a prolog (first prologLength characters) has been consumed.
-    void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength);
+    void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAfterProlog, int prologLength);
 
 private:
     enum FastPathFlags {
@@ -322,7 +295,7 @@ private:
     };
 
     void append(const SegmentedSubstring&);
-    void prepend(const SegmentedSubstring&);
+    void pushBack(const SegmentedSubstring&);
 
     void advance8();
     void advance16();
@@ -374,31 +347,12 @@ private:
         updateSlowCaseFunctionPointers();
     }
 
-    inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive)
-    {
-        if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) {
-            String currentSubstring = m_currentString.currentSubString(string.length());
-            if (currentSubstring.startsWith(string, caseSensitive))
-                return DidMatch;
-            return DidNotMatch;
-        }
-        return lookAheadSlowCase(string, caseSensitive);
-    }
-    
-    LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive)
-    {
-        unsigned count = string.length();
-        if (count > length())
-            return NotEnoughCharacters;
-        UChar* consumedCharacters;
-        String consumedString = String::createUninitialized(count, consumedCharacters);
-        advance(count, consumedCharacters);
-        LookAheadResult result = DidNotMatch;
-        if (consumedString.startsWith(string, caseSensitive))
-            result = DidMatch;
-        prepend(SegmentedString(consumedString));
-        return result;
-    }
+    // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters.
+    void advancePastNonNewlines(unsigned count);
+    void advancePastNonNewlines(unsigned count, UChar* consumedCharacters);
+
+    AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive);
+    AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive);
 
     bool isComposite() const { return !m_substrings.isEmpty(); }
 
@@ -417,6 +371,27 @@ private:
     void (SegmentedString::*m_advanceAndUpdateLineNumberFunc)();
 };
 
+inline void SegmentedString::advancePastNonNewlines(unsigned count)
+{
+    for (unsigned i = 0; i < count; ++i)
+        advancePastNonNewline();
+}
+
+inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive)
+{
+    ASSERT(strlen(literal) == length);
+    ASSERT(!strchr(literal, '\n'));
+    if (!m_pushedChar1) {
+        if (length <= static_cast<unsigned>(m_currentString.m_length)) {
+            if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive))
+                return DidNotMatch;
+            advancePastNonNewlines(length);
+            return DidMatch;
+        }
+    }
+    return advancePastSlowCase(literal, caseSensitive);
+}
+
 }
 
 #endif
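The new `advancePast` API replaces `lookAhead` followed by `advanceAndASSERT`. A successful match consumes the literal. A mismatch leaves the input as it was, because the slow path pushes the consumed characters back. `NotEnoughCharacters` is returned when fewer characters are available than the literal needs. The sketch below models those three outcomes on a hypothetical `CharacterQueue` built on `std::deque<char>`. It is not `SegmentedString`, and its case folding is ASCII-only. `String::startsWith` with `caseSensitive == false` folds more broadly.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>

// Hypothetical stand-in for SegmentedString, modeling advancePast/pushBack only.
class CharacterQueue {
public:
    enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };

    explicit CharacterQueue(const std::string& text)
        : m_characters(text.begin(), text.end())
    {
    }

    AdvancePastResult advancePast(const char* literal, bool caseSensitive)
    {
        std::size_t length = std::strlen(literal);
        if (length > m_characters.size())
            return NotEnoughCharacters;
        // Consume the candidate characters, as advancePastSlowCase does.
        std::string consumed(m_characters.begin(), m_characters.begin() + length);
        m_characters.erase(m_characters.begin(), m_characters.begin() + length);
        if (matches(consumed, literal, caseSensitive))
            return DidMatch;
        // Mismatch: restore the consumed characters so the input is unchanged.
        pushBack(consumed);
        return DidNotMatch;
    }

    void pushBack(const std::string& text) { m_characters.insert(m_characters.begin(), text.begin(), text.end()); }
    std::string remaining() const { return std::string(m_characters.begin(), m_characters.end()); }

private:
    // ASCII-only comparison; an assumption of this sketch.
    static bool matches(const std::string& candidate, const char* literal, bool caseSensitive)
    {
        for (std::size_t i = 0; i < candidate.size(); ++i) {
            unsigned char a = candidate[i];
            unsigned char b = literal[i];
            if (caseSensitive ? a != b : std::tolower(a) != std::tolower(b))
                return false;
        }
        return true;
    }

    std::deque<char> m_characters;
};
```

For example, `CharacterQueue("DOCTYPE html").advancePast("doctype", false)` yields `DidMatch` and leaves `" html"`. The inline fast path in the header reaches the same result without consuming on a mismatch. It compares a substring first and advances only when the literal matches.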
index 681eb33..23e0c02 100644
 
 namespace WebCore {
 
-inline bool isHexDigit(UChar cc)
-{
-    return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f') || (cc >= 'A' && cc <= 'F');
-}
-
 inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters)
 {
-    if (consumedCharacters.length() == 1)
-        source.push(consumedCharacters[0]);
-    else if (consumedCharacters.length() == 2) {
-        source.push(consumedCharacters[0]);
-        source.push(consumedCharacters[1]);
-    } else
-        source.prepend(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
+    source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
 }
 
 template <typename ParserFunctions>
@@ -54,7 +43,7 @@ bool consumeCharacterReference(SegmentedString& source, StringBuilder& decodedCh
     ASSERT(!notEnoughCharacters);
     ASSERT(decodedCharacter.isEmpty());
     
-    enum EntityState {
+    enum {
         Initial,
         Number,
         MaybeHexLowerCaseX,
@@ -62,111 +51,102 @@ bool consumeCharacterReference(SegmentedString& source, StringBuilder& decodedCh
         Hex,
         Decimal,
         Named
-    };
-    EntityState entityState = Initial;
+    } state = Initial;
     UChar32 result = 0;
     bool overflow = false;
     const UChar32 highestValidCharacter = 0x10FFFF;
     StringBuilder consumedCharacters;
     
     while (!source.isEmpty()) {
-        UChar cc = source.currentChar();
-        switch (entityState) {
-        case Initial: {
-            if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&')
+        UChar character = source.currentChar();
+        switch (state) {
+        case Initial:
+            if (character == '\x09' || character == '\x0A' || character == '\x0C' || character == ' ' || character == '<' || character == '&')
                 return false;
-            if (additionalAllowedCharacter && cc == additionalAllowedCharacter)
+            if (additionalAllowedCharacter && character == additionalAllowedCharacter)
                 return false;
-            if (cc == '#') {
-                entityState = Number;
+            if (character == '#') {
+                state = Number;
                 break;
             }
-            if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) {
-                entityState = Named;
-                continue;
+            if (isASCIIAlpha(character)) {
+                state = Named;
+                goto Named;
             }
             return false;
-        }
-        case Number: {
-            if (cc == 'x') {
-                entityState = MaybeHexLowerCaseX;
+        case Number:
+            if (character == 'x') {
+                state = MaybeHexLowerCaseX;
                 break;
             }
-            if (cc == 'X') {
-                entityState = MaybeHexUpperCaseX;
+            if (character == 'X') {
+                state = MaybeHexUpperCaseX;
                 break;
             }
-            if (cc >= '0' && cc <= '9') {
-                entityState = Decimal;
-                continue;
+            if (isASCIIDigit(character)) {
+                state = Decimal;
+                goto Decimal;
             }
-            source.push('#');
+            source.pushBack(SegmentedString(ASCIILiteral("#")));
             return false;
-        }
-        case MaybeHexLowerCaseX: {
-            if (isHexDigit(cc)) {
-                entityState = Hex;
-                continue;
+        case MaybeHexLowerCaseX:
+            if (isASCIIHexDigit(character)) {
+                state = Hex;
+                goto Hex;
             }
-            source.push('#');
-            source.push('x');
+            source.pushBack(SegmentedString(ASCIILiteral("#x")));
             return false;
-        }
-        case MaybeHexUpperCaseX: {
-            if (isHexDigit(cc)) {
-                entityState = Hex;
-                continue;
+        case MaybeHexUpperCaseX:
+            if (isASCIIHexDigit(character)) {
+                state = Hex;
+                goto Hex;
             }
-            source.push('#');
-            source.push('X');
+            source.pushBack(SegmentedString(ASCIILiteral("#X")));
             return false;
-        }
-        case Hex: {
-            if (cc >= '0' && cc <= '9')
-                result = result * 16 + cc - '0';
-            else if (cc >= 'a' && cc <= 'f')
-                result = result * 16 + 10 + cc - 'a';
-            else if (cc >= 'A' && cc <= 'F')
-                result = result * 16 + 10 + cc - 'A';
-            else if (cc == ';') {
-                source.advanceAndASSERT(cc);
+        case Hex:
+        Hex:
+            if (isASCIIHexDigit(character)) {
+                result = result * 16 + toASCIIHexValue(character);
+                if (result > highestValidCharacter)
+                    overflow = true;
+                break;
+            }
+            if (character == ';') {
+                source.advance();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else if (ParserFunctions::acceptMalformed()) {
+            }
+            if (ParserFunctions::acceptMalformed()) {
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else {
-                unconsumeCharacters(source, consumedCharacters);
-                return false;
             }
-            if (result > highestValidCharacter)
-                overflow = true;
-            break;
-        }
-        case Decimal: {
-            if (cc >= '0' && cc <= '9')
-                result = result * 10 + cc - '0';
-            else if (cc == ';') {
-                source.advanceAndASSERT(cc);
+            unconsumeCharacters(source, consumedCharacters);
+            return false;
+        case Decimal:
+        Decimal:
+            if (isASCIIDigit(character)) {
+                result = result * 10 + character - '0';
+                if (result > highestValidCharacter)
+                    overflow = true;
+                break;
+            }
+            if (character == ';') {
+                source.advance();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else if (ParserFunctions::acceptMalformed()) {
+            }
+            if (ParserFunctions::acceptMalformed()) {
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else {
-                unconsumeCharacters(source, consumedCharacters);
-                return false;
             }
-            if (result > highestValidCharacter)
-                overflow = true;
-            break;
-        }
-        case Named: {
-            return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, cc);
-        }
+            unconsumeCharacters(source, consumedCharacters);
+            return false;
+        case Named:
+        Named:
+            return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, character);
         }
-        consumedCharacters.append(cc);
-        source.advanceAndASSERT(cc);
+        consumedCharacters.append(character);
+        source.advance();
     }
     ASSERT(source.isEmpty());
     notEnoughCharacters = true;
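The Hex and Decimal states now accumulate digits with `isASCIIHexDigit`/`toASCIIHexValue` and `isASCIIDigit`. They set `overflow` once the value exceeds `0x10FFFF`, and an overflowed reference is decoded as if its value were 0. The sketch below reproduces only that numeric path. It uses a hypothetical `decodeNumericReference` that takes the text after `&` and returns -1 where the real parser would unconsume and fail. Unlike the original loop, it stops accumulating after overflow to avoid wrapping. It does not model `acceptMalformed()` or the final `legalEntityFor` mapping.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch of the numeric states of consumeCharacterReference.
// text is what follows '&', e.g. "#x41;" or "#65;".
// Returns the code point, 0 on overflow, or -1 if not a complete reference.
long decodeNumericReference(const std::string& text)
{
    const std::uint32_t highestValidCharacter = 0x10FFFF;
    std::size_t i = 0;
    if (i >= text.size() || text[i] != '#')
        return -1;
    ++i;
    unsigned base = 10;
    if (i < text.size() && (text[i] == 'x' || text[i] == 'X')) {
        base = 16;
        ++i;
    }
    std::uint32_t result = 0;
    bool overflow = false;
    std::size_t digits = 0;
    for (; i < text.size(); ++i, ++digits) {
        unsigned char c = text[i];
        unsigned value;
        if (std::isdigit(c))
            value = c - '0';
        else if (base == 16 && std::isxdigit(c))
            value = std::tolower(c) - 'a' + 10;
        else
            break;
        if (!overflow) {
            result = result * base + value;
            if (result > highestValidCharacter)
                overflow = true; // Keep scanning digits, but stop accumulating.
        }
    }
    // Require at least one digit and a terminating ';' (acceptMalformed() is not modeled).
    if (!digits || i >= text.size() || text[i] != ';')
        return -1;
    return overflow ? 0 : static_cast<long>(result);
}
```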
index e0b3156..987510e 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
 
 #include "SegmentedString.h"
 
+#if COMPILER(MSVC)
+// Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro.
+#pragma warning(disable: 4702)
+#endif
+
 namespace WebCore {
 
-inline bool isTokenizerWhitespace(UChar cc)
+inline bool isTokenizerWhitespace(UChar character)
 {
-    return cc == ' ' || cc == '\x0A' || cc == '\x09' || cc == '\x0C';
+    return character == ' ' || character == '\x0A' || character == '\x09' || character == '\x0C';
 }
 
-inline void advanceStringAndASSERTIgnoringCase(SegmentedString& source, const char* expectedCharacters)
-{
-    while (*expectedCharacters)
-        source.advanceAndASSERTIgnoringCase(*expectedCharacters++);
-}
+#define BEGIN_STATE(stateName)                                  \
+    case stateName:                                             \
+    stateName: {                                                \
+        const auto currentState = stateName;                    \
+        UNUSED_PARAM(currentState);
 
-inline void advanceStringAndASSERT(SegmentedString& source, const char* expectedCharacters)
-{
-    while (*expectedCharacters)
-        source.advanceAndASSERT(*expectedCharacters++);
-}
-
-#if COMPILER(MSVC)
-// We need to disable the "unreachable code" warning because we want to assert
-// that some code points aren't reached in the state machine.
-#pragma warning(disable: 4702)
-#endif
+#define END_STATE()                                             \
+        ASSERT_NOT_REACHED();                                   \
+        break;                                                  \
+    }
 
-#define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName:
-#define END_STATE() ASSERT_NOT_REACHED(); break;
+#define RETURN_IN_CURRENT_STATE(expression)                     \
+    do {                                                        \
+        m_state = currentState;                                 \
+        return expression;                                      \
+    } while (false)
 
-// We use this macro when the HTML5 spec says "reconsume the current input
-// character in the <mumble> state."
-#define RECONSUME_IN(prefix, stateName)                                    \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        goto stateName;                                                    \
+// We use this macro when the HTML spec says "reconsume the current input character in the <mumble> state."
+#define RECONSUME_IN(newState)                                  \
+    do {                                                        \
+        goto newState;                                          \
     } while (false)
 
-// We use this macro when the HTML5 spec says "consume the next input
-// character ... and switch to the <mumble> state."
-#define ADVANCE_TO(prefix, stateName)                                      \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        if (!m_inputStreamPreprocessor.advance(source))                    \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
+// We use this macro when the HTML spec says "consume the next input character ... and switch to the <mumble> state."
+#define ADVANCE_TO(newState)                                    \
+    do {                                                        \
+        if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \
+            m_state = newState;                                 \
+            return haveBufferedCharacterToken();                \
+        }                                                       \
+        character = m_preprocessor.nextInputCharacter();        \
+        goto newState;                                          \
     } while (false)
 
-// Sometimes there's more complicated logic in the spec that separates when
-// we consume the next input character and when we switch to a particular
-// state. We handle those cases by advancing the source directly and using
-// this macro to switch to the indicated state.
-#define SWITCH_TO(prefix, stateName)                                       \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))   \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
+// For more complex cases, caller consumes the characters first and then uses this macro.
+#define SWITCH_TO(newState)                                     \
+    do {                                                        \
+        if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
+            m_state = newState;                                 \
+            return haveBufferedCharacterToken();                \
+        }                                                       \
+        character = m_preprocessor.nextInputCharacter();        \
+        goto newState;                                          \
     } while (false)
 
 }
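The rewritten macros make each tokenizer state both a case label and a goto label. After this change, m_state is written only when the tokenizer has to return, for example when ADVANCE_TO or SWITCH_TO runs out of input. Previously it was written on every transition. The code below is a minimal, self-contained sketch of that pattern and is not WebCore code. ToyTokenizer, its two states, and the simplified BEGIN_STATE, END_STATE, ADVANCE_TO and RECONSUME_IN definitions are hypothetical stand-ins. The sketch omits the input preprocessor and the null-character handling that the real macros perform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified, hypothetical stand-ins for the macros in the patch. Each state
// is both a `case` label (to resume from m_state) and a goto label (to jump
// directly between states without re-entering the switch).
#define BEGIN_STATE(stateName) \
    case stateName:            \
    stateName: {

#define END_STATE()                     \
        assert(false && "unreachable"); \
        break;                          \
    }

// Consume the current character and continue in newState. m_state is
// written only when the input runs out and nextToken() must return.
#define ADVANCE_TO(newState)       \
    do {                           \
        if (!advance(character)) { \
            m_state = newState;    \
            return false;          \
        }                          \
        goto newState;             \
    } while (false)

// Process the same character again in newState without consuming it.
#define RECONSUME_IN(newState) \
    do {                       \
        goto newState;         \
    } while (false)

// Hypothetical resumable tokenizer that splits its input on spaces.
class ToyTokenizer {
public:
    enum State { DataState, WordState };

    void append(const std::string& text) { m_input += text; }

    // Returns true and sets `word` when a space-terminated word is complete;
    // returns false when more input is needed.
    bool nextToken(std::string& word)
    {
        char character;
        if (!peek(character))
            return false; // No input: leave m_state unchanged.

        switch (m_state) {
        BEGIN_STATE(DataState)
            if (character == ' ')
                ADVANCE_TO(DataState);
            RECONSUME_IN(WordState);
        END_STATE()

        BEGIN_STATE(WordState)
            if (character == ' ') {
                word = m_buffer;
                m_buffer.clear();
                ++m_position;        // Consume the terminating space.
                m_state = DataState; // Resume here on the next call.
                return true;
            }
            m_buffer += character;
            ADVANCE_TO(WordState);
        END_STATE()
        }
        return false;
    }

private:
    // Reads the current character without consuming it.
    bool peek(char& character) const
    {
        if (m_position >= m_input.size())
            return false;
        character = m_input[m_position];
        return true;
    }

    // Consumes the current character and reads the next one, if any.
    bool advance(char& character)
    {
        ++m_position;
        return peek(character);
    }

    std::string m_input;
    std::size_t m_position { 0 };
    State m_state { DataState };
    std::string m_buffer; // Partial word kept across calls.
};
```

Feeding "ab" and then "c de " shows how the tokenizer resumes. The first nextToken call returns false, leaves m_state at WordState and keeps "ab" buffered. The next call picks up the same word and returns "abc". Inside a single call, states hand off through goto, so no intermediate state is stored. The sketch relies on labels and enumerators living in separate name spaces in C++, which is why the same identifier can serve as both.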