Modernize and streamline HTMLTokenizer
author darin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 12 Jan 2015 16:22:50 +0000 (16:22 +0000)
committer darin@apple.com <darin@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 12 Jan 2015 16:22:50 +0000 (16:22 +0000)
https://bugs.webkit.org/show_bug.cgi?id=140166

Reviewed by Sam Weinig.

Source/WebCore:

* html/parser/AtomicHTMLToken.h:
(WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
based on fields I removed.

* html/parser/HTMLDocumentParser.cpp:
(WebCore::HTMLDocumentParser::HTMLDocumentParser): Change to use updateStateFor
to set the initial state when parsing a fragment, since it implements the same
rule that the tokenizerStateForContextElement function did.
(WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
interfaces for HTMLSourceTracker and HTMLTokenizer.
(WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
for non-character tokens, and let them get cleared later for character tokens.
(WebCore::HTMLDocumentParser::insert): Pass references.
(WebCore::HTMLDocumentParser::append): Ditto.
(WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.

* html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
and removed now-unneeded m_token data members.

* html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
(WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
advanceAndASSERT with just plain advance; there's really no need to assert the
character is the one we just got out of the string.

* html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
its old location since this class has two data members that are OrdinalNumber.

* html/parser/HTMLMetaCharsetParser.cpp:
(WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
initialization, since it's now done by defaults.
(WebCore::extractCharset): Rewrote this to be a non-member function, and to
use a for loop, and to handle quote marks in a simpler way. Also changed it
to return a StringView so we don't have to allocate a new string.
(WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
also take a token argument since it's no longer a data member.
(WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
loop, StringView instead of string, and don't bother naming the local enum.
(WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
way of getting tokens from the tokenizer.
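
The extractCharset change described above (a non-member function built around a for loop, with simpler quote handling, returning a view rather than a new string) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code from this patch: extractCharsetSketch is a hypothetical name, std::string_view stands in for WTF::StringView, and the match is case-sensitive to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper; std::string_view stands in for WTF::StringView.
// Returns a view of the charset value inside content, or an empty view.
// No new string is allocated: the result points into the caller's buffer.
inline std::string_view extractCharsetSketch(std::string_view content)
{
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };

    for (std::size_t pos = content.find("charset"); pos != std::string_view::npos; pos = content.find("charset", pos)) {
        pos += 7; // Length of "charset".
        while (pos < content.size() && isSpace(content[pos]))
            ++pos;
        if (pos == content.size() || content[pos] != '=')
            continue; // Not "charset=", so look for the next occurrence.
        ++pos;
        while (pos < content.size() && isSpace(content[pos]))
            ++pos;

        // An optional opening quote decides what terminates the value.
        char quote = 0;
        if (pos < content.size() && (content[pos] == '"' || content[pos] == '\''))
            quote = content[pos++];

        std::size_t end = pos;
        while (end < content.size() && content[end] != quote && (quote || (!isSpace(content[end]) && content[end] != ';')))
            ++end;
        if (quote && end == content.size())
            return { }; // Unterminated quote: treat as no charset.
        return content.substr(pos, end - pos);
    }
    return { };
}
```

Because the result is a view, it is only valid while the buffer passed in stays alive; a caller that needs to keep the value would copy it.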

* html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
tightened up the formatting a little. Don't bother allocating the tokenizer
on the heap.

* html/parser/HTMLPreloadScanner.cpp:
(WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
initialization.
(WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
(WebCore::HTMLPreloadScanner::scan): Changed to take a reference.

* html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
and forward declarations. Removed explicit declaration of the destructor,
since the default one works. Removed unused createCheckpoint and rewindTo
functions. Gave initial values for various data members. Marked the device
scale factor const because it's set in the constructor and never changed.
Also removed the unneeded isSafeToSendToAnotherThread.

* html/parser/HTMLResourcePreloader.cpp:
(WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.

* html/parser/HTMLResourcePreloader.h:
(WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
isolatedCopy. Also removed isSafeToSendToAnotherThread.

* html/parser/HTMLSourceTracker.cpp:
(WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
in the source tracker itself, not the token.
(WebCore::HTMLSourceTracker::endToken): Ditto.
(WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
from the source tracker.

* html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
Renamed functions, removed now-unneeded comment.

* html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
It only needs to know the start and end of each attribute, not each part of
each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
m_baseOffset and m_length. Added beginAttribute and endAttribute.
(WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
(WebCore::HTMLToken::length): Deleted.
(WebCore::HTMLToken::setBaseOffset): Deleted.
(WebCore::HTMLToken::setEndOffset): Deleted.
(WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
are compiling in assertions.
(WebCore::HTMLToken::beginEndTag): Ditto.
(WebCore::HTMLToken::addNewAttribute): Deleted.
(WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
here and set the start offset.
(WebCore::HTMLToken::beginAttributeName): Deleted.
(WebCore::HTMLToken::endAttributeName): Deleted.
(WebCore::HTMLToken::beginAttributeValue): Deleted.
(WebCore::HTMLToken::endAttributeValue): Deleted.

* html/parser/HTMLTokenizer.cpp:
(WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
(WebCore::HTMLToken::appendToAttributeName): Updated assertion.
(WebCore::HTMLToken::appendToAttributeValue): Ditto.
(WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
so it's legal to call on lower case letters too.
(WebCore::vectorEqualsString): Changed to take a string literal rather than
a WTF::String.
(WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
(WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
(WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
bufferCharacter for the common case where we know the character is ASCII.
(WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
header since it's only used inside the class.
(WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
it and removed the state argument.
(WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
(WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
(WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
(WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
(WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
the actual token, not just a pointer.
(WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
removed the state argument.
(WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
is now the internal function used by nextToken. Updated its contents to use
simpler macros, changed code to set m_state when returning, rather than
constantly setting it when cycling through states, switched style to use
early return/goto rather than lots of else statements, took out unneeded
braces now that BEGIN/END_STATE handles the braces, collapsed upper and
lower case letter handling in many states, changed lookAhead call sites to
use the new advancePast function instead.
(WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
calling a setState function.
(WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
(WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
a literal instead of a WTF::String.
(WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
to be a UChar instead of LChar, although all characters will be ASCII.
(WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
type from size_t to unsigned.
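
As a rough illustration of the convertASCIIAlphaToLower note above ("legal to call on lower case letters too"), one common way to get that property is to set bit 0x20 instead of adding an offset. The names below are hypothetical and char16_t stands in for UChar; whether the patch uses exactly this expression is an assumption.

```cpp
#include <cassert>

// Hypothetical helper; char16_t stands in for WebKit's UChar.
inline bool isASCIIAlphaSketch(char16_t character)
{
    return (character >= u'A' && character <= u'Z') || (character >= u'a' && character <= u'z');
}

// Setting bit 0x20 maps 'A'..'Z' to 'a'..'z' and leaves 'a'..'z' unchanged,
// so the function is valid for any ASCII letter. It is not valid for
// non-letters: '@' (0x40) would become '`' (0x60), hence the assertion.
inline char16_t convertASCIIAlphaToLowerSketch(char16_t character)
{
    assert(isASCIIAlphaSketch(character));
    return static_cast<char16_t>(character | 0x20);
}
```

This is what lets a tokenizer state handle upper and lower case letters in a single branch: test for an ASCII letter once, then convert unconditionally.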

* html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
a TokenPtr so code doesn't have to understand special rules about when to
work with an HTMLToken and when to clear it. Made most functions private,
and made the State enum private as well. Replaced the state and setState
functions with more specific functions for the few states we need to deal
with outside the class. Moved function bodies outside the class definition
so it's easier to read the class definition.
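
The idea behind returning a TokenPtr from nextToken, so that callers no longer track when to clear the token, might be sketched as below. Token, TokenHandle and ToyTokenizer are hypothetical stand-ins written for this note, not the classes in HTMLTokenizer.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical token type; WebKit's HTMLToken is far richer.
struct Token {
    std::string data;
    bool initialized { false };
    void clear() { data.clear(); initialized = false; }
};

// Hypothetical handle in the spirit of TokenPtr: it is null when no token
// is ready, and it clears the underlying token when it is destroyed.
class TokenHandle {
public:
    TokenHandle() = default;
    explicit TokenHandle(Token* token) : m_token(token) { }
    TokenHandle(TokenHandle&& other) : m_token(other.m_token) { other.m_token = nullptr; }
    TokenHandle(const TokenHandle&) = delete;
    TokenHandle& operator=(const TokenHandle&) = delete;
    ~TokenHandle() { clear(); }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { assert(m_token); return *m_token; }
    Token* operator->() const { assert(m_token); return m_token; }

    // Callers may clear early; otherwise the destructor does it.
    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

private:
    Token* m_token { nullptr };
};

// Hypothetical tokenizer that owns its token and hands out handles to it.
class ToyTokenizer {
public:
    TokenHandle nextToken(const std::string& input)
    {
        if (input.empty())
            return TokenHandle(); // Nothing to emit yet.
        m_token.data = input;
        m_token.initialized = true;
        return TokenHandle(&m_token);
    }

    bool tokenIsClear() const { return !m_token.initialized; }

private:
    Token m_token;
};
```

With this shape a caller writes `auto token = tokenizer.nextToken(input); if (!token) break;` and the token is reset automatically when the handle goes out of scope, or earlier if the caller calls clear().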

* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
new set state functions instead of setState.
(WebCore::HTMLTreeBuilder::processEndTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.

* html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
and made it take a reference rather than a pointer.

* html/parser/TextDocumentParser.cpp:
(WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
new set state functions instead of setState.

* html/parser/XSSAuditor.cpp:
(WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
(WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
attribute range tracking.
(WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
(WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.

* html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.

* html/track/WebVTTTokenizer.cpp: Removed the local state variable from
WEBVTT_ADVANCE_TO; there is no need for it.
(WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
pointer for the preprocessor.
(WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
variable and the switch statement, replacing with labels instead since we
go between states with goto.

* platform/text/SegmentedString.cpp:
(WebCore::SegmentedString::operator=): Changed the return type to be non-const
to match normal C++ design rules.
(WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
general purpose prepend function. Also fixed assertions to not use the strangely
named "escaped" function, since we are deleting it.
(WebCore::SegmentedString::append): Ditto.
(WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
the function only works for non-newlines.
(WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
(WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
renamed. This function now consumes the characters if they match.

* platform/text/SegmentedString.h: Made the changes mentioned above.
(WebCore::SegmentedString::excludeLineNumbers): Deleted.
(WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
behavior so the characters are consumed.
(WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
(WebCore::SegmentedString::advanceAndASSERT): Deleted.
(WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
(WebCore::SegmentedString::escaped): Deleted.
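
The behavior change from lookAhead to advancePast (a successful match now consumes the characters) can be shown with a toy cursor. Cursor and its Result enum are hypothetical and only mirror the idea; SegmentedString itself handles multiple substrings, line numbers and more.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for SegmentedString; not WebKit's class.
class Cursor {
public:
    enum class Result { DidNotMatch, DidMatch, NotEnoughCharacters };

    explicit Cursor(std::string text) : m_text(std::move(text)) { }

    // Old style: reports whether the literal comes next, but leaves the
    // position alone, so the caller must advance separately.
    Result lookAhead(std::string_view literal) const
    {
        std::string_view rest = remaining();
        if (rest.size() < literal.size())
            return rest == literal.substr(0, rest.size()) ? Result::NotEnoughCharacters : Result::DidNotMatch;
        return rest.substr(0, literal.size()) == literal ? Result::DidMatch : Result::DidNotMatch;
    }

    // New style: same check, but a successful match also consumes the characters.
    Result advancePast(std::string_view literal)
    {
        Result result = lookAhead(literal);
        if (result == Result::DidMatch)
            m_position += literal.size();
        return result;
    }

    std::string_view remaining() const { return std::string_view(m_text).substr(m_position); }

private:
    std::string m_text;
    std::size_t m_position { 0 };
};
```

Folding the advance into the match is what makes a separate "advance and assert it was the expected string" helper unnecessary at the call sites.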

* xml/parser/CharacterReferenceParserInlines.h:
(WebCore::isHexDigit): Deleted.
(WebCore::unconsumeCharacters): Updated for name change.
(WebCore::consumeCharacterReference): Removed unneeded name for local enum,
renamed local variable "cc" to character. Changed code to use helpers like
isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
since we don't really need to assert the character we just extracted.

* xml/parser/MarkupTokenizerInlines.h:
(WebCore::isTokenizerWhitespace): Renamed argument to character.
(WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
(WebCore::advanceStringAndASSERT): Deleted.
Changed all the macro implementations so they set m_state only when
returning from the function and just use goto inside the state machine.
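
The macro change above (store m_state only when returning, and move between states with goto) follows a pattern that can be sketched without macros. TinyTokenizer, its two states and its process function are hypothetical and exist only to show the control flow; the real code generates the labels through BEGIN_STATE/END_STATE-style macros.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature tokenizer; the states and names are illustrative only.
class TinyTokenizer {
public:
    enum class State { Data, TagName };

    // Scans input from position, appending the characters of the first tag
    // name to tagName. Returns true once a complete "<name>" has been seen.
    // Returns false when input runs out, remembering the state so that a
    // later call with more input can resume where this one stopped.
    bool process(const std::string& input, std::size_t& position, std::string& tagName)
    {
        // Dispatch once on the saved state, then move between states with goto.
        switch (m_state) {
        case State::Data:
            goto data;
        case State::TagName:
            goto tagNameState;
        }
        return false;

    data:
        if (position == input.size()) {
            m_state = State::Data; // Written only because we are returning.
            return false;
        }
        if (input[position++] == '<')
            goto tagNameState; // No m_state write on an internal transition.
        goto data;

    tagNameState:
        if (position == input.size()) {
            m_state = State::TagName;
            return false;
        }
        if (input[position] == '>') {
            ++position;
            m_state = State::Data;
            return true;
        }
        tagName += input[position++];
        goto tagNameState;
    }

    State state() const { return m_state; }

private:
    State m_state { State::Data };
};
```

The member variable is touched once per call instead of once per transition, which is the saving the ChangeLog entry describes.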

Source/WTF:

* wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@178265 268f45cc-cd09-0410-ab3c-d52691b4dbfc

30 files changed:
Source/WTF/ChangeLog
Source/WTF/wtf/Forward.h
Source/WebCore/ChangeLog
Source/WebCore/html/parser/AtomicHTMLToken.h
Source/WebCore/html/parser/HTMLDocumentParser.cpp
Source/WebCore/html/parser/HTMLDocumentParser.h
Source/WebCore/html/parser/HTMLEntityParser.cpp
Source/WebCore/html/parser/HTMLInputStream.h
Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp
Source/WebCore/html/parser/HTMLMetaCharsetParser.h
Source/WebCore/html/parser/HTMLPreloadScanner.cpp
Source/WebCore/html/parser/HTMLPreloadScanner.h
Source/WebCore/html/parser/HTMLResourcePreloader.cpp
Source/WebCore/html/parser/HTMLResourcePreloader.h
Source/WebCore/html/parser/HTMLSourceTracker.cpp
Source/WebCore/html/parser/HTMLSourceTracker.h
Source/WebCore/html/parser/HTMLToken.h
Source/WebCore/html/parser/HTMLTokenizer.cpp
Source/WebCore/html/parser/HTMLTokenizer.h
Source/WebCore/html/parser/HTMLTreeBuilder.cpp
Source/WebCore/html/parser/InputStreamPreprocessor.h
Source/WebCore/html/parser/TextDocumentParser.cpp
Source/WebCore/html/parser/XSSAuditor.cpp
Source/WebCore/html/parser/XSSAuditor.h
Source/WebCore/html/track/WebVTTTokenizer.cpp
Source/WebCore/html/track/WebVTTTokenizer.h
Source/WebCore/platform/text/SegmentedString.cpp
Source/WebCore/platform/text/SegmentedString.h
Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
Source/WebCore/xml/parser/MarkupTokenizerInlines.h

index 989712b6b7534688da232bef4033593dcc405adb..b152e854d25dd6c3b96b177572ffa04dbdc68b42 100644
@@ -1,3 +1,12 @@
+2015-01-12  Darin Adler  <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.
+
 2015-01-09  Commit Queue  <commit-queue@webkit.org>
 
         Unreviewed, rolling out r178154, r178163, and r178164.
index 49a92c528e15ff997ea333f66e7aabab2c0cf6d4..6aefa1def2cdf089423ce0d12fcf2dd556879e18 100644
@@ -30,7 +30,6 @@ template<typename T> class LazyNeverDestroyed;
 template<typename T> class NeverDestroyed;
 template<typename T> class OwnPtr;
 template<typename T> class PassOwnPtr;
-template<typename T> class PassRef;
 template<typename T> class PassRefPtr;
 template<typename T> class RefPtr;
 template<typename T> class Ref;
@@ -45,11 +44,13 @@ class CString;
 class Decoder;
 class Encoder;
 class FunctionDispatcher;
+class OrdinalNumber;
 class PrintStream;
 class String;
 class StringBuilder;
 class StringImpl;
 class StringView;
+class TextPosition;
 
 }
 
@@ -63,9 +64,9 @@ using WTF::Function;
 using WTF::FunctionDispatcher;
 using WTF::LazyNeverDestroyed;
 using WTF::NeverDestroyed;
+using WTF::OrdinalNumber;
 using WTF::OwnPtr;
 using WTF::PassOwnPtr;
-using WTF::PassRef;
 using WTF::PassRefPtr;
 using WTF::PrintStream;
 using WTF::Ref;
@@ -75,6 +76,7 @@ using WTF::StringBuffer;
 using WTF::StringBuilder;
 using WTF::StringImpl;
 using WTF::StringView;
+using WTF::TextPosition;
 using WTF::Vector;
 
 #endif // WTF_Forward_h
index 18cae71bdadca048d681ebfcd0289860c66c580e..a16d20d8ce1cc3eead0508addd2424cd08a6437d 100644
@@ -1,3 +1,224 @@
+2015-01-12  Darin Adler  <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * html/parser/AtomicHTMLToken.h:
+        (WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
+        based on fields I removed.
+
+        * html/parser/HTMLDocumentParser.cpp:
+        (WebCore::HTMLDocumentParser::HTMLDocumentParser): Change to use updateStateFor
+        to set the initial state when parsing a fragment, since it implements the same
+        rule that the tokenizerStateForContextElement function did.
+        (WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
+        interfaces for HTMLSourceTracker and HTMLTokenizer.
+        (WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
+        TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
+        for non-character tokens, and let them get cleared later for character tokens.
+        (WebCore::HTMLDocumentParser::insert): Pass references.
+        (WebCore::HTMLDocumentParser::append): Ditto.
+        (WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.
+
+        * html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
+        and removed now-unneeded m_token data members.
+
+        * html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
+        (WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
+        advanceAndASSERT with just plain advance; there's really no need to assert the
+        character is the one we just got out of the string.
+
+        * html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
+        its old location since this class has two data members that are OrdinalNumber.
+
+        * html/parser/HTMLMetaCharsetParser.cpp:
+        (WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
+        initialization, since it's now done by defaults.
+        (WebCore::extractCharset): Rewrote this to be a non-member function, and to
+        use a for loop, and to handle quote marks in a simpler way. Also changed it
+        to return a StringView so we don't have to allocate a new string.
+        (WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
+        also take a token argument since it's no longer a data member.
+        (WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
+        loop, StringView instead of string, and don't bother naming the local enum.
+        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
+        way of getting tokens from the tokenizer.
+
+        * html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
+        tightened up the formatting a little. Don't bother allocating the tokenizer
+        on the heap.
+
+        * html/parser/HTMLPreloadScanner.cpp:
+        (WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
+        initialization.
+        (WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
+        (WebCore::HTMLPreloadScanner::scan): Changed to take a reference.
+
+        * html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
+        and forward declarations. Removed explicit declaration of the destructor,
+        since the default one works. Removed unused createCheckpoint and rewindTo
+        functions. Gave initial values for various data members. Marked the device
+        scale factor const because it's set in the constructor and never changed.
+        Also removed the unneeded isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLResourcePreloader.cpp:
+        (WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/HTMLResourcePreloader.h:
+        (WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
+        isolatedCopy. Also removed isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLSourceTracker.cpp:
+        (WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
+         in the source tracker itself, not the token.
+        (WebCore::HTMLSourceTracker::endToken): Ditto.
+        (WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
+        from the source tracker.
+
+        * html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
+        Renamed functions, removed now-unneeded comment.
+
+        * html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
+        It only needs to know the start and end of each attribute, not each part of
+        each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
+        beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
+        m_baseOffset and m_length. Added beginAttribute and endAttribute.
+        (WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
+        (WebCore::HTMLToken::length): Deleted.
+        (WebCore::HTMLToken::setBaseOffset): Deleted.
+        (WebCore::HTMLToken::setEndOffset): Deleted.
+        (WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
+        are compiling in assertions.
+        (WebCore::HTMLToken::beginEndTag): Ditto.
+        (WebCore::HTMLToken::addNewAttribute): Deleted.
+        (WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
+        here and set the start offset.
+        (WebCore::HTMLToken::beginAttributeName): Deleted.
+        (WebCore::HTMLToken::endAttributeName): Deleted.
+        (WebCore::HTMLToken::beginAttributeValue): Deleted.
+        (WebCore::HTMLToken::endAttributeValue): Deleted.
+
+        * html/parser/HTMLTokenizer.cpp:
+        (WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
+        (WebCore::HTMLToken::appendToAttributeName): Updated assertion.
+        (WebCore::HTMLToken::appendToAttributeValue): Ditto.
+        (WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
+        so it's legal to call on lower case letters too.
+        (WebCore::vectorEqualsString): Changed to take a string literal rather than
+        a WTF::String.
+        (WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
+        (WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
+        (WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
+        bufferCharacter for the common case where we know the character is ASCII.
+        (WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
+        header since it's only used inside the class.
+        (WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
+        it and removed the state argument.
+        (WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
+        (WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
+        (WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
+        (WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
+        (WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
+        the actual token, not just a pointer.
+        (WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
+        removed the state argument.
+        (WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
+        is now the internal function used by nextToken. Updated its contents to use
+        simpler macros, changed code to set m_state when returning, rather than
+        constantly setting it when cycling through states, switched style to use
+        early return/goto rather than lots of else statements, took out unneeded
+        braces now that BEGIN/END_STATE handles the braces, collapsed upper and
+        lower case letter handling in many states, changed lookAhead call sites to
+        use the new advancePast function instead.
+        (WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
+        calling a setstate function.
+        (WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
+        (WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
+        a literal instead of a WTF::String.
+        (WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
+        to be a UChar instead of LChar, although all characters will be ASCII.
+        (WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
+        type from size_t to unsigned.
+
+        * html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
+        a TokenPtr so code doesn't have to understand special rules about when to
+        work with an HTMLToken and when to clear it. Made most functions private,
+        and made the State enum private as well. Replaced the state and setState
+        functions with more specific functions for the few states we need to deal
+        with outside the class. Moved function bodies outside the class definition
+        so it's easier to read the class definition.
+
+        * html/parser/HTMLTreeBuilder.cpp:
+        (WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
+        new set state functions instead of setState.
+        (WebCore::HTMLTreeBuilder::processEndTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.
+
+        * html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
+        and made it take a reference rather than a pointer.
+
+        * html/parser/TextDocumentParser.cpp:
+        (WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
+        new set state functions instead of setState.
+
+        * html/parser/XSSAuditor.cpp:
+        (WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
+        (WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
+        attribute range tracking.
+        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
+        (WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.
+
+        * html/track/WebVTTTokenizer.cpp: Removed the local state variable from
+        WEBVTT_ADVANCE_TO; there is no need for it.
+        (WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
+        pointer for the preprocessor.
+        (WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
+        variable and the switch statement, replacing with labels instead since we
+        go between states with goto.
+
+        * platform/text/SegmentedString.cpp:
+        (WebCore::SegmentedString::operator=): Changed the return type to be non-const
+        to match normal C++ design rules.
+        (WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
+        general purpose prepend function. Also fixed assertions to not use the strangely
+        named "escaped" function, since we are deleting it.
+        (WebCore::SegmentedString::append): Ditto.
+        (WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
+        the function only works for non-newlines.
+        (WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
+        (WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
+        renamed. This function now consumes the characters if they match.
+
+        * platform/text/SegmentedString.h: Made the changes mentioned above.
+        (WebCore::SegmentedString::excludeLineNumbers): Deleted.
+        (WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
+        behavior so the characters are consumed.
+        (WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
+        (WebCore::SegmentedString::advanceAndASSERT): Deleted.
+        (WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
+        (WebCore::SegmentedString::escaped): Deleted.
+
+        * xml/parser/CharacterReferenceParserInlines.h:
+        (WebCore::isHexDigit): Deleted.
+        (WebCore::unconsumeCharacters): Updated for name change.
+        (WebCore::consumeCharacterReference): Removed unneeded name for local enum,
+        renamed local variable "cc" to character. Changed code to use helpers like
+        isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
+        since we don't really need to assert the character we just extracted.
+
+        * xml/parser/MarkupTokenizerInlines.h:
+        (WebCore::isTokenizerWhitespace): Renamed argument to character.
+        (WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
+        (WebCore::advanceStringAndASSERT): Deleted.
+        Changed all the macro implementations so they set m_state only when
+        returning from the function and just use goto inside the state machine.
+
 2015-01-11  Andreas Kling  <akling@apple.com>
 
         Enable Vector bounds checking for ElementDescendantIterator.
index 5e61fba1b8d0c8ba2f5bb4190a6dd331f7775731..cc5a21555b0858ee36a202d9521c3aebff53af55 100644
@@ -191,11 +191,6 @@ inline void AtomicHTMLToken::initializeAttributes(const HTMLToken::AttributeList
         if (attribute.name.isEmpty())
             continue;
 
-        ASSERT(attribute.nameRange.start);
-        ASSERT(attribute.nameRange.end);
-        ASSERT(attribute.valueRange.start);
-        ASSERT(attribute.valueRange.end);
-
         QualifiedName name(nullAtom, AtomicString(attribute.name), nullAtom);
 
         // FIXME: This is N^2 for the number of attributes.
index d9643cd2ad29db4a99652cc3ae3c1db6fb18668f..812c9c5be00b06ee24bbbefc72d7ce8c434f1dc1 100644
@@ -39,28 +39,6 @@ namespace WebCore {
 
 using namespace HTMLNames;
 
-// This is a direct transcription of step 4 from:
-// https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
-static HTMLTokenizer::State tokenizerStateForContextElement(Element& contextElement, bool reportErrors, const HTMLParserOptions& options)
-{
-    const QualifiedName& contextTag = contextElement.tagQName();
-
-    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
-        return HTMLTokenizer::RCDATAState;
-    if (contextTag.matches(styleTag)
-        || contextTag.matches(xmpTag)
-        || contextTag.matches(iframeTag)
-        || (contextTag.matches(noembedTag) && options.pluginsEnabled)
-        || (contextTag.matches(noscriptTag) && options.scriptEnabled)
-        || contextTag.matches(noframesTag))
-        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(scriptTag))
-        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(plaintextTag))
-        return HTMLTokenizer::PLAINTEXTState;
-    return HTMLTokenizer::DataState;
-}
-
 HTMLDocumentParser::HTMLDocumentParser(HTMLDocument& document)
     : ScriptableDocumentParser(document)
     , m_options(document)
@@ -85,8 +63,9 @@ inline HTMLDocumentParser::HTMLDocumentParser(DocumentFragment& fragment, Elemen
     , m_treeBuilder(std::make_unique<HTMLTreeBuilder>(*this, fragment, contextElement, parserContentPolicy(), m_options))
     , m_xssAuditorDelegate(fragment.document())
 {
-    bool reportErrors = false; // For now document fragment parsing never reports errors.
-    m_tokenizer.setState(tokenizerStateForContextElement(contextElement, reportErrors, m_options));
+    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+    if (contextElement.isHTMLElement())
+        m_tokenizer.updateStateFor(contextElement.tagQName().localName());
     m_xssAuditor.initForFragment();
 }
 
@@ -279,22 +258,22 @@ void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
 
     while (canTakeNextToken(mode, session) && !session.needsYield) {
         if (!isParsingFragment())
-            m_sourceTracker.start(m_input.current(), &m_tokenizer, m_token);
+            m_sourceTracker.startToken(m_input.current(), m_tokenizer);
 
-        if (!m_tokenizer.nextToken(m_input.current(), m_token))
+        auto token = m_tokenizer.nextToken(m_input.current());
+        if (!token)
             break;
 
         if (!isParsingFragment()) {
-            m_sourceTracker.end(m_input.current(), &m_tokenizer, m_token);
+            m_sourceTracker.endToken(m_input.current(), m_tokenizer);
 
             // We do not XSS filter innerHTML, which means we (intentionally) fail
             // http/tests/security/xssAuditor/dom-write-innerHTML.html
-            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(m_token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
+            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(*token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
                 m_xssAuditorDelegate.didBlockScript(*xssInfo);
         }
 
-        constructTreeFromHTMLToken(m_token);
-        ASSERT(m_token.type() == HTMLToken::Uninitialized);
+        constructTreeFromHTMLToken(token);
     }
 
     // Ensure we haven't been totally deref'ed after pumping. Any caller of this
@@ -308,20 +287,20 @@ void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
         m_parserScheduler->scheduleForResume();
 
     if (isWaitingForScripts()) {
-        ASSERT(m_tokenizer.state() == HTMLTokenizer::DataState);
+        ASSERT(m_tokenizer.isInDataState());
         if (!m_preloadScanner) {
             m_preloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
             m_preloadScanner->appendToEnd(m_input.current());
         }
-        m_preloadScanner->scan(m_preloader.get(), *document());
+        m_preloadScanner->scan(*m_preloader, *document());
     }
 
     InspectorInstrumentation::didWriteHTML(cookie, m_input.current().currentLine().zeroBasedInt());
 }
 
-void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
+void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr& rawToken)
 {
-    AtomicHTMLToken token(rawToken);
+    AtomicHTMLToken token(*rawToken);
 
     // We clear the rawToken in case constructTreeFromAtomicToken
     // synchronously re-enters the parser. We don't clear the token immediately
@@ -333,15 +312,13 @@ void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
     // FIXME: Stop clearing the rawToken once we start running the parser off
     // the main thread or once we stop allowing synchronous JavaScript
     // execution from parseAttribute.
-    if (rawToken.type() != HTMLToken::Character)
+    if (rawToken->type() != HTMLToken::Character) {
+        // Clearing the TokenPtr makes sure we don't clear the HTMLToken a second time
+        // later when the TokenPtr is destroyed.
         rawToken.clear();
+    }
 
     m_treeBuilder->constructTree(token);
-
-    if (rawToken.type() != HTMLToken::Uninitialized) {
-        ASSERT(rawToken.type() == HTMLToken::Character);
-        rawToken.clear();
-    }
 }
 
 bool HTMLDocumentParser::hasInsertionPoint()
@@ -373,7 +350,7 @@ void HTMLDocumentParser::insert(const SegmentedString& source)
         if (!m_insertionPreloadScanner)
             m_insertionPreloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
         m_insertionPreloadScanner->appendToEnd(source);
-        m_insertionPreloadScanner->scan(m_preloader.get(), *document());
+        m_insertionPreloadScanner->scan(*m_preloader, *document());
     }
 
     endIfDelayed();
@@ -398,7 +375,7 @@ void HTMLDocumentParser::append(PassRefPtr<StringImpl> inputSource)
         } else {
             m_preloadScanner->appendToEnd(source);
             if (isWaitingForScripts())
-                m_preloadScanner->scan(m_preloader.get(), *document());
+                m_preloadScanner->scan(*m_preloader, *document());
         }
     }
 
@@ -533,7 +510,7 @@ void HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan()
 {
     ASSERT(m_preloadScanner);
     m_preloadScanner->appendToEnd(m_input.current());
-    m_preloadScanner->scan(m_preloader.get(), *document());
+    m_preloadScanner->scan(*m_preloader, *document());
 }
 
 void HTMLDocumentParser::notifyFinished(CachedResource* cachedResource)
index 44631fd22f577ceb2cc6d2a37b3e8a368a2eea16..fe0435ad7b742843054fbafd94a96eff4077d567 100644 (file)
@@ -103,7 +103,7 @@ private:
     bool canTakeNextToken(SynchronousMode, PumpSession&);
     void pumpTokenizer(SynchronousMode);
     void pumpTokenizerIfPossible(SynchronousMode);
-    void constructTreeFromHTMLToken(HTMLToken&);
+    void constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr&);
 
     void runScriptsForPausedTreeBuilder();
     void resumeParsingAfterScriptExecution();
@@ -121,7 +121,6 @@ private:
     HTMLParserOptions m_options;
     HTMLInputStream m_input;
 
-    HTMLToken m_token;
     HTMLTokenizer m_tokenizer;
     std::unique_ptr<HTMLScriptRunner> m_scriptRunner;
     std::unique_ptr<HTMLTreeBuilder> m_treeBuilder;
index a0160506cc8497f98ca9b69bd809ed40fb7916c4..dfdfd6c18b7a24539f056f5c6c4f5dd79a8bbef9 100644 (file)
@@ -60,9 +60,9 @@ public:
         return windowsLatin1ExtensionArray[value - 0x80];
     }
 
-    inline static bool acceptMalformed() { return true; }
+    static bool acceptMalformed() { return true; }
 
-    inline static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
+    static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
     {
         StringBuilder consumedCharacters;
         HTMLEntitySearch entitySearch;
@@ -72,7 +72,7 @@ public:
             if (!entitySearch.isEntityPrefix())
                 break;
             consumedCharacters.append(cc);
-            source.advanceAndASSERT(cc);
+            source.advance();
         }
         notEnoughCharacters = source.isEmpty();
         if (notEnoughCharacters) {
@@ -97,7 +97,7 @@ public:
                 cc = source.currentChar();
                 ASSERT_UNUSED(reference, cc == *reference++);
                 consumedCharacters.append(cc);
-                source.advanceAndASSERT(cc);
+                source.advance();
                 ASSERT(!source.isEmpty());
             }
             cc = source.currentChar();
index a7b86b3baff0f1f550eba8adb2ad4cd7c3fad35e..e738f5f3480d37468c3dda4eda511389c9a6fa2c 100644 (file)
@@ -28,6 +28,7 @@
 
 #include "InputStreamPreprocessor.h"
 #include "SegmentedString.h"
+#include <wtf/text/TextPosition.h>
 
 namespace WebCore {
 
index 11e14a46e16e3f41d40cd33f800225bf0b6a6bff..752dbeedb1778e81bb64c2b775b06263b5060297 100644 (file)
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Google Inc. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 
 #include "HTMLNames.h"
 #include "HTMLParserIdioms.h"
-#include "HTMLTokenizer.h"
-#include "TextCodec.h"
 #include "TextEncodingRegistry.h"
 
-using namespace WTF;
-
 namespace WebCore {
 
 using namespace HTMLNames;
 
 HTMLMetaCharsetParser::HTMLMetaCharsetParser()
-    : m_tokenizer(std::make_unique<HTMLTokenizer>(HTMLParserOptions()))
-    , m_assumedCodec(newTextCodec(Latin1Encoding()))
-    , m_inHeadSection(true)
-    , m_doneChecking(false)
-{
-}
-
-HTMLMetaCharsetParser::~HTMLMetaCharsetParser()
+    : m_codec(newTextCodec(Latin1Encoding()))
 {
 }
 
-static const char charsetString[] = "charset";
-static const size_t charsetLength = sizeof("charset") - 1;
-
-String HTMLMetaCharsetParser::extractCharset(const String& value)
+static StringView extractCharset(const String& value)
 {
-    size_t pos = 0;
     unsigned length = value.length();
-
-    while (pos < length) {
-        pos = value.find(charsetString, pos, false);
+    for (size_t pos = 0; pos < length; ) {
+        pos = value.find("charset", pos, false);
         if (pos == notFound)
             break;
 
+        static const size_t charsetLength = sizeof("charset") - 1;
         pos += charsetLength;
 
         // Skip whitespace.
@@ -77,12 +63,10 @@ String HTMLMetaCharsetParser::extractCharset(const String& value)
         while (pos < length && value[pos] <= ' ')
             ++pos;
 
-        char quoteMark = 0;
-        if (pos < length && (value[pos] == '"' || value[pos] == '\'')) {
-            quoteMark = static_cast<char>(value[pos++]);
-            ASSERT(!(quoteMark & 0x80));
-        }
-            
+        UChar quoteMark = 0;
+        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
+            quoteMark = value[pos++];
+
         if (pos == length)
             break;
 
@@ -93,19 +77,17 @@ String HTMLMetaCharsetParser::extractCharset(const String& value)
         if (quoteMark && (end == length))
             break; // Close quote not found.
 
-        return value.substring(pos, end - pos);
+        return StringView(value).substring(pos, end - pos);
     }
-
-    return "";
+    return StringView();
 }
 
-bool HTMLMetaCharsetParser::processMeta()
+bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
 {
-    const HTMLToken::AttributeList& tokenAttributes = m_token.attributes();
     AttributeList attributes;
-    for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); iter != tokenAttributes.end(); ++iter) {
-        String attributeName = StringImpl::create8BitIfPossible(iter->name);
-        String attributeValue = StringImpl::create8BitIfPossible(iter->value);
+    for (auto& attribute : token.attributes()) {
+        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
+        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
         attributes.append(std::make_pair(attributeName, attributeValue));
     }
 
@@ -116,12 +98,12 @@ bool HTMLMetaCharsetParser::processMeta()
 TextEncoding HTMLMetaCharsetParser::encodingFromMetaAttributes(const AttributeList& attributes)
 {
     bool gotPragma = false;
-    Mode mode = None;
-    String charset;
+    enum { None, Charset, Pragma } mode = None;
+    StringView charset;
 
-    for (AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) {
-        const AtomicString& attributeName = iter->first;
-        const String& attributeValue = iter->second;
+    for (auto& attribute : attributes) {
+        const String& attributeName = attribute.first;
+        const String& attributeValue = attribute.second;
 
         if (attributeName == http_equivAttr) {
             if (equalIgnoringCase(attributeValue, "content-type"))
@@ -139,13 +121,11 @@ TextEncoding HTMLMetaCharsetParser::encodingFromMetaAttributes(const AttributeLi
     }
 
     if (mode == Charset || (mode == Pragma && gotPragma))
-        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset));
+        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset.toStringWithoutCopying()));
 
     return TextEncoding();
 }
 
-static const int bytesToCheckUnconditionally = 1024; // That many input bytes will be checked for meta charset even if <head> section is over.
-
 bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
 {
     if (m_doneChecking)
@@ -156,30 +136,32 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
     // We still don't have an encoding, and are in the head.
     // The following tags are allowed in <head>:
     // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
-
+    //
     // We stop scanning when a tag that is not permitted in <head>
     // is seen, rather than when </head> is seen, because that more closely
     // matches behavior in other browsers; more details in
     // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
-
+    //
     // Additionally, we ignore things that look like tags in <title>, <script>
     // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
     // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
     // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
-
+    //
     // Since many sites have charset declarations after <body> or other tags
     // that are disallowed in <head>, we don't bail out until we've checked at
     // least bytesToCheckUnconditionally bytes of input.
 
-    m_input.append(SegmentedString(m_assumedCodec->decode(data, length)));
+    static const int bytesToCheckUnconditionally = 1024;
+
+    m_input.append(SegmentedString(m_codec->decode(data, length)));
 
-    while (m_tokenizer->nextToken(m_input, m_token)) {
-        bool end = m_token.type() == HTMLToken::EndTag;
-        if (end || m_token.type() == HTMLToken::StartTag) {
-            AtomicString tagName(m_token.name());
-            if (!end) {
-                m_tokenizer->updateStateFor(tagName);
-                if (tagName == metaTag && processMeta()) {
+    while (auto token = m_tokenizer.nextToken(m_input)) {
+        bool isEnd = token->type() == HTMLToken::EndTag;
+        if (isEnd || token->type() == HTMLToken::StartTag) {
+            AtomicString tagName(token->name());
+            if (!isEnd) {
+                m_tokenizer.updateStateFor(tagName);
+                if (tagName == metaTag && processMeta(*token)) {
                     m_doneChecking = true;
                     return true;
                 }
@@ -189,7 +171,8 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
                 && tagName != styleTag && tagName != linkTag
                 && tagName != metaTag && tagName != objectTag
                 && tagName != titleTag && tagName != baseTag
-                && (end || tagName != htmlTag) && (end || tagName != headTag)) {
+                && (isEnd || tagName != htmlTag)
+                && (isEnd || tagName != headTag)) {
                 m_inHeadSection = false;
             }
         }
@@ -198,8 +181,6 @@ bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
             m_doneChecking = true;
             return true;
         }
-
-        m_token.clear();
     }
 
     return false;
index de028bb0bca0266fb6d4f78aa4f07240a232959b..2c6d7c57e8076d9fef058b678a87e23afb245eda 100644 (file)
 #ifndef HTMLMetaCharsetParser_h
 #define HTMLMetaCharsetParser_h
 
-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
 #include "TextEncoding.h"
-#include <wtf/Noncopyable.h>
 
 namespace WebCore {
 
-class HTMLTokenizer;
 class TextCodec;
 
 class HTMLMetaCharsetParser {
     WTF_MAKE_NONCOPYABLE(HTMLMetaCharsetParser); WTF_MAKE_FAST_ALLOCATED;
 public:
     HTMLMetaCharsetParser();
-    ~HTMLMetaCharsetParser();
 
     // Returns true if done checking, regardless of whether an encoding is found.
     bool checkForMetaCharset(const char*, size_t);
 
     const TextEncoding& encoding() { return m_encoding; }
 
-    typedef Vector<std::pair<String, String>> AttributeList;
     // The returned encoding might not be valid.
-    static TextEncoding encodingFromMetaAttributes(const AttributeList&
-);
+    typedef Vector<std::pair<String, String>> AttributeList;
+    static TextEncoding encodingFromMetaAttributes(const AttributeList&);
 
 private:
-    bool processMeta();
-    static String extractCharset(const String&);
+    bool processMeta(HTMLToken&);
 
-    enum Mode {
-        None,
-        Charset,
-        Pragma,
-    };
-
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
-    std::unique_ptr<TextCodec> m_assumedCodec;
+    HTMLTokenizer m_tokenizer;
+    const std::unique_ptr<TextCodec> m_codec;
     SegmentedString m_input;
-    HTMLToken m_token;
-    bool m_inHeadSection;
-
-    bool m_doneChecking;
+    bool m_inHeadSection { true };
+    bool m_doneChecking { false };
     TextEncoding m_encoding;
 };
 
index 149e40cd80d0880a975082e4992cd1a73c28d775..47031b128761bf0ea90c53c1aa2643971c97b3dc 100644 (file)
@@ -242,40 +242,8 @@ private:
 
 TokenPreloadScanner::TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor)
     : m_documentURL(documentURL)
-    , m_inStyle(false)
     , m_deviceScaleFactor(deviceScaleFactor)
-#if ENABLE(TEMPLATE_ELEMENT)
-    , m_templateCount(0)
-#endif
-{
-}
-
-TokenPreloadScanner::~TokenPreloadScanner()
-{
-}
-
-TokenPreloadScannerCheckpoint TokenPreloadScanner::createCheckpoint()
-{
-    TokenPreloadScannerCheckpoint checkpoint = m_checkpoints.size();
-    m_checkpoints.append(Checkpoint(m_predictedBaseElementURL, m_inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-                                    , m_templateCount
-#endif
-                                    ));
-    return checkpoint;
-}
-
-void TokenPreloadScanner::rewindTo(TokenPreloadScannerCheckpoint checkpointIndex)
 {
-    ASSERT(checkpointIndex < m_checkpoints.size()); // If this ASSERT fires, checkpointIndex is invalid.
-    const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
-    m_predictedBaseElementURL = checkpoint.predictedBaseElementURL;
-    m_inStyle = checkpoint.inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-    m_templateCount = checkpoint.templateCount;
-#endif
-    m_cssScanner.reset();
-    m_checkpoints.clear();
 }
 
 void TokenPreloadScanner::scan(const HTMLToken& token, Vector<std::unique_ptr<PreloadRequest>>& requests, Document& document)
@@ -349,11 +317,7 @@ void TokenPreloadScanner::updatePredictedBaseURL(const HTMLToken& token)
 
 HTMLPreloadScanner::HTMLPreloadScanner(const HTMLParserOptions& options, const URL& documentURL, float deviceScaleFactor)
     : m_scanner(documentURL, deviceScaleFactor)
-    , m_tokenizer(std::make_unique<HTMLTokenizer>(options))
-{
-}
-
-HTMLPreloadScanner::~HTMLPreloadScanner()
+    , m_tokenizer(options)
 {
 }
 
@@ -362,7 +326,7 @@ void HTMLPreloadScanner::appendToEnd(const SegmentedString& source)
     m_source.append(source);
 }
 
-void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& document)
+void HTMLPreloadScanner::scan(HTMLResourcePreloader& preloader, Document& document)
 {
     ASSERT(isMainThread()); // HTMLTokenizer::updateStateFor only works on the main thread.
 
@@ -374,14 +338,13 @@ void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& docume
 
     PreloadRequestStream requests;
 
-    while (m_tokenizer->nextToken(m_source, m_token)) {
-        if (m_token.type() == HTMLToken::StartTag)
-            m_tokenizer->updateStateFor(AtomicString(m_token.name()));
-        m_scanner.scan(m_token, requests, document);
-        m_token.clear();
+    while (auto token = m_tokenizer.nextToken(m_source)) {
+        if (token->type() == HTMLToken::StartTag)
+            m_tokenizer.updateStateFor(AtomicString(token->name()));
+        m_scanner.scan(*token, requests, document);
     }
 
-    preloader->preload(WTF::move(requests));
+    preloader.preload(WTF::move(requests));
 }
 
 }
index 9dd6cfc0eb0957c9449430ba880afccab8f2601d..1a9c28390311b18dbbdde693151d0049cce574ac 100644 (file)
 #define HTMLPreloadScanner_h
 
 #include "CSSPreloadScanner.h"
-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
-#include <wtf/Vector.h>
 
 namespace WebCore {
 
-typedef size_t TokenPreloadScannerCheckpoint;
-
-class HTMLParserOptions;
-class HTMLTokenizer;
-class SegmentedString;
-class Frame;
-
 class TokenPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner);
 public:
     explicit TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~TokenPreloadScanner();
 
-    void scan(const HTMLToken&, PreloadRequestStream& requests, Document&);
+    void scan(const HTMLToken&, PreloadRequestStream&, Document&);
 
     void setPredictedBaseElementURL(const URL& url) { m_predictedBaseElementURL = url; }
 
-    // A TokenPreloadScannerCheckpoint is valid until the next call to rewindTo,
-    // at which point all outstanding checkpoints are invalidated.
-    TokenPreloadScannerCheckpoint createCheckpoint();
-    void rewindTo(TokenPreloadScannerCheckpoint);
-
-    bool isSafeToSendToAnotherThread()
-    {
-        return m_documentURL.isSafeToSendToAnotherThread()
-            && m_predictedBaseElementURL.isSafeToSendToAnotherThread();
-    }
-
 private:
     enum class TagId {
         // These tags are scanned by the StartTagScanner.
@@ -85,54 +65,29 @@ private:
 
     void updatePredictedBaseURL(const HTMLToken&);
 
-    struct Checkpoint {
-        Checkpoint(const URL& predictedBaseElementURL, bool inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-            , size_t templateCount
-#endif
-            )
-            : predictedBaseElementURL(predictedBaseElementURL)
-            , inStyle(inStyle)
-#if ENABLE(TEMPLATE_ELEMENT)
-            , templateCount(templateCount)
-#endif
-        {
-        }
-
-        URL predictedBaseElementURL;
-        bool inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-        size_t templateCount;
-#endif
-    };
-
     CSSPreloadScanner m_cssScanner;
     const URL m_documentURL;
-    URL m_predictedBaseElementURL;
-    bool m_inStyle;
-    float m_deviceScaleFactor;
+    const float m_deviceScaleFactor { 1 };
 
+    URL m_predictedBaseElementURL;
+    bool m_inStyle { false };
 #if ENABLE(TEMPLATE_ELEMENT)
-    size_t m_templateCount;
+    unsigned m_templateCount { 0 };
 #endif
-
-    Vector<Checkpoint> m_checkpoints;
 };
 
 class HTMLPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(HTMLPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_FAST_ALLOCATED;
 public:
     HTMLPreloadScanner(const HTMLParserOptions&, const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~HTMLPreloadScanner();
 
     void appendToEnd(const SegmentedString&);
-    void scan(HTMLResourcePreloader*, Document&);
+    void scan(HTMLResourcePreloader&, Document&);
 
 private:
     TokenPreloadScanner m_scanner;
     SegmentedString m_source;
-    HTMLToken m_token;
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
+    HTMLTokenizer m_tokenizer;
 };
 
 }
index 16ea7612d3aa6339208b69970abc1e278a2924d9..3c7b7ad1b892c55411b4e72b6391310f9c96ceca 100644 (file)
 
 namespace WebCore {
 
-bool PreloadRequest::isSafeToSendToAnotherThread() const
-{
-    return m_initiator.isSafeToSendToAnotherThread()
-        && m_charset.isSafeToSendToAnotherThread()
-        && m_resourceURL.isSafeToSendToAnotherThread()
-        && m_mediaAttribute.isSafeToSendToAnotherThread()
-        && m_baseURL.isSafeToSendToAnotherThread();
-}
-
 URL PreloadRequest::completeURL(Document& document)
 {
     return document.completeURL(m_resourceURL, m_baseURL.isEmpty() ? document.url() : m_baseURL);
index f93a093bd6099dee1a23fb940ef44e95b966e4d4..2a8b6c87417c18993f7f9e35dec98569a5b9ceb9 100644 (file)
@@ -35,16 +35,14 @@ class PreloadRequest {
 public:
     PreloadRequest(const String& initiator, const String& resourceURL, const URL& baseURL, CachedResource::Type resourceType, const String& mediaAttribute)
         : m_initiator(initiator)
-        , m_resourceURL(resourceURL.isolatedCopy())
+        , m_resourceURL(resourceURL)
         , m_baseURL(baseURL.copy())
         , m_resourceType(resourceType)
-        , m_mediaAttribute(mediaAttribute.isolatedCopy())
+        , m_mediaAttribute(mediaAttribute)
         , m_crossOriginModeAllowsCookies(false)
     {
     }
 
-    bool isSafeToSendToAnotherThread() const;
-
     CachedResourceRequest resourceRequest(Document&);
 
     const String& charset() const { return m_charset; }
index f48d87c4107e2e6914e00ee730121630f2ae849b..0c9a0463270dcaa2b1fef59e452cf65a5953d2d7 100644 (file)
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -25,6 +26,7 @@
 
 #include "config.h"
 #include "HTMLSourceTracker.h"
+
 #include "HTMLTokenizer.h"
 #include <wtf/text/StringBuilder.h>
 
@@ -34,36 +36,41 @@ HTMLSourceTracker::HTMLSourceTracker()
 {
 }
 
-void HTMLSourceTracker::start(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
-    if (token.type() == HTMLToken::Uninitialized) {
-        m_previousSource.clear();
-        if (tokenizer->numberOfBufferedCharacters())
-            m_previousSource = tokenizer->bufferedCharacters();
+    if (!m_started) {
+        if (tokenizer.numberOfBufferedCharacters())
+            m_previousSource = tokenizer.bufferedCharacters();
+        else
+            m_previousSource.clear();
+        m_started = true;
     } else
         m_previousSource.append(m_currentSource);
 
     m_currentSource = currentInput;
-    token.setBaseOffset(m_currentSource.numberOfCharactersConsumed() - m_previousSource.length());
+    m_tokenStart = m_currentSource.numberOfCharactersConsumed() - m_previousSource.length();
 }
 
-void HTMLSourceTracker::end(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::endToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
-    m_cachedSourceForToken = String();
+    ASSERT(m_started);
+    m_started = false;
 
-    // FIXME: This work should really be done by the HTMLTokenizer.
-    token.setEndOffset(currentInput.numberOfCharactersConsumed() - tokenizer->numberOfBufferedCharacters());
+    m_tokenEnd = currentInput.numberOfCharactersConsumed() - tokenizer.numberOfBufferedCharacters();
+    m_cachedSourceForToken = String();
 }
 
-String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
+String HTMLSourceTracker::source(const HTMLToken& token)
 {
+    ASSERT(!m_started);
+
     if (token.type() == HTMLToken::EndOfFile)
         return String(); // Hides the null character we use to mark the end of file.
 
     if (!m_cachedSourceForToken.isEmpty())
         return m_cachedSourceForToken;
 
-    unsigned length = token.length();
+    unsigned length = m_tokenEnd - m_tokenStart;
 
     StringBuilder source;
     source.reserveCapacity(length);
@@ -83,4 +90,9 @@ String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
     return m_cachedSourceForToken;
 }
 
+String HTMLSourceTracker::source(const HTMLToken& token, unsigned attributeStart, unsigned attributeEnd)
+{
+    return source(token).substring(attributeStart - m_tokenStart, attributeEnd - attributeStart);
+}
+
 }
index 7f0378b8f619507da926161cc3f8148185bafb50..3601e25dba5849eb9948088e1a9857be32267495 100644 (file)
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
 #ifndef HTMLSourceTracker_h
 #define HTMLSourceTracker_h
 
-#include "HTMLToken.h"
 #include "SegmentedString.h"
 
 namespace WebCore {
 
+class HTMLToken;
 class HTMLTokenizer;
 
 class HTMLSourceTracker {
@@ -38,15 +39,18 @@ class HTMLSourceTracker {
 public:
     HTMLSourceTracker();
 
-    // FIXME: Once we move "end" into HTMLTokenizer, rename "start" to
-    // something that makes it obvious that this method can be called multiple
-    // times.
-    void start(SegmentedString&, HTMLTokenizer*, HTMLToken&);
-    void end(SegmentedString&, HTMLTokenizer*, HTMLToken&);
+    void startToken(SegmentedString&, HTMLTokenizer&);
+    void endToken(SegmentedString&, HTMLTokenizer&);
 
-    String sourceForToken(const HTMLToken&);
+    String source(const HTMLToken&);
+    String source(const HTMLToken&, unsigned attributeStart, unsigned attributeEnd);
 
 private:
+    bool m_started { false };
+
+    unsigned m_tokenStart;
+    unsigned m_tokenEnd;
+
     SegmentedString m_previousSource;
     SegmentedString m_currentSource;
 
index 8a0349c0ab89e201247d3d6034e920f5deebc4fa..617172181d83d76e23ae9cb91a93054af0deec64 100644 (file)
@@ -53,15 +53,12 @@ public:
     };
 
     struct Attribute {
-        struct Range {
-            unsigned start;
-            unsigned end;
-        };
-
-        Range nameRange;
-        Range valueRange;
         Vector<UChar, 32> name;
         Vector<UChar, 32> value;
+
+        // Used by HTMLSourceTracker.
+        unsigned startOffset;
+        unsigned endOffset;
     };
 
     typedef Vector<Attribute, 10> AttributeList;
@@ -73,11 +70,6 @@ public:
 
     Type type() const;
 
-    // Used by HTMLSourceTracker.
-    void setBaseOffset(unsigned); // Base for attribute offsets, and the end of token offset.
-    void setEndOffset(unsigned);
-    unsigned length() const;
-
     // EndOfFile
 
     void makeEndOfFile();
@@ -113,15 +105,10 @@ public:
     void beginEndTag(LChar);
     void beginEndTag(const Vector<LChar, 32>&);
 
-    void addNewAttribute();
-
-    void beginAttributeName(unsigned offset);
+    void beginAttribute(unsigned offset);
     void appendToAttributeName(UChar);
-    void endAttributeName(unsigned offset);
-
-    void beginAttributeValue(unsigned offset);
     void appendToAttributeValue(UChar);
-    void endAttributeValue(unsigned offset);
+    void endAttribute(unsigned offset);
 
     void setSelfClosing();
 
@@ -154,9 +141,6 @@ public:
 private:
     Type m_type;
 
-    unsigned m_baseOffset;
-    unsigned m_length;
-
     DataVector m_data;
     UChar m_data8BitCheck;
 
@@ -172,8 +156,9 @@ private:
 const HTMLToken::Attribute* findAttribute(const Vector<HTMLToken::Attribute>&, StringView name);
 
 inline HTMLToken::HTMLToken()
+    : m_type(Uninitialized)
+    , m_data8BitCheck(0)
 {
-    clear();
 }
 
 inline void HTMLToken::clear()
@@ -181,9 +166,6 @@ inline void HTMLToken::clear()
     m_type = Uninitialized;
     m_data.clear();
     m_data8BitCheck = 0;
-
-    m_length = 0;
-    m_baseOffset = 0;
 }
 
 inline HTMLToken::Type HTMLToken::type() const
@@ -197,21 +179,6 @@ inline void HTMLToken::makeEndOfFile()
     m_type = EndOfFile;
 }
 
-inline unsigned HTMLToken::length() const
-{
-    return m_length;
-}
-
-inline void HTMLToken::setBaseOffset(unsigned offset)
-{
-    m_baseOffset = offset;
-}
-
-inline void HTMLToken::setEndOffset(unsigned endOffset)
-{
-    m_length = endOffset - m_baseOffset;
-}
-
 inline const HTMLToken::DataVector& HTMLToken::name() const
 {
     ASSERT(m_type == StartTag || m_type == EndTag || m_type == DOCTYPE);
@@ -300,9 +267,12 @@ inline void HTMLToken::beginStartTag(UChar character)
     ASSERT(m_type == Uninitialized);
     m_type = StartTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
+
     m_data.append(character);
     m_data8BitCheck = character;
 }
@@ -312,9 +282,12 @@ inline void HTMLToken::beginEndTag(LChar character)
     ASSERT(m_type == Uninitialized);
     m_type = EndTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
+
     m_data.append(character);
 }
 
@@ -323,64 +296,41 @@ inline void HTMLToken::beginEndTag(const Vector<LChar, 32>& characters)
     ASSERT(m_type == Uninitialized);
     m_type = EndTag;
     m_selfClosing = false;
-    m_currentAttribute = nullptr;
     m_attributes.clear();
 
-    m_data.appendVector(characters);
-}
-
-inline void HTMLToken::addNewAttribute()
-{
-    ASSERT(m_type == StartTag || m_type == EndTag);
-    m_attributes.grow(m_attributes.size() + 1);
-    m_currentAttribute = &m_attributes.last();
-
 #if !ASSERT_DISABLED
-    m_currentAttribute->nameRange.start = 0;
-    m_currentAttribute->nameRange.end = 0;
-    m_currentAttribute->valueRange.start = 0;
-    m_currentAttribute->valueRange.end = 0;
+    m_currentAttribute = nullptr;
 #endif
-}
 
-inline void HTMLToken::beginAttributeName(unsigned offset)
-{
-    ASSERT(offset);
-    ASSERT(!m_currentAttribute->nameRange.start);
-    m_currentAttribute->nameRange.start = offset - m_baseOffset;
+    m_data.appendVector(characters);
 }
 
-inline void HTMLToken::endAttributeName(unsigned offset)
+inline void HTMLToken::beginAttribute(unsigned offset)
 {
+    ASSERT(m_type == StartTag || m_type == EndTag);
     ASSERT(offset);
-    ASSERT(m_currentAttribute->nameRange.start);
-    ASSERT(!m_currentAttribute->nameRange.end);
 
-    unsigned adjustedOffset = offset - m_baseOffset;
-    m_currentAttribute->nameRange.end = adjustedOffset;
-
-    // FIXME: Is this intentional? Why point the value at the end of the name?
-    m_currentAttribute->valueRange.start = adjustedOffset;
-    m_currentAttribute->valueRange.end = adjustedOffset;
-}
+    m_attributes.grow(m_attributes.size() + 1);
+    m_currentAttribute = &m_attributes.last();
 
-inline void HTMLToken::beginAttributeValue(unsigned offset)
-{
-    ASSERT(offset);
-    m_currentAttribute->valueRange.start = offset - m_baseOffset;
+    m_currentAttribute->startOffset = offset;
 }
 
-inline void HTMLToken::endAttributeValue(unsigned offset)
+inline void HTMLToken::endAttribute(unsigned offset)
 {
     ASSERT(offset);
-    m_currentAttribute->valueRange.end = offset - m_baseOffset;
+    ASSERT(m_currentAttribute);
+    m_currentAttribute->endOffset = offset;
+#if !ASSERT_DISABLED
+    m_currentAttribute = nullptr;
+#endif
 }
 
 inline void HTMLToken::appendToAttributeName(UChar character)
 {
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->nameRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->name.append(character);
 }
 
@@ -388,7 +338,7 @@ inline void HTMLToken::appendToAttributeValue(UChar character)
 {
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->valueRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->value.append(character);
 }
 
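Note on the HTMLToken.h hunks above: the four per-range calls (beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue) collapse into a single beginAttribute/endAttribute pair that records one start and one end offset per attribute. The following is a minimal standalone sketch of that bookkeeping, not the WebKit implementation. TokenSketch and its Attribute struct are hypothetical stand-ins, std::vector and std::string replace the WTF containers, and the current-attribute pointer is reset unconditionally here, whereas the patch only resets it when assertions are enabled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for HTMLToken::Attribute: one start/end offset pair
// per attribute instead of separate name and value ranges.
struct Attribute {
    unsigned startOffset = 0;
    unsigned endOffset = 0;
    std::string name;
    std::string value;
};

// Hypothetical, much-reduced token that only tracks attributes.
class TokenSketch {
public:
    // Starts a new attribute and remembers where it begins in the input.
    void beginAttribute(unsigned offset)
    {
        assert(offset);
        m_attributes.emplace_back();
        // Take the pointer only after growing the vector, since growth may reallocate.
        m_currentAttribute = &m_attributes.back();
        m_currentAttribute->startOffset = offset;
    }

    void appendToAttributeName(char character)
    {
        assert(m_currentAttribute);
        m_currentAttribute->name.push_back(character);
    }

    void appendToAttributeValue(char character)
    {
        assert(m_currentAttribute);
        m_currentAttribute->value.push_back(character);
    }

    // Records where the attribute ends and drops the current-attribute pointer,
    // so a later append without a matching beginAttribute trips the assertion.
    void endAttribute(unsigned offset)
    {
        assert(offset);
        assert(m_currentAttribute);
        m_currentAttribute->endOffset = offset;
        m_currentAttribute = nullptr;
    }

    const std::vector<Attribute>& attributes() const { return m_attributes; }

private:
    std::vector<Attribute> m_attributes;
    Attribute* m_currentAttribute = nullptr;
};
```

A caller would invoke beginAttribute once when the attribute name starts, append name and value characters as they are consumed, and call endAttribute once when the attribute is finished. Using the pointer itself as the "attribute in progress" check is what lets the patch replace the old nameRange.start and valueRange.start assertions with ASSERT(m_currentAttribute).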
index 063bab84d764f87fd51cb4668c1601cf472b8e7e..489e6c51a582e0c497f93ffd48c870fee0b329b6 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
 #include "HTMLTokenizer.h"
 
 #include "HTMLEntityParser.h"
-#include "HTMLTreeBuilder.h"
+#include "HTMLNames.h"
 #include "MarkupTokenizerInlines.h"
-#include "NotImplemented.h"
 #include <wtf/ASCIICType.h>
-#include <wtf/CurrentTime.h>
-#include <wtf/text/CString.h>
 
 using namespace WTF;
 
@@ -42,64 +39,95 @@ namespace WebCore {
 
 using namespace HTMLNames;
 
-static inline UChar toLowerCase(UChar cc)
+static inline LChar convertASCIIAlphaToLower(UChar character)
 {
-    ASSERT(isASCIIUpper(cc));
-    const int lowerCaseOffset = 0x20;
-    return cc + lowerCaseOffset;
+    ASSERT(isASCIIAlpha(character));
+    return toASCIILowerUnchecked(character);
 }
 
-static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const String& string)
+static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const char* string)
 {
-    if (vector.size() != string.length())
-        return false;
-
-    if (!string.length())
-        return true;
-
-    return equal(string.impl(), vector.data(), vector.size());
+    unsigned size = vector.size();
+    for (unsigned i = 0; i < size; ++i) {
+        if (!string[i] || vector[i] != string[i])
+            return false;
+    }
+    return !string[size];
 }
 
-static inline bool isEndTagBufferingState(HTMLTokenizer::State state)
+inline bool HTMLTokenizer::inEndTagBufferingState() const
 {
-    switch (state) {
-    case HTMLTokenizer::RCDATAEndTagOpenState:
-    case HTMLTokenizer::RCDATAEndTagNameState:
-    case HTMLTokenizer::RAWTEXTEndTagOpenState:
-    case HTMLTokenizer::RAWTEXTEndTagNameState:
-    case HTMLTokenizer::ScriptDataEndTagOpenState:
-    case HTMLTokenizer::ScriptDataEndTagNameState:
-    case HTMLTokenizer::ScriptDataEscapedEndTagOpenState:
-    case HTMLTokenizer::ScriptDataEscapedEndTagNameState:
+    switch (m_state) {
+    case RCDATAEndTagOpenState:
+    case RCDATAEndTagNameState:
+    case RAWTEXTEndTagOpenState:
+    case RAWTEXTEndTagNameState:
+    case ScriptDataEndTagOpenState:
+    case ScriptDataEndTagNameState:
+    case ScriptDataEscapedEndTagOpenState:
+    case ScriptDataEscapedEndTagNameState:
         return true;
     default:
         return false;
     }
 }
 
-#define HTML_BEGIN_STATE(stateName) BEGIN_STATE(HTMLTokenizer, stateName)
-#define HTML_RECONSUME_IN(stateName) RECONSUME_IN(HTMLTokenizer, stateName)
-#define HTML_ADVANCE_TO(stateName) ADVANCE_TO(HTMLTokenizer, stateName)
-#define HTML_SWITCH_TO(stateName) SWITCH_TO(HTMLTokenizer, stateName)
-
 HTMLTokenizer::HTMLTokenizer(const HTMLParserOptions& options)
-    : m_inputStreamPreprocessor(this)
+    : m_preprocessor(*this)
     , m_options(options)
 {
-    reset();
 }
 
-HTMLTokenizer::~HTMLTokenizer()
+inline void HTMLTokenizer::bufferASCIICharacter(UChar character)
+{
+    ASSERT(character != kEndOfFileMarker);
+    ASSERT(isASCII(character));
+    LChar narrowedCharacter = character;
+    m_token.appendToCharacter(narrowedCharacter);
+}
+
+inline void HTMLTokenizer::bufferCharacter(UChar character)
+{
+    ASSERT(character != kEndOfFileMarker);
+    m_token.appendToCharacter(character);
+}
+
+inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source)
+{
+    saveEndTagNameIfNeeded();
+    m_state = DataState;
+    source.advanceAndUpdateLineNumber();
+    return true;
+}
+
+inline bool HTMLTokenizer::emitAndReconsumeInDataState()
+{
+    saveEndTagNameIfNeeded();
+    m_state = DataState;
+    return true;
+}
+
+inline bool HTMLTokenizer::emitEndOfFile(SegmentedString& source)
+{
+    m_state = DataState;
+    if (haveBufferedCharacterToken())
+        return true;
+    source.advance();
+    m_token.clear();
+    m_token.makeEndOfFile();
+    return true;
+}
+
+inline void HTMLTokenizer::saveEndTagNameIfNeeded()
 {
+    ASSERT(m_token.type() != HTMLToken::Uninitialized);
+    if (m_token.type() == HTMLToken::StartTag)
+        m_appropriateEndTagName = m_token.name();
 }
 
-void HTMLTokenizer::reset()
+inline bool HTMLTokenizer::haveBufferedCharacterToken() const
 {
-    m_state = HTMLTokenizer::DataState;
-    m_token = 0;
-    m_forceNullCharacterReplacement = false;
-    m_shouldAllowCDATA = false;
-    m_additionalAllowedCharacter = '\0';
+    return m_token.type() == HTMLToken::Character;
 }
 
 inline bool HTMLTokenizer::processEntity(SegmentedString& source)
@@ -119,1426 +147,1246 @@ inline bool HTMLTokenizer::processEntity(SegmentedString& source)
     return true;
 }
 
-bool HTMLTokenizer::flushBufferedEndTag(SegmentedString& source)
+void HTMLTokenizer::flushBufferedEndTag()
 {
-    ASSERT(m_token->type() == HTMLToken::Character || m_token->type() == HTMLToken::Uninitialized);
-    source.advanceAndUpdateLineNumber();
-    if (m_token->type() == HTMLToken::Character)
-        return true;
-    m_token->beginEndTag(m_bufferedEndTagName);
+    m_token.beginEndTag(m_bufferedEndTagName);
     m_bufferedEndTagName.clear();
     m_appropriateEndTagName.clear();
     m_temporaryBuffer.clear();
+}
+
+bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state)
+{
+    ASSERT(source.currentChar() == character);
+    appendToTemporaryBuffer(character);
+    source.advanceAndUpdateLineNumber();
+
+    if (haveBufferedCharacterToken()) {
+        // Emit the buffered character token.
+        // The next call to processToken will flush the buffered end tag and continue parsing it.
+        m_state = state;
+        return true;
+    }
+
+    flushBufferedEndTag();
     return false;
 }
 
-#define FLUSH_AND_ADVANCE_TO(stateName)                                    \
-    do {                                                                   \
-        m_state = HTMLTokenizer::stateName;                           \
-        if (flushBufferedEndTag(source))                                   \
-            return true;                                                   \
-        if (source.isEmpty()                                               \
-            || !m_inputStreamPreprocessor.peek(source))                    \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
-    } while (false)
-
-bool HTMLTokenizer::flushEmitAndResumeIn(SegmentedString& source, HTMLTokenizer::State state)
+bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source)
 {
-    m_state = state;
-    flushBufferedEndTag(source);
+    ASSERT(source.currentChar() == '>');
+    appendToTemporaryBuffer('>');
+    source.advance();
+
+    m_state = DataState;
+
+    if (haveBufferedCharacterToken()) {
+        // Emit the character token we already have.
+        // The next call to processToken will flush the buffered end tag and emit it.
+        return true;
+    }
+
+    flushBufferedEndTag();
     return true;
 }
 
-bool HTMLTokenizer::nextToken(SegmentedString& source, HTMLToken& token)
+bool HTMLTokenizer::processToken(SegmentedString& source)
 {
-    // If we have a token in progress, then we're supposed to be called back
-    // with the same token so we can finish it.
-    ASSERT(!m_token || m_token == &token || token.type() == HTMLToken::Uninitialized);
-    m_token = &token;
-
-    if (!m_bufferedEndTagName.isEmpty() && !isEndTagBufferingState(m_state)) {
-        // FIXME: This should call flushBufferedEndTag().
-        // We started an end tag during our last iteration.
-        m_token->beginEndTag(m_bufferedEndTagName);
-        m_bufferedEndTagName.clear();
-        m_appropriateEndTagName.clear();
-        m_temporaryBuffer.clear();
-        if (m_state == HTMLTokenizer::DataState) {
-            // We're back in the data state, so we must be done with the tag.
+    if (!m_bufferedEndTagName.isEmpty() && !inEndTagBufferingState()) {
+        // We are back here after emitting a character token that came just before an end tag.
+        // To continue parsing the end tag we need to move the buffered tag name into the token.
+        flushBufferedEndTag();
+
+        // If we are in the data state, the end tag is already complete and we should emit it
+        // now; otherwise, we want to resume parsing the partial end tag.
+        if (m_state == DataState)
             return true;
-        }
     }
 
-    if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))
+    if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state)))
         return haveBufferedCharacterToken();
-    UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
+    UChar character = m_preprocessor.nextInputCharacter();
 
-    // Source: http://www.whatwg.org/specs/web-apps/current-work/#tokenisation0
+    // https://html.spec.whatwg.org/#tokenization
     switch (m_state) {
-    HTML_BEGIN_STATE(DataState) {
-        if (cc == '&')
-            HTML_ADVANCE_TO(CharacterReferenceInDataState);
-        else if (cc == '<') {
-            if (m_token->type() == HTMLToken::Character) {
-                // We have a bunch of character tokens queued up that we
-                // are emitting lazily here.
-                return true;
-            }
-            HTML_ADVANCE_TO(TagOpenState);
-        } else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(CharacterReferenceInDataState) {
-        if (!processEntity(source))
-            return haveBufferedCharacterToken();
-        HTML_SWITCH_TO(DataState);
-    }
-    END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAState) {
-        if (cc == '&')
-            HTML_ADVANCE_TO(CharacterReferenceInRCDATAState);
-        else if (cc == '<')
-            HTML_ADVANCE_TO(RCDATALessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(RCDATAState);
+    BEGIN_STATE(DataState)
+        if (character == '&')
+            ADVANCE_TO(CharacterReferenceInDataState);
+        if (character == '<') {
+            if (haveBufferedCharacterToken())
+                RETURN_IN_CURRENT_STATE(true);
+            ADVANCE_TO(TagOpenState);
         }
-    }
+        if (character == kEndOfFileMarker)
+            return emitEndOfFile(source);
+        bufferCharacter(character);
+        ADVANCE_TO(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CharacterReferenceInRCDATAState) {
+    BEGIN_STATE(CharacterReferenceInDataState)
         if (!processEntity(source))
-            return haveBufferedCharacterToken();
-        HTML_SWITCH_TO(RCDATAState);
-    }
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        SWITCH_TO(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTState) {
-        if (cc == '<')
-            HTML_ADVANCE_TO(RAWTEXTLessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(RAWTEXTState);
-        }
-    }
+    BEGIN_STATE(RCDATAState)
+        if (character == '&')
+            ADVANCE_TO(CharacterReferenceInRCDATAState);
+        if (character == '<')
+            ADVANCE_TO(RCDATALessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataState) {
-        if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataLessThanSignState);
-        else if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataState);
+    BEGIN_STATE(CharacterReferenceInRCDATAState)
+        if (!processEntity(source))
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        SWITCH_TO(RCDATAState);
+    END_STATE()
+
+    BEGIN_STATE(RAWTEXTState)
+        if (character == '<')
+            ADVANCE_TO(RAWTEXTLessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(RAWTEXTState);
+    END_STATE()
+
+    BEGIN_STATE(ScriptDataState)
+        if (character == '<')
+            ADVANCE_TO(ScriptDataLessThanSignState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataState);
+    END_STATE()
+
+    BEGIN_STATE(PLAINTEXTState)
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(PLAINTEXTState);
+    END_STATE()
+
+    BEGIN_STATE(TagOpenState)
+        if (character == '!')
+            ADVANCE_TO(MarkupDeclarationOpenState);
+        if (character == '/')
+            ADVANCE_TO(EndTagOpenState);
+        if (isASCIIAlpha(character)) {
+            m_token.beginStartTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(TagNameState);
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(PLAINTEXTState) {
-        if (cc == kEndOfFileMarker)
-            return emitEndOfFile(source);
-        bufferCharacter(cc);
-        HTML_ADVANCE_TO(PLAINTEXTState);
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(TagOpenState) {
-        if (cc == '!')
-            HTML_ADVANCE_TO(MarkupDeclarationOpenState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(EndTagOpenState);
-        else if (isASCIIUpper(cc)) {
-            m_token->beginStartTag(toLowerCase(cc));
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (isASCIILower(cc)) {
-            m_token->beginStartTag(cc);
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == '?') {
+        if (character == '?') {
             parseError();
             // The spec consumes the current character before switching
             // to the bogus comment state, but it's easier to implement
             // if we reconsume the current character.
-            HTML_RECONSUME_IN(BogusCommentState);
-        } else {
-            parseError();
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(DataState);
+            RECONSUME_IN(BogusCommentState);
         }
-    }
+        parseError();
+        bufferASCIICharacter('<');
+        RECONSUME_IN(DataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(EndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_token->beginEndTag(static_cast<LChar>(toLowerCase(cc)));
-            m_appropriateEndTagName.clear();
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (isASCIILower(cc)) {
-            m_token->beginEndTag(static_cast<LChar>(cc));
+    BEGIN_STATE(EndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            m_token.beginEndTag(convertASCIIAlphaToLower(character));
             m_appropriateEndTagName.clear();
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == '>') {
+            ADVANCE_TO(TagNameState);
+        }
+        if (character == '>') {
             parseError();
-            HTML_ADVANCE_TO(DataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(DataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
             bufferASCIICharacter('<');
             bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BogusCommentState);
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(TagNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->appendToName(toLowerCase(cc));
-            HTML_ADVANCE_TO(TagNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToName(cc);
-            HTML_ADVANCE_TO(TagNameState);
+    BEGIN_STATE(TagNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToName(toASCIILower(character));
+        ADVANCE_TO(TagNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATALessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(RCDATALessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(RCDATAEndTagOpenState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(RCDATAState);
+            ADVANCE_TO(RCDATAEndTagOpenState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(RCDATAState);
+    BEGIN_STATE(RCDATAEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RCDATAEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RCDATAEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RCDATAEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(RCDATAEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RCDATAEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(RCDATAState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(RCDATAState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(RAWTEXTLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(RAWTEXTEndTagOpenState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(RAWTEXTState);
+            ADVANCE_TO(RAWTEXTEndTagOpenState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(RAWTEXTState);
+    BEGIN_STATE(RAWTEXTEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RAWTEXTEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(RAWTEXTEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(RAWTEXTEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(RAWTEXTEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(RAWTEXTState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(RAWTEXTState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(ScriptDataEndTagOpenState);
-        } else if (cc == '!') {
+            ADVANCE_TO(ScriptDataEndTagOpenState);
+        }
+        if (character == '!') {
             bufferASCIICharacter('<');
             bufferASCIICharacter('!');
-            HTML_ADVANCE_TO(ScriptDataEscapeStartState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(ScriptDataState);
+            ADVANCE_TO(ScriptDataEscapeStartState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(ScriptDataState);
+    BEGIN_STATE(ScriptDataEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(ScriptDataEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(ScriptDataState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapeStartState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapeStartState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapeStartDashState);
+            ADVANCE_TO(ScriptDataEscapeStartDashState);
         } else
-            HTML_RECONSUME_IN(ScriptDataState);
-    }
+            RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapeStartDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapeStartDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
         } else
-            HTML_RECONSUME_IN(ScriptDataState);
-    }
+            RECONSUME_IN(ScriptDataState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataEscapedDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedDashDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataEscapedDashDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
-        } else if (cc == '<')
-            HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
-        else if (cc == '>') {
+            ADVANCE_TO(ScriptDataEscapedDashDashState);
+        }
+        if (character == '<')
+            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+        if (character == '>') {
             bufferASCIICharacter('>');
-            HTML_ADVANCE_TO(ScriptDataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataEscapedLessThanSignState)
+        if (character == '/') {
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.clear();
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else if (isASCIILower(cc)) {
+            ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
+        }
+        if (isASCIIAlpha(character)) {
             bufferASCIICharacter('<');
-            bufferASCIICharacter(cc);
+            bufferASCIICharacter(character);
             m_temporaryBuffer.clear();
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else {
-            bufferASCIICharacter('<');
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
         }
-    }
+        bufferASCIICharacter('<');
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedEndTagOpenState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else {
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+    BEGIN_STATE(ScriptDataEscapedEndTagOpenState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataEscapedEndTagNameState) {
-        if (isASCIIUpper(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else if (isASCIILower(cc)) {
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            addToPossibleEndTag(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
-        } else {
-            if (isTokenizerWhitespace(cc)) {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
-                }
-            } else if (cc == '/') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
-                }
-            } else if (cc == '>') {
-                if (isAppropriateEndTag()) {
-                    m_temporaryBuffer.append(static_cast<LChar>(cc));
-                    return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
-                }
+    BEGIN_STATE(ScriptDataEscapedEndTagNameState)
+        if (isASCIIAlpha(character)) {
+            appendToTemporaryBuffer(character);
+            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+        }
+        if (isTokenizerWhitespace(character)) {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
+                    return true;
+                SWITCH_TO(BeforeAttributeNameState);
             }
-            bufferASCIICharacter('<');
-            bufferASCIICharacter('/');
-            m_token->appendToCharacter(m_temporaryBuffer);
-            m_bufferedEndTagName.clear();
-            m_temporaryBuffer.clear();
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
+        } else if (character == '/') {
+            if (isAppropriateEndTag()) {
+                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
+                    return true;
+                SWITCH_TO(SelfClosingStartTagState);
+            }
+        } else if (character == '>') {
+            if (isAppropriateEndTag())
+                return commitToCompleteEndTag(source);
         }
-    }
+        bufferASCIICharacter('<');
+        bufferASCIICharacter('/');
+        m_token.appendToCharacter(m_temporaryBuffer);
+        m_bufferedEndTagName.clear();
+        m_temporaryBuffer.clear();
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapeStartState) {
-        if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
-            bufferASCIICharacter(cc);
-            if (temporaryBufferIs(scriptTag.localName()))
-                HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+    BEGIN_STATE(ScriptDataDoubleEscapeStartState)
+        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
+            bufferASCIICharacter(character);
+            if (temporaryBufferIs("script"))
+                ADVANCE_TO(ScriptDataDoubleEscapedState);
             else
-                HTML_ADVANCE_TO(ScriptDataEscapedState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else if (isASCIILower(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataEscapedState);
-    }
+                ADVANCE_TO(ScriptDataEscapedState);
+        }
+        if (isASCIIAlpha(character)) {
+            bufferASCIICharacter(character);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+        }
+        RECONSUME_IN(ScriptDataEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) {
-        if (cc == '-') {
+    BEGIN_STATE(ScriptDataDoubleEscapedDashDashState)
+        if (character == '-') {
             bufferASCIICharacter('-');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
-        } else if (cc == '<') {
+            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
+        if (character == '<') {
             bufferASCIICharacter('<');
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
-        } else if (cc == '>') {
+            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
+        if (character == '>') {
             bufferASCIICharacter('>');
-            HTML_ADVANCE_TO(ScriptDataState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(ScriptDataState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+            RECONSUME_IN(DataState);
         }
-    }
+        bufferCharacter(character);
+        ADVANCE_TO(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) {
-        if (cc == '/') {
+    BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState)
+        if (character == '/') {
             bufferASCIICharacter('/');
             m_temporaryBuffer.clear();
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
-    }
+            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+        }
+        RECONSUME_IN(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ScriptDataDoubleEscapeEndState) {
-        if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
-            bufferASCIICharacter(cc);
-            if (temporaryBufferIs(scriptTag.localName()))
-                HTML_ADVANCE_TO(ScriptDataEscapedState);
+    BEGIN_STATE(ScriptDataDoubleEscapeEndState)
+        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
+            bufferASCIICharacter(character);
+            if (temporaryBufferIs("script"))
+                ADVANCE_TO(ScriptDataEscapedState);
             else
-                HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
-        } else if (isASCIIUpper(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(toLowerCase(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else if (isASCIILower(cc)) {
-            bufferASCIICharacter(cc);
-            m_temporaryBuffer.append(static_cast<LChar>(cc));
-            HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
-        } else
-            HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(BeforeAttributeNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
-                parseError();
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+                ADVANCE_TO(ScriptDataDoubleEscapedState);
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(AttributeNameState) {
-        if (isTokenizerWhitespace(cc)) {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeNameState);
-        } else if (cc == '/') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        } else if (cc == '=') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        } else if (cc == '>') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (m_options.usePreHTML5ParserQuirks && cc == '<') {
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else if (isASCIIUpper(cc)) {
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->endAttributeName(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
-                parseError();
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+        if (isASCIIAlpha(character)) {
+            bufferASCIICharacter(character);
+            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
+            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
         }
-    }
+        RECONSUME_IN(ScriptDataDoubleEscapedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterAttributeNameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '=')
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(toLowerCase(cc));
-            HTML_ADVANCE_TO(AttributeNameState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<')
-                parseError();
-            m_token->addNewAttribute();
-            m_token->beginAttributeName(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeName(cc);
-            HTML_ADVANCE_TO(AttributeNameState);
+    BEGIN_STATE(BeforeAttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        if (character == '"' || character == '\'' || character == '<' || character == '=')
+            parseError();
+        m_token.beginAttribute(source.numberOfCharactersConsumed());
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
+    END_STATE()
+
+    BEGIN_STATE(AttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '=')
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<' || character == '=')
+            parseError();
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
+    END_STATE()
+
+    BEGIN_STATE(AfterAttributeNameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '=')
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<')
+            parseError();
+        m_token.beginAttribute(source.numberOfCharactersConsumed());
+        m_token.appendToAttributeName(toASCIILower(character));
+        ADVANCE_TO(AttributeNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeAttributeValueState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeValueState);
-        else if (cc == '"') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
-            HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
-        } else if (cc == '&') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(AttributeValueUnquotedState);
-        } else if (cc == '\'') {
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
-            HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '<' || cc == '=' || cc == '`')
-                parseError();
-            m_token->beginAttributeValue(source.numberOfCharactersConsumed());
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueUnquotedState);
+    BEGIN_STATE(BeforeAttributeValueState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeValueState);
+        if (character == '"')
+            ADVANCE_TO(AttributeValueDoubleQuotedState);
+        if (character == '&')
+            RECONSUME_IN(AttributeValueUnquotedState);
+        if (character == '\'')
+            ADVANCE_TO(AttributeValueSingleQuotedState);
+        if (character == '>') {
+            parseError();
+            return emitAndResumeInDataState(source);
         }
-    }
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
+        }
+        if (character == '<' || character == '=' || character == '`')
+            parseError();
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueUnquotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueDoubleQuotedState) {
-        if (cc == '"') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueDoubleQuotedState)
+        if (character == '"') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(AfterAttributeValueQuotedState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '"';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueSingleQuotedState) {
-        if (cc == '\'') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueSingleQuotedState)
+        if (character == '\'') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(AfterAttributeValueQuotedState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '\'';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
         }
-    }
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueSingleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AttributeValueUnquotedState) {
-        if (isTokenizerWhitespace(cc)) {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        } else if (cc == '&') {
+    BEGIN_STATE(AttributeValueUnquotedState)
+        if (isTokenizerWhitespace(character)) {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            ADVANCE_TO(BeforeAttributeNameState);
+        }
+        if (character == '&') {
             m_additionalAllowedCharacter = '>';
-            HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
-        } else if (cc == '>') {
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->endAttributeValue(source.numberOfCharactersConsumed());
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            if (cc == '"' || cc == '\'' || cc == '<' || cc == '=' || cc == '`')
-                parseError();
-            m_token->appendToAttributeValue(cc);
-            HTML_ADVANCE_TO(AttributeValueUnquotedState);
+            ADVANCE_TO(CharacterReferenceInAttributeValueState);
         }
-    }
+        if (character == '>') {
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
+            parseError();
+            m_token.endAttribute(source.numberOfCharactersConsumed());
+            RECONSUME_IN(DataState);
+        }
+        if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`')
+            parseError();
+        m_token.appendToAttributeValue(character);
+        ADVANCE_TO(AttributeValueUnquotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CharacterReferenceInAttributeValueState) {
+    BEGIN_STATE(CharacterReferenceInAttributeValueState)
         bool notEnoughCharacters = false;
         StringBuilder decodedEntity;
         bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters, m_additionalAllowedCharacter);
         if (notEnoughCharacters)
-            return haveBufferedCharacterToken();
+            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         if (!success) {
             ASSERT(decodedEntity.isEmpty());
-            m_token->appendToAttributeValue('&');
+            m_token.appendToAttributeValue('&');
         } else {
             for (unsigned i = 0; i < decodedEntity.length(); ++i)
-                m_token->appendToAttributeValue(decodedEntity[i]);
+                m_token.appendToAttributeValue(decodedEntity[i]);
         }
         // We're supposed to switch back to the attribute value state that
         // we were in when we were switched into this state. Rather than
         // keeping track of this explicitly, we observe that the previous
         // state can be determined by m_additionalAllowedCharacter.
         if (m_additionalAllowedCharacter == '"')
-            HTML_SWITCH_TO(AttributeValueDoubleQuotedState);
-        else if (m_additionalAllowedCharacter == '\'')
-            HTML_SWITCH_TO(AttributeValueSingleQuotedState);
-        else if (m_additionalAllowedCharacter == '>')
-            HTML_SWITCH_TO(AttributeValueUnquotedState);
-        else
-            ASSERT_NOT_REACHED();
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(AfterAttributeValueQuotedState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeAttributeNameState);
-        else if (cc == '/')
-            HTML_ADVANCE_TO(SelfClosingStartTagState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (m_options.usePreHTML5ParserQuirks && cc == '<')
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BeforeAttributeNameState);
+            SWITCH_TO(AttributeValueDoubleQuotedState);
+        if (m_additionalAllowedCharacter == '\'')
+            SWITCH_TO(AttributeValueSingleQuotedState);
+        ASSERT(m_additionalAllowedCharacter == '>');
+        SWITCH_TO(AttributeValueUnquotedState);
+    END_STATE()
+
+    BEGIN_STATE(AfterAttributeValueQuotedState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeAttributeNameState);
+        if (character == '/')
+            ADVANCE_TO(SelfClosingStartTagState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (m_options.usePreHTML5ParserQuirks && character == '<')
+            return emitAndReconsumeInDataState();
+        if (character == kEndOfFileMarker) {
+            parseError();
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeAttributeNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(SelfClosingStartTagState) {
-        if (cc == '>') {
-            m_token->setSelfClosing();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            HTML_RECONSUME_IN(DataState);
-        } else {
+    BEGIN_STATE(SelfClosingStartTagState)
+        if (character == '>') {
+            m_token.setSelfClosing();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            HTML_RECONSUME_IN(BeforeAttributeNameState);
+            RECONSUME_IN(DataState);
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeAttributeNameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BogusCommentState) {
-        m_token->beginComment();
-        HTML_RECONSUME_IN(ContinueBogusCommentState);
-    }
+    BEGIN_STATE(BogusCommentState)
+        m_token.beginComment();
+        RECONSUME_IN(ContinueBogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(ContinueBogusCommentState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker)
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(ContinueBogusCommentState);
-        }
-    }
+    BEGIN_STATE(ContinueBogusCommentState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker)
+            return emitAndReconsumeInDataState();
+        m_token.appendToComment(character);
+        ADVANCE_TO(ContinueBogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(MarkupDeclarationOpenState) {
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, dashDashString, (ASCIILiteral("--")));
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, doctypeString, (ASCIILiteral("doctype")));
-        DEPRECATED_DEFINE_STATIC_LOCAL(String, cdataString, (ASCIILiteral("[CDATA[")));
-        if (cc == '-') {
-            SegmentedString::LookAheadResult result = source.lookAhead(dashDashString);
-            if (result == SegmentedString::DidMatch) {
-                source.advanceAndASSERT('-');
-                source.advanceAndASSERT('-');
-                m_token->beginComment();
-                HTML_SWITCH_TO(CommentStartState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
-        } else if (cc == 'D' || cc == 'd') {
-            SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(doctypeString);
+    BEGIN_STATE(MarkupDeclarationOpenState)
+        if (character == '-') {
+            auto result = source.advancePast("--");
             if (result == SegmentedString::DidMatch) {
-                advanceStringAndASSERTIgnoringCase(source, "doctype");
-                HTML_SWITCH_TO(DOCTYPEState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
-        } else if (cc == '[' && shouldAllowCDATA()) {
-            SegmentedString::LookAheadResult result = source.lookAhead(cdataString);
-            if (result == SegmentedString::DidMatch) {
-                advanceStringAndASSERT(source, "[CDATA[");
-                HTML_SWITCH_TO(CDATASectionState);
-            } else if (result == SegmentedString::NotEnoughCharacters)
-                return haveBufferedCharacterToken();
+                m_token.beginComment();
+                SWITCH_TO(CommentStartState);
+            }
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (isASCIIAlphaCaselessEqual(character, 'd')) {
+            auto result = source.advancePastIgnoringCase("doctype");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(DOCTYPEState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (character == '[' && shouldAllowCDATA()) {
+            auto result = source.advancePast("[CDATA[");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(CDATASectionState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         }
         parseError();
-        HTML_RECONSUME_IN(BogusCommentState);
-    }
+        RECONSUME_IN(BogusCommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentStartState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentStartDashState);
-        else if (cc == '>') {
+    BEGIN_STATE(CommentStartState)
+        if (character == '-')
+            ADVANCE_TO(CommentStartDashState);
+        if (character == '>') {
             parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentStartDashState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndState);
-        else if (cc == '>') {
+    BEGIN_STATE(CommentStartDashState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndState);
+        if (character == '>') {
             parseError();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndDashState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(CommentState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndDashState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentEndDashState) {
-        if (cc == '-')
-            HTML_ADVANCE_TO(CommentEndState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(CommentEndDashState)
+        if (character == '-')
+            ADVANCE_TO(CommentEndState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CommentEndState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '!') {
-            parseError();
-            HTML_ADVANCE_TO(CommentEndBangState);
-        } else if (cc == '-') {
+    BEGIN_STATE(CommentEndState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '!') {
             parseError();
-            m_token->appendToComment('-');
-            HTML_ADVANCE_TO(CommentEndState);
-        } else if (cc == kEndOfFileMarker) {
+            ADVANCE_TO(CommentEndBangState);
+        }
+        if (character == '-') {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.appendToComment('-');
+            ADVANCE_TO(CommentEndState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(CommentEndBangState) {
-        if (cc == '-') {
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment('!');
-            HTML_ADVANCE_TO(CommentEndDashState);
-        } else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
+        parseError();
+        m_token.appendToComment('-');
+        m_token.appendToComment('-');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
+    END_STATE()
+
+    BEGIN_STATE(CommentEndBangState)
+        if (character == '-') {
+            m_token.appendToComment('-');
+            m_token.appendToComment('-');
+            m_token.appendToComment('!');
+            ADVANCE_TO(CommentEndDashState);
+        }
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToComment('-');
-            m_token->appendToComment('-');
-            m_token->appendToComment('!');
-            m_token->appendToComment(cc);
-            HTML_ADVANCE_TO(CommentState);
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToComment('-');
+        m_token.appendToComment('-');
+        m_token.appendToComment('!');
+        m_token.appendToComment(character);
+        ADVANCE_TO(CommentState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPEState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPENameState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(DOCTYPEState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPENameState);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            parseError();
-            HTML_RECONSUME_IN(BeforeDOCTYPENameState);
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        RECONSUME_IN(BeforeDOCTYPENameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPENameState);
-        else if (isASCIIUpper(cc)) {
-            m_token->beginDOCTYPE(toLowerCase(cc));
-            HTML_ADVANCE_TO(DOCTYPENameState);
-        } else if (cc == '>') {
+    BEGIN_STATE(BeforeDOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPENameState);
+        if (character == '>') {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->beginDOCTYPE();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->beginDOCTYPE(cc);
-            HTML_ADVANCE_TO(DOCTYPENameState);
+            m_token.beginDOCTYPE();
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.beginDOCTYPE(toASCIILower(character));
+        ADVANCE_TO(DOCTYPENameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPENameState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (isASCIIUpper(cc)) {
-            m_token->appendToName(toLowerCase(cc));
-            HTML_ADVANCE_TO(DOCTYPENameState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(DOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPENameState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToName(cc);
-            HTML_ADVANCE_TO(DOCTYPENameState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToName(toASCIILower(character));
+        ADVANCE_TO(DOCTYPENameState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPENameState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPENameState);
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            DEPRECATED_DEFINE_STATIC_LOCAL(String, publicString, (ASCIILiteral("public")));
-            DEPRECATED_DEFINE_STATIC_LOCAL(String, systemString, (ASCIILiteral("system")));
-            if (cc == 'P' || cc == 'p') {
-                SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(publicString);
-                if (result == SegmentedString::DidMatch) {
-                    advanceStringAndASSERTIgnoringCase(source, "public");
-                    HTML_SWITCH_TO(AfterDOCTYPEPublicKeywordState);
-                } else if (result == SegmentedString::NotEnoughCharacters)
-                    return haveBufferedCharacterToken();
-            } else if (cc == 'S' || cc == 's') {
-                SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(systemString);
-                if (result == SegmentedString::DidMatch) {
-                    advanceStringAndASSERTIgnoringCase(source, "system");
-                    HTML_SWITCH_TO(AfterDOCTYPESystemKeywordState);
-                } else if (result == SegmentedString::NotEnoughCharacters)
-                    return haveBufferedCharacterToken();
-            }
+    BEGIN_STATE(AfterDOCTYPENameState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPENameState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        if (isASCIIAlphaCaselessEqual(character, 'p')) {
+            auto result = source.advancePastIgnoringCase("public");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(AfterDOCTYPEPublicKeywordState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        } else if (isASCIIAlphaCaselessEqual(character, 's')) {
+            auto result = source.advancePastIgnoringCase("system");
+            if (result == SegmentedString::DidMatch)
+                SWITCH_TO(AfterDOCTYPESystemKeywordState);
+            if (result == SegmentedString::NotEnoughCharacters)
+                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
+        }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPEPublicKeywordState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
-        else if (cc == '"') {
-            parseError();
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
+    BEGIN_STATE(AfterDOCTYPEPublicKeywordState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+        if (character == '"') {
             parseError();
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
-        } else if (cc == '>') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
-        else if (cc == '"') {
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setPublicIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+        if (character == '"') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setPublicIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) {
-        if (cc == '"')
-            HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState)
+        if (character == '"')
+            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToPublicIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToPublicIdentifier(character);
+        ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) {
-        if (cc == '\'')
-            HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState)
+        if (character == '\'')
+            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToPublicIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToPublicIdentifier(character);
+        ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '"') {
+    BEGIN_STATE(AfterDOCTYPEPublicIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '"') {
             parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
-    END_STATE()
-
-    HTML_BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == '"') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == kEndOfFileMarker) {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
+    END_STATE()
+
+    BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == '"') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPESystemKeywordState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
-        else if (cc == '"') {
-            parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
+    BEGIN_STATE(AfterDOCTYPESystemKeywordState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+        if (character == '"') {
             parseError();
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == '>') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
-        if (cc == '"') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
-        } else if (cc == '\'') {
-            m_token->setSystemIdentifierToEmptyString();
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
-        } else if (cc == '>') {
-            parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(BeforeDOCTYPESystemIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+        if (character == '"') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+        }
+        if (character == '\'') {
+            m_token.setSystemIdentifierToEmptyString();
+            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+        }
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        m_token.setForceQuirks();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) {
-        if (cc == '"')
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState)
+        if (character == '"')
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToSystemIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToSystemIdentifier(character);
+        ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) {
-        if (cc == '\'')
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>') {
+    BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState)
+        if (character == '\'')
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>') {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        } else if (cc == kEndOfFileMarker) {
+            m_token.setForceQuirks();
+            return emitAndResumeInDataState(source);
+        }
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            m_token->appendToSystemIdentifier(cc);
-            HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        m_token.appendToSystemIdentifier(character);
+        ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     END_STATE()
 
-    HTML_BEGIN_STATE(AfterDOCTYPESystemIdentifierState) {
-        if (isTokenizerWhitespace(cc))
-            HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
-        else if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker) {
+    BEGIN_STATE(AfterDOCTYPESystemIdentifierState)
+        if (isTokenizerWhitespace(character))
+            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker) {
             parseError();
-            m_token->setForceQuirks();
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        } else {
-            parseError();
-            HTML_ADVANCE_TO(BogusDOCTYPEState);
+            m_token.setForceQuirks();
+            return emitAndReconsumeInDataState();
         }
-    }
+        parseError();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(BogusDOCTYPEState) {
-        if (cc == '>')
-            return emitAndResumeIn(source, HTMLTokenizer::DataState);
-        else if (cc == kEndOfFileMarker)
-            return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
-        HTML_ADVANCE_TO(BogusDOCTYPEState);
-    }
+    BEGIN_STATE(BogusDOCTYPEState)
+        if (character == '>')
+            return emitAndResumeInDataState(source);
+        if (character == kEndOfFileMarker)
+            return emitAndReconsumeInDataState();
+        ADVANCE_TO(BogusDOCTYPEState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionState) {
-        if (cc == ']')
-            HTML_ADVANCE_TO(CDATASectionRightSquareBracketState);
-        else if (cc == kEndOfFileMarker)
-            HTML_RECONSUME_IN(DataState);
-        else {
-            bufferCharacter(cc);
-            HTML_ADVANCE_TO(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionState)
+        if (character == ']')
+            ADVANCE_TO(CDATASectionRightSquareBracketState);
+        if (character == kEndOfFileMarker)
+            RECONSUME_IN(DataState);
+        bufferCharacter(character);
+        ADVANCE_TO(CDATASectionState);
     END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionRightSquareBracketState) {
-        if (cc == ']')
-            HTML_ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
-        else {
-            bufferASCIICharacter(']');
-            HTML_RECONSUME_IN(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionRightSquareBracketState)
+        if (character == ']')
+            ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
+        bufferASCIICharacter(']');
+        RECONSUME_IN(CDATASectionState);
+    END_STATE()
 
-    HTML_BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) {
-        if (cc == '>')
-            HTML_ADVANCE_TO(DataState);
-        else {
-            bufferASCIICharacter(']');
-            bufferASCIICharacter(']');
-            HTML_RECONSUME_IN(CDATASectionState);
-        }
-    }
+    BEGIN_STATE(CDATASectionDoubleRightSquareBracketState)
+        if (character == '>')
+            ADVANCE_TO(DataState);
+        bufferASCIICharacter(']');
+        bufferASCIICharacter(']');
+        RECONSUME_IN(CDATASectionState);
     END_STATE()
 
     }
@@ -1561,39 +1409,45 @@ String HTMLTokenizer::bufferedCharacters() const
 void HTMLTokenizer::updateStateFor(const AtomicString& tagName)
 {
     if (tagName == textareaTag || tagName == titleTag)
-        setState(HTMLTokenizer::RCDATAState);
+        m_state = RCDATAState;
     else if (tagName == plaintextTag)
-        setState(HTMLTokenizer::PLAINTEXTState);
+        m_state = PLAINTEXTState;
     else if (tagName == scriptTag)
-        setState(HTMLTokenizer::ScriptDataState);
+        m_state = ScriptDataState;
     else if (tagName == styleTag
         || tagName == iframeTag
         || tagName == xmpTag
         || (tagName == noembedTag && m_options.pluginsEnabled)
         || tagName == noframesTag
         || (tagName == noscriptTag && m_options.scriptEnabled))
-        setState(HTMLTokenizer::RAWTEXTState);
+        m_state = RAWTEXTState;
+}
+
+inline void HTMLTokenizer::appendToTemporaryBuffer(UChar character)
+{
+    ASSERT(isASCII(character));
+    m_temporaryBuffer.append(character);
 }
 
-inline bool HTMLTokenizer::temporaryBufferIs(const String& expectedString)
+inline bool HTMLTokenizer::temporaryBufferIs(const char* expectedString)
 {
     return vectorEqualsString(m_temporaryBuffer, expectedString);
 }
 
-inline void HTMLTokenizer::addToPossibleEndTag(LChar cc)
+inline void HTMLTokenizer::appendToPossibleEndTag(UChar character)
 {
-    ASSERT(isEndTagBufferingState(m_state));
-    m_bufferedEndTagName.append(cc);
+    ASSERT(isASCII(character));
+    m_bufferedEndTagName.append(character);
 }
 
-inline bool HTMLTokenizer::isAppropriateEndTag()
+inline bool HTMLTokenizer::isAppropriateEndTag() const
 {
     if (m_bufferedEndTagName.size() != m_appropriateEndTagName.size())
         return false;
 
-    size_t numCharacters = m_bufferedEndTagName.size();
+    unsigned size = m_bufferedEndTagName.size();
 
-    for (size_t i = 0; i < numCharacters; i++) {
+    for (unsigned i = 0; i < size; i++) {
         if (m_bufferedEndTagName[i] != m_appropriateEndTagName[i])
             return false;
     }
@@ -1603,7 +1457,6 @@ inline bool HTMLTokenizer::isAppropriateEndTag()
 
 inline void HTMLTokenizer::parseError()
 {
-    notImplemented();
 }
 
 }
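
Note on the state-machine hunks above: the else chains can be flattened because every branch ends by leaving the state, either through a return or through a macro that jumps to another state. The sketch below illustrates this with a toy two-state scanner. It reuses the macro names BEGIN_STATE, END_STATE and ADVANCE_TO from the patch, but the macro bodies, the function countQuotedRuns and the state names are assumptions made for illustration, not the WebKit definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified, hypothetical macro bodies; only the names come from the patch.
#define BEGIN_STATE(stateName) stateName:
#define END_STATE() assert(false);
#define ADVANCE_TO(stateName)          \
    do {                               \
        ++index;                       \
        if (index >= input.size())     \
            return count;              \
        character = input[index];      \
        goto stateName;                \
    } while (false)

// Counts complete double-quoted runs in the input. Every branch ends in
// ADVANCE_TO, which never falls through, so no else is needed after an if.
inline int countQuotedRuns(const std::string& input)
{
    int count = 0;
    std::size_t index = 0;
    if (input.empty())
        return count;
    char character = input[0];

    BEGIN_STATE(OutsideState)
        if (character == '"')
            ADVANCE_TO(InsideState);
        ADVANCE_TO(OutsideState);
    END_STATE()

    BEGIN_STATE(InsideState)
        if (character == '"') {
            ++count;
            ADVANCE_TO(OutsideState);
        }
        ADVANCE_TO(InsideState);
    END_STATE()

    return count; // Not reached; keeps compilers from warning about a missing return.
}
```

Because ADVANCE_TO always transfers control, the statement after each if is only reached when the condition was false, which is exactly what the removed else branches expressed.
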
index 3d4356843107a4ce27ee41a3ccd212e83e1d1012..fed21188db4f6e271f11412640b398f0907175a7 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
 #include "HTMLParserOptions.h"
 #include "HTMLToken.h"
 #include "InputStreamPreprocessor.h"
-#include "SegmentedString.h"
 
 namespace WebCore {
 
+class SegmentedString;
+
 class HTMLTokenizer {
-    WTF_MAKE_NONCOPYABLE(HTMLTokenizer);
-    WTF_MAKE_FAST_ALLOCATED;
 public:
-    explicit HTMLTokenizer(const HTMLParserOptions&);
-    ~HTMLTokenizer();
+    explicit HTMLTokenizer(const HTMLParserOptions& = HTMLParserOptions());
+
+    // If we can't parse a whole token, this returns null.
+    class TokenPtr;
+    TokenPtr nextToken(SegmentedString&);
+
+    // Returns a copy of any characters buffered internally by the tokenizer.
+    // The tokenizer buffers characters when searching for the </script> token that terminates a script element.
+    String bufferedCharacters() const;
+    size_t numberOfBufferedCharacters() const;
+
+    // Updates the tokenizer's state according to the given tag name. This is an approximation of how the tree
+    // builder would update the tokenizer's state. This method is useful for approximating HTML tokenization.
+    // To get exactly the correct tokenization, you need the real tree builder.
+    //
+    // The main failures in the approximation are as follows:
+    //
+    //  * The first set of character tokens emitted for a <pre> element might contain an extra leading newline.
+    //  * The replacement of U+0000 with U+FFFD will not be sensitive to the tree builder's insertion mode.
+    //  * CDATA sections in foreign content will be tokenized as bogus comments instead of as character tokens.
+    //
+    // This approximation is also the algorithm called for when parsing an HTML fragment.
+    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+    void updateStateFor(const AtomicString& tagName);
 
-    void reset();
+    void setForceNullCharacterReplacement(bool);
 
+    bool shouldAllowCDATA() const;
+    void setShouldAllowCDATA(bool);
+
+    bool isInDataState() const;
+
+    void setDataState();
+    void setPLAINTEXTState();
+    void setRAWTEXTState();
+    void setRCDATAState();
+    void setScriptDataState();
+
+    bool neverSkipNullCharacters() const;
+
+private:
     enum State {
         DataState,
         CharacterReferenceInDataState,
@@ -88,10 +123,7 @@ public:
         AfterAttributeValueQuotedState,
         SelfClosingStartTagState,
         BogusCommentState,
-        // The ContinueBogusCommentState is not in the HTML5 spec, but we use
-        // it internally to keep track of whether we've started the bogus
-        // comment token yet.
-        ContinueBogusCommentState,
+        ContinueBogusCommentState, // Not in the HTML spec, used internally to track whether we started the bogus comment token.
         MarkupDeclarationOpenState,
         CommentStartState,
         CommentStartDashState,
@@ -121,155 +153,197 @@ public:
         CDATASectionDoubleRightSquareBracketState,
     };
 
-    // This function returns true if it emits a token. Otherwise, callers
-    // must provide the same (in progress) token on the next call (unless
-    // they call reset() first).
-    bool nextToken(SegmentedString&, HTMLToken&);
+    bool processToken(SegmentedString&);
+    bool processEntity(SegmentedString&);
 
-    // Returns a copy of any characters buffered internally by the tokenizer.
-    // The tokenizer buffers characters when searching for the </script> token
-    // that terminates a script element.
-    String bufferedCharacters() const;
+    void parseError();
 
-    size_t numberOfBufferedCharacters() const
-    {
-        // Notice that we add 2 to the length of the m_temporaryBuffer to
-        // account for the "</" characters, which are effecitvely buffered in
-        // the tokenizer's state machine.
-        return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
-    }
+    void bufferASCIICharacter(UChar);
+    void bufferCharacter(UChar);
 
-    // Updates the tokenizer's state according to the given tag name. This is
-    // an approximation of how the tree builder would update the tokenizer's
-    // state. This method is useful for approximating HTML tokenization. To
-    // get exactly the correct tokenization, you need the real tree builder.
-    //
-    // The main failures in the approximation are as follows:
-    //
-    //  * The first set of character tokens emitted for a <pre> element might
-    //    contain an extra leading newline.
-    //  * The replacement of U+0000 with U+FFFD will not be sensitive to the
-    //    tree builder's insertion mode.
-    //  * CDATA sections in foreign content will be tokenized as bogus comments
-    //    instead of as character tokens.
-    //
-    void updateStateFor(const AtomicString& tagName);
+    bool emitAndResumeInDataState(SegmentedString&);
+    bool emitAndReconsumeInDataState();
+    bool emitEndOfFile(SegmentedString&);
 
-    bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
-    void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
+    // Return true if we will emit a character token before dealing with the buffered end tag.
+    void flushBufferedEndTag();
+    bool commitToPartialEndTag(SegmentedString&, UChar, State);
+    bool commitToCompleteEndTag(SegmentedString&);
 
-    bool shouldAllowCDATA() const { return m_shouldAllowCDATA; }
-    void setShouldAllowCDATA(bool value) { m_shouldAllowCDATA = value; }
+    void appendToTemporaryBuffer(UChar);
+    bool temporaryBufferIs(const char*);
 
-    State state() const { return m_state; }
-    void setState(State state) { m_state = state; }
+    // Sometimes we speculatively consume input characters and we don't know whether they represent
+    // end tags or RCDATA, etc. These functions help manage this state.
+    bool inEndTagBufferingState() const;
+    void appendToPossibleEndTag(UChar);
+    void saveEndTagNameIfNeeded();
+    bool isAppropriateEndTag() const;
 
-    inline bool shouldSkipNullCharacters() const
-    {
-        return !m_forceNullCharacterReplacement
-            && (m_state == HTMLTokenizer::DataState
-                || m_state == HTMLTokenizer::RCDATAState
-                || m_state == HTMLTokenizer::RAWTEXTState);
-    }
+    bool haveBufferedCharacterToken() const;
+
+    static bool isNullCharacterSkippingState(State);
+
+    State m_state { DataState };
+    bool m_forceNullCharacterReplacement { false };
+    bool m_shouldAllowCDATA { false };
+
+    mutable HTMLToken m_token;
+
+    // https://html.spec.whatwg.org/#additional-allowed-character
+    UChar m_additionalAllowedCharacter { 0 };
+
+    // https://html.spec.whatwg.org/#preprocessing-the-input-stream
+    InputStreamPreprocessor<HTMLTokenizer> m_preprocessor;
+
+    Vector<UChar, 32> m_appropriateEndTagName;
+
+    // https://html.spec.whatwg.org/#temporary-buffer
+    Vector<LChar, 32> m_temporaryBuffer;
+
+    // We occasionally want to emit both a character token and an end tag
+    // token (e.g., when lexing script). We buffer the name of the end tag
+    // token here so we remember it next time we re-enter the tokenizer.
+    Vector<LChar, 32> m_bufferedEndTagName;
+
+    const HTMLParserOptions m_options;
+};
+
+class HTMLTokenizer::TokenPtr {
+public:
+    TokenPtr();
+    ~TokenPtr();
+
+    TokenPtr(TokenPtr&&);
+    TokenPtr& operator=(TokenPtr&&) = delete;
+
+    void clear();
+
+    operator bool() const;
+
+    HTMLToken& operator*() const;
+    HTMLToken* operator->() const;
 
 private:
-    inline bool processEntity(SegmentedString&);
+    friend class HTMLTokenizer;
+    explicit TokenPtr(HTMLToken*);
 
-    inline void parseError();
+    HTMLToken* m_token { nullptr };
+};
 
-    void bufferASCIICharacter(UChar character)
-    {
-        ASSERT(character != kEndOfFileMarker);
-        ASSERT(isASCII(character));
-        m_token->appendToCharacter(static_cast<LChar>(character));
-    }
+inline HTMLTokenizer::TokenPtr::TokenPtr()
+{
+}
 
-    void bufferCharacter(UChar character)
-    {
-        ASSERT(character != kEndOfFileMarker);
-        m_token->appendToCharacter(character);
-    }
-    void bufferCharacter(char) = delete;
-    void bufferCharacter(LChar) = delete;
-
-    inline bool emitAndResumeIn(SegmentedString& source, State state)
-    {
-        saveEndTagNameIfNeeded();
-        m_state = state;
-        source.advanceAndUpdateLineNumber();
-        return true;
-    }
-    
-    inline bool emitAndReconsumeIn(SegmentedString&, State state)
-    {
-        saveEndTagNameIfNeeded();
-        m_state = state;
-        return true;
-    }
+inline HTMLTokenizer::TokenPtr::TokenPtr(HTMLToken* token)
+    : m_token(token)
+{
+}
 
-    inline bool emitEndOfFile(SegmentedString& source)
-    {
-        if (haveBufferedCharacterToken())
-            return true;
-        m_state = HTMLTokenizer::DataState;
-        source.advanceAndUpdateLineNumber();
+inline HTMLTokenizer::TokenPtr::~TokenPtr()
+{
+    if (m_token)
         m_token->clear();
-        m_token->makeEndOfFile();
-        return true;
+}
+
+inline HTMLTokenizer::TokenPtr::TokenPtr(TokenPtr&& other)
+    : m_token(other.m_token)
+{
+    other.m_token = nullptr;
+}
+
+inline void HTMLTokenizer::TokenPtr::clear()
+{
+    if (m_token) {
+        m_token->clear();
+        m_token = nullptr;
     }
+}
 
-    inline bool flushEmitAndResumeIn(SegmentedString&, State);
+inline HTMLTokenizer::TokenPtr::operator bool() const
+{
+    return m_token;
+}
 
-    // Return whether we need to emit a character token before dealing with
-    // the buffered end tag.
-    inline bool flushBufferedEndTag(SegmentedString&);
-    inline bool temporaryBufferIs(const String&);
+inline HTMLToken& HTMLTokenizer::TokenPtr::operator*() const
+{
+    ASSERT(m_token);
+    return *m_token;
+}
 
-    // Sometimes we speculatively consume input characters and we don't
-    // know whether they represent end tags or RCDATA, etc. These
-    // functions help manage these state.
-    inline void addToPossibleEndTag(LChar cc);
+inline HTMLToken* HTMLTokenizer::TokenPtr::operator->() const
+{
+    ASSERT(m_token);
+    return m_token;
+}
 
-    inline void saveEndTagNameIfNeeded()
-    {
-        ASSERT(m_token->type() != HTMLToken::Uninitialized);
-        if (m_token->type() == HTMLToken::StartTag)
-            m_appropriateEndTagName = m_token->name();
-    }
-    inline bool isAppropriateEndTag();
+inline HTMLTokenizer::TokenPtr HTMLTokenizer::nextToken(SegmentedString& source)
+{
+    return TokenPtr(processToken(source) ? &m_token : nullptr);
+}
 
+inline size_t HTMLTokenizer::numberOfBufferedCharacters() const
+{
+    // Notice that we add 2 to the length of the m_temporaryBuffer to
+    // account for the "</" characters, which are effectively buffered in
+    // the tokenizer's state machine.
+    return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
+}
 
-    inline bool haveBufferedCharacterToken()
-    {
-        return m_token->type() == HTMLToken::Character;
-    }
+inline void HTMLTokenizer::setForceNullCharacterReplacement(bool value)
+{
+    m_forceNullCharacterReplacement = value;
+}
 
-    State m_state;
-    bool m_forceNullCharacterReplacement;
-    bool m_shouldAllowCDATA;
+inline bool HTMLTokenizer::shouldAllowCDATA() const
+{
+    return m_shouldAllowCDATA;
+}
 
-    // m_token is owned by the caller. If nextToken is not on the stack,
-    // this member might be pointing to unallocated memory.
-    HTMLToken* m_token;
+inline void HTMLTokenizer::setShouldAllowCDATA(bool value)
+{
+    m_shouldAllowCDATA = value;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#additional-allowed-character
-    UChar m_additionalAllowedCharacter;
+inline bool HTMLTokenizer::isInDataState() const
+{
+    return m_state == DataState;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
-    InputStreamPreprocessor<HTMLTokenizer> m_inputStreamPreprocessor;
+inline void HTMLTokenizer::setDataState()
+{
+    m_state = DataState;
+}
 
-    Vector<UChar, 32> m_appropriateEndTagName;
+inline void HTMLTokenizer::setPLAINTEXTState()
+{
+    m_state = PLAINTEXTState;
+}
 
-    // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
-    Vector<LChar, 32> m_temporaryBuffer;
+inline void HTMLTokenizer::setRAWTEXTState()
+{
+    m_state = RAWTEXTState;
+}
 
-    // We occationally want to emit both a character token and an end tag
-    // token (e.g., when lexing script). We buffer the name of the end tag
-    // token here so we remember it next time we re-enter the tokenizer.
-    Vector<LChar, 32> m_bufferedEndTagName;
+inline void HTMLTokenizer::setRCDATAState()
+{
+    m_state = RCDATAState;
+}
 
-    HTMLParserOptions m_options;
-};
+inline void HTMLTokenizer::setScriptDataState()
+{
+    m_state = ScriptDataState;
+}
+
+inline bool HTMLTokenizer::isNullCharacterSkippingState(State state)
+{
+    return state == DataState || state == RCDATAState || state == RAWTEXTState;
+}
+
+inline bool HTMLTokenizer::neverSkipNullCharacters() const
+{
+    return m_forceNullCharacterReplacement;
+}
 
 }
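
Note on TokenPtr: the header above makes the tokenizer own its token and hand out a move-only handle that clears the token when the handle goes away. The sketch below shows that ownership pattern in isolation. SketchToken, SketchTokenPtr, SketchTokenizer and pendingData are hypothetical stand-ins, and the std::string input replaces SegmentedString; only the shape of the handle follows the declarations in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HTMLToken: just enough state to show clearing.
struct SketchToken {
    std::string data;
    void clear() { data.clear(); }
};

// Move-only handle modeled on the TokenPtr declared above. A non-null handle
// clears the token it points at when destroyed or when clear() is called.
class SketchTokenPtr {
public:
    SketchTokenPtr() = default;
    explicit SketchTokenPtr(SketchToken* token) : m_token(token) { }
    SketchTokenPtr(SketchTokenPtr&& other) : m_token(other.m_token) { other.m_token = nullptr; }
    SketchTokenPtr& operator=(SketchTokenPtr&&) = delete;
    ~SketchTokenPtr() { if (m_token) m_token->clear(); }

    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

    explicit operator bool() const { return m_token != nullptr; }
    SketchToken& operator*() const { assert(m_token); return *m_token; }
    SketchToken* operator->() const { assert(m_token); return m_token; }

private:
    SketchToken* m_token { nullptr };
};

// Hypothetical tokenizer: owns one token and returns a null handle when no
// whole token is available, mirroring the comment on nextToken() above.
class SketchTokenizer {
public:
    SketchTokenPtr nextToken(const std::string& input)
    {
        if (input.empty())
            return SketchTokenPtr();
        m_token.data = input;
        return SketchTokenPtr(&m_token);
    }

    const std::string& pendingData() const { return m_token.data; }

private:
    SketchToken m_token;
};
```

With this shape a caller cannot forget to reset the token between calls: letting the handle go out of scope, or calling clear() early, leaves the tokenizer's token empty for the next call.
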
 
index eaca0eb20a734e0de9abde28142e89cdb0feac17..042ca0ec9141c736414dafb50441d7f62c180e95 100644 (file)
@@ -695,7 +695,7 @@ void HTMLTreeBuilder::processStartTagForInBody(AtomicHTMLToken& token)
     if (token.name() == plaintextTag) {
         processFakePEndTagIfPInButtonScope();
         m_tree.insertHTMLElement(&token);
-        m_parser.tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
+        m_parser.tokenizer().setPLAINTEXTState();
         return;
     }
     if (token.name() == buttonTag) {
@@ -799,7 +799,7 @@ void HTMLTreeBuilder::processStartTagForInBody(AtomicHTMLToken& token)
     if (token.name() == textareaTag) {
         m_tree.insertHTMLElement(&token);
         m_shouldSkipLeadingNewline = true;
-        m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
+        m_parser.tokenizer().setRCDATAState();
         m_originalInsertionMode = m_insertionMode;
         m_framesetOk = false;
         m_insertionMode = InsertionMode::Text;
@@ -2137,8 +2137,8 @@ void HTMLTreeBuilder::processEndTag(AtomicHTMLToken& token)
             // self-closing script tag was encountered and pre-HTML5 parser
             // quirks are enabled. We must set the tokenizer's state to
             // DataState explicitly if the tokenizer didn't have a chance to.
-            ASSERT(m_parser.tokenizer().state() == HTMLTokenizer::DataState || m_options.usePreHTML5ParserQuirks);
-            m_parser.tokenizer().setState(HTMLTokenizer::DataState);
+            ASSERT(m_parser.tokenizer().isInDataState() || m_options.usePreHTML5ParserQuirks);
+            m_parser.tokenizer().setDataState();
             return;
         }
         m_tree.openElements().pop();
@@ -2739,7 +2739,7 @@ void HTMLTreeBuilder::processGenericRCDATAStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertHTMLElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
+    m_parser.tokenizer().setRCDATAState();
     m_originalInsertionMode = m_insertionMode;
     m_insertionMode = InsertionMode::Text;
 }
@@ -2748,7 +2748,7 @@ void HTMLTreeBuilder::processGenericRawTextStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertHTMLElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::RAWTEXTState);
+    m_parser.tokenizer().setRAWTEXTState();
     m_originalInsertionMode = m_insertionMode;
     m_insertionMode = InsertionMode::Text;
 }
@@ -2757,7 +2757,7 @@ void HTMLTreeBuilder::processScriptStartTag(AtomicHTMLToken& token)
 {
     ASSERT(token.type() == HTMLToken::StartTag);
     m_tree.insertScriptElement(&token);
-    m_parser.tokenizer().setState(HTMLTokenizer::ScriptDataState);
+    m_parser.tokenizer().setScriptDataState();
     m_originalInsertionMode = m_insertionMode;
 
     TextPosition position = m_parser.textPosition();
index ffd639abe0d92770119a93fbb49730db91f48d36..6290c151cebdee618b7502523c7dae49cd5d756a 100644 (file)
@@ -40,7 +40,7 @@ template <typename Tokenizer>
 class InputStreamPreprocessor {
     WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
 public:
-    InputStreamPreprocessor(Tokenizer* tokenizer)
+    explicit InputStreamPreprocessor(Tokenizer& tokenizer)
         : m_tokenizer(tokenizer)
     {
         reset();
@@ -51,8 +51,11 @@ public:
     // Returns whether we succeeded in peeking at the next character.
     // The only way we can fail to peek is if there are no more
     // characters in |source| (after collapsing \r\n, etc).
-    ALWAYS_INLINE bool peek(SegmentedString& source)
+    ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
     {
+        if (source.isEmpty())
+            return false;
+
         m_nextInputCharacter = source.currentChar();
 
         // Every branch in this function is expensive, so we have a
@@ -64,16 +67,14 @@ public:
             m_skipNextNewLine = false;
             return true;
         }
-        return processNextInputCharacter(source);
+        return processNextInputCharacter(source, skipNullCharacters);
     }
 
     // Returns whether there are more characters in |source| after advancing.
-    ALWAYS_INLINE bool advance(SegmentedString& source)
+    ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
     {
         source.advanceAndUpdateLineNumber();
-        if (source.isEmpty())
-            return false;
-        return peek(source);
+        return peek(source, skipNullCharacters);
     }
 
     bool skipNextNewLine() const { return m_skipNextNewLine; }
@@ -85,7 +86,7 @@ public:
     }
 
 private:
-    bool processNextInputCharacter(SegmentedString& source)
+    bool processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
     {
     ProcessAgain:
         ASSERT(m_nextInputCharacter == source.currentChar());
@@ -107,7 +108,7 @@ private:
             // by the replacement character. We suspect this is a problem with the spec as doing
             // that filtering breaks surrogate pair handling and causes us not to match Minefield.
             if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
-                if (m_tokenizer->shouldSkipNullCharacters()) {
+                if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
                     source.advancePastNonNewline();
                     if (source.isEmpty())
                         return false;
@@ -125,7 +126,7 @@ private:
         return source.isClosed() && source.length() == 1;
     }
 
-    Tokenizer* m_tokenizer;
+    Tokenizer& m_tokenizer;
 
     // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
     UChar m_nextInputCharacter;
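
Note on peek() and advance(): the hunks above move the emptiness check from advance() into peek(), so advance() can simply delegate and callers of peek() need no separate check. A minimal sketch of that arrangement follows. SketchSource and SketchPreprocessor are hypothetical simplifications that ignore newline collapsing and null-character handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for SegmentedString: a string plus a read position.
struct SketchSource {
    std::string text;
    std::size_t position { 0 };

    bool isEmpty() const { return position >= text.size(); }
    char currentChar() const { return text[position]; }
    void advance() { ++position; }
};

class SketchPreprocessor {
public:
    // Fails only when no characters remain in the source.
    bool peek(SketchSource& source)
    {
        if (source.isEmpty())
            return false;
        m_nextInputCharacter = source.currentChar();
        return true;
    }

    // Consumes one character, then reports whether another one is available.
    bool advance(SketchSource& source)
    {
        source.advance();
        return peek(source);
    }

    char nextInputCharacter() const { return m_nextInputCharacter; }

private:
    char m_nextInputCharacter { 0 };
};
```

Keeping the check in one place means both entry points agree on what "no more input" means.
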
index 9df699611c9336decf5353eb2026805523f7ebf4..5fa62a322a363893716d515c82d356e8078ed74f 100644 (file)
@@ -61,7 +61,7 @@ void TextDocumentParser::insertFakePreElement()
 
     // Although Text Documents expose a "pre" element in their DOM, they
     // act like a <plaintext> tag, so we have to force plaintext mode.
-    tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
+    tokenizer().setPLAINTEXTState();
 
     m_haveInsertedFakePreElement = true;
 }
index 298c8c648883b9a9eadb708ccefd54a278e2cb79..722021b40fdbf49b543894a0633234b51f1b6bdd 100644 (file)
@@ -566,7 +566,7 @@ bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest& request, con
 String XSSAuditor::decodedSnippetForName(const FilterTokenRequest& request)
 {
     // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
-    return fullyDecodeString(request.sourceTracker.sourceForToken(request.token), m_encoding).substring(0, request.token.name().size() + 1);
+    return fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1);
 }
 
 String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request, const HTMLToken::Attribute& attribute, AttributeKind treatment)
@@ -575,9 +575,9 @@ String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request,
     // for an input of |name="value"|, the snippet is |name="value|. For an
     // unquoted input of |name=value |, the snippet is |name=value|.
     // FIXME: We should grab one character before the name also.
-    unsigned start = attribute.nameRange.start;
-    unsigned end = attribute.valueRange.end;
-    String decodedSnippet = fullyDecodeString(request.sourceTracker.sourceForToken(request.token).substring(start, end - start), m_encoding);
+    unsigned start = attribute.startOffset;
+    unsigned end = attribute.endOffset;
+    String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
     decodedSnippet.truncate(kMaximumFragmentLengthTarget);
     if (treatment == SrcLikeAttribute) {
         int slashCount = 0;
@@ -630,7 +630,7 @@ String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request,
 
 String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
 {
-    String string = request.sourceTracker.sourceForToken(request.token);
+    String string = request.sourceTracker.source(request.token);
     size_t startPosition = 0;
     size_t endPosition = string.length();
     size_t foundPosition = notFound;
@@ -737,12 +737,4 @@ bool XSSAuditor::isLikelySafeResource(const String& url)
     return (m_documentURL.host() == resourceURL.host() && resourceURL.query().isEmpty());
 }
 
-bool XSSAuditor::isSafeToSendToAnotherThread() const
-{
-    return m_documentURL.isSafeToSendToAnotherThread()
-        && m_decodedURL.isSafeToSendToAnotherThread()
-        && m_decodedHTTPBody.isSafeToSendToAnotherThread()
-        && m_cachedDecodedSnippet.isSafeToSendToAnotherThread();
-}
-
 } // namespace WebCore
index 2c541a279b843de44662bc424d06a41ec101fd89..34b6a8f93d96c909a4cf3aeafa2091c2ffe8077d 100644 (file)
@@ -61,7 +61,6 @@ public:
     void initForFragment();
 
     std::unique_ptr<XSSInfo> filterToken(const FilterTokenRequest&);
-    bool isSafeToSendToAnotherThread() const;
 
 private:
     static const size_t kMaximumFragmentLengthTarget = 100;
index 56ce08a0ce4f187683533092615c393b13fc9a5c..0cb19e85f535eb6160d8cb33dadbb8a175792a1d 100644 (file)
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2011, 2013 Google Inc.  All rights reserved.
- * Copyright (C) 2014 Apple Inc.  All rights reserved.
+ * Copyright (C) 2014-2015 Apple Inc.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
 
 namespace WebCore {
 
-#define WEBVTT_BEGIN_STATE(stateName) case stateName: stateName:
-#define WEBVTT_ADVANCE_TO(stateName)                               \
-    do {                                                           \
-        state = stateName;                                         \
-        ASSERT(!m_input.isEmpty());                                \
-        m_inputStreamPreprocessor.advance(m_input);                \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();       \
-        goto stateName;                                            \
+#define WEBVTT_ADVANCE_TO(stateName)                        \
+    do {                                                    \
+        ASSERT(!m_input.isEmpty());                         \
+        m_preprocessor.advance(m_input);                    \
+        character = m_preprocessor.nextInputCharacter();    \
+        goto stateName;                                     \
     } while (false)
-
     
-template<unsigned charactersCount>
-ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
+template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
 {
     return WTF::equal(s, reinterpret_cast<const LChar*>(characters), charactersCount - 1);
 }
@@ -79,7 +75,7 @@ inline bool advanceAndEmitToken(SegmentedString& source, WebVTTToken& resultToke
 
 WebVTTTokenizer::WebVTTTokenizer(const String& input)
     : m_input(input)
-    , m_inputStreamPreprocessor(this)
+    , m_preprocessor(*this)
 {
     // Append an EOF marker and close the input "stream".
     ASSERT(!m_input.isClosed());
@@ -89,12 +85,12 @@ WebVTTTokenizer::WebVTTTokenizer(const String& input)
 
 bool WebVTTTokenizer::nextToken(WebVTTToken& token)
 {
-    if (m_input.isEmpty() || !m_inputStreamPreprocessor.peek(m_input))
+    if (m_input.isEmpty() || !m_preprocessor.peek(m_input))
         return false;
 
-    UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
-    if (cc == kEndOfFileMarker) {
-        m_inputStreamPreprocessor.advance(m_input);
+    UChar character = m_preprocessor.nextInputCharacter();
+    if (character == kEndOfFileMarker) {
+        m_preprocessor.advance(m_input);
         return false;
     }
 
@@ -102,169 +98,134 @@ bool WebVTTTokenizer::nextToken(WebVTTToken& token)
     StringBuilder result;
     StringBuilder classes;
 
-    enum {
-        DataState,
-        EscapeState,
-        TagState,
-        StartTagState,
-        StartTagClassState,
-        StartTagAnnotationState,
-        EndTagState,
-        TimestampTagState,
-    } state = DataState;
-
-    // 4.8.10.13.4 WebVTT cue text tokenizer
-    switch (state) {
-    WEBVTT_BEGIN_STATE(DataState) {
-        if (cc == '&') {
-            buffer.append(static_cast<LChar>(cc));
-            WEBVTT_ADVANCE_TO(EscapeState);
-        } else if (cc == '<') {
-            if (result.isEmpty())
-                WEBVTT_ADVANCE_TO(TagState);
-            else {
-                // We don't want to advance input or perform a state transition - just return a (new) token.
-                // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
-                return emitToken(token, WebVTTToken::StringToken(result.toString()));
-            }
-        } else if (cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+// 4.8.10.13.4 WebVTT cue text tokenizer
+DataState:
+    if (character == '&') {
+        buffer.append('&');
+        WEBVTT_ADVANCE_TO(EscapeState);
+    } else if (character == '<') {
+        if (result.isEmpty())
+            WEBVTT_ADVANCE_TO(TagState);
         else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(EscapeState) {
-        if (cc == ';') {
-            if (equalLiteral(buffer, "&amp"))
-                result.append('&');
-            else if (equalLiteral(buffer, "&lt"))
-                result.append('<');
-            else if (equalLiteral(buffer, "&gt"))
-                result.append('>');
-            else if (equalLiteral(buffer, "&lrm"))
-                result.append(leftToRightMark);
-            else if (equalLiteral(buffer, "&rlm"))
-                result.append(rightToLeftMark);
-            else if (equalLiteral(buffer, "&nbsp"))
-                result.append(noBreakSpace);
-            else {
-                buffer.append(static_cast<LChar>(cc));
-                result.append(buffer);
-            }
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(DataState);
-        } else if (isASCIIAlphanumeric(cc)) {
-            buffer.append(static_cast<LChar>(cc));
-            WEBVTT_ADVANCE_TO(EscapeState);
-        } else if (cc == '<') {
-            result.append(buffer);
+            // We don't want to advance input or perform a state transition - just return a (new) token.
+            // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
             return emitToken(token, WebVTTToken::StringToken(result.toString()));
-        } else if (cc == kEndOfFileMarker) {
-            result.append(buffer);
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
-        } else {
-            result.append(buffer);
-            buffer.clear();
-
-            if (cc == '&') {
-                buffer.append(static_cast<LChar>(cc));
-                WEBVTT_ADVANCE_TO(EscapeState);
-            }
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(DataState);
-        }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(TagState) {
-        if (isTokenizerWhitespace(cc)) {
-            ASSERT(result.isEmpty());
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        } else if (cc == '.') {
-            ASSERT(result.isEmpty());
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        } else if (cc == '/') {
-            WEBVTT_ADVANCE_TO(EndTagState);
-        } else if (WTF::isASCIIDigit(cc)) {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(TimestampTagState);
-        } else if (cc == '>' || cc == kEndOfFileMarker) {
-            ASSERT(result.isEmpty());
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
-        } else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagState);
         }
+    } else if (character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+    else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(DataState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(StartTagState) {
-        if (isTokenizerWhitespace(cc))
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        else if (cc == '.')
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        else if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+EscapeState:
+    if (character == ';') {
+        if (equalLiteral(buffer, "&amp"))
+            result.append('&');
+        else if (equalLiteral(buffer, "&lt"))
+            result.append('<');
+        else if (equalLiteral(buffer, "&gt"))
+            result.append('>');
+        else if (equalLiteral(buffer, "&lrm"))
+            result.append(leftToRightMark);
+        else if (equalLiteral(buffer, "&rlm"))
+            result.append(rightToLeftMark);
+        else if (equalLiteral(buffer, "&nbsp"))
+            result.append(noBreakSpace);
         else {
-            result.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagState);
+            buffer.append(character);
+            result.append(buffer);
         }
-    }
-    END_STATE()
-
-    WEBVTT_BEGIN_STATE(StartTagClassState) {
-        if (isTokenizerWhitespace(cc)) {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-        } else if (cc == '.') {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            WEBVTT_ADVANCE_TO(StartTagClassState);
-        } else if (cc == '>' || cc == kEndOfFileMarker) {
-            addNewClass(classes, buffer);
-            buffer.clear();
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
-        } else {
-            buffer.append(cc);
-            WEBVTT_ADVANCE_TO(StartTagClassState);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(DataState);
+    } else if (isASCIIAlphanumeric(character)) {
+        buffer.append(character);
+        WEBVTT_ADVANCE_TO(EscapeState);
+    } else if (character == '<') {
+        result.append(buffer);
+        return emitToken(token, WebVTTToken::StringToken(result.toString()));
+    } else if (character == kEndOfFileMarker) {
+        result.append(buffer);
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+    } else {
+        result.append(buffer);
+        buffer.clear();
+
+        if (character == '&') {
+            buffer.append('&');
+            WEBVTT_ADVANCE_TO(EscapeState);
         }
-
+        result.append(character);
+        WEBVTT_ADVANCE_TO(DataState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(StartTagAnnotationState) {
-        if (cc == '>' || cc == kEndOfFileMarker) {
-            return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
-        }
-        buffer.append(cc);
+TagState:
+    if (isTokenizerWhitespace(character)) {
+        ASSERT(result.isEmpty());
         WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-    }
-    END_STATE()
-    
-    WEBVTT_BEGIN_STATE(EndTagState) {
-        if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
-        result.append(cc);
+    } else if (character == '.') {
+        ASSERT(result.isEmpty());
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    } else if (character == '/') {
         WEBVTT_ADVANCE_TO(EndTagState);
+    } else if (WTF::isASCIIDigit(character)) {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(TimestampTagState);
+    } else if (character == '>' || character == kEndOfFileMarker) {
+        ASSERT(result.isEmpty());
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+    } else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(StartTagState);
     }
-    END_STATE()
 
-    WEBVTT_BEGIN_STATE(TimestampTagState) {
-        if (cc == '>' || cc == kEndOfFileMarker)
-            return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
-        result.append(cc);
-        WEBVTT_ADVANCE_TO(TimestampTagState);
+StartTagState:
+    if (isTokenizerWhitespace(character))
+        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+    else if (character == '.')
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    else if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+    else {
+        result.append(character);
+        WEBVTT_ADVANCE_TO(StartTagState);
     }
-    END_STATE()
 
+StartTagClassState:
+    if (isTokenizerWhitespace(character)) {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+    } else if (character == '.') {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        WEBVTT_ADVANCE_TO(StartTagClassState);
+    } else if (character == '>' || character == kEndOfFileMarker) {
+        addNewClass(classes, buffer);
+        buffer.clear();
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
+    } else {
+        buffer.append(character);
+        WEBVTT_ADVANCE_TO(StartTagClassState);
     }
 
-    ASSERT_NOT_REACHED();
-    return false;
+StartTagAnnotationState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
+    buffer.append(character);
+    WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+
+EndTagState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
+    result.append(character);
+    WEBVTT_ADVANCE_TO(EndTagState);
+
+TimestampTagState:
+    if (character == '>' || character == kEndOfFileMarker)
+        return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
+    result.append(character);
+    WEBVTT_ADVANCE_TO(TimestampTagState);
 }
 
 }
index a97797eee23250257c7bf187e6a4b8b3cb5e4794..6c1dda3ea84443bac0231f233a6462d6539e3682 100644 (file)
 namespace WebCore {
 
 class WebVTTTokenizer {
-    WTF_MAKE_NONCOPYABLE(WebVTTTokenizer);
 public:
     explicit WebVTTTokenizer(const String&);
-
     bool nextToken(WebVTTToken&);
 
-    inline bool shouldSkipNullCharacters() const { return true; }
+    static bool neverSkipNullCharacters() { return false; }
 
 private:
     SegmentedString m_input;
-
-    // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
-    InputStreamPreprocessor<WebVTTTokenizer> m_inputStreamPreprocessor;
+    InputStreamPreprocessor<WebVTTTokenizer> m_preprocessor;
 };
 
 }
index b0dc3d3ddf4664a880077772270d608e9ccf0661..491d16ccd5c13c22f801fc07594bae0ceeb48a5f 100644 (file)
@@ -20,6 +20,8 @@
 #include "config.h"
 #include "SegmentedString.h"
 
+#include <wtf/text/TextPosition.h>
+
 namespace WebCore {
 
 SegmentedString::SegmentedString(const SegmentedString& other)
@@ -44,7 +46,7 @@ SegmentedString::SegmentedString(const SegmentedString& other)
         m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
 }
 
-const SegmentedString& SegmentedString::operator=(const SegmentedString& other)
+SegmentedString& SegmentedString::operator=(const SegmentedString& other)
 {
     m_pushedChar1 = other.m_pushedChar1;
     m_pushedChar2 = other.m_pushedChar2;
@@ -130,14 +132,14 @@ void SegmentedString::append(const SegmentedSubstring& s)
     m_empty = false;
 }
 
-void SegmentedString::prepend(const SegmentedSubstring& s)
+void SegmentedString::pushBack(const SegmentedSubstring& s)
 {
-    ASSERT(!escaped());
+    ASSERT(!m_pushedChar1);
     ASSERT(!s.numberOfCharactersConsumed());
     if (!s.m_length)
         return;
 
-    // FIXME: We're assuming that the prepend were originally consumed by
+    // FIXME: We're assuming that the characters were originally consumed by
     //        this SegmentedString.  We're also ASSERTing that s is a fresh
     //        SegmentedSubstring.  These assumptions are sufficient for our
     //        current use, but we might need to handle the more elaborate
@@ -166,7 +168,7 @@ void SegmentedString::close()
 void SegmentedString::append(const SegmentedString& s)
 {
     ASSERT(!m_closed);
-    ASSERT(!s.escaped());
+    ASSERT(!s.m_pushedChar1);
     append(s.m_currentString);
     if (s.isComposite()) {
         Deque<SegmentedSubstring>::const_iterator it = s.m_substrings.begin();
@@ -177,17 +179,17 @@ void SegmentedString::append(const SegmentedString& s)
     m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
 }
 
-void SegmentedString::prepend(const SegmentedString& s)
+void SegmentedString::pushBack(const SegmentedString& s)
 {
-    ASSERT(!escaped());
-    ASSERT(!s.escaped());
+    ASSERT(!m_pushedChar1);
+    ASSERT(!s.m_pushedChar1);
     if (s.isComposite()) {
         Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin();
         Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend();
         for (; it != e; ++it)
-            prepend(*it);
+            pushBack(*it);
     }
-    prepend(s.m_currentString);
+    pushBack(s.m_currentString);
     m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
 }
 
@@ -228,12 +230,12 @@ String SegmentedString::toString() const
     return result.toString();
 }
 
-void SegmentedString::advance(unsigned count, UChar* consumedCharacters)
+void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters)
 {
     ASSERT_WITH_SECURITY_IMPLICATION(count <= length());
     for (unsigned i = 0; i < count; ++i) {
         consumedCharacters[i] = currentChar();
-        advance();
+        advancePastNonNewline();
     }
 }
 
@@ -353,8 +355,7 @@ OrdinalNumber SegmentedString::currentLine() const
 
 OrdinalNumber SegmentedString::currentColumn() const
 {
-    int zeroBasedColumn = numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine;
-    return OrdinalNumber::fromZeroBasedInt(zeroBasedColumn);
+    return OrdinalNumber::fromZeroBasedInt(numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine);
 }
 
 void SegmentedString::setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength)
@@ -363,4 +364,18 @@ void SegmentedString::setCurrentPosition(OrdinalNumber line, OrdinalNumber colum
     m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + prologLength - columnAftreProlog.zeroBasedInt();
 }
 
+SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive)
+{
+    unsigned length = strlen(literal);
+    if (length > this->length())
+        return NotEnoughCharacters;
+    UChar* consumedCharacters;
+    String consumedString = String::createUninitialized(length, consumedCharacters);
+    advancePastNonNewlines(length, consumedCharacters);
+    if (consumedString.startsWith(literal, caseSensitive))
+        return DidMatch;
+    pushBack(SegmentedString(consumedString));
+    return DidNotMatch;
+}
+
 }
index d5fe367b32890d1227844504d678c57ffb29a99a..0813d60ce74496648a90cd0a594bcee7688ced6c 100644 (file)
@@ -1,5 +1,5 @@
 /*
-    Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
+    Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved.
 
     This library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Library General Public
@@ -22,8 +22,6 @@
 
 #include <wtf/Deque.h>
 #include <wtf/text/StringBuilder.h>
-#include <wtf/text/TextPosition.h>
-#include <wtf/text/WTFString.h>
 
 namespace WebCore {
 
@@ -170,16 +168,14 @@ public:
     }
 
     SegmentedString(const SegmentedString&);
-
-    const SegmentedString& operator=(const SegmentedString&);
+    SegmentedString& operator=(const SegmentedString&);
 
     void clear();
     void close();
 
     void append(const SegmentedString&);
-    void prepend(const SegmentedString&);
+    void pushBack(const SegmentedString&);
 
-    bool excludeLineNumbers() const { return m_currentString.excludeLineNumbers(); }
     void setExcludeLineNumbers();
 
     void push(UChar c)
@@ -199,14 +195,9 @@ public:
 
     bool isClosed() const { return m_closed; }
 
-    enum LookAheadResult {
-        DidNotMatch,
-        DidMatch,
-        NotEnoughCharacters,
-    };
-
-    LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); }
-    LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); }
+    enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };
+    template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); }
+    template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); }
 
     void advance()
     {
@@ -226,7 +217,7 @@ public:
         (this->*m_advanceFunc)();
     }
 
-    inline void advanceAndUpdateLineNumber()
+    void advanceAndUpdateLineNumber()
     {
         if (m_fastPathFlags & Use8BitAdvance) {
             ASSERT(!m_pushedChar1);
@@ -253,18 +244,6 @@ public:
         (this->*m_advanceAndUpdateLineNumberFunc)();
     }
 
-    void advanceAndASSERT(UChar expectedCharacter)
-    {
-        ASSERT_UNUSED(expectedCharacter, currentChar() == expectedCharacter);
-        advance();
-    }
-
-    void advanceAndASSERTIgnoringCase(UChar expectedCharacter)
-    {
-        ASSERT_UNUSED(expectedCharacter, u_foldCase(currentChar(), U_FOLD_CASE_DEFAULT) == u_foldCase(expectedCharacter, U_FOLD_CASE_DEFAULT));
-        advance();
-    }
-
     void advancePastNonNewline()
     {
         ASSERT(currentChar() != '\n');
@@ -286,12 +265,6 @@ public:
         advanceAndUpdateLineNumberSlowCase();
     }
 
-    // Writes the consumed characters into consumedCharacters, which must
-    // have space for at least |count| characters.
-    void advance(unsigned count, UChar* consumedCharacters);
-
-    bool escaped() const { return m_pushedChar1; }
-
     int numberOfCharactersConsumed() const
     {
         int numberOfPushedCharacters = 0;
@@ -307,12 +280,12 @@ public:
 
     UChar currentChar() const { return m_currentChar; }    
 
-    // The method is moderately slow, comparing to currentLine method.
     OrdinalNumber currentColumn() const;
     OrdinalNumber currentLine() const;
-    // Sets value of line/column variables. Column is specified indirectly by a parameter columnAftreProlog
+
+    // Sets value of line/column variables. Column is specified indirectly by a parameter columnAfterProlog
     // which is a value of column that we should get after a prolog (first prologLength characters) has been consumed.
-    void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength);
+    void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAfterProlog, int prologLength);
 
 private:
     enum FastPathFlags {
@@ -322,7 +295,7 @@ private:
     };
 
     void append(const SegmentedSubstring&);
-    void prepend(const SegmentedSubstring&);
+    void pushBack(const SegmentedSubstring&);
 
     void advance8();
     void advance16();
@@ -374,31 +347,12 @@ private:
         updateSlowCaseFunctionPointers();
     }
 
-    inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive)
-    {
-        if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) {
-            String currentSubstring = m_currentString.currentSubString(string.length());
-            if (currentSubstring.startsWith(string, caseSensitive))
-                return DidMatch;
-            return DidNotMatch;
-        }
-        return lookAheadSlowCase(string, caseSensitive);
-    }
-    
-    LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive)
-    {
-        unsigned count = string.length();
-        if (count > length())
-            return NotEnoughCharacters;
-        UChar* consumedCharacters;
-        String consumedString = String::createUninitialized(count, consumedCharacters);
-        advance(count, consumedCharacters);
-        LookAheadResult result = DidNotMatch;
-        if (consumedString.startsWith(string, caseSensitive))
-            result = DidMatch;
-        prepend(SegmentedString(consumedString));
-        return result;
-    }
+    // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters.
+    void advancePastNonNewlines(unsigned count);
+    void advancePastNonNewlines(unsigned count, UChar* consumedCharacters);
+
+    AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive);
+    AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive);
 
     bool isComposite() const { return !m_substrings.isEmpty(); }
 
@@ -417,6 +371,27 @@ private:
     void (SegmentedString::*m_advanceAndUpdateLineNumberFunc)();
 };
 
+inline void SegmentedString::advancePastNonNewlines(unsigned count)
+{
+    for (unsigned i = 0; i < count; ++i)
+        advancePastNonNewline();
+}
+
+inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive)
+{
+    ASSERT(strlen(literal) == length);
+    ASSERT(!strchr(literal, '\n'));
+    if (!m_pushedChar1) {
+        if (length <= static_cast<unsigned>(m_currentString.m_length)) {
+            if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive))
+                return DidNotMatch;
+            advancePastNonNewlines(length);
+            return DidMatch;
+        }
+    }
+    return advancePastSlowCase(literal, caseSensitive);
+}
+
 }
 
 #endif
index 681eb33ae41efc5c5cee589a0fe09b94cbb40e01..23e0c0222a2003e2f6336a75a692fc5b140f03b2 100644 (file)
 
 namespace WebCore {
 
-inline bool isHexDigit(UChar cc)
-{
-    return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f') || (cc >= 'A' && cc <= 'F');
-}
-
 inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters)
 {
-    if (consumedCharacters.length() == 1)
-        source.push(consumedCharacters[0]);
-    else if (consumedCharacters.length() == 2) {
-        source.push(consumedCharacters[0]);
-        source.push(consumedCharacters[1]);
-    } else
-        source.prepend(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
+    source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
 }
 
 template <typename ParserFunctions>
@@ -54,7 +43,7 @@ bool consumeCharacterReference(SegmentedString& source, StringBuilder& decodedCh
     ASSERT(!notEnoughCharacters);
     ASSERT(decodedCharacter.isEmpty());
     
-    enum EntityState {
+    enum {
         Initial,
         Number,
         MaybeHexLowerCaseX,
@@ -62,111 +51,102 @@ bool consumeCharacterReference(SegmentedString& source, StringBuilder& decodedCh
         Hex,
         Decimal,
         Named
-    };
-    EntityState entityState = Initial;
+    } state = Initial;
     UChar32 result = 0;
     bool overflow = false;
     const UChar32 highestValidCharacter = 0x10FFFF;
     StringBuilder consumedCharacters;
     
     while (!source.isEmpty()) {
-        UChar cc = source.currentChar();
-        switch (entityState) {
-        case Initial: {
-            if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&')
+        UChar character = source.currentChar();
+        switch (state) {
+        case Initial:
+            if (character == '\x09' || character == '\x0A' || character == '\x0C' || character == ' ' || character == '<' || character == '&')
                 return false;
-            if (additionalAllowedCharacter && cc == additionalAllowedCharacter)
+            if (additionalAllowedCharacter && character == additionalAllowedCharacter)
                 return false;
-            if (cc == '#') {
-                entityState = Number;
+            if (character == '#') {
+                state = Number;
                 break;
             }
-            if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) {
-                entityState = Named;
-                continue;
+            if (isASCIIAlpha(character)) {
+                state = Named;
+                goto Named;
             }
             return false;
-        }
-        case Number: {
-            if (cc == 'x') {
-                entityState = MaybeHexLowerCaseX;
+        case Number:
+            if (character == 'x') {
+                state = MaybeHexLowerCaseX;
                 break;
             }
-            if (cc == 'X') {
-                entityState = MaybeHexUpperCaseX;
+            if (character == 'X') {
+                state = MaybeHexUpperCaseX;
                 break;
             }
-            if (cc >= '0' && cc <= '9') {
-                entityState = Decimal;
-                continue;
+            if (isASCIIDigit(character)) {
+                state = Decimal;
+                goto Decimal;
             }
-            source.push('#');
+            source.pushBack(SegmentedString(ASCIILiteral("#")));
             return false;
-        }
-        case MaybeHexLowerCaseX: {
-            if (isHexDigit(cc)) {
-                entityState = Hex;
-                continue;
+        case MaybeHexLowerCaseX:
+            if (isASCIIHexDigit(character)) {
+                state = Hex;
+                goto Hex;
             }
-            source.push('#');
-            source.push('x');
+            source.pushBack(SegmentedString(ASCIILiteral("#x")));
             return false;
-        }
-        case MaybeHexUpperCaseX: {
-            if (isHexDigit(cc)) {
-                entityState = Hex;
-                continue;
+        case MaybeHexUpperCaseX:
+            if (isASCIIHexDigit(character)) {
+                state = Hex;
+                goto Hex;
             }
-            source.push('#');
-            source.push('X');
+            source.pushBack(SegmentedString(ASCIILiteral("#X")));
             return false;
-        }
-        case Hex: {
-            if (cc >= '0' && cc <= '9')
-                result = result * 16 + cc - '0';
-            else if (cc >= 'a' && cc <= 'f')
-                result = result * 16 + 10 + cc - 'a';
-            else if (cc >= 'A' && cc <= 'F')
-                result = result * 16 + 10 + cc - 'A';
-            else if (cc == ';') {
-                source.advanceAndASSERT(cc);
+        case Hex:
+        Hex:
+            if (isASCIIHexDigit(character)) {
+                result = result * 16 + toASCIIHexValue(character);
+                if (result > highestValidCharacter)
+                    overflow = true;
+                break;
+            }
+            if (character == ';') {
+                source.advance();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else if (ParserFunctions::acceptMalformed()) {
+            }
+            if (ParserFunctions::acceptMalformed()) {
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else {
-                unconsumeCharacters(source, consumedCharacters);
-                return false;
             }
-            if (result > highestValidCharacter)
-                overflow = true;
-            break;
-        }
-        case Decimal: {
-            if (cc >= '0' && cc <= '9')
-                result = result * 10 + cc - '0';
-            else if (cc == ';') {
-                source.advanceAndASSERT(cc);
+            unconsumeCharacters(source, consumedCharacters);
+            return false;
+        case Decimal:
+        Decimal:
+            if (isASCIIDigit(character)) {
+                result = result * 10 + character - '0';
+                if (result > highestValidCharacter)
+                    overflow = true;
+                break;
+            }
+            if (character == ';') {
+                source.advance();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else if (ParserFunctions::acceptMalformed()) {
+            }
+            if (ParserFunctions::acceptMalformed()) {
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
-            } else {
-                unconsumeCharacters(source, consumedCharacters);
-                return false;
             }
-            if (result > highestValidCharacter)
-                overflow = true;
-            break;
-        }
-        case Named: {
-            return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, cc);
-        }
+            unconsumeCharacters(source, consumedCharacters);
+            return false;
+        case Named:
+        Named:
+            return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, character);
         }
-        consumedCharacters.append(cc);
-        source.advanceAndASSERT(cc);
+        consumedCharacters.append(character);
+        source.advance();
     }
     ASSERT(source.isEmpty());
     notEnoughCharacters = true;
index e0b3156bb540380cdbb0d347ce038690809f224a..987510e8a0786cfcf9753c860ce831411aefbd40 100644
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
  *
 
 #include "SegmentedString.h"
 
+#if COMPILER(MSVC)
+// Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro.
+#pragma warning(disable: 4702)
+#endif
+
 namespace WebCore {
 
-inline bool isTokenizerWhitespace(UChar cc)
+inline bool isTokenizerWhitespace(UChar character)
 {
-    return cc == ' ' || cc == '\x0A' || cc == '\x09' || cc == '\x0C';
+    return character == ' ' || character == '\x0A' || character == '\x09' || character == '\x0C';
 }
 
-inline void advanceStringAndASSERTIgnoringCase(SegmentedString& source, const char* expectedCharacters)
-{
-    while (*expectedCharacters)
-        source.advanceAndASSERTIgnoringCase(*expectedCharacters++);
-}
+#define BEGIN_STATE(stateName)                                  \
+    case stateName:                                             \
+    stateName: {                                                \
+        const auto currentState = stateName;                    \
+        UNUSED_PARAM(currentState);
 
-inline void advanceStringAndASSERT(SegmentedString& source, const char* expectedCharacters)
-{
-    while (*expectedCharacters)
-        source.advanceAndASSERT(*expectedCharacters++);
-}
-
-#if COMPILER(MSVC)
-// We need to disable the "unreachable code" warning because we want to assert
-// that some code points aren't reached in the state machine.
-#pragma warning(disable: 4702)
-#endif
+#define END_STATE()                                             \
+        ASSERT_NOT_REACHED();                                   \
+        break;                                                  \
+    }
 
-#define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName:
-#define END_STATE() ASSERT_NOT_REACHED(); break;
+#define RETURN_IN_CURRENT_STATE(expression)                     \
+    do {                                                        \
+        m_state = currentState;                                 \
+        return expression;                                      \
+    } while (false)
 
-// We use this macro when the HTML5 spec says "reconsume the current input
-// character in the <mumble> state."
-#define RECONSUME_IN(prefix, stateName)                                    \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        goto stateName;                                                    \
+// We use this macro when the HTML spec says "reconsume the current input character in the <mumble> state."
+#define RECONSUME_IN(newState)                                  \
+    do {                                                        \
+        goto newState;                                          \
     } while (false)
 
-// We use this macro when the HTML5 spec says "consume the next input
-// character ... and switch to the <mumble> state."
-#define ADVANCE_TO(prefix, stateName)                                      \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        if (!m_inputStreamPreprocessor.advance(source))                    \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
+// We use this macro when the HTML spec says "consume the next input character ... and switch to the <mumble> state."
+#define ADVANCE_TO(newState)                                    \
+    do {                                                        \
+        if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \
+            m_state = newState;                                 \
+            return haveBufferedCharacterToken();                \
+        }                                                       \
+        character = m_preprocessor.nextInputCharacter();        \
+        goto newState;                                          \
     } while (false)
 
-// Sometimes there's more complicated logic in the spec that separates when
-// we consume the next input character and when we switch to a particular
-// state. We handle those cases by advancing the source directly and using
-// this macro to switch to the indicated state.
-#define SWITCH_TO(prefix, stateName)                                       \
-    do {                                                                   \
-        m_state = prefix::stateName;                                       \
-        if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))   \
-            return haveBufferedCharacterToken();                           \
-        cc = m_inputStreamPreprocessor.nextInputCharacter();               \
-        goto stateName;                                                    \
+// For more complex cases, caller consumes the characters first and then uses this macro.
+#define SWITCH_TO(newState)                                     \
+    do {                                                        \
+        if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
+            m_state = newState;                                 \
+            return haveBufferedCharacterToken();                \
+        }                                                       \
+        character = m_preprocessor.nextInputCharacter();        \
+        goto newState;                                          \
     } while (false)
 
 }