Implement URLParser::syntaxViolation
author achristensen@apple.com <achristensen@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 27 Sep 2016 20:07:15 +0000 (20:07 +0000)
committer achristensen@apple.com <achristensen@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Tue, 27 Sep 2016 20:07:15 +0000 (20:07 +0000)
https://bugs.webkit.org/show_bug.cgi?id=162593

Reviewed by Geoffrey Garen.

Source/WebCore:

Most of the time when parsing URLs, we just look at the URL, find offsets of the host, path, query, etc.,
and the String can be used untouched.  When this happens, we do not want to allocate and copy the String.
We want to just add a reference to an existing String.

Sometimes we need to canonicalize the String because there has been a syntaxViolation,
defined as anything that makes the input String differ from its canonicalized URL String.  In such cases we
need to allocate a new String and fill it with the canonicalized URL String.  When a syntaxViolation happens for
the first time, we know that everything in the input String up to that point is equal to what it would have been
if we had canonicalized the beginning of the URL, so we copy it into a buffer and continue parsing in a mode
where, instead of just looking at the input URL String, we canonicalize each code point into the buffer.
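
The copy-on-first-violation idea described above can be illustrated with a small standalone sketch. This is not the WebKit implementation: the class name LazyCanonicalizer, its members, and the toy canonicalization rule (drop tabs and newlines, lowercase ASCII letters) are assumptions made for brevity, and std::string_view stands in for WTF's String and StringView.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy parser: it only allocates and copies once the input
// stops matching its canonical form. The rule used here (drop tab/LF/CR,
// lowercase ASCII letters) is an assumption, not the URL Standard's rules.
class LazyCanonicalizer {
public:
    explicit LazyCanonicalizer(std::string_view input)
        : m_input(input)
    {
        for (std::size_t i = 0; i < m_input.size(); ++i) {
            char c = m_input[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                syntaxViolation(i);
                continue; // Dropped from the canonical form.
            }
            if (c >= 'A' && c <= 'Z') {
                syntaxViolation(i);
                c = static_cast<char>(c - 'A' + 'a');
            }
            append(c);
        }
    }

    bool didSeeSyntaxViolation() const { return m_didSeeSyntaxViolation; }

    // Returns a view of the caller's input when nothing had to change,
    // otherwise a view of the canonicalized buffer.
    std::string_view result() const
    {
        return m_didSeeSyntaxViolation ? std::string_view(m_buffer) : m_input;
    }

private:
    // On the first violation, copy the prefix that is already canonical.
    void syntaxViolation(std::size_t position)
    {
        if (m_didSeeSyntaxViolation)
            return;
        m_didSeeSyntaxViolation = true;
        assert(m_buffer.empty());
        m_buffer.reserve(m_input.size());
        m_buffer.append(m_input.data(), position);
    }

    // Appending is a no-op until a violation forces buffered mode.
    void append(char c)
    {
        if (m_didSeeSyntaxViolation)
            m_buffer.push_back(c);
    }

    std::string_view m_input;
    std::string m_buffer;
    bool m_didSeeSyntaxViolation { false };
};
```

With this sketch, a clean input such as "http://example.com/" yields a result that still points at the caller's characters, while "HTTP://exa\tmple.com/" switches to buffered mode at the first code unit and yields a separately allocated "http://example.com/".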

Changes to behavior involve additional spec compliance with tabs and newlines in different places in URLs,
as well as additional spec compliance when parsing empty and null URLs relative to other URLs.
Both are covered by new API tests. Existing behavior is covered by existing API tests.
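
As a rough illustration of the tab and newline handling mentioned above, the following sketch advances an index past any tabs or newlines and reports whether it skipped anything, which is the point at which a parser would record a syntax violation. The function name advanceSkippingTabsAndNewlines and its index-based signature are hypothetical; the patch itself works on CodePointIterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Tab, line feed, and carriage return are the characters being skipped.
inline bool isTabOrNewline(char c)
{
    return c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical helper: moves index to the next code unit that is not a tab
// or newline. Returns true if anything was skipped, so that the caller can
// record a syntax violation for the skipped characters.
inline bool advanceSkippingTabsAndNewlines(std::string_view input, std::size_t& index)
{
    assert(index < input.size());
    bool skipped = false;
    ++index;
    while (index < input.size() && isTabOrNewline(input[index])) {
        skipped = true;
        ++index;
    }
    return skipped;
}
```

Returning a flag instead of calling into the parser keeps the sketch free of parser state; in the patch, the equivalent step calls syntaxViolation directly.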

This is about a 15% speed improvement on my URL parsing benchmark.

* platform/URL.cpp:
(WebCore::assertProtocolIsGood):
(WebCore::URL::protocolIs):
(WebCore::protocolIs):
* platform/URL.h:
* platform/URLParser.cpp:
(WebCore::isTabOrNewline):
(WebCore::URLParser::incrementIteratorSkippingTabsAndNewlines):
(WebCore::URLParser::isWindowsDriveLetter):
(WebCore::URLParser::appendToASCIIBuffer):
(WebCore::URLParser::checkWindowsDriveLetter):
(WebCore::URLParser::shouldCopyFileURL):
(WebCore::URLParser::utf8PercentEncode):
(WebCore::URLParser::utf8QueryEncode):
(WebCore::URLParser::copyURLPartsUntil):
(WebCore::URLParser::syntaxViolation):
(WebCore::URLParser::fragmentSyntaxViolation):
(WebCore::URLParser::parsedDataView):
(WebCore::URLParser::currentPosition):
(WebCore::URLParser::URLParser):
(WebCore::URLParser::parse):
(WebCore::URLParser::parseAuthority):
(WebCore::URLParser::parseIPv4Number):
(WebCore::URLParser::parseIPv4Host):
(WebCore::URLParser::parseIPv6Host):
(WebCore::URLParser::parsePort):
(WebCore::URLParser::parseHostAndPort):
(WebCore::serializeURLEncodedForm):
(WebCore::URLParser::allValuesEqual):
(WebCore::URLParser::internalValuesConsistent):
(WebCore::URLParser::incrementIteratorSkippingTabAndNewLine): Deleted.
(WebCore::URLParser::syntaxError): Deleted.
(WebCore::parseIPv4Number): Deleted.
* platform/URLParser.h:
(WebCore::URLParser::incrementIteratorSkippingTabsAndNewlines):

Tools:

* TestWebKitAPI/Tests/WebCore/URLParser.cpp:
(TestWebKitAPI::TEST_F):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@206457 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Source/WebCore/ChangeLog
Source/WebCore/platform/URL.cpp
Source/WebCore/platform/URL.h
Source/WebCore/platform/URLParser.cpp
Source/WebCore/platform/URLParser.h
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp

diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index 867a9cc..acdbd97 100644
@@ -1,3 +1,63 @@
+2016-09-27  Alex Christensen  <achristensen@webkit.org>
+
+        Implement URLParser::syntaxViolation
+        https://bugs.webkit.org/show_bug.cgi?id=162593
+
+        Reviewed by Geoffrey Garen.
+
+        Most of the time when parsing URLs, we just look at the URL, find offsets of the host, path, query, etc., 
+        and the String can be used untouched.  When this happens, we do not want to allocate and copy the String.
+        We want to just add a reference to an existing String.
+
+        Sometimes we need to canonicalize the String because there has been a syntaxViolation,
+        defined as anything that makes the input String differ from its canonicalized URL String.  In such cases we
+        need to allocate a new String and fill it with the canonicalized URL String.  When a syntaxViolation happens for
+        the first time, we know that everything in the input String up to that point is equal to what it would have been
+        if we had canonicalized the beginning of the URL, so we copy it into a buffer and continue parsing in a mode
+        where, instead of just looking at the input URL String, we canonicalize each code point into the buffer.
+
+        Changes to behavior involve additional spec compliance with tabs and newlines in different places in URLs,
+        as well as additional spec compliance when parsing empty and null URLs relative to other URLs.
+        Both are covered by new API tests. Existing behavior is covered by existing API tests.
+
+        This is about a 15% speed improvement on my URL parsing benchmark.
+
+        * platform/URL.cpp:
+        (WebCore::assertProtocolIsGood):
+        (WebCore::URL::protocolIs):
+        (WebCore::protocolIs):
+        * platform/URL.h:
+        * platform/URLParser.cpp:
+        (WebCore::isTabOrNewline):
+        (WebCore::URLParser::incrementIteratorSkippingTabsAndNewlines):
+        (WebCore::URLParser::isWindowsDriveLetter):
+        (WebCore::URLParser::appendToASCIIBuffer):
+        (WebCore::URLParser::checkWindowsDriveLetter):
+        (WebCore::URLParser::shouldCopyFileURL):
+        (WebCore::URLParser::utf8PercentEncode):
+        (WebCore::URLParser::utf8QueryEncode):
+        (WebCore::URLParser::copyURLPartsUntil):
+        (WebCore::URLParser::syntaxViolation):
+        (WebCore::URLParser::fragmentSyntaxViolation):
+        (WebCore::URLParser::parsedDataView):
+        (WebCore::URLParser::currentPosition):
+        (WebCore::URLParser::URLParser):
+        (WebCore::URLParser::parse):
+        (WebCore::URLParser::parseAuthority):
+        (WebCore::URLParser::parseIPv4Number):
+        (WebCore::URLParser::parseIPv4Host):
+        (WebCore::URLParser::parseIPv6Host):
+        (WebCore::URLParser::parsePort):
+        (WebCore::URLParser::parseHostAndPort):
+        (WebCore::serializeURLEncodedForm):
+        (WebCore::URLParser::allValuesEqual):
+        (WebCore::URLParser::internalValuesConsistent):
+        (WebCore::URLParser::incrementIteratorSkippingTabAndNewLine): Deleted.
+        (WebCore::URLParser::syntaxError): Deleted.
+        (WebCore::parseIPv4Number): Deleted.
+        * platform/URLParser.h:
+        (WebCore::URLParser::incrementIteratorSkippingTabsAndNewlines):
+
 2016-09-27  Wenson Hsieh  <wenson_hsieh@apple.com>
 
         Related videos on YouTube (and YouTube playlists) cause media controls to disappear
diff --git a/Source/WebCore/platform/URL.cpp b/Source/WebCore/platform/URL.cpp
index a2385ba..2dc839d 100644
@@ -788,15 +788,15 @@ String URL::fileSystemPath() const
 
 #ifdef NDEBUG
 
-static inline void assertProtocolIsGood(const char*, size_t)
+static inline void assertProtocolIsGood(StringView)
 {
 }
 
 #else
 
-static void assertProtocolIsGood(const char* protocol, size_t length)
+static void assertProtocolIsGood(StringView protocol)
 {
-    for (size_t i = 0; i < length; ++i) {
+    for (size_t i = 0; i < protocol.length(); ++i) {
         const char c = protocol[i];
         ASSERT(c > ' ' && c < 0x7F && !(c >= 'A' && c <= 'Z'));
     }
@@ -806,7 +806,7 @@ static void assertProtocolIsGood(const char* protocol, size_t length)
 
 bool URL::protocolIs(const char* protocol) const
 {
-    assertProtocolIsGood(protocol, strlen(protocol));
+    assertProtocolIsGood(StringView(reinterpret_cast<const LChar*>(protocol), strlen(protocol)));
 
     // JavaScript URLs are "valid" and should be executed even if URL decides they are invalid.
     // The free function protocolIsJavaScript() should be used instead. 
@@ -823,14 +823,14 @@ bool URL::protocolIs(const char* protocol) const
     return !protocol[m_schemeEnd]; // We should have consumed all characters in the argument.
 }
 
-bool URL::protocolIs(const LChar* protocol, size_t length) const
+bool URL::protocolIs(StringView protocol) const
 {
-    assertProtocolIsGood(reinterpret_cast<const char*>(protocol), length);
+    assertProtocolIsGood(protocol);
 
     if (!m_isValid)
         return false;
     
-    if (m_schemeEnd != length)
+    if (m_schemeEnd != protocol.length())
         return false;
 
     // Do the comparison without making a new string object.
@@ -1914,7 +1914,7 @@ String encodeWithURLEscapeSequences(const String& notEncodedString)
 
 static bool protocolIs(StringView stringURL, const char* protocol)
 {
-    assertProtocolIsGood(protocol, strlen(protocol));
+    assertProtocolIsGood(StringView(reinterpret_cast<const LChar*>(protocol), strlen(protocol)));
     unsigned length = stringURL.length();
     for (unsigned i = 0; i < length; ++i) {
         if (!protocol[i])
@@ -2141,7 +2141,7 @@ void URL::copyToBuffer(Vector<char, 512>& buffer) const
 bool protocolIs(const String& url, const char* protocol)
 {
     // Do the comparison without making a new string object.
-    assertProtocolIsGood(protocol, strlen(protocol));
+    assertProtocolIsGood(StringView(reinterpret_cast<const LChar*>(protocol), strlen(protocol)));
     bool isLeading = true;
     for (unsigned i = 0, j = 0; url[i]; ++i) {
         // skip leading whitespace and control characters.
diff --git a/Source/WebCore/platform/URL.h b/Source/WebCore/platform/URL.h
index 576703d..cd6f9ba 100644
@@ -129,7 +129,7 @@ public:
     // Returns true if the current URL's protocol is the same as the null-
     // terminated ASCII argument. The argument must be lower-case.
     WEBCORE_EXPORT bool protocolIs(const char*) const;
-    bool protocolIs(const LChar*, size_t) const;
+    bool protocolIs(StringView) const;
     bool protocolIsBlob() const { return protocolIs("blob"); }
     bool protocolIsData() const { return protocolIs("data"); }
     bool protocolIsInHTTPFamily() const;
diff --git a/Source/WebCore/platform/URLParser.cpp b/Source/WebCore/platform/URLParser.cpp
index 6db37f0..f5d4131 100644
@@ -410,11 +410,11 @@ template<typename CharacterType> inline static bool isValidSchemeCharacter(Chara
 static bool shouldPercentEncodeQueryByte(uint8_t byte) { return characterClassTable[byte] & QueryPercent; }
 
 template<typename CharacterType>
-void URLParser::incrementIteratorSkippingTabAndNewLine(CodePointIterator<CharacterType>& iterator)
+void URLParser::advance(CodePointIterator<CharacterType>& iterator, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition)
 {
     ++iterator;
-    while (!iterator.atEnd() && isTabOrNewline(*iterator)) {
-        syntaxError(iterator);
+    while (UNLIKELY(!iterator.atEnd() && isTabOrNewline(*iterator))) {
+        syntaxViolation(iteratorForSyntaxViolationPosition);
         ++iterator;
     }
 }
@@ -424,13 +424,13 @@ bool URLParser::isWindowsDriveLetter(CodePointIterator<CharacterType> iterator)
 {
     if (iterator.atEnd() || !isASCIIAlpha(*iterator))
         return false;
-    incrementIteratorSkippingTabAndNewLine(iterator);
+    advance(iterator);
     if (iterator.atEnd())
         return false;
     if (*iterator == ':')
         return true;
-    if (*iterator == '|') {
-        syntaxError(iterator);
+    if (UNLIKELY(*iterator == '|')) {
+        syntaxViolation(iterator);
         return true;
     }
     return false;
@@ -447,14 +447,14 @@ void URLParser::appendToASCIIBuffer(UChar32 codePoint)
 {
     ASSERT(m_unicodeFragmentBuffer.isEmpty());
     ASSERT(isASCII(codePoint));
-    if (m_seenSyntaxError)
+    if (UNLIKELY(m_didSeeSyntaxViolation))
         m_asciiBuffer.append(codePoint);
 }
 
 void URLParser::appendToASCIIBuffer(const char* characters, size_t length)
 {
     ASSERT(m_unicodeFragmentBuffer.isEmpty());
-    if (m_seenSyntaxError)
+    if (UNLIKELY(m_didSeeSyntaxViolation))
         m_asciiBuffer.append(characters, length);
 }
 
@@ -463,11 +463,11 @@ void URLParser::checkWindowsDriveLetter(CodePointIterator<CharacterType>& iterat
 {
     if (isWindowsDriveLetter(iterator)) {
         appendToASCIIBuffer(*iterator);
-        incrementIteratorSkippingTabAndNewLine(iterator);
+        advance(iterator);
         ASSERT(!iterator.atEnd());
         ASSERT(*iterator == ':' || *iterator == '|');
         appendToASCIIBuffer(':');
-        incrementIteratorSkippingTabAndNewLine(iterator);
+        advance(iterator);
     }
 }
 
@@ -478,10 +478,10 @@ bool URLParser::shouldCopyFileURL(CodePointIterator<CharacterType> iterator)
         return true;
     if (iterator.atEnd())
         return false;
-    incrementIteratorSkippingTabAndNewLine(iterator);
+    advance(iterator);
     if (iterator.atEnd())
         return true;
-    incrementIteratorSkippingTabAndNewLine(iterator);
+    advance(iterator);
     if (iterator.atEnd())
         return true;
     return !isSlashQuestionOrHash(*iterator);
@@ -504,17 +504,21 @@ void URLParser::percentEncodeByte(uint8_t byte)
 const char replacementCharacterUTF8PercentEncoded[10] = "%EF%BF%BD";
 const size_t replacementCharacterUTF8PercentEncodedLength = sizeof(replacementCharacterUTF8PercentEncoded) - 1;
 
-template<bool(*isInCodeSet)(UChar32)>
-void URLParser::utf8PercentEncode(UChar32 codePoint)
+template<bool(*isInCodeSet)(UChar32), typename CharacterType>
+void URLParser::utf8PercentEncode(const CodePointIterator<CharacterType>& iterator)
 {
-    if (isASCII(codePoint)) {
-        if (isInCodeSet(codePoint))
+    ASSERT(!iterator.atEnd());
+    UChar32 codePoint = *iterator;
+    if (LIKELY(isASCII(codePoint))) {
+        if (UNLIKELY(isInCodeSet(codePoint))) {
+            syntaxViolation(iterator);
             percentEncodeByte(codePoint);
-        else
+        } else
             appendToASCIIBuffer(codePoint);
         return;
     }
     ASSERT_WITH_MESSAGE(isInCodeSet(codePoint), "isInCodeSet should always return true for non-ASCII characters");
+    syntaxViolation(iterator);
     
     if (!U_IS_UNICODE_CHAR(codePoint)) {
         appendToASCIIBuffer(replacementCharacterUTF8PercentEncoded, replacementCharacterUTF8PercentEncodedLength);
@@ -528,17 +532,22 @@ void URLParser::utf8PercentEncode(UChar32 codePoint)
         percentEncodeByte(buffer[i]);
 }
 
-
-void URLParser::utf8QueryEncode(UChar32 codePoint)
+template<typename CharacterType>
+void URLParser::utf8QueryEncode(const CodePointIterator<CharacterType>& iterator)
 {
-    if (isASCII(codePoint)) {
-        if (shouldPercentEncodeQueryByte(codePoint))
+    ASSERT(!iterator.atEnd());
+    UChar32 codePoint = *iterator;
+    if (LIKELY(isASCII(codePoint))) {
+        if (UNLIKELY(shouldPercentEncodeQueryByte(codePoint))) {
+            syntaxViolation(iterator);
             percentEncodeByte(codePoint);
-        else
+        } else
             appendToASCIIBuffer(codePoint);
         return;
     }
     
+    syntaxViolation(iterator);
+    
     if (!U_IS_UNICODE_CHAR(codePoint)) {
         appendToASCIIBuffer(replacementCharacterUTF8PercentEncoded, replacementCharacterUTF8PercentEncodedLength);
         return;
@@ -748,8 +757,11 @@ void URLParser::copyASCIIStringUntil(const String& string, size_t lengthIf8Bit,
     }
 }
 
-void URLParser::copyURLPartsUntil(const URL& base, URLPart part)
+template<typename CharacterType>
+void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePointIterator<CharacterType>& iterator)
 {
+    syntaxViolation(iterator);
+
     m_asciiBuffer.clear();
     m_unicodeFragmentBuffer.clear();
     if (part == URLPart::FragmentEnd) {
@@ -933,9 +945,45 @@ void URLParser::popPath()
 }
 
 template<typename CharacterType>
-void URLParser::syntaxError(const CodePointIterator<CharacterType>&)
+void URLParser::syntaxViolation(const CodePointIterator<CharacterType>& iterator)
 {
-    // FIXME: Implement.
+    if (m_didSeeSyntaxViolation)
+        return;
+    m_didSeeSyntaxViolation = true;
+    
+    ASSERT(m_asciiBuffer.isEmpty());
+    ASSERT(m_unicodeFragmentBuffer.isEmpty());
+    ASSERT_WITH_MESSAGE(!m_url.m_queryEnd, "syntaxViolation should not be used in the fragment, which might contain non-ASCII code points when serialized");
+    size_t codeUnitsToCopy = iterator.codeUnitsSince(reinterpret_cast<const CharacterType*>(m_inputBegin));
+    RELEASE_ASSERT(codeUnitsToCopy <= m_inputString.length());
+    m_asciiBuffer.reserveCapacity(m_inputString.length());
+    for (size_t i = 0; i < codeUnitsToCopy; ++i) {
+        ASSERT(isASCII(m_inputString[i]));
+        m_asciiBuffer.uncheckedAppend(m_inputString[i]);
+    }
+}
+
+template<typename CharacterType>
+void URLParser::fragmentSyntaxViolation(const CodePointIterator<CharacterType>& iterator)
+{
+    if (m_didSeeSyntaxViolation)
+        return;
+    m_didSeeSyntaxViolation = true;
+
+    ASSERT(m_asciiBuffer.isEmpty());
+    ASSERT(m_unicodeFragmentBuffer.isEmpty());
+    size_t codeUnitsToCopy = iterator.codeUnitsSince(reinterpret_cast<const CharacterType*>(m_inputBegin));
+    size_t asciiCodeUnitsToCopy = m_url.m_queryEnd;
+    size_t unicodeCodeUnitsToCopy = codeUnitsToCopy - asciiCodeUnitsToCopy;
+    RELEASE_ASSERT(codeUnitsToCopy <= m_inputString.length());
+    m_asciiBuffer.reserveCapacity(asciiCodeUnitsToCopy);
+    for (size_t i = 0; i < asciiCodeUnitsToCopy; ++i) {
+        ASSERT(isASCII(m_inputString[i]));
+        m_asciiBuffer.uncheckedAppend(m_inputString[i]);
+    }
+    m_unicodeFragmentBuffer.reserveCapacity(m_inputString.length() - asciiCodeUnitsToCopy);
+    for (size_t i = asciiCodeUnitsToCopy; i < asciiCodeUnitsToCopy + unicodeCodeUnitsToCopy; ++i)
+        m_unicodeFragmentBuffer.uncheckedAppend(m_inputString[i]);
 }
 
 void URLParser::failure()
@@ -944,11 +992,23 @@ void URLParser::failure()
     m_url.m_string = m_inputString;
 }
 
+StringView URLParser::parsedDataView(size_t start, size_t length)
+{
+    if (UNLIKELY(m_didSeeSyntaxViolation)) {
+        ASSERT(start + length <= m_asciiBuffer.size());
+        return StringView(m_asciiBuffer.data() + start, length);
+    }
+    ASSERT(start + length <= m_inputString.length());
+    return StringView(m_inputString).substring(start, length);
+}
+
 template<typename CharacterType>
 size_t URLParser::currentPosition(const CodePointIterator<CharacterType>& iterator)
 {
-    if (m_seenSyntaxError)
+    if (UNLIKELY(m_didSeeSyntaxViolation)) {
+        ASSERT(m_unicodeFragmentBuffer.isEmpty());
         return m_asciiBuffer.size();
+    }
     
     return iterator.codeUnitsSince(reinterpret_cast<const CharacterType*>(m_inputBegin));
 }
@@ -956,8 +1016,11 @@ size_t URLParser::currentPosition(const CodePointIterator<CharacterType>& iterat
 URLParser::URLParser(const String& input, const URL& base, const TextEncoding& encoding)
     : m_inputString(input)
 {
-    if (input.isNull())
+    if (input.isNull()) {
+        if (base.isValid() && !base.m_cannotBeABaseURL)
+            m_url = base;
         return;
+    }
 
     if (input.is8Bit()) {
         m_inputBegin = input.characters8();
@@ -966,6 +1029,9 @@ URLParser::URLParser(const String& input, const URL& base, const TextEncoding& e
         m_inputBegin = input.characters16();
         parse(input.characters16(), input.length(), base, encoding);
     }
+    ASSERT(!m_url.m_isValid
+        || m_didSeeSyntaxViolation == (m_url.string() != input)
+        || (input.isEmpty() && m_url.m_string == base.m_string));
 }
 
 template<typename CharacterType>
@@ -975,18 +1041,21 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     m_url = { };
     ASSERT(m_asciiBuffer.isEmpty());
     ASSERT(m_unicodeFragmentBuffer.isEmpty());
-    m_asciiBuffer.reserveInitialCapacity(length);
     
     bool isUTF8Encoding = encoding == UTF8Encoding();
     Vector<UChar> queryBuffer;
 
     unsigned endIndex = length;
-    while (endIndex && isC0ControlOrSpace(input[endIndex - 1]))
+    while (UNLIKELY(endIndex && isC0ControlOrSpace(input[endIndex - 1]))) {
+        syntaxViolation(CodePointIterator<CharacterType>(input, input));
         endIndex--;
+    }
     CodePointIterator<CharacterType> c(input, input + endIndex);
     CodePointIterator<CharacterType> authorityOrHostBegin;
-    while (!c.atEnd() && isC0ControlOrSpace(*c))
+    while (UNLIKELY(!c.atEnd() && isC0ControlOrSpace(*c))) {
+        syntaxViolation(c);
         ++c;
+    }
     auto beginAfterControlAndSpace = c;
 
     enum class State : uint8_t {
@@ -1011,13 +1080,13 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         Fragment,
     };
 
-#define LOG_STATE(x) LOG(URLParser, "State %s, code point %c, asciiBuffer size %zu", x, *c, currentPosition(c))
+#define LOG_STATE(x) LOG(URLParser, "State %s, code point %c, parsed data <%s> size %zu", x, *c, parsedDataView(0, currentPosition(c)).utf8().data(), currentPosition(c))
 #define LOG_FINAL_STATE(x) LOG(URLParser, "Final State: %s", x)
 
     State state = State::SchemeStart;
     while (!c.atEnd()) {
-        if (isTabOrNewline(*c)) {
-            syntaxError(c);
+        if (UNLIKELY(isTabOrNewline(*c))) {
+            syntaxViolation(c);
             ++c;
             continue;
         }
@@ -1026,8 +1095,10 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         case State::SchemeStart:
             LOG_STATE("SchemeStart");
             if (isASCIIAlpha(*c)) {
+                if (UNLIKELY(isASCIIUpper(*c)))
+                    syntaxViolation(c);
                 appendToASCIIBuffer(toASCIILower(*c));
-                incrementIteratorSkippingTabAndNewLine(c);
+                advance(c);
                 if (c.atEnd()) {
                     m_asciiBuffer.clear();
                     state = State::NoScheme;
@@ -1039,36 +1110,40 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             break;
         case State::Scheme:
             LOG_STATE("Scheme");
-            if (isValidSchemeCharacter(*c))
+            if (isValidSchemeCharacter(*c)) {
+                if (UNLIKELY(isASCIIUpper(*c)))
+                    syntaxViolation(c);
                 appendToASCIIBuffer(toASCIILower(*c));
-            else if (*c == ':') {
+            } else if (*c == ':') {
                 m_url.m_schemeEnd = currentPosition(c);
-                StringView urlScheme = StringView(m_asciiBuffer.data(), m_url.m_schemeEnd);
+                StringView urlScheme = parsedDataView(0, m_url.m_schemeEnd);
                 m_url.m_protocolIsInHTTPFamily = urlScheme == "http" || urlScheme == "https";
+                appendToASCIIBuffer(':');
                 if (urlScheme == "file") {
                     m_urlIsSpecial = true;
                     state = State::File;
-                    appendToASCIIBuffer(':');
                     ++c;
                     break;
                 }
-                appendToASCIIBuffer(':');
                 if (isSpecialScheme(urlScheme)) {
                     m_urlIsSpecial = true;
-                    if (base.protocolIs(m_asciiBuffer.data(), currentPosition(c) - 1))
+                    if (base.protocolIs(urlScheme))
                         state = State::SpecialRelativeOrAuthority;
                     else
                         state = State::SpecialAuthoritySlashes;
+                    ++c;
                 } else {
                     auto maybeSlash = c;
-                    incrementIteratorSkippingTabAndNewLine(maybeSlash);
+                    advance(maybeSlash);
                     if (!maybeSlash.atEnd() && *maybeSlash == '/') {
                         appendToASCIIBuffer('/');
-                        m_url.m_userStart = currentPosition(c);
-                        state = State::PathOrAuthority;
                         c = maybeSlash;
+                        state = State::PathOrAuthority;
                         ASSERT(*c == '/');
+                        ++c;
+                        m_url.m_userStart = currentPosition(c);
                     } else {
+                        ++c;
                         m_url.m_userStart = currentPosition(c);
                         m_url.m_userEnd = m_url.m_userStart;
                         m_url.m_passwordEnd = m_url.m_userStart;
@@ -1079,7 +1154,6 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                         state = State::CannotBeABaseURLPath;
                     }
                 }
-                ++c;
                 break;
             } else {
                 m_asciiBuffer.clear();
@@ -1087,7 +1161,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 c = beginAfterControlAndSpace;
                 break;
             }
-            incrementIteratorSkippingTabAndNewLine(c);
+            advance(c);
             if (c.atEnd()) {
                 m_asciiBuffer.clear();
                 state = State::NoScheme;
@@ -1101,7 +1175,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 return;
             }
             if (base.m_cannotBeABaseURL && *c == '#') {
-                copyURLPartsUntil(base, URLPart::QueryEnd);
+                copyURLPartsUntil(base, URLPart::QueryEnd, c);
                 state = State::Fragment;
                 appendToASCIIBuffer('#');
                 ++c;
@@ -1111,7 +1185,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 state = State::Relative;
                 break;
             }
-            copyURLPartsUntil(base, URLPart::SchemeEnd);
+            copyURLPartsUntil(base, URLPart::SchemeEnd, c);
             appendToASCIIBuffer(':');
             state = State::File;
             break;
@@ -1119,7 +1193,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             LOG_STATE("SpecialRelativeOrAuthority");
             if (*c == '/') {
                 appendToASCIIBuffer('/');
-                incrementIteratorSkippingTabAndNewLine(c);
+                advance(c);
                 if (c.atEnd()) {
                     failure();
                     return;
@@ -1137,12 +1211,12 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             LOG_STATE("PathOrAuthority");
             if (*c == '/') {
                 appendToASCIIBuffer('/');
-                m_url.m_userStart = currentPosition(c);
                 state = State::AuthorityOrHost;
                 ++c;
+                m_url.m_userStart = currentPosition(c);
                 authorityOrHostBegin = c;
             } else {
-                ASSERT(m_asciiBuffer.last() == '/');
+                ASSERT(parsedDataView(currentPosition(c) - 1, 1) == "/");
                 m_url.m_userStart = currentPosition(c) - 1;
                 m_url.m_userEnd = m_url.m_userStart;
                 m_url.m_passwordEnd = m_url.m_userStart;
@@ -1161,19 +1235,19 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 ++c;
                 break;
             case '?':
-                copyURLPartsUntil(base, URLPart::PathEnd);
+                copyURLPartsUntil(base, URLPart::PathEnd, c);
                 appendToASCIIBuffer('?');
                 state = State::Query;
                 ++c;
                 break;
             case '#':
-                copyURLPartsUntil(base, URLPart::QueryEnd);
+                copyURLPartsUntil(base, URLPart::QueryEnd, c);
                 appendToASCIIBuffer('#');
                 state = State::Fragment;
                 ++c;
                 break;
             default:
-                copyURLPartsUntil(base, URLPart::PathAfterLastSlash);
+                copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c);
                 state = State::Path;
                 break;
             }
@@ -1182,11 +1256,11 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             LOG_STATE("RelativeSlash");
             if (*c == '/' || *c == '\\') {
                 ++c;
-                copyURLPartsUntil(base, URLPart::SchemeEnd);
+                copyURLPartsUntil(base, URLPart::SchemeEnd, c);
                 appendToASCIIBuffer("://", 3);
                 state = State::SpecialAuthorityIgnoreSlashes;
             } else {
-                copyURLPartsUntil(base, URLPart::PortEnd);
+                copyURLPartsUntil(base, URLPart::PortEnd, c);
                 appendToASCIIBuffer('/');
                 m_url.m_pathAfterLastSlash = base.m_portEnd + 1;
                 state = State::Path;
@@ -1194,11 +1268,23 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             break;
         case State::SpecialAuthoritySlashes:
             LOG_STATE("SpecialAuthoritySlashes");
-            appendToASCIIBuffer("//", 2);
-            if (*c == '/' || *c == '\\') {
-                incrementIteratorSkippingTabAndNewLine(c);
-                if (!c.atEnd() && (*c == '/' || *c == '\\'))
+            if (LIKELY(*c == '/' || *c == '\\')) {
+                if (UNLIKELY(*c == '\\'))
+                    syntaxViolation(c);
+                appendToASCIIBuffer('/');
+                advance(c);
+                if (LIKELY(!c.atEnd() && (*c == '/' || *c == '\\'))) {
+                    if (UNLIKELY(*c == '\\'))
+                        syntaxViolation(c);
                     ++c;
+                    appendToASCIIBuffer('/');
+                } else {
+                    syntaxViolation(c);
+                    appendToASCIIBuffer('/');
+                }
+            } else {
+                syntaxViolation(c);
+                appendToASCIIBuffer("//", 2);
             }
             state = State::SpecialAuthorityIgnoreSlashes;
             break;
@@ -1213,8 +1299,8 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             authorityOrHostBegin = c;
             break;
         case State::AuthorityOrHost:
-            LOG_STATE("AuthorityOrHost");
-            {
+            do {
+                LOG_STATE("AuthorityOrHost");
                 if (*c == '@') {
                     auto lastAt = c;
                     auto findLastAt = c;
@@ -1225,7 +1311,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     }
                     parseAuthority(CodePointIterator<CharacterType>(authorityOrHostBegin, lastAt));
                     c = lastAt;
-                    incrementIteratorSkippingTabAndNewLine(c);
+                    advance(c);
                     authorityOrHostBegin = c;
                     state = State::Host;
                     m_hostHasPercentOrNonASCII = false;
@@ -1233,13 +1319,14 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 }
                 bool isSlash = *c == '/' || (m_urlIsSpecial && *c == '\\');
                 if (isSlash || *c == '?' || *c == '#') {
-                    m_url.m_userEnd = currentPosition(c);
+                    m_url.m_userEnd = currentPosition(authorityOrHostBegin);
                     m_url.m_passwordEnd = m_url.m_userEnd;
                     if (!parseHostAndPort(CodePointIterator<CharacterType>(authorityOrHostBegin, c))) {
                         failure();
                         return;
                     }
-                    if (!isSlash) {
+                    if (UNLIKELY(!isSlash)) {
+                        syntaxViolation(c);
                         appendToASCIIBuffer('/');
                         m_url.m_pathAfterLastSlash = currentPosition(c);
                     }
@@ -1249,7 +1336,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 if (isPercentOrNonASCII(*c))
                     m_hostHasPercentOrNonASCII = true;
                 ++c;
-            }
+            } while (!c.atEnd());
             break;
         case State::Host:
             LOG_STATE("Host");
@@ -1268,15 +1355,18 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         case State::File:
             LOG_STATE("File");
             switch (*c) {
-            case '/':
             case '\\':
+                syntaxViolation(c);
+                FALLTHROUGH;
+            case '/':
                 appendToASCIIBuffer('/');
                 state = State::FileSlash;
                 ++c;
                 break;
             case '?':
+                syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file"))
-                    copyURLPartsUntil(base, URLPart::PathEnd);
+                    copyURLPartsUntil(base, URLPart::PathEnd, c);
                 appendToASCIIBuffer("///?", 4);
                 m_url.m_userStart = currentPosition(c) - 2;
                 m_url.m_userEnd = m_url.m_userStart;
@@ -1289,8 +1379,9 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 ++c;
                 break;
             case '#':
+                syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file"))
-                    copyURLPartsUntil(base, URLPart::QueryEnd);
+                    copyURLPartsUntil(base, URLPart::QueryEnd, c);
                 appendToASCIIBuffer("///#", 4);
                 m_url.m_userStart = currentPosition(c) - 2;
                 m_url.m_userEnd = m_url.m_userStart;
@@ -1304,8 +1395,9 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 ++c;
                 break;
             default:
+                syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file") && shouldCopyFileURL(c))
-                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash);
+                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c);
                 else {
                     appendToASCIIBuffer("///", 3);
                     m_url.m_userStart = currentPosition(c) - 1;
@@ -1322,7 +1414,9 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             break;
         case State::FileSlash:
             LOG_STATE("FileSlash");
-            if (*c == '/' || *c == '\\') {
+            if (LIKELY(*c == '/' || *c == '\\')) {
+                if (UNLIKELY(*c == '\\'))
+                    syntaxViolation(c);
                 ++c;
                 appendToASCIIBuffer('/');
                 m_url.m_userStart = currentPosition(c);
@@ -1347,6 +1441,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     }
                 }
             }
+            syntaxViolation(c);
             appendToASCIIBuffer("//", 2);
             m_url.m_userStart = currentPosition(c) - 1;
             m_url.m_userEnd = m_url.m_userStart;
@@ -1365,22 +1460,24 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     break;
                 }
                 if (authorityOrHostBegin == c) {
-                    ASSERT(m_asciiBuffer[currentPosition(c) - 1] == '/');
-                    if (*c == '?') {
+                    ASSERT(parsedDataView(currentPosition(c) - 1, 1) == "/");
+                    if (UNLIKELY(*c == '?')) {
+                        syntaxViolation(c);
                         appendToASCIIBuffer("/?", 2);
+                        ++c;
                         m_url.m_pathAfterLastSlash = currentPosition(c) - 1;
                         m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
                         state = State::Query;
-                        ++c;
                         break;
                     }
-                    if (*c == '#') {
+                    if (UNLIKELY(*c == '#')) {
+                        syntaxViolation(c);
                         appendToASCIIBuffer("/#", 2);
+                        ++c;
                         m_url.m_pathAfterLastSlash = currentPosition(c) - 1;
                         m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
                         m_url.m_queryEnd = m_url.m_pathAfterLastSlash;
                         state = State::Fragment;
-                        ++c;
                         break;
                     }
                     state = State::Path;
@@ -1390,8 +1487,8 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     failure();
                     return;
                 }
-                
-                if (StringView(m_asciiBuffer.data() + m_url.m_passwordEnd, currentPosition(c) - m_url.m_passwordEnd) == "localhost")  {
+                if (UNLIKELY(equalLettersIgnoringASCIICase(parsedDataView(m_url.m_passwordEnd, currentPosition(c) - m_url.m_passwordEnd), "localhost"))) {
+                    syntaxViolation(c);
                     m_asciiBuffer.shrink(m_url.m_passwordEnd);
                     m_url.m_hostEnd = currentPosition(c);
                     m_url.m_portEnd = m_url.m_hostEnd;
@@ -1413,18 +1510,22 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         case State::Path:
             LOG_STATE("Path");
             if (*c == '/' || (m_urlIsSpecial && *c == '\\')) {
+                if (UNLIKELY(m_urlIsSpecial && *c == '\\'))
+                    syntaxViolation(c);
                 appendToASCIIBuffer('/');
-                m_url.m_pathAfterLastSlash = currentPosition(c);
                 ++c;
+                m_url.m_pathAfterLastSlash = currentPosition(c);
                 break;
             }
-            if (currentPosition(c) && m_asciiBuffer[currentPosition(c) - 1] == '/') {
-                if (isDoubleDotPathSegment(c)) {
+            if (UNLIKELY(currentPosition(c) && parsedDataView(currentPosition(c) - 1, 1) == "/")) {
+                if (UNLIKELY(isDoubleDotPathSegment(c))) {
+                    syntaxViolation(c);
                     consumeDoubleDotPathSegment(c);
                     popPath();
                     break;
                 }
-                if (m_asciiBuffer[currentPosition(c) - 1] == '/' && isSingleDotPathSegment(c)) {
+                if (UNLIKELY(isSingleDotPathSegment(c))) {
+                    syntaxViolation(c);
                     consumeSingleDotPathSegment(c);
                     break;
                 }
@@ -1441,6 +1542,8 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 break;
             }
             if (isPercentEncodedDot(c)) {
+                if (UNLIKELY(*c != '.'))
+                    syntaxViolation(c);
                 appendToASCIIBuffer('.');
                 ASSERT(*c == '%');
                 ++c;
@@ -1450,7 +1553,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 ++c;
                 break;
             }
-            utf8PercentEncode<isInDefaultEncodeSet>(*c);
+            utf8PercentEncode<isInDefaultEncodeSet>(c);
             ++c;
             break;
         case State::CannotBeABaseURLPath:
@@ -1464,10 +1567,10 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 state = State::Fragment;
             } else if (*c == '/') {
                 appendToASCIIBuffer('/');
-                m_url.m_pathAfterLastSlash = currentPosition(c);
                 ++c;
+                m_url.m_pathAfterLastSlash = currentPosition(c);
             } else {
-                utf8PercentEncode<isInSimpleEncodeSet>(*c);
+                utf8PercentEncode<isInSimpleEncodeSet>(c);
                 ++c;
             }
             break;
@@ -1481,18 +1584,31 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 break;
             }
             if (isUTF8Encoding)
-                utf8QueryEncode(*c);
+                utf8QueryEncode(c);
             else
                 appendCodePoint(queryBuffer, *c);
             ++c;
             break;
         case State::Fragment:
-            LOG_STATE("Fragment");
-            if (m_unicodeFragmentBuffer.isEmpty() && isASCII(*c))
-                appendToASCIIBuffer(*c);
-            else
-                appendCodePoint(m_unicodeFragmentBuffer, *c);
-            ++c;
+            do {
+                LOG(URLParser, "State Fragment");
+                if (!m_didSeeUnicodeFragmentCodePoint && isASCII(*c))
+                    appendToASCIIBuffer(*c);
+                else {
+                    m_didSeeUnicodeFragmentCodePoint = true;
+                    if (UNLIKELY(m_didSeeSyntaxViolation))
+                        appendCodePoint(m_unicodeFragmentBuffer, *c);
+                    else {
+                        ASSERT(m_asciiBuffer.isEmpty());
+                        ASSERT(m_unicodeFragmentBuffer.isEmpty());
+                    }
+                }
+                ++c;
+                while (UNLIKELY(!c.atEnd() && isTabOrNewline(*c))) {
+                    fragmentSyntaxViolation(c);
+                    ++c;
+                }
+            } while (!c.atEnd());
             break;
         }
     }
@@ -1500,7 +1616,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     switch (state) {
     case State::SchemeStart:
         LOG_FINAL_STATE("SchemeStart");
-        if (!currentPosition(c) && base.isValid()) {
+        if (!currentPosition(c) && base.isValid() && !base.m_cannotBeABaseURL) {
             m_url = base;
             return;
         }
@@ -1515,14 +1631,14 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         RELEASE_ASSERT_NOT_REACHED();
     case State::SpecialRelativeOrAuthority:
         LOG_FINAL_STATE("SpecialRelativeOrAuthority");
-        copyURLPartsUntil(base, URLPart::QueryEnd);
+        copyURLPartsUntil(base, URLPart::QueryEnd, c);
         m_url.m_fragmentEnd = m_url.m_queryEnd;
         break;
     case State::PathOrAuthority:
         LOG_FINAL_STATE("PathOrAuthority");
         ASSERT(m_url.m_userStart);
         ASSERT(m_url.m_userStart == currentPosition(c));
-        ASSERT(m_asciiBuffer.last() == '/');
+        ASSERT(parsedDataView(currentPosition(c) - 1, 1) == "/");
         m_url.m_userStart--;
         m_url.m_userEnd = m_url.m_userStart;
         m_url.m_passwordEnd = m_url.m_userStart;
@@ -1535,11 +1651,11 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         break;
     case State::Relative:
         LOG_FINAL_STATE("Relative");
-        copyURLPartsUntil(base, URLPart::FragmentEnd);
+        copyURLPartsUntil(base, URLPart::FragmentEnd, c);
         break;
     case State::RelativeSlash:
         LOG_FINAL_STATE("RelativeSlash");
-        copyURLPartsUntil(base, URLPart::PortEnd);
+        copyURLPartsUntil(base, URLPart::PortEnd, c);
         appendToASCIIBuffer('/');
         m_url.m_pathAfterLastSlash = base.m_portEnd + 1;
         m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
@@ -1565,7 +1681,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         break;
     case State::AuthorityOrHost:
         LOG_FINAL_STATE("AuthorityOrHost");
-        m_url.m_userEnd = currentPosition(c);
+        m_url.m_userEnd = currentPosition(authorityOrHostBegin);
         m_url.m_passwordEnd = m_url.m_userEnd;
         if (authorityOrHostBegin.atEnd()) {
             m_url.m_hostEnd = m_url.m_userEnd;
@@ -1574,6 +1690,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             failure();
             return;
         }
+        syntaxViolation(c);
         appendToASCIIBuffer('/');
         m_url.m_pathEnd = m_url.m_portEnd + 1;
         m_url.m_pathAfterLastSlash = m_url.m_pathEnd;
@@ -1586,6 +1703,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             failure();
             return;
         }
+        syntaxViolation(c);
         appendToASCIIBuffer('/');
         m_url.m_pathEnd = m_url.m_portEnd + 1;
         m_url.m_pathAfterLastSlash = m_url.m_pathEnd;
@@ -1595,9 +1713,10 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     case State::File:
         LOG_FINAL_STATE("File");
         if (base.isValid() && base.protocolIs("file")) {
-            copyURLPartsUntil(base, URLPart::QueryEnd);
+            copyURLPartsUntil(base, URLPart::QueryEnd, c);
             appendToASCIIBuffer(':');
         }
+        syntaxViolation(c);
         appendToASCIIBuffer("///", 3);
         m_url.m_userStart = currentPosition(c) - 1;
         m_url.m_userEnd = m_url.m_userStart;
@@ -1611,8 +1730,9 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         break;
     case State::FileSlash:
         LOG_FINAL_STATE("FileSlash");
+        syntaxViolation(c);
+        m_url.m_userStart = currentPosition(c) + 1;
         appendToASCIIBuffer("//", 2);
-        m_url.m_userStart = currentPosition(c) - 1;
         m_url.m_userEnd = m_url.m_userStart;
         m_url.m_passwordEnd = m_url.m_userStart;
         m_url.m_hostEnd = m_url.m_userStart;
@@ -1625,6 +1745,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     case State::FileHost:
         LOG_FINAL_STATE("FileHost");
         if (authorityOrHostBegin == c) {
+            syntaxViolation(c);
             appendToASCIIBuffer('/');
             m_url.m_userStart = currentPosition(c) - 1;
             m_url.m_userEnd = m_url.m_userStart;
@@ -1643,7 +1764,8 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             return;
         }
 
-        if (StringView(m_asciiBuffer.data() + m_url.m_passwordEnd, currentPosition(c) - m_url.m_passwordEnd) == "localhost")  {
+        syntaxViolation(c);
+        if (equalLettersIgnoringASCIICase(parsedDataView(m_url.m_passwordEnd, currentPosition(c) - m_url.m_passwordEnd), "localhost")) {
             m_asciiBuffer.shrink(m_url.m_passwordEnd);
             m_url.m_hostEnd = currentPosition(c);
             m_url.m_portEnd = m_url.m_hostEnd;
@@ -1677,20 +1799,24 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         m_url.m_fragmentEnd = m_url.m_queryEnd;
         break;
     case State::Fragment:
-        LOG_FINAL_STATE("Fragment");
-        m_url.m_fragmentEnd = currentPosition(c) + m_unicodeFragmentBuffer.size();
-        break;
+        {
+            LOG_FINAL_STATE("Fragment");
+            size_t length = m_didSeeSyntaxViolation ? m_asciiBuffer.size() + m_unicodeFragmentBuffer.size() : c.codeUnitsSince(reinterpret_cast<const CharacterType*>(m_inputBegin));
+            m_url.m_fragmentEnd = length;
+            break;
+        }
     }
 
-    if (!m_seenSyntaxError) {
+    if (LIKELY(!m_didSeeSyntaxViolation)) {
         m_url.m_string = m_inputString;
         ASSERT(m_asciiBuffer.isEmpty());
         ASSERT(m_unicodeFragmentBuffer.isEmpty());
-    } else if (m_unicodeFragmentBuffer.isEmpty())
+    } else if (!m_didSeeUnicodeFragmentCodePoint) {
+        ASSERT(m_unicodeFragmentBuffer.isEmpty());
         m_url.m_string = String::adopt(WTFMove(m_asciiBuffer));
-    else {
+    } else {
         Vector<UChar> buffer;
-        buffer.reserveInitialCapacity(currentPosition(c) + m_unicodeFragmentBuffer.size());
+        buffer.reserveInitialCapacity(m_asciiBuffer.size() + m_unicodeFragmentBuffer.size());
         buffer.appendVector(m_asciiBuffer);
         buffer.appendVector(m_unicodeFragmentBuffer);
         m_url.m_string = String::adopt(WTFMove(buffer));
@@ -1703,16 +1829,20 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
 template<typename CharacterType>
 void URLParser::parseAuthority(CodePointIterator<CharacterType> iterator)
 {
-    if (iterator.atEnd()) {
+    if (UNLIKELY(iterator.atEnd())) {
+        syntaxViolation(iterator);
         m_url.m_userEnd = currentPosition(iterator);
         m_url.m_passwordEnd = m_url.m_userEnd;
         return;
     }
-    for (; !iterator.atEnd(); ++iterator) {
+    auto authorityOrHostBegin = iterator;
+    for (; !iterator.atEnd(); advance(iterator)) {
         if (*iterator == ':') {
-            ++iterator;
             m_url.m_userEnd = currentPosition(iterator);
-            if (iterator.atEnd()) {
+            auto iteratorAtColon = iterator;
+            advance(iterator, authorityOrHostBegin);
+            if (UNLIKELY(iterator.atEnd())) {
+                syntaxViolation(iteratorAtColon);
                 m_url.m_passwordEnd = m_url.m_userEnd;
                 if (m_url.m_userEnd > m_url.m_userStart)
                     appendToASCIIBuffer('@');
@@ -1721,10 +1851,10 @@ void URLParser::parseAuthority(CodePointIterator<CharacterType> iterator)
             appendToASCIIBuffer(':');
             break;
         }
-        utf8PercentEncode<isInUserInfoEncodeSet>(*iterator);
+        utf8PercentEncode<isInUserInfoEncodeSet>(iterator);
     }
-    for (; !iterator.atEnd(); ++iterator)
-        utf8PercentEncode<isInUserInfoEncodeSet>(*iterator);
+    for (; !iterator.atEnd(); advance(iterator))
+        utf8PercentEncode<isInUserInfoEncodeSet>(iterator);
     m_url.m_passwordEnd = currentPosition(iterator);
     if (!m_url.m_userEnd)
         m_url.m_userEnd = m_url.m_passwordEnd;
@@ -1824,7 +1954,7 @@ void URLParser::serializeIPv6(URLParser::IPv6Address address)
 }
 
 template<typename CharacterType>
-inline static Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType>& iterator)
+Optional<uint32_t> URLParser::parseIPv4Number(CodePointIterator<CharacterType>& iterator, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition)
 {
     // FIXME: Check for overflow.
     enum class State : uint8_t {
@@ -1843,7 +1973,7 @@ inline static Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType
         }
         switch (state) {
         case State::UnknownBase:
-            if (*iterator == '0') {
+            if (UNLIKELY(*iterator == '0')) {
                 ++iterator;
                 state = State::OctalOrHex;
                 break;
@@ -1851,6 +1981,7 @@ inline static Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType
             state = State::Decimal;
             break;
         case State::OctalOrHex:
+            syntaxViolation(iteratorForSyntaxViolationPosition);
             if (*iterator == 'x' || *iterator == 'X') {
                 ++iterator;
                 state = State::Hex;
@@ -1866,6 +1997,7 @@ inline static Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType
             ++iterator;
             break;
         case State::Octal:
+            ASSERT(m_didSeeSyntaxViolation);
             if (*iterator < '0' || *iterator > '7')
                 return Nullopt;
             value *= 8;
@@ -1873,6 +2005,7 @@ inline static Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType
             ++iterator;
             break;
         case State::Hex:
+            ASSERT(m_didSeeSyntaxViolation);
             if (!isASCIIHexDigit(*iterator))
                 return Nullopt;
             value *= 16;
@@ -1894,12 +2027,14 @@ inline static uint64_t pow256(size_t exponent)
 template<typename CharacterType>
 Optional<URLParser::IPv4Address> URLParser::parseIPv4Host(CodePointIterator<CharacterType> iterator)
 {
+    auto hostBegin = iterator;
+
     Vector<uint32_t, 4> items;
     items.reserveInitialCapacity(4);
     while (!iterator.atEnd()) {
         if (items.size() >= 4)
             return Nullopt;
-        if (auto item = parseIPv4Number(iterator))
+        if (auto item = parseIPv4Number(iterator, hostBegin))
             items.append(item.value());
         else
             return Nullopt;
@@ -1918,6 +2053,10 @@ Optional<URLParser::IPv4Address> URLParser::parseIPv4Host(CodePointIterator<Char
         if (item > 255)
             return Nullopt;
     }
+
+    if (UNLIKELY(items.size() != 4))
+        syntaxViolation(hostBegin);
+
     IPv4Address ipv4 = items.takeLast();
     for (size_t counter = 0; counter < items.size(); ++counter)
         ipv4 += items[counter] * pow256(3 - counter);
@@ -1927,6 +2066,9 @@ Optional<URLParser::IPv4Address> URLParser::parseIPv4Host(CodePointIterator<Char
 template<typename CharacterType>
 Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<CharacterType> c)
 {
+    ASSERT(*c == '[');
+    auto hostBegin = c;
+    advance(c, hostBegin);
     if (c.atEnd())
         return Nullopt;
 
@@ -1935,12 +2077,12 @@ Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<Char
     Optional<size_t> compressPointer;
 
     if (*c == ':') {
-        ++c;
+        advance(c, hostBegin);
         if (c.atEnd())
             return Nullopt;
         if (*c != ':')
             return Nullopt;
-        ++c;
+        advance(c, hostBegin);
         ++piecePointer;
         compressPointer = piecePointer;
     }
@@ -1951,26 +2093,30 @@ Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<Char
         if (*c == ':') {
             if (compressPointer)
                 return Nullopt;
-            ++c;
+            advance(c, hostBegin);
             ++piecePointer;
             compressPointer = piecePointer;
             continue;
         }
         uint16_t value = 0;
-        for (size_t length = 0; length < 4; length++) {
+        size_t length = 0;
+        for (; length < 4; length++) {
             if (c.atEnd())
                 break;
             if (!isASCIIHexDigit(*c))
                 break;
             value = value * 0x10 + toASCIIHexValue(*c);
-            ++c;
+            advance(c, hostBegin);
         }
+        if (UNLIKELY(length > 1 && !value))
+            syntaxViolation(hostBegin);
+
         address[piecePointer++] = value;
         if (c.atEnd())
             break;
         if (*c != ':')
             return Nullopt;
-        ++c;
+        advance(c, hostBegin);
     }
     
     if (!c.atEnd()) {
@@ -1989,7 +2135,7 @@ Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<Char
                     return Nullopt;
                 else
                     value = value.value() * 10 + number;
-                ++c;
+                advance(c, hostBegin);
                 if (c.atEnd())
                     return Nullopt;
                 if (value.value() > 255)
@@ -2001,7 +2147,7 @@ Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<Char
             if (dotsSeen == 1 || dotsSeen == 3)
                 piecePointer++;
             if (!c.atEnd())
-                ++c;
+                advance(c, hostBegin);
             if (dotsSeen == 3 && !c.atEnd())
                 return Nullopt;
             dotsSeen++;
@@ -2014,6 +2160,13 @@ Optional<URLParser::IPv6Address> URLParser::parseIPv6Host(CodePointIterator<Char
             std::swap(address[piecePointer--], address[compressPointer.value() + swaps-- - 1]);
     } else if (piecePointer != 8)
         return Nullopt;
+
+    Optional<size_t> possibleCompressPointer = findLongestZeroSequence(address);
+    if (possibleCompressPointer)
+        possibleCompressPointer.value()++;
+    if (UNLIKELY(compressPointer != possibleCompressPointer))
+        syntaxViolation(hostBegin);
+    
     return address;
 }
 
@@ -2105,15 +2258,20 @@ inline static bool hasInvalidDomainCharacter(const Vector<LChar, defaultInlineBu
 template<typename CharacterType>
 bool URLParser::parsePort(CodePointIterator<CharacterType>& iterator)
 {
+    ASSERT(*iterator == ':');
+    auto colonIterator = iterator;
+    advance(iterator, colonIterator);
     uint32_t port = 0;
-    if (iterator.atEnd()) {
-        m_url.m_portEnd = currentPosition(iterator);
+    if (UNLIKELY(iterator.atEnd())) {
+        m_url.m_portEnd = currentPosition(colonIterator);
+        syntaxViolation(colonIterator);
         return true;
     }
-    appendToASCIIBuffer(':');
     for (; !iterator.atEnd(); ++iterator) {
-        if (isTabOrNewline(*iterator))
+        if (UNLIKELY(isTabOrNewline(*iterator))) {
+            syntaxViolation(colonIterator);
             continue;
+        }
         if (isASCIIDigit(*iterator)) {
             port = port * 10 + *iterator - '0';
             if (port > std::numeric_limits<uint16_t>::max())
@@ -2122,10 +2280,10 @@ bool URLParser::parsePort(CodePointIterator<CharacterType>& iterator)
             return false;
     }
 
-    if (isDefaultPort(StringView(m_asciiBuffer.data(), m_url.m_schemeEnd), port)) {
-        ASSERT(m_asciiBuffer.last() == ':');
-        m_asciiBuffer.shrink(currentPosition(iterator) - 1);
-    } else {
+    if (UNLIKELY(isDefaultPort(parsedDataView(0, m_url.m_schemeEnd), port)))
+        syntaxViolation(colonIterator);
+    else {
+        appendToASCIIBuffer(':');
         ASSERT(port <= std::numeric_limits<uint16_t>::max());
         appendNumberToASCIIBuffer<uint16_t>(static_cast<uint16_t>(port));
     }
@@ -2140,27 +2298,27 @@ bool URLParser::parseHostAndPort(CodePointIterator<CharacterType> iterator)
     if (iterator.atEnd())
         return false;
     if (*iterator == '[') {
-        ++iterator;
         auto ipv6End = iterator;
         while (!ipv6End.atEnd() && *ipv6End != ']')
             ++ipv6End;
         if (auto address = parseIPv6Host(CodePointIterator<CharacterType>(iterator, ipv6End))) {
             serializeIPv6(address.value());
-            m_url.m_hostEnd = currentPosition(iterator);
             if (!ipv6End.atEnd()) {
-                ++ipv6End;
+                advance(ipv6End);
                 if (!ipv6End.atEnd() && *ipv6End == ':') {
-                    ++ipv6End;
+                    m_url.m_hostEnd = currentPosition(ipv6End);
                     return parsePort(ipv6End);
                 }
-                m_url.m_portEnd = currentPosition(iterator);
+                m_url.m_hostEnd = currentPosition(ipv6End);
+                m_url.m_portEnd = m_url.m_hostEnd;
                 return true;
             }
+            m_url.m_hostEnd = currentPosition(ipv6End);
             return true;
         }
     }
-    
-    if (!m_hostHasPercentOrNonASCII) {
+
+    if (LIKELY(!m_hostHasPercentOrNonASCII)) {
         auto hostIterator = iterator;
         for (; !iterator.atEnd(); ++iterator) {
             if (isTabOrNewline(*iterator))
@@ -2177,23 +2335,25 @@ bool URLParser::parseHostAndPort(CodePointIterator<CharacterType> iterator)
                 m_url.m_portEnd = currentPosition(iterator);
                 return true;
             }
-            ++iterator;
             return parsePort(iterator);
         }
         for (; hostIterator != iterator; ++hostIterator) {
-            if (!isTabOrNewline(*hostIterator))
+            if (LIKELY(!isTabOrNewline(*hostIterator))) {
+                if (UNLIKELY(isASCIIUpper(*hostIterator)))
+                    syntaxViolation(hostIterator);
                 appendToASCIIBuffer(toASCIILower(*hostIterator));
+            } else
+                syntaxViolation(hostIterator);
         }
         m_url.m_hostEnd = currentPosition(iterator);
-        if (!hostIterator.atEnd()) {
-            ASSERT(*hostIterator == ':');
-            incrementIteratorSkippingTabAndNewLine(hostIterator);
+        if (!hostIterator.atEnd())
             return parsePort(hostIterator);
-        }
         m_url.m_portEnd = currentPosition(iterator);
         return true;
     }
     
+    syntaxViolation(iterator);
+    
     Vector<LChar, defaultInlineBufferSize> utf8Encoded;
     for (; !iterator.atEnd(); ++iterator) {
         if (isTabOrNewline(*iterator))
@@ -2223,17 +2383,13 @@ bool URLParser::parseHostAndPort(CodePointIterator<CharacterType> iterator)
             m_url.m_portEnd = currentPosition(iterator);
             return true;
         }
-        ++iterator;
         return parsePort(iterator);
     }
 
     appendToASCIIBuffer(asciiDomainCharacters, asciiDomainValue.size());
     m_url.m_hostEnd = currentPosition(iterator);
-    if (!iterator.atEnd()) {
-        ASSERT(*iterator == ':');
-        incrementIteratorSkippingTabAndNewLine(iterator);
+    if (!iterator.atEnd())
         return parsePort(iterator);
-    }
     m_url.m_portEnd = currentPosition(iterator);
     return true;
 }
@@ -2281,7 +2437,7 @@ inline static void serializeURLEncodedForm(const String& input, Vector<LChar>& o
             || (byte >= 0x30 && byte <= 0x39)
             || (byte >= 0x41 && byte <= 0x5A)
             || byte == 0x5F
-            || (byte >= 0x61 && byte <= 0x7A))
+            || (byte >= 0x61 && byte <= 0x7A)) // FIXME: Put these in the characterClassTable to avoid branches.
             output.append(byte);
         else
             percentEncodeByte(byte, output);
@@ -2349,7 +2505,7 @@ bool URLParser::allValuesEqual(const URL& a, const URL& b)
 }
 
 bool URLParser::internalValuesConsistent(const URL& url)
-{    
+{
     return url.m_schemeEnd <= url.m_userStart
         && url.m_userStart <= url.m_userEnd
         && url.m_userEnd <= url.m_passwordEnd
index 9f5ba41..eeb9ad7 100644
@@ -28,7 +28,6 @@
 #include "TextEncoding.h"
 #include "URL.h"
 #include <wtf/Forward.h>
-#include <wtf/text/StringBuilder.h>
 
 namespace WebCore {
 
@@ -53,13 +52,13 @@ private:
     URL m_url;
     Vector<LChar> m_asciiBuffer;
     Vector<UChar> m_unicodeFragmentBuffer;
+    bool m_didSeeUnicodeFragmentCodePoint { false };
     bool m_urlIsSpecial { false };
     bool m_hostHasPercentOrNonASCII { false };
     String m_inputString;
     const void* m_inputBegin { nullptr };
 
-    // FIXME: This should start out as false and only change to true when we see a syntax error once syntax error handling is implemented.
-    bool m_seenSyntaxError { true };
+    bool m_didSeeSyntaxViolation { false };
 
     template<typename CharacterType> void parse(const CharacterType*, const unsigned length, const URL&, const TextEncoding&);
     template<typename CharacterType> void parseAuthority(CodePointIterator<CharacterType>);
@@ -67,32 +66,36 @@ private:
     template<typename CharacterType> bool parsePort(CodePointIterator<CharacterType>&);
 
     void failure();
-    template<typename CharacterType> void incrementIteratorSkippingTabAndNewLine(CodePointIterator<CharacterType>&);
-    template<typename CharacterType> void syntaxError(const CodePointIterator<CharacterType>&);
+    template<typename CharacterType> void advance(CodePointIterator<CharacterType>& iterator) { advance(iterator, iterator); }
+    template<typename CharacterType> void advance(CodePointIterator<CharacterType>&, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition);
+    template<typename CharacterType> void syntaxViolation(const CodePointIterator<CharacterType>&);
+    template<typename CharacterType> void fragmentSyntaxViolation(const CodePointIterator<CharacterType>&);
     template<typename CharacterType> bool isWindowsDriveLetter(CodePointIterator<CharacterType>);
     template<typename CharacterType> bool shouldCopyFileURL(CodePointIterator<CharacterType>);
     template<typename CharacterType> void checkWindowsDriveLetter(CodePointIterator<CharacterType>&);
     template<typename CharacterType> size_t currentPosition(const CodePointIterator<CharacterType>&);
     template<typename UnsignedIntegerType> void appendNumberToASCIIBuffer(UnsignedIntegerType);
-    template<bool(*isInCodeSet)(UChar32)> void utf8PercentEncode(UChar32);
-    void utf8QueryEncode(UChar32);
+    template<bool(*isInCodeSet)(UChar32), typename CharacterType> void utf8PercentEncode(const CodePointIterator<CharacterType>&);
+    template<typename CharacterType> void utf8QueryEncode(const CodePointIterator<CharacterType>&);
     void percentEncodeByte(uint8_t);
     void appendToASCIIBuffer(UChar32);
     void appendToASCIIBuffer(const char*, size_t);
     void appendToASCIIBuffer(const LChar* characters, size_t size) { appendToASCIIBuffer(reinterpret_cast<const char*>(characters), size); }
     void encodeQuery(const Vector<UChar>& source, const TextEncoding&);
     void copyASCIIStringUntil(const String&, size_t lengthIf8Bit, size_t lengthIf16Bit);
+    StringView parsedDataView(size_t start, size_t length);
 
     using IPv4Address = uint32_t;
     void serializeIPv4(IPv4Address);
     template<typename CharacterType> Optional<IPv4Address> parseIPv4Host(CodePointIterator<CharacterType>);
+    template<typename CharacterType> Optional<uint32_t> parseIPv4Number(CodePointIterator<CharacterType>&, const CodePointIterator<CharacterType>& iteratorForSyntaxViolationPosition);
     using IPv6Address = std::array<uint16_t, 8>;
     template<typename CharacterType> Optional<IPv6Address> parseIPv6Host(CodePointIterator<CharacterType>);
     void serializeIPv6Piece(uint16_t piece);
     void serializeIPv6(URLParser::IPv6Address);
 
     enum class URLPart;
-    void copyURLPartsUntil(const URL& base, URLPart);
+    template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&);
     static size_t urlLengthUntilPart(const URL&, URLPart);
     void popPath();
 };
index e421220..5f8cb80 100644 (file)
@@ -1,3 +1,13 @@
+2016-09-26  Alex Christensen  <achristensen@webkit.org>
+
+        Implement URLParser::syntaxViolation
+        https://bugs.webkit.org/show_bug.cgi?id=162593
+
+        Reviewed by Geoffrey Garen.
+
+        * TestWebKitAPI/Tests/WebCore/URLParser.cpp:
+        (TestWebKitAPI::TEST_F):
+
 2016-09-27  Anders Carlsson  <andersca@apple.com>
 
         Turns out OptionSet.cpp wasn't added to the right target. Add it and fix the build for real.
index e2d81a7..b2b626c 100644 (file)
@@ -119,6 +119,15 @@ TEST_F(URLParserTest, Basic)
     checkURL("http://[0:f::f:f:0:0]", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"});
     checkURL("http://[0:f:0:0:f::]", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
     checkURL("http://[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[0:f:0:0:f::]:", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
+    checkURL("http://[0:f:0:0:f::]:\t", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
+    checkURL("http://[0:f:0:0:f::]\t:", {"http", "", "", "[0:f:0:0:f::]", 0, "/", "", "", "http://[0:f:0:0:f::]/"});
+    checkURL("http://\t[::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[\t::f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[:\t:f:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[::\tf:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[::f\t:0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
+    checkURL("http://[::f:\t0:0:f:0:0]", {"http", "", "", "[::f:0:0:f:0:0]", 0, "/", "", "", "http://[::f:0:0:f:0:0]/"});
     checkURL("http://example.com/path1/path2/.", {"http", "", "", "example.com", 0, "/path1/path2/", "", "", "http://example.com/path1/path2/"});
     checkURL("http://example.com/path1/path2/..", {"http", "", "", "example.com", 0, "/path1/", "", "", "http://example.com/path1/"});
     checkURL("http://example.com/path1/path2/./path3", {"http", "", "", "example.com", 0, "/path1/path2/path3", "", "", "http://example.com/path1/path2/path3"});
@@ -191,12 +200,23 @@ TEST_F(URLParserTest, Basic)
     checkURL("http://host/a%20B", {"http", "", "", "host", 0, "/a%20B", "", "", "http://host/a%20B"});
     checkURL("http://host?q=@ <>!#fragment", {"http", "", "", "host", 0, "/", "q=@%20%3C%3E!", "fragment", "http://host/?q=@%20%3C%3E!#fragment"});
     checkURL("http://user:@host", {"http", "user", "", "host", 0, "/", "", "", "http://user@host/"});
+    checkURL("http://user:@\thost", {"http", "user", "", "host", 0, "/", "", "", "http://user@host/"});
+    checkURL("http://user\t:@host", {"http", "user", "", "host", 0, "/", "", "", "http://user@host/"});
+    checkURL("http://use\tr:@host", {"http", "user", "", "host", 0, "/", "", "", "http://user@host/"});
     checkURL("http://127.0.0.1:10100/path", {"http", "", "", "127.0.0.1", 10100, "/path", "", "", "http://127.0.0.1:10100/path"});
     checkURL("http://127.0.0.1:/path", {"http", "", "", "127.0.0.1", 0, "/path", "", "", "http://127.0.0.1/path"});
+    checkURL("http://127.0.0.1\t:/path", {"http", "", "", "127.0.0.1", 0, "/path", "", "", "http://127.0.0.1/path"});
+    checkURL("http://127.0.0.1:\t/path", {"http", "", "", "127.0.0.1", 0, "/path", "", "", "http://127.0.0.1/path"});
+    checkURL("http://127.0.0.1:/\tpath", {"http", "", "", "127.0.0.1", 0, "/path", "", "", "http://127.0.0.1/path"});
     checkURL("http://127.0.0.1:123", {"http", "", "", "127.0.0.1", 123, "/", "", "", "http://127.0.0.1:123/"});
     checkURL("http://127.0.0.1:", {"http", "", "", "127.0.0.1", 0, "/", "", "", "http://127.0.0.1/"});
     checkURL("http://[0:f::f:f:0:0]:123/path", {"http", "", "", "[0:f::f:f:0:0]", 123, "/path", "", "", "http://[0:f::f:f:0:0]:123/path"});
     checkURL("http://[0:f::f:f:0:0]:123", {"http", "", "", "[0:f::f:f:0:0]", 123, "/", "", "", "http://[0:f::f:f:0:0]:123/"});
+    checkURL("http://[0:f:0:0:f:\t:]:123", {"http", "", "", "[0:f:0:0:f::]", 123, "/", "", "", "http://[0:f:0:0:f::]:123/"});
+    checkURL("http://[0:f:0:0:f::\t]:123", {"http", "", "", "[0:f:0:0:f::]", 123, "/", "", "", "http://[0:f:0:0:f::]:123/"});
+    checkURL("http://[0:f:0:0:f::]\t:123", {"http", "", "", "[0:f:0:0:f::]", 123, "/", "", "", "http://[0:f:0:0:f::]:123/"});
+    checkURL("http://[0:f:0:0:f::]:\t123", {"http", "", "", "[0:f:0:0:f::]", 123, "/", "", "", "http://[0:f:0:0:f::]:123/"});
+    checkURL("http://[0:f:0:0:f::]:1\t23", {"http", "", "", "[0:f:0:0:f::]", 123, "/", "", "", "http://[0:f:0:0:f::]:123/"});
     checkURL("http://[0:f::f:f:0:0]:/path", {"http", "", "", "[0:f::f:f:0:0]", 0, "/path", "", "", "http://[0:f::f:f:0:0]/path"});
     checkURL("http://[0:f::f:f:0:0]:", {"http", "", "", "[0:f::f:f:0:0]", 0, "/", "", "", "http://[0:f::f:f:0:0]/"});
     checkURL("http://host:10100/path", {"http", "", "", "host", 10100, "/path", "", "", "http://host:10100/path"});
@@ -210,6 +230,9 @@ TEST_F(URLParserTest, Basic)
     checkURL("sc:/pa/", {"sc", "", "", "", 0, "/pa/", "", "", "sc:/pa/"});
     checkURL("notspecial:/notuser:notpassword@nothost", {"notspecial", "", "", "", 0, "/notuser:notpassword@nothost", "", "", "notspecial:/notuser:notpassword@nothost"});
     checkURL("sc://pa/", {"sc", "", "", "pa", 0, "/", "", "", "sc://pa/"});
+    checkURL("sc://\tpa/", {"sc", "", "", "pa", 0, "/", "", "", "sc://pa/"});
+    checkURL("sc:/\t/pa/", {"sc", "", "", "pa", 0, "/", "", "", "sc://pa/"});
+    checkURL("sc:\t//pa/", {"sc", "", "", "pa", 0, "/", "", "", "sc://pa/"});
     checkURL("http://host   \a   ", {"http", "", "", "host", 0, "/", "", "", "http://host/"});
     checkURL("notspecial:/a", {"notspecial", "", "", "", 0, "/a", "", "", "notspecial:/a"});
     checkURL("notspecial:", {"notspecial", "", "", "", 0, "", "", "", "notspecial:"});
@@ -217,11 +240,14 @@ TEST_F(URLParserTest, Basic)
     checkURL("http://256/", {"http", "", "", "256", 0, "/", "", "", "http://256/"});
     checkURL("http://256./", {"http", "", "", "256.", 0, "/", "", "", "http://256./"});
     checkURL("http://123.256/", {"http", "", "", "123.256", 0, "/", "", "", "http://123.256/"});
+    checkURL("http://123\t.256/", {"http", "", "", "123.256", 0, "/", "", "", "http://123.256/"});
+    checkURL("http://123.\t256/", {"http", "", "", "123.256", 0, "/", "", "", "http://123.256/"});
     checkURL("notspecial:/a", {"notspecial", "", "", "", 0, "/a", "", "", "notspecial:/a"});
     checkURL("notspecial:", {"notspecial", "", "", "", 0, "", "", "", "notspecial:"});
     checkURL("notspecial:/", {"notspecial", "", "", "", 0, "/", "", "", "notspecial:/"});
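     checkURL("data:image/png;base64,encoded-data-follows-here", {"data", "", "", "", 0, "image/png;base64,encoded-data-follows-here", "", "", "data:image/png;base64,encoded-data-follows-here"});
     checkURL("data:image/png;base64,encoded/data-with-slash", {"data", "", "", "", 0, "image/png;base64,encoded/data-with-slash", "", "", "data:image/png;base64,encoded/data-with-slash"});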
+    checkURL("about:~", {"about", "", "", "", 0, "~", "", "", "about:~"});
 
     // This disagrees with the web platform test for http://:@www.example.com but agrees with Chrome and URL::parse,
     // and Firefox fails the web platform test differently. Maybe the web platform test ought to be changed.
@@ -287,6 +313,8 @@ TEST_F(URLParserTest, ParseRelative)
     checkRelativeURL("\\@", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/@", "", "", "http://example.org/@"});
     checkRelativeURL("/path3", "http://user@example.org/path1/path2", {"http", "user", "", "example.org", 0, "/path3", "", "", "http://user@example.org/path3"});
     checkRelativeURL("", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/foo/bar", "", "", "http://example.org/foo/bar"});
+    checkRelativeURL("\t", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/foo/bar", "", "", "http://example.org/foo/bar"});
+    checkRelativeURL(" ", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/foo/bar", "", "", "http://example.org/foo/bar"});
     checkRelativeURL("  \a  \t\n", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/foo/bar", "", "", "http://example.org/foo/bar"});
     checkRelativeURL(":foo.com\\", "http://example.org/foo/bar", {"http", "", "", "example.org", 0, "/foo/:foo.com/", "", "", "http://example.org/foo/:foo.com/"});
     checkRelativeURL("http:/example.com/", "about:blank", {"http", "", "", "example.com", 0, "/", "", "", "http://example.com/"});
@@ -314,6 +342,7 @@ TEST_F(URLParserTest, ParseRelative)
     checkRelativeURL("notspecial:/", "http://host", {"notspecial", "", "", "", 0, "/", "", "", "notspecial:/"});
     checkRelativeURL("foo:/", "http://example.org/foo/bar", {"foo", "", "", "", 0, "/", "", "", "foo:/"});
     checkRelativeURL("://:0/", "http://webkit.org/", {"http", "", "", "webkit.org", 0, "/://:0/", "", "", "http://webkit.org/://:0/"});
+    checkRelativeURL(String(), "http://webkit.org/", {"http", "", "", "webkit.org", 0, "/", "", "", "http://webkit.org/"});
 
     // The checking of slashes in SpecialAuthoritySlashes needed to get this to pass contradicts what is in the spec,
     // but it is included in the web platform tests.
@@ -590,12 +619,42 @@ TEST_F(URLParserTest, ParserDifferences)
     checkURLDifferences(utf16String(u"http://host/path#💩\t💩"),
         {"http", "", "", "host", 0, "/path", "", utf16String(u"💩💩"), utf16String(u"http://host/path#💩💩")},
         {"http", "", "", "host", 0, "/path", "", "%F0%9F%92%A9%F0%9F%92%A9", "http://host/path#%F0%9F%92%A9%F0%9F%92%A9"});
+    checkURLDifferences("http://%48OsT",
+        {"http", "", "", "host", 0, "/", "", "", "http://host/"},
+        {"http", "", "", "%48ost", 0, "/", "", "", "http://%48ost/"});
+    checkURLDifferences("http://h%4FsT",
+        {"http", "", "", "host", 0, "/", "", "", "http://host/"},
+        {"http", "", "", "h%4fst", 0, "/", "", "", "http://h%4fst/"});
+    checkURLDifferences("http://h%4fsT",
+        {"http", "", "", "host", 0, "/", "", "", "http://host/"},
+        {"http", "", "", "h%4fst", 0, "/", "", "", "http://h%4fst/"});
+    checkURLDifferences("http://h%6fsT",
+        {"http", "", "", "host", 0, "/", "", "", "http://host/"},
+        {"http", "", "", "h%6fst", 0, "/", "", "", "http://h%6fst/"});
+    checkURLDifferences("http://host/`",
+        {"http", "", "", "host", 0, "/%60", "", "", "http://host/%60"},
+        {"http", "", "", "host", 0, "/`", "", "", "http://host/`"});
+    checkURLDifferences("aA://",
+        {"aa", "", "", "", 0, "/", "", "", "aa:///"},
+        {"aa", "", "", "", 0, "//", "", "", "aa://"});
+    checkURLDifferences("A://",
+        {"a", "", "", "", 0, "/", "", "", "a:///"},
+        {"a", "", "", "", 0, "//", "", "", "a://"});
 }
 
 TEST_F(URLParserTest, DefaultPort)
 {
     checkURL("FtP://host:21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
     checkURL("ftp://host:21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("f\ttp://host:21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("f\ttp://host\t:21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("f\ttp://host:\t21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("f\ttp://host:2\t1/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("f\ttp://host:21\t/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("ftp://host\t:21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("ftp://host:\t21/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("ftp://host:2\t1/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
+    checkURL("ftp://host:21\t/", {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"});
     checkURL("ftp://host:22/", {"ftp", "", "", "host", 22, "/", "", "", "ftp://host:22/"});
     checkURLDifferences("ftp://host:21",
         {"ftp", "", "", "host", 0, "/", "", "", "ftp://host/"},
@@ -670,12 +729,6 @@ TEST_F(URLParserTest, DefaultPort)
     checkURLDifferences("unknown://host:81",
         {"unknown", "", "", "host", 81, "/", "", "", "unknown://host:81/"},
         {"unknown", "", "", "host", 81, "", "", "", "unknown://host:81"});
-    checkURLDifferences("http://%48OsT",
-        {"http", "", "", "host", 0, "/", "", "", "http://host/"},
-        {"http", "", "", "%48ost", 0, "/", "", "", "http://%48ost/"});
-    checkURLDifferences("http://host/`",
-        {"http", "", "", "host", 0, "/%60", "", "", "http://host/%60"},
-        {"http", "", "", "host", 0, "/`", "", "", "http://host/`"});
 }
     
 static void shouldFail(const String& urlString)
@@ -694,6 +747,8 @@ TEST_F(URLParserTest, ParserFailures)
     shouldFail("  \a  ");
     shouldFail("");
     shouldFail(String());
+    shouldFail("", "about:blank");
+    shouldFail(String(), "about:blank");
     shouldFail("http://127.0.0.1:abc");
     shouldFail("http://host:abc");
     shouldFail("http://a:@", "about:blank");
@@ -723,6 +778,7 @@ TEST_F(URLParserTest, ParserFailures)
     shouldFail("://:0/");
     shouldFail("://:0/", "");
     shouldFail("://:0/", "about:blank");
+    shouldFail("about~");
 }
 
 // These are in the spec but not in the web platform tests.