URLParser should use TextEncoding through an abstract class
author achristensen@apple.com <achristensen@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 27 Sep 2018 20:05:52 +0000 (20:05 +0000)
committer achristensen@apple.com <achristensen@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 27 Sep 2018 20:05:52 +0000 (20:05 +0000)
https://bugs.webkit.org/show_bug.cgi?id=190027

Reviewed by Andy Estes.

Source/WebCore:

URLParser uses TextEncoding for one call to encode, which is only used for encoding the query of URLs in documents with non-UTF encodings.
There are 3 call sites that specify the TextEncoding to use from the Document, and even those call sites use a UTF encoding most of the time.
All other URL parsing is done using a well-optimized path which assumes UTF-8 encoding and uses macros from ICU headers, not a TextEncoding.
Moving the logic in this way breaks URL and URLParser's dependency on TextEncoding, which makes it possible to use in a lower-level project
without also moving TextEncoding, TextCodec, TextCodecICU, ThreadGlobalData, and the rest of WebCore and JavaScriptCore.

There is no observable change in behavior.  There is now one virtual function call in a code path in URLParser that is not performance-sensitive,
and TextEncodings now have a vtable, which uses a few more bytes of memory total for WebKit.
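
The dependency inversion described above can be sketched in isolation. This is a minimal illustration, not the WebKit code: `QueryEncoder` stands in for the `URLTextEncoding` interface added to URL.h, standard-library types replace `StringView` and `Vector<uint8_t>`, and `Latin1Encoder` and `encodeQueryBytes` are hypothetical names. The `'?'` substitution for unencodable code units is an assumption made for brevity and does not reproduce WebKit's handling of unencodable characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Stand-in for URLTextEncoding: the only thing the low-level parser needs to know about.
class QueryEncoder {
public:
    virtual ~QueryEncoder() = default;
    virtual std::vector<uint8_t> encodeForURLParsing(std::u16string_view) const = 0;
};

// Hypothetical higher-level encoding, playing the role of TextEncoding.
// It lives "above" the parser and implements the interface.
class Latin1Encoder final : public QueryEncoder {
public:
    std::vector<uint8_t> encodeForURLParsing(std::u16string_view text) const override
    {
        std::vector<uint8_t> bytes;
        bytes.reserve(text.size());
        for (char16_t unit : text) {
            // Assumption for this sketch: anything outside Latin-1 becomes '?'.
            bytes.push_back(unit <= 0xFF ? static_cast<uint8_t>(unit) : static_cast<uint8_t>('?'));
        }
        return bytes;
    }
};

// Low-level code: depends only on the abstract interface, so it can move to a
// lower-level project without dragging the concrete encoding classes along.
std::vector<uint8_t> encodeQueryBytes(std::u16string_view query, const QueryEncoder& encoder)
{
    return encoder.encodeForURLParsing(query); // the single virtual call
}
```

A caller would construct a `Latin1Encoder` and pass it to `encodeQueryBytes`; the parser side never names the concrete class, which is the property the commit relies on.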

* css/parser/CSSParserContext.h:
(WebCore::CSSParserContext::completeURL const):
* css/parser/CSSParserIdioms.cpp:
(WebCore::completeURL):
* dom/Document.cpp:
(WebCore::Document::completeURL const):
* html/HTMLBaseElement.cpp:
(WebCore::HTMLBaseElement::href const):
Move the call to encodingForFormSubmission from the URL constructor to the 3 call sites that specify the encoding from the Document.
* loader/FormSubmission.cpp:
(WebCore::FormSubmission::create):
* loader/TextResourceDecoder.cpp:
(WebCore::TextResourceDecoder::encodingForURLParsing):
* loader/TextResourceDecoder.h:
* platform/URL.cpp:
(WebCore::URL::URL):
* platform/URL.h:
(WebCore::URLTextEncoding::~URLTextEncoding):
* platform/URLParser.cpp:
(WebCore::URLParser::encodeNonUTF8Query):
(WebCore::URLParser::copyURLPartsUntil):
(WebCore::URLParser::URLParser):
(WebCore::URLParser::parse):
(WebCore::URLParser::encodeQuery): Deleted.
A pointer replaces the boolean isUTF8Encoding and the TextEncoding& which had a default value of UTF8Encoding.
Now the pointer being null means that we use UTF8, and the pointer being non-null means we use that encoding.
* platform/URLParser.h:
(WebCore::URLParser::URLParser):
* platform/text/TextEncoding.cpp:
(WebCore::UTF7Encoding):
(WebCore::TextEncoding::encodingForFormSubmissionOrURLParsing const):
(WebCore::ASCIIEncoding):
(WebCore::Latin1Encoding):
(WebCore::UTF16BigEndianEncoding):
(WebCore::UTF16LittleEndianEncoding):
(WebCore::UTF8Encoding):
(WebCore::WindowsLatin1Encoding):
(WebCore::TextEncoding::encodingForFormSubmission const): Deleted.
Use NeverDestroyed because TextEncoding now has a virtual destructor.
* platform/text/TextEncoding.h:
Rename encodingForFormSubmission to encodingForFormSubmissionOrURLParsing to make it more clear that we are intentionally using it for both.
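
The null-pointer convention noted above for URLParser.cpp (null means UTF-8, non-null means "use this encoding for the query") can be sketched as follows. This is a simplified model under stated assumptions: `Encoding`, `utf8Encoding`, `encodingForURLParsing`, `nonUTF8QueryEncoding`, and `queryStateFor` are hypothetical stand-ins, and the name-based check for UTF-7/16/32 only approximates what `encodingForFormSubmissionOrURLParsing` decides in WebCore.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TextEncoding, identified only by name.
struct Encoding {
    std::string name;
};

inline bool operator==(const Encoding& a, const Encoding& b) { return a.name == b.name; }

// Process-lifetime singleton, intentionally leaked (similar in spirit to NeverDestroyed).
const Encoding& utf8Encoding()
{
    static const Encoding* encoding = new Encoding { "UTF-8" };
    return *encoding;
}

// Assumption: UTF-7, UTF-16 and UTF-32 variants are mapped to UTF-8, mirroring the
// comment in TextResourceDecoder::encodingForURLParsing.
const Encoding& encodingForURLParsing(const Encoding& documentEncoding)
{
    const std::string& name = documentEncoding.name;
    if (name == "UTF-7" || name.rfind("UTF-16", 0) == 0 || name.rfind("UTF-32", 0) == 0)
        return utf8Encoding();
    return documentEncoding;
}

// Call-site logic: hand the parser nullptr whenever the result is UTF-8.
const Encoding* nonUTF8QueryEncoding(const Encoding& documentEncoding)
{
    const Encoding& encoding = encodingForURLParsing(documentEncoding);
    return encoding == utf8Encoding() ? nullptr : &encoding;
}

enum class QueryState { UTF8Query, NonUTF8Query };

// Parser-side logic: the pointer alone selects the query state.
QueryState queryStateFor(const Encoding* nonUTF8Encoding)
{
    return nonUTF8Encoding ? QueryState::NonUTF8Query : QueryState::UTF8Query;
}
```

Folding the old boolean and reference into one pointer means the parser cannot be in the inconsistent state "not UTF-8, but no encoding available"; the returned pointer is only valid while the document's encoding object is alive.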

Tools:

* TestWebKitAPI/Tests/WebCore/URLParser.cpp:
(TestWebKitAPI::checkURL):
(TestWebKitAPI::TEST_F):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@236565 268f45cc-cd09-0410-ab3c-d52691b4dbfc

16 files changed:
Source/WebCore/ChangeLog
Source/WebCore/css/parser/CSSParserContext.h
Source/WebCore/css/parser/CSSParserIdioms.cpp
Source/WebCore/dom/Document.cpp
Source/WebCore/html/HTMLBaseElement.cpp
Source/WebCore/loader/FormSubmission.cpp
Source/WebCore/loader/TextResourceDecoder.cpp
Source/WebCore/loader/TextResourceDecoder.h
Source/WebCore/platform/URL.cpp
Source/WebCore/platform/URL.h
Source/WebCore/platform/URLParser.cpp
Source/WebCore/platform/URLParser.h
Source/WebCore/platform/text/TextEncoding.cpp
Source/WebCore/platform/text/TextEncoding.h
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp

index 56f80d2..6a370d4 100644
@@ -1,3 +1,61 @@
+2018-09-27  Alex Christensen  <achristensen@webkit.org>
+
+        URLParser should use TextEncoding through an abstract class
+        https://bugs.webkit.org/show_bug.cgi?id=190027
+
+        Reviewed by Andy Estes.
+
+        URLParser uses TextEncoding for one call to encode, which is only used for encoding the query of URLs in documents with non-UTF encodings.
+        There are 3 call sites that specify the TextEncoding to use from the Document, and even those call sites use a UTF encoding most of the time.
+        All other URL parsing is done using a well-optimized path which assumes UTF-8 encoding and uses macros from ICU headers, not a TextEncoding.
+        Moving the logic in this way breaks URL and URLParser's dependency on TextEncoding, which makes it possible to use in a lower-level project
+        without also moving TextEncoding, TextCodec, TextCodecICU, ThreadGlobalData, and the rest of WebCore and JavaScriptCore.
+
+        There is no observable change in behavior.  There is now one virtual function call in a code path in URLParser that is not performance-sensitive,
+        and TextEncodings now have a vtable, which uses a few more bytes of memory total for WebKit.
+
+        * css/parser/CSSParserContext.h:
+        (WebCore::CSSParserContext::completeURL const):
+        * css/parser/CSSParserIdioms.cpp:
+        (WebCore::completeURL):
+        * dom/Document.cpp:
+        (WebCore::Document::completeURL const):
+        * html/HTMLBaseElement.cpp:
+        (WebCore::HTMLBaseElement::href const):
+        Move the call to encodingForFormSubmission from the URL constructor to the 3 call sites that specify the encoding from the Document.
+        * loader/FormSubmission.cpp:
+        (WebCore::FormSubmission::create):
+        * loader/TextResourceDecoder.cpp:
+        (WebCore::TextResourceDecoder::encodingForURLParsing):
+        * loader/TextResourceDecoder.h:
+        * platform/URL.cpp:
+        (WebCore::URL::URL):
+        * platform/URL.h:
+        (WebCore::URLTextEncoding::~URLTextEncoding):
+        * platform/URLParser.cpp:
+        (WebCore::URLParser::encodeNonUTF8Query):
+        (WebCore::URLParser::copyURLPartsUntil):
+        (WebCore::URLParser::URLParser):
+        (WebCore::URLParser::parse):
+        (WebCore::URLParser::encodeQuery): Deleted.
+        A pointer replaces the boolean isUTF8Encoding and the TextEncoding& which had a default value of UTF8Encoding.
+        Now the pointer being null means that we use UTF8, and the pointer being non-null means we use that encoding.
+        * platform/URLParser.h:
+        (WebCore::URLParser::URLParser):
+        * platform/text/TextEncoding.cpp:
+        (WebCore::UTF7Encoding):
+        (WebCore::TextEncoding::encodingForFormSubmissionOrURLParsing const):
+        (WebCore::ASCIIEncoding):
+        (WebCore::Latin1Encoding):
+        (WebCore::UTF16BigEndianEncoding):
+        (WebCore::UTF16LittleEndianEncoding):
+        (WebCore::UTF8Encoding):
+        (WebCore::WindowsLatin1Encoding):
+        (WebCore::TextEncoding::encodingForFormSubmission const): Deleted.
+        Use NeverDestroyed because TextEncoding now has a virtual destructor.
+        * platform/text/TextEncoding.h:
+        Rename encodingForFormSubmission to encodingForFormSubmissionOrURLParsing to make it more clear that we are intentionally using it for both.
+
 2018-09-27  John Wilander  <wilander@apple.com>
 
         Resource Load Statistics: Remove temporary compatibility fix for auto-dismiss popups
index e458c1d..1ef7711 100644
@@ -69,7 +69,9 @@ public:
             return URL();
         if (charset.isEmpty())
             return URL(baseURL, url);
-        return URL(baseURL, url, TextEncoding(charset));
+        TextEncoding encoding(charset);
+        auto& encodingForURLParsing = encoding.encodingForFormSubmissionOrURLParsing();
+        return URL(baseURL, url, encodingForURLParsing == UTF8Encoding() ? nullptr : &encodingForURLParsing);
     }
 };
 
index 4bfcd8c..266e7bc 100644
@@ -47,11 +47,7 @@ bool isValueAllowedInMode(unsigned short id, CSSParserMode mode)
 
 URL completeURL(const CSSParserContext& context, const String& url)
 {
-    if (url.isNull())
-        return URL();
-    if (context.charset.isEmpty())
-        return URL(context.baseURL, url);
-    return URL(context.baseURL, url, context.charset);
+    return context.completeURL(url);
 }
 
 } // namespace WebCore
index 6290192..7e93053 100644
@@ -4894,7 +4894,7 @@ URL Document::completeURL(const String& url, const URL& baseURLOverride) const
     const URL& baseURL = ((baseURLOverride.isEmpty() || baseURLOverride == blankURL()) && parentDocument()) ? parentDocument()->baseURL() : baseURLOverride;
     if (!m_decoder)
         return URL(baseURL, url);
-    return URL(baseURL, url, m_decoder->encoding());
+    return URL(baseURL, url, m_decoder->encodingForURLParsing());
 }
 
 URL Document::completeURL(const String& url) const
index e6e2e6f..fa411c4 100644
@@ -89,9 +89,8 @@ URL HTMLBaseElement::href() const
     if (attributeValue.isNull())
         return document().url();
 
-    URL url = !document().decoder() ?
-        URL(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue)) :
-        URL(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue), document().decoder()->encoding());
+    auto* encoding = document().decoder() ? document().decoder()->encodingForURLParsing() : nullptr;
+    URL url(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue), encoding);
 
     if (!url.isValid())
         return URL();
index 336dcc1..9b8e604 100644
@@ -175,7 +175,7 @@ Ref<FormSubmission> FormSubmission::create(HTMLFormElement& form, const Attribut
     }
 
     auto dataEncoding = isMailtoForm ? UTF8Encoding() : encodingFromAcceptCharset(copiedAttributes.acceptCharset(), document);
-    auto domFormData = DOMFormData::create(dataEncoding.encodingForFormSubmission());
+    auto domFormData = DOMFormData::create(dataEncoding.encodingForFormSubmissionOrURLParsing());
     StringPairVector formValues;
 
     bool containsPasswordData = false;
index 10e58d9..35bdf2b 100644
@@ -659,4 +659,16 @@ String TextResourceDecoder::decodeAndFlush(const char* data, size_t length)
     return decoded + flush();
 }
 
+const TextEncoding* TextResourceDecoder::encodingForURLParsing()
+{
+    // For UTF-{7,16,32}, we want to use UTF-8 for the query part as
+    // we do when submitting a form. A form with GET method
+    // has its contents added to a URL as query params and it makes sense
+    // to be consistent.
+    auto& encoding = m_encoding.encodingForFormSubmissionOrURLParsing();
+    if (encoding == UTF8Encoding())
+        return nullptr;
+    return &encoding;
+}
+
 }
index 1de2252..bd826cb 100644
@@ -48,6 +48,7 @@ public:
 
     void setEncoding(const TextEncoding&, EncodingSource);
     const TextEncoding& encoding() const { return m_encoding; }
+    const TextEncoding* encodingForURLParsing();
 
     bool hasEqualEncodingForCharset(const String& charset) const;
 
index 62618bd..a15c590 100644
@@ -103,19 +103,9 @@ URL::URL(ParsedURLStringTag, const String& url)
 #endif
 }
 
-URL::URL(const URL& base, const String& relative)
+URL::URL(const URL& base, const String& relative, const URLTextEncoding* encoding)
 {
-    URLParser parser(relative, base);
-    *this = parser.result();
-}
-
-URL::URL(const URL& base, const String& relative, const TextEncoding& encoding)
-{
-    // For UTF-{7,16,32}, we want to use UTF-8 for the query part as
-    // we do when submitting a form. A form with GET method
-    // has its contents added to a URL as query params and it makes sense
-    // to be consistent.
-    URLParser parser(relative, base, encoding.encodingForFormSubmission());
+    URLParser parser(relative, base, encoding);
     *this = parser.result();
 }
 
index d7e779a..1a6e6e4 100644
@@ -47,7 +47,12 @@ class TextStream;
 
 namespace WebCore {
 
-class TextEncoding;
+class URLTextEncoding {
+public:
+    virtual Vector<uint8_t> encodeForURLParsing(StringView) const = 0;
+    virtual ~URLTextEncoding() { };
+};
+
 struct URLHash;
 
 enum ParsedURLStringTag { ParsedURLString };
@@ -65,14 +70,13 @@ public:
     bool isHashTableDeletedValue() const { return string().isHashTableDeletedValue(); }
 
     // Resolves the relative URL with the given base URL. If provided, the
-    // TextEncoding is used to encode non-ASCII characers. The base URL can be
+    // URLTextEncoding is used to encode non-ASCII characers. The base URL can be
     // null or empty, in which case the relative URL will be interpreted as
     // absolute.
     // FIXME: If the base URL is invalid, this always creates an invalid
     // URL. Instead I think it would be better to treat all invalid base URLs
     // the same way we treate null and empty base URLs.
-    WEBCORE_EXPORT URL(const URL& base, const String& relative);
-    URL(const URL& base, const String& relative, const TextEncoding&);
+    WEBCORE_EXPORT URL(const URL& base, const String& relative, const URLTextEncoding* = nullptr);
 
     WEBCORE_EXPORT static URL fakeURLWithRelativePart(const String&);
     WEBCORE_EXPORT static URL fileURLWithFileSystemPath(const String&);
@@ -208,7 +212,6 @@ private:
     friend class URLParser;
     WEBCORE_EXPORT void invalidate();
     static bool protocolIs(const String&, const char*);
-    void init(const URL&, const String&, const TextEncoding&);
     void copyToBuffer(Vector<char, 512>& buffer) const;
     unsigned hostStart() const;
 
@@ -303,6 +306,7 @@ String mimeTypeFromDataURL(const String& url);
 // encoding (defaulting to UTF-8 otherwise). DANGER: If the URL has "%00"
 // in it, the resulting string will have embedded null characters!
 WEBCORE_EXPORT String decodeURLEscapeSequences(const String&);
+class TextEncoding;
 String decodeURLEscapeSequences(const String&, const TextEncoding&);
 
 // FIXME: This is a wrong concept to expose, different parts of a URL need different escaping per the URL Standard.
index 43ecab5..17bac49 100644
@@ -618,9 +618,9 @@ ALWAYS_INLINE void URLParser::utf8QueryEncode(const CodePointIterator<CharacterT
 }
 
 template<typename CharacterType>
-void URLParser::encodeQuery(const Vector<UChar>& source, const TextEncoding& encoding, CodePointIterator<CharacterType> iterator)
+void URLParser::encodeNonUTF8Query(const Vector<UChar>& source, const URLTextEncoding& encoding, CodePointIterator<CharacterType> iterator)
 {
-    auto encoded = encoding.encode(StringView(source.data(), source.size()), UnencodableHandling::URLEncodedEntities);
+    auto encoded = encoding.encodeForURLParsing(StringView(source.data(), source.size()));
     auto* data = encoded.data();
     size_t length = encoded.size();
     
@@ -880,7 +880,7 @@ void URLParser::copyASCIIStringUntil(const String& string, size_t length)
 }
 
 template<typename CharacterType>
-void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePointIterator<CharacterType>& iterator, bool& isUTF8Encoding)
+void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePointIterator<CharacterType>& iterator, const URLTextEncoding*& nonUTF8QueryEncoding)
 {
     syntaxViolation(iterator);
 
@@ -919,7 +919,7 @@ void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePoint
     switch (scheme(StringView(m_asciiBuffer.data(), m_url.m_schemeEnd))) {
     case Scheme::WS:
     case Scheme::WSS:
-        isUTF8Encoding = true;
+        nonUTF8QueryEncoding = nullptr;
         m_urlIsSpecial = true;
         return;
     case Scheme::File:
@@ -933,7 +933,7 @@ void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePoint
         return;
     case Scheme::NonSpecial:
         m_urlIsSpecial = false;
-        isUTF8Encoding = true;
+        nonUTF8QueryEncoding = nullptr;
         return;
     }
     ASSERT_NOT_REACHED();
@@ -1152,7 +1152,7 @@ ALWAYS_INLINE size_t URLParser::currentPosition(const CodePointIterator<Characte
     return iterator.codeUnitsSince(reinterpret_cast<const CharacterType*>(m_inputBegin));
 }
 
-URLParser::URLParser(const String& input, const URL& base, const TextEncoding& encoding)
+URLParser::URLParser(const String& input, const URL& base, const URLTextEncoding* nonUTF8QueryEncoding)
     : m_inputString(input)
 {
     if (input.isNull()) {
@@ -1165,10 +1165,10 @@ URLParser::URLParser(const String& input, const URL& base, const TextEncoding& e
 
     if (input.is8Bit()) {
         m_inputBegin = input.characters8();
-        parse(input.characters8(), input.length(), base, encoding);
+        parse(input.characters8(), input.length(), base, nonUTF8QueryEncoding);
     } else {
         m_inputBegin = input.characters16();
-        parse(input.characters16(), input.length(), base, encoding);
+        parse(input.characters16(), input.length(), base, nonUTF8QueryEncoding);
     }
 
     ASSERT(!m_url.m_isValid
@@ -1179,7 +1179,7 @@ URLParser::URLParser(const String& input, const URL& base, const TextEncoding& e
 #if !ASSERT_DISABLED
     if (!m_didSeeSyntaxViolation) {
         // Force a syntax violation at the beginning to make sure we get the same result.
-        URLParser parser(makeString(" ", input), base, encoding);
+        URLParser parser(makeString(" ", input), base, nonUTF8QueryEncoding);
         URL parsed = parser.result();
         if (parsed.isValid())
             ASSERT(allValuesEqual(parser.result(), m_url));
@@ -1188,13 +1188,12 @@ URLParser::URLParser(const String& input, const URL& base, const TextEncoding& e
 }
 
 template<typename CharacterType>
-void URLParser::parse(const CharacterType* input, const unsigned length, const URL& base, const TextEncoding& encoding)
+void URLParser::parse(const CharacterType* input, const unsigned length, const URL& base, const URLTextEncoding* nonUTF8QueryEncoding)
 {
-    URL_PARSER_LOG("Parsing URL <%s> base <%s> encoding <%s>", String(input, length).utf8().data(), base.string().utf8().data(), encoding.name());
+    URL_PARSER_LOG("Parsing URL <%s> base <%s>", String(input, length).utf8().data(), base.string().utf8().data());
     m_url = { };
     ASSERT(m_asciiBuffer.isEmpty());
-    
-    bool isUTF8Encoding = encoding == UTF8Encoding();
+
     Vector<UChar> queryBuffer;
 
     unsigned endIndex = length;
@@ -1287,7 +1286,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     break;
                 case Scheme::WS:
                 case Scheme::WSS:
-                    isUTF8Encoding = true;
+                    nonUTF8QueryEncoding = nullptr;
                     m_urlIsSpecial = true;
                     if (base.protocolIs(urlScheme))
                         state = State::SpecialRelativeOrAuthority;
@@ -1309,7 +1308,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     ++c;
                     break;
                 case Scheme::NonSpecial:
-                    isUTF8Encoding = true;
+                    nonUTF8QueryEncoding = nullptr;
                     auto maybeSlash = c;
                     advance(maybeSlash);
                     if (!maybeSlash.atEnd() && *maybeSlash == '/') {
@@ -1353,7 +1352,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 return;
             }
             if (base.m_cannotBeABaseURL && *c == '#') {
-                copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                 state = State::Fragment;
                 appendToASCIIBuffer('#');
                 ++c;
@@ -1363,7 +1362,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 state = State::Relative;
                 break;
             }
-            copyURLPartsUntil(base, URLPart::SchemeEnd, c, isUTF8Encoding);
+            copyURLPartsUntil(base, URLPart::SchemeEnd, c, nonUTF8QueryEncoding);
             appendToASCIIBuffer(':');
             state = State::File;
             break;
@@ -1413,24 +1412,23 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 ++c;
                 break;
             case '?':
-                copyURLPartsUntil(base, URLPart::PathEnd, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::PathEnd, c, nonUTF8QueryEncoding);
                 appendToASCIIBuffer('?');
                 ++c;
-                if (isUTF8Encoding)
-                    state = State::UTF8Query;
-                else {
+                if (nonUTF8QueryEncoding) {
                     queryBegin = c;
                     state = State::NonUTF8Query;
-                }
+                } else
+                    state = State::UTF8Query;
                 break;
             case '#':
-                copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                 appendToASCIIBuffer('#');
                 state = State::Fragment;
                 ++c;
                 break;
             default:
-                copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, nonUTF8QueryEncoding);
                 if (currentPosition(c) && parsedDataView(currentPosition(c) - 1) != '/') {
                     appendToASCIIBuffer('/');
                     m_url.m_pathAfterLastSlash = currentPosition(c);
@@ -1443,7 +1441,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             LOG_STATE("RelativeSlash");
             if (*c == '/' || *c == '\\') {
                 ++c;
-                copyURLPartsUntil(base, URLPart::SchemeEnd, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::SchemeEnd, c, nonUTF8QueryEncoding);
                 appendToASCIIBuffer("://", 3);
                 if (m_urlIsSpecial)
                     state = State::SpecialAuthorityIgnoreSlashes;
@@ -1453,7 +1451,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     authorityOrHostBegin = c;
                 }
             } else {
-                copyURLPartsUntil(base, URLPart::PortEnd, c, isUTF8Encoding);
+                copyURLPartsUntil(base, URLPart::PortEnd, c, nonUTF8QueryEncoding);
                 appendToASCIIBuffer('/');
                 m_url.m_pathAfterLastSlash = base.m_hostEnd + base.m_portLength + 1;
                 state = State::Path;
@@ -1584,7 +1582,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             case '?':
                 syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file")) {
-                    copyURLPartsUntil(base, URLPart::PathEnd, c, isUTF8Encoding);
+                    copyURLPartsUntil(base, URLPart::PathEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer('?');
                     ++c;
                 } else {
@@ -1598,17 +1596,16 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                     m_url.m_pathAfterLastSlash = m_url.m_userStart + 1;
                     m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
                 }
-                if (isUTF8Encoding)
-                    state = State::UTF8Query;
-                else {
+                if (nonUTF8QueryEncoding) {
                     queryBegin = c;
                     state = State::NonUTF8Query;
-                }
+                } else
+                    state = State::UTF8Query;
                 break;
             case '#':
                 syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file")) {
-                    copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
+                    copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer('#');
                 } else {
                     appendToASCIIBuffer("///#", 4);
@@ -1627,7 +1624,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
             default:
                 syntaxViolation(c);
                 if (base.isValid() && base.protocolIs("file") && shouldCopyFileURL(c))
-                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, isUTF8Encoding);
+                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, nonUTF8QueryEncoding);
                 else {
                     appendToASCIIBuffer("///", 3);
                     m_url.m_userStart = currentPosition(c) - 1;
@@ -1693,12 +1690,11 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                             syntaxViolation(c);
                             appendToASCIIBuffer("/?", 2);
                             ++c;
-                            if (isUTF8Encoding)
-                                state = State::UTF8Query;
-                            else {
+                            if (nonUTF8QueryEncoding) {
                                 queryBegin = c;
                                 state = State::NonUTF8Query;
-                            }
+                            } else
+                                state = State::UTF8Query;
                             m_url.m_pathAfterLastSlash = currentPosition(c) - 1;
                             m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
                             break;
@@ -1771,12 +1767,11 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 m_url.m_pathEnd = currentPosition(c);
                 appendToASCIIBuffer('?');
                 ++c;
-                if (isUTF8Encoding)
-                    state = State::UTF8Query;
-                else {
+                if (nonUTF8QueryEncoding) {
                     queryBegin = c;
                     state = State::NonUTF8Query;
-                }
+                } else
+                    state = State::UTF8Query;
                 break;
             }
             if (*c == '#') {
@@ -1794,12 +1789,11 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 m_url.m_pathEnd = currentPosition(c);
                 appendToASCIIBuffer('?');
                 ++c;
-                if (isUTF8Encoding)
-                    state = State::UTF8Query;
-                else {
+                if (nonUTF8QueryEncoding) {
                     queryBegin = c;
                     state = State::NonUTF8Query;
-                }
+                } else
+                    state = State::UTF8Query;
             } else if (*c == '#') {
                 m_url.m_pathEnd = currentPosition(c);
                 m_url.m_queryEnd = m_url.m_pathEnd;
@@ -1821,10 +1815,8 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 state = State::Fragment;
                 break;
             }
-            if (isUTF8Encoding)
-                utf8QueryEncode(c);
-            else
-                appendCodePoint(queryBuffer, *c);
+            ASSERT(!nonUTF8QueryEncoding);
+            utf8QueryEncode(c);
             ++c;
             break;
         case State::NonUTF8Query:
@@ -1832,7 +1824,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
                 LOG_STATE("NonUTF8Query");
                 ASSERT(queryBegin != CodePointIterator<CharacterType>());
                 if (*c == '#') {
-                    encodeQuery(queryBuffer, encoding, CodePointIterator<CharacterType>(queryBegin, c));
+                    encodeNonUTF8Query(queryBuffer, *nonUTF8QueryEncoding, CodePointIterator<CharacterType>(queryBegin, c));
                     m_url.m_queryEnd = currentPosition(c);
                     state = State::Fragment;
                     break;
@@ -1868,7 +1860,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         RELEASE_ASSERT_NOT_REACHED();
     case State::SpecialRelativeOrAuthority:
         LOG_FINAL_STATE("SpecialRelativeOrAuthority");
-        copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
+        copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
         break;
     case State::PathOrAuthority:
         LOG_FINAL_STATE("PathOrAuthority");
@@ -1889,7 +1881,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
         RELEASE_ASSERT_NOT_REACHED();
     case State::RelativeSlash:
         LOG_FINAL_STATE("RelativeSlash");
-        copyURLPartsUntil(base, URLPart::PortEnd, c, isUTF8Encoding);
+        copyURLPartsUntil(base, URLPart::PortEnd, c, nonUTF8QueryEncoding);
         appendToASCIIBuffer('/');
         m_url.m_pathAfterLastSlash = m_url.m_hostEnd + m_url.m_portLength + 1;
         m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
@@ -1952,7 +1944,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     case State::File:
         LOG_FINAL_STATE("File");
         if (base.isValid() && base.protocolIs("file")) {
-            copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
+            copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
             break;
         }
         syntaxViolation(c);
@@ -2047,7 +2039,7 @@ void URLParser::parse(const CharacterType* input, const unsigned length, const U
     case State::NonUTF8Query:
         LOG_FINAL_STATE("NonUTF8Query");
         ASSERT(queryBegin != CodePointIterator<CharacterType>());
-        encodeQuery(queryBuffer, encoding, CodePointIterator<CharacterType>(queryBegin, c));
+        encodeNonUTF8Query(queryBuffer, *nonUTF8QueryEncoding, CodePointIterator<CharacterType>(queryBegin, c));
         m_url.m_queryEnd = currentPosition(c);
         break;
     case State::Fragment:
index 5534de2..f65beb0 100644
@@ -25,7 +25,6 @@
 
 #pragma once
 
-#include "TextEncoding.h"
 #include "URL.h"
 #include <wtf/Expected.h>
 #include <wtf/Forward.h>
@@ -38,7 +37,7 @@ template<typename CharacterType> class CodePointIterator;
 
 class URLParser {
 public:
-    WEBCORE_EXPORT URLParser(const String&, const URL& = { }, const TextEncoding& = UTF8Encoding());
+    WEBCORE_EXPORT URLParser(const String&, const URL& = { }, const URLTextEncoding* = nullptr);
     URL result() { return m_url; }
 
     WEBCORE_EXPORT static bool allValuesEqual(const URL&, const URL&);
@@ -70,7 +69,7 @@ private:
     static constexpr size_t defaultInlineBufferSize = 2048;
     using LCharBuffer = Vector<LChar, defaultInlineBufferSize>;
 
-    template<typename CharacterType> void parse(const CharacterType*, const unsigned length, const URL&, const TextEncoding&);
+    template<typename CharacterType> void parse(const CharacterType*, const unsigned length, const URL&, const URLTextEncoding*);
     template<typename CharacterType> void parseAuthority(CodePointIterator<CharacterType>);
     template<typename CharacterType> bool parseHostAndPort(CodePointIterator<CharacterType>);
     template<typename CharacterType> bool parsePort(CodePointIterator<CharacterType>&);
@@ -107,7 +106,7 @@ private:
     void appendToASCIIBuffer(UChar32);
     void appendToASCIIBuffer(const char*, size_t);
     void appendToASCIIBuffer(const LChar* characters, size_t size) { appendToASCIIBuffer(reinterpret_cast<const char*>(characters), size); }
-    template<typename CharacterType> void encodeQuery(const Vector<UChar>& source, const TextEncoding&, CodePointIterator<CharacterType>);
+    template<typename CharacterType> void encodeNonUTF8Query(const Vector<UChar>& source, const URLTextEncoding&, CodePointIterator<CharacterType>);
     void copyASCIIStringUntil(const String&, size_t length);
     bool copyBaseWindowsDriveLetter(const URL&);
     StringView parsedDataView(size_t start, size_t length);
@@ -127,7 +126,7 @@ private:
     void serializeIPv6(IPv6Address);
 
     enum class URLPart;
-    template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&, bool& isUTF8Encoding);
+    template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&, const URLTextEncoding*&);
     static size_t urlLengthUntilPart(const URL&, URLPart);
     void popPath();
     bool shouldPopPath(unsigned);
index 405f933..a009616 100644 (file)
@@ -31,6 +31,7 @@
 #include "TextCodec.h"
 #include "TextEncodingRegistry.h"
 #include <unicode/unorm.h>
+#include <wtf/NeverDestroyed.h>
 #include <wtf/StdLibExtras.h>
 #include <wtf/text/CString.h>
 #include <wtf/text/StringView.h>
@@ -39,7 +40,7 @@ namespace WebCore {
 
 static const TextEncoding& UTF7Encoding()
 {
-    static TextEncoding globalUTF7Encoding("UTF-7");
+    static NeverDestroyed<TextEncoding> globalUTF7Encoding("UTF-7");
     return globalUTF7Encoding;
 }
 
@@ -173,7 +174,7 @@ const TextEncoding& TextEncoding::closestByteBasedEquivalent() const
 // byte-based encoding and can contain 0x00. By extension, the same
 // should be done for UTF-32. In case of UTF-7, it is a byte-based encoding,
 // but it's fraught with problems and we'd rather steer clear of it.
-const TextEncoding& TextEncoding::encodingForFormSubmission() const
+const TextEncoding& TextEncoding::encodingForFormSubmissionOrURLParsing() const
 {
     if (isNonByteBasedEncoding() || isUTF7Encoding())
         return UTF8Encoding();
@@ -182,38 +183,38 @@ const TextEncoding& TextEncoding::encodingForFormSubmission() const
 
 const TextEncoding& ASCIIEncoding()
 {
-    static TextEncoding globalASCIIEncoding("ASCII");
+    static NeverDestroyed<TextEncoding> globalASCIIEncoding("ASCII");
     return globalASCIIEncoding;
 }
 
 const TextEncoding& Latin1Encoding()
 {
-    static TextEncoding globalLatin1Encoding("latin1");
+    static NeverDestroyed<TextEncoding> globalLatin1Encoding("latin1");
     return globalLatin1Encoding;
 }
 
 const TextEncoding& UTF16BigEndianEncoding()
 {
-    static TextEncoding globalUTF16BigEndianEncoding("UTF-16BE");
+    static NeverDestroyed<TextEncoding> globalUTF16BigEndianEncoding("UTF-16BE");
     return globalUTF16BigEndianEncoding;
 }
 
 const TextEncoding& UTF16LittleEndianEncoding()
 {
-    static TextEncoding globalUTF16LittleEndianEncoding("UTF-16LE");
+    static NeverDestroyed<TextEncoding> globalUTF16LittleEndianEncoding("UTF-16LE");
     return globalUTF16LittleEndianEncoding;
 }
 
 const TextEncoding& UTF8Encoding()
 {
-    static TextEncoding globalUTF8Encoding("UTF-8");
-    ASSERT(globalUTF8Encoding.isValid());
+    static NeverDestroyed<TextEncoding> globalUTF8Encoding("UTF-8");
+    ASSERT(globalUTF8Encoding.get().isValid());
     return globalUTF8Encoding;
 }
 
 const TextEncoding& WindowsLatin1Encoding()
 {
-    static TextEncoding globalWindowsLatin1Encoding("WinLatin-1");
+    static NeverDestroyed<TextEncoding> globalWindowsLatin1Encoding("WinLatin-1");
     return globalWindowsLatin1Encoding;
 }
 
index 64654d8..ffd35fc 100644 (file)
 
 #pragma once
 
+#include "URL.h"
 #include <pal/text/UnencodableHandling.h>
 #include <wtf/text/WTFString.h>
 
 namespace WebCore {
 
-class TextEncoding {
+class TextEncoding : public URLTextEncoding {
 public:
     TextEncoding() = default;
     WEBCORE_EXPORT TextEncoding(const char* name);
@@ -43,11 +44,12 @@ public:
     bool isJapanese() const;
 
     const TextEncoding& closestByteBasedEquivalent() const;
-    const TextEncoding& encodingForFormSubmission() const;
+    const TextEncoding& encodingForFormSubmissionOrURLParsing() const;
 
     WEBCORE_EXPORT String decode(const char*, size_t length, bool stopOnError, bool& sawError) const;
     String decode(const char*, size_t length) const;
-    Vector<uint8_t> encode(StringView, UnencodableHandling) const;
+    WEBCORE_EXPORT Vector<uint8_t> encode(StringView, UnencodableHandling) const;
+    Vector<uint8_t> encodeForURLParsing(StringView string) const final { return encode(string, UnencodableHandling::URLEncodedEntities); }
 
     UChar backslashAsCurrencySymbol() const;
     bool isByteBasedEncoding() const { return !isNonByteBasedEncoding(); }
index 74c6925..f6ef1ec 100644 (file)
@@ -1,3 +1,14 @@
+2018-09-27  Alex Christensen  <achristensen@webkit.org>
+
+        URLParser should use TextEncoding through an abstract class
+        https://bugs.webkit.org/show_bug.cgi?id=190027
+
+        Reviewed by Andy Estes.
+
+        * TestWebKitAPI/Tests/WebCore/URLParser.cpp:
+        (TestWebKitAPI::checkURL):
+        (TestWebKitAPI::TEST_F):
+
 2018-09-27  Ryan Haddad  <ryanhaddad@apple.com>
 
         iOS Simulator bots should pass '--dedicated-simulators' to run-webkit-tests
index 4485ec4..56cb7a7 100644 (file)
@@ -25,6 +25,7 @@
 
 #include "config.h"
 #include "WTFStringUtilities.h"
+#include <WebCore/TextEncoding.h>
 #include <WebCore/URLParser.h>
 #include <wtf/MainThread.h>
 #include <wtf/text/StringBuilder.h>
@@ -210,7 +211,7 @@ static void shouldFail(const String& urlString, const String& baseString)
     checkRelativeURL(urlString, baseString, {"", "", "", "", 0, "", "", "", urlString});
 }
 
-static void checkURL(const String& urlString, const TextEncoding& encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
+static void checkURL(const String& urlString, const TextEncoding* encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
 {
     URLParser parser(urlString, { }, encoding);
     auto url = parser.result();
@@ -235,7 +236,7 @@ static void checkURL(const String& urlString, const TextEncoding& encoding, cons
     }
 }
 
-static void checkURL(const String& urlString, const String& baseURLString, const TextEncoding& encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
+static void checkURL(const String& urlString, const String& baseURLString, const TextEncoding* encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
 {
     URLParser baseParser(baseURLString, { }, encoding);
     URLParser parser(urlString, baseParser.result(), encoding);
@@ -1285,37 +1286,37 @@ TEST_F(URLParserTest, AdditionalTests)
 
 TEST_F(URLParserTest, QueryEncoding)
 {
-    checkURL(utf16String(u"http://host?ß😍#ß😍"), UTF8Encoding(), {"http", "", "", "host", 0, "/", "%C3%9F%F0%9F%98%8D", "%C3%9F%F0%9F%98%8D", utf16String(u"http://host/?%C3%9F%F0%9F%98%8D#%C3%9F%F0%9F%98%8D")}, testTabsValueForSurrogatePairs);
+    checkURL(utf16String(u"http://host?ß😍#ß😍"), nullptr, {"http", "", "", "host", 0, "/", "%C3%9F%F0%9F%98%8D", "%C3%9F%F0%9F%98%8D", utf16String(u"http://host/?%C3%9F%F0%9F%98%8D#%C3%9F%F0%9F%98%8D")}, testTabsValueForSurrogatePairs);
 
     TextEncoding latin1(String("latin1"));
-    checkURL("http://host/?query with%20spaces", latin1, {"http", "", "", "host", 0, "/", "query%20with%20spaces", "", "http://host/?query%20with%20spaces"});
-    checkURL("http://host/?query", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
-    checkURL("http://host/?\tquery", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
-    checkURL("http://host/?q\tuery", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
-    checkURL("http://host/?query with SpAcEs#fragment", latin1, {"http", "", "", "host", 0, "/", "query%20with%20SpAcEs", "fragment", "http://host/?query%20with%20SpAcEs#fragment"});
-    checkURL("http://host/?que\rry\t\r\n#fragment", latin1, {"http", "", "", "host", 0, "/", "query", "fragment", "http://host/?query#fragment"});
+    checkURL("http://host/?query with%20spaces", &latin1, {"http", "", "", "host", 0, "/", "query%20with%20spaces", "", "http://host/?query%20with%20spaces"});
+    checkURL("http://host/?query", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
+    checkURL("http://host/?\tquery", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
+    checkURL("http://host/?q\tuery", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
+    checkURL("http://host/?query with SpAcEs#fragment", &latin1, {"http", "", "", "host", 0, "/", "query%20with%20SpAcEs", "fragment", "http://host/?query%20with%20SpAcEs#fragment"});
+    checkURL("http://host/?que\rry\t\r\n#fragment", &latin1, {"http", "", "", "host", 0, "/", "query", "fragment", "http://host/?query#fragment"});
 
     TextEncoding unrecognized(String("unrecognized invalid encoding name"));
-    checkURL("http://host/?query", unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
-    checkURL("http://host/?", unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
+    checkURL("http://host/?query", &unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
+    checkURL("http://host/?", &unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
 
     TextEncoding iso88591(String("ISO-8859-1"));
     String withUmlauts = utf16String<4>({0xDC, 0x430, 0x451, '\0'});
-    checkURL(makeString("ws://host/path?", withUmlauts), iso88591, {"ws", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "ws://host/path?%C3%9C%D0%B0%D1%91"});
-    checkURL(makeString("wss://host/path?", withUmlauts), iso88591, {"wss", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "wss://host/path?%C3%9C%D0%B0%D1%91"});
-    checkURL(makeString("asdf://host/path?", withUmlauts), iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "asdf://host/path?%C3%9C%D0%B0%D1%91"});
-    checkURL(makeString("https://host/path?", withUmlauts), iso88591, {"https", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "https://host/path?%DC%26%231072%3B%26%231105%3B"});
-    checkURL(makeString("gopher://host/path?", withUmlauts), iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "gopher://host/path?%DC%26%231072%3B%26%231105%3B"});
-    checkURL(makeString("/path?", withUmlauts, "#fragment"), "ws://example.com/", iso88591, {"ws", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "ws://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
-    checkURL(makeString("/path?", withUmlauts, "#fragment"), "wss://example.com/", iso88591, {"wss", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "wss://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
-    checkURL(makeString("/path?", withUmlauts, "#fragment"), "asdf://example.com/", iso88591, {"asdf", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
-    checkURL(makeString("/path?", withUmlauts, "#fragment"), "https://example.com/", iso88591, {"https", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "https://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
-    checkURL(makeString("/path?", withUmlauts, "#fragment"), "gopher://example.com/", iso88591, {"gopher", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
-    checkURL(makeString("gopher://host/path?", withUmlauts, "#fragment"), "asdf://example.com/?doesntmatter", iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://host/path?%DC%26%231072%3B%26%231105%3B#fragment"});
-    checkURL(makeString("asdf://host/path?", withUmlauts, "#fragment"), "http://example.com/?doesntmatter", iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://host/path?%C3%9C%D0%B0%D1%91#fragment"});
+    checkURL(makeString("ws://host/path?", withUmlauts), &iso88591, {"ws", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "ws://host/path?%C3%9C%D0%B0%D1%91"});
+    checkURL(makeString("wss://host/path?", withUmlauts), &iso88591, {"wss", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "wss://host/path?%C3%9C%D0%B0%D1%91"});
+    checkURL(makeString("asdf://host/path?", withUmlauts), &iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "asdf://host/path?%C3%9C%D0%B0%D1%91"});
+    checkURL(makeString("https://host/path?", withUmlauts), &iso88591, {"https", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "https://host/path?%DC%26%231072%3B%26%231105%3B"});
+    checkURL(makeString("gopher://host/path?", withUmlauts), &iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "gopher://host/path?%DC%26%231072%3B%26%231105%3B"});
+    checkURL(makeString("/path?", withUmlauts, "#fragment"), "ws://example.com/", &iso88591, {"ws", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "ws://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
+    checkURL(makeString("/path?", withUmlauts, "#fragment"), "wss://example.com/", &iso88591, {"wss", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "wss://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
+    checkURL(makeString("/path?", withUmlauts, "#fragment"), "asdf://example.com/", &iso88591, {"asdf", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
+    checkURL(makeString("/path?", withUmlauts, "#fragment"), "https://example.com/", &iso88591, {"https", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "https://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
+    checkURL(makeString("/path?", withUmlauts, "#fragment"), "gopher://example.com/", &iso88591, {"gopher", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
+    checkURL(makeString("gopher://host/path?", withUmlauts, "#fragment"), "asdf://example.com/?doesntmatter", &iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://host/path?%DC%26%231072%3B%26%231105%3B#fragment"});
+    checkURL(makeString("asdf://host/path?", withUmlauts, "#fragment"), "http://example.com/?doesntmatter", &iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://host/path?%C3%9C%D0%B0%D1%91#fragment"});
 
-    checkURL("http://host/pa'th?qu'ery#fr'agment", UTF8Encoding(), {"http", "", "", "host", 0, "/pa'th", "qu%27ery", "fr'agment", "http://host/pa'th?qu%27ery#fr'agment"});
-    checkURL("asdf://host/pa'th?qu'ery#fr'agment", UTF8Encoding(), {"asdf", "", "", "host", 0, "/pa'th", "qu'ery", "fr'agment", "asdf://host/pa'th?qu'ery#fr'agment"});
+    checkURL("http://host/pa'th?qu'ery#fr'agment", nullptr, {"http", "", "", "host", 0, "/pa'th", "qu%27ery", "fr'agment", "http://host/pa'th?qu%27ery#fr'agment"});
+    checkURL("asdf://host/pa'th?qu'ery#fr'agment", nullptr, {"asdf", "", "", "host", 0, "/pa'th", "qu'ery", "fr'agment", "asdf://host/pa'th?qu'ery#fr'agment"});
     // FIXME: Add more tests with other encodings and things like non-ascii characters, emoji and unmatched surrogate pairs.
 }
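
A note on the pattern exercised by the QueryEncoding test above: the parser takes a pointer to an abstract encoder, and a null pointer selects the built-in UTF-8 path. The following is a minimal standalone sketch of that idea, not the WebKit implementation. QueryEncoder, Latin1QueryEncoder, appendUTF8 and percentEncodeQuery are hypothetical names invented for illustration, and the sketch assumes a simplified rule that percent-encodes every byte except ASCII letters and digits.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an abstract interface such as URLTextEncoding:
// the parser only needs a single virtual call.
class QueryEncoder {
public:
    virtual ~QueryEncoder() = default;
    virtual std::vector<uint8_t> encodeForURLParsing(const std::u32string&) const = 0;
};

// Hypothetical Latin-1 encoder. Code points above 0xFF are written as
// decimal numeric entities ("&#1072;"), an assumption made for this sketch.
class Latin1QueryEncoder final : public QueryEncoder {
public:
    std::vector<uint8_t> encodeForURLParsing(const std::u32string& text) const final
    {
        std::vector<uint8_t> bytes;
        for (char32_t c : text) {
            if (c <= 0xFF) {
                bytes.push_back(static_cast<uint8_t>(c));
                continue;
            }
            std::string entity = "&#" + std::to_string(static_cast<uint32_t>(c)) + ";";
            bytes.insert(bytes.end(), entity.begin(), entity.end());
        }
        return bytes;
    }
};

// Built-in UTF-8 path, used when no encoder is supplied.
static void appendUTF8(std::vector<uint8_t>& out, char32_t c)
{
    if (c < 0x80) {
        out.push_back(static_cast<uint8_t>(c));
    } else if (c < 0x800) {
        out.push_back(static_cast<uint8_t>(0xC0 | (c >> 6)));
        out.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
        out.push_back(static_cast<uint8_t>(0xE0 | (c >> 12)));
        out.push_back(static_cast<uint8_t>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
    } else {
        out.push_back(static_cast<uint8_t>(0xF0 | (c >> 18)));
        out.push_back(static_cast<uint8_t>(0x80 | ((c >> 12) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
    }
}

// A null encoder means "UTF-8": no virtual call is made on that path.
std::string percentEncodeQuery(const std::u32string& query, const QueryEncoder* encoder = nullptr)
{
    std::vector<uint8_t> bytes;
    if (encoder)
        bytes = encoder->encodeForURLParsing(query);
    else {
        for (char32_t c : query)
            appendUTF8(bytes, c);
    }

    static const char hex[] = "0123456789ABCDEF";
    std::string result;
    for (uint8_t b : bytes) {
        bool isAlphanumeric = (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
        if (isAlphanumeric)
            result.push_back(static_cast<char>(b));
        else {
            // Simplification: everything else is percent-encoded.
            result.push_back('%');
            result.push_back(hex[b >> 4]);
            result.push_back(hex[b & 0xF]);
        }
    }
    return result;
}
```

Because the parser side sees only the abstract class, the concrete encoder and its dependencies can live in a higher-level component, while callers that want UTF-8 simply pass nothing.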