2011-02-09 Eric Seidel <eric@webkit.org>
author eric@webkit.org <eric@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
 Wed, 9 Feb 2011 11:53:26 +0000 (11:53 +0000)
committer eric@webkit.org <eric@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
 Wed, 9 Feb 2011 11:53:26 +0000 (11:53 +0000)
        Reviewed by Adam Barth.

        Make WebKit's fragment canonicalization match other browsers
        https://bugs.webkit.org/show_bug.cgi?id=53850

        * fast/dom/HTMLAnchorElement/set-href-attribute-hash.html: Updated to match IE/Chrome.
        * fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt: Updated to match IE/Chrome.
        * fast/url/anchor-expected.txt:
         - "hello, world": Our new behavior matches IE and Chrome, but diverges from FF.
         - In the last two tests involving #, we were the odd man out. We now match all browsers.
        * fast/url/segments-expected.txt:
         - Don't percent-encode spaces in fragments (to match other browsers).
         - WebKit was the only engine encoding # in fragments; it no longer does.
        * fast/url/segments-from-data-url-expected.txt:
2011-02-09  Eric Seidel  <eric@webkit.org>

        Reviewed by Adam Barth.

        Make WebKit's fragment canonicalization match other browsers
        https://bugs.webkit.org/show_bug.cgi?id=53850

        This doesn't make us match perfectly, but it brings us closer.

        * platform/KURL.cpp:
        (WebCore::appendEscapingBadChars):
        (WebCore::escapeAndAppendFragment):
        (WebCore::KURL::parse):

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@78040 268f45cc-cd09-0410-ab3c-d52691b4dbfc
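
The fragment rules this change introduces can be sketched outside of WebCore. The block below is a minimal illustration, not the code in KURL.cpp: escapeFragment is a hypothetical name, it returns a std::string instead of writing into a caller-sized char buffer, and the uppercase hex table is an assumption based on the "%C2%A9" output in the test expectations. It strips Tab, LF and CR, percent-encodes other control characters and non-ASCII bytes, and copies everything else (including space and '#') unchanged.

```cpp
#include <string>

// Hypothetical standalone analogue of the fragment escaping in this patch.
// Assumption: uppercase hex digits, as the expected test output suggests.
static std::string escapeFragment(const std::string& fragment)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string result;
    for (unsigned char c : fragment) {
        // Tab, LF and CR are stripped rather than escaped.
        if (c == 0x09 || c == 0x0a || c == 0x0d)
            continue;
        // Other control characters and non-ASCII bytes are percent-encoded.
        if (c < 0x20 || c >= 127) {
            result += '%';
            result += hexDigits[c >> 4];
            result += hexDigits[c & 0xF];
            continue;
        }
        // Printable ASCII, including space and '#', is copied unchanged.
        result += static_cast<char>(c);
    }
    return result;
}
```

With this sketch, "hello, world" and "asdf#qwer" come back unchanged, "a\nb\rc\td" becomes "abcd", and the UTF-8 bytes of © become "%C2%A9", which corresponds to the expectations in fast/url/anchor-expected.txt.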

LayoutTests/ChangeLog
LayoutTests/fast/dom/HTMLAnchorElement/script-tests/set-href-attribute-hash.js
LayoutTests/fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt
LayoutTests/fast/dom/anchor-getParameter-expected.txt
LayoutTests/fast/url/anchor-expected.txt
LayoutTests/fast/url/script-tests/anchor.js
LayoutTests/fast/url/segments-expected.txt
LayoutTests/fast/url/segments-from-data-url-expected.txt
Source/WebCore/ChangeLog
Source/WebCore/platform/KURL.cpp

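diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog
index b0122c3..4024f87 100644
--- a/LayoutTests/ChangeLog
+++ b/LayoutTests/ChangeLog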
@@ -1,3 +1,20 @@
+2011-02-09  Eric Seidel  <eric@webkit.org>
+
+        Reviewed by Adam Barth.
+
+        Make WebKit's fragment cannonicalization match other browsers
+        https://bugs.webkit.org/show_bug.cgi?id=53850
+
+        * fast/dom/HTMLAnchorElement/set-href-attribute-hash.html: Updated to match IE/Chrome
+        * fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt: Updated to match IE/Chrome.
+        * fast/url/anchor-expected.txt:
+         - "hello world": Our new behavior here matches IE and Chrome, but diverges from FF.
+         - The last two tests involving #, we were the odd man out. Now match all browsers.
+        * fast/url/segments-expected.txt:
+         - Don't percent encode spaces in fragments (to match other browsers)
+         - WebKit was the only engine encoding # in fragments.
+        * fast/url/segments-from-data-url-expected.txt:
+
 2011-02-09  Csaba Osztrogonác  <ossy@webkit.org>
 
         Unreviewed.
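diff --git a/LayoutTests/fast/dom/HTMLAnchorElement/script-tests/set-href-attribute-hash.js b/LayoutTests/fast/dom/HTMLAnchorElement/script-tests/set-href-attribute-hash.js
index 5cce834..c28dceb 100644
--- a/LayoutTests/fast/dom/HTMLAnchorElement/script-tests/set-href-attribute-hash.js
+++ b/LayoutTests/fast/dom/HTMLAnchorElement/script-tests/set-href-attribute-hash.js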
@@ -50,7 +50,7 @@ shouldBe("a.href", "'mailto:e-mail_address@goes_here#hash-value'");
 debug("Add hash to file: protocol");
 a.href = "file:///some path";
 a.hash = "hash value";
-shouldBe("a.href", "'file:///some%20path#hash%20value'");
+shouldBe("a.href", "'file:///some%20path#hash value'");
 
 debug("Set hash to '#'");
 a.href = "http://mydomain.com#middle";
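diff --git a/LayoutTests/fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt b/LayoutTests/fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt
index 6515820..a757f84 100644
--- a/LayoutTests/fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt
+++ b/LayoutTests/fast/dom/HTMLAnchorElement/set-href-attribute-hash-expected.txt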
@@ -18,7 +18,7 @@ PASS a.href is 'https://www.mydomain.com/path/testurl.html#'
 Add hash to mailto: protocol
 PASS a.href is 'mailto:e-mail_address@goes_here#hash-value'
 Add hash to file: protocol
-PASS a.href is 'file:///some%20path#hash%20value'
+PASS a.href is 'file:///some%20path#hash value'
 Set hash to '#'
 PASS a.href is 'http://mydomain.com/#'
 Add hash to non-standard protocol
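diff --git a/LayoutTests/fast/dom/anchor-getParameter-expected.txt b/LayoutTests/fast/dom/anchor-getParameter-expected.txt
index 6f0253f..2d8f5d7 100644
--- a/LayoutTests/fast/dom/anchor-getParameter-expected.txt
+++ b/LayoutTests/fast/dom/anchor-getParameter-expected.txt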
@@ -16,5 +16,5 @@ http://example.com/foo/bar?a=b=& (b) =>
 http://example.com/foo/bar?&a=b (a) => b
 http://example.com/foo/bar?&a=b& (a) => b
 http://example.com/foo/bar?a=b#xyz (a) => b
-http://example.com/foo/bar?#a=b%23xyz (a) => 
+http://example.com/foo/bar?#a=b#xyz (a) => 
 
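diff --git a/LayoutTests/fast/url/anchor-expected.txt b/LayoutTests/fast/url/anchor-expected.txt
index 15dc84f..c7c8b1d 100644
--- a/LayoutTests/fast/url/anchor-expected.txt
+++ b/LayoutTests/fast/url/anchor-expected.txt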
@@ -3,14 +3,15 @@ Test URLs that have an anchor.
 On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
 
 
-FAIL canonicalize('http://www.example.com/#hello, world') should be http://www.example.com/#hello, world. Was http://www.example.com/#hello,%20world.
+PASS canonicalize('http://www.example.com/#hello, world') is 'http://www.example.com/#hello, world'
 FAIL canonicalize('http://www.example.com/#©') should be http://www.example.com/#©. Was http://www.example.com/#%C2%A9.
 FAIL canonicalize('http://www.example.com/#𐌀ss') should be http://www.example.com/#𐌀ss. Was http://www.example.com/#%26%2366304%3Bss.
 PASS canonicalize('http://www.example.com/#%41%a') is 'http://www.example.com/#%41%a'
 FAIL canonicalize('http://www.example.com/#\ud800\u597d') should be http://www.example.com/#�好. Was http://www.example.com/#%26%2355296%3B%26%2322909%3B.
 FAIL canonicalize('http://www.example.com/#a\uFDD0') should be http://www.example.com/#a﷐. Was http://www.example.com/#a%26%2364976%3B.
-FAIL canonicalize('http://www.example.com/#asdf#qwer') should be http://www.example.com/#asdf#qwer. Was http://www.example.com/#asdf%23qwer.
-FAIL canonicalize('http://www.example.com/##asdf') should be http://www.example.com/##asdf. Was http://www.example.com/#%23asdf.
+PASS canonicalize('http://www.example.com/#asdf#qwer') is 'http://www.example.com/#asdf#qwer'
+PASS canonicalize('http://www.example.com/##asdf') is 'http://www.example.com/##asdf'
+PASS canonicalize('http://www.example.com/#a\nb\rc\td') is 'http://www.example.com/#abcd'
 PASS successfullyParsed is true
 
 TEST COMPLETE
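diff --git a/LayoutTests/fast/url/script-tests/anchor.js b/LayoutTests/fast/url/script-tests/anchor.js
index 0387a86..8791b74 100644
--- a/LayoutTests/fast/url/script-tests/anchor.js
+++ b/LayoutTests/fast/url/script-tests/anchor.js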
@@ -9,6 +9,7 @@ cases = [
   ["a\\uFDD0", "a\\uFDD0"],
   ["asdf#qwer", "asdf#qwer"],
   ["#asdf", "#asdf"],
+  ["a\\nb\\rc\\td", "abcd"],
 ];
 
 for (var i = 0; i < cases.length; ++i) {
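diff --git a/LayoutTests/fast/url/segments-expected.txt b/LayoutTests/fast/url/segments-expected.txt
index 2da6912..f53fbb1 100644
--- a/LayoutTests/fast/url/segments-expected.txt
+++ b/LayoutTests/fast/url/segments-expected.txt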
@@ -8,7 +8,7 @@ PASS segments('http:foo.com') is '["http:","example.org","0","/foo/foo.com","","
 PASS segments('\t   :foo.com   \n') is '["http:","example.org","0","/foo/:foo.com","",""]'
 PASS segments(' foo.com  ') is '["http:","example.org","0","/foo/foo.com","",""]'
 PASS segments('a:\t foo.com') is '["a:","","0"," foo.com","",""]'
-FAIL segments('http://f:21/ b ? d # e ') should be ["http:","f","21","/%20b%20","?%20d%20","# e"]. Was ["http:","f","21","/ b ","?%20d%20","#%20e"].
+FAIL segments('http://f:21/ b ? d # e ') should be ["http:","f","21","/%20b%20","?%20d%20","# e"]. Was ["http:","f","21","/ b ","?%20d%20","# e"].
 PASS segments('http://f:/c') is '["http:","f","0","/c","",""]'
 PASS segments('http://f:0/c') is '["http:","f","0","/c","",""]'
 PASS segments('http://f:00000000000000/c') is '["http:","f","0","/c","",""]'
@@ -55,7 +55,7 @@ PASS segments('foo://///////bar.com/') is '["foo:","","0","/////////bar.com/",""
 PASS segments('foo:////://///') is '["foo:","","0","////://///","",""]'
 PASS segments('c:/foo') is '["c:","","0","/foo","",""]'
 PASS segments('//foo/bar') is '["http:","foo","0","/bar","",""]'
-FAIL segments('http://foo/path;a??e#f#g') should be ["http:","foo","0","/path;a","??e","#f#g"]. Was ["http:","foo","0","/path;a","??e","#f%23g"].
+PASS segments('http://foo/path;a??e#f#g') is '["http:","foo","0","/path;a","??e","#f#g"]'
 PASS segments('http://foo/abcd?efgh?ijkl') is '["http:","foo","0","/abcd","?efgh?ijkl",""]'
 PASS segments('http://foo/abcd#foo?bar') is '["http:","foo","0","/abcd","","#foo?bar"]'
 PASS segments('[61:24:74]:98') is '["http:","example.org","0","/foo/[61:24:74]:98","",""]'
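diff --git a/LayoutTests/fast/url/segments-from-data-url-expected.txt b/LayoutTests/fast/url/segments-from-data-url-expected.txt
index 6e1853e..6ae93b8 100644
--- a/LayoutTests/fast/url/segments-from-data-url-expected.txt
+++ b/LayoutTests/fast/url/segments-from-data-url-expected.txt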
@@ -8,7 +8,7 @@ FAIL segments('http:foo.com') should be ["http:","foo.com","0","/","",""]. Was [
 PASS segments('\t   :foo.com   \n') is '[":","","0","","",""]'
 PASS segments(' foo.com  ') is '[":","","0","","",""]'
 PASS segments('a:\t foo.com') is '["a:","","0"," foo.com","",""]'
-FAIL segments('http://f:21/ b ? d # e ') should be ["http:","f","21","/%20b%20","?%20d%20","# e"]. Was ["http:","f","21","/ b ","?%20d%20","#%20e"].
+FAIL segments('http://f:21/ b ? d # e ') should be ["http:","f","21","/%20b%20","?%20d%20","# e"]. Was ["http:","f","21","/ b ","?%20d%20","# e"].
 PASS segments('http://f:/c') is '["http:","f","0","/c","",""]'
 PASS segments('http://f:0/c') is '["http:","f","0","/c","",""]'
 PASS segments('http://f:00000000000000/c') is '["http:","f","0","/c","",""]'
@@ -55,7 +55,7 @@ PASS segments('foo://///////bar.com/') is '["foo:","","0","/////////bar.com/",""
 PASS segments('foo:////://///') is '["foo:","","0","////://///","",""]'
 PASS segments('c:/foo') is '["c:","","0","/foo","",""]'
 PASS segments('//foo/bar') is '[":","","0","","",""]'
-FAIL segments('http://foo/path;a??e#f#g') should be ["http:","foo","0","/path;a","??e","#f#g"]. Was ["http:","foo","0","/path;a","??e","#f%23g"].
+PASS segments('http://foo/path;a??e#f#g') is '["http:","foo","0","/path;a","??e","#f#g"]'
 PASS segments('http://foo/abcd?efgh?ijkl') is '["http:","foo","0","/abcd","?efgh?ijkl",""]'
 PASS segments('http://foo/abcd#foo?bar') is '["http:","foo","0","/abcd","","#foo?bar"]'
 FAIL segments('[61:24:74]:98') should be ["data:","","0","text/[61:24:74]:98","",""]. Was [":","","0","","",""].
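diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index e901267..cc5791c 100644
--- a/Source/WebCore/ChangeLog
+++ b/Source/WebCore/ChangeLog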
@@ -1,3 +1,17 @@
+2011-02-09  Eric Seidel  <eric@webkit.org>
+
+        Reviewed by Adam Barth.
+
+        Make WebKit's fragment cannonicalization match other browsers
+        https://bugs.webkit.org/show_bug.cgi?id=53850
+
+        This doesn't make us match perfectly, but it brings us closer.
+
+        * platform/KURL.cpp:
+        (WebCore::appendEscapingBadChars):
+        (WebCore::escapeAndAppendFragment):
+        (WebCore::KURL::parse):
+
 2011-02-09  Hans Wennborg  <hans@chromium.org>
 
         Reviewed by Jeremy Orlow.
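diff --git a/Source/WebCore/platform/KURL.cpp b/Source/WebCore/platform/KURL.cpp
index 35a48e9..e2ae14c 100644
--- a/Source/WebCore/platform/KURL.cpp
+++ b/Source/WebCore/platform/KURL.cpp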
@@ -968,6 +968,14 @@ bool KURL::isLocalFile() const
     return protocolIs("file");
 }
 
+// Caution: This function does not bounds check.
+static void appendEscapedChar(char*& buffer, unsigned char c)
+{
+    *buffer++ = '%';
+    *buffer++ = hexDigits[c >> 4];
+    *buffer++ = hexDigits[c & 0xF];
+}
+
 static void appendEscapingBadChars(char*& buffer, const char* strStart, size_t length)
 {
     char* p = buffer;
@@ -977,16 +985,37 @@ static void appendEscapingBadChars(char*& buffer, const char* strStart, size_t l
     while (str < strEnd) {
         unsigned char c = *str++;
         if (isBadChar(c)) {
-            if (c == '%' || c == '?') {
+            if (c == '%' || c == '?')
                 *p++ = c;
-            } else if (c != 0x09 && c != 0x0a && c != 0x0d) {
-                *p++ = '%';
-                *p++ = hexDigits[c >> 4];
-                *p++ = hexDigits[c & 0xF];
-            }
-        } else {
+            else if (c != 0x09 && c != 0x0a && c != 0x0d)
+                appendEscapedChar(p, c);
+        } else
             *p++ = c;
+    }
+
+    buffer = p;
+}
+
+static void escapeAndAppendFragment(char*& buffer, const char* strStart, size_t length)
+{
+    char* p = buffer;
+
+    const char* str = strStart;
+    const char* strEnd = strStart + length;
+    while (str < strEnd) {
+        unsigned char c = *str++;
+        // Strip CR, LF and Tab from fragments, per:
+        // https://bugs.webkit.org/show_bug.cgi?id=8770
+        if (c == 0x09 || c == 0x0a || c == 0x0d)
+            continue;
+
+        // Chrome and IE allow non-ascii characters in fragments, however doing
+        // so would hit an ASSERT in checkEncodedString, so for now we don't.
+        if (c < 0x20 || c >= 127) {
+            appendEscapedChar(p, c);
+            continue;
         }
+        *p++ = c;
     }
 
     buffer = p;
@@ -1350,7 +1379,7 @@ void KURL::parse(const char* url, const String* originalString)
     // add fragment, escaping bad characters
     if (fragmentEnd != queryEnd) {
         *p++ = '#';
-        appendEscapingBadChars(p, url + fragmentStart, fragmentEnd - fragmentStart);
+        escapeAndAppendFragment(p, url + fragmentStart, fragmentEnd - fragmentStart);
     }
     m_fragmentEnd = p - buffer.data();
 
@@ -1416,11 +1445,9 @@ String encodeWithURLEscapeSequences(const String& notEncodedString)
     const char* strEnd = str + asUTF8.length();
     while (str < strEnd) {
         unsigned char c = *str++;
-        if (isBadChar(c)) {
-            *p++ = '%';
-            *p++ = hexDigits[c >> 4];
-            *p++ = hexDigits[c & 0xF];
-        } else
+        if (isBadChar(c))
+            appendEscapedChar(p, c);
+        else
             *p++ = c;
     }