XMLHttpRequest has the wrong fallback encoding
authorcommit-queue@webkit.org <commit-queue@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 17 Apr 2019 10:11:12 +0000 (10:11 +0000)
committercommit-queue@webkit.org <commit-queue@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Wed, 17 Apr 2019 10:11:12 +0000 (10:11 +0000)
https://bugs.webkit.org/show_bug.cgi?id=191741

Patch by Rob Buis <rbuis@igalia.com> on 2019-04-17
Reviewed by Alex Christensen.

LayoutTests/imported/w3c:

Update improved test expectations and sync unsupported-encodings.any.js and
replacement-encodings.any.js tests.

* web-platform-tests/encoding/replacement-encodings.any-expected.txt: Copied from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings-expected.txt.
* web-platform-tests/encoding/replacement-encodings.any.html: Added.
* web-platform-tests/encoding/replacement-encodings.any.js: Renamed from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.html.
* web-platform-tests/encoding/replacement-encodings.any.worker-expected.txt: Renamed from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings-expected.txt.
* web-platform-tests/encoding/replacement-encodings.any.worker.html: Added.
* web-platform-tests/encoding/unsupported-encodings-expected.txt: Removed.
* web-platform-tests/encoding/unsupported-encodings.any-expected.txt: Added.
* web-platform-tests/encoding/unsupported-encodings.any.html: Added.
* web-platform-tests/encoding/unsupported-encodings.any.js: Added.
* web-platform-tests/encoding/unsupported-encodings.any.worker-expected.txt: Added.
* web-platform-tests/encoding/unsupported-encodings.any.worker.html: Added.
* web-platform-tests/encoding/unsupported-encodings.html: Removed.
* web-platform-tests/xhr/overridemimetype-edge-cases.window-expected.txt:
* web-platform-tests/xhr/responsetext-decoding-expected.txt:

Source/WebCore:

Allow overriding the response charset as specified here:
https://xhr.spec.whatwg.org/#final-charset

Behavior matches Firefox and Chrome.

Tests: imported/w3c/web-platform-tests/encoding/replacement-encodings.any.html
       imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker.html
       imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.html
       imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker.html

* xml/XMLHttpRequest.cpp:
(WebCore::XMLHttpRequest::finalResponseCharset const):
(WebCore::XMLHttpRequest::createDecoder const):
* xml/XMLHttpRequest.h:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@244377 268f45cc-cd09-0410-ab3c-d52691b4dbfc

18 files changed:
LayoutTests/imported/w3c/ChangeLog
LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any-expected.txt [moved from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings-expected.txt with 97% similarity]
LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.html [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.js [moved from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.html with 66% similarity]
LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker-expected.txt [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker.html [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings-expected.txt [deleted file]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any-expected.txt [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.html [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.js [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker-expected.txt [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker.html [new file with mode: 0644]
LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.html [deleted file]
LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-edge-cases.window-expected.txt
LayoutTests/imported/w3c/web-platform-tests/xhr/responsetext-decoding-expected.txt
Source/WebCore/ChangeLog
Source/WebCore/xml/XMLHttpRequest.cpp
Source/WebCore/xml/XMLHttpRequest.h

index 7306279..9745a4e 100644 (file)
@@ -1,3 +1,28 @@
+2019-04-17  Rob Buis  <rbuis@igalia.com>
+
+        XMLHttpRequest has the wrong fallback encoding
+        https://bugs.webkit.org/show_bug.cgi?id=191741
+
+        Reviewed by Alex Christensen.
+
+        Update improved test expectations and sync unsupported-encodings.any.js and
+        replacement-encodings.any.js tests.
+
+        * web-platform-tests/encoding/replacement-encodings.any-expected.txt: Copied from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings-expected.txt.
+        * web-platform-tests/encoding/replacement-encodings.any.html: Added.
+        * web-platform-tests/encoding/replacement-encodings.any.js: Renamed from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.html.
+        * web-platform-tests/encoding/replacement-encodings.any.worker-expected.txt: Renamed from LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings-expected.txt.
+        * web-platform-tests/encoding/replacement-encodings.any.worker.html: Added.
+        * web-platform-tests/encoding/unsupported-encodings-expected.txt: Removed.
+        * web-platform-tests/encoding/unsupported-encodings.any-expected.txt: Added.
+        * web-platform-tests/encoding/unsupported-encodings.any.html: Added.
+        * web-platform-tests/encoding/unsupported-encodings.any.js: Added.
+        * web-platform-tests/encoding/unsupported-encodings.any.worker-expected.txt: Added.
+        * web-platform-tests/encoding/unsupported-encodings.any.worker.html: Added.
+        * web-platform-tests/encoding/unsupported-encodings.html: Removed.
+        * web-platform-tests/xhr/overridemimetype-edge-cases.window-expected.txt:
+        * web-platform-tests/xhr/responsetext-decoding-expected.txt:
+
 2019-04-17  Cathie Chen  <cathiechen@igalia.com>
 
         Update the test result of resize-observer/eventloop.html.
@@ -9,6 +9,6 @@ PASS iso-2022-cn-ext - non-empty input decodes to one replacement character.
 PASS iso-2022-cn-ext - empty input decodes to empty output. 
 PASS iso-2022-kr - non-empty input decodes to one replacement character. 
 PASS iso-2022-kr - empty input decodes to empty output. 
-FAIL replacement - non-empty input decodes to one replacement character. assert_equals: Decoding with replacement expected "U+FFFD" but got "U+0041/U+0042/U+0043/U+0061/U+0062/U+0063/U+0031/U+0032/U+0033/U+00A0"
+FAIL replacement - non-empty input decodes to one replacement character. assert_equals: Decoding with replacement expected "U+FFFD" but got "U+0041/U+0042/U+0043/U+0061/U+0062/U+0063/U+0031/U+0032/U+0033/U+FFFD"
 PASS replacement - empty input decodes to empty output. 
 
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.html b/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.html
new file mode 100644 (file)
index 0000000..2382913
--- /dev/null
@@ -0,0 +1 @@
+<!-- This file is required for WebKit test infrastructure to run the templated test -->
\ No newline at end of file
@@ -1,10 +1,6 @@
-<!DOCTYPE html>
-<title>Encoding API: replacement encoding</title>
-<script src="/resources/testharness.js"></script>
-<script src="/resources/testharnessreport.js"></script>
-<script src="resources/encodings.js"></script>
-<script src="resources/decoding-helpers.js"></script>
-<script>
+// META: title=Encoding API: replacement encoding
+// META: script=resources/encodings.js
+// META: script=resources/decoding-helpers.js
 
 const replacement_labels = [];
 encodings_table.forEach(section => {
@@ -27,4 +23,3 @@ replacement_labels.forEach(label => {
     '',
     '', `${label} - empty input decodes to empty output.`);
 });
-</script>
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker-expected.txt
new file mode 100644 (file)
index 0000000..e61eaff
--- /dev/null
@@ -0,0 +1,14 @@
+
+PASS csiso2022kr - non-empty input decodes to one replacement character. 
+PASS csiso2022kr - empty input decodes to empty output. 
+PASS hz-gb-2312 - non-empty input decodes to one replacement character. 
+PASS hz-gb-2312 - empty input decodes to empty output. 
+PASS iso-2022-cn - non-empty input decodes to one replacement character. 
+PASS iso-2022-cn - empty input decodes to empty output. 
+PASS iso-2022-cn-ext - non-empty input decodes to one replacement character. 
+PASS iso-2022-cn-ext - empty input decodes to empty output. 
+PASS iso-2022-kr - non-empty input decodes to one replacement character. 
+PASS iso-2022-kr - empty input decodes to empty output. 
+FAIL replacement - non-empty input decodes to one replacement character. assert_equals: Decoding with replacement expected "U+FFFD" but got "U+0041/U+0042/U+0043/U+0061/U+0062/U+0063/U+0031/U+0032/U+0033/U+FFFD"
+PASS replacement - empty input decodes to empty output. 
+
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker.html b/LayoutTests/imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker.html
new file mode 100644 (file)
index 0000000..2382913
--- /dev/null
@@ -0,0 +1 @@
+<!-- This file is required for WebKit test infrastructure to run the templated test -->
\ No newline at end of file
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings-expected.txt
deleted file mode 100644 (file)
index f497009..0000000
+++ /dev/null
@@ -1,16 +0,0 @@
-
-PASS UTF-7 should not be supported 
-PASS utf-7 should not be supported 
-PASS UTF-32 with BOM should decode as UTF-16LE 
-PASS UTF-32 with no BOM should decode as windows-1252 
-PASS utf-32 with BOM should decode as UTF-16LE 
-PASS utf-32 with no BOM should decode as windows-1252 
-PASS UTF-32LE with BOM should decode as UTF-16LE 
-PASS UTF-32LE with no BOM should decode as windows-1252 
-PASS utf-32le with BOM should decode as UTF-16LE 
-PASS utf-32le with no BOM should decode as windows-1252 
-PASS UTF-32be with no BOM should decode as windows-1252 
-PASS UTF-32be with BOM should decode as windows-1252 
-PASS utf-32be with no BOM should decode as windows-1252 
-PASS utf-32be with BOM should decode as windows-1252 
-
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any-expected.txt
new file mode 100644 (file)
index 0000000..988554b
--- /dev/null
@@ -0,0 +1,16 @@
+
+PASS UTF-7 should not be supported 
+PASS utf-7 should not be supported 
+PASS UTF-32 with BOM should decode as UTF-16LE 
+PASS UTF-32 with no BOM should decode as UTF-8 
+PASS utf-32 with BOM should decode as UTF-16LE 
+PASS utf-32 with no BOM should decode as UTF-8 
+PASS UTF-32LE with BOM should decode as UTF-16LE 
+PASS UTF-32LE with no BOM should decode as UTF-8 
+PASS utf-32le with BOM should decode as UTF-16LE 
+PASS utf-32le with no BOM should decode as UTF-8 
+PASS UTF-32be with no BOM should decode as UTF-8 
+PASS UTF-32be with BOM should decode as UTF-8 
+PASS utf-32be with no BOM should decode as UTF-8 
+PASS utf-32be with BOM should decode as UTF-8 
+
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.html b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.html
new file mode 100644 (file)
index 0000000..2382913
--- /dev/null
@@ -0,0 +1 @@
+<!-- This file is required for WebKit test infrastructure to run the templated test -->
\ No newline at end of file
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.js b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.js
new file mode 100644 (file)
index 0000000..50c9d07
--- /dev/null
@@ -0,0 +1,32 @@
+// META: title=Encoding API: unsupported encodings
+// META: script=resources/decoding-helpers.js
+
+// Attempting to decode '<' as UTF-7 (+AD4) ends up as '+AD4'.
+['UTF-7', 'utf-7'].forEach(label => {
+  decode_test(label, '+AD4', 'U+002B/U+0041/U+0044/U+0034',
+              `${label} should not be supported`);
+});
+
+// UTF-32 will be detected as UTF-16LE if leading BOM, or UTF-8 otherwise (due to XMLHttpRequest).
+['UTF-32', 'utf-32', 'UTF-32LE', 'utf-32le'].forEach(label => {
+  decode_test(label,
+              '%FF%FE%00%00%41%00%00%00%42%00%00%00',
+              'U+0000/U+0041/U+0000/U+0042/U+0000',
+              `${label} with BOM should decode as UTF-16LE`);
+
+  decode_test(label,
+              '%41%00%00%00%42%00%00%C2%80',
+              'U+0041/U+0000/U+0000/U+0000/U+0042/U+0000/U+0000/U+0080',
+              `${label} with no BOM should decode as UTF-8`);;
+});
+['UTF-32be', 'utf-32be'].forEach(label => {
+  decode_test(label,
+            '%00%00%00%41%00%00%00%42%C2%80',
+            'U+0000/U+0000/U+0000/U+0041/U+0000/U+0000/U+0000/U+0042/U+0080',
+            `${label} with no BOM should decode as UTF-8`);
+
+  decode_test(label,
+              '%00%00%FE%FF%00%00%00%41%00%C2%80%42',
+              'U+0000/U+0000/U+FFFD/U+FFFD/U+0000/U+0000/U+0000/U+0041/U+0000/U+0080/U+0042',
+              `${label} with BOM should decode as UTF-8`);
+});
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker-expected.txt
new file mode 100644 (file)
index 0000000..988554b
--- /dev/null
@@ -0,0 +1,16 @@
+
+PASS UTF-7 should not be supported 
+PASS utf-7 should not be supported 
+PASS UTF-32 with BOM should decode as UTF-16LE 
+PASS UTF-32 with no BOM should decode as UTF-8 
+PASS utf-32 with BOM should decode as UTF-16LE 
+PASS utf-32 with no BOM should decode as UTF-8 
+PASS UTF-32LE with BOM should decode as UTF-16LE 
+PASS UTF-32LE with no BOM should decode as UTF-8 
+PASS utf-32le with BOM should decode as UTF-16LE 
+PASS utf-32le with no BOM should decode as UTF-8 
+PASS UTF-32be with no BOM should decode as UTF-8 
+PASS UTF-32be with BOM should decode as UTF-8 
+PASS utf-32be with no BOM should decode as UTF-8 
+PASS utf-32be with BOM should decode as UTF-8 
+
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker.html b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker.html
new file mode 100644 (file)
index 0000000..2382913
--- /dev/null
@@ -0,0 +1 @@
+<!-- This file is required for WebKit test infrastructure to run the templated test -->
\ No newline at end of file
diff --git a/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.html b/LayoutTests/imported/w3c/web-platform-tests/encoding/unsupported-encodings.html
deleted file mode 100644 (file)
index 7584d3d..0000000
+++ /dev/null
@@ -1,37 +0,0 @@
-<!DOCTYPE html>
-<title>Encoding API: unsupported encodings</title>
-<script src="/resources/testharness.js"></script>
-<script src="/resources/testharnessreport.js"></script>
-<script src="resources/decoding-helpers.js"></script>
-<script>
-
-// Attempting to decode '<' as UTF-7 (+AD4) ends up as '+AD4'.
-['UTF-7', 'utf-7'].forEach(label => {
-  decode_test(label, '+AD4', 'U+002B/U+0041/U+0044/U+0034',
-              `${label} should not be supported`);
-});
-
-// UTF-32 will be detected as UTF-16LE if leading BOM, or windows-1252 otherwise.
-['UTF-32', 'utf-32', 'UTF-32LE', 'utf-32le'].forEach(label => {
-  decode_test(label,
-              '%FF%FE%00%00%41%00%00%00%42%00%00%00',
-              'U+0000/U+0041/U+0000/U+0042/U+0000',
-              `${label} with BOM should decode as UTF-16LE`);
-
-  decode_test(label,
-              '%41%00%00%00%42%00%00%00',
-              'U+0041/U+0000/U+0000/U+0000/U+0042/U+0000/U+0000/U+0000',
-              `${label} with no BOM should decode as windows-1252`);;
-});
-['UTF-32be', 'utf-32be'].forEach(label => {
-  decode_test(label,
-            '%00%00%00%41%00%00%00%42',
-            'U+0000/U+0000/U+0000/U+0041/U+0000/U+0000/U+0000/U+0042',
-            `${label} with no BOM should decode as windows-1252`);
-
-  decode_test(label,
-              '%00%00%FE%FF%00%00%00%41%00%00%00%42',
-              'U+0000/U+0000/U+00FE/U+00FF/U+0000/U+0000/U+0000/U+0041/U+0000/U+0000/U+0000/U+0042',
-              `${label} with BOM should decode as windows-1252`);
-});
-</script>
index 543b332..85b7fff 100644 (file)
@@ -2,5 +2,5 @@
 PASS overrideMimeType() is not reset by open(), basic 
 PASS overrideMimeType() is not reset by open() 
 PASS If charset is not overridden by overrideMimeType() the original continues to be used 
-FAIL Charset can be overridden by overrideMimeType() with a bogus charset assert_equals: expected "\ufffd\ufffd" but got "Âð"
+PASS Charset can be overridden by overrideMimeType() with a bogus charset 
 
index e1fe16a..acd169c 100644 (file)
@@ -30,7 +30,7 @@ PASS XMLHttpRequest: responseText decoding (text/plain %FE%FF%FE%FF  text)
 PASS XMLHttpRequest: responseText decoding (text/plain %EF%BB%BF  text) 
 PASS XMLHttpRequest: responseText decoding (text/plain %EF%BB%BF%EF%BB%BF  text) 
 PASS XMLHttpRequest: responseText decoding (text/plain %C2  text) 
-FAIL XMLHttpRequest: responseText decoding (text/plain;charset=bogus %C2  text) assert_equals: expected "\ufffd" but got "Â"
+PASS XMLHttpRequest: responseText decoding (text/plain;charset=bogus %C2  text) 
 PASS XMLHttpRequest: responseText decoding (text/xml %FE%FF  text) 
 PASS XMLHttpRequest: responseText decoding (text/xml %FE%FF%FE%FF  text) 
 PASS XMLHttpRequest: responseText decoding (text/xml %EF%BB%BF  text) 
index 2776867..04e4bdb 100644 (file)
@@ -1,3 +1,25 @@
+2019-04-17  Rob Buis  <rbuis@igalia.com>
+
+        XMLHttpRequest has the wrong fallback encoding
+        https://bugs.webkit.org/show_bug.cgi?id=191741
+
+        Reviewed by Alex Christensen.
+
+        Allow overriding the response charset as specified here:
+        https://xhr.spec.whatwg.org/#final-charset
+
+        Behavior matches Firefox and Chrome.
+
+        Tests: imported/w3c/web-platform-tests/encoding/replacement-encodings.any.html
+               imported/w3c/web-platform-tests/encoding/replacement-encodings.any.worker.html
+               imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.html
+               imported/w3c/web-platform-tests/encoding/unsupported-encodings.any.worker.html
+
+        * xml/XMLHttpRequest.cpp:
+        (WebCore::XMLHttpRequest::finalResponseCharset const):
+        (WebCore::XMLHttpRequest::createDecoder const):
+        * xml/XMLHttpRequest.h:
+
 2019-04-16  Antoine Quint  <graouts@apple.com>
 
         Opt Google Maps into simulated mouse events dispatch quirk
index 38902a3..f4b97a9 100644 (file)
@@ -981,10 +981,23 @@ static inline bool shouldDecodeResponse(XMLHttpRequest::ResponseType type)
     return true;
 }
 
+// https://xhr.spec.whatwg.org/#final-charset
+TextEncoding XMLHttpRequest::finalResponseCharset() const
+{
+    String label = m_responseEncoding;
+
+    String overrideResponseCharset = extractCharsetFromMediaType(m_mimeTypeOverride);
+    if (!overrideResponseCharset.isEmpty())
+        label = overrideResponseCharset;
+
+    return TextEncoding(label);
+}
+
 Ref<TextResourceDecoder> XMLHttpRequest::createDecoder() const
 {
-    if (!m_responseEncoding.isEmpty())
-        return TextResourceDecoder::create("text/plain", m_responseEncoding);
+    TextEncoding finalResponseCharset = this->finalResponseCharset();
+    if (finalResponseCharset.isValid())
+        return TextResourceDecoder::create("text/plain", finalResponseCharset);
 
     switch (responseType()) {
     case ResponseType::EmptyString:
index 2816457..9299f4c 100644 (file)
@@ -130,6 +130,8 @@ public:
 private:
     explicit XMLHttpRequest(ScriptExecutionContext&);
 
+    TextEncoding finalResponseCharset() const;
+
     // ActiveDOMObject
     void contextDestroyed() override;
     bool canSuspendForDocumentSuspension() const override;