Yensign hack should work with Shift_JIS and ISO-2022-JP encodings.
author tkent@chromium.org <tkent@chromium.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 9 Dec 2010 00:54:23 +0000
committer tkent@chromium.org <tkent@chromium.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 9 Dec 2010 00:54:23 +0000
https://bugs.webkit.org/show_bug.cgi?id=49714

Reviewed by Alexey Proskuryakov.

WebCore:

IE chooses a font that shows a yen sign for the 0x5c code point on pages
encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000,
x-sjis, and Windows-31J.
We have emulated this behavior by replacing 0x5c with 0xa5 for EUC-JP and
Shift_JIS_X0213-2000. This change adds the other encodings listed above.
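
The replacement described above can be sketched roughly as follows. This is a
minimal standalone illustration, not the WebCore code: showsBackslashAsYen(),
backslashFor() and displayText() are hypothetical names, std::unordered_set
stands in for WTF::HashSet, and the sketch compares plain strings instead of
canonical atomic names. It lists only the five names registered by the patch
and does no alias resolution for x-sjis or Windows-31J.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the registry lookup. The names come from the
// list registered in buildQuirksSets(); no alias canonicalization is done here.
static bool showsBackslashAsYen(const std::string& encodingName)
{
    static const std::unordered_set<std::string> names = {
        "x-mac-japanese", "ISO-2022-JP", "EUC-JP", "Shift_JIS", "Shift_JIS_X0213-2000",
    };
    return names.count(encodingName) > 0;
}

// Returns the character to show in place of U+005C for the given encoding:
// U+00A5 (yen sign) for the quirky encodings, a plain backslash otherwise.
static char16_t backslashFor(const std::string& encodingName)
{
    return showsBackslashAsYen(encodingName) ? char16_t(0x00A5) : u'\\';
}

// Replaces every U+005C in the text with the character chosen above.
static std::u16string displayText(std::u16string text, const std::string& encodingName)
{
    const char16_t replacement = backslashFor(encodingName);
    for (char16_t& c : text) {
        if (c == u'\\')
            c = replacement;
    }
    return text;
}
```

With this sketch, a backslash in a Shift_JIS page would be shown as a yen
sign, while the same character in a UTF-8 page would stay a backslash.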

Also, we move the HashSet initialization for isJapanese() and
backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because it is
easier to make them thread-safe there.
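
One way to picture why a single initialization point helps: if the sets are
built exactly once while the registry lock is held, no caller can observe a
half-built set. The following is only a sketch under assumptions.
registryMutex, extendRegistry() and isJapaneseName() are hypothetical names,
std::mutex stands in for the registry's Mutex, and, unlike the patch, the
reader here also takes the lock to keep the example simple.

```cpp
#include <mutex>
#include <string>
#include <unordered_set>

namespace sketch {

using NameSet = std::unordered_set<std::string>;

static std::mutex registryMutex;      // hypothetical stand-in for the registry lock
static const NameSet* japaneseNames;  // stays null until the registry is extended
static bool didExtend;

// Builds the set. The caller must already hold registryMutex.
static void buildQuirksSetsLocked()
{
    japaneseNames = new NameSet { "EUC-JP", "ISO-2022-JP", "Shift_JIS" };
}

// Extends the registry at most once, under the lock.
static void extendRegistry()
{
    std::lock_guard<std::mutex> lock(registryMutex);
    if (didExtend)
        return;
    didExtend = true;
    buildQuirksSetsLocked();
}

// Returns false before the registry is extended, mirroring the null check
// on the set pointer in the patch.
static bool isJapaneseName(const std::string& name)
{
    std::lock_guard<std::mutex> lock(registryMutex);
    return japaneseNames && japaneseNames->count(name) > 0;
}

} // namespace sketch
```

The function-local static sets this replaces were filled lazily on first use
from whichever thread called in first, which is harder to reason about.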

* platform/text/TextEncoding.cpp:
(WebCore::TextEncoding::isJapanese): Just calls isJapaneseEncoding().
(WebCore::TextEncoding::backslashAsCurrencySymbol): Uses shouldShowBackslashAsCurrencySymbolIn().
* platform/text/TextEncodingRegistry.cpp:
(WebCore::addEncodingName): Moved from TextEncoding.cpp, and stop using atomicCanonicalTextEncodingName().
(WebCore::buildQuirksSets): Added. Initializes HashSets for isJapaneseEncoding() and shouldShowBackslashAsCurrencySymbolIn().
(WebCore::isJapaneseEncoding):
(WebCore::shouldShowBackslashAsCurrencySymbolIn):
(WebCore::extendTextCodecMaps): Add a call to buildQuirksSets().
* platform/text/TextEncodingRegistry.h:

LayoutTests:

Use Shift_JIS instead of Shift_JIS_X0213-2000 because the Shift_JIS_X0213-2000
encoding is available only on Mac.
Add a test for ISO-2022-JP.

* editing/selection/find-yensign-and-backslash-expected.txt:
* editing/selection/find-yensign-and-backslash.html:
* platform/chromium/test_expectations.txt:

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@73566 268f45cc-cd09-0410-ab3c-d52691b4dbfc

LayoutTests/ChangeLog
LayoutTests/editing/selection/find-yensign-and-backslash-expected.txt
LayoutTests/editing/selection/find-yensign-and-backslash.html
LayoutTests/platform/chromium/test_expectations.txt
WebCore/ChangeLog
WebCore/platform/text/TextEncoding.cpp
WebCore/platform/text/TextEncodingRegistry.cpp
WebCore/platform/text/TextEncodingRegistry.h

diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog
index 25cd880..f7a2f68 100644
@@ -1,3 +1,18 @@
+2010-12-08  Kent Tamura  <tkent@chromium.org>
+
+        Reviewed by Alexey Proskuryakov.
+
+        Yensign hack should work with Shift_JIS and ISO-2022-JP encodings.
+        https://bugs.webkit.org/show_bug.cgi?id=49714
+
+        Use Shift_JIS instead of Shift_JIS_X0213-2000 because Shift_JIS_X0213-2000
+        encoding is available only on Mac.
+        Add a test for ISO-2022-JP.
+
+        * editing/selection/find-yensign-and-backslash-expected.txt:
+        * editing/selection/find-yensign-and-backslash.html:
+        * platform/chromium/test_expectations.txt:
+
 2010-12-08  Andy Estes  <aestes@apple.com>
 
         Reviewed by Darin Adler.
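diff --git a/LayoutTests/editing/selection/find-yensign-and-backslash-expected.txt b/LayoutTests/editing/selection/find-yensign-and-backslash-expected.txt
index baa0493..6f0ee93 100644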
@@ -1,11 +1,13 @@
 \-in-body
-  
+   
 Results
 
 We can find a backslash in EUC-JP page by finding a yen sign: PASS
 We can find a backslash in EUC-JP text control by finding a yen sign: PASS
-We can find a backslash in Shift_JIS_X0213-2000 page by finding a yen sign: PASS
-We can find a backslash in Shift_JIS_X0213-2000 text control by finding a yen sign: PASS
+We can find a backslash in Shift_JIS page by finding a yen sign: PASS
+We can find a backslash in Shift_JIS text control by finding a yen sign: PASS
+We can find a backslash in ISO-2022-JP page by finding a yen sign: PASS
+We can find a backslash in ISO-2022-JP text control by finding a yen sign: PASS
 We can NOT find a backslash in UTF8 page by finding a yen sign: PASS
 We can NOT find a backslash in UTF8 text control by finding a yen sign: PASS
 
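diff --git a/LayoutTests/editing/selection/find-yensign-and-backslash.html b/LayoutTests/editing/selection/find-yensign-and-backslash.html
index 1b748a1..216a3b6 100644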
@@ -1,6 +1,5 @@
 <!DOCTYPE html>
 <html>
-
 <head>
 <meta charset="UTF-8">
 <script>
@@ -22,7 +21,7 @@ function shouldBeTrue(condition, testName)
 function test()
 {
     // With these encodings, backslashes will be transcoded into yen signs.
-    var encodings = ["EUC-JP", "Shift_JIS_X0213-2000"];
+    var encodings = ["EUC-JP", "Shift_JIS", "ISO-2022-JP"];
     for (var i = 0; i < encodings.length; i++) {
         var encoding = encodings[i];
         var frameDocument = frames[i].document;
@@ -36,19 +35,17 @@ function test()
 
 </script>
 </head>
-
 <body onload="test()">
 
 <div>\-in-body</div>
 <input value=\-in-input>
 <iframe src="data:text/html;charset=EUC-JP,<body>\-in-body<input value=\-in-input></body>"></iframe>
-<iframe src="data:text/html;charset=Shift_JIS_X0213-2000,<body>\-in-body<input value=\-in-input></body>"></iframe>
+<iframe src="data:text/html;charset=Shift_JIS,<body>\-in-body<input value=\-in-input></body>"></iframe>
+<iframe src="data:text/html;charset=ISO-2022-JP,<body>\-in-body<input value=\-in-input></body>"></iframe>
 
 <p>Results</p>
 
 <p id="results">
 </p>
-
 </body>
-
 </html>
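diff --git a/LayoutTests/platform/chromium/test_expectations.txt b/LayoutTests/platform/chromium/test_expectations.txt
index c3c3b11..71a2dee 100644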
@@ -676,7 +676,6 @@ BUG28916 MAC : editing/pasteboard/paste-xml.xhtml = TEXT
 // Flaky
 BUG31803 MAC LINUX : editing/inserting/12882.html = IMAGE PASS
 
-BUG38653 MAC : editing/selection/find-yensign-and-backslash.html = TEXT
 BUGWK45438 : editing/spelling/spelling-backspace-between-lines.html = TEXT
 
 // Tests added in r69269.
diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
index de7f771..b47ccdf 100644
@@ -1,3 +1,31 @@
+2010-12-08  Kent Tamura  <tkent@chromium.org>
+
+        Reviewed by Alexey Proskuryakov.
+
+        Yensign hack should work with Shift_JIS and ISO-2022-JP encodings.
+        https://bugs.webkit.org/show_bug.cgi?id=49714
+
+        IE chooses a font which shows a yensign for 0x5c code point for a page
+        encoded in x-mac-japanese, ISO-2022-JP, EUC-JP, Shift_JIS, Shift_JIS_X0213-2000,
+        x-sjis, and Windows-31J.
+        We have emulated this behavior by replacing 0x5c with 0xa5 for EUC-JP and
+        Shift_JIS_X0213-2000. This change adds other encodings above.
+
+        Also, we move the HashSet initialization for isJapanese() and
+        backslashAsCurrencySymbol() to TextEncodingRegistry.cpp because of
+        ease of making them multi-thread safe.
+
+        * platform/text/TextEncoding.cpp:
+        (WebCore::TextEncoding::isJapanese): Just calls isJapaneseEncoding().
+        (WebCore::TextEncoding::backslashAsCurrencySymbol): Uses shouldShowBackslashAsCurrencySymbolIn().
+        * platform/text/TextEncodingRegistry.cpp:
+        (WebCore::addEncodingName): Moved from TextEncoding.cpp, and stop using atomicCanonicalTextEncodingName().
+        (WebCore::buildQuirksSets): Added. Initializes HashSets for isJapaneseEncoding() and shouldShowBackslashAsCurrencySymbolIn().
+        (WebCore::isJapaneseEncoding):
+        (WebCore::shouldShowBackslashAsCurrencySymbolIn):
+        (WebCore::extendTextCodecMaps): Add a call to buildQuirksSets().
+        * platform/text/TextEncodingRegistry.h:
+
 2010-12-08  Andy Estes  <aestes@apple.com>
 
         Reviewed by Darin Adler.
diff --git a/WebCore/platform/text/TextEncoding.cpp b/WebCore/platform/text/TextEncoding.cpp
index 58e691f..33313a0 100644
 #include "GOwnPtr.h"
 #endif
 #include <wtf/text/CString.h>
-#include <wtf/HashSet.h>
 #include <wtf/OwnPtr.h>
 #include <wtf/StdLibExtras.h>
 
 namespace WebCore {
 
-static void addEncodingName(HashSet<const char*>& set, const char* name)
-{
-    const char* atomicName = atomicCanonicalTextEncodingName(name);
-    if (atomicName)
-        set.add(atomicName);
-}
-
 static const TextEncoding& UTF7Encoding()
 {
     static TextEncoding globalUTF7Encoding("UTF-7");
@@ -173,39 +165,12 @@ bool TextEncoding::usesVisualOrdering() const
 
 bool TextEncoding::isJapanese() const
 {
-    if (noExtendedTextEncodingNameUsed())
-        return false;
-
-    DEFINE_STATIC_LOCAL(HashSet<const char*>, set, ());
-    if (set.isEmpty()) {
-        addEncodingName(set, "x-mac-japanese");
-        addEncodingName(set, "cp932");
-        addEncodingName(set, "JIS_X0201");
-        addEncodingName(set, "JIS_X0208-1983");
-        addEncodingName(set, "JIS_X0208-1990");
-        addEncodingName(set, "JIS_X0212-1990");
-        addEncodingName(set, "JIS_C6226-1978");
-        addEncodingName(set, "Shift_JIS_X0213-2000");
-        addEncodingName(set, "ISO-2022-JP");
-        addEncodingName(set, "ISO-2022-JP-2");
-        addEncodingName(set, "ISO-2022-JP-1");
-        addEncodingName(set, "ISO-2022-JP-3");
-        addEncodingName(set, "EUC-JP");
-        addEncodingName(set, "Shift_JIS");
-    }
-    return m_name && set.contains(m_name);
+    return isJapaneseEncoding(m_name);
 }
 
 UChar TextEncoding::backslashAsCurrencySymbol() const
 {
-    if (noExtendedTextEncodingNameUsed())
-        return '\\';
-
-    // The text encodings below treat backslash as a currency symbol.
-    // See http://blogs.msdn.com/michkap/archive/2005/09/17/469941.aspx for more information.
-    static const char* const a = atomicCanonicalTextEncodingName("Shift_JIS_X0213-2000");
-    static const char* const b = atomicCanonicalTextEncodingName("EUC-JP");
-    return (m_name == a || m_name == b) ? 0x00A5 : '\\';
+    return shouldShowBackslashAsCurrencySymbolIn(m_name) ? 0x00A5 : '\\';
 }
 
 bool TextEncoding::isNonByteBasedEncoding() const
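diff --git a/WebCore/platform/text/TextEncodingRegistry.cpp b/WebCore/platform/text/TextEncodingRegistry.cpp
index 6bf5552..c0c0255 100644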
@@ -36,6 +36,7 @@
 #include <wtf/Assertions.h>
 #include <wtf/HashFunctions.h>
 #include <wtf/HashMap.h>
+#include <wtf/HashSet.h>
 #include <wtf/StdLibExtras.h>
 #include <wtf/StringExtras.h>
 #include <wtf/Threading.h>
@@ -125,6 +126,8 @@ static Mutex& encodingRegistryMutex()
 static TextEncodingNameMap* textEncodingNameMap;
 static TextCodecMap* textCodecMap;
 static bool didExtendTextCodecMaps;
+static HashSet<const char*>* japaneseEncodings;
+static HashSet<const char*>* nonBackslashEncodings;
 
 static const char* const textEncodingNameBlacklist[] = {
     "UTF-7"
@@ -249,6 +252,59 @@ static void buildBaseTextCodecMaps()
 #endif
 }
 
+static void addEncodingName(HashSet<const char*>* set, const char* name)
+{
+    // We must not use atomicCanonicalTextEncodingName() because this function is called in it.
+    const char* atomicName = textEncodingNameMap->get(name);
+    if (atomicName)
+        set->add(atomicName);
+}
+
+static void buildQuirksSets()
+{
+    // FIXME: Having isJapaneseEncoding() and shouldShowBackslashAsCurrencySymbolIn()
+    // and initializing the sets for them in TextEncodingRegistry.cpp look strange.
+
+    ASSERT(!japaneseEncodings);
+    ASSERT(!nonBackslashEncodings);
+
+    japaneseEncodings = new HashSet<const char*>();
+    addEncodingName(japaneseEncodings, "EUC-JP");
+    addEncodingName(japaneseEncodings, "ISO-2022-JP");
+    addEncodingName(japaneseEncodings, "ISO-2022-JP-1");
+    addEncodingName(japaneseEncodings, "ISO-2022-JP-2");
+    addEncodingName(japaneseEncodings, "ISO-2022-JP-3");
+    addEncodingName(japaneseEncodings, "JIS_C6226-1978");
+    addEncodingName(japaneseEncodings, "JIS_X0201");
+    addEncodingName(japaneseEncodings, "JIS_X0208-1983");
+    addEncodingName(japaneseEncodings, "JIS_X0208-1990");
+    addEncodingName(japaneseEncodings, "JIS_X0212-1990");
+    addEncodingName(japaneseEncodings, "Shift_JIS");
+    addEncodingName(japaneseEncodings, "Shift_JIS_X0213-2000");
+    addEncodingName(japaneseEncodings, "cp932");
+    addEncodingName(japaneseEncodings, "x-mac-japanese");
+
+    nonBackslashEncodings = new HashSet<const char*>();
+    // The text encodings below treat backslash as a currency symbol for IE compatibility.
+    // See http://blogs.msdn.com/michkap/archive/2005/09/17/469941.aspx for more information.
+    addEncodingName(nonBackslashEncodings, "x-mac-japanese");
+    addEncodingName(nonBackslashEncodings, "ISO-2022-JP");
+    addEncodingName(nonBackslashEncodings, "EUC-JP");
+    // Shift_JIS_X0213-2000 is not the same encoding as Shift_JIS on Mac. We need to register both of them.
+    addEncodingName(nonBackslashEncodings, "Shift_JIS");
+    addEncodingName(nonBackslashEncodings, "Shift_JIS_X0213-2000");
+}
+
+bool isJapaneseEncoding(const char* canonicalEncodingName)
+{
+    return canonicalEncodingName && japaneseEncodings && japaneseEncodings->contains(canonicalEncodingName);
+}
+
+bool shouldShowBackslashAsCurrencySymbolIn(const char* canonicalEncodingName)
+{
+    return canonicalEncodingName && nonBackslashEncodings && nonBackslashEncodings->contains(canonicalEncodingName);
+}
+
 static void extendTextCodecMaps()
 {
 #if USE(ICU_UNICODE)
@@ -277,6 +333,7 @@ static void extendTextCodecMaps()
 #endif
 
     pruneBlacklistedCodecs();
+    buildQuirksSets();
 }
 
 PassOwnPtr<TextCodec> newTextCodec(const TextEncoding& encoding)
diff --git a/WebCore/platform/text/TextEncodingRegistry.h b/WebCore/platform/text/TextEncodingRegistry.h
index 81b7c4c..16844c6 100644
@@ -39,12 +39,12 @@ namespace WebCore {
     // Use TextEncoding::encode to encode, since it takes care of normalization.
     PassOwnPtr<TextCodec> newTextCodec(const TextEncoding&);
 
-    // Only TextEncoding should use this function directly.
+    // Only TextEncoding should use the following functions directly.
     const char* atomicCanonicalTextEncodingName(const char* alias);
     const char* atomicCanonicalTextEncodingName(const UChar* aliasCharacters, size_t aliasLength);
-
-    // Only TextEncoding should use this function directly.
     bool noExtendedTextEncodingNameUsed();
+    bool isJapaneseEncoding(const char* canonicalEncodingName);
+    bool shouldShowBackslashAsCurrencySymbolIn(const char* canonicalEncodingName);
 
 #ifndef NDEBUG
     void dumpTextEncodingNameMap();