HTMLElement::nodeName should not upper case non-ASCII characters
author rniwa@webkit.org <rniwa@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 23 Jan 2016 02:04:41 +0000 (02:04 +0000)
committer rniwa@webkit.org <rniwa@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 23 Jan 2016 02:04:41 +0000 (02:04 +0000)
https://bugs.webkit.org/show_bug.cgi?id=153231

Reviewed by Darin Adler.

LayoutTests/imported/w3c:

Rebaselined the test now that all test cases pass.

* web-platform-tests/dom/nodes/Document-createElement-expected.txt:

Source/WebCore:

Use the newly added convertToASCIIUppercase to generate the string for tagName and nodeName.

Test: fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html

* dom/QualifiedName.cpp:
(WebCore::QualifiedName::localNameUpper): Use convertToASCIIUppercase.
* html/HTMLElement.cpp:
(WebCore::HTMLElement::nodeName): Use convertToASCIIUppercase.
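
As a rough illustration of the WebCore change, the sketch below models the lazily cached uppercase local name and the HTML-versus-XML branch in nodeName. `QualifiedNameSketch`, `nodeNameSketch`, and the helper functions are hypothetical stand-ins built on `std::u16string`, not WebKit's classes; only the ASCII-only uppercasing behavior is taken from this commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: uppercases 'a'..'z' only; every other code unit is returned unchanged.
constexpr char16_t toASCIIUpperSketch(char16_t c)
{
    return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - (u'a' - u'A')) : c;
}

// Hypothetical stand-in for convertToASCIIUppercase on a plain UTF-16 string.
std::u16string convertToASCIIUppercaseSketch(std::u16string s)
{
    for (char16_t& c : s)
        c = toASCIIUpperSketch(c);
    return s;
}

// Hypothetical stand-in for a qualified name with a lazily computed uppercase local name.
struct QualifiedNameSketch {
    std::u16string prefix;
    std::u16string localName;
    mutable std::u16string localNameUpperCache;

    const std::u16string& localNameUpper() const
    {
        // Compute the uppercase form on first use and reuse it afterwards.
        if (localNameUpperCache.empty())
            localNameUpperCache = convertToASCIIUppercaseSketch(localName);
        return localNameUpperCache;
    }

    std::u16string qualified() const
    {
        return prefix.empty() ? localName : prefix + u":" + localName;
    }
};

std::u16string nodeNameSketch(const QualifiedNameSketch& name, bool isHTMLDocument)
{
    if (!isHTMLDocument)
        return name.qualified(); // XML documents keep the name as written.
    if (name.prefix.empty())
        return name.localNameUpper(); // Common case: cached ASCII-uppercase local name.
    return convertToASCIIUppercaseSketch(name.qualified());
}

// Usage: mirrors a few of the expectations in the new layout test.
void checkNodeNameSketch()
{
    QualifiedNameSketch plain { u"", u"\u0131nput", u"" };
    assert(nodeNameSketch(plain, true) == u"\u0131NPUT");
    assert(nodeNameSketch(plain, false) == u"\u0131nput");

    QualifiedNameSketch prefixed { u"x\u0130", u"\u0131nput", u"" };
    assert(nodeNameSketch(prefixed, true) == u"X\u0130:\u0131NPUT");
}
```

The dotless i (U+0131) and dotted capital I (U+0130) pass through untouched, which is the behavior the rebaselined W3C test expects.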

Source/WTF:

Added convertToASCIIUppercase to AtomicString, String, and StringImpl.

* wtf/text/AtomicString.cpp:
(WTF::AtomicString::convertASCIICase): Generalized from convertToASCIILowercase.
(WTF::AtomicString::convertToASCIILowercase):
(WTF::AtomicString::convertToASCIIUppercase):
* wtf/text/AtomicString.h:
* wtf/text/StringImpl.cpp:
(WTF::StringImpl::convertASCIICase): Generalized from convertToASCIILowercase.
(WTF::StringImpl::convertToASCIILowercase):
(WTF::StringImpl::convertToASCIIUppercase):
* wtf/text/StringImpl.h:
* wtf/text/WTFString.cpp:
(WTF::String::convertToASCIIUppercase): Added.
* wtf/text/WTFString.h:
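
The generalization described above can be sketched roughly as follows: a single template, parameterized on the conversion direction, scans for the first character that needs converting, returns the original string untouched when there is none, and otherwise copies the string and converts from that index onward. `SharedString` and every name ending in `Sketch` are assumptions made for illustration; the real code works on `StringImpl` and `AtomicString`, not on `std::shared_ptr<std::u16string>`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

enum class CaseConvertType { Upper, Lower };

constexpr bool isASCIIUpperSketch(char16_t c) { return c >= u'A' && c <= u'Z'; }
constexpr bool isASCIILowerSketch(char16_t c) { return c >= u'a' && c <= u'z'; }

// Hypothetical shared, reference-counted string used in place of StringImpl.
using SharedString = std::shared_ptr<std::u16string>;

template<CaseConvertType type>
SharedString convertASCIICaseSketch(const SharedString& input)
{
    const std::u16string& data = *input;

    // Fast path: find the first character that would change.
    std::size_t failingIndex = 0;
    for (; failingIndex < data.size(); ++failingIndex) {
        char16_t c = data[failingIndex];
        if (type == CaseConvertType::Lower ? isASCIIUpperSketch(c) : isASCIILowerSketch(c))
            break;
    }
    if (failingIndex == data.size())
        return input; // Nothing to convert: share the original string.

    // Slow path: copy the string, then convert from the first failing index on.
    std::u16string converted = data;
    for (std::size_t i = failingIndex; i < converted.size(); ++i) {
        char16_t c = converted[i];
        if (type == CaseConvertType::Lower) {
            if (isASCIIUpperSketch(c))
                converted[i] = static_cast<char16_t>(c + 0x20);
        } else if (isASCIILowerSketch(c))
            converted[i] = static_cast<char16_t>(c - 0x20);
    }
    return std::make_shared<std::u16string>(std::move(converted));
}

// Usage: the fast path returns the same object; the slow path allocates a new one.
void checkConvertASCIICaseSketch()
{
    auto alreadyUpper = std::make_shared<std::u16string>(u"\u0130NPUT");
    assert(convertASCIICaseSketch<CaseConvertType::Upper>(alreadyUpper) == alreadyUpper);

    auto mixed = std::make_shared<std::u16string>(u"x\u0131nput");
    auto upper = convertASCIICaseSketch<CaseConvertType::Upper>(mixed);
    assert(upper != mixed);
    assert(*upper == u"X\u0131NPUT");
    assert(*convertASCIICaseSketch<CaseConvertType::Lower>(upper) == u"x\u0131nput");
}
```

Because the direction is a template parameter, each instantiation resolves its `type == ...` checks at compile time, so the shared body should not cost an extra branch per character.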

LayoutTests:

Added a regression test since the rebaselined W3C test case is very simple and doesn't cover all permutations.

* fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document-expected.txt: Added.
* fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@195501 268f45cc-cd09-0410-ab3c-d52691b4dbfc

15 files changed:
LayoutTests/ChangeLog
LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document-expected.txt [new file with mode: 0644]
LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html [new file with mode: 0644]
LayoutTests/imported/w3c/ChangeLog
LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-createElement-expected.txt
Source/WTF/ChangeLog
Source/WTF/wtf/text/AtomicString.cpp
Source/WTF/wtf/text/AtomicString.h
Source/WTF/wtf/text/StringImpl.cpp
Source/WTF/wtf/text/StringImpl.h
Source/WTF/wtf/text/WTFString.cpp
Source/WTF/wtf/text/WTFString.h
Source/WebCore/ChangeLog
Source/WebCore/dom/QualifiedName.cpp
Source/WebCore/html/HTMLElement.cpp

diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog
index b13f0f6..60d179f 100644
@@ -1,3 +1,15 @@
+2016-01-20  Ryosuke Niwa  <rniwa@webkit.org>
+
+        HTMLElement::nodeName should not upper case non-ASCII characters
+        https://bugs.webkit.org/show_bug.cgi?id=153231
+
+        Reviewed by Darin Adler.
+
+        Added a regression test since the rebaselined W3C test case is very simple and doesn't cover all permutations.
+
+        * fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document-expected.txt: Added.
+        * fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html: Added.
+
 2016-01-22  Brady Eidson  <beidson@apple.com>
 
         Modern IDB: Disable simultaneous transactions in the SQLite backend for now.
diff --git a/LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document-expected.txt b/LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document-expected.txt
new file mode 100644
index 0000000..75314a9
--- /dev/null
@@ -0,0 +1,28 @@
+Tests that tagName and nodeName uppercases ASCII and only ASCII letters in a HTML document.
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+htmlDocument = document
+PASS htmlDocument.createElement("İnput").tagName is "İNPUT"
+PASS htmlDocument.createElement("ınput").tagName is "ıNPUT"
+PASS htmlDocument.createElement("xİnput").nodeName is "XİNPUT"
+PASS htmlDocument.createElement("xınput").nodeName is "XıNPUT"
+PASS htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:İnput").tagName is "X:İNPUT"
+PASS htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "xİ:ınput").tagName is "Xİ:ıNPUT"
+PASS htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:İnput").nodeName is "X:İNPUT"
+PASS htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "xı:İnput").nodeName is "Xı:İNPUT"
+
+xmlDocument = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html")
+PASS xmlDocument.createElement("İnput").tagName is "İnput"
+PASS xmlDocument.createElement("ınput").tagName is "ınput"
+PASS xmlDocument.createElement("xİnput").nodeName is "xİnput"
+PASS xmlDocument.createElement("xınput").nodeName is "xınput"
+PASS xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:İnput").tagName is "x:İnput"
+PASS xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "xİ:ınput").tagName is "xİ:ınput"
+PASS xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:İnput").nodeName is "x:İnput"
+PASS xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "xı:İnput").nodeName is "xı:İnput"
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html b/LayoutTests/fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html
new file mode 100644
index 0000000..3fe8a3f
--- /dev/null
@@ -0,0 +1,35 @@
+<!DOCTYPE html>
+<html>
+<body>
+<script src="../../../resources/js-test-pre.js"></script>
+<script>
+
+description('Tests that tagName and nodeName uppercases ASCII and only ASCII letters in a HTML document.');
+
+evalAndLog('htmlDocument = document');
+shouldBeEqualToString('htmlDocument.createElement("\u0130nput").tagName', '\u0130NPUT');
+shouldBeEqualToString('htmlDocument.createElement("\u0131nput").tagName', '\u0131NPUT');
+shouldBeEqualToString('htmlDocument.createElement("x\u0130nput").nodeName', 'X\u0130NPUT');
+shouldBeEqualToString('htmlDocument.createElement("x\u0131nput").nodeName', 'X\u0131NPUT');
+
+shouldBeEqualToString('htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:\u0130nput").tagName', 'X:\u0130NPUT');
+shouldBeEqualToString('htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x\u0130:\u0131nput").tagName', 'X\u0130:\u0131NPUT');
+shouldBeEqualToString('htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:\u0130nput").nodeName', 'X:\u0130NPUT');
+shouldBeEqualToString('htmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x\u0131:\u0130nput").nodeName', 'X\u0131:\u0130NPUT');
+
+debug('');
+evalAndLog('xmlDocument = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html")');
+shouldBeEqualToString('xmlDocument.createElement("\u0130nput").tagName', '\u0130nput');
+shouldBeEqualToString('xmlDocument.createElement("\u0131nput").tagName', '\u0131nput');
+shouldBeEqualToString('xmlDocument.createElement("x\u0130nput").nodeName', 'x\u0130nput');
+shouldBeEqualToString('xmlDocument.createElement("x\u0131nput").nodeName', 'x\u0131nput');
+
+shouldBeEqualToString('xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:\u0130nput").tagName', 'x:\u0130nput');
+shouldBeEqualToString('xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x\u0130:\u0131nput").tagName', 'x\u0130:\u0131nput');
+shouldBeEqualToString('xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x:\u0130nput").nodeName', 'x:\u0130nput');
+shouldBeEqualToString('xmlDocument.createElementNS("http://www.w3.org/1999/xhtml", "x\u0131:\u0130nput").nodeName', 'x\u0131:\u0130nput');
+
+</script>
+<script src="../../../resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/imported/w3c/ChangeLog b/LayoutTests/imported/w3c/ChangeLog
index b5db11a..536eedf 100644
@@ -1,3 +1,14 @@
+2016-01-20  Ryosuke Niwa  <rniwa@webkit.org>
+
+        HTMLElement::nodeName should not upper case non-ASCII characters
+        https://bugs.webkit.org/show_bug.cgi?id=153231
+
+        Reviewed by Darin Adler.
+
+        Rebaselined the test now that all test cases pass.
+
+        * web-platform-tests/dom/nodes/Document-createElement-expected.txt:
+
 2016-01-22  Chris Dumez  <cdumez@apple.com>
 
         document.charset should be an alias for document.characterSet
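diff --git a/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-createElement-expected.txt b/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-createElement-expected.txt
index 13e368b..9982827 100644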
@@ -21,7 +21,7 @@ PASS createElement("math")
 PASS createElement("FOO") 
 PASS createElement("marK") 
 PASS createElement("İnput") 
-FAIL createElement("ınput") assert_equals: expected "ıNPUT" but got "INPUT"
+PASS createElement("ınput") 
 PASS createElement("") 
 PASS createElement("1foo") 
 PASS createElement("̀foo") 
diff --git a/Source/WTF/ChangeLog b/Source/WTF/ChangeLog
index 771e58b..fc28766 100644
@@ -1,3 +1,26 @@
+2016-01-20  Ryosuke Niwa  <rniwa@webkit.org>
+
+        HTMLElement::nodeName should not upper case non-ASCII characters
+        https://bugs.webkit.org/show_bug.cgi?id=153231
+
+        Reviewed by Darin Adler.
+
+        Added convertToASCIIUppercase to AtomicString, String, and StringImpl. 
+
+        * wtf/text/AtomicString.cpp:
+        (WTF::AtomicString::convertASCIICase): Generalized from convertToASCIILowercase.
+        (WTF::AtomicString::convertToASCIILowercase):
+        (WTF::AtomicString::convertToASCIIUppercase):
+        * wtf/text/AtomicString.h:
+        * wtf/text/StringImpl.cpp:
+        (WTF::StringImpl::convertASCIICase): Generalized from convertToASCIILowercase.
+        (WTF::StringImpl::convertToASCIILowercase):
+        (WTF::StringImpl::convertToASCIIUppercase):
+        * wtf/text/StringImpl.h:
+        * wtf/text/WTFString.cpp:
+        (WTF::String::convertToASCIIUppercase): Added.
+        * wtf/text/WTFString.h:
+
 2016-01-22  Chris Dumez  <cdumez@apple.com>
 
         Unreviewed attempt to fix the Windows build after r195452.
diff --git a/Source/WTF/wtf/text/AtomicString.cpp b/Source/WTF/wtf/text/AtomicString.cpp
index 19c9a0e..99e0d6e 100644
@@ -48,7 +48,8 @@ AtomicString AtomicString::lower() const
     return result;
 }
 
-AtomicString AtomicString::convertToASCIILowercase() const
+template<AtomicString::CaseConvertType type>
+ALWAYS_INLINE AtomicString AtomicString::convertASCIICase() const
 {
     StringImpl* impl = this->impl();
     if (UNLIKELY(!impl))
@@ -63,7 +64,7 @@ AtomicString AtomicString::convertToASCIILowercase() const
         const LChar* characters = impl->characters8();
         unsigned failingIndex;
         for (unsigned i = 0; i < length; ++i) {
-            if (UNLIKELY(isASCIIUpper(characters[i]))) {
+            if (type == CaseConvertType::Lower ? UNLIKELY(isASCIIUpper(characters[i])) : LIKELY(isASCIILower(characters[i]))) {
                 failingIndex = i;
                 goto SlowPath;
             }
@@ -74,19 +75,29 @@ SlowPath:
         for (unsigned i = 0; i < failingIndex; ++i)
             localBuffer[i] = characters[i];
         for (unsigned i = failingIndex; i < length; ++i)
-            localBuffer[i] = toASCIILower(characters[i]);
+            localBuffer[i] = type == CaseConvertType::Lower ? toASCIILower(characters[i]) : toASCIIUpper(characters[i]);
         return AtomicString(localBuffer, length);
     }
 
-    RefPtr<StringImpl> convertedString = impl->convertToASCIILowercase();
-    if (LIKELY(convertedString == impl))
+    Ref<StringImpl> convertedString = type == CaseConvertType::Lower ? impl->convertToASCIILowercase() : impl->convertToASCIIUppercase();
+    if (LIKELY(convertedString.ptr() == impl))
         return *this;
 
     AtomicString result;
-    result.m_string = AtomicStringImpl::add(convertedString.get());
+    result.m_string = AtomicStringImpl::add(convertedString.ptr());
     return result;
 }
 
+AtomicString AtomicString::convertToASCIILowercase() const
+{
+    return convertASCIICase<CaseConvertType::Lower>();
+}
+
+AtomicString AtomicString::convertToASCIIUppercase() const
+{
+    return convertASCIICase<CaseConvertType::Upper>();
+}
+
 AtomicString AtomicString::number(int number)
 {
     return numberToStringSigned<AtomicString>(number);
diff --git a/Source/WTF/wtf/text/AtomicString.h b/Source/WTF/wtf/text/AtomicString.h
index a2bc28e..cbf5560 100644
@@ -154,6 +154,7 @@ public:
         { return m_string.endsWith<matchLength>(prefix, caseSensitive); }
 
     WTF_EXPORT_STRING_API AtomicString convertToASCIILowercase() const;
+    WTF_EXPORT_STRING_API AtomicString convertToASCIIUppercase() const;
     WTF_EXPORT_STRING_API AtomicString lower() const;
     AtomicString upper() const { return AtomicString(impl()->upper()); }
 
@@ -186,6 +187,9 @@ private:
     // The explicit constructors with AtomicString::ConstructFromLiteral must be used for literals.
     AtomicString(ASCIILiteral);
 
+    enum class CaseConvertType { Upper, Lower };
+    template<CaseConvertType> AtomicString convertASCIICase() const;
+
     WTF_EXPORT_STRING_API static AtomicString fromUTF8Internal(const char*, const char*);
 
     String m_string;
diff --git a/Source/WTF/wtf/text/StringImpl.cpp b/Source/WTF/wtf/text/StringImpl.cpp
index 2850a16..d5b7dd6 100644
@@ -679,42 +679,41 @@ Ref<StringImpl> StringImpl::foldCase()
     return newImpl.releaseNonNull();
 }
 
-Ref<StringImpl> StringImpl::convertToASCIILowercase()
-{
-    if (is8Bit()) {
-        unsigned failingIndex;
-        for (unsigned i = 0; i < m_length; ++i) {
-            LChar character = m_data8[i];
-            if (UNLIKELY(isASCIIUpper(character))) {
-                failingIndex = i;
-                goto SlowPath;
-            }
+template<StringImpl::CaseConvertType type, typename CharacterType>
+ALWAYS_INLINE Ref<StringImpl> StringImpl::convertASCIICase(StringImpl& impl, const CharacterType* data, unsigned length)
+{
+    unsigned failingIndex;
+    for (unsigned i = 0; i < length; ++i) {
+        CharacterType character = data[i];
+        if (type == CaseConvertType::Lower ? UNLIKELY(isASCIIUpper(character)) : LIKELY(isASCIILower(character))) {
+            failingIndex = i;
+            goto SlowPath;
         }
-        return *this;
+    }
+    return impl;
 
 SlowPath:
-        LChar* data8;
-        Ref<StringImpl> newImpl = createUninitializedInternalNonEmpty(m_length, data8);
-        for (unsigned i = 0; i < failingIndex; ++i)
-            data8[i] = m_data8[i];
-        for (unsigned i = failingIndex; i < m_length; ++i)
-            data8[i] = toASCIILower(m_data8[i]);
-        return newImpl;
-    }
+    CharacterType* newData;
+    Ref<StringImpl> newImpl = createUninitializedInternalNonEmpty(length, newData);
+    for (unsigned i = 0; i < failingIndex; ++i)
+        newData[i] = data[i];
+    for (unsigned i = failingIndex; i < length; ++i)
+        newData[i] = type == CaseConvertType::Lower ? toASCIILower(data[i]) : toASCIIUpper(data[i]);
+    return newImpl;
+}
 
-    bool noUpper = true;
-    for (unsigned i = 0; i < m_length; ++i) {
-        if (UNLIKELY(isASCIIUpper(m_data16[i])))
-            noUpper = false;
-    }
-    if (noUpper)
-        return *this;
+Ref<StringImpl> StringImpl::convertToASCIILowercase()
+{
+    if (is8Bit())
+        return convertASCIICase<CaseConvertType::Lower>(*this, m_data8, m_length);
+    return convertASCIICase<CaseConvertType::Lower>(*this, m_data16, m_length);
+}
 
-    UChar* data16;
-    Ref<StringImpl> newImpl = createUninitializedInternalNonEmpty(m_length, data16);
-    for (unsigned i = 0; i < m_length; ++i)
-        data16[i] = toASCIILower(m_data16[i]);
-    return newImpl;
+Ref<StringImpl> StringImpl::convertToASCIIUppercase()
+{
+    if (is8Bit())
+        return convertASCIICase<CaseConvertType::Upper>(*this, m_data8, m_length);
+    return convertASCIICase<CaseConvertType::Upper>(*this, m_data16, m_length);
 }
 
 template <class UCharPredicate>
diff --git a/Source/WTF/wtf/text/StringImpl.h b/Source/WTF/wtf/text/StringImpl.h
index 63e630b..ca293c6 100644
@@ -676,6 +676,7 @@ public:
     float toFloat(bool* ok = 0);
 
     WTF_EXPORT_STRING_API Ref<StringImpl> convertToASCIILowercase();
+    WTF_EXPORT_STRING_API Ref<StringImpl> convertToASCIIUppercase();
     WTF_EXPORT_STRING_API Ref<StringImpl> lower();
     WTF_EXPORT_STRING_API Ref<StringImpl> upper();
     WTF_EXPORT_STRING_API Ref<StringImpl> lower(const AtomicString& localeIdentifier);
@@ -851,6 +852,9 @@ private:
     // This number must be at least 2 to avoid sharing empty, null as well as 1 character strings from SmallStrings.
     static const unsigned s_copyCharsInlineCutOff = 20;
 
+    enum class CaseConvertType { Upper, Lower };
+    template<CaseConvertType type, typename CharacterType> static Ref<StringImpl> convertASCIICase(StringImpl&, const CharacterType*, unsigned);
+
     BufferOwnership bufferOwnership() const { return static_cast<BufferOwnership>(m_hashAndFlags & s_hashMaskBufferOwnership); }
     template <class UCharPredicate> Ref<StringImpl> stripMatchedCharacters(UCharPredicate);
     template <typename CharType, class UCharPredicate> Ref<StringImpl> simplifyMatchedCharactersToSpace(UCharPredicate);
diff --git a/Source/WTF/wtf/text/WTFString.cpp b/Source/WTF/wtf/text/WTFString.cpp
index ec1155b..1b43056 100644
@@ -343,6 +343,14 @@ String String::convertToASCIILowercase() const
     return m_impl->convertToASCIILowercase();
 }
 
+String String::convertToASCIIUppercase() const
+{
+    // FIXME: Should this function, and the many others like it, be inlined?
+    if (!m_impl)
+        return String();
+    return m_impl->convertToASCIIUppercase();
+}
+
 String String::lower() const
 {
     if (!m_impl)
diff --git a/Source/WTF/wtf/text/WTFString.h b/Source/WTF/wtf/text/WTFString.h
index 3440404..3ca0fbf 100644
@@ -339,6 +339,7 @@ public:
     // want to do any conversion for non-ASCII letters.
     WTF_EXPORT_STRING_API String convertToASCIILowercase() const;
     WTF_EXPORT_STRING_API String lower() const;
+    WTF_EXPORT_STRING_API String convertToASCIIUppercase() const;
     WTF_EXPORT_STRING_API String upper() const;
 
     WTF_EXPORT_STRING_API String lower(const AtomicString& localeIdentifier) const;
diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index bc89c79..8acbd11 100644
@@ -1,3 +1,19 @@
+2016-01-20  Ryosuke Niwa  <rniwa@webkit.org>
+
+        HTMLElement::nodeName should not upper case non-ASCII characters
+        https://bugs.webkit.org/show_bug.cgi?id=153231
+
+        Reviewed by Darin Adler.
+
+        Use the newly added convertToASCIIUppercase to generate the string for tagName and nodeName.
+
+        Test: fast/dom/Element/tagName-must-be-ASCII-uppercase-in-HTML-document.html
+
+        * dom/QualifiedName.cpp:
+        (WebCore::QualifiedName::localNameUpper): Use convertToASCIIUppercase.
+        * html/HTMLElement.cpp:
+        (WebCore::HTMLElement::nodeName): Use convertToASCIIUppercase.
+
 2016-01-22  Brady Eidson  <beidson@apple.com>
 
         Modern IDB: Disable simultaneous transactions in the SQLite backend for now.
diff --git a/Source/WebCore/dom/QualifiedName.cpp b/Source/WebCore/dom/QualifiedName.cpp
index 56e4ffb..c1b6575 100644
@@ -123,7 +123,7 @@ const QualifiedName& nullQName()
 const AtomicString& QualifiedName::localNameUpper() const
 {
     if (!m_impl->m_localNameUpper)
-        m_impl->m_localNameUpper = m_impl->m_localName.upper();
+        m_impl->m_localNameUpper = m_impl->m_localName.convertToASCIIUppercase();
     return m_impl->m_localNameUpper;
 }
 
diff --git a/Source/WebCore/html/HTMLElement.cpp b/Source/WebCore/html/HTMLElement.cpp
index d52238c..6e93af0 100644
@@ -76,9 +76,9 @@ String HTMLElement::nodeName() const
     // FIXME: Would be nice to have an AtomicString lookup based off uppercase
     // ASCII characters that does not have to copy the string on a hit in the hash.
     if (document().isHTMLDocument()) {
-        if (!tagQName().hasPrefix())
+        if (LIKELY(!tagQName().hasPrefix()))
             return tagQName().localNameUpper();
-        return Element::nodeName().upper();
+        return Element::nodeName().convertToASCIIUppercase();
     }
     return Element::nodeName();
 }