Fix svg/in-html/script-write.html with threaded HTML parser
authortonyg@chromium.org <tonyg@chromium.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 14 Feb 2013 01:25:19 +0000 (01:25 +0000)
committertonyg@chromium.org <tonyg@chromium.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 14 Feb 2013 01:25:19 +0000 (01:25 +0000)
https://bugs.webkit.org/show_bug.cgi?id=109495

Reviewed by Eric Seidel.

Source/WebCore:

This patch makes the background parser's simulateTreeBuilder() more realistic.
1. The HTMLTreeBuilder does not call the updateStateFor() setState()s when in foreign content mode so we shouldn't do it when simulating the tree builder.
2. HTMLTreeBuilder::processTokenInForeignContent has a list of tags which exit foreign content mode. We need to respect those.
3. Support the <foreignObject> tag which enters and leaves foreign content mode.
4. The tree builder sets state to DataState upon a </script> tag when not in foreign content mode. We need to do the same.

This involved creating a namespace stack where we push upon entering each namespace and pop upon leaving.
We are in foreign content if the topmost namespace is SVG or MathML.

This fixes svg/in-html/script-write.html and likely others.

* html/parser/BackgroundHTMLParser.cpp:
(WebCore::BackgroundHTMLParser::simulateTreeBuilder):
* html/parser/BackgroundHTMLParser.h:
(BackgroundHTMLParser):
* html/parser/CompactHTMLToken.cpp:
(WebCore::CompactHTMLToken::getAttributeItem): Returns the attribute of the given name. Necessary to test for <font> attributes in simulateTreeBuilder.
(WebCore):
* html/parser/CompactHTMLToken.h:
(WebCore):
(CompactHTMLToken):

LayoutTests:

Added 3 new test cases:
1. Test the behavior of a plaintext tag inside an svg foreignObject. It applies to the remainder of the document. This behavior seems a little wonky, but it matches our current behavior and Firefox's behavior.
2. Test that we don't blindly go into HTML mode after </foreignObject>.
3. Test that unmatched </foreignObject>s are ignored.

* html5lib/resources/webkit02.dat:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@142829 268f45cc-cd09-0410-ab3c-d52691b4dbfc

LayoutTests/ChangeLog
LayoutTests/html5lib/resources/webkit02.dat
Source/WebCore/ChangeLog
Source/WebCore/html/parser/BackgroundHTMLParser.cpp
Source/WebCore/html/parser/BackgroundHTMLParser.h
Source/WebCore/html/parser/CompactHTMLToken.cpp
Source/WebCore/html/parser/CompactHTMLToken.h

index 34b1f1f..baedd2b 100644 (file)
@@ -1,3 +1,17 @@
+2013-02-13  Tony Gentilcore  <tonyg@chromium.org>
+
+        Fix svg/in-html/script-write.html with threaded HTML parser
+        https://bugs.webkit.org/show_bug.cgi?id=109495
+
+        Reviewed by Eric Seidel.
+
+        Added 3 new test cases:
+        1. Test the behavior of a plaintext tag inside an svg foreignObject. It applies to the remainder of the document. This behavior seems a little wonky, but it matches our current behavior and Firefox's behavior.
+        2. Test that we don't blindly go into HTML mode after </foreignObject>.
+        3. Test that unmatched </foreignObject>s are ignored.
+
+        * html5lib/resources/webkit02.dat:
+
 2013-02-13  Emil A Eklund  <eae@chromium.org>
 
         getComputedStyle returns truncated value for margin-right
index 905783d..87f685d 100644 (file)
@@ -157,3 +157,39 @@ table
 select
 #document
 | <option>
+
+#data
+<svg><foreignObject><div>foo</div><plaintext></foreignObject></svg><div>bar</div>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <svg svg>
+|       <svg foreignObject>
+|         <div>
+|           "foo"
+|         <plaintext>
+|           "</foreignObject></svg><div>bar</div>"
+
+#data
+<svg><foreignObject></foreignObject><title></svg>foo
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <svg svg>
+|       <svg foreignObject>
+|       <svg title>
+|     "foo"
+
+#data
+</foreignObject><plaintext><div>foo</div>
+#errors
+#document
+| <html>
+|   <head>
+|   <body>
+|     <plaintext>
+|       "<div>foo</div>"
index 4cd9c4d..9a608f0 100644 (file)
@@ -1,3 +1,32 @@
+2013-02-13  Tony Gentilcore  <tonyg@chromium.org>
+
+        Fix svg/in-html/script-write.html with threaded HTML parser
+        https://bugs.webkit.org/show_bug.cgi?id=109495
+
+        Reviewed by Eric Seidel.
+
+        This patch makes the background parser's simulateTreeBuilder() more realistic.
+        1. The HTMLTreeBuilder does not call the updateStateFor() setState()s when in foreign content mode so we shouldn't do it when simulating the tree builder.
+        2. HTMLTreeBuilder::processTokenInForeignContent has a list of tags which exit foreign content mode. We need to respect those.
+        3. Support the <foreignObject> tag which enters and leaves foreign content mode.
+        4. The tree builder sets state to DataState upon a </script> tag when not in foreign content mode. We need to do the same.
+
+        This involved creating a namespace stack where we push upon entering each namespace and pop upon leaving.
+        We are in foreign content if the topmost namespace is SVG or MathML.
+
+        This fixes svg/in-html/script-write.html and likely others.
+
+        * html/parser/BackgroundHTMLParser.cpp:
+        (WebCore::BackgroundHTMLParser::simulateTreeBuilder):
+        * html/parser/BackgroundHTMLParser.h:
+        (BackgroundHTMLParser):
+        * html/parser/CompactHTMLToken.cpp:
+        (WebCore::CompactHTMLToken::getAttributeItem): Returns the attribute of the given name. Necessary to test for <font> attributes in simulateTreeBuilder.
+        (WebCore):
+        * html/parser/CompactHTMLToken.h:
+        (WebCore):
+        (CompactHTMLToken):
+
 2013-02-13  Andreas Kling  <akling@apple.com>
 
         Remove Element::getAttributeItem() overload that returned a mutable Attribute*.
index 304f541..bb8534b 100644 (file)
@@ -55,12 +55,62 @@ static void checkThatTokensAreSafeToSendToAnotherThread(const CompactHTMLTokenSt
 
 #endif
 
+static inline bool tokenExitsForeignContent(const CompactHTMLToken& token)
+{
+    // FIXME: This is copied from HTMLTreeBuilder::processTokenInForeignContent and changed to use threadSafeMatch.
+    const String& tagName = token.data();
+    return threadSafeMatch(tagName, bTag)
+        || threadSafeMatch(tagName, bigTag)
+        || threadSafeMatch(tagName, blockquoteTag)
+        || threadSafeMatch(tagName, bodyTag)
+        || threadSafeMatch(tagName, brTag)
+        || threadSafeMatch(tagName, centerTag)
+        || threadSafeMatch(tagName, codeTag)
+        || threadSafeMatch(tagName, ddTag)
+        || threadSafeMatch(tagName, divTag)
+        || threadSafeMatch(tagName, dlTag)
+        || threadSafeMatch(tagName, dtTag)
+        || threadSafeMatch(tagName, emTag)
+        || threadSafeMatch(tagName, embedTag)
+        || threadSafeMatch(tagName, h1Tag)
+        || threadSafeMatch(tagName, h2Tag)
+        || threadSafeMatch(tagName, h3Tag)
+        || threadSafeMatch(tagName, h4Tag)
+        || threadSafeMatch(tagName, h5Tag)
+        || threadSafeMatch(tagName, h6Tag)
+        || threadSafeMatch(tagName, headTag)
+        || threadSafeMatch(tagName, hrTag)
+        || threadSafeMatch(tagName, iTag)
+        || threadSafeMatch(tagName, imgTag)
+        || threadSafeMatch(tagName, liTag)
+        || threadSafeMatch(tagName, listingTag)
+        || threadSafeMatch(tagName, menuTag)
+        || threadSafeMatch(tagName, metaTag)
+        || threadSafeMatch(tagName, nobrTag)
+        || threadSafeMatch(tagName, olTag)
+        || threadSafeMatch(tagName, pTag)
+        || threadSafeMatch(tagName, preTag)
+        || threadSafeMatch(tagName, rubyTag)
+        || threadSafeMatch(tagName, sTag)
+        || threadSafeMatch(tagName, smallTag)
+        || threadSafeMatch(tagName, spanTag)
+        || threadSafeMatch(tagName, strongTag)
+        || threadSafeMatch(tagName, strikeTag)
+        || threadSafeMatch(tagName, subTag)
+        || threadSafeMatch(tagName, supTag)
+        || threadSafeMatch(tagName, tableTag)
+        || threadSafeMatch(tagName, ttTag)
+        || threadSafeMatch(tagName, uTag)
+        || threadSafeMatch(tagName, ulTag)
+        || threadSafeMatch(tagName, varTag)
+        || (threadSafeMatch(tagName, fontTag) && (token.getAttributeItem(colorAttr) || token.getAttributeItem(faceAttr) || token.getAttributeItem(sizeAttr)));
+}
+
 // FIXME: Tune this constant based on a benchmark. The current value was choosen arbitrarily.
 static const size_t pendingTokenLimit = 4000;
 
 BackgroundHTMLParser::BackgroundHTMLParser(PassRefPtr<WeakReference<BackgroundHTMLParser> > reference, const HTMLParserOptions& options, const WeakPtr<HTMLDocumentParser>& parser, PassOwnPtr<XSSAuditor> xssAuditor)
-    : m_inForeignContent(false)
-    , m_weakFactory(reference, this)
+    : m_weakFactory(reference, this)
     , m_token(adoptPtr(new HTMLToken))
     , m_tokenizer(HTMLTokenizer::create(options))
     , m_options(options)
@@ -68,6 +118,7 @@ BackgroundHTMLParser::BackgroundHTMLParser(PassRefPtr<WeakReference<BackgroundHT
     , m_pendingTokens(adoptPtr(new CompactHTMLTokenStream))
     , m_xssAuditor(xssAuditor)
 {
+    m_namespaceStack.append(HTML);
 }
 
 void BackgroundHTMLParser::append(const String& input)
@@ -115,36 +166,49 @@ bool BackgroundHTMLParser::simulateTreeBuilder(const CompactHTMLToken& token)
 {
     if (token.type() == HTMLToken::StartTag) {
         const String& tagName = token.data();
-        if (threadSafeMatch(tagName, SVGNames::svgTag)
-            || threadSafeMatch(tagName, MathMLNames::mathTag))
-            m_inForeignContent = true;
-
-        // FIXME: This is just a copy of Tokenizer::updateStateFor which uses threadSafeMatches.
-        if (threadSafeMatch(tagName, textareaTag) || threadSafeMatch(tagName, titleTag))
-            m_tokenizer->setState(HTMLTokenizer::RCDATAState);
-        else if (threadSafeMatch(tagName, plaintextTag))
-            m_tokenizer->setState(HTMLTokenizer::PLAINTEXTState);
-        else if (threadSafeMatch(tagName, scriptTag))
-            m_tokenizer->setState(HTMLTokenizer::ScriptDataState);
-        else if (threadSafeMatch(tagName, styleTag)
-            || threadSafeMatch(tagName, iframeTag)
-            || threadSafeMatch(tagName, xmpTag)
-            || (threadSafeMatch(tagName, noembedTag) && m_options.pluginsEnabled)
-            || threadSafeMatch(tagName, noframesTag)
-            || (threadSafeMatch(tagName, noscriptTag) && m_options.scriptEnabled))
-            m_tokenizer->setState(HTMLTokenizer::RAWTEXTState);
+        if (threadSafeMatch(tagName, SVGNames::svgTag))
+            m_namespaceStack.append(SVG);
+        if (threadSafeMatch(tagName, MathMLNames::mathTag))
+            m_namespaceStack.append(MathML);
+        if (inForeignContent() && tokenExitsForeignContent(token))
+            m_namespaceStack.removeLast();
+        // FIXME: Support tags that exit MathML.
+        if (m_namespaceStack.last() == SVG && equalIgnoringCase(tagName, SVGNames::foreignObjectTag.localName()))
+            m_namespaceStack.append(HTML);
+        if (!inForeignContent()) {
+            // FIXME: This is just a copy of Tokenizer::updateStateFor which uses threadSafeMatches.
+            if (threadSafeMatch(tagName, textareaTag) || threadSafeMatch(tagName, titleTag))
+                m_tokenizer->setState(HTMLTokenizer::RCDATAState);
+            else if (threadSafeMatch(tagName, plaintextTag))
+                m_tokenizer->setState(HTMLTokenizer::PLAINTEXTState);
+            else if (threadSafeMatch(tagName, scriptTag))
+                m_tokenizer->setState(HTMLTokenizer::ScriptDataState);
+            else if (threadSafeMatch(tagName, styleTag)
+                || threadSafeMatch(tagName, iframeTag)
+                || threadSafeMatch(tagName, xmpTag)
+                || (threadSafeMatch(tagName, noembedTag) && m_options.pluginsEnabled)
+                || threadSafeMatch(tagName, noframesTag)
+                || (threadSafeMatch(tagName, noscriptTag) && m_options.scriptEnabled))
+                m_tokenizer->setState(HTMLTokenizer::RAWTEXTState);
+        }
     }
 
     if (token.type() == HTMLToken::EndTag) {
         const String& tagName = token.data();
-        if (threadSafeMatch(tagName, SVGNames::svgTag) || threadSafeMatch(tagName, MathMLNames::mathTag))
-            m_inForeignContent = false;
-        if (threadSafeMatch(tagName, scriptTag))
+        // FIXME: Support tags that exit MathML.
+        if ((m_namespaceStack.last() == SVG && threadSafeMatch(tagName, SVGNames::svgTag))
+            || (m_namespaceStack.last() == MathML && threadSafeMatch(tagName, MathMLNames::mathTag))
+            || (m_namespaceStack.contains(SVG) && m_namespaceStack.last() == HTML && equalIgnoringCase(tagName, SVGNames::foreignObjectTag.localName())))
+            m_namespaceStack.removeLast();
+        if (threadSafeMatch(tagName, scriptTag)) {
+            if (!inForeignContent())
+                m_tokenizer->setState(HTMLTokenizer::DataState);
             return false;
+        }
     }
 
     // FIXME: Need to set setForceNullCharacterReplacement based on m_inForeignContent as well.
-    m_tokenizer->setShouldAllowCDATA(m_inForeignContent);
+    m_tokenizer->setShouldAllowCDATA(inForeignContent());
     return true;
 }
 
index 4f151a9..7a01148 100644 (file)
@@ -36,6 +36,7 @@
 #include "HTMLTokenizer.h"
 #include <wtf/PassOwnPtr.h>
 #include <wtf/RefPtr.h>
+#include <wtf/Vector.h>
 #include <wtf/WeakPtr.h>
 
 namespace WebCore {
@@ -69,6 +70,12 @@ public:
     void forcePlaintextForTextDocument();
 
 private:
+    enum Namespace {
+        HTML,
+        SVG,
+        MathML
+    };
+
     BackgroundHTMLParser(PassRefPtr<WeakReference<BackgroundHTMLParser> >, const HTMLParserOptions&, const WeakPtr<HTMLDocumentParser>&, PassOwnPtr<XSSAuditor>);
 
     void markEndOfFile();
@@ -76,8 +83,9 @@ private:
     bool simulateTreeBuilder(const CompactHTMLToken&);
 
     void sendTokensToMainThread();
+    bool inForeignContent() const { return m_namespaceStack.last() != HTML; }
 
-    bool m_inForeignContent; // FIXME: We need a stack of foreign content markers.
+    Vector<Namespace, 1> m_namespaceStack;
     WeakPtrFactory<BackgroundHTMLParser> m_weakFactory;
     BackgroundHTMLInputStream m_input;
     HTMLSourceTracker m_sourceTracker;
index 9c67ff0..3132d25 100644 (file)
@@ -29,7 +29,9 @@
 
 #include "CompactHTMLToken.h"
 
+#include "HTMLParserIdioms.h"
 #include "HTMLToken.h"
+#include "QualifiedName.h"
 #include "XSSAuditorDelegate.h"
 
 namespace WebCore {
@@ -100,6 +102,15 @@ CompactHTMLToken::CompactHTMLToken(const CompactHTMLToken& other)
         m_xssInfo = adoptPtr(new XSSInfo(*other.m_xssInfo));
 }
 
+const CompactAttribute* CompactHTMLToken::getAttributeItem(const QualifiedName& name) const
+{
+    for (unsigned i = 0; i < m_attributes.size(); ++i) {
+        if (threadSafeMatch(m_attributes.at(i).name(), name))
+            return &m_attributes.at(i);
+    }
+    return 0;
+}
+
 bool CompactHTMLToken::isSafeToSendToAnotherThread() const
 {
     for (Vector<CompactAttribute>::const_iterator it = m_attributes.begin(); it != m_attributes.end(); ++it) {
index b7c8569..5621042 100644 (file)
@@ -39,6 +39,7 @@
 
 namespace WebCore {
 
+class QualifiedName;
 class XSSInfo;
 
 class CompactAttribute {
@@ -69,6 +70,7 @@ public:
     bool selfClosing() const { return m_selfClosing; }
     bool isAll8BitData() const { return m_isAll8BitData; }
     const Vector<CompactAttribute>& attributes() const { return m_attributes; }
+    const CompactAttribute* getAttributeItem(const QualifiedName&) const;
     const TextPosition& textPosition() const { return m_textPosition; }
 
     // There is only 1 DOCTYPE token per document, so to avoid increasing the