WebKit should support HTML entities that expand to more than one character
author abarth@webkit.org <abarth@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 19 Dec 2011 18:30:20 +0000 (18:30 +0000)
committer abarth@webkit.org <abarth@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 19 Dec 2011 18:30:20 +0000 (18:30 +0000)
commit 464e9fc86381805845c29748dea3bcb471e23939
tree 73f0433ec07962bb001fdf137bcd73cc6ed6e55c
parent fa7bb1e828aa45175a72ed5bf3c168f4706d6396
WebKit should support HTML entities that expand to more than one character
https://bugs.webkit.org/show_bug.cgi?id=74826

Reviewed by Darin Adler.

Source/WebCore:

Tests: html5lib/runner.html

* html/parser/HTMLEntityNames.in:
    - Add missing HTML entities from HTML5 spec.  I'll sort this file
      in a followup patch.  (It's not quite sorted perfectly and
      sorting in this patch would introduce noise into the patch.)
* html/parser/HTMLEntityParser.cpp:
(WebCore::decodeNamedEntity):
    - convertToUTF16 always returns true, so make it return void instead.
    - Teach the entity parser that some entities expand to two characters.
* html/parser/HTMLEntityParser.h:
    - Add a warning that decodeNamedEntity is really a broken API.
    - This patch doesn't actually change any behavior of this API, but
      it does illustrate that the two callers of this API (the two XML
      parsers) really need to move to a more sensible API.
* html/parser/HTMLEntitySearch.cpp:
(WebCore::HTMLEntitySearch::HTMLEntitySearch):
(WebCore::HTMLEntitySearch::advance):
* html/parser/HTMLEntitySearch.h:
(WebCore::HTMLEntitySearch::fail):
    - Remove the concept of currentValue.  This isn't really used for
      anything and conflicts with the idea that entities can expand
      to more than one character.
* html/parser/HTMLEntityTable.h:
    - Add storage for two UChar32 values per entity.
* html/parser/create-html-entity-table:
(convert_value_to_int):
    - Teach this script to handle entities that expand to multiple
      Unicode characters.
* xml/parser/CharacterReferenceParserInlineMethods.h:
(WebCore::consumeCharacterReference):
    - Update this function now that convertToUTF16 returns void.
* xml/parser/XMLCharacterReferenceParser.cpp:
    - The XML version of convertToUTF16 also needs to return void to
      match the HTML signature.  (It used to return true all the time
      as well.)
* xml/parser/XMLTreeBuilder.cpp:
(WebCore::XMLTreeBuilder::processHTMLEntity):
    - Update this caller to use leftValue instead of value.  My sense is
      that this code is moderately broken today because it's using HTML
      entities in parsing XML.  I've added a FIXME.  This code is
      disabled in all builds, so I don't feel a big need to fix this
      issue in this patch.  We should either finish this project or
      delete this complexity from the project.
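
For illustration only, the core idea of the entries above (a table entry
that stores two code points, and a UTF-16 conversion that cannot fail and
therefore returns void) can be sketched as follows.  This is a minimal
standalone sketch, not the WebCore code: EntityEntry, firstValue,
secondValue, appendUTF16 and expandEntity are hypothetical names, and
std::u16string stands in for WebKit's own string and UChar32 types.

```cpp
#include <string>

// Hypothetical stand-in for an entity table row (not the real
// HTMLEntityTableEntry): room for up to two code points per entity.
struct EntityEntry {
    const char* name;
    char32_t firstValue;
    char32_t secondValue; // 0 when the entity expands to one code point
};

// Append one code point as UTF-16. Nothing here can fail, which is why a
// void return type is sufficient in this sketch.
void appendUTF16(char32_t codePoint, std::u16string& out)
{
    if (codePoint <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(codePoint));
        return;
    }
    // Code points above U+FFFF are encoded as a surrogate pair.
    codePoint -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
}

// Expand an entry into its one or two code points, encoded as UTF-16.
std::u16string expandEntity(const EntityEntry& entry)
{
    std::u16string out;
    appendUTF16(entry.firstValue, out);
    if (entry.secondValue)
        appendUTF16(entry.secondValue, out);
    return out;
}
```

With this shape, an entity such as NotEqualTilde; (U+2242 U+0338 in the
HTML5 entity list) expands to two UTF-16 code units, while a single code
point outside the BMP such as Afr; (U+1D504) also yields two code units,
but as one surrogate pair.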

LayoutTests:

Show test progression.

* html5lib/runner-expected.txt:
* platform/chromium/html5lib/runner-expected.txt:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@103246 268f45cc-cd09-0410-ab3c-d52691b4dbfc
14 files changed:
LayoutTests/ChangeLog
LayoutTests/html5lib/runner-expected.txt
LayoutTests/platform/chromium/html5lib/runner-expected.txt
Source/WebCore/ChangeLog
Source/WebCore/html/parser/HTMLEntityNames.in
Source/WebCore/html/parser/HTMLEntityParser.cpp
Source/WebCore/html/parser/HTMLEntityParser.h
Source/WebCore/html/parser/HTMLEntitySearch.cpp
Source/WebCore/html/parser/HTMLEntitySearch.h
Source/WebCore/html/parser/HTMLEntityTable.h
Source/WebCore/html/parser/create-html-entity-table
Source/WebCore/xml/parser/CharacterReferenceParserInlineMethods.h
Source/WebCore/xml/parser/XMLCharacterReferenceParser.cpp
Source/WebCore/xml/parser/XMLTreeBuilder.cpp