[Cocoa] LastResort in the font family list causes emoji with joiners to be rendered...
authormmaxfield@apple.com <mmaxfield@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 2 Jul 2018 01:36:43 +0000 (01:36 +0000)
committermmaxfield@apple.com <mmaxfield@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 2 Jul 2018 01:36:43 +0000 (01:36 +0000)
https://bugs.webkit.org/show_bug.cgi?id=187209
<rdar://problem/40920785>

Reviewed by Darin Adler.

Source/WebCore:

Inside our complex text codepath, we perform our own font fallback, which
includes a function that asks "can this font support this grapheme cluster?"
Because of the mechanics of how fonts work, the implementation of this
function is "Does the font's cmap table support every character of the
cluster?" We were using Font::glyphForCharacter() to determine this; however,
this function maps certain control characters to the zero width space
character (with the intention that these control characters shouldn't be
visible in the fast text codepath). That replacement, however, was causing
us to get false negatives, because Apple Color Emoji doesn't support zero
width space. Therefore, Apple Color Emoji was looking like it didn't support
emoji combining sequences.

The best solution to this would be to get Font::glyphForCharacter() to stop
performing these replacements (see https://bugs.webkit.org/show_bug.cgi?id=187166).
However, that is too risky of a change to be making right now. Instead,
a more localized solution is to implement a version of "Does the font's cmap
table support every character of the cluster" that doesn't perform the
substitutions. This patch does exactly that, and uses a bit vector to cache
the results. In order to not have a giant bit vector, we take the old code
path if we know the substitutions won't affect us (and uses ASSERT()s to
validate this) so the bit vector only holds at maximum 3 words of storage.

Test: fast/text/emoji-with-joiner.html

* platform/graphics/Font.cpp:
(WebCore::codePointSupportIndex):
(WebCore::createAndFillGlyphPage):
(WebCore::Font::platformSupportsCodePoint const):
(WebCore::Font::supportsCodePoint const):
(WebCore::Font::canRenderCombiningCharacterSequence const):
* platform/graphics/Font.h:
* platform/graphics/cocoa/FontCocoa.mm:
(WebCore::Font::platformSupportsCodePoint const):

Source/WTF:

* wtf/unicode/CharacterNames.h:

LayoutTests:

* fast/text/emoji-with-joiner-expected.txt: Added.
* fast/text/emoji-with-joiner.html: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@233413 268f45cc-cd09-0410-ab3c-d52691b4dbfc

LayoutTests/ChangeLog
LayoutTests/fast/text/emoji-with-joiner-expected.txt [new file with mode: 0644]
LayoutTests/fast/text/emoji-with-joiner.html [new file with mode: 0644]
Source/WTF/ChangeLog
Source/WTF/wtf/unicode/CharacterNames.h
Source/WebCore/ChangeLog
Source/WebCore/platform/graphics/Font.cpp
Source/WebCore/platform/graphics/Font.h
Source/WebCore/platform/graphics/cocoa/FontCocoa.mm

index 933f424..7680207 100644 (file)
@@ -1,3 +1,14 @@
+2018-07-01  Myles C. Maxfield  <mmaxfield@apple.com>
+
+        [Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters
+        https://bugs.webkit.org/show_bug.cgi?id=187209
+        <rdar://problem/40920785>
+
+        Reviewed by Darin Adler.
+
+        * fast/text/emoji-with-joiner-expected.txt: Added.
+        * fast/text/emoji-with-joiner.html: Added.
+
 2018-07-01  Wenson Hsieh  <wenson_hsieh@apple.com>
 
         [macOS] Text replacements that end with symbols are expanded immediately
diff --git a/LayoutTests/fast/text/emoji-with-joiner-expected.txt b/LayoutTests/fast/text/emoji-with-joiner-expected.txt
new file mode 100644 (file)
index 0000000..fe1c6f5
--- /dev/null
@@ -0,0 +1,10 @@
+PASS element.offsetWidth is 48
+PASS element.offsetWidth is 48
+PASS element.offsetWidth is 48
+PASS element.offsetWidth is 48
+PASS element.offsetWidth is 48
+PASS element.offsetWidth is 48
+PASS successfullyParsed is true
+
+TEST COMPLETE
+👱‍♂️ 👱🏻‍♂️ 👱🏼‍♂️ 👱🏽‍♂️ 👱🏾‍♂️ 👱🏿‍♂️
diff --git a/LayoutTests/fast/text/emoji-with-joiner.html b/LayoutTests/fast/text/emoji-with-joiner.html
new file mode 100644 (file)
index 0000000..8527164
--- /dev/null
@@ -0,0 +1,28 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script src="../../resources/js-test-pre.js"></script>
+<style>
+.ib {
+    font: 48px 'Apple Color Emoji', 'LastResort';
+    display: inline-block;
+}
+</style>
+</head>
+<body>
+<div class="ib">&#x1F471;&zwj;&#x2642;&#xFE0F;</div>
+<div class="ib">&#x1F471;&#x1F3FB;&zwj;&#x2642;&#xFE0F;</div>
+<div class="ib">&#x1F471;&#x1F3FC;&zwj;&#x2642;&#xFE0F;</div>
+<div class="ib">&#x1F471;&#x1F3FD;&zwj;&#x2642;&#xFE0F;</div>
+<div class="ib">&#x1F471;&#x1F3FE;&zwj;&#x2642;&#xFE0F;</div>
+<div class="ib">&#x1F471;&#x1F3FF;&zwj;&#x2642;&#xFE0F;</div>
+<script>
+var elements = document.querySelectorAll(".ib");
+for (var element of elements) {
+    shouldBe("element.offsetWidth", "48");
+}
+</script>
+<script src="../../resources/js-test-post.js"></script>
+</body>
+</html>
+
index 9e0dfe6..8c32440 100644 (file)
@@ -1,3 +1,13 @@
+2018-07-01  Myles C. Maxfield  <mmaxfield@apple.com>
+
+        [Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters
+        https://bugs.webkit.org/show_bug.cgi?id=187209
+        <rdar://problem/40920785>
+
+        Reviewed by Darin Adler.
+
+        * wtf/unicode/CharacterNames.h:
+
 2018-06-23  Darin Adler  <darin@apple.com>
 
         [Cocoa] Improve ARC compatibility of more code in JavaScriptCore
index eb18792..f80758f 100644 (file)
@@ -87,6 +87,7 @@ const UChar sesameDot = 0xFE45;
 const UChar smallLetterSharpS = 0x00DF;
 const UChar softHyphen = 0x00AD;
 const UChar space = 0x0020;
+const UChar tabCharacter = 0x0009;
 const UChar tibetanMarkDelimiterTshegBstar = 0x0F0C;
 const UChar tibetanMarkIntersyllabicTsheg = 0x0F0B;
 const UChar32 ugariticWordDivider = 0x1039F;
@@ -152,6 +153,7 @@ using WTF::Unicode::rightToLeftOverride;
 using WTF::Unicode::sesameDot;
 using WTF::Unicode::softHyphen;
 using WTF::Unicode::space;
+using WTF::Unicode::tabCharacter;
 using WTF::Unicode::tibetanMarkDelimiterTshegBstar;
 using WTF::Unicode::tibetanMarkIntersyllabicTsheg;
 using WTF::Unicode::ugariticWordDivider;
index 41bd726..8909282 100644 (file)
@@ -1,3 +1,45 @@
+2018-07-01  Myles C. Maxfield  <mmaxfield@apple.com>
+
+        [Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters
+        https://bugs.webkit.org/show_bug.cgi?id=187209
+        <rdar://problem/40920785>
+
+        Reviewed by Darin Adler.
+
+        Inside our complex text codepath, we perform our own font fallback, which
+        includes a function that asks "can this font support this grapheme cluster?"
+        Because of the mechanics of how fonts work, the implementation of this
+        function is "Does the font's cmap table support every character of the
+        cluster?" We were using Font::glyphForCharacter() to determine this; however,
+        this function maps certain control characters to the zero width space
+        character (with the intention that these control characters shouldn't be
+        visible in the fast text codepath). That replacement, however, was causing
+        us to get false negatives, because Apple Color Emoji doesn't support zero
+        width space. Therefore, Apple Color Emoji was looking like it didn't support
+        emoji combining sequences.
+
+        The best solution to this would be to get Font::glyphForCharacter() to stop
+        performing these replacements (see https://bugs.webkit.org/show_bug.cgi?id=187166).
+        However, that is too risky of a change to be making right now. Instead,
+        a more localized solution is to implement a version of "Does the font's cmap
+        table support every character of the cluster" that doesn't perform the
+        substitutions. This patch does exactly that, and uses a bit vector to cache
+        the results. In order to not have a giant bit vector, we take the old code
+        path if we know the substitutions won't affect us (and uses ASSERT()s to 
+        validate this) so the bit vector only holds at maximum 3 words of storage.
+
+        Test: fast/text/emoji-with-joiner.html
+
+        * platform/graphics/Font.cpp:
+        (WebCore::codePointSupportIndex):
+        (WebCore::createAndFillGlyphPage):
+        (WebCore::Font::platformSupportsCodePoint const):
+        (WebCore::Font::supportsCodePoint const):
+        (WebCore::Font::canRenderCombiningCharacterSequence const):
+        * platform/graphics/Font.h:
+        * platform/graphics/cocoa/FontCocoa.mm:
+        (WebCore::Font::platformSupportsCodePoint const):
+
 2018-07-01  Wenson Hsieh  <wenson_hsieh@apple.com>
 
         [macOS] Text replacements that end with symbols are expanded immediately
index 2cdae74..52a3482 100644 (file)
@@ -151,6 +151,121 @@ static bool fillGlyphPage(GlyphPage& pageToFill, UChar* buffer, unsigned bufferL
     return hasGlyphs;
 }
 
+static std::optional<size_t> codePointSupportIndex(UChar32 codePoint)
+{
+    // FIXME: Consider reordering these so the most common ones are at the front.
+    // Doing this could cause the BitVector to fit inside inline storage and therefore
+    // be both a performance and a memory progression.
+    if (codePoint < 0x20)
+        return codePoint;
+    if (codePoint >= 0x7F && codePoint < 0xA0)
+        return codePoint - 0x7F + 0x20;
+    std::optional<size_t> result;
+    switch (codePoint) {
+    case softHyphen:
+        result = 0x41;
+        break;
+    case newlineCharacter:
+        result = 0x42;
+        break;
+    case tabCharacter:
+        result = 0x43;
+        break;
+    case noBreakSpace:
+        result = 0x44;
+        break;
+    case narrowNoBreakSpace:
+        result = 0x45;
+        break;
+    case leftToRightMark:
+        result = 0x46;
+        break;
+    case rightToLeftMark:
+        result = 0x47;
+        break;
+    case leftToRightEmbed:
+        result = 0x48;
+        break;
+    case rightToLeftEmbed:
+        result = 0x49;
+        break;
+    case leftToRightOverride:
+        result = 0x4A;
+        break;
+    case rightToLeftOverride:
+        result = 0x4B;
+        break;
+    case leftToRightIsolate:
+        result = 0x4C;
+        break;
+    case rightToLeftIsolate:
+        result = 0x4D;
+        break;
+    case zeroWidthNonJoiner:
+        result = 0x4E;
+        break;
+    case zeroWidthJoiner:
+        result = 0x4F;
+        break;
+    case popDirectionalFormatting:
+        result = 0x50;
+        break;
+    case popDirectionalIsolate:
+        result = 0x51;
+        break;
+    case firstStrongIsolate:
+        result = 0x52;
+        break;
+    case objectReplacementCharacter:
+        result = 0x53;
+        break;
+    case zeroWidthNoBreakSpace:
+        result = 0x54;
+        break;
+    default:
+        result = std::nullopt;
+    }
+
+#ifndef NDEBUG
+    UChar32 codePointOrder[] = {
+        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
+        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
+        0x7F,
+        0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8D, 0x8E, 0x8F,
+        0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F,
+        softHyphen,
+        newlineCharacter,
+        tabCharacter,
+        noBreakSpace,
+        narrowNoBreakSpace,
+        leftToRightMark,
+        rightToLeftMark,
+        leftToRightEmbed,
+        rightToLeftEmbed,
+        leftToRightOverride,
+        rightToLeftOverride,
+        leftToRightIsolate,
+        rightToLeftIsolate,
+        zeroWidthNonJoiner,
+        zeroWidthJoiner,
+        popDirectionalFormatting,
+        popDirectionalIsolate,
+        firstStrongIsolate,
+        objectReplacementCharacter,
+        zeroWidthNoBreakSpace
+    };
+    bool found = false;
+    for (size_t i = 0; i < WTF_ARRAY_LENGTH(codePointOrder); ++i) {
+        if (codePointOrder[i] == codePoint) {
+            ASSERT(i == result);
+            found = true;
+        }
+    }
+    ASSERT(found == static_cast<bool>(result));
+#endif
+    return result;
+}
+
 static RefPtr<GlyphPage> createAndFillGlyphPage(unsigned pageNumber, const Font& font)
 {
 #if PLATFORM(IOS)
@@ -177,11 +292,14 @@ static RefPtr<GlyphPage> createAndFillGlyphPage(unsigned pageNumber, const Font&
         auto overwriteCodePoints = [&](unsigned minimum, unsigned maximum, UChar newCodePoint) {
             unsigned begin = std::max(start, minimum);
             unsigned complete = std::min(end, maximum);
-            for (unsigned i = begin; i < complete; ++i)
+            for (unsigned i = begin; i < complete; ++i) {
+                ASSERT(codePointSupportIndex(i));
                 buffer[i - start] = newCodePoint;
+            }
         };
 
         auto overwriteCodePoint = [&](UChar codePoint, UChar newCodePoint) {
+            ASSERT(codePointSupportIndex(codePoint));
             if (codePoint >= start && codePoint < end)
                 buffer[codePoint - start] = newCodePoint;
         };
@@ -511,14 +629,39 @@ bool Font::variantCapsSupportsCharacterForSynthesis(FontVariantCaps fontVariantC
         return true;
     }
 }
+
+bool Font::platformSupportsCodePoint(UChar32 character) const
+{
+    return glyphForCharacter(character);
+}
 #endif
 
+bool Font::supportsCodePoint(UChar32 character) const
+{
+    // This is very similar to static_cast<bool>(glyphForCharacter(character))
+    // except that glyphForCharacter() maps certain code points to ZWS (because they
+    // shouldn't be visible). This function doesn't do that mapping, and instead is
+    // as honest as possible about what code points the font supports. This is so
+    // that we can accurately determine which characters are supported by this font
+    // so we know which boundaries to break strings when we send them to the complex
+    // text codepath. The complex text codepath is totally separate from this ZWS
+    // replacement logic (because CoreText handles those characters instead of WebKit).
+    if (auto index = codePointSupportIndex(character)) {
+        m_codePointSupport.ensureSize(2 * (*index + 1));
+        bool hasBeenSet = m_codePointSupport.quickSet(2 * *index);
+        if (!hasBeenSet && platformSupportsCodePoint(character))
+            m_codePointSupport.quickSet(2 * *index + 1);
+        return m_codePointSupport.quickGet(2 * *index + 1);
+    }
+    return glyphForCharacter(character);
+}
+
 bool Font::canRenderCombiningCharacterSequence(const UChar* characters, size_t length) const
 {
     ASSERT(isMainThread());
 
     for (UChar32 codePoint : StringView(characters, length).codePoints()) {
-        if (!glyphForCharacter(codePoint))
+        if (!supportsCodePoint(codePoint))
             return false;
     }
     return true;
index 73cdf2a..12a72fd 100644 (file)
@@ -171,6 +171,8 @@ public:
 
     GlyphData glyphDataForCharacter(UChar32) const;
     Glyph glyphForCharacter(UChar32) const;
+    bool supportsCodePoint(UChar32) const;
+    bool platformSupportsCodePoint(UChar32) const;
 
     RefPtr<Font> systemFallbackFontForCharacter(UChar32, const FontDescription&, bool isForPlatformFont) const;
 
@@ -251,6 +253,7 @@ private:
     mutable HashMap<unsigned, RefPtr<GlyphPage>> m_glyphPages;
     mutable std::unique_ptr<GlyphMetricsMap<FloatRect>> m_glyphToBoundsMap;
     mutable GlyphMetricsMap<float> m_glyphToWidthMap;
+    mutable BitVector m_codePointSupport;
 
     mutable RefPtr<OpenTypeMathData> m_mathData;
 #if ENABLE(OPENTYPE_VERTICAL)
index 2239a83..0938738 100644 (file)
@@ -597,4 +597,13 @@ float Font::platformWidthForGlyph(Glyph glyph) const
     return advance.width + m_syntheticBoldOffset;
 }
 
+bool Font::platformSupportsCodePoint(UChar32 character) const
+{
+    UniChar codeUnits[2];
+    CGGlyph glyphs[2];
+    CFIndex count = 0;
+    U16_APPEND_UNSAFE(codeUnits, count, character);
+    return CTFontGetGlyphsForCharacters(platformData().ctFont(), codeUnits, glyphs, count);
+}
+
 } // namespace WebCore