Cannot access images included in the content pasted from Microsoft Word
author rniwa@webkit.org <rniwa@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 16 Oct 2017 21:44:28 +0000 (21:44 +0000)
committer rniwa@webkit.org <rniwa@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 16 Oct 2017 21:44:28 +0000 (21:44 +0000)
commit 0764dca64a7de9d373d1ead8214787eae4198ccf
tree 72fdaf175c92a8a94acb3206822e8540be84e55e
parent de125f685da7174891fca1c6ff5faf7a9cef3a94
Cannot access images included in the content pasted from Microsoft Word
https://bugs.webkit.org/show_bug.cgi?id=124391
<rdar://problem/26862741>

Reviewed by Antti Koivisto.

Source/WebCore:

The bug is caused by the fact that Microsoft Word generates HTML content which references an image using a file URL.
Because websites don't have access to arbitrary file URLs, this prevents editors such as TinyMCE from saving
those images.

This patch fixes the problem by replacing file URLs for images and all other subresources in the web archive
generated by Microsoft Word with blob URLs, as r222839 did for RTF/RTFD and r222119 did for images.
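
As a rough illustration of the URL replacement described above, the following is a minimal sketch that operates on a plain markup string. The function name replaceFileURLsWithBlobURLs and the string-based approach are assumptions made for illustration only; they are not the WebCore implementation or its API.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a WebCore function): rewrites every occurrence of a
// known file URL in the markup to the blob URL created for that subresource.
std::string replaceFileURLsWithBlobURLs(std::string markup,
    const std::map<std::string, std::string>& blobURLForFileURL)
{
    // Process longer URLs first so that a URL which is a prefix of another
    // URL does not clobber the longer one.
    std::vector<std::pair<std::string, std::string>> entries(
        blobURLForFileURL.begin(), blobURLForFileURL.end());
    std::sort(entries.begin(), entries.end(),
        [](const std::pair<std::string, std::string>& a,
           const std::pair<std::string, std::string>& b) {
            return a.first.size() > b.first.size();
        });

    for (const auto& entry : entries) {
        const std::string& fileURL = entry.first;
        const std::string& blobURL = entry.second;
        if (fileURL.empty())
            continue;
        std::size_t position = 0;
        while ((position = markup.find(fileURL, position)) != std::string::npos) {
            markup.replace(position, fileURL.size(), blobURL);
            position += blobURL.size(); // Skip past the text just inserted.
        }
    }
    return markup;
}
```

A page that later reads this markup sees only blob URLs it is allowed to fetch, and no local file path.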

To avoid revealing privacy-sensitive information, such as the absolute local file path to the user's home directory,
which Microsoft Word and other applications on the system include in the web archive placed in the system pasteboard,
this patch also introduces a mechanism to sanitize the HTML content when it is read by DataTransfer's getData.

This patch also introduces sanitization for when HTML is written into the pasteboard, since other applications
on the system that are capable of processing web archives are not necessarily equipped to protect themselves and the
rest of the system from potentially dangerous JavaScript included in the web archive placed in the system pasteboard.

Finally, this patch expands the list of clipboard types that are exposed as "text/html" to the Web platform by
adding the capability to convert RTF, RTFD, and web archive into HTML markup by introducing WebContentMarkupReader,
a new subclass of PasteboardWebContentReader which creates HTML markup instead of a document fragment. Most of
the sanitization process happens in this new class, and it will be extended to WebContentReader to make pasting safer.
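
The WebCore entries below note that Document::originIdentifierForPasteboard uses the origin string and falls back to a UUID, and that WebContentMarkupReader::readHTML sanitizes only markup that comes from a different origin. A minimal sketch of that decision follows. The type DocumentSketch, both function names, the "null" check, and the "null:" prefix are assumptions for illustration and are not taken from the patch.

```cpp
#include <string>

// Hypothetical stand-in for a document; this is not WebCore's Document class.
struct DocumentSketch {
    std::string originString;     // e.g. "https://example.com", or "null" for an opaque origin
    std::string uniqueIdentifier; // stand-in for a per-document UUID
};

// Assumed behavior: use the origin string when it identifies the document,
// and otherwise fall back to an identifier derived from the UUID.
std::string originIdentifierSketch(const DocumentSketch& document)
{
    if (document.originString != "null")
        return document.originString;
    return "null:" + document.uniqueIdentifier;
}

// Markup is sanitized only when the origin recorded with the pasteboard
// content differs from the origin of the document reading it.
bool needsSanitizing(const std::string& pasteboardOrigin, const DocumentSketch& reader)
{
    return pasteboardOrigin != originIdentifierSketch(reader);
}
```

Under these assumptions, two distinct null-origin documents never compare equal, so content copied in one is treated as cross-origin when read in the other.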

Tests: editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url.html
       editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin.html
       editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying.html
       editing/pasteboard/data-transfer-set-data-sanitize-html-when-dragging-in-null-origin.html
       http/tests/security/clipboard/copy-paste-html-across-origin-sanitizes-html.html
       CopyHTML.Sanitizes
       DataInteractionTests.DataTransferSanitizeHTML
       PasteRTF.ExposesHTMLTypeInDataTransfer
       PasteRTFD.ExposesHTMLTypeInDataTransfer
       PasteRTFD.ImageElementUsesBlobURLInHTML
       PasteWebArchive.ExposesHTMLTypeInDataTransfer

* dom/DataTransfer.cpp:
(WebCore::originIdentifierForDocument): Moved to Document::originIdentifierForPasteboard.
(WebCore::DataTransfer::createForCopyAndPaste):
(WebCore::DataTransfer::getDataForItem const): Use WebContentMarkupReader to read HTML content so that we can read
web archive, RTF, and RTFD as text/html.
(WebCore::DataTransfer::getData const):
(WebCore::DataTransfer::setData):
(WebCore::DataTransfer::setDataFromItemList): Sanitize the HTML before placing into the system pasteboard.
(WebCore::DataTransfer::createForDragStartEvent):
(WebCore::DataTransfer::createForDrop):
(WebCore::DataTransfer::createForUpdatingDropTarget):
* dom/DataTransfer.h:
* dom/DataTransfer.idl:
* dom/DataTransferItem.cpp:
(WebCore::DataTransferItem::getAsString const):
* dom/Document.cpp:
(WebCore::Document::originIdentifierForPasteboard): Renamed from uniqueIdentifier. Moved the code that uses the origin
string and then falls back to the UUID here from originIdentifierForDocument in DataTransfer.cpp.
* dom/Document.h:
* editing/WebContentReader.cpp:
(WebCore::WebContentMarkupReader::shouldSanitize const): Added.
* editing/WebContentReader.h:
(WebCore::WebContentMarkupReader): Added.
(WebCore::WebContentMarkupReader::WebContentMarkupReader):
* editing/cocoa/WebContentReaderCocoa.mm:
(WebCore::createFragmentFromWebArchive): Extracted out of WebContentReader::readWebArchive to share code.
(WebCore::WebContentReader::readWebArchive):
(WebCore::WebContentMarkupReader::readWebArchive): Added. Reads the web archive, replaces all subresource URLs with
blob URLs, and regenerates the markup using our copy & paste code. The last step is required to strip away any privacy
sensitive information as well as potentially dangerous JavaScript code.
(WebCore::stripMicrosoftPrefix): Extracted out of WebContentReader::readHTML to share code.
(WebCore::WebContentReader::readHTML):
(WebCore::WebContentMarkupReader::readHTML): Added. Only sanitize the markup when it comes from a different origin.
(WebCore::WebContentReader::readRTFD): Added a nullity check for frame.document().
(WebCore::WebContentMarkupReader::readRTFD): Added.
(WebCore::WebContentMarkupReader::readRTF): Added.
* editing/markup.h:
* editing/markup.cpp:
(WebCore::createPageForSanitizingWebContent): Added.
(WebCore::sanitizeMarkup): Added. This function "pastes" the markup into a new isolated document and then reserializes
it using our serialization code for copy. It strips away all invisible information such as comments, and strips away
event handlers and script elements to remove potentially dangerous scripts.
* platform/Pasteboard.h:
* platform/ios/PasteboardIOS.mm:
(WebCore::Pasteboard::readPasteboardWebContentDataForType): Now that this code can be called by DataTransfer, added
checks for the change count to make sure we stop letting web content read if the pasteboard has been changed by
some other application. To do this, turned this function into a member of Pasteboard. Also changed the return type
to a tri-state enum so that call sites can exit the loop early.
(WebCore::Pasteboard::read):
(WebCore::Pasteboard::readRespectingUTIFidelities):
* platform/ios/PlatformPasteboardIOS.mm:
(WebCore::safeTypeForDOMToReadAndWriteForPlatformType): Treat RTF, RTFD, and web archive as HTML.
* platform/mac/PasteboardMac.mm:
(WebCore::Pasteboard::read): Add the change count checks now that this code can be called by DataTransfer.
* platform/mac/PlatformPasteboardMac.mm:
(WebCore::safeTypeForDOMToReadAndWriteForPlatformType): Treat RTF, RTFD, and web archive as HTML.
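
The change count check and tri-state return value described for Pasteboard::readPasteboardWebContentDataForType above can be pictured with the following minimal sketch. The enum, its value names, PasteboardSketch, and both functions are hypothetical stand-ins chosen for illustration; they are not the declarations in Pasteboard.h.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical tri-state result: the type was read, the type was not
// readable, or another application changed the pasteboard in the meantime.
enum class ReadResultSketch { ReadType, DidNotReadType, PasteboardChanged };

// Hypothetical stand-in for the system pasteboard.
struct PasteboardSketch {
    long changeCount;               // bumped whenever any application writes
    std::vector<std::string> types; // types available, in preference order
};

ReadResultSketch readTypeSketch(const PasteboardSketch& pasteboard,
    long expectedChangeCount, const std::string& type,
    const std::function<bool(const std::string&)>& reader)
{
    // Refuse to read once the pasteboard no longer holds the content that
    // the web content was originally allowed to see.
    if (pasteboard.changeCount != expectedChangeCount)
        return ReadResultSketch::PasteboardChanged;
    return reader(type) ? ReadResultSketch::ReadType : ReadResultSketch::DidNotReadType;
}

// Call site: stop at the first type that was read, and also stop early when
// the pasteboard was changed, which a plain bool could not distinguish from
// "this type was not readable, try the next one".
bool readFirstSupportedType(const PasteboardSketch& pasteboard,
    long expectedChangeCount,
    const std::function<bool(const std::string&)>& reader)
{
    for (const auto& type : pasteboard.types) {
        auto result = readTypeSketch(pasteboard, expectedChangeCount, type, reader);
        if (result == ReadResultSketch::PasteboardChanged)
            return false;
        if (result == ReadResultSketch::ReadType)
            return true;
    }
    return false;
}
```

The third state is what lets the loop distinguish "try the next type" from "stop reading entirely".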

Tools:

Added tests for sanitizing HTML contents for copy & paste and drag & drop.

* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm: Added.
(readHTMLFromPasteboard): Added.
(createWebViewWithCustomPasteboardDataEnabled): Added.
(CopyHTML.Sanitizes): Added.

* TestWebKitAPI/Tests/WebKitCocoa/CopyURL.mm:
(createWebViewWithCustomPasteboardDataEnabled): Added to enable more tests on bots.

* TestWebKitAPI/Tests/WebKitCocoa/PasteRTFD.mm:
(writeRTFToPasteboard): Added.
(createWebViewWithCustomPasteboardDataEnabled): Added.
(createHelloWorldString): Added.
(PasteRTF.ExposesHTMLTypeInDataTransfer): Added.
(PasteRTFD.ExposesHTMLTypeInDataTransfer): Added.
(PasteRTFD.ImageElementUsesBlobURLInHTML): Added.

* TestWebKitAPI/Tests/WebKitCocoa/copy-html.html: Added.
* TestWebKitAPI/Tests/WebKitCocoa/paste-rtfd.html: Store the clipboardData contents for
PasteRTF.ExposesHTMLTypeInDataTransfer and PasteRTFD.ExposesHTMLTypeInDataTransfer.

* TestWebKitAPI/Tests/ios/DataInteractionTests.mm:
(DataInteractionTests.DataTransferSanitizeHTML):

LayoutTests:

Added tests for copying & pasting and dragging & dropping HTML contents.

* TestExpectations:
* editing/pasteboard/data-transfer-get-data-on-drop-rich-text-expected.txt: Rebaselined.
* editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt: Ditto.
* editing/pasteboard/data-transfer-get-data-on-paste-rich-text.html: Modified the test to strip away platform specific
inline style properties.
* editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url-expected.txt: Added.
* editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url.html: Added.
* editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-expected.txt: Added.
* editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin-expected.txt: Added.
* editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin.html: Added.
* editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying.html: Added.
* editing/pasteboard/data-transfer-set-data-sanitize-html-when-dragging-in-null-origin-expected.txt: Added.
* editing/pasteboard/data-transfer-set-data-sanitize-html-when-dragging-in-null-origin.html: Added.
* editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin.html: Removed the superfluous
call to setTimeout that was erroneously added during debugging. Also updated the test to not claim all URL and
HTML values are read in the same origin, and updated the assertion for the cross-origin case as it's now sanitized.
* editing/pasteboard/onpaste-text-html-expected.txt: Rebaselined. The order of CSS properties has changed.
* http/tests/security/clipboard/copy-paste-html-across-origin-sanitizes-html-expected.txt: Added.
* http/tests/security/clipboard/copy-paste-html-across-origin-sanitizes-html.html: Added.
* http/tests/security/clipboard/copy-paste-url-across-origin-sanitizes-url.html:
* http/tests/security/clipboard/resources/copy-html.html: Added.
* http/tests/security/clipboard/resources/copy-url.html: Renamed from copy.html.
* platform/ios-wk2/editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt: Removed.
* platform/ios-wk1/editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt: Removed.
* platform/mac-wk1/TestExpectations:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@223440 268f45cc-cd09-0410-ab3c-d52691b4dbfc
54 files changed:
LayoutTests/ChangeLog
LayoutTests/TestExpectations
LayoutTests/editing/pasteboard/data-transfer-get-data-on-drop-rich-text-expected.txt
LayoutTests/editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt
LayoutTests/editing/pasteboard/data-transfer-get-data-on-paste-rich-text.html
LayoutTests/editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url-expected.txt [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url.html [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitize-html-when-dragging-in-null-origin-expected.txt [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitize-html-when-dragging-in-null-origin.html [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin-expected.txt
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin.html
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-expected.txt [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin-expected.txt [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin.html [new file with mode: 0644]
LayoutTests/editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying.html [new file with mode: 0644]
LayoutTests/editing/pasteboard/onpaste-text-html-expected.txt
LayoutTests/fast/events/ondrop-text-html-expected.txt
LayoutTests/http/tests/security/clipboard/copy-paste-html-across-origin-sanitizes-html-expected.txt [new file with mode: 0644]
LayoutTests/http/tests/security/clipboard/copy-paste-html-across-origin-sanitizes-html.html [new file with mode: 0644]
LayoutTests/http/tests/security/clipboard/copy-paste-url-across-origin-sanitizes-url.html
LayoutTests/http/tests/security/clipboard/resources/copy-html.html [new file with mode: 0644]
LayoutTests/http/tests/security/clipboard/resources/copy-url.html [moved from LayoutTests/http/tests/security/clipboard/resources/copy.html with 100% similarity]
LayoutTests/platform/ios-wk1/editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt [deleted file]
LayoutTests/platform/ios-wk2/editing/pasteboard/data-transfer-get-data-on-paste-rich-text-expected.txt [deleted file]
LayoutTests/platform/mac-wk1/TestExpectations
LayoutTests/platform/win/TestExpectations
Source/WebCore/ChangeLog
Source/WebCore/dom/DataTransfer.cpp
Source/WebCore/dom/DataTransfer.h
Source/WebCore/dom/DataTransfer.idl
Source/WebCore/dom/DataTransferItem.cpp
Source/WebCore/dom/DataTransferItem.h
Source/WebCore/dom/DataTransferItem.idl
Source/WebCore/dom/Document.cpp
Source/WebCore/dom/Document.h
Source/WebCore/editing/WebContentReader.cpp
Source/WebCore/editing/WebContentReader.h
Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm
Source/WebCore/editing/markup.cpp
Source/WebCore/editing/markup.h
Source/WebCore/platform/Pasteboard.h
Source/WebCore/platform/ios/PasteboardIOS.mm
Source/WebCore/platform/ios/PlatformPasteboardIOS.mm
Source/WebCore/platform/mac/PasteboardMac.mm
Source/WebCore/platform/mac/PlatformPasteboardMac.mm
Tools/ChangeLog
Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj
Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyHTML.mm [new file with mode: 0644]
Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyURL.mm
Tools/TestWebKitAPI/Tests/WebKitCocoa/PasteRTFD.mm
Tools/TestWebKitAPI/Tests/WebKitCocoa/PasteWebArchive.mm [new file with mode: 0644]
Tools/TestWebKitAPI/Tests/WebKitCocoa/copy-html.html [new file with mode: 0644]
Tools/TestWebKitAPI/Tests/WebKitCocoa/paste-rtfd.html
Tools/TestWebKitAPI/Tests/ios/DataInteractionTests.mm