[XSS Auditor] Extract attribute truncation logic and formalize string canonicalization
authordbates@webkit.org <dbates@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 14 Jan 2016 21:40:13 +0000 (21:40 +0000)
committerdbates@webkit.org <dbates@webkit.org@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 14 Jan 2016 21:40:13 +0000 (21:40 +0000)
https://bugs.webkit.org/show_bug.cgi?id=152874

Reviewed by Brent Fulgham.

Derived from Blink patch (by Tom Sepez <tsepez@chromium.org>):
<https://src.chromium.org/viewvc/blink?revision=176339&view=revision>

Extract the src-like and script-like attribute truncation logic into independent functions
towards making it more straightforward to re-purpose this logic. Additionally, formalize the
concept of string canonicalization as a member function that consolidates the process of
decoding URL escape sequences, truncating the decoded string (if applicable), and removing
characters that are considered noise.

* html/parser/XSSAuditor.cpp:
(WebCore::truncateForSrcLikeAttribute): Extracted from XSSAuditor::decodedSnippetForAttribute().
(WebCore::truncateForScriptLikeAttribute): Ditto.
(WebCore::XSSAuditor::init): Write in terms of XSSAuditor::canonicalize().
(WebCore::XSSAuditor::filterCharacterToken): Updated to make use of formalized canonicalization methods.
(WebCore::XSSAuditor::filterScriptToken): Ditto.
(WebCore::XSSAuditor::filterObjectToken): Ditto.
(WebCore::XSSAuditor::filterParamToken): Ditto.
(WebCore::XSSAuditor::filterEmbedToken): Ditto.
(WebCore::XSSAuditor::filterAppletToken): Ditto.
(WebCore::XSSAuditor::filterFrameToken): Ditto.
(WebCore::XSSAuditor::filterInputToken): Ditto.
(WebCore::XSSAuditor::filterButtonToken): Ditto.
(WebCore::XSSAuditor::eraseDangerousAttributesIfInjected): Ditto.
(WebCore::XSSAuditor::eraseAttributeIfInjected): Updated code to use early return style and avoid an unnecessary string
comparison when we know that a src attribute was injected.
(WebCore::XSSAuditor::canonicalizedSnippetForTagName): Renamed; formerly known as XSSAuditor::decodedSnippetForName(). Updated
to make use of XSSAuditor::canonicalize().
(WebCore::XSSAuditor::snippetFromAttribute): Renamed; formerly known as XSSAuditor::decodedSnippetForAttribute(). Moved
truncation logic from here to WebCore::truncateFor{Script, Src}LikeAttribute.
(WebCore::XSSAuditor::canonicalize): Added.
(WebCore::XSSAuditor::canonicalizedSnippetForJavaScript): Added.
(WebCore::canonicalize): Deleted.
(WebCore::XSSAuditor::decodedSnippetForName): Deleted.
(WebCore::XSSAuditor::decodedSnippetForAttribute): Deleted.
(WebCore::XSSAuditor::decodedSnippetForJavaScript): Deleted.
* html/parser/XSSAuditor.h: Define enum class for the various attribute truncation styles.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@195074 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Source/WebCore/ChangeLog
Source/WebCore/html/parser/XSSAuditor.cpp
Source/WebCore/html/parser/XSSAuditor.h

index 1b5a43f..104bf52 100644 (file)
@@ -1,5 +1,49 @@
 2016-01-14  Daniel Bates  <dabates@apple.com>
 
+        [XSS Auditor] Extract attribute truncation logic and formalize string canonicalization
+        https://bugs.webkit.org/show_bug.cgi?id=152874
+
+        Reviewed by Brent Fulgham.
+
+        Derived from Blink patch (by Tom Sepez <tsepez@chromium.org>):
+        <https://src.chromium.org/viewvc/blink?revision=176339&view=revision>
+
+        Extract the src-like and script-like attribute truncation logic into independent functions
+        towards making it more straightforward to re-purpose this logic. Additionally, formalize the
+        concept of string canonicalization as a member function that consolidates the process of
+        decoding URL escape sequences, truncating the decoded string (if applicable), and removing
+        characters that are considered noise.
+
+        * html/parser/XSSAuditor.cpp:
+        (WebCore::truncateForSrcLikeAttribute): Extracted from XSSAuditor::decodedSnippetForAttribute().
+        (WebCore::truncateForScriptLikeAttribute): Ditto.
+        (WebCore::XSSAuditor::init): Write in terms of XSSAuditor::canonicalize().
+        (WebCore::XSSAuditor::filterCharacterToken): Updated to make use of formalized canonicalization methods.
+        (WebCore::XSSAuditor::filterScriptToken): Ditto.
+        (WebCore::XSSAuditor::filterObjectToken): Ditto.
+        (WebCore::XSSAuditor::filterParamToken): Ditto.
+        (WebCore::XSSAuditor::filterEmbedToken): Ditto.
+        (WebCore::XSSAuditor::filterAppletToken): Ditto.
+        (WebCore::XSSAuditor::filterFrameToken): Ditto.
+        (WebCore::XSSAuditor::filterInputToken): Ditto.
+        (WebCore::XSSAuditor::filterButtonToken): Ditto.
+        (WebCore::XSSAuditor::eraseDangerousAttributesIfInjected): Ditto.
+        (WebCore::XSSAuditor::eraseAttributeIfInjected): Updated code to use early return style and avoid an unnecessary string
+        comparison when we know that a src attribute was injected.
+        (WebCore::XSSAuditor::canonicalizedSnippetForTagName): Renamed; formerly known as XSSAuditor::decodedSnippetForName(). Updated
+        to make use of XSSAuditor::canonicalize().
+        (WebCore::XSSAuditor::snippetFromAttribute): Renamed; formerly known as XSSAuditor::decodedSnippetForAttribute(). Moved
+        truncation logic from here to WebCore::truncateFor{Script, Src}LikeAttribute.
+        (WebCore::XSSAuditor::canonicalize): Added.
+        (WebCore::XSSAuditor::canonicalizedSnippetForJavaScript): Added.
+        (WebCore::canonicalize): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForName): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForAttribute): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Deleted.
+        * html/parser/XSSAuditor.h: Define enum class for the various attribute truncation styles.
+
+2016-01-14  Daniel Bates  <dabates@apple.com>
+
         [XSS Auditor] Partial bypass when web server collapses path components
         https://bugs.webkit.org/show_bug.cgi?id=152872
 
index 38c5208..c26ffc8 100644 (file)
@@ -63,11 +63,6 @@ static bool isNonCanonicalCharacter(UChar c)
     return (c == '\\' || c == '0' || c == '\0' || c == '/' || c >= 127);
 }
 
-static String canonicalize(const String& string)
-{
-    return string.removeCharacters(&isNonCanonicalCharacter);
-}
-
 static bool isRequiredForInjection(UChar c)
 {
     return (c == '\'' || c == '"' || c == '<' || c == '>');
@@ -180,6 +175,57 @@ static String fullyDecodeString(const String& string, const TextEncoding& encodi
     return workingString;
 }
 
+static void truncateForSrcLikeAttribute(String& decodedSnippet)
+{
+    // In HTTP URLs, characters following the first ?, #, or third slash may come from
+    // the page itself and can be merely ignored by an attacker's server when a remote
+    // script or script-like resource is requested. In DATA URLS, the payload starts at
+    // the first comma, and the the first /*, //, or <!-- may introduce a comment. Characters
+    // following this may come from the page itself and may be ignored when the script is
+    // executed. For simplicity, we don't differentiate based on URL scheme, and stop at
+    // the first # or ?, the third slash, or the first slash or < once a comma is seen.
+    int slashCount = 0;
+    bool commaSeen = false;
+    for (size_t currentLength = 0; currentLength < decodedSnippet.length(); ++currentLength) {
+        UChar currentChar = decodedSnippet[currentLength];
+        if (currentChar == '?'
+            || currentChar == '#'
+            || ((currentChar == '/' || currentChar == '\\') && (commaSeen || ++slashCount > 2))
+            || (currentChar == '<' && commaSeen)) {
+            decodedSnippet.truncate(currentLength);
+            return;
+        }
+        if (currentChar == ',')
+            commaSeen = true;
+    }
+}
+
+static void truncateForScriptLikeAttribute(String& decodedSnippet)
+{
+    // Beware of trailing characters which came from the page itself, not the
+    // injected vector. Excluding the terminating character covers common cases
+    // where the page immediately ends the attribute, but doesn't cover more
+    // complex cases where there is other page data following the injection.
+    // Generally, these won't parse as JavaScript, so the injected vector
+    // typically excludes them from consideration via a single-line comment or
+    // by enclosing them in a string literal terminated later by the page's own
+    // closing punctuation. Since the snippet has not been parsed, the vector
+    // may also try to introduce these via entities. As a result, we'd like to
+    // stop before the first "//", the first <!--, the first entity, or the first
+    // quote not immediately following the first equals sign (taking whitespace
+    // into consideration). To keep things simpler, we don't try to distinguish
+    // between entity-introducing ampersands vs. other uses, nor do we bother to
+    // check for a second slash for a comment, nor do we bother to check for
+    // !-- following a less-than sign. We stop instead on any ampersand
+    // slash, or less-than sign.
+    size_t position = 0;
+    if ((position = decodedSnippet.find('=')) != notFound
+        && (position = decodedSnippet.find(isNotHTMLSpace, position + 1)) != notFound
+        && (position = decodedSnippet.find(isTerminatingCharacter, isHTMLQuote(decodedSnippet[position]) ? position + 1 : position)) != notFound) {
+        decodedSnippet.truncate(position);
+    }
+}
+
 static ContentSecurityPolicy::ReflectedXSSDisposition combineXSSProtectionHeaderAndCSP(ContentSecurityPolicy::ReflectedXSSDisposition xssProtection, ContentSecurityPolicy::ReflectedXSSDisposition reflectedXSS)
 {
     ContentSecurityPolicy::ReflectedXSSDisposition result = std::max(xssProtection, reflectedXSS);
@@ -269,7 +315,7 @@ void XSSAuditor::init(Document* document, XSSAuditorDelegate* auditorDelegate)
     if (document->decoder())
         m_encoding = document->decoder()->encoding();
 
-    m_decodedURL = canonicalize(fullyDecodeString(m_documentURL.string(), m_encoding));
+    m_decodedURL = canonicalize(m_documentURL.string(), TruncationStyle::None);
     if (m_decodedURL.find(isRequiredForInjection) == notFound)
         m_decodedURL = String();
 
@@ -307,7 +353,7 @@ void XSSAuditor::init(Document* document, XSSAuditorDelegate* auditorDelegate)
         if (httpBody && !httpBody->isEmpty()) {
             httpBodyAsString = httpBody->flattenToString();
             if (!httpBodyAsString.isEmpty()) {
-                m_decodedHTTPBody = canonicalize(fullyDecodeString(httpBodyAsString, m_encoding));
+                m_decodedHTTPBody = canonicalize(httpBodyAsString, TruncationStyle::None);
                 if (m_decodedHTTPBody.find(isRequiredForInjection) == notFound)
                     m_decodedHTTPBody = String();
                 if (m_decodedHTTPBody.length() >= minimumLengthForSuffixTree)
@@ -389,7 +435,7 @@ void XSSAuditor::filterEndToken(const FilterTokenRequest& request)
 bool XSSAuditor::filterCharacterToken(const FilterTokenRequest& request)
 {
     ASSERT(m_scriptTagNestingLevel);
-    if (m_wasScriptTagFoundInRequest && isContainedInRequest(decodedSnippetForJavaScript(request))) {
+    if (m_wasScriptTagFoundInRequest && isContainedInRequest(canonicalizedSnippetForJavaScript(request))) {
         request.token.clear();
         LChar space = ' ';
         request.token.appendToCharacter(space); // Technically, character tokens can't be empty.
@@ -403,12 +449,12 @@ bool XSSAuditor::filterScriptToken(const FilterTokenRequest& request)
     ASSERT(request.token.type() == HTMLToken::StartTag);
     ASSERT(hasName(request.token, scriptTag));
 
-    m_wasScriptTagFoundInRequest = isContainedInRequest(decodedSnippetForName(request));
+    m_wasScriptTagFoundInRequest = isContainedInRequest(canonicalizedSnippetForTagName(request));
 
     bool didBlockScript = false;
     if (m_wasScriptTagFoundInRequest) {
-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), SrcLikeAttribute);
-        didBlockScript |= eraseAttributeIfInjected(request, XLinkNames::hrefAttr, blankURL().string(), SrcLikeAttribute);
+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
+        didBlockScript |= eraseAttributeIfInjected(request, XLinkNames::hrefAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
     }
 
     return didBlockScript;
@@ -420,8 +466,8 @@ bool XSSAuditor::filterObjectToken(const FilterTokenRequest& request)
     ASSERT(hasName(request.token, objectTag));
 
     bool didBlockScript = false;
-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, dataAttr, blankURL().string(), SrcLikeAttribute);
+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, dataAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
         didBlockScript |= eraseAttributeIfInjected(request, typeAttr);
         didBlockScript |= eraseAttributeIfInjected(request, classidAttr);
     }
@@ -441,7 +487,7 @@ bool XSSAuditor::filterParamToken(const FilterTokenRequest& request)
     if (!HTMLParamElement::isURLParameter(String(nameAttribute.value)))
         return false;
 
-    return eraseAttributeIfInjected(request, valueAttr, blankURL().string(), SrcLikeAttribute);
+    return eraseAttributeIfInjected(request, valueAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
 }
 
 bool XSSAuditor::filterEmbedToken(const FilterTokenRequest& request)
@@ -450,9 +496,9 @@ bool XSSAuditor::filterEmbedToken(const FilterTokenRequest& request)
     ASSERT(hasName(request.token, embedTag));
 
     bool didBlockScript = false;
-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), SrcLikeAttribute);
-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), SrcLikeAttribute);
+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), TruncationStyle::SrcLikeAttribute);
+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
         didBlockScript |= eraseAttributeIfInjected(request, typeAttr);
     }
     return didBlockScript;
@@ -464,8 +510,8 @@ bool XSSAuditor::filterAppletToken(const FilterTokenRequest& request)
     ASSERT(hasName(request.token, appletTag));
 
     bool didBlockScript = false;
-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), SrcLikeAttribute);
+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), TruncationStyle::SrcLikeAttribute);
         didBlockScript |= eraseAttributeIfInjected(request, objectAttr);
     }
     return didBlockScript;
@@ -476,9 +522,9 @@ bool XSSAuditor::filterFrameToken(const FilterTokenRequest& request)
     ASSERT(request.token.type() == HTMLToken::StartTag);
     ASSERT(hasName(request.token, iframeTag) || hasName(request.token, frameTag));
 
-    bool didBlockScript = eraseAttributeIfInjected(request, srcdocAttr, String(), ScriptLikeAttribute);
-    if (isContainedInRequest(decodedSnippetForName(request)))
-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, String(), SrcLikeAttribute);
+    bool didBlockScript = eraseAttributeIfInjected(request, srcdocAttr, String(), TruncationStyle::ScriptLikeAttribute);
+    if (isContainedInRequest(canonicalizedSnippetForTagName(request)))
+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, String(), TruncationStyle::SrcLikeAttribute);
 
     return didBlockScript;
 }
@@ -512,7 +558,7 @@ bool XSSAuditor::filterInputToken(const FilterTokenRequest& request)
     ASSERT(request.token.type() == HTMLToken::StartTag);
     ASSERT(hasName(request.token, inputTag));
 
-    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), SrcLikeAttribute);
+    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
 }
 
 bool XSSAuditor::filterButtonToken(const FilterTokenRequest& request)
@@ -520,7 +566,7 @@ bool XSSAuditor::filterButtonToken(const FilterTokenRequest& request)
     ASSERT(request.token.type() == HTMLToken::StartTag);
     ASSERT(hasName(request.token, buttonTag));
 
-    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), SrcLikeAttribute);
+    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
 }
 
 bool XSSAuditor::eraseDangerousAttributesIfInjected(const FilterTokenRequest& request)
@@ -536,7 +582,7 @@ bool XSSAuditor::eraseDangerousAttributesIfInjected(const FilterTokenRequest& re
         bool valueContainsJavaScriptURL = (!isInlineEventHandler && protocolIsJavaScript(strippedValue)) || (isSemicolonSeparatedAttribute(attribute) && semicolonSeparatedValueContainsJavaScriptURL(strippedValue));
         if (!isInlineEventHandler && !valueContainsJavaScriptURL)
             continue;
-        if (!isContainedInRequest(decodedSnippetForAttribute(request, attribute, ScriptLikeAttribute)))
+        if (!isContainedInRequest(canonicalize(snippetFromAttribute(request, attribute), TruncationStyle::ScriptLikeAttribute)))
             continue;
         request.token.eraseValueOfAttribute(i);
         if (valueContainsJavaScriptURL)
@@ -546,94 +592,59 @@ bool XSSAuditor::eraseDangerousAttributesIfInjected(const FilterTokenRequest& re
     return didBlockScript;
 }
 
-bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest& request, const QualifiedName& attributeName, const String& replacementValue, AttributeKind treatment)
+bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest& request, const QualifiedName& attributeName, const String& replacementValue, TruncationStyle truncationStyle)
 {
     size_t indexOfAttribute = 0;
-    if (findAttributeWithName(request.token, attributeName, indexOfAttribute)) {
-        const HTMLToken::Attribute& attribute = request.token.attributes().at(indexOfAttribute);
-        if (isContainedInRequest(decodedSnippetForAttribute(request, attribute, treatment))) {
-            if (threadSafeMatch(attributeName, srcAttr) && isLikelySafeResource(String(attribute.value)))
-                return false;
-            if (threadSafeMatch(attributeName, http_equivAttr) && !isDangerousHTTPEquiv(String(attribute.value)))
-                return false;
-            request.token.eraseValueOfAttribute(indexOfAttribute);
-            if (!replacementValue.isEmpty())
-                request.token.appendToAttributeValue(indexOfAttribute, replacementValue);
-            return true;
-        }
+    if (!findAttributeWithName(request.token, attributeName, indexOfAttribute))
+        return false;
+
+    const HTMLToken::Attribute& attribute = request.token.attributes().at(indexOfAttribute);
+    if (!isContainedInRequest(canonicalize(snippetFromAttribute(request, attribute), truncationStyle)))
+        return false;
+
+    if (threadSafeMatch(attributeName, srcAttr)) {
+        if (isLikelySafeResource(String(attribute.value)))
+            return false;
+    } else if (threadSafeMatch(attributeName, http_equivAttr)) {
+        if (!isDangerousHTTPEquiv(String(attribute.value)))
+            return false;
     }
-    return false;
+
+    request.token.eraseValueOfAttribute(indexOfAttribute);
+    if (!replacementValue.isEmpty())
+        request.token.appendToAttributeValue(indexOfAttribute, replacementValue);
+    return true;
 }
 
-String XSSAuditor::decodedSnippetForName(const FilterTokenRequest& request)
+String XSSAuditor::canonicalizedSnippetForTagName(const FilterTokenRequest& request)
 {
     // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
-    return canonicalize(fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1));
+    return canonicalize(request.sourceTracker.source(request.token).substring(0, request.token.name().size() + 1), TruncationStyle::None);
 }
 
-String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request, const HTMLToken::Attribute& attribute, AttributeKind treatment)
+String XSSAuditor::snippetFromAttribute(const FilterTokenRequest& request, const HTMLToken::Attribute& attribute)
 {
     // The range doesn't include the character which terminates the value. So,
     // for an input of |name="value"|, the snippet is |name="value|. For an
     // unquoted input of |name=value |, the snippet is |name=value|.
     // FIXME: We should grab one character before the name also.
-    unsigned start = attribute.startOffset;
-    unsigned end = attribute.endOffset;
-
-    // We defer canonicalizing the decoded string here to preserve embedded slashes (if any) that
-    // may lead us to truncate the string.
-    String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
-    decodedSnippet.truncate(kMaximumFragmentLengthTarget);
-    if (treatment == SrcLikeAttribute) {
-        int slashCount = 0;
-        bool commaSeen = false;
-        // In HTTP URLs, characters following the first ?, #, or third slash may come from 
-        // the page itself and can be merely ignored by an attacker's server when a remote
-        // script or script-like resource is requested. In DATA URLS, the payload starts at
-        // the first comma, and the the first /*, //, or <!-- may introduce a comment. Characters
-        // following this may come from the page itself and may be ignored when the script is
-        // executed. For simplicity, we don't differentiate based on URL scheme, and stop at
-        // the first # or ?, the third slash, or the first slash or < once a comma is seen.
-        for (size_t currentLength = 0; currentLength < decodedSnippet.length(); ++currentLength) {
-            UChar currentChar = decodedSnippet[currentLength];
-            if (currentChar == '?'
-                || currentChar == '#'
-                || ((currentChar == '/' || currentChar == '\\') && (commaSeen || ++slashCount > 2))
-                || (currentChar == '<' && commaSeen)) {
-                decodedSnippet.truncate(currentLength);
-                break;
-            }
-            if (currentChar == ',')
-                commaSeen = true;
-        }
-    } else if (treatment == ScriptLikeAttribute) {
-        // Beware of trailing characters which came from the page itself, not the 
-        // injected vector. Excluding the terminating character covers common cases
-        // where the page immediately ends the attribute, but doesn't cover more
-        // complex cases where there is other page data following the injection. 
-        // Generally, these won't parse as javascript, so the injected vector
-        // typically excludes them from consideration via a single-line comment or
-        // by enclosing them in a string literal terminated later by the page's own
-        // closing punctuation. Since the snippet has not been parsed, the vector
-        // may also try to introduce these via entities. As a result, we'd like to
-        // stop before the first "//", the first <!--, the first entity, or the first
-        // quote not immediately following the first equals sign (taking whitespace
-        // into consideration). To keep things simpler, we don't try to distinguish
-        // between entity-introducing amperands vs. other uses, nor do we bother to
-        // check for a second slash for a comment, nor do we bother to check for
-        // !-- following a less-than sign. We stop instead on any ampersand
-        // slash, or less-than sign.
-        size_t position = 0;
-        if ((position = decodedSnippet.find('=')) != notFound
-            && (position = decodedSnippet.find(isNotHTMLSpace, position + 1)) != notFound
-            && (position = decodedSnippet.find(isTerminatingCharacter, isHTMLQuote(decodedSnippet[position]) ? position + 1 : position)) != notFound) {
-            decodedSnippet.truncate(position);
-        }
+    return request.sourceTracker.source(request.token, attribute.startOffset, attribute.endOffset);
+}
+
+String XSSAuditor::canonicalize(const String& snippet, TruncationStyle truncationStyle)
+{
+    String decodedSnippet = fullyDecodeString(snippet, m_encoding);
+    if (truncationStyle != TruncationStyle::None) {
+        decodedSnippet.truncate(kMaximumFragmentLengthTarget);
+        if (truncationStyle == TruncationStyle::SrcLikeAttribute)
+            truncateForSrcLikeAttribute(decodedSnippet);
+        else if (truncationStyle == TruncationStyle::ScriptLikeAttribute)
+            truncateForScriptLikeAttribute(decodedSnippet);
     }
-    return canonicalize(decodedSnippet);
+    return decodedSnippet.removeCharacters(&isNonCanonicalCharacter);
 }
 
-String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
+String XSSAuditor::canonicalizedSnippetForJavaScript(const FilterTokenRequest& request)
 {
     String string = request.sourceTracker.source(request.token);
     size_t startPosition = 0;
@@ -687,7 +698,6 @@ String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request
                 foundPosition = lastNonSpacePosition;
                 break;
             }
-
             if (foundPosition > startPosition + kMaximumFragmentLengthTarget) {
                 // After hitting the length target, we can only stop at a point where we know we are
                 // not in the middle of a %-escape sequence. For the sake of simplicity, approximate
@@ -701,7 +711,7 @@ String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request
                 lastNonSpacePosition = foundPosition;
         }
 
-        result = canonicalize(fullyDecodeString(string.substring(startPosition, foundPosition - startPosition), m_encoding));
+        result = canonicalize(string.substring(startPosition, foundPosition - startPosition), TruncationStyle::None);
         startPosition = foundPosition + 1;
     }
     return result;
index 0c516d9..f47b67c 100644 (file)
@@ -70,7 +70,8 @@ private:
         Initialized
     };
 
-    enum AttributeKind {
+    enum class TruncationStyle {
+        None,
         NormalAttribute,
         SrcLikeAttribute,
         ScriptLikeAttribute
@@ -92,12 +93,12 @@ private:
     bool filterButtonToken(const FilterTokenRequest&);
 
     bool eraseDangerousAttributesIfInjected(const FilterTokenRequest&);
-    bool eraseAttributeIfInjected(const FilterTokenRequest&, const QualifiedName&, const String& replacementValue = String(), AttributeKind treatment = NormalAttribute);
+    bool eraseAttributeIfInjected(const FilterTokenRequest&, const QualifiedName&, const String& replacementValue = String(), TruncationStyle = TruncationStyle::NormalAttribute);
 
-    String decodedSnippetForToken(const HTMLToken&);
-    String decodedSnippetForName(const FilterTokenRequest&);
-    String decodedSnippetForAttribute(const FilterTokenRequest&, const HTMLToken::Attribute&, AttributeKind treatment = NormalAttribute);
-    String decodedSnippetForJavaScript(const FilterTokenRequest&);
+    String canonicalizedSnippetForTagName(const FilterTokenRequest&);
+    String canonicalizedSnippetForJavaScript(const FilterTokenRequest&);
+    String snippetFromAttribute(const FilterTokenRequest&, const HTMLToken::Attribute&);
+    String canonicalize(const String&, TruncationStyle);
 
     bool isContainedInRequest(const String&);
     bool isLikelySafeResource(const String& url);