Update Resource Load Statistics
author wilander@apple.com <wilander@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 6 Oct 2016 17:40:12 +0000 (17:40 +0000)
committer wilander@apple.com <wilander@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Thu, 6 Oct 2016 17:40:12 +0000 (17:40 +0000)
https://bugs.webkit.org/show_bug.cgi?id=162811

Reviewed by Alex Christensen.

Source/WebCore:

No new tests. The counting is based on top privately owned domains,
which is currently not supported by either layout tests or API tests.

* Modules/websockets/WebSocket.cpp:
(WebCore::WebSocket::connect):
    Now captures statistics for web sockets too.
* loader/FrameLoader.cpp:
(WebCore::FrameLoader::loadResourceSynchronously):
    Now captures statistics for synchronous XHR too.
* loader/ResourceLoadObserver.cpp:
(WebCore::is3xxRedirect):
    Convenience function.
(WebCore::ResourceLoadObserver::shouldLog):
    Convenience function.
(WebCore::ResourceLoadObserver::logFrameNavigation):
    Updated to make use of new convenience functions.
(WebCore::ResourceLoadObserver::logSubresourceLoading):
    Updated to make use of new convenience functions.
(WebCore::ResourceLoadObserver::logWebSocketLoading):
    Added.
(WebCore::ResourceLoadObserver::logUserInteraction):
    Updated to make use of new convenience functions.
(WebCore::ResourceLoadObserver::primaryDomain):
    Now makes use of the Public Suffix list.
    Removed old custom parsing of primary domain.
* loader/ResourceLoadObserver.h:
* loader/ResourceLoadStatisticsStore.cpp:
(WebCore::ResourceLoadStatisticsStore::prevalentResourceDomainsWithoutUserInteraction):
    Convenience function.
(WebCore::ResourceLoadStatisticsStore::processStatistics): Deleted.
* loader/ResourceLoadStatisticsStore.h:
* loader/SubresourceLoader.cpp:
(WebCore::SubresourceLoader::willSendRequestInternal):
    Moved logging call higher up and added a check for whether we
    are loading the main resource. The reason for moving it up is
    to capture the request before some data may be cleared out in
    redirect handling. We also want to capture failed CORS requests
    since they are sent and then cancelled on the way back.

Source/WebKit2:

* UIProcess/WebResourceLoadStatisticsStore.cpp:
(WebKit::WebResourceLoadStatisticsStore::hasPrevalentResourceCharacteristics):
    Switched to vector-based classification.
(WebKit::WebResourceLoadStatisticsStore::classifyResource):
    Simplified logic and moved the split between resources with and
    without user interaction into ResourceLoadStatisticsStore.
(WebKit::WebResourceLoadStatisticsStore::clearDataRecords):
    Added.
(WebKit::WebResourceLoadStatisticsStore::resourceLoadStatisticsUpdated):
    Updated to make use of the new functions.
(WebKit::WebResourceLoadStatisticsStore::persistentStoragePath):
    Removed stray whitespace.
(WebKit::WebResourceLoadStatisticsStore::writeEncoderToDisk):
    Removed stray whitespace.
(WebKit::WebResourceLoadStatisticsStore::createDecoderFromDisk):
    Removed stray whitespace.
(WebKit::hasPrevalentResourceCharacteristics): Deleted.
(WebKit::classifyPrevalentResources): Deleted.
* UIProcess/WebResourceLoadStatisticsStore.h:
    Added member variables for clearing of data records.

Tools:

* TestWebKitAPI/Tests/mac/PublicSuffix.mm:
    Change from USE(PUBLIC_SUFFIX_LIST) to ENABLE(PUBLIC_SUFFIX_LIST)

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@206869 268f45cc-cd09-0410-ab3c-d52691b4dbfc

13 files changed:
Source/WebCore/ChangeLog
Source/WebCore/Modules/websockets/WebSocket.cpp
Source/WebCore/loader/FrameLoader.cpp
Source/WebCore/loader/ResourceLoadObserver.cpp
Source/WebCore/loader/ResourceLoadObserver.h
Source/WebCore/loader/ResourceLoadStatisticsStore.cpp
Source/WebCore/loader/ResourceLoadStatisticsStore.h
Source/WebCore/loader/SubresourceLoader.cpp
Source/WebKit2/ChangeLog
Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp
Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.h
Tools/ChangeLog
Tools/TestWebKitAPI/Tests/mac/PublicSuffix.mm

diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index 2a06d18..d1c728f 100644
@@ -1,3 +1,49 @@
+2016-10-06  John Wilander  <wilander@apple.com>
+
+        Update Resource Load Statistics
+        https://bugs.webkit.org/show_bug.cgi?id=162811
+
+        Reviewed by Alex Christensen.
+
+        No new tests. The counting is based on top privately owned domains
+        which currently is not supported by layout tests nor API tests.
+
+        * Modules/websockets/WebSocket.cpp:
+        (WebCore::WebSocket::connect):
+            Now captures statistics for web sockets too.
+        * loader/FrameLoader.cpp:
+        (WebCore::FrameLoader::loadResourceSynchronously):
+        * loader/ResourceLoadObserver.cpp:
+            Now captures statistics for synchronous XHR too.
+        (WebCore::is3xxRedirect):
+            Convenience function.
+        (WebCore::ResourceLoadObserver::shouldLog):
+            Convenience function.
+        (WebCore::ResourceLoadObserver::logFrameNavigation):
+            Updated to make use of new convenience functions.
+        (WebCore::ResourceLoadObserver::logSubresourceLoading):
+            Updated to make use of new convenience functions.
+        (WebCore::ResourceLoadObserver::logWebSocketLoading):
+            Added.
+        (WebCore::ResourceLoadObserver::logUserInteraction):
+            Updated to make use of new convenience functions.
+        (WebCore::ResourceLoadObserver::primaryDomain):
+            Now makes use of the Public Suffix list.
+            Removed old custom parsing of primary domain.
+        * loader/ResourceLoadObserver.h:
+        * loader/ResourceLoadStatisticsStore.cpp:
+        (WebCore::ResourceLoadStatisticsStore::prevalentResourceDomainsWithoutUserInteraction):
+            Convenience function.
+        (WebCore::ResourceLoadStatisticsStore::processStatistics): Deleted.
+        * loader/ResourceLoadStatisticsStore.h:
+        * loader/SubresourceLoader.cpp:
+        (WebCore::SubresourceLoader::willSendRequestInternal):
+            Moved logging call higher up and added a check for whether we
+            are loading the main resource. The reason for moving it up is
+            to capture the request before some data may be cleared out in
+            redirect handling. We also want to capture failed CORS requests
+            since they are sent and then cancelled on the way back.
+
 2016-10-06  Adam Bergkvist  <adam.bergkvist@ericsson.com>
 
         WebRTC: Add support for the iceconnectionstatechange event in MediaEndpointPeerConnection
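diff --git a/Source/WebCore/Modules/websockets/WebSocket.cpp b/Source/WebCore/Modules/websockets/WebSocket.cpp
index 02495be..7fbaf1f 100644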
@@ -47,6 +47,7 @@
 #include "Frame.h"
 #include "Logging.h"
 #include "MessageEvent.h"
+#include "ResourceLoadObserver.h"
 #include "ScriptController.h"
 #include "ScriptExecutionContext.h"
 #include "SecurityOrigin.h"
@@ -318,7 +319,8 @@ void WebSocket::connect(const String& url, const Vector<String>& protocols, Exce
             });
 #endif
             return;
-        }
+        } else
+            ResourceLoadObserver::sharedObserver().logWebSocketLoading(document.frame(), m_url);
     }
 
     String protocolString;
diff --git a/Source/WebCore/loader/FrameLoader.cpp b/Source/WebCore/loader/FrameLoader.cpp
index 51e79e0..81ed14b 100644
@@ -97,6 +97,7 @@
 #include "ProgressTracker.h"
 #include "ResourceHandle.h"
 #include "ResourceLoadInfo.h"
+#include "ResourceLoadObserver.h"
 #include "ResourceRequest.h"
 #include "SVGDocument.h"
 #include "SVGLocatable.h"
@@ -2774,6 +2775,7 @@ unsigned long FrameLoader::loadResourceSynchronously(const ResourceRequest& requ
             platformStrategies()->loaderStrategy()->loadResourceSynchronously(networkingContext(), identifier, newRequest, storedCredentials, clientCredentialPolicy, error, response, buffer);
             data = SharedBuffer::adoptVector(buffer);
             documentLoader()->applicationCacheHost()->maybeLoadFallbackSynchronously(newRequest, error, response, data);
+            ResourceLoadObserver::sharedObserver().logSubresourceLoading(&m_frame, newRequest, response);
         }
     }
     notifier().sendRemainingDelegateMessages(m_documentLoader.get(), identifier, request, response, data ? data->data() : nullptr, data ? data->size() : 0, -1, error);
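diff --git a/Source/WebCore/loader/ResourceLoadObserver.cpp b/Source/WebCore/loader/ResourceLoadObserver.cpp
index 4023b1d..bcc43ca 100644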
@@ -33,6 +33,7 @@
 #include "NetworkStorageSession.h"
 #include "Page.h"
 #include "PlatformStrategies.h"
+#include "PublicSuffix.h"
 #include "ResourceLoadStatistics.h"
 #include "ResourceLoadStatisticsStore.h"
 #include "ResourceRequest.h"
@@ -57,23 +58,31 @@ void ResourceLoadObserver::setStatisticsStore(Ref<ResourceLoadStatisticsStore>&&
     m_store = WTFMove(store);
 }
 
-void ResourceLoadObserver::logFrameNavigation(const Frame& frame, const Frame& topFrame, const ResourceRequest& newRequest, const ResourceResponse& redirectResponse)
+static inline bool is3xxRedirect(const ResourceResponse& response)
 {
-    if (!Settings::resourceLoadStatisticsEnabled())
-        return;
+    return response.httpStatusCode() >= 300 && response.httpStatusCode() <= 399;
+}
 
-    if (!m_store)
-        return;
+bool ResourceLoadObserver::shouldLog(Page* page)
+{
+    // FIXME: Err on the safe side until we have sorted out what to do in worker contexts
+    if (!page)
+        return false;
+    return Settings::resourceLoadStatisticsEnabled()
+        && !page->usesEphemeralSession()
+        && m_store;
+}
 
+void ResourceLoadObserver::logFrameNavigation(const Frame& frame, const Frame& topFrame, const ResourceRequest& newRequest, const ResourceResponse& redirectResponse)
+{
     ASSERT(frame.document());
     ASSERT(topFrame.document());
     ASSERT(topFrame.page());
-
-    bool needPrivacy = topFrame.page() ? topFrame.page()->usesEphemeralSession() : false;
-    if (needPrivacy)
+    
+    if (!shouldLog(topFrame.page()))
         return;
 
-    bool isRedirect = !redirectResponse.isNull();
+    bool isRedirect = is3xxRedirect(redirectResponse);
     bool isMainFrame = frame.isMainFrame();
     const URL& sourceURL = frame.document()->url();
     const URL& targetURL = newRequest.url();
@@ -148,17 +157,12 @@ void ResourceLoadObserver::logFrameNavigation(const Frame& frame, const Frame& t
     
 void ResourceLoadObserver::logSubresourceLoading(const Frame* frame, const ResourceRequest& newRequest, const ResourceResponse& redirectResponse)
 {
-    if (!Settings::resourceLoadStatisticsEnabled())
-        return;
+    ASSERT(frame->page());
 
-    if (!m_store)
+    if (!shouldLog(frame->page()))
         return;
 
-    bool needPrivacy = (frame && frame->page()) ? frame->page()->usesEphemeralSession() : false;
-    if (needPrivacy)
-        return;
-    
-    bool isRedirect = !redirectResponse.isNull();
+    bool isRedirect = is3xxRedirect(redirectResponse);
     const URL& sourceURL = redirectResponse.url();
     const URL& targetURL = newRequest.url();
     const URL& mainFrameURL = frame ? frame->mainFrame().document()->url() : URL();
@@ -166,14 +170,17 @@ void ResourceLoadObserver::logSubresourceLoading(const Frame* frame, const Resou
     auto targetHost = targetURL.host();
     auto mainFrameHost = mainFrameURL.host();
 
-    if (targetHost.isEmpty() || mainFrameHost.isEmpty() || targetHost == mainFrameHost || targetHost == sourceURL.host())
+    if (targetHost.isEmpty()
+        || mainFrameHost.isEmpty()
+        || targetHost == mainFrameHost
+        || (isRedirect && targetHost == sourceURL.host()))
         return;
 
     auto targetPrimaryDomain = primaryDomain(targetURL);
     auto mainFramePrimaryDomain = primaryDomain(mainFrameURL);
     auto sourcePrimaryDomain = primaryDomain(sourceURL);
     
-    if (targetPrimaryDomain == mainFramePrimaryDomain || targetPrimaryDomain == sourcePrimaryDomain)
+    if (targetPrimaryDomain == mainFramePrimaryDomain || (isRedirect && targetPrimaryDomain == sourcePrimaryDomain))
         return;
 
     auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
@@ -210,62 +217,82 @@ void ResourceLoadObserver::logSubresourceLoading(const Frame* frame, const Resou
     
     m_store->fireDataModificationHandler();
 }
+
+void ResourceLoadObserver::logWebSocketLoading(const Frame* frame, const URL& targetURL)
+{
+    // FIXME: Web sockets can run in detached frames. Decide how to count such connections.
+    // See LayoutTests/http/tests/websocket/construct-in-detached-frame.html
+    if (!frame)
+        return;
+
+    if (!shouldLog(frame->page()))
+        return;
+
+    const URL& mainFrameURL = frame->mainFrame().document()->url();
+
+    auto targetHost = targetURL.host();
+    auto mainFrameHost = mainFrameURL.host();
+    
+    if (targetHost.isEmpty()
+        || mainFrameHost.isEmpty()
+        || targetHost == mainFrameHost)
+        return;
+    
+    auto targetPrimaryDomain = primaryDomain(targetURL);
+    auto mainFramePrimaryDomain = primaryDomain(mainFrameURL);
+    
+    if (targetPrimaryDomain == mainFramePrimaryDomain)
+        return;
+
+    auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
+    
+    auto mainFrameOrigin = SecurityOrigin::create(mainFrameURL);
+    targetStatistics.subresourceUnderTopFrameOrigins.add(mainFramePrimaryDomain);
+    
+    ++targetStatistics.subresourceHasBeenSubresourceCount;
     
+    auto totalVisited = std::max(m_originsVisitedMap.size(), 1U);
+    
+    targetStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetStatistics.subresourceHasBeenSubresourceCount) / totalVisited;
+
+    m_store->fireDataModificationHandler();
+}
+
 void ResourceLoadObserver::logUserInteraction(const Document& document)
 {
-    if (!Settings::resourceLoadStatisticsEnabled())
-        return;
+    ASSERT(document.page());
 
-    if (!m_store)
+    if (!shouldLog(document.page()))
         return;
 
-    bool needPrivacy = document.page() ? document.page()->usesEphemeralSession() : false;
-    if (needPrivacy)
+    auto& url = document.url();
+
+    if (url.isBlankURL() || url.isEmpty())
         return;
 
-    auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomain(document.url()));
+    auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomain(url));
     statistics.hadUserInteraction = true;
     m_store->fireDataModificationHandler();
 }
     
 String ResourceLoadObserver::primaryDomain(const URL& url)
 {
-    String host = url.host();
-    Vector<String> hostSplitOnDot;
-    
-    host.split('.', false, hostSplitOnDot);
-
     String primaryDomain;
-    if (host.isNull())
+    String host = url.host();
+    if (host.isNull() || host.isEmpty())
         primaryDomain = "nullOrigin";
-    else if (hostSplitOnDot.size() < 3)
-        primaryDomain = host;
+#if ENABLE(PUBLIC_SUFFIX_LIST)
     else {
-        // Skip TLD and then up to two domains smaller than 4 characters
-        int primaryDomainCutOffIndex = hostSplitOnDot.size() - 2;
-
-        // Start with TLD as a given part
-        size_t numberOfParts = 1;
-        for (; primaryDomainCutOffIndex >= 0; --primaryDomainCutOffIndex) {
-            ++numberOfParts;
-
-            // We have either a domain part that's 4 chars or longer, or 3 domain parts including TLD
-            if (hostSplitOnDot.at(primaryDomainCutOffIndex).length() >= 4 || numberOfParts >= 3)
-                break;
-        }
-
-        if (primaryDomainCutOffIndex < 0)
+        primaryDomain = topPrivatelyControlledDomain(host);
+        // We will have an empty string here if there is no TLD.
+        // Use the host in such case.
+        if (primaryDomain.isEmpty())
             primaryDomain = host;
-        else {
-            StringBuilder builder;
-            builder.append(hostSplitOnDot.at(primaryDomainCutOffIndex));
-            for (size_t j = primaryDomainCutOffIndex + 1; j < hostSplitOnDot.size(); ++j) {
-                builder.append('.');
-                builder.append(hostSplitOnDot[j]);
-            }
-            primaryDomain = builder.toString();
-        }
     }
+#else
+    else
+        primaryDomain = host;
+#endif
 
     return primaryDomain;
 }
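
The redirect handling changed in logSubresourceLoading above can be sketched in isolation. This is a simplified illustration, not the WebKit code: it uses std::string and a plain int status code in place of URL and ResourceResponse, and shouldSkipSubresource is a hypothetical name. A status code of 0 is assumed here to model a null redirect response.

```cpp
#include <string>

// Mirrors the range check in is3xxRedirect from the diff, but takes a plain
// status code (assumption: 0 stands in for a null ResourceResponse).
bool is3xxRedirect(int httpStatusCode)
{
    return httpStatusCode >= 300 && httpStatusCode <= 399;
}

// Hypothetical helper: returns true when a subresource load would not be counted.
// The source-host comparison applies only to redirects, matching the updated
// condition in the diff.
bool shouldSkipSubresource(const std::string& targetHost,
                           const std::string& mainFrameHost,
                           const std::string& sourceHost,
                           int redirectStatusCode)
{
    bool isRedirect = is3xxRedirect(redirectStatusCode);
    return targetHost.empty()
        || mainFrameHost.empty()
        || targetHost == mainFrameHost
        || (isRedirect && targetHost == sourceHost);
}
```

Under these assumptions, a third-party load whose response host equals the target host is counted unless the response is a 3xx redirect; before this change any non-null response with a matching host caused the load to be skipped.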
diff --git a/Source/WebCore/loader/ResourceLoadObserver.h b/Source/WebCore/loader/ResourceLoadObserver.h
index 50fc73b..bc3ae4c 100644
@@ -34,6 +34,7 @@ namespace WebCore {
 
 class Document;
 class Frame;
+class Page;
 class ResourceRequest;
 class ResourceResponse;
 class URL;
@@ -47,6 +48,7 @@ public:
     
     void logFrameNavigation(const Frame& frame, const Frame& topFrame, const ResourceRequest& newRequest, const ResourceResponse& redirectResponse);
     void logSubresourceLoading(const Frame*, const ResourceRequest& newRequest, const ResourceResponse& redirectResponse);
+    void logWebSocketLoading(const Frame*, const URL&);
 
     void logUserInteraction(const Document&);
     
@@ -55,6 +57,7 @@ public:
     WEBCORE_EXPORT String statisticsForOrigin(const String&);
 
 private:
+    bool shouldLog(Page*);
     static String primaryDomain(const URL&);
 
     RefPtr<ResourceLoadStatisticsStore> m_store;
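diff --git a/Source/WebCore/loader/ResourceLoadStatisticsStore.cpp b/Source/WebCore/loader/ResourceLoadStatisticsStore.cpp
index 2306edb..151aad3 100644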
@@ -154,4 +154,14 @@ void ResourceLoadStatisticsStore::processStatistics(std::function<void(ResourceL
     for (auto& resourceStatistic : m_resourceStatisticsMap.values())
         processFunction(resourceStatistic);
 }
+
+Vector<String> ResourceLoadStatisticsStore::prevalentResourceDomainsWithoutUserInteraction()
+{
+    Vector<String> prevalentResources;
+    for (auto& resourceStatistic : m_resourceStatisticsMap.values()) {
+        if (resourceStatistic.isPrevalentResource && !resourceStatistic.hadUserInteraction)
+            prevalentResources.append(resourceStatistic.highLevelDomain);
+    }
+    return prevalentResources;
+}
 }
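diff --git a/Source/WebCore/loader/ResourceLoadStatisticsStore.h b/Source/WebCore/loader/ResourceLoadStatisticsStore.h
index ac96966..c77b4c9 100644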
@@ -63,6 +63,8 @@ public:
 
     WEBCORE_EXPORT bool hasEnoughDataForStatisticsProcessing();
     WEBCORE_EXPORT void processStatistics(std::function<void(ResourceLoadStatistics&)>&&);
+
+    WEBCORE_EXPORT Vector<String> prevalentResourceDomainsWithoutUserInteraction();
 private:
     ResourceLoadStatisticsStore() = default;
 
diff --git a/Source/WebCore/loader/SubresourceLoader.cpp b/Source/WebCore/loader/SubresourceLoader.cpp
index b6e83d0..c621293 100644
@@ -173,6 +173,9 @@ void SubresourceLoader::willSendRequestInternal(ResourceRequest& newRequest, con
         return;
     }
 
+    if (newRequest.requester() != ResourceRequestBase::Requester::Main)
+        ResourceLoadObserver::sharedObserver().logSubresourceLoading(m_frame.get(), newRequest, redirectResponse);
+
     ASSERT(!newRequest.isNull());
     if (!redirectResponse.isNull()) {
         if (options().redirect != FetchOptions::Redirect::Follow) {
@@ -228,8 +231,6 @@ void SubresourceLoader::willSendRequestInternal(ResourceRequest& newRequest, con
     ResourceLoader::willSendRequestInternal(newRequest, redirectResponse);
     if (newRequest.isNull())
         cancel();
-
-    ResourceLoadObserver::sharedObserver().logSubresourceLoading(m_frame.get(), newRequest, redirectResponse);
 }
 
 void SubresourceLoader::didSendData(unsigned long long bytesSent, unsigned long long totalBytesToBeSent)
diff --git a/Source/WebKit2/ChangeLog b/Source/WebKit2/ChangeLog
index fb708ac..48705e3 100644
@@ -1,3 +1,31 @@
+2016-10-06  John Wilander  <wilander@apple.com>
+
+        Update Resource Load Statistics
+        https://bugs.webkit.org/show_bug.cgi?id=162811
+
+        Reviewed by Alex Christensen.
+
+        * UIProcess/WebResourceLoadStatisticsStore.cpp:
+        (WebKit::WebResourceLoadStatisticsStore::hasPrevalentResourceCharacteristics):
+            Switched to vector-based classification.
+        (WebKit::WebResourceLoadStatisticsStore::classifyResource):
+            Simplified logic and moved the split between has and has
+            no user interaction into ResourceLoadStatisticsStore.
+        (WebKit::WebResourceLoadStatisticsStore::clearDataRecords):
+            Added.
+        (WebKit::WebResourceLoadStatisticsStore::resourceLoadStatisticsUpdated):
+            Updated to make use of the new functions.
+        (WebKit::WebResourceLoadStatisticsStore::persistentStoragePath):
+            Removed stray whitespace.
+        (WebKit::WebResourceLoadStatisticsStore::writeEncoderToDisk):
+            Removed stray whitespace.
+        (WebKit::WebResourceLoadStatisticsStore::createDecoderFromDisk):
+            Removed stray whitespace.
+        (WebKit::hasPrevalentResourceCharacteristics): Deleted.
+        (WebKit::classifyPrevalentResources): Deleted.
+        * UIProcess/WebResourceLoadStatisticsStore.h:
+            Added member variables for clearing of data records.
+
 2016-10-06  Youenn Fablet  <youenn@apple.com>
 
         [WK2] 304 revalidation on the network process does not update the validated response
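diff --git a/Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp b/Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp
index 96db5c3..eecae1b 100644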
 #include "config.h"
 #include "WebResourceLoadStatisticsStore.h"
 
+#include "APIWebsiteDataStore.h"
 #include "WebProcessMessages.h"
 #include "WebProcessPool.h"
 #include "WebResourceLoadStatisticsStoreMessages.h"
+#include "WebsiteDataFetchOption.h"
+#include "WebsiteDataType.h"
 #include <WebCore/KeyedCoding.h>
 #include <WebCore/ResourceLoadStatistics.h>
+#include <wtf/CurrentTime.h>
+#include <wtf/MainThread.h>
+#include <wtf/MathExtras.h>
+#include <wtf/RunLoop.h>
 #include <wtf/threads/BinarySemaphore.h>
 
 using namespace WebCore;
 
 namespace WebKit {
 
-// Sub frame classification thresholds
-static const unsigned subframeUnderTopFrameOriginsThreshold = 3;
-    
-// Subresource classification thresholds
-static const unsigned subresourceUnderTopFrameOriginsThreshold = 5;
-static const unsigned subresourceHasBeenRedirectedFromToUniqueDomainsThreshold = 3;
-static const unsigned redirectedToOtherPrevalentResourceOriginsThreshold = 2;
+static const auto numberOfSecondsBetweenClearingDataRecords = 600;
+static const auto featureVectorLengthThreshold = 3;
 
 Ref<WebResourceLoadStatisticsStore> WebResourceLoadStatisticsStore::create(const String& resourceLoadStatisticsDirectory)
 {
@@ -51,7 +53,7 @@ Ref<WebResourceLoadStatisticsStore> WebResourceLoadStatisticsStore::create(const
 }
 
 WebResourceLoadStatisticsStore::WebResourceLoadStatisticsStore(const String& resourceLoadStatisticsDirectory)
-    : m_resourceStatisticsStore(WebCore::ResourceLoadStatisticsStore::create())
+    : m_resourceStatisticsStore(ResourceLoadStatisticsStore::create())
     , m_statisticsQueue(WorkQueue::create("WebResourceLoadStatisticsStore Process Data Queue"))
     , m_storagePath(resourceLoadStatisticsDirectory)
 {
@@ -61,37 +63,103 @@ WebResourceLoadStatisticsStore::~WebResourceLoadStatisticsStore()
 {
 }
 
-static inline bool hasPrevalentResourceCharacteristics(const ResourceLoadStatistics& resourceStatistic)
+bool WebResourceLoadStatisticsStore::hasPrevalentResourceCharacteristics(const ResourceLoadStatistics& resourceStatistic)
 {
-    return resourceStatistic.subframeUnderTopFrameOrigins.size() > subframeUnderTopFrameOriginsThreshold
-        || resourceStatistic.subresourceUnderTopFrameOrigins.size() > subresourceUnderTopFrameOriginsThreshold
-        || resourceStatistic.subresourceUniqueRedirectsTo.size() > subresourceHasBeenRedirectedFromToUniqueDomainsThreshold
-        || resourceStatistic.redirectedToOtherPrevalentResourceOrigins.size() > redirectedToOtherPrevalentResourceOriginsThreshold;
+    auto subresourceUnderTopFrameOriginsCount = resourceStatistic.subresourceUnderTopFrameOrigins.size();
+    auto subresourceUniqueRedirectsToCount = resourceStatistic.subresourceUniqueRedirectsTo.size();
+    auto subframeUnderTopFrameOriginsCount = resourceStatistic.subframeUnderTopFrameOrigins.size();
+    
+    if (!subresourceUnderTopFrameOriginsCount
+        && !subresourceUniqueRedirectsToCount
+        && !subframeUnderTopFrameOriginsCount)
+        return false;
+
+    if (subresourceUnderTopFrameOriginsCount > featureVectorLengthThreshold
+        || subresourceUniqueRedirectsToCount > featureVectorLengthThreshold
+        || subframeUnderTopFrameOriginsCount > featureVectorLengthThreshold)
+        return true;
+
+    // The resource is considered prevalent if the feature vector
+    // is longer than the threshold.
+    // Vector length for n dimensions is sqrt(a^2 + (...) + n^2).
+    double vectorLength = 0;
+    vectorLength += subresourceUnderTopFrameOriginsCount * subresourceUnderTopFrameOriginsCount;
+    vectorLength += subresourceUniqueRedirectsToCount * subresourceUniqueRedirectsToCount;
+    vectorLength += subframeUnderTopFrameOriginsCount * subframeUnderTopFrameOriginsCount;
+
+    ASSERT(vectorLength > 0);
+
+    return sqrt(vectorLength) > featureVectorLengthThreshold;
 }
     
-static inline void classifyPrevalentResources(ResourceLoadStatistics& resourceStatistic, Vector<String>& prevalentResources, Vector<String>& prevalentResourcesWithUserInteraction)
+void WebResourceLoadStatisticsStore::classifyResource(ResourceLoadStatistics& resourceStatistic)
 {
-    if (resourceStatistic.isPrevalentResource || hasPrevalentResourceCharacteristics(resourceStatistic)) {
+    if (!resourceStatistic.isPrevalentResource && hasPrevalentResourceCharacteristics(resourceStatistic)) {
         resourceStatistic.isPrevalentResource = true;
-        if (resourceStatistic.hadUserInteraction)
-            prevalentResourcesWithUserInteraction.append(resourceStatistic.highLevelDomain);
-        else
-            prevalentResources.append(resourceStatistic.highLevelDomain);
     }
 }
 
+void WebResourceLoadStatisticsStore::clearDataRecords()
+{
+    if (m_dataStoreClearPending)
+        return;
+
+    Vector<String> prevalentResourceDomains = coreStore().prevalentResourceDomainsWithoutUserInteraction();
+    if (!prevalentResourceDomains.size())
+        return;
+
+    double now = currentTime();
+    if (!m_lastTimeDataRecordsWereCleared) {
+        m_lastTimeDataRecordsWereCleared = now;
+        return;
+    }
+
+    if (now < (m_lastTimeDataRecordsWereCleared + numberOfSecondsBetweenClearingDataRecords))
+        return;
+
+    m_dataStoreClearPending = true;
+    m_lastTimeDataRecordsWereCleared = now;
+
+    // Switch to the main thread to get the default website data store
+    RunLoop::main().dispatch([prevalentResourceDomains = WTFMove(prevalentResourceDomains), this] () mutable {
+        auto& websiteDataStore = API::WebsiteDataStore::defaultDataStore()->websiteDataStore();
+
+        websiteDataStore.fetchData(WebsiteDataType::Cookies, { }, [prevalentResourceDomains = WTFMove(prevalentResourceDomains), this](auto websiteDataRecords) {
+            Vector<WebsiteDataRecord> dataRecords;
+            for (auto& websiteDataRecord : websiteDataRecords) {
+                for (auto& prevalentResourceDomain : prevalentResourceDomains) {
+                    if (websiteDataRecord.displayName.endsWithIgnoringASCIICase(prevalentResourceDomain)) {
+                        auto suffixStart = websiteDataRecord.displayName.length() - prevalentResourceDomain.length();
+                        if (!suffixStart || websiteDataRecord.displayName[suffixStart - 1] == '.')
+                            dataRecords.append(websiteDataRecord);
+                    }
+                }
+            }
+
+            if (!dataRecords.size()) {
+                m_dataStoreClearPending = false;
+                return;
+            }
+
+            auto& websiteDataStore = API::WebsiteDataStore::defaultDataStore()->websiteDataStore();
+            websiteDataStore.removeData(WebsiteDataType::Cookies, { WTFMove(dataRecords) }, [this] {
+                m_dataStoreClearPending = false;
+            });
+        });
+    });
+}
+
 void WebResourceLoadStatisticsStore::resourceLoadStatisticsUpdated(const Vector<WebCore::ResourceLoadStatistics>& origins)
 {
     coreStore().mergeStatistics(origins);
 
-    Vector<String> prevalentResources, prevalentResourcesWithUserInteraction;
     if (coreStore().hasEnoughDataForStatisticsProcessing()) {
-        coreStore().processStatistics([this, &prevalentResources, &prevalentResourcesWithUserInteraction] (ResourceLoadStatistics& resourceStatistic) {
-            classifyPrevalentResources(resourceStatistic, prevalentResources, prevalentResourcesWithUserInteraction);
+        coreStore().processStatistics([this] (ResourceLoadStatistics& resourceStatistic) {
+            classifyResource(resourceStatistic);
+            clearDataRecords();
         });
     }
 
-    // FIXME: Notify individual WebProcesses of prevalent domains using the two vectors populated by the classifier. <rdar://problem/24703099>
     auto encoder = coreStore().createEncoderFromData();
     
     writeEncoderToDisk(*encoder.get(), "full_browsing_session");
@@ -152,7 +220,7 @@ String WebResourceLoadStatisticsStore::persistentStoragePath(const String& label
 {
     if (m_storagePath.isEmpty())
         return emptyString();
-    
+
     // TODO Decide what to call this file
     return pathByAppendingComponent(m_storagePath, label + "_resourceLog.plist");
 }
@@ -162,21 +230,21 @@ void WebResourceLoadStatisticsStore::writeEncoderToDisk(KeyedEncoder& encoder, c
     RefPtr<SharedBuffer> rawData = encoder.finishEncoding();
     if (!rawData)
         return;
-    
+
     String resourceLog = persistentStoragePath(label);
     if (resourceLog.isEmpty())
         return;
-    
+
     if (!m_storagePath.isEmpty())
         makeAllDirectories(m_storagePath);
-    
+
     auto handle = openFile(resourceLog, OpenForWrite);
     if (!handle)
         return;
     
     int64_t writtenBytes = writeToFile(handle, rawData->data(), rawData->size());
     closeFile(handle);
-    
+
     if (writtenBytes != static_cast<int64_t>(rawData->size()))
         WTFLogAlways("WebResourceLoadStatisticsStore: We only wrote %d out of %d bytes to disk", static_cast<unsigned>(writtenBytes), rawData->size());
 }
@@ -186,11 +254,11 @@ std::unique_ptr<KeyedDecoder> WebResourceLoadStatisticsStore::createDecoderFromD
     String resourceLog = persistentStoragePath(label);
     if (resourceLog.isEmpty())
         return nullptr;
-    
+
     RefPtr<SharedBuffer> rawData = SharedBuffer::createWithContentsOfFile(resourceLog);
     if (!rawData)
         return nullptr;
-    
+
     return KeyedDecoder::decoder(reinterpret_cast<const uint8_t*>(rawData->data()), rawData->size());
 }
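
The vector-based classification introduced in hasPrevalentResourceCharacteristics above can be tried out standalone. The sketch below assumes the three counts are passed in directly as unsigned integers (the real function reads them from a ResourceLoadStatistics object); the parameter names and the free-function form are illustrative only.

```cpp
#include <cmath>

// Same threshold value as featureVectorLengthThreshold in the diff.
constexpr unsigned featureVectorLengthThreshold = 3;

// Standalone sketch: the three counts form a feature vector, and the resource
// is treated as prevalent when any single component, or the Euclidean length
// of the whole vector, exceeds the threshold.
bool hasPrevalentResourceCharacteristics(unsigned subresourceUnderTopFrameOriginsCount,
                                         unsigned subresourceUniqueRedirectsToCount,
                                         unsigned subframeUnderTopFrameOriginsCount)
{
    if (!subresourceUnderTopFrameOriginsCount
        && !subresourceUniqueRedirectsToCount
        && !subframeUnderTopFrameOriginsCount)
        return false;

    if (subresourceUnderTopFrameOriginsCount > featureVectorLengthThreshold
        || subresourceUniqueRedirectsToCount > featureVectorLengthThreshold
        || subframeUnderTopFrameOriginsCount > featureVectorLengthThreshold)
        return true;

    // Sum of squares; the vector length is its square root.
    double sumOfSquares = 0;
    sumOfSquares += static_cast<double>(subresourceUnderTopFrameOriginsCount) * subresourceUnderTopFrameOriginsCount;
    sumOfSquares += static_cast<double>(subresourceUniqueRedirectsToCount) * subresourceUniqueRedirectsToCount;
    sumOfSquares += static_cast<double>(subframeUnderTopFrameOriginsCount) * subframeUnderTopFrameOriginsCount;

    return std::sqrt(sumOfSquares) > featureVectorLengthThreshold;
}
```

With a threshold of 3, counts of (2, 2, 2) give a length of sqrt(12), which exceeds the threshold, while (2, 2, 1) gives exactly 3 and does not. This shows how several moderate signals can combine where the old per-feature thresholds would each have stayed below their limits.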
 
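diff --git a/Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.h b/Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.h
index 057b378..9b9784d 100644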
@@ -28,6 +28,7 @@
 
 #include "APIObject.h"
 #include "Connection.h"
+#include "WebsiteDataRecord.h"
 #include <WebCore/ResourceLoadStatisticsStore.h>
 #include <wtf/Vector.h>
 #include <wtf/text/WTFString.h>
@@ -71,6 +72,10 @@ public:
 private:
     explicit WebResourceLoadStatisticsStore(const String&);
 
+    bool hasPrevalentResourceCharacteristics(const WebCore::ResourceLoadStatistics&);
+    void classifyResource(WebCore::ResourceLoadStatistics&);
+    void clearDataRecords();
+
     String persistentStoragePath(const String& label) const;
 
     // IPC::MessageReceiver
@@ -83,6 +88,9 @@ private:
     Ref<WTF::WorkQueue> m_statisticsQueue;
     String m_storagePath;
     bool m_resourceLoadStatisticsEnabled { false };
+
+    double m_lastTimeDataRecordsWereCleared { 0 };
+    bool m_dataStoreClearPending { false };
 };
 
 } // namespace WebKit
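
The clearDataRecords member declared above selects data records whose display name ends with a prevalent domain at a label boundary. A minimal sketch of that matching rule follows, using std::string instead of WTF::String; matchesDomain is a hypothetical name, and the early return for an empty domain is an added assumption rather than something the diff does.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when displayName equals domain or is a subdomain
// of it, compared without regard to ASCII case.
bool matchesDomain(const std::string& displayName, const std::string& domain)
{
    // Assumption: an empty domain never matches.
    if (domain.empty() || domain.size() > displayName.size())
        return false;

    std::size_t suffixStart = displayName.size() - domain.size();

    // Case-insensitive suffix comparison, standing in for endsWithIgnoringASCIICase.
    bool endsWithDomain = std::equal(domain.begin(), domain.end(),
        displayName.begin() + suffixStart,
        [](unsigned char a, unsigned char b) { return std::tolower(a) == std::tolower(b); });
    if (!endsWithDomain)
        return false;

    // Either the whole name matched, or the character before the suffix is a dot.
    return !suffixStart || displayName[suffixStart - 1] == '.';
}
```

The dot check is what keeps a name such as notexample.com from matching example.com, while cdn.example.com still matches.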
diff --git a/Tools/ChangeLog b/Tools/ChangeLog
index 63fba6b..e38c0f5 100644
@@ -1,3 +1,13 @@
+2016-10-06  John Wilander  <wilander@apple.com>
+
+        Update Resource Load Statistics
+        https://bugs.webkit.org/show_bug.cgi?id=162811
+
+        Reviewed by Alex Christensen.
+
+        * TestWebKitAPI/Tests/mac/PublicSuffix.mm:
+            Change from USE(PUBLIC_SUFFIX_LIST) to ENABLE(PUBLIC_SUFFIX_LIST)
+
 2016-10-05  Philippe Normand  <pnormand@igalia.com>
 
         [GStreamer][OWR] GL rendering support
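diff --git a/Tools/TestWebKitAPI/Tests/mac/PublicSuffix.mm b/Tools/TestWebKitAPI/Tests/mac/PublicSuffix.mm
index cb72414..e06b31f 100644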
@@ -25,7 +25,7 @@
 
 #include "config.h"
 
-#if USE(PUBLIC_SUFFIX_LIST)
+#if ENABLE(PUBLIC_SUFFIX_LIST)
 
 #include "WTFStringUtilities.h"
 #include <WebCore/PublicSuffix.h>