bmalloc: segregate small and large objects again, and allocate more objects on the small path
author    ggaren@apple.com <ggaren@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 4 Apr 2016 05:26:43 +0000 (05:26 +0000)
committer ggaren@apple.com <ggaren@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 4 Apr 2016 05:26:43 +0000 (05:26 +0000)
https://bugs.webkit.org/show_bug.cgi?id=156152

Reviewed by Sam Weinig.

Microbenchmark data suggested that it was a good idea for small and large
objects to share memory. But r198675 did not improve memory use in
full browser benchmarks.

This patch reverts to segregating small and large objects -- but without
going back to doubled VM usage -- in order to capture a few benefits:

(*) Small pages fragment the large heap. Separating them out saves a lot
of memory in our worst case fragmentation recording:

    nimlang                                          276,076kB                      209,636kB                ^ 1.32x smaller

(*) Small objects are common enough that even their slow paths benefit
from simpler code:

Execution Time:
    ...
    facebook                                             234ms                          216ms                 ^ 1.08x faster
    reddit                                               114ms                          108ms                 ^ 1.06x faster
    flickr                                               118ms                          111ms                 ^ 1.06x faster
    theverge                                             146ms                          140ms                 ^ 1.04x faster
    ...
    <arithmetic mean>                                    107ms                          102ms                 ^ 1.04x faster

(*) We can use less metadata:

Memory at End:
    ...
    list_allocate                                        460kB                          384kB                 ^ 1.2x smaller
    tree_allocate                                        492kB                          424kB                ^ 1.16x smaller
    tree_churn                                           480kB                          404kB                ^ 1.19x smaller
    fragment                                             532kB                          452kB                ^ 1.18x smaller
    fragment_iterate                                     712kB                          588kB                ^ 1.21x smaller
    medium                                            15,152kB                       11,796kB                ^ 1.28x smaller
    big                                               15,044kB                       10,976kB                ^ 1.37x smaller
    ...
    <arithmetic mean>                                  7,724kB                        7,190kB                ^ 1.07x smaller

This patch also takes advantage of our support for varying the page size
at runtime by allocating more objects on the small object path:

    medium                                               178ms                          150ms                 ^ 1.19x faster

Some microbenchmarks report memory use increases from this change -- like
they reported memory use decreases from r198675 -- but I'm ignoring them
for now because I expect our full browser memory benchmarks to confirm
that this patch is fine.

* bmalloc/BumpAllocator.h:
(bmalloc::BumpAllocator::BumpAllocator): Use a full unsigned because we
can allocate objects larger than 16kB - 1, and a full unsigned does not
make BumpAllocator any larger on 64-bit systems.
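
A standalone sketch of the size argument (toy structs mirroring the before/after fields in the BumpAllocator.h hunk below, not the real class):

    // On a 64-bit (8-byte pointer) target both layouts occupy 16 bytes, because
    // the pair of shorts is padded up to the pointer's alignment anyway.
    struct WithShorts { char* ptr; unsigned short size; unsigned short remaining; };
    struct WithUnsigned { char* ptr; unsigned size; unsigned remaining; };

    static_assert(sizeof(void*) != 8 || sizeof(WithShorts) == sizeof(WithUnsigned),
        "widening the counters does not grow the allocator on 64-bit");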

* bmalloc/Chunk.h:
(bmalloc::Chunk::begin):
(bmalloc::Chunk::end):
(bmalloc::Chunk::size):
(bmalloc::Chunk::objectType): Store ObjectType in the Chunk, since it only
varies by Chunk now, and not from page to page within a Chunk. Also,
union together small and large object metadata, since we will only use
one or the other. This saves memory.

(bmalloc::Chunk::Chunk): Conditionalize initialization based on object
type, since only one kind of metadata or the other can be used at runtime.

(bmalloc::Object::Object):
(bmalloc::Object::begin):
(bmalloc::SmallPage::end): Deleted.

* bmalloc/Heap.cpp:
(bmalloc::Heap::Heap):
(bmalloc::Heap::initializeLineMetadata): Save a little space, since we
know that lines are only 256 bytes long.

(bmalloc::Heap::initializePageMetadata): Store a dynamic page size for
each size class. We used to use only one page size (the system page size)
but that limited our ability to allocate objects larger than 1kB on the
small object path. Now we can handle any object size we want by storing
objects of that size in a custom page size.
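
As a worked sketch of that selection (assumed constants: a 4kB physical page, plus the 32kB pageSizeMax and waste factor of 8 from the Sizes.h hunk below): a 6kB size class rejects 4kB and 8kB pages because the tail waste exceeds 1/8 of the page, but a 12kB page holds exactly two objects, so that size class maps to page class 2. The real loop lives in Heap::initializePageMetadata.

    #include <cstddef>

    // Sketch only (needs C++14 for the loop in a constexpr function); the real
    // code uses m_vmPageSizePhysical rather than a hard-coded 4kB.
    constexpr std::size_t kB = 1024;
    constexpr std::size_t vmPageSizePhysical = 4 * kB;   // assumed for the example
    constexpr std::size_t pageSizeMax = 32 * kB;
    constexpr std::size_t pageSizeWasteFactor = 8;
    constexpr std::size_t smallPageSize = 4 * kB;

    constexpr std::size_t computePageSize(std::size_t size)
    {
        for (std::size_t pageSize = vmPageSizePhysical; pageSize < pageSizeMax; pageSize += vmPageSizePhysical) {
            if (pageSize % size <= pageSize / pageSizeWasteFactor)
                return pageSize;
        }
        return pageSizeMax;
    }

    static_assert(computePageSize(6 * kB) == 12 * kB, "6kB objects get a 12kB page");
    static_assert((computePageSize(6 * kB) - 1) / smallPageSize == 2, "which is page class 2");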

(bmalloc::Heap::concurrentScavenge):
(bmalloc::Heap::scavenge):
(bmalloc::Heap::scavengeSmallPages): Revert to our old linked list
strategy for storing small pages.

(bmalloc::Heap::splitAndAllocate): Object type is per Chunk now.

(bmalloc::Heap::allocateLarge): Don't nuke the small page list when
allocating a large object because the two don't share memory anymore.

(bmalloc::Heap::allocateSmallPage): Revert to our old linked list
strategy for storing small pages.

(bmalloc::Heap::deallocateSmallLine): Don't return early in the case
where this is the first free object in the page. In the case of large-ish
objects, the first free object might also be the last free object,
since there's one object per page.

(bmalloc::Heap::allocateSmallBumpRangesByMetadata): Split out some helper
lambdas to make this code clearer.

(bmalloc::Heap::allocateSmallBumpRangesByObject): Added a fast scan
for objects larger than the line size. When multiple objects fit in
a single line, it's an optimization to scan a line at a time. But when
it's one object per line, or one object per 64 lines, it's better just
to scan an object at a time.
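
The arithmetic behind that trade-off, as a small self-contained check (constants restated from Sizes.h: 256-byte lines, 16kB smallMax after this patch):

    #include <cstddef>

    // Size classes above the line size range from one object per line up to one
    // 16kB object spanning 64 lines, so a line-at-a-time scan can do up to 64x
    // the work of an object-at-a-time scan for the largest size class.
    constexpr std::size_t smallLineSize = 256;
    constexpr std::size_t smallMax = 16 * 1024;

    static_assert(smallMax / smallLineSize == 64, "largest small object spans 64 lines");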

* bmalloc/Heap.h:
(bmalloc::Heap::allocateSmallBumpRanges):
(bmalloc::Heap::derefSmallLine): Match the changes above.

* bmalloc/LineMetadata.h: We weren't using all those bits.
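
Concretely (256-byte lines; the 8-byte minimum object alignment is assumed here): the largest start offset is 248 and the largest per-line object count is 32, so both fields fit in an unsigned char and LineMetadata shrinks from 4 bytes to 2, as the new static_asserts in the hunk below check.

    #include <cstddef>
    #include <limits>

    constexpr std::size_t smallLineSize = 256;
    constexpr std::size_t alignment = 8;   // assumed minimum object alignment

    static_assert(smallLineSize - alignment <= std::numeric_limits<unsigned char>::max(),
        "largest startOffset (248) fits in unsigned char");
    static_assert(smallLineSize / alignment <= std::numeric_limits<unsigned char>::max(),
        "largest objectCount (32) fits in unsigned char");

    struct LineMetadata { unsigned char startOffset; unsigned char objectCount; };
    static_assert(sizeof(LineMetadata) == 2, "down from 4 bytes with unsigned short fields");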

* bmalloc/List.h:
(bmalloc::List::remove): Put a removed Node fully back into the default
(empty) state it was in before it entered the list. This change is not
observable, but it makes things clearer when you're debugging.
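
A minimal sketch of the remove path after this change (a toy circular list, not the real ListNode/List templates):

    // After unlinking, point the removed node back at itself so it matches the
    // default-constructed "not in any list" state, which is easy to recognize
    // in a debugger.
    struct Node {
        Node* prev { this };
        Node* next { this };
    };

    inline void remove(Node* node)
    {
        Node* next = node->next;
        Node* prev = node->prev;

        next->prev = prev;
        prev->next = next;

        // The two added lines: restore the default, self-linked state.
        node->prev = node;
        node->next = node;
    }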

* bmalloc/Object.h:
(bmalloc::Object::Object):
(bmalloc::Object::chunk):
(bmalloc::Object::offset):
(bmalloc::Object::operator+):
(bmalloc::Object::operator<=): Added some helpers for iterating by object.
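
A simplified usage sketch of the new helpers (Chunk reduced to an opaque stand-in), showing the iterate-by-object pattern that VMHeap::allocateSmallChunk and allocateSmallBumpRangesByObject rely on below:

    #include <cstddef>

    struct Chunk;   // opaque stand-in for bmalloc's Chunk

    // Object as a (chunk, offset) pair with the helpers added in this patch.
    struct Object {
        Object(Chunk* chunk, std::size_t offset) : m_chunk(chunk), m_offset(offset) { }

        Object operator+(std::size_t offset) { return Object(m_chunk, m_offset + offset); }
        bool operator<=(const Object& other) { return m_offset <= other.m_offset; }   // same chunk assumed

        Chunk* m_chunk;
        std::size_t m_offset;
    };

    // Walk a chunk one page-sized step at a time, as allocateSmallChunk does.
    inline void forEachPageOffset(Chunk* chunk, std::size_t chunkSize, std::size_t pageSize)
    {
        Object begin(chunk, 0);
        Object end(begin + chunkSize);
        for (Object it = begin; it + pageSize <= end; it = it + pageSize) {
            // ... initialize the SmallPage that `it` maps to ...
        }
    }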

* bmalloc/ObjectType.cpp:
(bmalloc::objectType): Updated for API change.

* bmalloc/Sizes.h:
(bmalloc::Sizes::maskObjectSize):
(bmalloc::Sizes::objectSize):
(bmalloc::Sizes::pageSize): Support more page sizes.
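
A restatement of the mapping with the patch's constants (see the Sizes.h hunk below); it also shows how the page count that SmallPage used to store now falls out of the page class, as noted for SmallPage.h next:

    #include <cstddef>

    constexpr std::size_t kB = 1024;
    constexpr std::size_t smallPageSize = 4 * kB;
    constexpr std::size_t pageSizeMax = 32 * kB;
    constexpr std::size_t pageClassCount = pageSizeMax / smallPageSize;   // 8 page classes

    constexpr std::size_t pageSize(std::size_t pageClass) { return (pageClass + 1) * smallPageSize; }

    static_assert(pageClassCount == 8, "4kB through 32kB in 4kB steps");
    static_assert(pageSize(0) == 4 * kB && pageSize(pageClassCount - 1) == 32 * kB, "mapping inverts correctly");
    static_assert(pageSize(2) / smallPageSize == 3, "a page-class-2 page covers three SmallPage slots");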

* bmalloc/SmallPage.h:
(bmalloc::SmallPage::SmallPage):
(bmalloc::SmallPage::objectType): Deleted.
(bmalloc::SmallPage::setObjectType): Deleted.
(bmalloc::SmallPage::smallPageCount): Deleted.
(bmalloc::SmallPage::setSmallPageCount): Deleted. Object type is per
Chunk now, and we can infer page count from size class.

* bmalloc/VMHeap.cpp:
(bmalloc::VMHeap::allocateChunk):
(bmalloc::VMHeap::allocateSmallChunk):
* bmalloc/VMHeap.h:
(bmalloc::VMHeap::allocateSmallPage):
(bmalloc::VMHeap::deallocateSmallPage):
(bmalloc::VMHeap::allocateLargeObject): Support our old behavior of
storing free pages in linked lists.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@198995 268f45cc-cd09-0410-ab3c-d52691b4dbfc

13 files changed:
Source/bmalloc/ChangeLog
Source/bmalloc/bmalloc/BumpAllocator.h
Source/bmalloc/bmalloc/Chunk.h
Source/bmalloc/bmalloc/Heap.cpp
Source/bmalloc/bmalloc/Heap.h
Source/bmalloc/bmalloc/LineMetadata.h
Source/bmalloc/bmalloc/List.h
Source/bmalloc/bmalloc/Object.h
Source/bmalloc/bmalloc/ObjectType.cpp
Source/bmalloc/bmalloc/Sizes.h
Source/bmalloc/bmalloc/SmallPage.h
Source/bmalloc/bmalloc/VMHeap.cpp
Source/bmalloc/bmalloc/VMHeap.h

diff --git a/Source/bmalloc/ChangeLog b/Source/bmalloc/ChangeLog
index fc0ec4c..59c9c58 100644
@@ -1,3 +1,160 @@
+2016-04-03  Geoffrey Garen  <ggaren@apple.com>
+
+        bmalloc: segregate small and large objects again, and allocate more objects on the small path
+        https://bugs.webkit.org/show_bug.cgi?id=156152
+
+        Reviewed by Sam Weinig.
+
+        Microbenchmark data suggested that it was a good idea for small and large
+        objects to share memory. But r198675 did not improve memory use in
+        full browser benchmarks.
+
+        This patch reverts to segregating small and large objects -- but without
+        going back to doubled VM usage -- in order to capture a few benefits:
+
+        (*) Small pages fragment the large heap. Separating them out saves a lot
+        of memory in our worst case fragmentation recording:
+
+            nimlang                                          276,076kB                      209,636kB                ^ 1.32x smaller
+
+        (*) Small objects are common enough that even their slow paths benefit 
+        from simpler code:
+
+        Execution Time:
+            ...
+            facebook                                             234ms                          216ms                 ^ 1.08x faster
+            reddit                                               114ms                          108ms                 ^ 1.06x faster
+            flickr                                               118ms                          111ms                 ^ 1.06x faster
+            theverge                                             146ms                          140ms                 ^ 1.04x faster
+            ...
+            <arithmetic mean>                                    107ms                          102ms                 ^ 1.04x faster
+
+        (*) We can use less metadata:
+
+        Memory at End:
+            ...
+            list_allocate                                        460kB                          384kB                 ^ 1.2x smaller
+            tree_allocate                                        492kB                          424kB                ^ 1.16x smaller
+            tree_churn                                           480kB                          404kB                ^ 1.19x smaller
+            fragment                                             532kB                          452kB                ^ 1.18x smaller
+            fragment_iterate                                     712kB                          588kB                ^ 1.21x smaller
+            medium                                            15,152kB                       11,796kB                ^ 1.28x smaller
+            big                                               15,044kB                       10,976kB                ^ 1.37x smaller
+            ...
+            <arithmetic mean>                                  7,724kB                        7,190kB                ^ 1.07x smaller
+
+        This patch also takes advantage of our support for varying the page size
+        at runtime by allocating more objects on the small object path:
+
+            medium                                               178ms                          150ms                 ^ 1.19x faster
+
+        Some microbenchmarks report memory use increases from this change -- like
+        they reported memory use decreases from r198675 -- but I'm ignoring them
+        for now because I expect our full browser memory benchmarks to confirm
+        that this patch is fine.
+
+        * bmalloc/BumpAllocator.h:
+        (bmalloc::BumpAllocator::BumpAllocator): Use a full unsigned because we
+        can allocate objects larger than 16kB - 1, and a full unsigned does not
+        make BumpAllocator any larger on 64bit systems.
+
+        * bmalloc/Chunk.h:
+        (bmalloc::Chunk::begin):
+        (bmalloc::Chunk::end):
+        (bmalloc::Chunk::size):
+        (bmalloc::Chunk::objectType): Store ObjectType in the Chunk, since it only
+        varies by Chunk now, and not from page to page within a Chunk. Also,
+        union together small and large object metadata, since we will only use
+        one or the other. This saves memory.
+
+        (bmalloc::Chunk::Chunk): Conditionalize initialization based on object
+        type, since only one kind of metadata or the other can be used at runtime.
+
+        (bmalloc::Object::Object):
+        (bmalloc::Object::begin):
+        (bmalloc::SmallPage::end): Deleted.
+
+        * bmalloc/Heap.cpp:
+        (bmalloc::Heap::Heap):
+        (bmalloc::Heap::initializeLineMetadata): Save a little space, since we
+        know that lines are only 256 bytes long.
+
+        (bmalloc::Heap::initializePageMetadata): Store a dynamic page size for
+        each size class. We used to use only one page size (the system page size)
+        but that limited our ability to allocate objects larger than 1kB on the
+        small object path. Now we can handle any object size we want by storing
+        objects of that size in a custom page size.
+
+        (bmalloc::Heap::concurrentScavenge):
+        (bmalloc::Heap::scavenge):
+        (bmalloc::Heap::scavengeSmallPages): Revert to our old linked list
+        strategy for storing small pages.
+
+        (bmalloc::Heap::splitAndAllocate): Object type is per Chunk now.
+
+        (bmalloc::Heap::allocateLarge): Don't nuke the small page list when
+        allocating a large object because the two don't share memory anymore.
+
+        (bmalloc::Heap::allocateSmallPage): Revert to our old linked list
+        strategy for storing small pages.
+
+        (bmalloc::Heap::deallocateSmallLine): Don't return early in the case
+        where this is the first free object in the page. In the case of large-ish
+        objects, the first free object might also be the last free object,
+        since there's one object per page.
+
+        (bmalloc::Heap::allocateSmallBumpRangesByMetadata): Split out some helper
+        lambdas to make this code clearer.
+
+        (bmalloc::Heap::allocateSmallBumpRangesByObject): Added a fast scan
+        for objects larger than the line size. When multiple objects fit in
+        a single line, it's an optimization to scan a line at a time. But when
+        it's one object per line, or one object per 64 lines, it's better just
+        to scan an object at a time.
+
+        * bmalloc/Heap.h:
+        (bmalloc::Heap::allocateSmallBumpRanges):
+        (bmalloc::Heap::derefSmallLine): Match the changes above.
+
+        * bmalloc/LineMetadata.h: We weren't using all those bits.
+
+        * bmalloc/List.h:
+        (bmalloc::List::remove): Put a removed Node fully back into the default
+        (empty) state it was in before it entered the list. This change is not
+        observable, but it makes things clearer when you're debugging.
+
+        * bmalloc/Object.h:
+        (bmalloc::Object::Object):
+        (bmalloc::Object::chunk):
+        (bmalloc::Object::offset):
+        (bmalloc::Object::operator+):
+        (bmalloc::Object::operator<=): Added some helpers for iterating by object.
+
+        * bmalloc/ObjectType.cpp:
+        (bmalloc::objectType): Updated for API change.
+
+        * bmalloc/Sizes.h:
+        (bmalloc::Sizes::maskObjectSize):
+        (bmalloc::Sizes::objectSize):
+        (bmalloc::Sizes::pageSize): Support more page sizes.
+
+        * bmalloc/SmallPage.h:
+        (bmalloc::SmallPage::SmallPage):
+        (bmalloc::SmallPage::objectType): Deleted.
+        (bmalloc::SmallPage::setObjectType): Deleted.
+        (bmalloc::SmallPage::smallPageCount): Deleted.
+        (bmalloc::SmallPage::setSmallPageCount): Deleted. Object type is per
+        Chunk now, and we can infer page count from size class.
+
+        * bmalloc/VMHeap.cpp:
+        (bmalloc::VMHeap::allocateChunk):
+        (bmalloc::VMHeap::allocateSmallChunk):
+        * bmalloc/VMHeap.h:
+        (bmalloc::VMHeap::allocateSmallPage):
+        (bmalloc::VMHeap::deallocateSmallPage):
+        (bmalloc::VMHeap::allocateLargeObject): Support our old behavior of
+        storing free pages in linked lists.
+
 2016-03-29  Geoffrey Garen  <ggaren@apple.com>
 
         bmalloc: support physical page sizes that don't match the virtual page size (take 2)
diff --git a/Source/bmalloc/bmalloc/BumpAllocator.h b/Source/bmalloc/bmalloc/BumpAllocator.h
index b3fa4f8..6f70917 100644
@@ -51,8 +51,8 @@ public:
 
 private:
     char* m_ptr;
-    unsigned short m_size;
-    unsigned short m_remaining;
+    unsigned m_size;
+    unsigned m_remaining;
 };
 
 inline BumpAllocator::BumpAllocator()
diff --git a/Source/bmalloc/bmalloc/Chunk.h b/Source/bmalloc/bmalloc/Chunk.h
index 4a5c824..cad295d 100644
@@ -45,19 +45,22 @@ public:
     static BeginTag* beginTag(void*);
     static EndTag* endTag(void*, size_t);
 
-    Chunk(std::lock_guard<StaticMutex>&);
+    Chunk(std::lock_guard<StaticMutex>&, ObjectType);
 
     size_t offset(void*);
 
-    void* object(size_t offset);
+    char* object(size_t offset);
     SmallPage* page(size_t offset);
     SmallLine* line(size_t offset);
 
     SmallLine* lines() { return m_lines.begin(); }
     SmallPage* pages() { return m_pages.begin(); }
 
-    char* begin() { return m_memory; }
+    char* begin() { return roundUpToMultipleOf(vmPageSizePhysical(), m_memory); }
     char* end() { return reinterpret_cast<char*>(this) + chunkSize; }
+    size_t size() { return end() - begin(); }
+
+    ObjectType objectType() { return m_objectType; }
 
 private:
     static const size_t boundaryTagCount = chunkSize / largeMin;
@@ -77,17 +80,34 @@ private:
     //
     // We use the X's for boundary tags and the O's for edge sentinels.
 
-    std::array<SmallLine, chunkSize / smallLineSize> m_lines;
-    std::array<SmallPage, chunkSize / smallPageSize> m_pages;
-    std::array<BoundaryTag, boundaryTagCount> m_boundaryTags;
-    char m_memory[] __attribute__((aligned(largeAlignment + 0)));
+    union {
+        // The first few bytes of metadata cover the metadata region, so they're
+        // not used. We can steal them to store m_objectType.
+        ObjectType m_objectType;
+        std::array<SmallLine, chunkSize / smallLineSize> m_lines;
+    };
+
+    union {
+        // A chunk is either small or large for its lifetime, so we can union
+        // small and large metadata, and then use one or the other at runtime.
+        std::array<SmallPage, chunkSize / smallPageSize> m_pages;
+        std::array<BoundaryTag, boundaryTagCount> m_boundaryTags;
+    };
+    char m_memory[];
 };
 
 static_assert(sizeof(Chunk) + largeMax <= chunkSize, "largeMax is too big");
 
-inline Chunk::Chunk(std::lock_guard<StaticMutex>& lock)
+static_assert(sizeof(Chunk) / smallLineSize > sizeof(ObjectType),
+    "Chunk::m_objectType overlaps with metadata");
+
+inline Chunk::Chunk(std::lock_guard<StaticMutex>&, ObjectType objectType)
+    : m_objectType(objectType)
 {
-    Range range(begin(), end() - begin());
+    if (objectType != ObjectType::Large)
+        return;
+
+    Range range(begin(), size());
     BASSERT(range.size() <= largeObjectMax);
 
     BeginTag* beginTag = Chunk::beginTag(range.begin());
@@ -111,13 +131,6 @@ inline Chunk::Chunk(std::lock_guard<StaticMutex>& lock)
     BASSERT(rightSentinel >= m_boundaryTags.begin());
     BASSERT(rightSentinel < m_boundaryTags.end());
     rightSentinel->initSentinel();
-
-    // Track the memory used for metadata by allocating imaginary objects.
-    for (char* it = reinterpret_cast<char*>(this); it < m_memory; it += smallLineSize) {
-        Object object(it);
-        object.line()->ref(lock);
-        object.page()->ref(lock);
-    }
 }
 
 inline Chunk* Chunk::get(void* object)
@@ -152,7 +165,7 @@ inline size_t Chunk::offset(void* object)
     return static_cast<char*>(object) - reinterpret_cast<char*>(this);
 }
 
-inline void* Chunk::object(size_t offset)
+inline char* Chunk::object(size_t offset)
 {
     return reinterpret_cast<char*>(this) + offset;
 }
@@ -192,12 +205,6 @@ inline SmallLine* SmallPage::begin()
     return &chunk->lines()[lineNumber];
 }
 
-inline SmallLine* SmallPage::end()
-{
-    BASSERT(!m_slide);
-    return begin() + m_smallPageCount * smallPageLineCount;
-}
-
 inline Object::Object(void* object)
     : m_chunk(Chunk::get(object))
     , m_offset(m_chunk->offset(object))
@@ -211,7 +218,7 @@ inline Object::Object(Chunk* chunk, void* object)
     BASSERT(chunk == Chunk::get(object));
 }
 
-inline void* Object::begin()
+inline char* Object::begin()
 {
     return m_chunk->object(m_offset);
 }
diff --git a/Source/bmalloc/bmalloc/Heap.cpp b/Source/bmalloc/bmalloc/Heap.cpp
index 5c286e7..7dfd71c 100644
@@ -45,12 +45,12 @@ Heap::Heap(std::lock_guard<StaticMutex>&)
     RELEASE_BASSERT(xLargeAlignment >= vmPageSize());
 
     initializeLineMetadata();
+    initializePageMetadata();
 }
 
 void Heap::initializeLineMetadata()
 {
-    // We assume that m_smallLineMetadata is zero-filled.
-
+    size_t sizeClassCount = bmalloc::sizeClass(smallLineSize);
     size_t smallLineCount = m_vmPageSizePhysical / smallLineSize;
     m_smallLineMetadata.grow(sizeClassCount * smallLineCount);
 
@@ -68,7 +68,7 @@ void Heap::initializeLineMetadata()
             size_t remainder;
             divideRoundingUp(smallLineSize - leftover, size, objectCount, remainder);
 
-            pageMetadata[line] = { static_cast<unsigned short>(leftover), static_cast<unsigned short>(objectCount) };
+            pageMetadata[line] = { static_cast<unsigned char>(leftover), static_cast<unsigned char>(objectCount) };
 
             object += objectCount * size;
         }
@@ -81,6 +81,29 @@ void Heap::initializeLineMetadata()
     }
 }
 
+void Heap::initializePageMetadata()
+{
+    auto computePageSize = [&](size_t sizeClass) {
+        size_t size = objectSize(sizeClass);
+        if (sizeClass < bmalloc::sizeClass(smallLineSize))
+            return m_vmPageSizePhysical;
+
+        for (size_t pageSize = m_vmPageSizePhysical;
+            pageSize < pageSizeMax;
+            pageSize += m_vmPageSizePhysical) {
+            RELEASE_BASSERT(pageSize <= largeObjectMax);
+            size_t waste = pageSize % size;
+            if (waste <= pageSize / pageSizeWasteFactor)
+                return pageSize;
+        }
+        
+        return pageSizeMax;
+    };
+
+    for (size_t i = 0; i < sizeClassCount; ++i)
+        m_pageClasses[i] = (computePageSize(i) - 1) / smallPageSize;
+}
+
 void Heap::concurrentScavenge()
 {
     std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
@@ -91,39 +114,23 @@ void Heap::scavenge(std::unique_lock<StaticMutex>& lock, std::chrono::millisecon
 {
     waitUntilFalse(lock, sleepDuration, m_isAllocatingPages);
 
-    lock.unlock();
-    {
-        std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
-        scavengeSmallPages(lock);
-    }
-    lock.lock();
-
+    scavengeSmallPages(lock, sleepDuration);
     scavengeLargeObjects(lock, sleepDuration);
     scavengeXLargeObjects(lock, sleepDuration);
 
     sleep(lock, sleepDuration);
 }
 
-void Heap::scavengeSmallPage(std::lock_guard<StaticMutex>& lock)
-{
-    SmallPage* page = m_smallPages.pop();
-
-    // Revert the slide() value on intermediate SmallPages so they hash to
-    // themselves again.
-    for (size_t i = 1; i < page->smallPageCount(); ++i)
-        page[i].setSlide(0);
-
-    // Revert our small object page back to large object.
-    page->setObjectType(ObjectType::Large);
-
-    LargeObject largeObject(page->begin()->begin());
-    deallocateLarge(lock, largeObject);
-}
-
-void Heap::scavengeSmallPages(std::lock_guard<StaticMutex>& lock)
+void Heap::scavengeSmallPages(std::unique_lock<StaticMutex>& lock, std::chrono::milliseconds sleepDuration)
 {
-    while (!m_smallPages.isEmpty())
-        scavengeSmallPage(lock);
+    for (auto& smallPages : m_smallPages) {
+        while (!smallPages.isEmpty()) {
+            SmallPage* page = smallPages.pop();
+            size_t pageClass = m_pageClasses[page->sizeClass()];
+            m_vmHeap.deallocateSmallPage(lock, pageClass, page);
+            waitUntilFalse(lock, sleepDuration, m_isAllocatingPages);
+        }
+    }
 }
 
 void Heap::scavengeLargeObjects(std::unique_lock<StaticMutex>& lock, std::chrono::milliseconds sleepDuration)
@@ -150,116 +157,6 @@ void Heap::scavengeXLargeObjects(std::unique_lock<StaticMutex>& lock, std::chron
     m_xLargeMap.shrinkToFit();
 }
 
-void Heap::allocateSmallBumpRanges(std::lock_guard<StaticMutex>& lock, size_t sizeClass, BumpAllocator& allocator, BumpRangeCache& rangeCache)
-{
-    BASSERT(!rangeCache.size());
-    SmallPage* page = allocateSmallPage(lock, sizeClass);
-    SmallLine* lines = page->begin();
-    BASSERT(page->hasFreeLines(lock));
-    size_t smallLineCount = m_vmPageSizePhysical / smallLineSize;
-    LineMetadata* pageMetadata = &m_smallLineMetadata[sizeClass * smallLineCount];
-
-    // Find a free line.
-    for (size_t lineNumber = 0; lineNumber < smallLineCount; ++lineNumber) {
-        if (lines[lineNumber].refCount(lock))
-            continue;
-
-        LineMetadata& lineMetadata = pageMetadata[lineNumber];
-        if (!lineMetadata.objectCount)
-            continue;
-
-        // In a fragmented page, some free ranges might not fit in the cache.
-        if (rangeCache.size() == rangeCache.capacity()) {
-            m_smallPagesWithFreeLines[sizeClass].push(page);
-            BASSERT(allocator.canAllocate());
-            return;
-        }
-
-        char* begin = lines[lineNumber].begin() + lineMetadata.startOffset;
-        unsigned short objectCount = lineMetadata.objectCount;
-        lines[lineNumber].ref(lock, lineMetadata.objectCount);
-        page->ref(lock);
-
-        // Merge with subsequent free lines.
-        while (++lineNumber < smallLineCount) {
-            if (lines[lineNumber].refCount(lock))
-                break;
-
-            LineMetadata& lineMetadata = pageMetadata[lineNumber];
-            if (!lineMetadata.objectCount)
-                continue;
-
-            objectCount += lineMetadata.objectCount;
-            lines[lineNumber].ref(lock, lineMetadata.objectCount);
-            page->ref(lock);
-        }
-
-        if (!allocator.canAllocate())
-            allocator.refill({ begin, objectCount });
-        else
-            rangeCache.push({ begin, objectCount });
-    }
-
-    BASSERT(allocator.canAllocate());
-    page->setHasFreeLines(lock, false);
-}
-
-SmallPage* Heap::allocateSmallPage(std::lock_guard<StaticMutex>& lock, size_t sizeClass)
-{
-    if (!m_smallPagesWithFreeLines[sizeClass].isEmpty())
-        return m_smallPagesWithFreeLines[sizeClass].popFront();
-    
-    if (!m_smallPages.isEmpty()) {
-        SmallPage* page = m_smallPages.pop();
-        page->setSizeClass(sizeClass);
-        return page;
-    }
-
-    size_t unalignedSize = largeMin + m_vmPageSizePhysical - largeAlignment + m_vmPageSizePhysical;
-    LargeObject largeObject = allocateLarge(lock, m_vmPageSizePhysical, m_vmPageSizePhysical, unalignedSize);
-    
-    // Transform our large object into a small object page. We deref here
-    // because our small objects will keep their own line refcounts.
-    Object object(largeObject.begin());
-    object.line()->deref(lock);
-    object.page()->setObjectType(ObjectType::Small);
-
-    SmallPage* page = object.page();
-    page->setSizeClass(sizeClass);
-    page->setSmallPageCount(m_vmPageSizePhysical / smallPageSize);
-
-    // Set a slide() value on intermediate SmallPages so they hash to their
-    // vmPageSizePhysical-sized page.
-    for (size_t i = 1; i < page->smallPageCount(); ++i)
-        page[i].setSlide(i);
-
-    return object.page();
-}
-
-void Heap::deallocateSmallLine(std::lock_guard<StaticMutex>& lock, Object object)
-{
-    BASSERT(!object.line()->refCount(lock));
-    SmallPage* page = object.page();
-    if (page->objectType() == ObjectType::Large)
-        return deallocateLarge(lock, LargeObject(object.begin()));
-
-    page->deref(lock);
-    if (!page->hasFreeLines(lock)) {
-        page->setHasFreeLines(lock, true);
-        m_smallPagesWithFreeLines[page->sizeClass()].push(page);
-
-        BASSERT(page->refCount(lock));
-        return;
-    }
-
-    if (page->refCount(lock))
-        return;
-
-    m_smallPagesWithFreeLines[page->sizeClass()].remove(page);
-    m_smallPages.push(page);
-    m_scavenger.run();
-}
-
 inline LargeObject& Heap::splitAndAllocate(std::lock_guard<StaticMutex>& lock, LargeObject& largeObject, size_t size)
 {
     BASSERT(largeObject.isFree());
@@ -275,7 +172,7 @@ inline LargeObject& Heap::splitAndAllocate(std::lock_guard<StaticMutex>& lock, L
     largeObject.setFree(false);
     Object object(largeObject.begin());
     object.line()->ref(lock);
-    BASSERT(object.page()->objectType() == ObjectType::Large);
+    BASSERT(object.chunk()->objectType() == ObjectType::Large);
 
     if (nextLargeObject) {
         BASSERT(!nextLargeObject.nextCanMerge());
@@ -309,7 +206,7 @@ inline LargeObject& Heap::splitAndAllocate(std::lock_guard<StaticMutex>& lock, L
     largeObject.setFree(false);
     Object object(largeObject.begin());
     object.line()->ref(lock);
-    BASSERT(object.page()->objectType() == ObjectType::Large);
+    BASSERT(object.chunk()->objectType() == ObjectType::Large);
 
     if (prevLargeObject) {
         LargeObject merged = prevLargeObject.merge();
@@ -330,9 +227,6 @@ void* Heap::allocateLarge(std::lock_guard<StaticMutex>& lock, size_t size)
     BASSERT(size >= largeMin);
     BASSERT(size == roundUpToMultipleOf<largeAlignment>(size));
     
-    if (size <= m_vmPageSizePhysical)
-        scavengeSmallPages(lock);
-
     LargeObject largeObject = m_largeObjects.take(size);
     if (!largeObject)
         largeObject = m_vmHeap.allocateLargeObject(lock, size);
@@ -361,9 +255,6 @@ void* Heap::allocateLarge(std::lock_guard<StaticMutex>& lock, size_t alignment,
     BASSERT(alignment >= largeAlignment);
     BASSERT(isPowerOfTwo(alignment));
 
-    if (size <= m_vmPageSizePhysical)
-        scavengeSmallPages(lock);
-
     LargeObject largeObject = m_largeObjects.take(alignment, size, unalignedSize);
     if (!largeObject)
         largeObject = m_vmHeap.allocateLargeObject(lock, alignment, size, unalignedSize);
@@ -389,7 +280,7 @@ void Heap::shrinkLarge(std::lock_guard<StaticMutex>& lock, LargeObject& largeObj
 void Heap::deallocateLarge(std::lock_guard<StaticMutex>&, const LargeObject& largeObject)
 {
     BASSERT(!largeObject.isFree());
-    BASSERT(Object(largeObject.begin()).page()->objectType() == ObjectType::Large);
+    BASSERT(Object(largeObject.begin()).chunk()->objectType() == ObjectType::Large);
     largeObject.setFree(true);
     
     LargeObject merged = largeObject.merge();
@@ -397,6 +288,164 @@ void Heap::deallocateLarge(std::lock_guard<StaticMutex>&, const LargeObject& lar
     m_scavenger.run();
 }
 
+SmallPage* Heap::allocateSmallPage(std::lock_guard<StaticMutex>& lock, size_t sizeClass)
+{
+    if (!m_smallPagesWithFreeLines[sizeClass].isEmpty())
+        return m_smallPagesWithFreeLines[sizeClass].popFront();
+
+    SmallPage* page = [&]() {
+        size_t pageClass = m_pageClasses[sizeClass];
+        if (!m_smallPages[pageClass].isEmpty())
+            return m_smallPages[pageClass].pop();
+
+        m_isAllocatingPages = true;
+        return m_vmHeap.allocateSmallPage(lock, pageClass);
+    }();
+
+    page->setSizeClass(sizeClass);
+    return page;
+}
+
+void Heap::deallocateSmallLine(std::lock_guard<StaticMutex>& lock, Object object)
+{
+    BASSERT(!object.line()->refCount(lock));
+    SmallPage* page = object.page();
+    if (object.chunk()->objectType() == ObjectType::Large)
+        return deallocateLarge(lock, LargeObject(object.begin()));
+
+    page->deref(lock);
+    if (!page->hasFreeLines(lock)) {
+        page->setHasFreeLines(lock, true);
+        m_smallPagesWithFreeLines[page->sizeClass()].push(page);
+    }
+
+    if (page->refCount(lock))
+        return;
+
+    size_t sizeClass = page->sizeClass();
+    size_t pageClass = m_pageClasses[sizeClass];
+
+    m_smallPagesWithFreeLines[sizeClass].remove(page);
+    m_smallPages[pageClass].push(page);
+
+    m_scavenger.run();
+}
+
+void Heap::allocateSmallBumpRangesByMetadata(
+    std::lock_guard<StaticMutex>& lock, size_t sizeClass,
+    BumpAllocator& allocator, BumpRangeCache& rangeCache)
+{
+    SmallPage* page = allocateSmallPage(lock, sizeClass);
+    SmallLine* lines = page->begin();
+    BASSERT(page->hasFreeLines(lock));
+    size_t smallLineCount = m_vmPageSizePhysical / smallLineSize;
+    LineMetadata* pageMetadata = &m_smallLineMetadata[sizeClass * smallLineCount];
+    
+    auto findSmallBumpRange = [&](size_t& lineNumber) {
+        for ( ; lineNumber < smallLineCount; ++lineNumber) {
+            if (!lines[lineNumber].refCount(lock)) {
+                if (pageMetadata[lineNumber].objectCount)
+                    return true;
+            }
+        }
+        return false;
+    };
+
+    auto allocateSmallBumpRange = [&](size_t& lineNumber) -> BumpRange {
+        char* begin = lines[lineNumber].begin() + pageMetadata[lineNumber].startOffset;
+        unsigned short objectCount = 0;
+        
+        for ( ; lineNumber < smallLineCount; ++lineNumber) {
+            if (lines[lineNumber].refCount(lock))
+                break;
+
+            if (!pageMetadata[lineNumber].objectCount)
+                continue;
+
+            objectCount += pageMetadata[lineNumber].objectCount;
+            lines[lineNumber].ref(lock, pageMetadata[lineNumber].objectCount);
+            page->ref(lock);
+        }
+        return { begin, objectCount };
+    };
+
+    size_t lineNumber = 0;
+    for (;;) {
+        if (!findSmallBumpRange(lineNumber)) {
+            page->setHasFreeLines(lock, false);
+            BASSERT(allocator.canAllocate());
+            return;
+        }
+
+        // In a fragmented page, some free ranges might not fit in the cache.
+        if (rangeCache.size() == rangeCache.capacity()) {
+            m_smallPagesWithFreeLines[sizeClass].push(page);
+            BASSERT(allocator.canAllocate());
+            return;
+        }
+
+        BumpRange bumpRange = allocateSmallBumpRange(lineNumber);
+        if (allocator.canAllocate())
+            rangeCache.push(bumpRange);
+        else
+            allocator.refill(bumpRange);
+    }
+}
+
+void Heap::allocateSmallBumpRangesByObject(
+    std::lock_guard<StaticMutex>& lock, size_t sizeClass,
+    BumpAllocator& allocator, BumpRangeCache& rangeCache)
+{
+    size_t size = allocator.size();
+    SmallPage* page = allocateSmallPage(lock, sizeClass);
+    BASSERT(page->hasFreeLines(lock));
+
+    auto findSmallBumpRange = [&](Object& it, Object& end) {
+        for ( ; it + size <= end; it = it + size) {
+            if (!it.line()->refCount(lock))
+                return true;
+        }
+        return false;
+    };
+
+    auto allocateSmallBumpRange = [&](Object& it, Object& end) -> BumpRange {
+        char* begin = it.begin();
+        unsigned short objectCount = 0;
+        for ( ; it + size <= end; it = it + size) {
+            if (it.line()->refCount(lock))
+                break;
+
+            ++objectCount;
+            it.line()->ref(lock);
+            it.page()->ref(lock);
+        }
+        return { begin, objectCount };
+    };
+
+    Object it(page->begin()->begin());
+    Object end(it + pageSize(m_pageClasses[sizeClass]));
+    for (;;) {
+        if (!findSmallBumpRange(it, end)) {
+            page->setHasFreeLines(lock, false);
+            BASSERT(allocator.canAllocate());
+            return;
+        }
+
+        // In a fragmented page, some free ranges might not fit in the cache.
+        if (rangeCache.size() == rangeCache.capacity()) {
+            m_smallPagesWithFreeLines[sizeClass].push(page);
+            BASSERT(allocator.canAllocate());
+            return;
+        }
+
+        BumpRange bumpRange = allocateSmallBumpRange(it, end);
+        if (allocator.canAllocate())
+            rangeCache.push(bumpRange);
+        else
+            allocator.refill(bumpRange);
+    }
+}
+
 void* Heap::allocateXLarge(std::lock_guard<StaticMutex>& lock, size_t alignment, size_t size)
 {
     void* result = tryAllocateXLarge(lock, alignment, size);
diff --git a/Source/bmalloc/bmalloc/Heap.h b/Source/bmalloc/bmalloc/Heap.h
index 39d2eac..72cd6d2 100644
@@ -73,6 +73,12 @@ private:
     ~Heap() = delete;
     
     void initializeLineMetadata();
+    void initializePageMetadata();
+
+    void allocateSmallBumpRangesByMetadata(std::lock_guard<StaticMutex>&,
+        size_t sizeClass, BumpAllocator&, BumpRangeCache&);
+    void allocateSmallBumpRangesByObject(std::lock_guard<StaticMutex>&,
+        size_t sizeClass, BumpAllocator&, BumpRangeCache&);
 
     SmallPage* allocateSmallPage(std::lock_guard<StaticMutex>&, size_t sizeClass);
 
@@ -88,18 +94,16 @@ private:
     XLargeRange splitAndAllocate(XLargeRange&, size_t alignment, size_t);
 
     void concurrentScavenge();
-    void scavengeSmallPage(std::lock_guard<StaticMutex>&);
-    void scavengeSmallPages(std::lock_guard<StaticMutex>&);
+    void scavengeSmallPages(std::unique_lock<StaticMutex>&, std::chrono::milliseconds);
     void scavengeLargeObjects(std::unique_lock<StaticMutex>&, std::chrono::milliseconds);
     void scavengeXLargeObjects(std::unique_lock<StaticMutex>&, std::chrono::milliseconds);
 
     size_t m_vmPageSizePhysical;
-
     Vector<LineMetadata> m_smallLineMetadata;
+    std::array<size_t, sizeClassCount> m_pageClasses;
 
     std::array<List<SmallPage>, sizeClassCount> m_smallPagesWithFreeLines;
-
-    List<SmallPage> m_smallPages;
+    std::array<List<SmallPage>, pageClassCount> m_smallPages;
 
     SegregatedFreeList m_largeObjects;
     
@@ -113,6 +117,15 @@ private:
     VMHeap m_vmHeap;
 };
 
+inline void Heap::allocateSmallBumpRanges(
+    std::lock_guard<StaticMutex>& lock, size_t sizeClass,
+    BumpAllocator& allocator, BumpRangeCache& rangeCache)
+{
+    if (sizeClass < bmalloc::sizeClass(smallLineSize))
+        return allocateSmallBumpRangesByMetadata(lock, sizeClass, allocator, rangeCache);
+    return allocateSmallBumpRangesByObject(lock, sizeClass, allocator, rangeCache);
+}
+
 inline void Heap::derefSmallLine(std::lock_guard<StaticMutex>& lock, Object object)
 {
     if (!object.line()->deref(lock))
diff --git a/Source/bmalloc/bmalloc/LineMetadata.h b/Source/bmalloc/bmalloc/LineMetadata.h
index 1593347..80b3381 100644
 namespace bmalloc {
 
 struct LineMetadata {
-    unsigned short startOffset;
-    unsigned short objectCount;
+    unsigned char startOffset;
+    unsigned char objectCount;
 };
 
+static_assert(
+    smallLineSize - alignment <= std::numeric_limits<unsigned char>::max(),
+    "maximum object offset must fit in LineMetadata::startOffset");
+
+static_assert(
+    smallLineSize / alignment <= std::numeric_limits<unsigned char>::max(),
+    "maximum object count must fit in LineMetadata::objectCount");
+
 } // namespace bmalloc
 
 #endif // LineMetadata_h
diff --git a/Source/bmalloc/bmalloc/List.h b/Source/bmalloc/bmalloc/List.h
index f43180c..406bb0f 100644
@@ -88,6 +88,9 @@ public:
 
         next->prev = prev;
         prev->next = next;
+        
+        node->prev = node;
+        node->next = node;
     }
 
 private:
diff --git a/Source/bmalloc/bmalloc/Object.h b/Source/bmalloc/bmalloc/Object.h
index cf1cf71..a6f6800 100644
@@ -36,19 +36,38 @@ class Object {
 public:
     Object(void*);
     Object(Chunk*, void*);
+    Object(Chunk* chunk, size_t offset)
+        : m_chunk(chunk)
+        , m_offset(offset)
+    {
+    }
     
     Chunk* chunk() { return m_chunk; }
-    void* begin();
-    void* pageBegin();
+    size_t offset() { return m_offset; }
+    char* begin();
 
     SmallLine* line();
     SmallPage* page();
+    
+    Object operator+(size_t);
+    bool operator<=(const Object&);
 
 private:
     Chunk* m_chunk;
     size_t m_offset;
 };
 
+inline Object Object::operator+(size_t offset)
+{
+    return Object(m_chunk, m_offset + offset);
+}
+
+inline bool Object::operator<=(const Object& other)
+{
+    BASSERT(m_chunk == other.m_chunk);
+    return m_offset <= other.m_offset;
+}
+
 }; // namespace bmalloc
 
 #endif // Object_h
diff --git a/Source/bmalloc/bmalloc/ObjectType.cpp b/Source/bmalloc/bmalloc/ObjectType.cpp
index 562d84f..ca4e6f1 100644
@@ -35,7 +35,7 @@ ObjectType objectType(void* object)
     if (isXLarge(object))
         return ObjectType::XLarge;
     
-    return Object(object).page()->objectType();
+    return Object(object).chunk()->objectType();
 }
 
 } // namespace bmalloc
diff --git a/Source/bmalloc/bmalloc/Sizes.h b/Source/bmalloc/bmalloc/Sizes.h
index ef73536..fc10fa4 100644
@@ -50,14 +50,20 @@ namespace Sizes {
     static const size_t smallPageSize = 4 * kB;
     static const size_t smallPageLineCount = smallPageSize / smallLineSize;
 
-    static const size_t smallMax = 1 * kB;
     static const size_t maskSizeClassMax = 512;
+    static const size_t smallMax = 16 * kB;
+
+    static const size_t pageSizeMax = 32 * kB;
+    static const size_t pageClassCount = pageSizeMax / smallPageSize;
+
+    static const size_t pageSizeWasteFactor = 8;
+    static const size_t logWasteFactor = 8;
 
     static const size_t chunkSize = 2 * MB;
     static const size_t chunkMask = ~(chunkSize - 1ul);
 
     static const size_t largeAlignment = 64;
-    static const size_t largeMin = smallMax;
+    static const size_t largeMin = 1 * kB;
     static const size_t largeObjectMax = chunkSize;
     static const size_t largeMax = largeObjectMax / 2;
 
@@ -86,7 +92,6 @@ namespace Sizes {
         return (maskSizeClass + 1) * alignment;
     }
 
-    static const size_t logWasteFactor = 8;
     static const size_t logAlignmentMin = maskSizeClassMax / logWasteFactor;
 
     static const size_t logSizeClassCount = (log2(smallMax) - log2(maskSizeClassMax)) * logWasteFactor;
@@ -120,6 +125,11 @@ namespace Sizes {
             return maskObjectSize(sizeClass);
         return logObjectSize(sizeClass - maskSizeClassCount);
     }
+    
+    inline size_t pageSize(size_t pageClass)
+    {
+        return (pageClass + 1) * smallPageSize;
+    }
 }
 
 using namespace Sizes;
diff --git a/Source/bmalloc/bmalloc/SmallPage.h b/Source/bmalloc/bmalloc/SmallPage.h
index 8a123d2..26e8748 100644
@@ -38,7 +38,6 @@ class SmallPage : public ListNode<SmallPage> {
 public:
     SmallPage()
         : m_hasFreeLines(true)
-        , m_objectType(ObjectType::Large)
     {
     }
 
@@ -49,28 +48,19 @@ public:
     size_t sizeClass() { return m_sizeClass; }
     void setSizeClass(size_t sizeClass) { m_sizeClass = sizeClass; }
     
-    ObjectType objectType() const { return m_objectType; }
-    void setObjectType(ObjectType objectType) { m_objectType = objectType; }
-
     bool hasFreeLines(std::lock_guard<StaticMutex>&) const { return m_hasFreeLines; }
     void setHasFreeLines(std::lock_guard<StaticMutex>&, bool hasFreeLines) { m_hasFreeLines = hasFreeLines; }
     
     SmallLine* begin();
-    SmallLine* end();
 
     unsigned char slide() const { return m_slide; }
     void setSlide(unsigned char slide) { m_slide = slide; }
-
-    unsigned char smallPageCount() const { return m_smallPageCount; }
-    void setSmallPageCount(unsigned char smallPageCount) { m_smallPageCount = smallPageCount; }
-
+    
 private:
     unsigned char m_hasFreeLines: 1;
     unsigned char m_refCount: 7;
     unsigned char m_sizeClass;
-    unsigned char m_smallPageCount;
     unsigned char m_slide;
-    ObjectType m_objectType;
 
 static_assert(
     sizeClassCount <= std::numeric_limits<decltype(m_sizeClass)>::max(),
diff --git a/Source/bmalloc/bmalloc/VMHeap.cpp b/Source/bmalloc/bmalloc/VMHeap.cpp
index d179041..a0d6b3b 100644
@@ -38,7 +38,7 @@ VMHeap::VMHeap()
 LargeObject VMHeap::allocateChunk(std::lock_guard<StaticMutex>& lock)
 {
     Chunk* chunk =
-        new (vmAllocate(chunkSize, chunkSize)) Chunk(lock);
+        new (vmAllocate(chunkSize, chunkSize)) Chunk(lock, ObjectType::Large);
 
 #if BOS(DARWIN)
     m_zone.addChunk(chunk);
@@ -47,4 +47,30 @@ LargeObject VMHeap::allocateChunk(std::lock_guard<StaticMutex>& lock)
     return LargeObject(chunk->begin());
 }
 
+void VMHeap::allocateSmallChunk(std::lock_guard<StaticMutex>& lock, size_t pageClass)
+{
+    Chunk* chunk =
+        new (vmAllocate(chunkSize, chunkSize)) Chunk(lock, ObjectType::Small);
+
+#if BOS(DARWIN)
+    m_zone.addChunk(chunk);
+#endif
+
+    size_t pageSize = bmalloc::pageSize(pageClass);
+    size_t smallPageCount = pageSize / smallPageSize;
+
+    Object begin(chunk->begin());
+    Object end(begin + chunk->size());
+
+    for (Object it = begin; it + pageSize <= end; it = it + pageSize) {
+        SmallPage* page = it.page();
+        new (page) SmallPage;
+
+        for (size_t i = 0; i < smallPageCount; ++i)
+            page[i].setSlide(i);
+
+        m_smallPages[pageClass].push(page);
+    }
+}
+
 } // namespace bmalloc
diff --git a/Source/bmalloc/bmalloc/VMHeap.h b/Source/bmalloc/bmalloc/VMHeap.h
index 520e28d..c5fb516 100644
@@ -47,6 +47,9 @@ class Heap;
 class VMHeap {
 public:
     VMHeap();
+    
+    SmallPage* allocateSmallPage(std::lock_guard<StaticMutex>&, size_t);
+    void deallocateSmallPage(std::unique_lock<StaticMutex>&, size_t, SmallPage*);
 
     LargeObject allocateLargeObject(std::lock_guard<StaticMutex>&, size_t);
     LargeObject allocateLargeObject(std::lock_guard<StaticMutex>&, size_t, size_t, size_t);
@@ -55,7 +58,9 @@ public:
     
 private:
     LargeObject allocateChunk(std::lock_guard<StaticMutex>&);
+    void allocateSmallChunk(std::lock_guard<StaticMutex>&, size_t);
 
+    std::array<List<SmallPage>, pageClassCount> m_smallPages;
     SegregatedFreeList m_largeObjects;
 
 #if BOS(DARWIN)
@@ -63,6 +68,25 @@ private:
 #endif
 };
 
+inline SmallPage* VMHeap::allocateSmallPage(std::lock_guard<StaticMutex>& lock, size_t pageClass)
+{
+    if (m_smallPages[pageClass].isEmpty())
+        allocateSmallChunk(lock, pageClass);
+
+    SmallPage* page = m_smallPages[pageClass].pop();
+    vmAllocatePhysicalPagesSloppy(page->begin()->begin(), pageSize(pageClass));
+    return page;
+}
+
+inline void VMHeap::deallocateSmallPage(std::unique_lock<StaticMutex>& lock, size_t pageClass, SmallPage* page)
+{
+    lock.unlock();
+    vmDeallocatePhysicalPagesSloppy(page->begin()->begin(), pageSize(pageClass));
+    lock.lock();
+    
+    m_smallPages[pageClass].push(page);
+}
+
 inline LargeObject VMHeap::allocateLargeObject(std::lock_guard<StaticMutex>& lock, size_t size)
 {
     if (LargeObject largeObject = m_largeObjects.take(size))