B3 should have global store elimination
author fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Feb 2016 22:33:58 +0000 (22:33 +0000)
committer fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Mon, 29 Feb 2016 22:33:58 +0000 (22:33 +0000)
https://bugs.webkit.org/show_bug.cgi?id=154658

Reviewed by Benjamin Poulain.

Source/JavaScriptCore:

Implements fairly comprehensive global store elimination:

1) If you store the result of a load with no interference in between, remove the store.

2) If you store the same thing you stored previously, remove the store.

3) If you store something that you either loaded previously or stored previously along
   arbitrarily many paths, remove the store.

4) If you store to something that is stored to again in the future with no interference in
   between, remove the store.
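
The four rules can be illustrated on a straight-line sequence. This is a minimal sketch using a hypothetical miniature IR (not B3's actual Value/MemoryValue classes) of the rule (4) check: a store is dead if a later store overwrites the same location before anything can observe it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature IR: each instruction loads from or stores to a
// single abstract address, or reads arbitrary memory ("Read" = clobber).
struct Inst {
    enum Kind { Load, Store, Read } kind;
    int address; // the location a Load/Store touches (ignored for Read)
    int value;   // the value stored (for Store)
};

// Rule (4) on a straight-line sequence: the store at storeIndex is dead if a
// later store hits the same address before any operation can observe memory
// at that address.
bool storeIsDead(const std::vector<Inst>& code, size_t storeIndex)
{
    const Inst& store = code[storeIndex];
    for (size_t i = storeIndex + 1; i < code.size(); ++i) {
        const Inst& inst = code[i];
        if (inst.kind == Inst::Store && inst.address == store.address)
            return true; // Overwritten before any observation.
        if (inst.kind == Inst::Read
            || (inst.kind == Inst::Load && inst.address == store.address))
            return false; // The stored value may be observed.
    }
    return false; // Reaches the end of the block; the store stays live.
}
```

In the real pass the "same location" test also compares offsets and opcode widths (Store8/Store16/Store), and observation is tested with heap-range overlap rather than address equality.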

Rule (4) is super relevant to FTL since the DFG does not eliminate redundant PutStructures.
A constructor that produces a large object will have many redundant stores to the same base
pointer, offset, and heap range, with no code to observe that heap range in between.

This doesn't have a decisive effect on major benchmarks, but it's an enormous win for
microbenchmarks:

- 30% faster to construct an object with many fields.

- 5x faster to do many stores to a global variable.

The compile time cost should be very small. Although the optimization is global, it aborts as
soon as it sees anything that would confound store elimination. For rules (1)-(3), we
piggy-back on the existing load elimination, which gives up on interfering stores. For rule (4),
we search forward through the current block and then globally a block at a time (skipping
block contents thanks to summary data), which could be expensive. But rule (4) aborts as soon
as it sees a read, a write, or a block that ends execution (Return or Oops). Any Check will claim to read TOP. Any
Patchpoint that results from an InvalidationPoint will claim to read TOP, as will any
Patchpoints for ICs. Those are usually sprinkled all over the program.
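
The block-at-a-time search can be sketched with per-block summary data. The types below are hypothetical stand-ins for the pass's ImpureBlockData and BlockWorklist, reduced to the facts the search needs: whether a matching store leads the block, and whether the block otherwise touches the store's range.

```cpp
#include <cassert>
#include <vector>

// Per-block summary, precomputed once per pass (stand-in for ImpureBlockData).
struct Block {
    bool storeAtHeadMatches; // a store to the same location leads the block
    bool readsOrWritesRange; // the block otherwise touches the store's range
    std::vector<int> successors;
};

// Returns true if every forward path overwrites the store before anything can
// observe it. A block with no successors (Return or Oops) kills the search.
bool storeIsDeadGlobally(const std::vector<Block>& blocks,
                         const std::vector<int>& startSuccessors)
{
    std::vector<bool> seen(blocks.size(), false);
    std::vector<int> worklist(startSuccessors);
    while (!worklist.empty()) {
        int id = worklist.back();
        worklist.pop_back();
        if (seen[id])
            continue;
        seen[id] = true;
        const Block& block = blocks[id];
        if (block.storeAtHeadMatches)
            continue; // This path overwrites the store first; done with it.
        if (block.readsOrWritesRange)
            return false; // The store may be observed on this path.
        if (block.successors.empty())
            return false; // Return/Oops: the memory state escapes.
        for (int successor : block.successors)
            worklist.push_back(successor);
    }
    return true; // Every path overwrites the store before observing it.
}
```

Because a "normal" program reads or writes overlapping heap ranges almost everywhere, the `readsOrWritesRange` bail-out usually fires within a block or two, which is why the search is effectively O(1) in practice.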

In other words, this optimization rarely kicks in. When it does kick in, it makes programs run
faster. When it doesn't kick in, it's usually O(1) because there are reasons for aborting all
over a "normal" program, so the search will halt almost immediately. This of course raises the
question: how much more compile time do we pay when the optimization does kick in? The
optimization kicks in the most for the microbenchmarks I wrote for this patch. Amazingly, the
effect of the optimization is a wash for compile time: whatever cost we pay doing the O(n^2)
searches is balanced by the massive reduction in work in the backend. On one of the two
microbenchmarks, overall compile time actually shrank with this optimization even though CSE
itself cost more. That's not too surprising - the backend costs much more per instruction, so
things that remove instructions before we get to the backend tend to be a good idea.

We could consider adding a more aggressive version of this in the future, which could sink
stores into checks. That could be crazy fun: https://bugs.webkit.org/show_bug.cgi?id=152162#c3

But mainly, I'm adding this optimization because it was super fun to implement during the
WebAssembly CG summit.

* b3/B3EliminateCommonSubexpressions.cpp:
* b3/B3MemoryValue.h:
* b3/B3SuccessorCollection.h:
(JSC::B3::SuccessorCollection::begin):
(JSC::B3::SuccessorCollection::end):
(JSC::B3::SuccessorCollection::const_iterator::const_iterator):
(JSC::B3::SuccessorCollection::const_iterator::operator*):
(JSC::B3::SuccessorCollection::const_iterator::operator++):
(JSC::B3::SuccessorCollection::const_iterator::operator==):
(JSC::B3::SuccessorCollection::const_iterator::operator!=):

LayoutTests:

These two benchmarks both speed up significantly with this change.

* js/regress/build-large-object-expected.txt: Added.
* js/regress/build-large-object.html: Added.
* js/regress/many-repeat-stores-expected.txt: Added.
* js/regress/many-repeat-stores.html: Added.
* js/regress/script-tests/build-large-object.js: Added.
* js/regress/script-tests/many-repeat-stores.js: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@197366 268f45cc-cd09-0410-ab3c-d52691b4dbfc

LayoutTests/ChangeLog
LayoutTests/js/regress/build-large-object-expected.txt [new file with mode: 0644]
LayoutTests/js/regress/build-large-object.html [new file with mode: 0644]
LayoutTests/js/regress/many-repeat-stores-expected.txt [new file with mode: 0644]
LayoutTests/js/regress/many-repeat-stores.html [new file with mode: 0644]
LayoutTests/js/regress/script-tests/build-large-object.js [new file with mode: 0644]
LayoutTests/js/regress/script-tests/many-repeat-stores.js [new file with mode: 0644]
Source/JavaScriptCore/ChangeLog
Source/JavaScriptCore/b3/B3EliminateCommonSubexpressions.cpp
Source/JavaScriptCore/b3/B3MemoryValue.h
Source/JavaScriptCore/b3/B3SuccessorCollection.h

index 01ecb24..d609a0b 100644 (file)
@@ -1,3 +1,19 @@
+2016-02-28  Filip Pizlo  <fpizlo@apple.com>
+
+        B3 should have global store elimination
+        https://bugs.webkit.org/show_bug.cgi?id=154658
+
+        Reviewed by Benjamin Poulain.
+
+        These two benchmarks both speed up significantly with this change.
+
+        * js/regress/build-large-object-expected.txt: Added.
+        * js/regress/build-large-object.html: Added.
+        * js/regress/many-repeat-stores-expected.txt: Added.
+        * js/regress/many-repeat-stores.html: Added.
+        * js/regress/script-tests/build-large-object.js: Added.
+        * js/regress/script-tests/many-repeat-stores.js: Added.
+
 2016-02-29  Youenn Fablet  <youenn.fablet@crf.canon.fr>
 
         streams/pipe-to.html flaky on mac-wk1 debug
diff --git a/LayoutTests/js/regress/build-large-object-expected.txt b/LayoutTests/js/regress/build-large-object-expected.txt
new file mode 100644 (file)
index 0000000..cc64ec8
--- /dev/null
@@ -0,0 +1,10 @@
+JSRegress/build-large-object
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS no exception thrown
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/js/regress/build-large-object.html b/LayoutTests/js/regress/build-large-object.html
new file mode 100644 (file)
index 0000000..e148d2d
--- /dev/null
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<script src="../../resources/js-test-pre.js"></script>
+</head>
+<body>
+<script src="../../resources/regress-pre.js"></script>
+<script src="script-tests/build-large-object.js"></script>
+<script src="../../resources/regress-post.js"></script>
+<script src="../../resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/js/regress/many-repeat-stores-expected.txt b/LayoutTests/js/regress/many-repeat-stores-expected.txt
new file mode 100644 (file)
index 0000000..7f9af05
--- /dev/null
@@ -0,0 +1,10 @@
+JSRegress/many-repeat-stores
+
+On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
+
+
+PASS no exception thrown
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
diff --git a/LayoutTests/js/regress/many-repeat-stores.html b/LayoutTests/js/regress/many-repeat-stores.html
new file mode 100644 (file)
index 0000000..f58912c
--- /dev/null
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+<head>
+<script src="../../resources/js-test-pre.js"></script>
+</head>
+<body>
+<script src="../../resources/regress-pre.js"></script>
+<script src="script-tests/many-repeat-stores.js"></script>
+<script src="../../resources/regress-post.js"></script>
+<script src="../../resources/js-test-post.js"></script>
+</body>
+</html>
diff --git a/LayoutTests/js/regress/script-tests/build-large-object.js b/LayoutTests/js/regress/script-tests/build-large-object.js
new file mode 100644 (file)
index 0000000..e44419a
--- /dev/null
@@ -0,0 +1,34 @@
+var g;
+(function() {
+    for (var i = 0; i < 1000000; ++i) {
+        var o = {}
+        o.a = 0;
+        o.b = 1;
+        o.c = 2;
+        o.d = 3;
+        o.e = 4;
+        o.f = 5;
+        o.g = 6;
+        o.h = 7;
+        o.i = 8;
+        o.j = 9;
+        o.k = 10;
+        o.l = 11;
+        o.m = 12;
+        o.n = 13;
+        o.o = 14;
+        o.p = 15;
+        o.q = 0;
+        o.r = 1;
+        o.s = 2;
+        o.t = 3;
+        o.u = 4;
+        o.v = 5;
+        o.w = 6;
+        o.x = 7;
+        o.y = 8;
+        o.z = 9;
+        g = o;
+    }
+    return g;
+})();
diff --git a/LayoutTests/js/regress/script-tests/many-repeat-stores.js b/LayoutTests/js/regress/script-tests/many-repeat-stores.js
new file mode 100644 (file)
index 0000000..1500601
--- /dev/null
@@ -0,0 +1,38 @@
+var g;
+(function() {
+    for (var i = 0; i < 10000000; ++i) {
+        g = i + 0;
+        g = i + 1;
+        g = i + 2;
+        g = i + 3;
+        g = i + 4;
+        g = i + 5;
+        g = i + 6;
+        g = i + 7;
+        g = i + 8;
+        g = i + 9;
+        g = i + 10;
+        g = i + 11;
+        g = i + 12;
+        g = i + 13;
+        g = i + 14;
+        g = i + 15;
+        g = i + 0;
+        g = i + 1;
+        g = i + 2;
+        g = i + 3;
+        g = i + 4;
+        g = i + 5;
+        g = i + 6;
+        g = i + 7;
+        g = i + 8;
+        g = i + 9;
+        g = i + 10;
+        g = i + 11;
+        g = i + 12;
+        g = i + 13;
+        g = i + 14;
+        g = i + 15;
+    }
+    return g;
+})();
index 5fb0147..5799c89 100644 (file)
@@ -1,3 +1,70 @@
+2016-02-28  Filip Pizlo  <fpizlo@apple.com>
+
+        B3 should have global store elimination
+        https://bugs.webkit.org/show_bug.cgi?id=154658
+
+        Reviewed by Benjamin Poulain.
+
+        Implements fairly comprehensive global store elimination:
+
+        1) If you store the result of a load with no interference in between, remove the store.
+
+        2) If you store the same thing you stored previously, remove the store.
+
+        3) If you store something that you either loaded previously or stored previously along
+           arbitrarily many paths, remove the store.
+
+        4) If you store to something that is stored to again in the future with no interference in
+           between, remove the store.
+
+        Rule (4) is super relevant to FTL since the DFG does not eliminate redundant PutStructures.
+        A constructor that produces a large object will have many redundant stores to the same base
+        pointer, offset, and heap range, with no code to observe that heap range in between.
+
+        This doesn't have a decisive effect on major benchmarks, but it's an enormous win for
+        microbenchmarks:
+
+        - 30% faster to construct an object with many fields.
+
+        - 5x faster to do many stores to a global variable.
+
+        The compile time cost should be very small. Although the optimization is global, it aborts as
+        soon as it sees anything that would confound store elimination. For rules (1)-(3), we
+        piggy-back on the existing load elimination, which gives up on interfering stores. For rule (4),
+        we search forward through the current block and then globally a block at a time (skipping
+        block contents thanks to summary data), which could be expensive. But rule (4) aborts as soon
+        as it sees a read, a write, or a block that ends execution (Return or Oops). Any Check will claim to read TOP. Any
+        Patchpoint that results from an InvalidationPoint will claim to read TOP, as will any
+        Patchpoints for ICs. Those are usually sprinkled all over the program.
+
+        In other words, this optimization rarely kicks in. When it does kick in, it makes programs run
+        faster. When it doesn't kick in, it's usually O(1) because there are reasons for aborting all
+        over a "normal" program, so the search will halt almost immediately. This of course raises the
+        question: how much more compile time do we pay when the optimization does kick in? The
+        optimization kicks in the most for the microbenchmarks I wrote for this patch. Amazingly, the
+        effect of the optimization is a wash for compile time: whatever cost we pay doing the O(n^2)
+        searches is balanced by the massive reduction in work in the backend. On one of the two
+        microbenchmarks, overall compile time actually shrank with this optimization even though CSE
+        itself cost more. That's not too surprising - the backend costs much more per instruction, so
+        things that remove instructions before we get to the backend tend to be a good idea.
+
+        We could consider adding a more aggressive version of this in the future, which could sink
+        stores into checks. That could be crazy fun: https://bugs.webkit.org/show_bug.cgi?id=152162#c3
+
+        But mainly, I'm adding this optimization because it was super fun to implement during the
+        WebAssembly CG summit.
+
+        * b3/B3EliminateCommonSubexpressions.cpp:
+        * b3/B3MemoryValue.h:
+        * b3/B3SuccessorCollection.h:
+        (JSC::B3::SuccessorCollection::begin):
+        (JSC::B3::SuccessorCollection::end):
+        (JSC::B3::SuccessorCollection::const_iterator::const_iterator):
+        (JSC::B3::SuccessorCollection::const_iterator::operator*):
+        (JSC::B3::SuccessorCollection::const_iterator::operator++):
+        (JSC::B3::SuccessorCollection::const_iterator::operator==):
+        (JSC::B3::SuccessorCollection::const_iterator::operator!=):
+
 2016-02-29  Filip Pizlo  <fpizlo@apple.com>
 
         Make it cheap to #include "JITOperations.h"
index c0041bb..567938e 100644 (file)
@@ -66,25 +66,85 @@ const bool verbose = false;
 
 typedef Vector<MemoryValue*, 1> MemoryMatches;
 
-struct ImpureBlockData {
-    void dump(PrintStream& out) const
+class MemoryValueMap {
+public:
+    MemoryValueMap() { }
+
+    void add(MemoryValue* memory)
+    {
+        Matches& matches = m_map.add(memory->lastChild(), Matches()).iterator->value;
+        if (matches.contains(memory))
+            return;
+        matches.append(memory);
+    }
+
+    template<typename Functor>
+    void removeIf(const Functor& functor)
     {
-        out.print("{writes = ", writes, ", memoryValues = {");
+        m_map.removeIf(
+            [&] (HashMap<Value*, Matches>::KeyValuePairType& entry) -> bool {
+                entry.value.removeAllMatching(
+                    [&] (Value* value) -> bool {
+                        if (MemoryValue* memory = value->as<MemoryValue>())
+                            return functor(memory);
+                        return true;
+                    });
+                return entry.value.isEmpty();
+            });
+    }
 
+    Matches* find(Value* ptr)
+    {
+        auto iter = m_map.find(ptr);
+        if (iter == m_map.end())
+            return nullptr;
+        return &iter->value;
+    }
+
+    template<typename Functor>
+    MemoryValue* find(Value* ptr, const Functor& functor)
+    {
+        if (Matches* matches = find(ptr)) {
+            for (Value* candidateValue : *matches) {
+                if (MemoryValue* candidateMemory = candidateValue->as<MemoryValue>()) {
+                    if (functor(candidateMemory))
+                        return candidateMemory;
+                }
+            }
+        }
+        return nullptr;
+    }
+
+    void dump(PrintStream& out) const
+    {
+        out.print("{");
         CommaPrinter comma;
-        for (auto& entry : memoryValues)
+        for (auto& entry : m_map)
             out.print(comma, pointerDump(entry.key), "=>", pointerListDump(entry.value));
-
-        out.print("}}");
+        out.print("}");
     }
     
-    RangeSet<HeapRange> writes;
-    
-    // Maps an address base to all of the MemoryValues that do things to it. After we're done
-    // processing a map, this tells us the values at tail. Note that we use a Matches array
-    // because those MemoryValues could be turned into Identity's, and we need to check for this
-    // as we go.
-    HashMap<Value*, Matches> memoryValues;
+private:
+    // This uses Matches for two reasons:
+    // - It cannot be a MemoryValue* because the key is imprecise. Many MemoryValues could have the
+    //   same key while being unaliased.
+    // - It can't be a MemoryMatches array because the MemoryValue*'s could be turned into Identity's.
+    HashMap<Value*, Matches> m_map;
+};
+
+struct ImpureBlockData {
+    void dump(PrintStream& out) const
+    {
+        out.print(
+            "{reads = ", reads, ", writes = ", writes, ", storesAtHead = ", storesAtHead,
+            ", memoryValuesAtTail = ", memoryValuesAtTail, "}");
+    }
+
+    RangeSet<HeapRange> reads; // This only gets used for forward store elimination.
+    RangeSet<HeapRange> writes; // This gets used for both load and store elimination.
+
+    MemoryValueMap storesAtHead;
+    MemoryValueMap memoryValuesAtTail;
 };
 
 class CSE {
@@ -104,20 +164,32 @@ public:
         
         m_proc.resetValueOwners();
 
+        // Summarize the impure effects of each block, and the impure values available at the end of
+        // each block. This doesn't edit code yet.
         for (BasicBlock* block : m_proc) {
             ImpureBlockData& data = m_impureBlockData[block];
             for (Value* value : *block) {
-                if (HeapRange writes = value->effects().writes)
+                Effects effects = value->effects();
+                MemoryValue* memory = value->as<MemoryValue>();
+                
+                if (memory && memory->isStore()
+                    && !data.reads.overlaps(memory->range())
+                    && !data.writes.overlaps(memory->range()))
+                    data.storesAtHead.add(memory);
+                data.reads.add(effects.reads);
+
+                if (HeapRange writes = effects.writes)
                     clobber(data, writes);
 
-                if (MemoryValue* memory = value->as<MemoryValue>())
-                    addMemoryValue(data, memory);
+                if (memory)
+                    data.memoryValuesAtTail.add(memory);
             }
 
             if (verbose)
                 dataLog("Block ", *block, ": ", data, "\n");
         }
 
+        // Perform CSE. This edits code.
         Vector<BasicBlock*> postOrder = m_proc.blocksInPostOrder();
         for (unsigned i = postOrder.size(); i--;) {
             m_block = postOrder[i];
@@ -133,6 +205,8 @@ public:
             m_impureBlockData[m_block] = m_data;
         }
 
+        // The previous pass might have requested that we insert code in some basic block other than
+        // the one that it was looking at. This inserts them.
         for (BasicBlock* block : m_proc) {
             for (unsigned valueIndex = 0; valueIndex < block->size(); ++valueIndex) {
                 auto iter = m_sets.find(block->at(valueIndex));
@@ -162,30 +236,69 @@ private:
             return;
         }
 
+        MemoryValue* memory = m_value->as<MemoryValue>();
+        if (memory && processMemoryBeforeClobber(memory))
+            return;
+
         if (HeapRange writes = m_value->effects().writes)
             clobber(m_data, writes);
         
-        if (MemoryValue* memory = m_value->as<MemoryValue>())
-            processMemory(memory);
+        if (memory)
+            processMemoryAfterClobber(memory);
+    }
+
+    // Return true if we got rid of the operation. If you changed IR in this function, you have to
+    // set m_changed even if you also return true.
+    bool processMemoryBeforeClobber(MemoryValue* memory)
+    {
+        Value* value = memory->child(0);
+        Value* ptr = memory->lastChild();
+        HeapRange range = memory->range();
+        int32_t offset = memory->offset();
+
+        switch (memory->opcode()) {
+        case Store8:
+            return handleStoreBeforeClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->offset() == offset
+                        && ((candidate->opcode() == Store8 && candidate->child(0) == value)
+                            || ((candidate->opcode() == Load8Z || candidate->opcode() == Load8S)
+                                && candidate == value));
+                });
+        case Store16:
+            return handleStoreBeforeClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->offset() == offset
+                        && ((candidate->opcode() == Store16 && candidate->child(0) == value)
+                            || ((candidate->opcode() == Load16Z || candidate->opcode() == Load16S)
+                                && candidate == value));
+                });
+        case Store:
+            return handleStoreBeforeClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->offset() == offset
+                        && ((candidate->opcode() == Store && candidate->child(0) == value)
+                            || (candidate->opcode() == Load && candidate == value));
+                });
+        default:
+            return false;
+        }
     }
 
     void clobber(ImpureBlockData& data, HeapRange writes)
     {
         data.writes.add(writes);
         
-        data.memoryValues.removeIf(
-            [&] (HashMap<Value*, Matches>::KeyValuePairType& entry) -> bool {
-                entry.value.removeAllMatching(
-                    [&] (Value* value) -> bool {
-                        if (MemoryValue* memory = value->as<MemoryValue>())
-                            return memory->range().overlaps(writes);
-                        return true;
-                    });
-                return entry.value.isEmpty();
+        data.memoryValuesAtTail.removeIf(
+            [&] (MemoryValue* memory) {
+                return memory->range().overlaps(writes);
             });
     }
 
-    void processMemory(MemoryValue* memory)
+    void processMemoryAfterClobber(MemoryValue* memory)
     {
         Value* ptr = memory->lastChild();
         HeapRange range = memory->range();
@@ -301,10 +414,33 @@ private:
             break;
         }
 
-        case Store8:
-        case Store16:
+        case Store8: {
+            handleStoreAfterClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->opcode() == Store8
+                        && candidate->offset() == offset;
+                });
+            break;
+        }
+            
+        case Store16: {
+            handleStoreAfterClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->opcode() == Store16
+                        && candidate->offset() == offset;
+                });
+            break;
+        }
+            
         case Store: {
-            addMemoryValue(m_data, memory);
+            handleStoreAfterClobber(
+                ptr, range,
+                [&] (MemoryValue* candidate) -> bool {
+                    return candidate->opcode() == Store
+                        && candidate->offset() == offset;
+                });
             break;
         }
 
@@ -316,6 +452,82 @@ private:
     }
 
     template<typename Filter>
+    bool handleStoreBeforeClobber(Value* ptr, HeapRange range, const Filter& filter)
+    {
+        MemoryMatches matches = findMemoryValue(ptr, range, filter);
+        if (matches.isEmpty())
+            return false;
+
+        m_value->replaceWithNop();
+        m_changed = true;
+        return true;
+    }
+
+    template<typename Filter>
+    void handleStoreAfterClobber(Value* ptr, HeapRange range, const Filter& filter)
+    {
+        if (findStoreAfterClobber(ptr, range, filter)) {
+            m_value->replaceWithNop();
+            m_changed = true;
+            return;
+        }
+
+        m_data.memoryValuesAtTail.add(m_value->as<MemoryValue>());
+    }
+
+    template<typename Filter>
+    bool findStoreAfterClobber(Value* ptr, HeapRange range, const Filter& filter)
+    {
+        // We can eliminate a store if every forward path hits a store to the same location before
+        // hitting any operation that observes the store. This search seems like it should be
+        // expensive, but in the overwhelming majority of cases it will almost immediately hit an 
+        // operation that interferes.
+
+        if (verbose)
+            dataLog(*m_value, ": looking forward for stores to ", *ptr, "...\n");
+
+        // First search forward in this basic block.
+        // FIXME: It would be cool to get rid of this linear search. It's not super critical since
+        // we will probably bail out very quickly, but it *is* annoying.
+        for (unsigned index = m_index + 1; index < m_block->size(); ++index) {
+            Value* value = m_block->at(index);
+
+            if (MemoryValue* memoryValue = value->as<MemoryValue>()) {
+                if (memoryValue->lastChild() == ptr && filter(memoryValue))
+                    return true;
+            }
+
+            Effects effects = value->effects();
+            if (effects.reads.overlaps(range) || effects.writes.overlaps(range))
+                return false;
+        }
+
+        if (!m_block->numSuccessors())
+            return false;
+
+        BlockWorklist worklist;
+        worklist.pushAll(m_block->successorBlocks());
+
+        while (BasicBlock* block = worklist.pop()) {
+            ImpureBlockData& data = m_impureBlockData[block];
+
+            MemoryValue* match = data.storesAtHead.find(ptr, filter);
+            if (match && match != m_value)
+                continue;
+
+            if (data.writes.overlaps(range) || data.reads.overlaps(range))
+                return false;
+
+            if (!block->numSuccessors())
+                return false;
+
+            worklist.pushAll(block->successorBlocks());
+        }
+
+        return true;
+    }
+
+    template<typename Filter>
     void handleMemoryValue(Value* ptr, HeapRange range, const Filter& filter)
     {
         handleMemoryValue(
@@ -332,7 +544,7 @@ private:
         MemoryMatches matches = findMemoryValue(ptr, range, filter);
         if (replaceMemoryValue(matches, replace))
             return;
-        addMemoryValue(m_data, m_value->as<MemoryValue>());
+        m_data.memoryValuesAtTail.add(m_value->as<MemoryValue>());
     }
 
     template<typename Replace>
@@ -396,39 +608,13 @@ private:
         return true;
     }
 
-    void addMemoryValue(ImpureBlockData& data, MemoryValue* memory)
-    {
-        if (verbose)
-            dataLog("Adding memory: ", *memory, "\n");
-        
-        Matches& matches = data.memoryValues.add(memory->lastChild(), Matches()).iterator->value;
-
-        if (matches.contains(memory))
-            return;
-
-        matches.append(memory);
-    }
-
     template<typename Filter>
     MemoryMatches findMemoryValue(Value* ptr, HeapRange range, const Filter& filter)
     {
         if (verbose)
-            dataLog(*m_value, ": looking for ", *ptr, "...\n");
+            dataLog(*m_value, ": looking backward for ", *ptr, "...\n");
         
-        auto find = [&] (ImpureBlockData& data) -> MemoryValue* {
-            auto iter = data.memoryValues.find(ptr);
-            if (iter != data.memoryValues.end()) {
-                for (Value* candidateValue : iter->value) {
-                    if (MemoryValue* candidateMemory = candidateValue->as<MemoryValue>()) {
-                        if (filter(candidateMemory))
-                            return candidateMemory;
-                    }
-                }
-            }
-            return nullptr;
-        };
-
-        if (MemoryValue* match = find(m_data)) {
+        if (MemoryValue* match = m_data.memoryValuesAtTail.find(ptr, filter)) {
             if (verbose)
                 dataLog("    Found ", *match, " locally.\n");
             return { match };
@@ -441,8 +627,6 @@ private:
         }
 
         BlockWorklist worklist;
-        Vector<BasicBlock*, 8> seenList;
-
         worklist.pushAll(m_block->predecessors());
 
         MemoryMatches matches;
@@ -450,11 +634,10 @@ private:
         while (BasicBlock* block = worklist.pop()) {
             if (verbose)
                 dataLog("    Looking at ", *block, "\n");
-            seenList.append(block);
 
             ImpureBlockData& data = m_impureBlockData[block];
 
-            MemoryValue* match = find(data);
+            MemoryValue* match = data.memoryValuesAtTail.find(ptr, filter);
             if (match && match != m_value) {
                 if (verbose)
                     dataLog("    Found match: ", *match, "\n");
index 5c813e8..40c8002 100644 (file)
@@ -52,6 +52,23 @@ public:
         }
     }
 
+    static bool isStore(Opcode opcode)
+    {
+        switch (opcode) {
+        case Store8:
+        case Store16:
+        case Store:
+            return true;
+        default:
+            return false;
+        }
+    }
+
+    static bool isLoad(Opcode opcode)
+    {
+        return accepts(opcode) && !isStore(opcode);
+    }
+
     ~MemoryValue();
 
     int32_t offset() const { return m_offset; }
index 4bee4f7..ea86477 100644 (file)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Apple Inc. All rights reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -90,6 +90,50 @@ public:
     iterator begin() { return iterator(*this, 0); }
     iterator end() { return iterator(*this, size()); }
 
+    class const_iterator {
+    public:
+        const_iterator()
+            : m_collection(nullptr)
+            , m_index(0)
+        {
+        }
+
+        const_iterator(const SuccessorCollection& collection, size_t index)
+            : m_collection(&collection)
+            , m_index(index)
+        {
+        }
+
+        BasicBlock* operator*() const
+        {
+            return m_collection->at(m_index);
+        }
+
+        const_iterator& operator++()
+        {
+            m_index++;
+            return *this;
+        }
+
+        bool operator==(const const_iterator& other) const
+        {
+            ASSERT(m_collection == other.m_collection);
+            return m_index == other.m_index;
+        }
+
+        bool operator!=(const const_iterator& other) const
+        {
+            return !(*this == other);
+        }
+
+    private:
+        const SuccessorCollection* m_collection;
+        size_t m_index;
+    };
+
+    const_iterator begin() const { return const_iterator(*this, 0); }
+    const_iterator end() const { return const_iterator(*this, size()); }
+
 private:
     SuccessorList& m_list;
 };