[WebGPU] Improve GPUBindGroup performance using one device-shared argument MTLBuffer
author     justin_fan@apple.com <justin_fan@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
           Tue, 13 Aug 2019 19:49:40 +0000 (19:49 +0000)
committer  justin_fan@apple.com <justin_fan@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
           Tue, 13 Aug 2019 19:49:40 +0000 (19:49 +0000)
https://bugs.webkit.org/show_bug.cgi?id=200606

Reviewed by Myles C. Maxfield.

Source/WebCore:

Manage all argument buffer storage for GPUBindGroups in one large MTLBuffer for a GPUDevice.
Vastly improves GPUProgrammablePassEncoder.setBindGroup performance; in an alpha MotionMark WebGPU benchmark,
the score improves from ~12000 to ~90000.

No expected change in WebGPU behavior, though bind-groups.html has been updated to cover more cases.
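
For illustration only (not part of the patch): a minimal C++ sketch of the bump-allocation idea behind GPUBindGroupAllocator, assuming a single growable backing store shared by all bind groups. The names are hypothetical; the real implementation in GPUBindGroupAllocatorMetal.mm below operates on an MTLBuffer and MTLArgumentEncoders.

    #include <cstddef>
    #include <optional>
    #include <vector>

    // Hypothetical stand-in for the device-shared argument buffer: every bind
    // group reserves a slice of one backing store instead of owning its own buffer.
    class SharedArgumentStoreSketch {
    public:
        // Reserves bytesNeeded bytes and returns the offset of the new slice,
        // or nullopt if the request overflows.
        std::optional<size_t> allocate(size_t bytesNeeded)
        {
            size_t newEnd = m_lastOffset + bytesNeeded;
            if (newEnd < m_lastOffset) // unsigned overflow
                return std::nullopt;
            if (newEnd > m_storage.size())
                grow(newEnd);
            size_t offset = m_lastOffset;
            m_lastOffset = newEnd;
            return offset;
        }

        // Only safe once no bind group references the storage any more.
        void reset() { m_lastOffset = 0; }

    private:
        void grow(size_t required)
        {
            size_t newSize = m_storage.empty() ? 4096 : m_storage.size();
            while (newSize < required)
                newSize += newSize / 4; // grow by ~1.25x, as the Metal backend does
            m_storage.resize(newSize); // the real code allocates a new MTLBuffer and copies the old contents
        }

        std::vector<char> m_storage;
        size_t m_lastOffset { 0 };
    };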

* Modules/webgpu/WebGPUDevice.cpp:
(WebCore::WebGPUDevice::createBindGroup const):
* SourcesCocoa.txt:
* WebCore.xcodeproj/project.pbxproj:
* platform/graphics/gpu/GPUBindGroup.h: No longer manages one unique MTLBuffer per MTLArgumentEncoder.
(WebCore::GPUBindGroup::argumentBuffer const): Delegates to GPUBindGroupAllocator for current argument buffer.
(WebCore::GPUBindGroup::vertexArgsBuffer const): Deleted.
(WebCore::GPUBindGroup::fragmentArgsBuffer const): Deleted.
(WebCore::GPUBindGroup::computeArgsBuffer const): Deleted.
* platform/graphics/gpu/GPUBindGroupAllocator.h: Added. Allocates the MTLBuffer backing all argument buffers and assigns their offsets within it.
(WebCore::GPUBindGroupAllocator::argumentBuffer const):
* platform/graphics/gpu/GPUBindGroupLayout.h:
* platform/graphics/gpu/GPUBuffer.h: Move MTLResourceUsage calculation to GPUBuffer construction.
(WebCore::GPUBuffer::platformUsage const):
* platform/graphics/gpu/GPUComputePassEncoder.h: Prevent any potential narrowing issues, as the offset can be large.
* platform/graphics/gpu/GPUDevice.cpp: Now owns a GPUBindGroupAllocator that manages all of its argument buffer storage.
(WebCore::GPUDevice::tryCreateBindGroup const):
* platform/graphics/gpu/GPUDevice.h:
* platform/graphics/gpu/GPUProgrammablePassEncoder.h:
(WebCore::GPUProgrammablePassEncoder::setVertexBuffer):
(WebCore::GPUProgrammablePassEncoder::setFragmentBuffer):
(WebCore::GPUProgrammablePassEncoder::setComputeBuffer):
* platform/graphics/gpu/GPURenderPassEncoder.h:
* platform/graphics/gpu/GPUTexture.h: Move MTLResourceUsage calculation to GPUTexture construction.
(WebCore::GPUTexture::platformUsage const):
* platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm: Added.
(WebCore::GPUBindGroupAllocator::create):
(WebCore::GPUBindGroupAllocator::GPUBindGroupAllocator):
(WebCore::GPUBindGroupAllocator::allocateAndSetEncoders): Ensures that the MTLArgumentEncoders have sufficient allocation in the shared argument buffer for encoding (a simplified sketch of this offset scheme follows this file list).
(WebCore::GPUBindGroupAllocator::reallocate): Creates a new MTLBuffer large enough for the new encoder requirements and copies over the old argument buffer data.
(WebCore::GPUBindGroupAllocator::tryReset): For now, resets the argument buffer once all GPUBindGroups created with this allocator have been destroyed.
* platform/graphics/gpu/cocoa/GPUBindGroupMetal.mm:
(WebCore::tryGetResourceAsBufferBinding): Add size check.
(WebCore::GPUBindGroup::tryCreate): No longer owns new MTLBuffers. Requests argument buffer space from GPUBindGroupAllocator.
(WebCore::GPUBindGroup::GPUBindGroup):
(WebCore::GPUBindGroup::~GPUBindGroup): Remind allocator to check for possible reset.
(WebCore::tryCreateArgumentBuffer): Deleted.
* platform/graphics/gpu/cocoa/GPUBufferMetal.mm:
(WebCore::GPUBuffer::GPUBuffer):
* platform/graphics/gpu/cocoa/GPUComputePassEncoderMetal.mm:
(WebCore::GPUComputePassEncoder::setComputeBuffer):
* platform/graphics/gpu/cocoa/GPUDeviceMetal.mm:
* platform/graphics/gpu/cocoa/GPUProgrammablePassEncoderMetal.mm:
(WebCore::GPUProgrammablePassEncoder::setBindGroup): No longer recalculates resource usage on every call. Sets the shared argument buffer and per-stage offsets for the new bind group model.
* platform/graphics/gpu/cocoa/GPURenderPassEncoderMetal.mm:
(WebCore::GPURenderPassEncoder::setVertexBuffer):
(WebCore::GPURenderPassEncoder::setFragmentBuffer):
* platform/graphics/gpu/cocoa/GPUTextureMetal.mm:
(WebCore::GPUTexture::GPUTexture):
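
For illustration only (not part of the patch): a simplified C++ sketch of how the per-stage offsets into the shared argument buffer fall out of the encoders' encoded lengths. Types and names are stand-ins; the real logic lives in GPUBindGroupAllocator::allocateAndSetEncoders in GPUBindGroupAllocatorMetal.mm below.

    #include <cstddef>
    #include <optional>

    struct StageOffsetsSketch {
        std::optional<size_t> vertex;
        std::optional<size_t> fragment;
        std::optional<size_t> compute;
    };

    // lastOffset is the end of the used region of the shared buffer; each stage
    // that has an encoder takes the next slice, in vertex/fragment/compute order.
    StageOffsetsSketch computeStageOffsets(size_t lastOffset,
        std::optional<size_t> vertexLength, std::optional<size_t> fragmentLength,
        std::optional<size_t> computeLength)
    {
        StageOffsetsSketch offsets;
        size_t cursor = lastOffset;
        if (vertexLength) {
            offsets.vertex = cursor;
            cursor += *vertexLength;
        }
        if (fragmentLength) {
            offsets.fragment = cursor;
            cursor += *fragmentLength;
        }
        if (computeLength) {
            offsets.compute = cursor;
            cursor += *computeLength;
        }
        return offsets;
    }

    // setBindGroup then binds the one shared MTLBuffer at each populated offset,
    // instead of binding a separate per-stage MTLBuffer for every bind group.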

LayoutTests:

Update bind-groups.html to better stress GPUBindGroup implementation.

* webgpu/bind-groups-expected.txt:
* webgpu/bind-groups.html:

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@248606 268f45cc-cd09-0410-ab3c-d52691b4dbfc

25 files changed:
LayoutTests/ChangeLog
LayoutTests/webgpu/bind-groups-expected.txt
LayoutTests/webgpu/bind-groups.html
Source/WebCore/ChangeLog
Source/WebCore/Modules/webgpu/WebGPUDevice.cpp
Source/WebCore/SourcesCocoa.txt
Source/WebCore/WebCore.xcodeproj/project.pbxproj
Source/WebCore/platform/graphics/gpu/GPUBindGroup.h
Source/WebCore/platform/graphics/gpu/GPUBindGroupAllocator.h [new file with mode: 0644]
Source/WebCore/platform/graphics/gpu/GPUBindGroupLayout.h
Source/WebCore/platform/graphics/gpu/GPUBuffer.h
Source/WebCore/platform/graphics/gpu/GPUComputePassEncoder.h
Source/WebCore/platform/graphics/gpu/GPUDevice.cpp
Source/WebCore/platform/graphics/gpu/GPUDevice.h
Source/WebCore/platform/graphics/gpu/GPUProgrammablePassEncoder.h
Source/WebCore/platform/graphics/gpu/GPURenderPassEncoder.h
Source/WebCore/platform/graphics/gpu/GPUTexture.h
Source/WebCore/platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm [new file with mode: 0644]
Source/WebCore/platform/graphics/gpu/cocoa/GPUBindGroupMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPUBufferMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPUComputePassEncoderMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPUDeviceMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPUProgrammablePassEncoderMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPURenderPassEncoderMetal.mm
Source/WebCore/platform/graphics/gpu/cocoa/GPUTextureMetal.mm

index 11ffede..97932fd 100644 (file)
@@ -1,3 +1,15 @@
+2019-08-13  Justin Fan  <justin_fan@apple.com>
+
+        [WebGPU] Improve GPUBindGroup performance using one device-shared argument MTLBuffer
+        https://bugs.webkit.org/show_bug.cgi?id=200606
+
+        Reviewed by Myles C. Maxfield.
+
+        Update bind-groups.html to better stress GPUBindGroup implementation.
+
+        * webgpu/bind-groups-expected.txt:
+        * webgpu/bind-groups.html:
+
 2019-08-13  Antti Koivisto  <antti@apple.com>
 
         Event regions collection should take clipping into account
index b7d4781..42fc70a 100644 (file)
@@ -1,3 +1,8 @@
 
-PASS Create a basic GPUBindGroup via GPUDevice. 
+PASS Create and use a basic GPUBindGroup. 
+PASS Create and use many GPUBindGroups in a single compute pass. 
+PASS Create and access a uniform-buffer in a GPUBindGroup. 
+PASS Create and access a sampled texture in a GPUBindGroup. 
+PASS Create and use multiple GPUBindGroups in a single dispatch. 
+PASS Bind a single GPUBuffer with different offsets in different GPUBindGroups 
 
index 74ac8ae..083dbfb 100644 (file)
 <script src="../resources/testharness.js"></script>
 <script src="../resources/testharnessreport.js"></script>
 <script>
-promise_test(() => {
-    return getBasicDevice().then(function(device) {
-        // GPUBindGroupLayoutBindings
-        // FIXME: Also test sampled texture bindings.
-        const bufferLayoutBinding = {
+let tests = {};
+
+const basicBufferShader = `
+[numthreads(1, 1, 1)]
+compute void compute_main(device int[] buffer : register(u0))
+{
+    ++buffer[0];
+}
+`;
+
+let basicPipeline;
+
+tests["Create and use a basic GPUBindGroup."] = async device => {
+    const bufferLayoutBinding = {
+        binding: 0,
+        visibility: GPUShaderStageBit.COMPUTE,
+        type: "storage-buffer"
+    };
+
+    const bindGroupLayout = device.createBindGroupLayout({ bindings: [bufferLayoutBinding] });
+
+    const basicBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+    const bufferBinding = { buffer: basicBuffer, size: 4 };
+    const bindGroupBinding = { binding: 0, resource: bufferBinding };
+
+    const bindGroup = device.createBindGroup({ layout: bindGroupLayout, bindings: [bindGroupBinding] });
+
+    const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
+
+    const basicShaderModule = device.createShaderModule({ code: basicBufferShader, isWHLSL: true });
+    basicPipeline = device.createComputePipeline({
+        layout: pipelineLayout,
+        computeStage: {
+            module: basicShaderModule,
+            entryPoint: "compute_main"
+        }
+    });
+
+    const commandEncoder = device.createCommandEncoder();
+    const passEncoder = commandEncoder.beginComputePass();
+    passEncoder.setPipeline(basicPipeline);
+    passEncoder.setBindGroup(0, bindGroup);
+    passEncoder.dispatch(1, 1, 1);
+    passEncoder.endPass();
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Int32Array(await basicBuffer.mapReadAsync());
+    basicBuffer.unmap();
+    assert_equals(results[0], 1, "Storage buffer binding written to successfully.");
+};
+
+tests["Create and use many GPUBindGroups in a single compute pass."] = async device => {
+    const bufferLayoutBinding = {
+        binding: 0,
+        visibility: GPUShaderStageBit.COMPUTE,
+        type: "storage-buffer"
+    };
+
+    const bindGroupLayout = device.createBindGroupLayout({ bindings: [bufferLayoutBinding] });
+
+    const basicBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+    const bufferBinding = { buffer: basicBuffer, size: 4 };
+    const bindGroupBinding = { binding: 0, resource: bufferBinding };
+
+    const numGroups = 1000;
+    let bindGroups = new Array(numGroups);
+    for (let i = 0; i < numGroups; ++i)
+        bindGroups[i] = device.createBindGroup({ layout: bindGroupLayout, bindings: [bindGroupBinding] });
+
+    const commandEncoder = device.createCommandEncoder();
+    const passEncoder = commandEncoder.beginComputePass();
+
+    let j = 0;
+    for (; j < numGroups; ++j) {
+        passEncoder.setPipeline(basicPipeline);
+        passEncoder.setBindGroup(0, bindGroups[j]);
+        passEncoder.dispatch(1, 1, 1);
+    }
+
+    passEncoder.endPass();
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Int32Array(await basicBuffer.mapReadAsync());
+    basicBuffer.unmap();
+    assert_equals(results[0], j, "Storage buffer accessed successfully through multiple bind groups.");
+};
+
+const uniformBufferShader = `
+[numthreads(1, 1, 1)]
+compute void compute_main(constant int[] uniforms : register(b0), device int[] buffer : register(u1))
+{
+    buffer[0] += uniforms[0];
+}
+`;
+
+tests["Create and access a uniform-buffer in a GPUBindGroup."] = async device => {
+    const [uniformBuffer, writeArrayBuffer] = device.createBufferMapped({ size: 4, usage: GPUBufferUsage.UNIFORM });
+    new Int32Array(writeArrayBuffer).set([42]);
+    uniformBuffer.unmap();
+
+    const storageBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+
+    const bindGroupLayout = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "uniform-buffer"
+        }, {
+            binding: 1,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "storage-buffer"
+        }]
+    });
+
+    const bindGroup = device.createBindGroup({
+        layout: bindGroupLayout,
+        bindings: [{
+            binding: 0,
+            resource: {
+                buffer: uniformBuffer,
+                size: 4
+            }
+        }, {
+            binding: 1,
+            resource: {
+                buffer: storageBuffer,
+                size: 4
+            }
+        }]
+    });
+
+    const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
+
+    const shaderModule = device.createShaderModule({ code: uniformBufferShader, isWHLSL: true });
+
+    const pipeline = device.createComputePipeline({
+        layout: pipelineLayout,
+        computeStage: {
+            module: shaderModule,
+            entryPoint: "compute_main"
+        }
+    });
+
+    const commandEncoder = device.createCommandEncoder();
+    const passEncoder = commandEncoder.beginComputePass();
+    passEncoder.setPipeline(pipeline);
+    passEncoder.setBindGroup(0, bindGroup);
+    passEncoder.dispatch(1, 1, 1);
+    passEncoder.endPass();
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Int32Array(await storageBuffer.mapReadAsync());
+    storageBuffer.unmap();
+    assert_equals(results[0], 42, "Storage buffer binding written to successfully.");
+};
+
+const sampledTextureShader = `
+[numthreads(1, 1, 1)]
+compute void compute_main(Texture2D<uint> inputTexture : register(t0), sampler inputSampler : register(s1), device uint[] output : register(u2))
+{
+    output[0] = Sample(inputTexture, inputSampler, float2(0, 0));
+}
+`;
+
+tests["Create and access a sampled texture in a GPUBindGroup."] = async device => {
+    const [textureDataBuffer, textureArrayBuffer] = device.createBufferMapped({ size: 4, usage: GPUBufferUsage.TRANSFER_SRC });
+    new Uint32Array(textureArrayBuffer).set([42]);
+    textureDataBuffer.unmap();
+
+    const textureSize = { width: 1, height: 1, depth: 1 };
+    const texture = device.createTexture({
+        size: textureSize,
+        format: "rgba8uint",
+        usage: GPUTextureUsage.SAMPLED | GPUTextureUsage.TRANSFER_DST
+    });
+
+    const outputBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+
+    const bindGroupLayout = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "sampled-texture"
+        }, {
+            binding: 1,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "sampler"
+        }, {
+            binding: 2,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "storage-buffer"
+        }]
+    });
+    const bindGroup = device.createBindGroup({
+        layout: bindGroupLayout,
+        bindings: [{
+            binding: 0,
+            resource: texture.createDefaultView()
+        }, {
+            binding: 1,
+            resource: device.createSampler({})
+        }, {
+            binding: 2,
+            resource: {
+                buffer: outputBuffer,
+                size: 4
+            }
+        }]
+    });
+
+    const shaderModule = device.createShaderModule({ code: sampledTextureShader, isWHLSL: true });
+    const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
+
+    const pipeline = device.createComputePipeline({
+        layout: pipelineLayout,
+        computeStage: {
+            module: shaderModule,
+            entryPoint: "compute_main"
+        }
+    });
+
+    const commandEncoder = device.createCommandEncoder();
+    commandEncoder.copyBufferToTexture({
+        buffer: textureDataBuffer,
+        rowPitch: 4,
+        imageHeight: 0
+    }, { texture: texture }, textureSize);
+
+    const passEncoder = commandEncoder.beginComputePass();
+    passEncoder.setPipeline(pipeline);
+    passEncoder.setBindGroup(0, bindGroup);
+    passEncoder.dispatch(1, 1, 1);
+    passEncoder.endPass();
+
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Uint32Array(await outputBuffer.mapReadAsync());
+    outputBuffer.unmap();
+    assert_equals(results[0], 42, "Correct value sampled from a bound 2D texture.");
+};
+
+const comboShader = `
+[numthreads(1, 1, 1)]
+compute void compute_main(
+    Texture2D<uint> inputTexture : register(t0, space0), 
+    sampler inputSampler : register(s0, space1), 
+    constant uint[] input : register(b0, space2), 
+    device uint[] output : register(u0, space3))
+{
+    output[0] = input[0] + Sample(inputTexture, inputSampler, float2(0, 0));
+}
+`;
+
+tests["Create and use multiple GPUBindGroups in a single dispatch."] = async device => {
+    const [textureDataBuffer, textureArrayBuffer] = device.createBufferMapped({ size: 4, usage: GPUBufferUsage.TRANSFER_SRC });
+    new Uint32Array(textureArrayBuffer).set([17]);
+    textureDataBuffer.unmap();
+
+    const textureSize = { width: 1, height: 1, depth: 1 };
+    const texture = device.createTexture({
+        size: textureSize,
+        format: "rgba8uint",
+        usage: GPUTextureUsage.SAMPLED | GPUTextureUsage.TRANSFER_DST
+    });
+
+    const [inputBuffer, inputArrayBuffer] = device.createBufferMapped({ size: 4, usage: GPUBufferUsage.UNIFORM });
+    new Uint32Array(inputArrayBuffer).set([25]);
+    inputBuffer.unmap();
+
+    const outputBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+
+    const bgl0 = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "sampled-texture"
+        }]
+    });
+    const bgl1 = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "sampler"
+        }]
+    });
+    const bgl2 = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "uniform-buffer"
+        }]
+    });
+    const bgl3 = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "storage-buffer"
+        }]
+    });
+
+    const bg0 = device.createBindGroup({
+        layout: bgl0,
+        bindings: [{
+            binding: 0,
+            resource: texture.createDefaultView()
+        }]
+    });
+    const bg1 = device.createBindGroup({
+        layout: bgl1,
+        bindings: [{
+            binding: 0,
+            resource: device.createSampler({})
+        }]
+    });
+    const bg2 = device.createBindGroup({
+        layout: bgl2,
+        bindings: [{
+            binding: 0,
+            resource: {
+                buffer: inputBuffer,
+                size: 4
+            }
+        }]
+    });
+    const bg3 = device.createBindGroup({
+        layout: bgl3,
+        bindings: [{
+            binding: 0,
+            resource: {
+                buffer: outputBuffer,
+                size: 4
+            }
+        }]
+    });
+
+    const shaderModule = device.createShaderModule({ code: comboShader, isWHLSL: true });
+    const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bgl0, bgl1, bgl2, bgl3] });
+
+    const pipeline = device.createComputePipeline({
+        layout: pipelineLayout,
+        computeStage: {
+            module: shaderModule,
+            entryPoint: "compute_main"
+        }
+    });
+
+    const commandEncoder = device.createCommandEncoder();
+    commandEncoder.copyBufferToTexture({
+        buffer: textureDataBuffer,
+        rowPitch: 4,
+        imageHeight: 0
+    }, { texture: texture }, textureSize);
+
+    const passEncoder = commandEncoder.beginComputePass();
+    passEncoder.setPipeline(pipeline);
+    passEncoder.setBindGroup(0, bg0);
+    passEncoder.setBindGroup(1, bg1);
+    passEncoder.setBindGroup(2, bg2);
+    passEncoder.setBindGroup(3, bg3);
+    passEncoder.dispatch(1, 1, 1);
+    passEncoder.endPass();
+
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Uint32Array(await outputBuffer.mapReadAsync());
+    outputBuffer.unmap();
+    assert_equals(results[0], 42, "Correct value computed using resources from multiple bind groups.");
+};
+
+tests["Bind a single GPUBuffer with different offsets in different GPUBindGroups"] = async device => {
+    const numInputs = 4;
+    const [uniformBuffer, writeArrayBuffer] = device.createBufferMapped({ size: 4 * numInputs, usage: GPUBufferUsage.UNIFORM });
+    new Int32Array(writeArrayBuffer).set([1, 2, 3, 36]);
+    uniformBuffer.unmap();
+
+    const storageBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ });
+
+    const bindGroupLayout = device.createBindGroupLayout({
+        bindings: [{
+            binding: 0,
+            visibility: GPUShaderStageBit.COMPUTE,
+            type: "uniform-buffer"
+        }, {
             binding: 1,
-            visibility: GPUShaderStageBit.VERTEX,
+            visibility: GPUShaderStageBit.COMPUTE,
             type: "storage-buffer"
-        };
+        }]
+    });
+
+    let bindGroups = new Array(numInputs);
+    for (let i = 0; i < numInputs; ++i) {
+        bindGroups[i] = device.createBindGroup({
+            layout: bindGroupLayout,
+            bindings: [{
+                binding: 0,
+                resource: {
+                    buffer: uniformBuffer,
+                    offset: i * numInputs,
+                    size: 4
+                }
+            }, {
+                binding: 1,
+                resource: {
+                    buffer: storageBuffer,
+                    size: 4
+                }
+            }]
+        });
+    }
 
-        const bindGroupLayout = device.createBindGroupLayout({ bindings: [bufferLayoutBinding] });
+    const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
 
-        const buffer = device.createBuffer({ size: 16, usage: GPUBufferUsage.STORAGE });
-        const bufferBinding = { buffer: buffer, size: 16 };
-        const bindGroupBinding = { binding: 1, resource: bufferBinding };
+    const shaderModule = device.createShaderModule({ code: uniformBufferShader, isWHLSL: true });
 
-        const bindGroup = device.createBindGroup({ layout: bindGroupLayout, bindings: [bindGroupBinding]});
-        assert_true(bindGroup instanceof GPUBindGroup, "GPUBindGroup successfully created.");
-    }, function() {
+    const pipeline = device.createComputePipeline({
+        layout: pipelineLayout,
+        computeStage: {
+            module: shaderModule,
+            entryPoint: "compute_main"
+        }
     });
-}, "Create a basic GPUBindGroup via GPUDevice.")
+
+    const commandEncoder = device.createCommandEncoder();
+    const passEncoder = commandEncoder.beginComputePass();
+    passEncoder.setPipeline(pipeline);
+    for (let i = 0; i < numInputs; ++i) {
+        passEncoder.setBindGroup(0, bindGroups[i]);
+        passEncoder.dispatch(1, 1, 1);
+    }
+    passEncoder.endPass();
+    device.getQueue().submit([commandEncoder.finish()]);
+
+    const results = new Int32Array(await storageBuffer.mapReadAsync());
+    storageBuffer.unmap();
+    assert_equals(results[0], 42, "Storage buffer binding written to successfully.");
+};
+
+runTestsWithDevice(tests);
 </script>
 </body>
index f86c891..e54c7ac 100644 (file)
@@ -1,3 +1,66 @@
+2019-08-13  Justin Fan  <justin_fan@apple.com>
+
+        [WebGPU] Improve GPUBindGroup performance using one device-shared argument MTLBuffer
+        https://bugs.webkit.org/show_bug.cgi?id=200606
+
+        Reviewed by Myles C. Maxfield.
+
+        Manage all argument buffer storage for GPUBindGroups in one large MTLBuffer for a GPUDevice.
+        Vastly improves GPUProgrammablePassEncoder.setBindGroup performance; in an alpha MotionMark WebGPU benchmark,
+        the score improves from ~12000 to ~90000.
+
+        No expected change in WebGPU behavior, though bind-groups.html has been updated to cover more cases.
+
+        * Modules/webgpu/WebGPUDevice.cpp:
+        (WebCore::WebGPUDevice::createBindGroup const):
+        * SourcesCocoa.txt:
+        * WebCore.xcodeproj/project.pbxproj:
+        * platform/graphics/gpu/GPUBindGroup.h: No longer manages one unique MTLBuffer per MTLArgumentEncoder.
+        (WebCore::GPUBindGroup::argumentBuffer const): Delegates to GPUBindGroupAllocator for current argument buffer.
+        (WebCore::GPUBindGroup::vertexArgsBuffer const): Deleted.
+        (WebCore::GPUBindGroup::fragmentArgsBuffer const): Deleted.
+        (WebCore::GPUBindGroup::computeArgsBuffer const): Deleted.
+        * platform/graphics/gpu/GPUBindGroupAllocator.h: Added. Allocates the MTLBuffer backing all argument buffers and assigns their offsets within it.
+        (WebCore::GPUBindGroupAllocator::argumentBuffer const):
+        * platform/graphics/gpu/GPUBindGroupLayout.h:
+        * platform/graphics/gpu/GPUBuffer.h: Move MTLResourceUsage calculation to GPUBuffer construction.
+        (WebCore::GPUBuffer::platformUsage const):
+        * platform/graphics/gpu/GPUComputePassEncoder.h: Prevent any potential narrowing issues, as the offset can be large.
+        * platform/graphics/gpu/GPUDevice.cpp: Now owns a GPUBindGroupAllocator that manages all of its argument buffer storage.
+        (WebCore::GPUDevice::tryCreateBindGroup const):
+        * platform/graphics/gpu/GPUDevice.h:
+        * platform/graphics/gpu/GPUProgrammablePassEncoder.h:
+        (WebCore::GPUProgrammablePassEncoder::setVertexBuffer):
+        (WebCore::GPUProgrammablePassEncoder::setFragmentBuffer):
+        (WebCore::GPUProgrammablePassEncoder::setComputeBuffer):
+        * platform/graphics/gpu/GPURenderPassEncoder.h:
+        * platform/graphics/gpu/GPUTexture.h: Move MTLResourceUsage calculation to GPUTexture construction.
+        (WebCore::GPUTexture::platformUsage const):
+        * platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm: Added.
+        (WebCore::GPUBindGroupAllocator::create):
+        (WebCore::GPUBindGroupAllocator::GPUBindGroupAllocator):
+        (WebCore::GPUBindGroupAllocator::allocateAndSetEncoders): Ensures that the MTLArgumentEncoders have sufficient allocation in the shared argument buffer for encoding.
+        (WebCore::GPUBindGroupAllocator::reallocate): Creates a new MTLBuffer large enough for the new encoder requirements and copies over the old argument buffer data.
+        (WebCore::GPUBindGroupAllocator::tryReset): For now, resets the argument buffer once all GPUBindGroups created with this allocator have been destroyed.
+        * platform/graphics/gpu/cocoa/GPUBindGroupMetal.mm:
+        (WebCore::tryGetResourceAsBufferBinding): Add size check.
+        (WebCore::GPUBindGroup::tryCreate): No longer owns new MTLBuffers. Requests argument buffer space from GPUBindGroupAllocator.
+        (WebCore::GPUBindGroup::GPUBindGroup):
+        (WebCore::GPUBindGroup::~GPUBindGroup): Remind allocator to check for possible reset.
+        (WebCore::tryCreateArgumentBuffer): Deleted.
+        * platform/graphics/gpu/cocoa/GPUBufferMetal.mm:
+        (WebCore::GPUBuffer::GPUBuffer):
+        * platform/graphics/gpu/cocoa/GPUComputePassEncoderMetal.mm:
+        (WebCore::GPUComputePassEncoder::setComputeBuffer):
+        * platform/graphics/gpu/cocoa/GPUDeviceMetal.mm:
+        * platform/graphics/gpu/cocoa/GPUProgrammablePassEncoderMetal.mm:
+        (WebCore::GPUProgrammablePassEncoder::setBindGroup): No longer recalculates resource usage on every call. Sets the shared argument buffer and per-stage offsets for the new bind group model.
+        * platform/graphics/gpu/cocoa/GPURenderPassEncoderMetal.mm:
+        (WebCore::GPURenderPassEncoder::setVertexBuffer):
+        (WebCore::GPURenderPassEncoder::setFragmentBuffer):
+        * platform/graphics/gpu/cocoa/GPUTextureMetal.mm:
+        (WebCore::GPUTexture::GPUTexture):
+
 2019-08-13  Antti Koivisto  <antti@apple.com>
 
         Event region collection should take clipping into account
index db8d990..84f0c6b 100644 (file)
@@ -145,7 +145,7 @@ Ref<WebGPUBindGroup> WebGPUDevice::createBindGroup(const WebGPUBindGroupDescript
     if (!gpuDescriptor)
         return WebGPUBindGroup::create(nullptr);
 
-    auto bindGroup = GPUBindGroup::tryCreate(*gpuDescriptor);
+    auto bindGroup = m_device->tryCreateBindGroup(*gpuDescriptor, m_errorScopes);
     return WebGPUBindGroup::create(WTFMove(bindGroup));
 }
 
index e83348a..ec01b82 100644 (file)
@@ -326,6 +326,7 @@ platform/graphics/cv/PixelBufferConformerCV.cpp
 platform/graphics/cv/TextureCacheCV.mm
 platform/graphics/cv/VideoTextureCopierCV.cpp
 
+platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm
 platform/graphics/gpu/cocoa/GPUBindGroupMetal.mm
 platform/graphics/gpu/cocoa/GPUBindGroupLayoutMetal.mm
 platform/graphics/gpu/cocoa/GPUBufferMetal.mm
index a1785c5..ea9d553 100644 (file)
                D0615FCC217FE5C6008A48A8 /* WebGPUShaderModule.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = WebGPUShaderModule.h; sourceTree = "<group>"; };
                D0615FCD217FE5C6008A48A8 /* WebGPUShaderModule.cpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.cpp; path = WebGPUShaderModule.cpp; sourceTree = "<group>"; };
                D0615FCE217FE5C6008A48A8 /* WebGPUShaderModule.idl */ = {isa = PBXFileReference; lastKnownFileType = text; path = WebGPUShaderModule.idl; sourceTree = "<group>"; };
+               D065BE5722FB616D0076DD60 /* GPUBindGroupAllocator.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = GPUBindGroupAllocator.h; sourceTree = "<group>"; };
+               D065BE5822FB616D0076DD60 /* GPUBindGroupAllocatorMetal.mm */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.objcpp; path = GPUBindGroupAllocatorMetal.mm; sourceTree = "<group>"; };
                D06A9A2122026C7A0083C662 /* GPURequestAdapterOptions.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = GPURequestAdapterOptions.h; sourceTree = "<group>"; };
                D06C0D8D0CFD11460065F43F /* RemoveFormatCommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RemoveFormatCommand.h; sourceTree = "<group>"; };
                D06C0D8E0CFD11460065F43F /* RemoveFormatCommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RemoveFormatCommand.cpp; sourceTree = "<group>"; };
                        children = (
                                D087CE3721ACA94200BDE174 /* cocoa */,
                                D0BE105E21E6BAD300E42A89 /* GPUBindGroup.h */,
+                               D065BE5722FB616D0076DD60 /* GPUBindGroupAllocator.h */,
                                D0BE104E21E695E200E42A89 /* GPUBindGroupBinding.h */,
                                D0BE105121E6A70E00E42A89 /* GPUBindGroupDescriptor.h */,
                                D02454D021C4A41C00B73628 /* GPUBindGroupLayout.h */,
                D087CE3721ACA94200BDE174 /* cocoa */ = {
                        isa = PBXGroup;
                        children = (
+                               D065BE5822FB616D0076DD60 /* GPUBindGroupAllocatorMetal.mm */,
                                D0232B5821CB49B7009483B9 /* GPUBindGroupLayoutMetal.mm */,
                                D085E64A2236DEAE00C3E1E2 /* GPUBindGroupMetal.mm */,
                                D0D8649121B760C4003C983C /* GPUBufferMetal.mm */,
                                E1FE137518402A6700892F13 /* CommonCryptoUtilities.h in Headers */,
                                0F60F32B1DFBB10700416D6C /* CommonVM.h in Headers */,
                                7C93F34A1AA6BA5E00A98BAB /* CompiledContentExtension.h in Headers */,
+                               E4E94D6122FF158A00DD191F /* ComplexLineLayout.h in Headers */,
                                C2F4E78C1E45C3EF006D7105 /* ComplexTextController.h in Headers */,
                                E4BA50901BCFBD9500E34EF7 /* ComposedTreeAncestorIterator.h in Headers */,
                                E44FA1851BCA6B5A0091B6EF /* ComposedTreeIterator.h in Headers */,
                                1A7FA6190DDA3B3A0028F8A5 /* NetworkStateNotifier.h in Headers */,
                                E13EF3441684ECF40034C83F /* NetworkStorageSession.h in Headers */,
                                269397241A4A5B6400E8349D /* NFA.h in Headers */,
-                               E4E94D6122FF158A00DD191F /* ComplexLineLayout.h in Headers */,
                                269397221A4A412F00E8349D /* NFANode.h in Headers */,
                                267726011A5B3AD9003C24DD /* NFAToDFA.h in Headers */,
                                BCEF43DD0E674012001C1287 /* NinePieceImage.h in Headers */,
index a055597..a973159 100644 (file)
 
 #if ENABLE(WEBGPU)
 
+#include "GPUBindGroupAllocator.h"
 #include "GPUBuffer.h"
 #include "GPUTexture.h"
+#include <objc/NSObjCRuntime.h>
+#include <utility>
 #include <wtf/HashSet.h>
 #include <wtf/RefCounted.h>
 #include <wtf/RefPtr.h>
 #include <wtf/RetainPtr.h>
 
+#if USE(METAL)
 OBJC_PROTOCOL(MTLBuffer);
+#endif
 
 namespace WebCore {
 
 struct GPUBindGroupDescriptor;
 
+#if USE(METAL)
+using ArgumentBuffer = std::pair<const MTLBuffer *, const GPUBindGroupAllocator::ArgumentBufferOffsets&>;
+#endif
+
 class GPUBindGroup : public RefCounted<GPUBindGroup> {
 public:
-    static RefPtr<GPUBindGroup> tryCreate(const GPUBindGroupDescriptor&);
+    static RefPtr<GPUBindGroup> tryCreate(const GPUBindGroupDescriptor&, GPUBindGroupAllocator&);
+
+    ~GPUBindGroup();
     
 #if USE(METAL)
-    const MTLBuffer *vertexArgsBuffer() const { return m_vertexArgsBuffer.get(); }
-    const MTLBuffer *fragmentArgsBuffer() const { return m_fragmentArgsBuffer.get(); }
-    const MTLBuffer *computeArgsBuffer() const { return m_computeArgsBuffer.get(); }
+    const ArgumentBuffer argumentBuffer() const { return { m_allocator->argumentBuffer(), m_argumentBufferOffsets }; }
 #endif
     const HashSet<Ref<GPUBuffer>>& boundBuffers() const { return m_boundBuffers; }
     const HashSet<Ref<GPUTexture>>& boundTextures() const { return m_boundTextures; }
 
 private:
 #if USE(METAL)
-    GPUBindGroup(RetainPtr<MTLBuffer>&& vertexBuffer, RetainPtr<MTLBuffer>&& fragmentBuffer, RetainPtr<MTLBuffer>&& computeArgsBuffer, HashSet<Ref<GPUBuffer>>&&, HashSet<Ref<GPUTexture>>&&);
+    GPUBindGroup(GPUBindGroupAllocator::ArgumentBufferOffsets&&, GPUBindGroupAllocator&, HashSet<Ref<GPUBuffer>>&&, HashSet<Ref<GPUTexture>>&&);
     
-    RetainPtr<MTLBuffer> m_vertexArgsBuffer;
-    RetainPtr<MTLBuffer> m_fragmentArgsBuffer;
-    RetainPtr<MTLBuffer> m_computeArgsBuffer;
+    GPUBindGroupAllocator::ArgumentBufferOffsets m_argumentBufferOffsets;
+    Ref<GPUBindGroupAllocator> m_allocator;
 #endif
     HashSet<Ref<GPUBuffer>> m_boundBuffers;
     HashSet<Ref<GPUTexture>> m_boundTextures;
diff --git a/Source/WebCore/platform/graphics/gpu/GPUBindGroupAllocator.h b/Source/WebCore/platform/graphics/gpu/GPUBindGroupAllocator.h
new file mode 100644 (file)
index 0000000..75be1f2
--- /dev/null
@@ -0,0 +1,76 @@
+/*
+ * Copyright (C) 2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#pragma once
+
+#if ENABLE(WEBGPU)
+
+#include <objc/NSObjCRuntime.h>
+#include <wtf/Optional.h>
+#include <wtf/Ref.h>
+#include <wtf/RefCounted.h>
+#include <wtf/RetainPtr.h>
+
+OBJC_PROTOCOL(MTLArgumentEncoder);
+OBJC_PROTOCOL(MTLBuffer);
+
+namespace WebCore {
+
+class GPUErrorScopes;
+
+class GPUBindGroupAllocator : public RefCounted<GPUBindGroupAllocator> {
+public:
+    static Ref<GPUBindGroupAllocator> create(GPUErrorScopes&);
+
+#if USE(METAL)
+    struct ArgumentBufferOffsets {
+        Optional<NSUInteger> vertex;
+        Optional<NSUInteger> fragment;
+        Optional<NSUInteger> compute;
+    };
+
+    Optional<ArgumentBufferOffsets> allocateAndSetEncoders(MTLArgumentEncoder *vertex, MTLArgumentEncoder *fragment, MTLArgumentEncoder *compute);
+
+    void tryReset();
+
+    const MTLBuffer *argumentBuffer() const { return m_argumentBuffer.get(); }
+#endif
+
+private:
+    explicit GPUBindGroupAllocator(GPUErrorScopes&);
+
+#if USE(METAL)
+    bool reallocate(NSUInteger);
+
+    RetainPtr<MTLBuffer> m_argumentBuffer;
+    NSUInteger m_lastOffset { 0 };
+#endif
+
+    Ref<GPUErrorScopes> m_errorScopes;
+};
+
+} // namespace WebCore
+
+#endif // ENABLE(WEBGPU)
index f7762c3..89f0617 100644 (file)
@@ -28,7 +28,6 @@
 #if ENABLE(WEBGPU)
 
 #include "GPUBindGroupLayoutDescriptor.h"
-
 #include <wtf/HashMap.h>
 #include <wtf/RefCounted.h>
 #include <wtf/RefPtr.h>
index 6550a40..07f40ae 100644 (file)
@@ -83,6 +83,7 @@ public:
     bool isStorage() const { return m_usage.contains(GPUBufferUsage::Flags::Storage); }
     bool isReadOnly() const;
     bool isMappable() const { return m_usage.containsAny({ GPUBufferUsage::Flags::MapWrite, GPUBufferUsage::Flags::MapRead }); }
+    unsigned platformUsage() const { return m_platformUsage; }
     State state() const;
 
     JSC::ArrayBuffer* mapOnCreation();
@@ -132,6 +133,7 @@ private:
 
     size_t m_byteLength;
     OptionSet<GPUBufferUsage::Flags> m_usage;
+    unsigned m_platformUsage;
     unsigned m_numScheduledCommandBuffers { 0 };
     bool m_isMappedFromCreation { false };
 };
index cd9f9df..af58887 100644 (file)
@@ -56,7 +56,7 @@ private:
     void invalidateEncoder() final { m_platformComputePassEncoder = nullptr; }
 #if USE(METAL)
     void useResource(const MTLResource *, unsigned usage) final;
-    void setComputeBuffer(const MTLBuffer *, unsigned offset, unsigned index) final;
+    void setComputeBuffer(const MTLBuffer *, NSUInteger offset, unsigned index) final;
 #endif
 
     PlatformComputePassEncoderSmartPtr m_platformComputePassEncoder;
index 54d6f15..f5e43ce 100644 (file)
@@ -28,6 +28,9 @@
 
 #if ENABLE(WEBGPU)
 
+#include "GPUBindGroup.h"
+#include "GPUBindGroupAllocator.h"
+#include "GPUBindGroupDescriptor.h"
 #include "GPUBindGroupLayout.h"
 #include "GPUBindGroupLayoutDescriptor.h"
 #include "GPUBuffer.h"
@@ -92,6 +95,14 @@ RefPtr<GPUComputePipeline> GPUDevice::tryCreateComputePipeline(const GPUComputeP
     return GPUComputePipeline::tryCreate(*this, descriptor, errorScopes);
 }
 
+RefPtr<GPUBindGroup> GPUDevice::tryCreateBindGroup(const GPUBindGroupDescriptor& descriptor, GPUErrorScopes& errorScopes) const
+{
+    if (!m_bindGroupAllocator)
+        m_bindGroupAllocator = GPUBindGroupAllocator::create(errorScopes);
+
+    return GPUBindGroup::tryCreate(descriptor, *m_bindGroupAllocator);
+}
+
 RefPtr<GPUCommandBuffer> GPUDevice::tryCreateCommandBuffer() const
 {
     return GPUCommandBuffer::tryCreate(*this);
index c8fb025..48713f2 100644 (file)
 
 #if ENABLE(WEBGPU)
 
+#include "GPUBindGroupAllocator.h"
 #include "GPUQueue.h"
 #include "GPUSwapChain.h"
 #include <wtf/Function.h>
 #include <wtf/Optional.h>
+#include <wtf/Ref.h>
 #include <wtf/RefCounted.h>
+#include <wtf/RefPtr.h>
 #include <wtf/RetainPtr.h>
 #include <wtf/WeakPtr.h>
 
@@ -39,6 +42,7 @@ OBJC_PROTOCOL(MTLDevice);
 
 namespace WebCore {
 
+class GPUBindGroup;
 class GPUBindGroupLayout;
 class GPUBuffer;
 class GPUCommandBuffer;
@@ -50,6 +54,7 @@ class GPUSampler;
 class GPUShaderModule;
 class GPUTexture;
 
+struct GPUBindGroupDescriptor;
 struct GPUBindGroupLayoutDescriptor;
 struct GPUBufferDescriptor;
 struct GPUComputePipelineDescriptor;
@@ -76,6 +81,7 @@ public:
 
     RefPtr<GPUBindGroupLayout> tryCreateBindGroupLayout(const GPUBindGroupLayoutDescriptor&) const;
     Ref<GPUPipelineLayout> createPipelineLayout(GPUPipelineLayoutDescriptor&&) const;
+    RefPtr<GPUBindGroup> tryCreateBindGroup(const GPUBindGroupDescriptor&, GPUErrorScopes&) const;
 
     RefPtr<GPUShaderModule> tryCreateShaderModule(const GPUShaderModuleDescriptor&) const;
     RefPtr<GPURenderPipeline> tryCreateRenderPipeline(const GPURenderPipelineDescriptor&, GPUErrorScopes&) const;
@@ -95,6 +101,7 @@ private:
     PlatformDeviceSmartPtr m_platformDevice;
     mutable RefPtr<GPUQueue> m_queue;
     RefPtr<GPUSwapChain> m_swapChain;
+    mutable RefPtr<GPUBindGroupAllocator> m_bindGroupAllocator;
 };
 
 } // namespace WebCore
index cb36032..c9cc489 100644 (file)
@@ -29,6 +29,7 @@
 
 #include "GPUBindGroupBinding.h"
 #include "GPUCommandBuffer.h"
+#include <objc/NSObjCRuntime.h>
 #include <wtf/RefCounted.h>
 
 #if USE(METAL)
@@ -63,10 +64,10 @@ private:
     virtual void useResource(const MTLResource *, unsigned) = 0;
 
     // Render command encoder methods.
-    virtual void setVertexBuffer(const MTLBuffer *, unsigned, unsigned) { }
-    virtual void setFragmentBuffer(const MTLBuffer *, unsigned, unsigned) { }
+    virtual void setVertexBuffer(const MTLBuffer *, NSUInteger, unsigned) { }
+    virtual void setFragmentBuffer(const MTLBuffer *, NSUInteger, unsigned) { }
     // Compute.
-    virtual void setComputeBuffer(const MTLBuffer *, unsigned, unsigned) { }
+    virtual void setComputeBuffer(const MTLBuffer *, NSUInteger, unsigned) { }
 #endif // USE(METAL)
 
     Ref<GPUCommandBuffer> m_commandBuffer;
index 41e3c4b..3fa36df 100644 (file)
@@ -70,8 +70,8 @@ private:
     void invalidateEncoder() final { m_platformRenderPassEncoder = nullptr; }
 #if USE(METAL)
     void useResource(const MTLResource *, unsigned usage) final;
-    void setVertexBuffer(const MTLBuffer *, unsigned offset, unsigned index) final;
-    void setFragmentBuffer(const MTLBuffer *, unsigned offset, unsigned index) final;
+    void setVertexBuffer(const MTLBuffer *, NSUInteger offset, unsigned index) final;
+    void setFragmentBuffer(const MTLBuffer *, NSUInteger offset, unsigned index) final;
 
     RefPtr<GPUBuffer> m_indexBuffer;
     uint64_t m_indexBufferOffset;
index 61912af..a25c410 100644 (file)
@@ -56,6 +56,7 @@ public:
     bool isReadOnly() const { return m_usage.containsAny({ GPUTextureUsage::Flags::TransferSource, GPUTextureUsage::Flags::Sampled }); }
     bool isSampled() const { return m_usage.contains(GPUTextureUsage::Flags::Sampled); }
     bool isStorage() const { return m_usage.contains(GPUTextureUsage::Flags::Storage); }
+    unsigned platformUsage() const { return m_platformUsage; }
 
     RefPtr<GPUTexture> tryCreateDefaultTextureView();
     void destroy() { m_platformTexture = nullptr; }
@@ -66,6 +67,7 @@ private:
     PlatformTextureSmartPtr m_platformTexture;
 
     OptionSet<GPUTextureUsage::Flags> m_usage;
+    unsigned m_platformUsage;
 };
 
 } // namespace WebCore
diff --git a/Source/WebCore/platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm b/Source/WebCore/platform/graphics/gpu/cocoa/GPUBindGroupAllocatorMetal.mm
new file mode 100644 (file)
index 0000000..e54cc29
--- /dev/null
@@ -0,0 +1,167 @@
+/*
+ * Copyright (C) 2019 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#import "config.h"
+#import "GPUBindGroupAllocator.h"
+
+#if ENABLE(WEBGPU)
+
+#import "GPUErrorScopes.h"
+#import <Metal/Metal.h>
+#import <wtf/BlockObjCExceptions.h>
+#import <wtf/CheckedArithmetic.h>
+
+namespace WebCore {
+
+Ref<GPUBindGroupAllocator> GPUBindGroupAllocator::create(GPUErrorScopes& errors)
+{
+    return adoptRef(*new GPUBindGroupAllocator(errors));
+}
+
+GPUBindGroupAllocator::GPUBindGroupAllocator(GPUErrorScopes& errors)
+    : m_errorScopes(makeRef(errors))
+{
+}
+
+#if USE(METAL)
+
+Optional<GPUBindGroupAllocator::ArgumentBufferOffsets> GPUBindGroupAllocator::allocateAndSetEncoders(MTLArgumentEncoder *vertex, MTLArgumentEncoder *fragment, MTLArgumentEncoder *compute)
+{
+    id<MTLDevice> device = nil;
+    auto checkedOffset = Checked<NSUInteger>(m_lastOffset);
+
+    if (vertex) {
+        device = vertex.device;
+        checkedOffset += vertex.encodedLength;
+    }
+    if (fragment) {
+        device = fragment.device;
+        checkedOffset += fragment.encodedLength;
+    }
+    if (compute) {
+        device = compute.device;
+        checkedOffset += compute.encodedLength;
+    }
+
+    // No arguments; nothing to be done.
+    if (!device)
+        return { };
+
+    if (checkedOffset.hasOverflowed()) {
+        m_errorScopes->generateError("", GPUErrorFilter::OutOfMemory);
+        return { };
+    }
+
+    auto newOffset = checkedOffset.unsafeGet();
+
+    if (m_argumentBuffer && newOffset > m_argumentBuffer.get().length) {
+        if (!reallocate(newOffset))
+            return { };
+    } else if (!m_argumentBuffer) {
+        // Based on the minimum allocation for a shared-memory MTLBuffer on macOS.
+        NSUInteger minimumSize = 4096;
+        if (minimumSize < newOffset)
+            minimumSize = newOffset;
+
+        BEGIN_BLOCK_OBJC_EXCEPTIONS;
+        m_argumentBuffer = adoptNS([device newBufferWithLength:minimumSize options:0]);
+        END_BLOCK_OBJC_EXCEPTIONS;
+
+        if (!m_argumentBuffer) {
+            m_errorScopes->generateError("", GPUErrorFilter::OutOfMemory);
+            return { };
+        }
+    }
+
+    ArgumentBufferOffsets offsets;
+
+    // Math in the following section is guarded against overflow by newOffset calculation.
+    BEGIN_BLOCK_OBJC_EXCEPTIONS;
+
+    if (vertex) {
+        offsets.vertex = m_lastOffset;
+        [vertex setArgumentBuffer:m_argumentBuffer.get() offset:*offsets.vertex];
+    }
+    if (fragment) {
+        offsets.fragment = (vertex ? vertex.encodedLength : 0) + m_lastOffset;
+        [fragment setArgumentBuffer:m_argumentBuffer.get() offset:*offsets.fragment];
+    }
+    if (compute) {
+        offsets.compute = (vertex ? vertex.encodedLength : 0) + (fragment ? fragment.encodedLength : 0) + m_lastOffset;
+        [compute setArgumentBuffer:m_argumentBuffer.get() offset:*offsets.compute];
+    }
+
+    END_BLOCK_OBJC_EXCEPTIONS;
+
+    m_lastOffset = newOffset;
+
+    return offsets;
+}
+
+// FIXME: https://bugs.webkit.org/show_bug.cgi?id=200657, https://bugs.webkit.org/show_bug.cgi?id=200658 Optimize reallocation and reset behavior.
+bool GPUBindGroupAllocator::reallocate(NSUInteger newOffset)
+{
+    MTLBuffer *newBuffer = nil;
+
+    auto newLength = Checked<NSUInteger>(m_argumentBuffer.get().length);
+    while (newLength < newOffset) {
+        newLength *= 1.25;
+
+        if (newLength.hasOverflowed()) {
+            newLength = std::numeric_limits<NSUInteger>::max();
+            break;
+        }
+    }
+
+    BEGIN_BLOCK_OBJC_EXCEPTIONS;
+
+    newBuffer = [m_argumentBuffer.get().device newBufferWithLength:newLength.unsafeGet() options:0];
+    memcpy(newBuffer.contents, m_argumentBuffer.get().contents, m_argumentBuffer.get().length);
+
+    END_BLOCK_OBJC_EXCEPTIONS;
+
+    if (!newBuffer) {
+        m_errorScopes->generateError("", GPUErrorFilter::OutOfMemory);
+        return false;
+    }
+
+    m_argumentBuffer = adoptNS(newBuffer);
+    return true;
+}
+
+void GPUBindGroupAllocator::tryReset()
+{
+    if (!hasOneRef())
+        return;
+
+    m_argumentBuffer = nullptr;
+    m_lastOffset = 0;
+}
+
+#endif // USE(METAL)
+
+} // namespace WebCore
+
+#endif // ENABLE(WEBGPU)
index cbfbdd8..414f625 100644 (file)
@@ -28,6 +28,7 @@
 
 #if ENABLE(WEBGPU)
 
+#import "GPUBindGroupAllocator.h"
 #import "GPUBindGroupBinding.h"
 #import "GPUBindGroupDescriptor.h"
 #import "GPUBindGroupLayout.h"
 #import <wtf/Optional.h>
 
 namespace WebCore {
-
-static RetainPtr<MTLBuffer> tryCreateArgumentBuffer(MTLArgumentEncoder *encoder)
-{
-    RetainPtr<MTLBuffer> buffer;
-    BEGIN_BLOCK_OBJC_EXCEPTIONS;
-    buffer = adoptNS([encoder.device newBufferWithLength:encoder.encodedLength options:0]);
-    [encoder setArgumentBuffer:buffer.get() offset:0];
-    END_BLOCK_OBJC_EXCEPTIONS;
-    return buffer;
-}
     
 static Optional<GPUBufferBinding> tryGetResourceAsBufferBinding(const GPUBindingResource& resource, const char* const functionName)
 {
@@ -64,6 +55,10 @@ static Optional<GPUBufferBinding> tryGetResourceAsBufferBinding(const GPUBinding
         LOG(WebGPU, "%s: Invalid MTLBuffer in GPUBufferBinding!", functionName);
         return WTF::nullopt;
     }
+    if (!WTF::isInBounds<NSUInteger>(bufferBinding.size) || bufferBinding.size > bufferBinding.buffer->byteLength()) {
+        LOG(WebGPU, "%s: GPUBufferBinding size is too large!", functionName);
+        return WTF::nullopt;
+    }
     // MTLBuffer size (NSUInteger) is 32 bits on some platforms.
     if (!WTF::isInBounds<NSUInteger>(bufferBinding.offset)) {
         LOG(WebGPU, "%s: Buffer offset is too large!", functionName);
@@ -135,30 +130,18 @@ static void setTextureOnEncoder(MTLArgumentEncoder *argumentEncoder, MTLTexture
     [argumentEncoder setTexture:texture atIndex:index];
     END_BLOCK_OBJC_EXCEPTIONS;
 }
-    
-RefPtr<GPUBindGroup> GPUBindGroup::tryCreate(const GPUBindGroupDescriptor& descriptor)
+
+RefPtr<GPUBindGroup> GPUBindGroup::tryCreate(const GPUBindGroupDescriptor& descriptor, GPUBindGroupAllocator& allocator)
 {
     const char* const functionName = "GPUBindGroup::tryCreate()";
     
     MTLArgumentEncoder *vertexEncoder = descriptor.layout->vertexEncoder();
     MTLArgumentEncoder *fragmentEncoder = descriptor.layout->fragmentEncoder();
     MTLArgumentEncoder *computeEncoder = descriptor.layout->computeEncoder();
-    
-    RetainPtr<MTLBuffer> vertexArgsBuffer;
-    if (vertexEncoder && !(vertexArgsBuffer = tryCreateArgumentBuffer(vertexEncoder))) {
-        LOG(WebGPU, "%s: Unable to create MTLBuffer for vertex argument buffer!", functionName);
-        return nullptr;
-    }
-    RetainPtr<MTLBuffer> fragmentArgsBuffer;
-    if (fragmentEncoder && !(fragmentArgsBuffer = tryCreateArgumentBuffer(fragmentEncoder))) {
-        LOG(WebGPU, "%s: Unable to create MTLBuffer for fragment argument buffer!", functionName);
-        return nullptr;
-    }
-    RetainPtr<MTLBuffer> computeArgsBuffer;
-    if (computeEncoder && !(computeArgsBuffer = tryCreateArgumentBuffer(computeEncoder))) {
-        LOG(WebGPU, "%s: Unable to create MTLBuffer for compute argument buffer!", functionName);
+
+    auto offsets = allocator.allocateAndSetEncoders(vertexEncoder, fragmentEncoder, computeEncoder);
+    if (!offsets)
         return nullptr;
-    }
     
     HashSet<Ref<GPUBuffer>> boundBuffers;
     HashSet<Ref<GPUTexture>> boundTextures;
@@ -203,7 +186,7 @@ RefPtr<GPUBindGroup> GPUBindGroup::tryCreate(const GPUBindGroupDescriptor& descr
                 setBufferOnEncoder(fragmentEncoder, *bufferResource, layoutBinding.internalName, internalLengthName);
             if (isForCompute)
                 setBufferOnEncoder(computeEncoder, *bufferResource, layoutBinding.internalName, internalLengthName);
-            boundBuffers.addVoid(bufferResource->buffer.copyRef());
+            boundBuffers.addVoid(WTFMove(bufferResource->buffer));
             return true;
         };
 
@@ -243,17 +226,23 @@ RefPtr<GPUBindGroup> GPUBindGroup::tryCreate(const GPUBindGroupDescriptor& descr
             return nullptr;
     }
     
-    return adoptRef(new GPUBindGroup(WTFMove(vertexArgsBuffer), WTFMove(fragmentArgsBuffer), WTFMove(computeArgsBuffer), WTFMove(boundBuffers), WTFMove(boundTextures)));
+    return adoptRef(new GPUBindGroup(WTFMove(*offsets), allocator, WTFMove(boundBuffers), WTFMove(boundTextures)));
 }
     
-GPUBindGroup::GPUBindGroup(RetainPtr<MTLBuffer>&& vertexBuffer, RetainPtr<MTLBuffer>&& fragmentBuffer, RetainPtr<MTLBuffer>&& computeBuffer, HashSet<Ref<GPUBuffer>>&& buffers, HashSet<Ref<GPUTexture>>&& textures)
-    : m_vertexArgsBuffer(WTFMove(vertexBuffer))
-    , m_fragmentArgsBuffer(WTFMove(fragmentBuffer))
-    , m_computeArgsBuffer(WTFMove(computeBuffer))
+GPUBindGroup::GPUBindGroup(GPUBindGroupAllocator::ArgumentBufferOffsets&& offsets, GPUBindGroupAllocator& allocator, HashSet<Ref<GPUBuffer>>&& buffers, HashSet<Ref<GPUTexture>>&& textures)
+    : m_argumentBufferOffsets(WTFMove(offsets))
+    , m_allocator(makeRef(allocator))
     , m_boundBuffers(WTFMove(buffers))
     , m_boundTextures(WTFMove(textures))
 {
 }
+
+GPUBindGroup::~GPUBindGroup()
+{
+    GPUBindGroupAllocator& rawAllocator = m_allocator.leakRef();
+    rawAllocator.deref();
+    rawAllocator.tryReset();
+}
     
 } // namespace WebCore
 
index f80eadf..da70ae6 100644 (file)
@@ -118,6 +118,9 @@ GPUBuffer::GPUBuffer(RetainPtr<MTLBuffer>&& buffer, GPUDevice& device, size_t si
     , m_usage(usage)
     , m_isMappedFromCreation(isMapped == GPUBufferMappedOption::IsMapped)
 {
+    m_platformUsage = MTLResourceUsageRead;
+    if (isStorage())
+        m_platformUsage |= MTLResourceUsageWrite;
 }
 
 GPUBuffer::~GPUBuffer()
index a5b7ca9..d7a0096 100644 (file)
@@ -121,7 +121,7 @@ void GPUComputePassEncoder::useResource(const MTLResource *resource, unsigned us
     END_BLOCK_OBJC_EXCEPTIONS;
 }
 
-void GPUComputePassEncoder::setComputeBuffer(const MTLBuffer * buffer, unsigned offset, unsigned index)
+void GPUComputePassEncoder::setComputeBuffer(const MTLBuffer *buffer, NSUInteger offset, unsigned index)
 {
     ASSERT(m_platformComputePassEncoder);
 
index 7bb16cd..32d25f2 100644 (file)
@@ -30,7 +30,6 @@
 
 #import "GPURequestAdapterOptions.h"
 #import "Logging.h"
-
 #import <Metal/Metal.h>
 #import <pal/spi/cocoa/MetalSPI.h>
 #import <wtf/BlockObjCExceptions.h>
index 62e1158..71b645f 100644 (file)
@@ -54,37 +54,25 @@ void GPUProgrammablePassEncoder::setBindGroup(unsigned index, GPUBindGroup& bind
         LOG(WebGPU, "GPUProgrammablePassEncoder::setBindGroup(): Invalid operation: Encoding is ended!");
         return;
     }
-    
-    if (bindGroup.vertexArgsBuffer())
-        setVertexBuffer(bindGroup.vertexArgsBuffer(), 0, index);
-    if (bindGroup.fragmentArgsBuffer())
-        setFragmentBuffer(bindGroup.fragmentArgsBuffer(), 0, index);
-    if (bindGroup.computeArgsBuffer())
-        setComputeBuffer(bindGroup.computeArgsBuffer(), 0, index);
 
-    for (auto& bufferRef : bindGroup.boundBuffers()) {
-        MTLResourceUsage usage = 0;
-        if (bufferRef->isUniform()) {
-            ASSERT(!bufferRef->isStorage());
-            usage = MTLResourceUsageRead;
-        } else if (bufferRef->isStorage()) {
-            ASSERT(!bufferRef->isUniform());
-            usage = MTLResourceUsageRead | MTLResourceUsageWrite;
-        }
-        useResource(bufferRef->platformBuffer(), usage);
-        m_commandBuffer->useBuffer(bufferRef.copyRef());
+    auto argumentBuffer = bindGroup.argumentBuffer();
+    if (!argumentBuffer.first)
+        return;
+
+    if (argumentBuffer.second.vertex)
+        setVertexBuffer(argumentBuffer.first, *argumentBuffer.second.vertex, index);
+    if (argumentBuffer.second.fragment)
+        setFragmentBuffer(argumentBuffer.first, *argumentBuffer.second.fragment, index);
+    if (argumentBuffer.second.compute)
+        setComputeBuffer(argumentBuffer.first, *argumentBuffer.second.compute, index);
+
+    for (auto& buffer : bindGroup.boundBuffers()) {
+        useResource(buffer->platformBuffer(), static_cast<MTLResourceUsage>(buffer->platformUsage()));
+        m_commandBuffer->useBuffer(buffer.copyRef());
     }
-    for (auto& textureRef : bindGroup.boundTextures()) {
-        MTLResourceUsage usage = 0;
-        if (textureRef->isSampled()) {
-            ASSERT(!textureRef->isStorage());
-            usage = MTLResourceUsageRead | MTLResourceUsageSample;
-        } else if (textureRef->isStorage()) {
-            ASSERT(!textureRef->isSampled());
-            usage = MTLResourceUsageRead | MTLResourceUsageWrite;
-        }
-        useResource(textureRef->platformTexture(), usage);
-        m_commandBuffer->useTexture(textureRef.copyRef());
+    for (auto& texture : bindGroup.boundTextures()) {
+        useResource(texture->platformTexture(), static_cast<MTLResourceUsage>(texture->platformUsage()));
+        m_commandBuffer->useTexture(texture.copyRef());
     }
 }
 
index 41a0406..8938d83 100644 (file)
@@ -393,7 +393,7 @@ void GPURenderPassEncoder::useResource(const MTLResource *resource, unsigned usa
     END_BLOCK_OBJC_EXCEPTIONS;
 }
 
-void GPURenderPassEncoder::setVertexBuffer(const MTLBuffer *buffer, unsigned offset, unsigned index)
+void GPURenderPassEncoder::setVertexBuffer(const MTLBuffer *buffer, NSUInteger offset, unsigned index)
 {
     ASSERT(m_platformRenderPassEncoder);
 
@@ -402,7 +402,7 @@ void GPURenderPassEncoder::setVertexBuffer(const MTLBuffer *buffer, unsigned off
     END_BLOCK_OBJC_EXCEPTIONS;
 }
 
-void GPURenderPassEncoder::setFragmentBuffer(const MTLBuffer *buffer, unsigned offset, unsigned index)
+void GPURenderPassEncoder::setFragmentBuffer(const MTLBuffer *buffer, NSUInteger offset, unsigned index)
 {
     ASSERT(m_platformRenderPassEncoder);
 
index ee48928..d8c1875 100644 (file)
@@ -178,6 +178,11 @@ GPUTexture::GPUTexture(RetainPtr<MTLTexture>&& texture, OptionSet<GPUTextureUsag
     : m_platformTexture(WTFMove(texture))
     , m_usage(usage)
 {
+    m_platformUsage = MTLResourceUsageRead;
+    if (isSampled())
+        m_platformUsage |= MTLResourceUsageSample;
+    else if (isStorage())
+        m_platformUsage |= MTLResourceUsageWrite;
 }
 
 RefPtr<GPUTexture> GPUTexture::tryCreateDefaultTextureView()