By: Philip Taylor (philip.delete@this.zaynar.co.uk), August 4, 2016 4:06 pm
Room: Moderated Discussions
Regarding the vertex buffer: what you described is the same as how I was interpreting it already (but probably not expressing clearly), so I agree :-)
> I'll suggest that you submit triangles scattered around the whole
> screen so that you don't have a single consistent submission pattern
> that covers the entire screen.
I've tried scattering small triangles with "x += 1.0 + sin(VertexID / 3); y -= 1.0 + sin(1.7 * (VertexID / 3));" and the behaviour is essentially the same as before.
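To make the scatter pattern concrete, here is a minimal Python sketch of the per-vertex offsets that expression produces. It assumes VertexID is an integer, so that VertexID / 3 truncates to a triangle index, and it only models the offsets, not the base vertex positions (which aren't given above). The function name is made up for illustration.

```python
import math

def scatter_offset(vertex_id):
    """Offset (dx, dy) added to a vertex, mirroring the quoted shader expression.

    Assumption: VertexID is an integer, so VertexID / 3 is integer division
    and all three vertices of a triangle share the same offset.
    """
    tri = vertex_id // 3                    # triangle index
    dx = 1.0 + math.sin(tri)                # x += 1.0 + sin(VertexID / 3)
    dy = -(1.0 + math.sin(1.7 * tri))       # y -= 1.0 + sin(1.7 * (VertexID / 3))
    return dx, dy

# Print the offsets for the first few triangles.
for tri in range(4):
    dx, dy = scatter_offset(tri * 3)
    print(f"triangle {tri}: dx={dx:.3f}, dy={dy:.3f}")
```

Under that assumption, dx stays within [0, 2] and dy within [-2, 0], so each small triangle moves as a unit and consecutive triangles land in unrelated parts of the screen.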
If I set it to 21 floats per vertex, it begins with roughly the first 128 triangles: it draws all of them in order, clipped to the top-left 256x512 px region, then moves on to the next region and draws them all again, and so on until it has filled the screen. Then it starts again in the top-left region with the next ~128 triangles and repeats.
If I set it to 17-20 floats per vertex, it's similar but draws ~256 triangles in each iteration.
If I set it to 16 floats per vertex, it's similar but draws ~384 triangles in each iteration.
The numbers don't match up exactly, but I think that indicates there's an approximately 64KB buffer for vertex-shaded primitives. Once that buffer is nearly full (or at the end of a draw call), the rasteriser starts processing all the triangles in that buffer (multiple times, once per 256x512 region), and when it's finished it waits for another 64KB of data before starting the next pass.
(Those numbers are from code that puts unique values in every vertex output. If there are duplicate values then it draws more triangles in each pass, so I believe the buffer contains compressed data, which makes it more confusing to analyse.)
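As a rough check on the 64KB estimate, the raw size of each batch can be worked out from the counts above. This is only a sketch: it assumes 4-byte floats and three separately stored vertices per triangle with no padding or compression, none of which is confirmed, and the function name is made up for illustration.

```python
BYTES_PER_FLOAT = 4          # assumption: 32-bit floats
VERTICES_PER_TRIANGLE = 3    # assumption: no vertex sharing between triangles

def batch_bytes(floats_per_vertex, triangles_per_batch):
    """Raw, uncompressed size of one batch of vertex-shaded triangles."""
    return (floats_per_vertex * BYTES_PER_FLOAT
            * VERTICES_PER_TRIANGLE * triangles_per_batch)

# (floats per vertex, approximate triangles per pass) as reported above.
observations = [(21, 128), (20, 256), (17, 256), (16, 384)]

for floats, triangles in observations:
    size = batch_bytes(floats, triangles)
    print(f"{floats} floats/vertex x ~{triangles} triangles = "
          f"{size} bytes ({size / 1024:.1f} KiB)")
```

Under those assumptions the batches come to roughly 31.5, 60, 51 and 72 KiB, so they bracket 64KB rather than hitting it, which fits "the numbers don't match up exactly". Any padding or compression in the real buffer would shift them.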
The hypothesised 0.5MB tile cache/buffer/etc comes from those 256x512 regions (at 32bpp, no MSAA, no depth): it's reading and writing the framebuffer in those regions many times as it iterates over the few hundred triangles, but it's careful not to access two regions at once, which makes sense if they have 0.5MB of dedicated memory for it (though I suppose it could still make sense if it's just sharing L2 or something).
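The 0.5MB figure is just the framebuffer footprint of one region. A quick sketch of the arithmetic, assuming colour only at 32bpp with no MSAA or depth as stated above (the function name is made up for illustration):

```python
def region_bytes(width_px, height_px, bits_per_pixel):
    """Framebuffer footprint of one screen region, colour data only."""
    return width_px * height_px * bits_per_pixel // 8

size = region_bytes(256, 512, 32)
print(f"{size} bytes = {size / (1024 * 1024)} MiB")
```

That gives 524288 bytes, i.e. exactly 0.5 MiB per 256x512 region.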