By: Philip Taylor (philip.delete@this.zaynar.co.uk), August 2, 2016 6:53 pm
Room: Moderated Discussions
Peter McGuinness (peter.mcguinness.delete@this.gobrach.com) on August 2, 2016 10:05 am wrote:
>
> > The distinction here is that they are keeping the tile data in
> > on-chip buffers. Normally, that would be streamed out to DRAM.
>
> They are not. That's why you can see partly rendered tiles; they ARE writing out the results
> pixel by pixel as soon as each triangle is rasterised and not waiting for tile completion.
I don't think that's right. The reason you see partly-rendered tiles is that the demo stops pixel shading after a certain number of invocations, so none of the possibly-on-chip tile buffers get updated any further; eventually they are flushed to RAM and displayed. The demo doesn't prove the data is in on-chip buffers - it only shows the sequence in which pixel shaders are invoked. But that sequence is grouped primarily by screen location rather than by triangle index, which only makes sense if the hardware was designed to benefit from some kind of large tile cache.
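Roughly, the inference being made here can be sketched in code. Given an observed trace of pixel-shader invocations, you can distinguish a location-major (tile-grouped) order from a submission-order (triangle-major) one by counting how often consecutive invocations land in different tiles. The tile size and trace format here are illustrative assumptions, not anything the demo actually exposes:

```python
# Sketch: classify a pixel-shader invocation trace as tile-grouped or
# submission-ordered by counting tile transitions. TILE is an assumed
# tile size; real hardware tile dimensions are not public.

TILE = 16  # assumed tile size in pixels

def tile_of(x, y):
    """Map a pixel coordinate to its tile coordinate."""
    return (x // TILE, y // TILE)

def tile_switches(invocations):
    """invocations: list of (triangle_index, x, y) in observed order.
    Returns how many times consecutive invocations fall in different tiles.
    A low count relative to trace length suggests location-major grouping."""
    switches = 0
    prev = None
    for _, x, y in invocations:
        t = tile_of(x, y)
        if prev is not None and t != prev:
            switches += 1
        prev = t
    return switches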
> > You are welcome to call it what you wish, but I chose the term that seemed most appropriate
> > to me. How would you distinguish between a TBDR and a TBR in your mind?
>
> There is an established taxonomy of GPU architectures. AMD and nvidia both use immediate mode, intel
> has used both immediate and TBR in various GPUs, they currently seem to use immediate. IMG uses TBDR
> with the 'deferred' designator to indicate that pixel shading is delayed until after visibility determination
> is complete (where possible) and ARM uses TBIR with the 'immediate' designator to indicate that they
> don't use that optimisation so the term 'tile based immediate mode' is already taken.
I think that's right - the significant difference is that Mali/PowerVR(/Adreno) defer all the rasterisation until they've done the vertex processing for every draw call in the entire frame, whereas NVIDIA only seems to defer rasterisation until it's done the vertex processing for up to a few thousand triangles of a single draw call; and older immediate-mode GPUs don't defer much at all.
In all cases, the deferring is done so that triangles can be reordered to be more cache-friendly, which implies grouping them into tiles. But "tile-based" normally describes GPUs that defer the entire frame, which has significant costs (latency, DRAM for the processed vertices, an awkward fit to the OpenGL API, etc.) and one main benefit (zero wasted framebuffer DRAM traffic when used properly). From what I've seen, NVIDIA's approach has none of those costs, but it doesn't have the same benefit either: it won't reduce framebuffer traffic at all when there is overdraw between multiple draw calls, only in much more limited cases. Calling it tile-based will make people think it's far more like mobile GPUs than it really is.
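The distinction between the two deferral strategies can be sketched as follows. This is a toy model, not either vendor's actual pipeline: triangles are just screen-space bounding boxes, the tile size is assumed, and the batch limit stands in for the hypothesised few-thousand-triangle window. The point it illustrates is that full-frame binning visits each tile once, while batch-limited binning may revisit the same tile in later batches (which is why it can't eliminate framebuffer traffic from overdraw across draw calls):

```python
# Sketch of full-frame deferral (TBDR-style) vs batch-limited reordering.
# A triangle is modelled as a screen-space bounding box (x0, y0, x1, y1).

from collections import defaultdict

TILE = 16  # assumed tile size in pixels

def bin_triangles(triangles, base=0):
    """Group triangle indices by the tiles their bounding boxes touch."""
    bins = defaultdict(list)
    for idx, (x0, y0, x1, y1) in enumerate(triangles):
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins[(tx, ty)].append(base + idx)
    return bins

def full_frame_deferred(triangles):
    """Bin the whole frame, then visit each tile exactly once."""
    order = []
    for tile, idxs in bin_triangles(triangles).items():
        order.extend((tile, i) for i in idxs)
    return order

def batched_reorder(triangles, batch=2):
    """Bin only a small batch at a time, flush, and move on -
    so a tile may be revisited by later batches."""
    order = []
    for start in range(0, len(triangles), batch):
        chunk = triangles[start:start + batch]
        for tile, idxs in bin_triangles(chunk, base=start).items():
            order.extend((tile, i) for i in idxs)
    return order
```

With four triangles alternating between two screen regions, the full-frame version emits one contiguous run per tile, while the batched version bounces between the same two tiles once per batch.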
Maybe call it a "rasterizer triangle reorderer and tile cache" plus some adjectives like "large" or "new" or "improved".