By: steve m (steve.marton.delete@this.gmail.com), May 16, 2017 11:49 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on May 12, 2017 2:15 am wrote:
> steve m (steve.martonantispam.delete@this.gmail.com) on May 11, 2017 10:11 am wrote:
> > Gabriele, your link seems broken. Can you please specify which GDC presentation you're referring to?
>
> Sorry, here's the relevant slides on hardware.fr: http://www.hardware.fr/news/15027/gdc-nvidia-parle-tile-caching-maxwell-pascal.html
>
> I couldn't find the original presentation.
Well, I have to eat my words partially. It seems that both Nvidia (https://www.techpowerup.com/231129/on-nvidias-tile-based-rendering) and AMD (https://videocardz.com/65406/exclusive-amd-vega-presentation) have made the ROPs a client of the L2, and they are attempting to render triangles in screen-space tiles. This minimizes L2 thrashing, in line with the other considerations that enforce spatial locality that I mentioned in my previous post.
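To make that concrete, here is a rough CPU-side sketch of the scheme as I understand it from the slides: bin triangles into screen-space tiles first, then rasterize tile by tile so that all the framebuffer traffic for one tile fits in an L2-sized working set. The tile size, the data structures and the rasterize_in_tile() stand-in are my own inventions for illustration, not anything Nvidia or AMD have actually documented.

/* Illustrative sketch only: a CPU-side model of binning triangles into
 * screen-space tiles so that all ROP/framebuffer traffic for one tile
 * stays within an L2-sized working set. Error handling is omitted. */
#include <stdlib.h>
#include <string.h>

#define TILE_W 64                 /* assumed tile dimensions, in pixels */
#define TILE_H 64
#define SCREEN_W 1920
#define SCREEN_H 1080
#define TILES_X ((SCREEN_W + TILE_W - 1) / TILE_W)
#define TILES_Y ((SCREEN_H + TILE_H - 1) / TILE_H)

typedef struct { float x[3], y[3]; } Tri;      /* screen-space triangle */
typedef struct { int *idx; int count, cap; } Bin;

static void bin_push(Bin *b, int tri) {
    if (b->count == b->cap) {
        b->cap = b->cap ? b->cap * 2 : 16;
        b->idx = realloc(b->idx, b->cap * sizeof *b->idx);
    }
    b->idx[b->count++] = tri;
}

/* Stand-in for the per-tile shading/ROP work: everything it would read
 * or write lives inside one TILE_W x TILE_H region of the framebuffer. */
static void rasterize_in_tile(const Tri *t, int tx, int ty) {
    (void)t; (void)tx; (void)ty;  /* details omitted in this sketch */
}

void render_batch(const Tri *tris, int n) {
    Bin bins[TILES_Y][TILES_X];
    memset(bins, 0, sizeof bins);

    /* 1. Bin each triangle into every tile its bounding box overlaps. */
    for (int i = 0; i < n; i++) {
        float minx = tris[i].x[0], maxx = minx;
        float miny = tris[i].y[0], maxy = miny;
        for (int v = 1; v < 3; v++) {
            if (tris[i].x[v] < minx) minx = tris[i].x[v];
            if (tris[i].x[v] > maxx) maxx = tris[i].x[v];
            if (tris[i].y[v] < miny) miny = tris[i].y[v];
            if (tris[i].y[v] > maxy) maxy = tris[i].y[v];
        }
        int tx0 = (int)minx / TILE_W, tx1 = (int)maxx / TILE_W;
        int ty0 = (int)miny / TILE_H, ty1 = (int)maxy / TILE_H;
        if (tx0 < 0) tx0 = 0;  if (tx1 >= TILES_X) tx1 = TILES_X - 1;
        if (ty0 < 0) ty0 = 0;  if (ty1 >= TILES_Y) ty1 = TILES_Y - 1;
        for (int ty = ty0; ty <= ty1; ty++)
            for (int tx = tx0; tx <= tx1; tx++)
                bin_push(&bins[ty][tx], i);
    }

    /* 2. Walk the screen tile by tile: consecutive triangles now touch
     *    the same small framebuffer footprint, so the ROP's color/depth
     *    traffic stays hot in the cache instead of being evicted and
     *    refetched once per triangle. */
    for (int ty = 0; ty < TILES_Y; ty++)
        for (int tx = 0; tx < TILES_X; tx++) {
            for (int k = 0; k < bins[ty][tx].count; k++)
                rasterize_in_tile(&tris[bins[ty][tx].idx[k]], tx, ty);
            free(bins[ty][tx].idx);
        }
}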
Note that both Nvidia and AMD emphasize being immediate-mode renderers, which implies that draw calls are executed serially, not deferred! So most of the advantages of TBDR don't apply, and neither do the drawbacks.
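If it helps, here is a toy model of that ordering difference as I read it. The draw-call count, the tile count and the one-draw-per-batch granularity are made up for illustration; real hardware presumably batches geometry by some internal heuristic the slides don't spell out.

/* Minimal, purely illustrative model of the ordering difference.
 * Compile and run to compare the traces. */
#include <stdio.h>

#define DRAWS 3
#define TILES 4

/* Immediate mode with tile caching: each batch of submitted geometry is
 * binned and rasterized tile by tile before the next batch, preserving
 * draw-call order (here one draw = one batch for simplicity). */
static void immediate_tiled(void) {
    for (int d = 0; d < DRAWS; d++)
        for (int t = 0; t < TILES; t++)
            printf("draw %d: shade its triangles in tile %d\n", d, t);
}

/* TBDR: all draw calls are only binned up front; shading is deferred
 * until the whole frame's geometry is known, so each tile resolves
 * visibility once and ideally shades each pixel once. */
static void tbdr(void) {
    for (int d = 0; d < DRAWS; d++)
        printf("draw %d: bin triangles only, no shading yet\n", d);
    for (int t = 0; t < TILES; t++)
        printf("tile %d: resolve visibility, then shade survivors\n", t);
}

int main(void) {
    puts("-- immediate mode with tile caching --");
    immediate_tiled();
    puts("-- tile-based deferred rendering --");
    tbdr();
    return 0;
}

The point is just that with tile caching, shading still happens in submission order, so you keep the ordering semantics of an immediate-mode renderer but give up the shade-each-pixel-once property that full deferral buys a TBDR.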
However, if Nvidia is trying to rasterize in tile order, the behavior in David's video is puzzling in that context. Why would the GPU go ahead and rasterize triangles in future tiles before it's done with the current one? That seems like unnecessary cache thrashing to me.
Based on the Nvidia slides, Mr. Kanter also has to eat his words. There is no fixed-size, separate piece of memory like a "tile buffer"; the L2 cache is shared by the ROPs with the other subsystems.