By: Philip Taylor (philip.delete@this.zaynar.co.uk), August 1, 2016 4:55 pm
Room: Moderated Discussions
Rob Clark (robdclark.delete@this.gmail.com) on August 1, 2016 10:12 am wrote:
> Rob Clark (robdclark.delete@this.gmail.com) on August 1, 2016 9:44 am wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on August 1, 2016 7:25 am wrote:
> > > Gionatan Danti (g.danti.delete@this.assyoma.it) on August 1, 2016 3:20 am wrote:
> > > > The problem with tile based deferred rendering is that both applications
> > > > and APIs are really meant for immediate mode rendering.
> > >
> > > This is a tile-based immediate mode rasterizer. Its not deferred.
> >
> > from the PoV of how the driver turns GL api into stuff the hw executes,
> > tile based deferred and tile based immediate are the same thing.
> >
> > http://bloggingthemonkey.blogspot.com/2016/07/dirty-tricks-for-moar-fps.html
> >
>
> hmm, that said, the test program used only seems to do a single draw. You can't really
> conclude that the gpu is a tiler from that. It is a tiler if draw #1 for tile #2 happens
> after draw #2 for tile #1. (Regardless of whether it is TBIM or TBDR.)
I tried changing the test program a bit and tested it on a GTX 970. It looks like it flushes all the tiles after every draw call, even with no state changes (i.e. it doesn't start any fragment shaders for draw 1 until all fragment shaders for draw 0 have completed). But it doesn't flush between instances in an instanced draw.
It seems to collect something on the order of 64KB of (compressed) primitives (per raster unit, I guess). If you have more than that in a single draw call, it starts flushing partially-rendered tiles: it runs the fragment shader for the first N primitives in the first few 16x16px tiles, then moves on to the rest of the tiles, before going back to the first tile with the next N primitives. (The big ~256x512px blocks just indicate the order in which it flushes the small tiles, I think.)
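To make that flushing order concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: every primitive covers every tile, and the buffer limit is a primitive count (`batch_size`) rather than ~64KB of compressed data. The function and parameter names are made up for illustration and are not from the test program.

```python
from typing import Iterator, Tuple


def shading_order(num_prims: int, num_tiles: int,
                  batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (tile, primitive) pairs in the order fragments get shaded.

    Assumption: each primitive touches every tile, and the hardware
    buffers at most batch_size primitives before flushing all tiles.
    """
    for start in range(0, num_prims, batch_size):
        # One buffer's worth of primitives from the same draw call.
        batch = range(start, min(start + batch_size, num_prims))
        # Flush: walk the tiles, shading the whole batch in each tile
        # before moving on to the next tile.
        for tile in range(num_tiles):
            for prim in batch:
                yield (tile, prim)


# 4 primitives, 2 tiles, room for 2 primitives in the buffer.
order = list(shading_order(num_prims=4, num_tiles=2, batch_size=2))
print(order)
```

With a buffer that holds the whole draw (`batch_size >= num_prims`) each tile is visited once; with a smaller buffer the walk returns to the first tile for the next N primitives, which is the partially-rendered-tile pattern described above.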
That seems a significantly simpler and smaller amount of deferring than TBDR, so it doesn't have to bother collecting huge batches. From a driver perspective it sounds identical to non-tile-based immediate mode.
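A quick way to see why this looks like immediate mode from the outside is to apply the criterion quoted above (an earlier draw shaded in one tile after a later draw was shaded in a different tile) to a trace of shading events. The sketch below is hypothetical: the trace format, the function name, and the 0-based draw numbering are assumptions for illustration, not output from the actual test program.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[int, int]  # (draw index, tile index), in shading order


def looks_like_tiler(trace: Iterable[Event]) -> bool:
    """True if an earlier draw is shaded in some tile after a later
    draw has already been shaded in a different tile."""
    latest: Dict[int, int] = {}  # tile -> highest draw index seen so far
    for draw, tile in trace:
        if any(d > draw for t, d in latest.items() if t != tile):
            return True
        latest[tile] = max(draw, latest.get(tile, draw))
    return False


# Flush after every draw call: draw 0 finishes in all tiles first.
per_draw_flush = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Binning across draw calls: tile 0 gets both draws before tile 1 starts.
cross_draw_binning = [(0, 0), (1, 0), (0, 1), (1, 1)]

print(looks_like_tiler(per_draw_flush))      # False
print(looks_like_tiler(cross_draw_binning))  # True
```

Under this model, a GPU that flushes every tile between draw calls never trips the check, so the ordering across draws is indistinguishable from a non-tiled immediate-mode GPU; the tiling only shows up within a single draw.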