By: Rob Clark (robdclark.delete@this.gmail.com), August 1, 2016 5:56 pm
Room: Moderated Discussions
Philip Taylor (philip.delete@this.zaynar.co.uk) on August 1, 2016 4:55 pm wrote:
> Rob Clark (robdclark.delete@this.gmail.com) on August 1, 2016 10:12 am wrote:
> > Rob Clark (robdclark.delete@this.gmail.com) on August 1, 2016 9:44 am wrote:
> > > David Kanter (dkanter.delete@this.realworldtech.com) on August 1, 2016 7:25 am wrote:
> > > > Gionatan Danti (g.danti.delete@this.assyoma.it) on August 1, 2016 3:20 am wrote:
> > > > > The problem with tile based deferred rendering is that both applications
> > > > > and APIs are really meant for immediate mode rendering.
> > > >
> > > > This is a tile-based immediate mode rasterizer. Its not deferred.
> > >
> > > from the PoV of how the driver turns GL api into stuff the hw executes,
> > > tile based deferred and tile based immediate are the same thing.
> > >
> > > http://bloggingthemonkey.blogspot.com/2016/07/dirty-tricks-for-moar-fps.html
> > >
> >
> > hmm, that said, the test program used only seems to do a single draw. You can't really
> > conclude that the gpu is a tiler from that. It is a tiler if draw #1 for tile #2 happens
> > after draw #2 for tile #1. (Regardless of whether it is TBIM or TBDR.)
>
> I tried changing the test program a bit, and tested on a GTX 970. It looks like it flushes all the tiles after
> every draw call, even with no state changes. (i.e. it doesn't start any fragment shaders for draw 1 until all
> fragment shaders for draw 0 have completed). But it doesn't flush between instances in an instanced draw.
>
> It seems to collect something on the order of 64KB of (compressed) primitives (per raster unit, I guess)
> - if you have more than that in a single draw call then it will start flushing partially-rendered tiles
> (i.e. it will run the fragment shader for the first N primitives in the first few 16x16px tiles, then
> will move on to the rest of the tiles, before going back to the first tile with the next N primitives).
> (The big ~256x512px blocks just indicate the order that it flushes the small tiles in, I think.)
>
> That seems a significantly simpler and smaller amount of deferring than TBDR, so it doesn't have to bother
> collecting huge batches. From a driver perspective it sounds identical to non-tile-based immediate mode.
Interesting. Yeah, admittedly I'm used to looking at this from a driver perspective, and from that perspective it does not look like a tiler (i.e. no need to play tricks to avoid flushing batches of draws).
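To make the ordering argument concrete, here is a rough sketch of the "is it a tiler?" test from earlier in the thread (a later draw running in one tile before an earlier draw runs in a different tile). The function names and the (draw, tile) event model are my own simplification for illustration, not anything taken from the test program or from real hw:

```python
def immediate_order(num_draws, num_tiles):
    """Draw-major: every tile touched by draw d is finished before draw d+1 starts."""
    return [(d, t) for d in range(num_draws) for t in range(num_tiles)]


def tiler_order(num_draws, num_tiles):
    """Tile-major: all batched draws are replayed for tile t before moving to tile t+1."""
    return [(d, t) for t in range(num_tiles) for d in range(num_draws)]


def per_draw_flush_order(num_draws, num_tiles, batches_per_draw):
    """Model of the behaviour described for the GTX 970 (an assumption on my part):
    primitives are binned in batches within a draw, but everything is flushed
    before the next draw starts."""
    return [
        (d, t)
        for d in range(num_draws)
        for _ in range(batches_per_draw)
        for t in range(num_tiles)
    ]


def looks_like_tiler(order):
    """True if some later draw executes in one tile before an earlier draw
    executes in a different tile."""
    for i, (d_a, t_a) in enumerate(order):
        for d_b, t_b in order[i + 1:]:
            if d_b < d_a and t_b != t_a:
                return True
    return False


if __name__ == "__main__":
    print(looks_like_tiler(immediate_order(2, 2)))
    print(looks_like_tiler(tiler_order(2, 2)))
    print(looks_like_tiler(per_draw_flush_order(2, 2, 3)))
```

Under this toy model only the tile-major ordering passes the test. The per-draw-flush ordering reorders work inside a draw, but never across draws, which is why the driver does not have to care.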
When I first read the article (before I had time to watch the video or look into the test code), I was expecting something more like adreno/imageon, where tiling was essentially "bolted on" to an immediate mode architecture. (Which, at least for phone/tablet perf/power parameters, would IMHO make a huge amount of sense. But perhaps that is something nv has given up caring about.)
But there are gains to be had for IMRs with clever thread scheduling like this, so it is still neat. And it needs a lot less driver trickery (at least for ogl; vulkan looks friendlier for tilers in that regard, at least if used properly).
Interesting about the amount of geometry it can accumulate for executing the frag shader stage OoO, given that a lot of games have a lot of geom. I guess that must include varying data, which would cut down on the # of primitives somewhat? Otherwise they would need to split out a separate binning shader (which, from what I know about nouveau, they do not).
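A back-of-the-envelope sketch of why varyings would matter for the batch size. Everything here apart from the ~64KB figure is an assumption for illustration: uncompressed storage, three unshared vertices per triangle, a 16-byte vec4 position, and 16 bytes per vec4 varying. The real on-chip format is unknown to me and is reportedly compressed, so treat the numbers as order-of-magnitude only:

```python
BUFFER_BYTES = 64 * 1024   # "on the order of 64KB", per the measurement quoted above
VEC4_BYTES = 16            # assumed: four 32-bit floats
VERTS_PER_TRIANGLE = 3     # assumed: no vertex sharing between triangles


def triangles_per_batch(num_vec4_varyings, buffer_bytes=BUFFER_BYTES):
    """Rough count of triangles that fit in the buffer if each vertex carries
    a vec4 position plus the given number of vec4 varyings (hypothetical layout)."""
    bytes_per_vertex = VEC4_BYTES * (1 + num_vec4_varyings)
    bytes_per_triangle = VERTS_PER_TRIANGLE * bytes_per_vertex
    return buffer_bytes // bytes_per_triangle


if __name__ == "__main__":
    for n in (0, 4, 8):
        print(n, "varyings ->", triangles_per_batch(n), "triangles")
```

If varyings really are stored alongside positions, a shader with a handful of them would cut the number of primitives per batch several times over compared to position-only storage.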