By: pocak (pocak100.delete@this.gmail.com), May 18, 2017 4:19 pm
Room: Moderated Discussions
steve m (steve.marton.delete@this.gmail.com) on May 17, 2017 8:35 pm wrote:
> Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on May 17, 2017 5:43 am wrote:
> > steve m (steve.marton.delete@this.gmail.com) on May 16, 2017 11:49 am wrote:
> > Possibly load-balancing, to keep the rasterizers from being idle while the pixels in the previous
> > tile are being shaded. If the L2 can hold two tiles at the same time then it should be fine.
>
> Yeah I still don't quite understand. Why not keep the same rasterizers busy filling in holes
> for the current tile? Why dispatch pixel shaders to future tiles, and finish them, all the way
> through ROP, when you still have holes in your earlier tile? Shouldn't your rasterizer prioritize
> pixels in the current tile first? It seems that the rasterizer has dispatched pixel shader work
> to future tiles before it finished dispatching the first tile. Or the work queue was not consumed
> in the intended order (the left hand doesn't know what the right is doing?).
>
> The cache is a shared resource. Sure, more tiles will fit, but you don't want to thrash it
> unnecessarily, because you'll be slowing down your texture units and everything else.
>
> I don't see why you would NOT want to dispatch and process all pixel work in a tile before you
> start dispatching the next tile. If you did that, presumably the work would be done in a much
> cleaner progression by tile, rather than what we're seeing in the video. Of course there would
> be some bleed between tiles, but I especially don't see any reason to get more than one tile
> ahead. I mean, the rasterizer output queue has no reason to be THAT big, does it?...
The key you're missing is that ROPs are statically assigned to portions of the framebuffer, in squares of about 8x8 pixels. Each ROP/rasterizer (I'm assuming those are tied together) doesn't move on to the next big tile until its own portion of the previous one is complete; this is clear in the video.
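To make the static assignment concrete, here is a small Python sketch. Everything in it is hypothetical: the 2x2 interleave of 8x8 squares, the count of four ROPs, and the per-tile cycle costs are illustrative assumptions, not a description of any real GPU. It shows that if each ROP only advances past a big tile once its own squares are finished, ROPs with less work can end up more than one tile ahead of a busy one.

```python
# Hypothetical parameters (example numbers, not a real GPU spec).
SQUARE = 8     # side of the pixel square statically owned by one ROP
NUM_ROPS = 4


def rop_for_pixel(x: int, y: int) -> int:
    """Map a framebuffer pixel to its owning ROP.

    Assumes a 2x2 interleave of 8x8 squares, so each ROP owns every
    other square horizontally and vertically.
    """
    bx, by = x // SQUARE, y // SQUARE
    return (bx % 2) + 2 * (by % 2)


def tile_progress(work_per_tile: list[list[int]], cycles: int) -> list[int]:
    """Return the big-tile index each ROP is on after `cycles` cycles.

    work_per_tile[t][r] is the (hypothetical) number of cycles ROP r needs
    for its own squares in big tile t. Each ROP walks the tiles in order,
    independently of the others, and only moves on once its share is done.
    A result equal to len(work_per_tile) means that ROP has finished all tiles.
    """
    positions = []
    for r in range(NUM_ROPS):
        remaining = cycles
        t = 0
        while t < len(work_per_tile) and remaining >= work_per_tile[t][r]:
            remaining -= work_per_tile[t][r]
            t += 1
        positions.append(t)
    return positions


if __name__ == "__main__":
    # ROP 1 gets four times as much work in tile 0 as the other ROPs.
    work = [[10, 40, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
    print(tile_progress(work, 25))  # [2, 0, 2, 2]
```

After 25 cycles the three lightly loaded ROPs are already on tile 2 while ROP 1 is still on tile 0, which is the "more than one tile ahead" pattern from the video, without any ROP touching another ROP's squares.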
Don't think of it as the ROPs sharing a 512 KiB pool of cache; think of it as each of four ROPs using its own distinct 128 KiB slice. (These are example numbers.)
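As a rough illustration of what a per-ROP slice means for the working set, here is the arithmetic in Python. The 512 KiB and four ROPs are the example numbers above; the 8 bytes per pixel (a 4-byte color target plus a 4-byte depth/stencil value) is purely an assumption for the sake of the calculation.

```python
# Example numbers from the post, plus one labeled assumption.
TOTAL_CACHE_KIB = 512
NUM_ROPS = 4
BYTES_PER_PIXEL = 8   # assumption: 4-byte color + 4-byte depth/stencil
SQUARE = 8            # 8x8 pixel squares statically owned by each ROP


def per_rop_budget() -> tuple[int, int, int]:
    """Return (bytes, pixels, 8x8 squares) that fit in one ROP's slice."""
    slice_bytes = TOTAL_CACHE_KIB * 1024 // NUM_ROPS
    pixels = slice_bytes // BYTES_PER_PIXEL
    squares = pixels // (SQUARE * SQUARE)
    return slice_bytes, pixels, squares


if __name__ == "__main__":
    print(per_rop_budget())  # (131072, 16384, 256)
```

Under these assumptions each ROP's 128 KiB slice holds its own 256 squares; whether a neighbouring ROP is one tile or two tiles ahead doesn't change how much of that slice this ROP needs.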